Archive for March 23rd, 2016

As I was driving the kids home from daycare yesterday, it suddenly dawned on me that a code change I had made a few weeks ago might have introduced a bug into some of my less popular apps. It wasn't until I was drawing the kids their bath that I had a chance to whip out my phone and confirm the bug's existence. Drat. At least I was pretty sure it was an easy bug to fix.

Once the kids were in bed, I opened up my laptop to get started. This looked like a minor bug in a set of apps with an embarrassingly low number of visitors last week, so I decided to chance editing it directly in production rather than setting up a test environment first. I made a single change before getting called away. I was gone for a minute, no more than 30 seconds. When I came back downstairs, the internet was out. I opened my phone to test the change I had made. Instead of a small bug that only affected an obscure case, my app was now completely unusable. Serves me right for working in production.

I checked in with our internet service provider. They already knew about the outage and expected to have the internet restored by 2 am. It was 8 pm. Not good. Not good at all.

The internet wasn't technically down, but it was experiencing 60-75% packet loss. Packets were bouncing all over our provider's network before getting out to the internet backbone. It looked like our service provider was having a major hardware problem and was trying to reroute traffic around it, but the rest of the network couldn't handle the extra load. For us, that meant we could occasionally connect for a minute or two. That was usually just enough for me to connect through the web FTP, open a file, and make a single change. If I was lucky, I could also test the change on my laptop instead of my phone before I lost the connection again.

By the way, my work environment? A web-based FTP client and IDE. The web-based FTP client wasn't built to handle connection interruptions. If the connection timed out while it was committing a change, it still showed the change as committed, even though the change never reached the server. I discovered this only after debugging the same bug and changing the same line the same way multiple times.

It was a night of not-fun discoveries. Thankfully, the internet was stable enough at 11:30 for me to return my app to its mostly correct state. This morning I took a few extra minutes to squash that bug.

Next task: setting up a better working environment.