Archive for the ‘Internet & Technology’ Category

June 11, 2016

Thwarting Adblockers

When I started tracking adblockers on my site I didn't have much of an intuition about how common they were, or how much they were affecting my bottom line. As a one-person company, I have limited time to throw at any one problem, so these types of questions always warrant an investigation to see if they're worth my time and effort. If ad blockers were used by a small enough percentage of my audience, I would ignore the issue and focus on writing new apps.

Initially I came up with an arbitrary threshold for an acceptable amount of ad blocking. As long as ad blocking was less than 15% of my traffic, my bottom line would remain mostly intact. Actually, the first number in my head was 10%, but I bumped it up after it appeared 12% of my ads were being blocked. There was no real reason behind either number, just intuition. The first time the percentage of blocked ads rose above 15% I decided to look the other way. Maybe 17% was a more reasonable number. Then I had my first 20%-ads-blocked day, followed by my first 40% day, and finally a day over 50%. The bandwidth I was paying for to host the webapps was costing me more than the money I was earning from them. Forget earning money, the apps were costing me money! Ignoring the problem was no longer an option.

Thankfully my ad-blocking detection script was generating a fair amount of data. I had replaced those "console.log" calls with Google Analytics event recordings, so I could generate a fairly extensive profile of just who was using adblockers.
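For the curious, the heart of that script looks something like this. It's a simplified sketch, not my production code: the container id and event labels are placeholders.

```javascript
// Simplified sketch of the ad-block check. The container id and event
// labels here are placeholders, not my production values.
window.addEventListener('load', function () {
  // Give the AdSense script a moment to render before checking
  setTimeout(function () {
    var ad = document.getElementById('ad-container'); // placeholder id
    var blocked = !ad || ad.offsetHeight === 0;

    // Record the result as a Google Analytics event (classic analytics.js)
    if (typeof ga === 'function') {
      ga('send', 'event', 'Ads', blocked ? 'blocked' : 'shown');
    }
  }, 2000);
});
```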

I wasn't surprised to see that adblocking was more common on desktop than on mobile browsers. I think that's pretty common knowledge these days. What caught me off guard was the stark divide between weekend behavior and weekday behavior. Even accounting for browser type, adblocking was nearly nonexistent on the weekends. Digging further, I learned some corporate networks block ads as a matter of policy.

Penalizing users for their network administrator's policy didn't seem like the right course of action. Yes, blocking ads is against my terms of service, but what choice did they have? They have no control over their corporate network policy, and I'm more likely to incur their ire than get any positive benefit from blocking them. I opted to go a different route.

Instead, I now show different, unblockable ads that address many of the concerns that advocates of adblocking raise.

When Google AdSense is blocked, I now serve static image and text ads for Amazon. Because the only JavaScript running is JavaScript I wrote, rather than a third-party script, there is no additional security concern. Nor is there any extra strain on resources beyond what running my apps would cause anyway. No third-party involvement also means no additional privacy concerns. The new ad policy addresses the objections of most people who use ad blockers. That sounds like a win-win in my book!
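The swap itself is straightforward. A rough sketch, with the image path and Amazon link as placeholders:

```javascript
// Rough sketch of the fallback: when the check above says the ad was
// blocked, inject a static, self-hosted image ad linking to Amazon.
// The image path and affiliate URL are placeholders.
function showFallbackAd(container) {
  var link = document.createElement('a');
  link.href = 'https://www.amazon.com/dp/EXAMPLE?tag=my-tag'; // placeholder

  var img = document.createElement('img');
  img.src = '/ads/amazon-banner.png'; // served from my own domain, much harder to filter
  img.alt = 'Advertisement';

  link.appendChild(img);
  container.appendChild(link);
}
```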

If you want to see the Amazon ads I use, but don't have an adblocker, you can always check them out here. As always, I welcome feedback.

One of the things that I think is holding my business back is the lack of a good name/corporate identity. I track my incoming links pretty closely, and I've noticed a tendency for users to be more trusting of websites that look corporate rather than personal. That's bad news for me, as my business name is my name.

Frustration is finding a perfect name, and discovering it was registered just days before you came up with it.

When my search was coming up incredibly short, I started thinking about branching out from the standard dotcom/commercial names. The past couple of years have seen an explosion in new top-level domain names. These are corporate sponsored (application prices started at $185,000). The expansion of new domain names was billed as good for companies, since they could have greater control over their brand, and good for consumers, since there would be more choices. I've always been a little skeptical of additional generic top-level domain (gTLD) names – (how often can two corporate identities succeed while sharing the same name?) – but I now have a new reason to be skeptical. There may be more domain names technically available, but that doesn't mean there are more functionally available.

As I was thinking about the domain names, I came across the .space extension. How cool would data.space be? And it wasn't already registered! To my dismay I realized that even though it was unclaimed, it would cost me at least $5,000–$6,500 a year.

To backtrack a little, the price for a new dotcom domain (if you can find one) is relatively low at around $10. That's because there are thousands of registrars who can offer dotcom names, and competition is a consumer's best friend. Competition puts pressure on registrars to keep their prices low. Some registrars will register what they consider premium domains so they can resell them for a higher fee, but there's nothing keeping a customer from transferring between registrars once they acquire the domain to keep the future years' pricing down.

Registrars who wish to sell dotspace domains need to be accredited through Radix. Radix, a for-profit entity, can set the price as they see fit, and has decided to set the price relative to what they think a domain is worth. Sarah.space would also be $5k/year. Piano.space would be $1k/year. The much less cool Datam.space would be only $10 a year. As the company who applied for the dotspace gTLD from ICANN, Radix has full control.

This discovery has me a little nervous about trusting new gTLDs. While the expectation is that the price of these domains will come down, there's no guarantee; the price could just as easily rise in the future. It's a risk I'm not willing to take when it's already so difficult to build a brand.

It's back to the drawing board for me. I'm currently considering a phonetic spelling of a dotcom name I like that is registered but unused. At least there will be some cost certainty.

I recently came across this new camera roll app for iOS that not only clusters photos based on subject matter, but computes an "aesthetic score" and can show you your most aesthetically pleasing photos. As someone who has 4,939 photos on her phone, and a love of machine learning, I had to give it a try.

My most aesthetically pleasing photo this month is… drumroll

[Photo: the nail]
A nail sticking out of the floor which I have caught my foot on way too many times. I took this photo so I could show the kind folks at Home Depot what I was talking about. It just will not stay down no matter how many times I hammer it!

I disagree. I mean, it's a fine nail and all, but my most aesthetically pleasing photo? Adding insult to injury, the nail-in-the-floorboard photo only received an aesthetic score of 71%.

The clustering results were mixed. Most of the time I'd get several clusters of what I'd consider to be the same subject matter. Admittedly, it's handier to scroll through effectively ten photos rather than hundreds. In scrolling through the curated roll I found cute photos I had forgotten I had taken. But the other kind of failure, when photos are clustered together that shouldn't be, is much less forgivable. When the app fails, it fails big time.

Take the following example:

[Photos A & C]

[Photos B & D]

In the above photos, photo A was in a cluster with similar photos, while B, C, and D were clustered together. B & C I can understand; they're different children but the same general framing, although A & C are closer in framing. I would prefer A & B to be a cluster, but wouldn't mind A, B & C. And D? What the heck happened there? Also, the respective aesthetic scores are 38%, 38%, 36% and 24%. While I wouldn't use them to showcase my work, I do think they beat nail-in-floorboard. Except maybe photo D. I have no idea what I was going for there.

Me being me, I set out to reverse engineer the algorithm and see if I could figure out what was happening.

It appears that photos are being clustered based primarily on the timestamp. Cluster B, C and D's photos were taken from 9:16–9:17, whereas cluster A's photos were taken at 9:18. It appears that a photo is added to the current cluster if its tags are similar to the photos in that cluster; otherwise a new cluster is formed. There's no interleaving. As a result, visually similar photos are sometimes distributed across different clusters. "Buried" might be a better term when visually different photos have a higher aesthetic score and are used as the key example for the cluster. For example, the top photo in B, C and D's cluster is another photo of Alexis. This undesirable behavior is made extra bad by the fact that the app offers to delete "visually similar" photos (i.e. the less aesthetically pleasing photos in the cluster). That would mean all photos of Nicole by the flowers.
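In code form, my best guess at the behavior looks roughly like this. The tag-similarity measure and threshold are my inventions; the app's actual internals could differ:

```javascript
// My best guess at the app's behavior: photos are visited in timestamp
// order, and each photo either joins the current cluster (if its tags
// overlap enough) or starts a new one. The similarity measure and
// threshold are my inventions, not the app's.
function clusterPhotos(photos, threshold) {
  photos.sort(function (a, b) { return a.timestamp - b.timestamp; });

  var clusters = [];
  var current = null;

  photos.forEach(function (photo) {
    if (current && tagSimilarity(photo.tags, current.tags) >= threshold) {
      current.photos.push(photo);
    } else {
      // No interleaving: once a new cluster starts, the old one never reopens
      current = { photos: [photo], tags: photo.tags.slice() };
      clusters.push(current);
    }
  });
  return clusters;
}

// Jaccard similarity between two tag lists
function tagSimilarity(a, b) {
  var setB = new Set(b);
  var intersection = a.filter(function (t) { return setB.has(t); }).length;
  var union = new Set(a.concat(b)).size;
  return union === 0 ? 0 : intersection / union;
}
```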

I can't help but wonder how I would have clustered the photos. My first instinct was also to use the timestamp; however, I would allow clusters to interleave. When taking photos I don't decide to take photos of one child on the swing, switch to the other child on the slide, and then call it done. I usually go back and forth between the girls with my attention. Clearly photos A & B should be in the same cluster, even though D was taken after B but before A.

My next instinct was to cluster photos based on color similarity. Putting the above four photos through my Image Color Palette app, I get the following palettes:

[Color palettes for photos A, B, C, and D]

Photos A & B have identical color palettes. Photo C is very similar, but has a blue cluster in place of the pink cluster, as the two girls are wearing different color jackets. Photo D, on the other hand, is very different. Clustering based on color palettes would put A & B in the same cluster, and possibly add photo C depending on the sensitivity threshold.
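As a rough sketch, the palette comparison could work something like this (the distance measure and threshold are just first guesses):

```javascript
// Sketch of palette-based matching: treat each palette as a list of RGB
// swatches, and score two palettes by the average distance from each
// swatch to its nearest counterpart. The threshold is arbitrary.
function swatchDistance(c1, c2) {
  return Math.sqrt(
    Math.pow(c1.r - c2.r, 2) +
    Math.pow(c1.g - c2.g, 2) +
    Math.pow(c1.b - c2.b, 2)
  );
}

function paletteDistance(p1, p2) {
  var total = p1.reduce(function (sum, swatch) {
    var nearest = Math.min.apply(null, p2.map(function (other) {
      return swatchDistance(swatch, other);
    }));
    return sum + nearest;
  }, 0);
  return total / p1.length;
}

// Two photos join the same cluster when their palettes are close enough,
// e.g. paletteDistance(paletteA, paletteB) < 60 (the sensitivity threshold).
```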

While aesthetics can be a bit subjective, I was delighted to find a few research papers on the topic of creating an algorithmic approach to predicting how aesthetically pleasing an image is. This knowledge will come in handy for the photography apps I'm working on!

February 25, 2016

My Ad Policy

There's been a bit of an arms race between ad-blocking users, content providers and advertisers. As ad blockers rise in popularity, revenue goes down for content providers. Content providers tend to resort to more obtrusive ads, hoping to garner more clicks from those users not utilizing blockers. They may put clickable content (like a next button) too close to the ad so misclicks turn into ad clicks. Or they use growing interstitial ads that block the content until the user is forced to interact with the ad. From the advertiser's perspective, misclicks are worse than no clicks. They cost the advertiser money for little gain, as the user wasn't actually interested in the ad in the first place. As a result, the advertiser offers lower rates, reducing the content provider's revenue further. And the more annoying the ads, the more likely users are to turn to ad blockers in the first place. Things have gotten out of hand, and no one wins in this scenario. The user has a terrible experience. The advertiser gets the wrong kind of visitors. While the content provider may make money initially, over time he or she makes less and less as he or she drives more people to use adblockers.

I am not a fan of adblockers, but I recognize why some users are. In my world view, it’s the content owner’s responsibility to ensure a reasonable user experience, and that includes the ads the user sees. I intend to do my part to stay out of the arms race, even if that means less revenue.

With that in mind I’ve been thinking about my own ad policy:

No Misclicks and No Trick Clicks

I used to love the game Robot Unicorn Attack on my phone. I played it daily. Nicki loved it too, referring to it as “the horsie game.”

Aside: A great way for an unscrupulous developer to make a quick buck? Write an ad-supported toddler game app with ads on the screen the toddler sees. If there's a button, they'll find it and click on it. You don't even need to be sneaky with ad placement.

When you inevitably died, the game gave you the option of using a fairy tear to continue your life. You clicked "yes" to continue, "no" to move on to the next life. At some point they moved the "no" button to the bottom of the screen and put a "watch a 30 second ad clip instead" option in its place. After one too many misclicks I uninstalled the game from my phone and quit cold turkey. Since my terms of service disallow adblockers, I certainly can't fault anyone for quitting me cold turkey if I tried something similar!

As a content provider I promise to make all advertisements obvious, and will design my apps in such a way that any ad click is more than likely an intentional ad click.

Limited Ads

In order to test my ad-blocking detection script, I needed to download and install an adblocker. I kept it turned off by default, but every time my browser restarted, the ad blocker turned back on. This particular ad blocker would show how many ads it blocked on each page. The record was 49 on a news website. Forty-nine. I had no idea.

Of course there's a big difference between news websites and my apps. Lengthy articles have more space for ads. I try to keep my apps contained within a single screen, so I couldn't fit that many ads in, even ignoring my first pledge.

Right now I’m limiting myself to just one or two banner ads.

Possible Revenue Alternatives

A common defense from adblocking users is that they weren't going to click on ads anyway, so what difference does it make? This argument is based on the inaccurate assumption that ad views don't earn revenue. They do! To be fair, the per-view revenue is rather small.

Some websites allow users to donate money instead of viewing ads. Others accept other currency in lieu of ads, such as a share on social media or subscribing. When I get larger I may consider the former option. If you want to speed up the process, you can consider Google Contributor. You pay a small monthly fee to Contributor. Contributor blocks some of the Google AdSense ads you would have seen, and uses your monthly fee to pay the content provider as though you had seen the ad.

February 19, 2016

Anti Ad Blocker

As I try to make my way in the world as an ad-supported content provider, ad blockers are a bit of a thorn in my side. Recently I've been pondering the ethics of ad blockers with a few friends. I've noticed that those people firmly in the "pro" ad blocker camp tend to view websites more as public property, such as a town library. In their view, as 'net citizens they are entitled to the content of each webpage, should they so choose to consume it. They'll quickly point out that ads are not just annoying, but can run amok, crashing browsers and, in rare instances, installing viruses on the web surfer's computer. As they see it, sometimes blocking ads is a necessary tool for navigating the web.

I tend to view websites more as private property, closer to a bookstore than a public library. Sure, you're often free to browse the content at your leisure in your local Barnes & Noble, but that's more of a store policy than a requirement. After all, there's no rule that says content providers cannot use paywalls, or restrict content by requiring registration.

The bookstore owner typically encourages leisurely browsing by providing comfy chairs and sometimes offering a coffee shop or nice music. The hope is that the additional browsing time turns into collateral purchases. There's no requirement for patrons to purchase, of course, and not all customers do. But enough customers do buy extra books to make it worth the store owner's while. The display advertisement model is rather similar. Content providers attempt to provide enough interesting content to keep web surfers on their websites, in hopes of a few advertisement clicks.

In the website-as-bookstore view, it's the responsibility of the content provider to ensure his or her ads are not overly burdensome. You might not like the music playing over the loudspeakers in a traditional brick-and-mortar store. Perhaps it's just the lyrics you object to, but it's playing at obnoxiously loud decibels. You still wouldn't take it upon yourself to rip out the speakers. No jury would accept "potential hearing loss" as justification when you're free to leave the store at any time. If your experience at the store was unpleasant you might complain, and you'd probably leave. That's what I do. When ads get too annoying, I leave the website.

It's not a perfect analogy – (ad blockers are not destruction of another's physical property) – but it's the analogy I've got.

Many of my favorite techie news sources have been reporting ad blocker use has been on the rise in recent months. As someone trying to make a business with ad supported content, that’s a scary proposition. I decided I couldn’t live in the dark anymore, not knowing how many of my users are utilizing an ad blocker. In a few days I’ll have a good idea whether and how big a problem ad blockers are on my site.

November 14, 2015

Data Loss

Someone once told me the problem with health insurance is you don’t know how good yours is until you need it, and by then it’s too late. The same can be said about computer backup recovery systems.

I've been using CrashPlan for two years now. Up until now I've been quite happy with our setup. My data is backed up in triplicate. One external hard drive (E:) contains all the raw images, directly off the camera, in the same directory structure the camera creates. Another drive (F:) has all my images organized, so I can easily find the photos I'm looking for. I'm using CrashPlan to back up both those drives to another local hard drive (G:) and also to the cloud. More on the cloud later. I figured I had to be safe from data loss, right? How could three copies of my data not be sufficient?

I should mention that before creating the local backup instance on drive G:, I had tested CrashPlan's cloud backup. I say that as though the test was intentional, and not because I accidentally deleted a directory. Regardless, recovering the deleted files was easy peasy lemon squeezy, so it never occurred to me to also test the local backup instance. That mistake is on me.

On October 17th my E: drive failed. All the lightweight solutions – chkdsk, restarting, etc. – were to no avail. My computer happily told me the disk was unreadable and suggested reformatting. I decided not to waste too much time trying to repair the drive. After all, this was exactly the use case for the local CrashPlan backup. I reformatted the drive and began the process of restoring over 296,224 files.

I got back 296,224 “Unknown problem” error responses.

At this point I wasn't expecting to experience much, if any, data loss. I could still pay the $300 and recover from the cloud, and I had my F: drive, which should contain the same files. I say "should" because the two drives do get out of sync sometimes. I previously wrote a Java program to run through the directories and warn me when this happens, but I couldn't remember the last time I ran the job.
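The check itself is simple. Here's the gist, sketched in JavaScript (Node.js) rather than the Java I actually used; since E: and F: organize the same photos differently, it compares file names rather than paths:

```javascript
// Gist of the sync check: walk both directory trees, collect file names,
// and report anything present on one drive but not the other.
var fs = require('fs');
var path = require('path');

function listFileNames(dir, names) {
  names = names || new Set();
  fs.readdirSync(dir).forEach(function (entry) {
    var full = path.join(dir, entry);
    if (fs.statSync(full).isDirectory()) {
      listFileNames(full, names);
    } else {
      names.add(entry);
    }
  });
  return names;
}

function reportMismatches(rawDrive, organizedDrive) {
  var inE = listFileNames(rawDrive);
  var inF = listFileNames(organizedDrive);
  inE.forEach(function (f) { if (!inF.has(f)) console.log('Missing from F: ' + f); });
  inF.forEach(function (f) { if (!inE.has(f)) console.log('Missing from E: ' + f); });
}

reportMismatches('E:\\', 'F:\\');
```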

I couldn't figure out what went wrong from the logs, so I contacted tech support. Tech support theorized it was a known bug that affects NTFS file systems on Windows computers when the drive was reformatted. This was extremely frustrating to hear, as NTFS is the default file system on Windows, and recovering from a dead drive was, again, one of the primary use cases for CrashPlan! Tech support's suggestions included (1) try a different operating system, (2) reformat the drive to a different file system, and (3) download the files piecemeal. Of those, (2) was the least ridiculous. I reformatted the drive to ExFAT and tried again.

I got back 411 files. A handful of iPhone photos and a bunch more “Unknown problem” error messages.

But this time I could tell there was still an error with the drive. The 2 TB drive was showing as full with just a handful of files on it. I reformatted again. The reformat failed. Repeat, repeat, repeat. After the fourth reformatting failed I purchased a new hard drive.

I got back 34k files.

The troubling thing now was that CrashPlan was failing silently. There was no error given. The only indication that something had gone wrong was the fact that I had only gotten back a tenth of my files. I tried again; more silent failures. At this point my confidence that I was going to get back all my data was waning, so I started looking into the cloud.

I had two options when it came to restoring from the cloud. The first was to download all 2 TB worth of data in 500 MB chunks – 4,000 chunks to be exact! The second was to pay $300 for "Restore to Door". I had originally thought "Restore to Door" meant they send you a hard drive with all your files intact. Nope, they send you a local CrashPlan instance you can restore from. Basically I would have another copy of my G: drive. That didn't leave me much hope that it would fare any better.

I restored again, this time a smaller subset of photos from my local backup. Success! Restored again, more success! I was able to partition my data into five chunks and restore each chunk without issue. The process took two weeks, as I kept getting stuck waiting for CrashPlan maintenance modes: deep pruning, synchronizing, etc. The deep prune itself took four days to complete.

In the end I lost just 16 photos. I’m not happy about that, but I can live with it.

My lessons learned:

I'm not the typical use case CrashPlan was designed for. CrashPlan is sort of a lightweight version control system in addition to a backup engine. It keeps multiple versions of your files (as many as you specify), and constantly scans your file system looking for recent changes. In doing so they made the design decision to focus on recently changed files. That makes a lot of sense if you're backing up your working directory. You probably want/need the latest version of any paper you're writing, or program you're coding. It makes less sense if you're backing up an archive full of photos. I want at least one version of every photo backed up. I can always re-edit them, but I can't re-shoot them!

I have more data than the typical CrashPlan user. When I was searching through the forums looking for tips on how to speed the process up (4 days to deep prune, are you kidding me?!), I found a number of folks with similar problems, each with large data sets. Between all our home computers we had backed up 3 TB worth of data. I had already bumped the app's memory allocation as high as their sample recommendations go, and went even higher during this process. I'm starting to reach the point where CrashPlan just cannot hold everything it needs to in memory. When that happens with any program, performance drops off a cliff. When CrashPlan runs, there's not a lot else I can do with the computer.

The Verdict on CrashPlan:

Obviously the repeatedly failing drive was not CrashPlan's fault, and I was able to recover almost all of my files. Still, taking two weeks to recover a hard drive seems a bit excessive. I take a lot of photos, and I don't expect that to change any time soon. CrashPlan just may not be right for my use case. I have a little over a year left on my CrashPlan subscription and I see no reason to jump ship now. I may look into alternative backup options when the end of the subscription nears.

I have just launched my second webapp since starting my own company: a Name Generator.

One of the extensions to my name uniqueness analyzer I was considering was the generation of new, plausible names not currently on the Social Security Administration's name list. Then an article from the Atlantic about prospective parents paying upwards of $31,000 to find unique baby names spurred me into action.

Word generation is a straightforward process with language modeling. Language modeling works by looking at frequencies of commonly occurring terms or characters. A new word is generated by iteratively selecting characters based on how likely they are to follow the part of the word already generated. Let's say the name generator randomly draws 'M' for the first character. Almost 61% of names that begin with an 'M' have an 'a' for the next character (e.g. Mary). An additional 6% of names beginning with 'M' have an 'e' for the second character (e.g. Melissa). The character 'b' never follows 'M', at least not in 2014. Thus the language model would select an 'a' to follow 'M' with ~61% probability, an 'e' with ~6% probability, and a 'b' with near 0 probability. The end result is a new sequence of characters that would reasonably follow each other. Some of my favorite generated names so far are Delyn, Alexandrina, and Zanda.
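Here's a toy version of the idea using a single character of context. The real model can use longer contexts, but the mechanics are the same:

```javascript
// Toy character bigram model. '^' marks the start of a name and '$' the
// end, so the model also learns where names begin and stop.
function buildModel(names) {
  var counts = {};
  names.forEach(function (name) {
    var chars = ('^' + name.toLowerCase() + '$').split('');
    for (var i = 0; i < chars.length - 1; i++) {
      var prev = chars[i], next = chars[i + 1];
      counts[prev] = counts[prev] || {};
      counts[prev][next] = (counts[prev][next] || 0) + 1;
    }
  });
  return counts;
}

// Draw the next character in proportion to how often it followed `prev`
function sampleNext(counts, prev) {
  var options = counts[prev];
  var total = 0;
  for (var c in options) total += options[c];
  var r = Math.random() * total;
  for (var c2 in options) {
    r -= options[c2];
    if (r <= 0) return c2;
  }
  return '$'; // floating-point safety net
}

function generateName(counts) {
  var name = '', prev = '^';
  while (name.length < 12) { // length cap keeps runaway names in check
    var next = sampleNext(counts, prev);
    if (next === '$') break;
    name += next;
    prev = next;
  }
  return name.charAt(0).toUpperCase() + name.slice(1);
}

// Example: generateName(buildModel(['mary', 'melissa', 'michelle']))
```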

I am proud of the mathematics that went into my app, but I feel like the app itself is still missing something. I've been thinking back to the naming process with the girls. We had already chosen 'Nicole' as our girl name before becoming pregnant the first time. Alexis was harder to name. We debated between 'Alexis', 'Allison', and even 'Alexandra' for a while. I wanted an 'A' name. It's probably pretty common to have a preferred prefix or suffix in a name, so I added that capability. I want to add more features to my name generator, but I'm not sure what will be useful.

Here’s my question for you: What kind of things did you think about when naming your child? Did you have a specific sound you were looking for? Or meaning?

September 12, 2015

Color Me Wrong

Let it be known, I admit my mistakes. One day short of 11 months ago, I blogged about using Living Colors as a night light, with a picture of a blue light. When I think of night, I think of blues. As it gets darker, reds, yellows, greens, and all other colors tend to disappear. Everything appears cast in blue. It's the Purkinje effect. So a dim blue would be relaxing, right? Looks like I was wrong.

This morning I came across an article in my news feed about yellow lights to help babies sleep. I knew blue light wasn't good for sleep, but I always thought they were referring to white light on the blue side of the color spectrum, e.g. light generated by monitors and other electronic devices. It wasn't until I was reading an article explicitly calling out yellow light, not yellowish-white light, that it occurred to me they could mean actual blue light. Nicki has been wanting her nightlight to be all shades of red, and I've been resisting, thinking it would surely disrupt her sleep and give her nightmares. Sorry sweetheart, mommies do make mistakes sometimes.

So what about the Purkinje effect? It turns out the blue tint has to do with the color receptors in our eyes. The light-sensitive rods are less sensitive to color. Thus in low light, where the rods are more receptive, we perceive less color. It has nothing to do with the color of the light, but the quantity of light.

I think it's time to get Nicki a proper night light. Living Colors is a bit too bright, anyway.

After not finding anything to my taste, I've decided to 3D print our 'New Home 2015' ornament. I found an on-demand printing company, so all I need to do is create a CAD file. The design will be a key, with the bow resembling our house. Kind of like this. My intuition is that it will be a fairly easy first 3D project, since it's mostly a 2D design with beveling.

The more I think about it, the more I like the idea of mixing 3D printing with my ornaments. I have a lot of ornaments. Some are pretty durable, others not. I have backup copies of all the important milestone representations: new home, just married, baby's first Christmas, etc. But it would be really nice to back them all up in electronic form. Electronic files require less storage space than physical copies, and I could create as many copies as I need. No more purchasing spares because something might break.

I thought surely this must be a violation of some law. After all, I cannot create a digital copy of a DVD to store as a backup – and that's an electronic medium to start with! It doesn't seem ethical to buy an item and make copies if I would have otherwise bought two. I did some digging, and at least in terms of trademark law, it seems I'd be safe as long as my copies and electronic files are never exposed to potential consumers. Copyright law is another matter. This use case may fall under fair use, but given how new 3D printing is, there isn't a lot of case law on it.

Some companies embrace 3D printing. Hasbro did with its My Little Pony brand, which similarly has a large fan base of collectors. They benefit from a licensing fee, without much fear that it will dampen interest in their original merchandise. After all, die-hard collectors will still want the original. Lego, too, is considering allowing users to print their own blocks. Maybe Hallmark will follow suit. Then again, if my experience with their Keepsake Club website is any indication, Hallmark is less technologically savvy. We'll just have to wait and see what happens.

August 4, 2015

Color Coding

For my startup I've been spending every free moment reading about all things photography and image related. (As well as kicking myself for never having taken a graphics course in college.) This week I'm learning about combining colors digitally. Colors are surprisingly more complicated than kindergarten led me to believe.

Computer monitors typically display in RGB, sRGB to be exact. I've had some experience working with RGB before, both negative and positive. In the RGB color model each color is represented as a combination of the red, green, and blue additive primaries. The values of each component range from 0 to 255, meaning 256³, or roughly 16.8 million, colors can be represented. This color model works well for computers (and Hue!) because displaying a color is a simple matter of displaying red, green and blue light in the right amounts. Three little LEDs is all you need.

The drawback of RGB is that color mixing is not intuitive for us humans. Consider blue and yellow. In grade school we learn blue + yellow = green. In RGB, yellow is (R:255, G:255, B:0), and blue is (R:0, G:0, B:255). Combining yellow and blue gives us (R:255, G:255, B:255), or white. The difference is that we learned color combining through pigments, which form a subtractive color model. Light is an additive one.

[Diagram: yellow + blue under additive mixing]
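In code, additive mixing is just per-channel addition, clamped at 255:

```javascript
// Additive light mixing: combine per channel and clamp to 255.
function addRGB(c1, c2) {
  return {
    r: Math.min(255, c1.r + c2.r),
    g: Math.min(255, c1.g + c2.g),
    b: Math.min(255, c1.b + c2.b)
  };
}

addRGB({ r: 255, g: 255, b: 0 }, { r: 0, g: 0, b: 255 });
// => { r: 255, g: 255, b: 255 } — white, not the green we learned in school
```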

My startup is a technology-based product for humans. I am working with an additive color model, and need it to behave like a subtractive one.

The solution I'm taking is to convert to the LAB color space. The LAB model has the very nice property of being closer to "perceptually uniform": representing each LAB color as a vector, the Euclidean distance between two color vectors corresponds to the perceived difference between the colors. For small distances, at least. It turns out that for large differences I need to compute a slightly more complicated Delta E formula. At least I now have an algorithmic approach.
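For reference, here's the textbook sRGB-to-LAB conversion (D65 white point), with plain Euclidean distance as the classic CIE76 flavor of Delta E:

```javascript
// Standard sRGB -> XYZ -> LAB conversion, relative to the D65 white point.
function srgbToLab(r, g, b) {
  // 1. Undo the sRGB gamma curve, scaling channels to [0, 1]
  function linearize(c) {
    c /= 255;
    return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  }
  var R = linearize(r), G = linearize(g), B = linearize(b);

  // 2. Linear RGB -> XYZ (standard sRGB matrix)
  var X = 0.4124 * R + 0.3576 * G + 0.1805 * B;
  var Y = 0.2126 * R + 0.7152 * G + 0.0722 * B;
  var Z = 0.0193 * R + 0.1192 * G + 0.9505 * B;

  // 3. XYZ -> LAB, normalized by the D65 reference white
  function f(t) {
    return t > 0.008856 ? Math.cbrt(t) : 7.787 * t + 16 / 116;
  }
  var fx = f(X / 0.95047), fy = f(Y / 1.0), fz = f(Z / 1.08883);

  return {
    L: 116 * fy - 16,
    a: 500 * (fx - fy),
    b: 200 * (fy - fz)
  };
}

// Delta E (CIE76): plain Euclidean distance between two LAB colors
function deltaE76(lab1, lab2) {
  return Math.sqrt(
    Math.pow(lab1.L - lab2.L, 2) +
    Math.pow(lab1.a - lab2.a, 2) +
    Math.pow(lab1.b - lab2.b, 2)
  );
}
```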

Colors can be surprisingly complex.
