January 19, 2017

Index Pruner

After using my Sitemap Index Analyzer to analyze Datayze, I came across something peculiar. Notice anything… odd… about the following search results?

I don’t know about you, but if I ever reach 48 weeks & 6 days pregnant, miscarriage will be one of the last things on my mind.

The above search result is problematic for three reasons. (1) The page linked to has no useful content, since the Miscarriage Reassurer only accepts input up to 20 weeks. (2) It’s taking a valuable spot away from another page that could have useful content. (3) If no users click on the link, a distinct possibility given it’s unlikely to be relevant to anyone, the search engine could view the lack of clicks as a negative signal and may penalize the entire domain.

One could make an argument that users might be more inclined to click on wacky and clearly ridiculous search results. Curiosity is a strong motivator. That could explain why such a weird result was promoted to the first page to begin with. Still, those aren’t useful clicks, and they are likely to lead to lower engagement.

Counterintuitively, it’s better to have a smaller number of high quality pages indexed by the search engine than have a large number of useless pages. I have no idea what led Google to this particular page, or what made Google index it, but it has to go.

Up until now I have largely been using the X-Robots-Tag response header to signal when pages should expire from the index. That’s because, up until now, the useless pages I knew about were mostly links to individual Labor Probability Calculators with long past due dates. Since the Labor Probability Calculator only calculates the probability of labor for a given due date, a page associated with a long past due date will have no useful content. The X-Robots-Tag header isn’t recognized by all major search engines, but right now my Google index far eclipses my index in the other major search engines, so Google is the one I tend to focus on. An expires tag isn’t so helpful in this case, where the data is invalid from the start. The problem of spurious results might be a little more pervasive than I initially thought, so it’s time to do a more substantial prune.
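For illustration, here is a minimal sketch of what an expiring header for a due date page could look like. It assumes the expiry is expressed with Google’s `unavailable_after` directive (the post doesn’t say which directive was used), and the `expiry_header` helper and its 30 day grace period are hypothetical names and values, not the site’s actual code.

```python
from datetime import date, datetime, time, timedelta, timezone


def expiry_header(due_date: date, grace_days: int = 30) -> tuple[str, str]:
    """Build an X-Robots-Tag header asking crawlers to drop a page
    once its due date (plus a grace period) has passed.

    Assumption: the `unavailable_after` directive is used, and the
    grace period of 30 days is an arbitrary illustrative default.
    """
    cutoff = datetime.combine(
        due_date + timedelta(days=grace_days), time.min, tzinfo=timezone.utc
    )
    # ISO 8601 is one of the date formats Google documents as accepted.
    return ("X-Robots-Tag", f"unavailable_after: {cutoff.isoformat()}")


if __name__ == "__main__":
    name, value = expiry_header(date(2017, 1, 19))
    print(f"{name}: {value}")
```

This only helps for pages that were valid at some point. A page that never had useful content needs to be kept out of the index from the start.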

I had to modify my page setup so I could set the NOINDEX robots meta tag. Now I just have to wait for Googlebot to recrawl my website and do its thing. Good thing I already have an app to help me get started!
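As a rough sketch of the idea, the page template could pick its robots meta tag based on whether the requested input is in range. The `robots_meta` function is hypothetical, and treating "up to 20 weeks" as 20 weeks and 0 days inclusive is an assumption about the tool’s exact cutoff.

```python
MAX_WEEKS = 20  # the Miscarriage Reassurer only accepts input up to 20 weeks


def robots_meta(weeks: int, days: int = 0) -> str:
    """Return the robots meta tag for a page requested at weeks + days.

    Assumption: anything past 20 weeks 0 days (or negative, or with a
    day count outside 0..6) has no useful content and gets NOINDEX.
    """
    valid_parts = weeks >= 0 and 0 <= days <= 6
    in_range = valid_parts and (weeks * 7 + days) <= MAX_WEEKS * 7
    content = "index, follow" if in_range else "noindex"
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    print(robots_meta(8, 2))   # an in-range page stays indexable
    print(robots_meta(48, 6))  # the page from the search result above
```

Unlike an expiry date, this marks a page as unwanted from the very first crawl, which is what a page like 48 weeks & 6 days needs.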

Posted in Work Life | Tags:

