February 21, 2013

Writing Sample Analyzer FAQ

In preparation for re-entering the work force (shameless self-plug: I’m job seeking!), I closed down some of the old legacy sites that I’ve been keeping around for posterity. One of them was my consulting site. Even though I haven’t updated it in years, the tools were still in use, especially the Writing Sample Readability Analyzer, so I moved it to my resume website.

Since I’m getting a ton of emails about it lately, I thought I’d answer some of the frequently asked questions.

How does your analyzer work?

The analyzer uses the Flesch, Fog and Flesch-Kincaid metrics to predict reading ease. Each approximation makes the basic assumption that longer sentences are harder to read than shorter sentences, and words with more syllables are harder to read than words with fewer syllables. Although the underlying principle is the same, each metric is calculated slightly differently.

FleschScore = 206.835 − (1.015 × AverageSentenceLength) − (84.6 × AverageNumSyllablesPerWord)

FogScale = 0.4 × (AverageSentenceLength + PercentageOfWordsWithThreeOrMoreSyllables)

FleschKincaidScore = (0.39 × AverageSentenceLength) + (11.8 × AverageNumSyllablesPerWord) − 15.59
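As a rough illustration, the three formulas can be sketched in a few lines of Python. The function and parameter names are made up for this example and are not taken from the analyzer itself, and the Flesch-Kincaid function uses 0.39, the coefficient in the published grade level formula. The counts passed in at the bottom are invented sample numbers.

```python
def flesch_reading_ease(avg_sentence_length, avg_syllables_per_word):
    """Flesch score: higher means easier to read."""
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word


def fog_scale(avg_sentence_length, percent_complex_words):
    """Fog scale: percent_complex_words is the percentage (0-100) of words
    with three or more syllables."""
    return 0.4 * (avg_sentence_length + percent_complex_words)


def flesch_kincaid_grade(avg_sentence_length, avg_syllables_per_word):
    """Flesch-Kincaid grade level, using the published 0.39 coefficient."""
    return 0.39 * avg_sentence_length + 11.8 * avg_syllables_per_word - 15.59


def readability_scores(num_words, num_sentences, num_syllables, num_complex_words):
    """Turn raw counts into the averages the formulas need (hypothetical helper)."""
    avg_sentence_length = num_words / num_sentences
    avg_syllables_per_word = num_syllables / num_words
    percent_complex_words = 100.0 * num_complex_words / num_words
    return {
        "flesch": flesch_reading_ease(avg_sentence_length, avg_syllables_per_word),
        "fog": fog_scale(avg_sentence_length, percent_complex_words),
        "flesch_kincaid": flesch_kincaid_grade(avg_sentence_length, avg_syllables_per_word),
    }


# Invented sample: 100 words, 5 sentences, 150 syllables, 10 long words.
scores = readability_scores(100, 5, 150, 10)
print(scores)
```

Notice that all three scores depend only on a handful of counts, so everything interesting happens in how the sentences and syllables get counted.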

I’ve used your analyzer and another analyzer and gotten different results. Why is that?

Flesch, Fog, and Flesch-Kincaid are well-defined metrics. If another system reports a different score for the same metric, then the input variables (the number of sentences or the number of syllables per word) must be calculated differently.

Calculating sentence boundaries is surprisingly less straightforward than it seems. As humans, we can identify when a sentence ends pretty easily. Since the computer can’t really parse or understand the sentence**, it can only make an educated guess based on clues like punctuation and capitalization. But not all punctuation ends a sentence (think abbreviations), and not all sentences end with punctuation. This is especially true online, where sentences are often not well formed.
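To make the guessing concrete, here is a minimal sketch of a punctuation-and-capitalization heuristic in Python. It is not the analyzer's actual code: the function name and the tiny abbreviation list are assumptions made up for the example, and a real list would need to be much longer.

```python
# Hypothetical, deliberately tiny abbreviation list.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "st.", "etc.", "vs."}


def split_sentences(text):
    """Guess sentence boundaries from punctuation and capitalization."""
    tokens = text.split()
    sentences = []
    current = []
    for i, token in enumerate(tokens):
        current.append(token)
        # Ignore closing quotes or parentheses after the punctuation mark.
        bare = token.rstrip("\"')")
        if not bare.endswith((".", "!", "?")):
            continue
        # A known abbreviation probably does not end the sentence.
        if token.lower() in ABBREVIATIONS:
            continue
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        # Guess a boundary at the end of the text, or when the next word
        # does not start with a lowercase letter.
        if next_token is None or not next_token[0].islower():
            sentences.append(" ".join(current))
            current = []
    if current:
        # Leftover words with no closing punctuation still count as a sentence.
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Dr. Smith went home. He slept."))
print(split_sentences("I live in the U.S. Virgin Islands."))
print(split_sentences("no punctuation here lol"))
```

The first call splits correctly because ‘Dr.’ is on the list. The second one is cut in two after ‘U.S.’, since that abbreviation is not on the list and the next word is capitalized. Two analyzers with slightly different rules or lists will disagree on cases like this.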

The same goes for computing the number of syllables in a word. It may seem simple to just create a list of syllables per word, but language is infinite and constantly evolving, so a complete list is not possible. Pronunciation (and with it the number of syllables) can differ in different parts of the world. Additionally, heteronyms, words that have the same spelling but different pronunciations, can have different numbers of syllables. The word ‘learned’ as the past tense of the verb ‘to learn’ is one syllable, but ‘learned’ as the adjective describing someone with scholastic achievement is two. Any method of calculating the number of syllables per word will involve some heuristics.
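One common style of heuristic counts groups of vowels and then patches up a few spelling patterns. The Python sketch below is only an example of that idea, with a made-up function name and rules chosen for illustration; it is not the method the analyzer uses.

```python
import re


def count_syllables(word):
    """Estimate syllables by counting vowel groups (a rough heuristic)."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    # Each run of consecutive vowels counts as one syllable.
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing 'e' is usually silent ("make"), except in "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


for word in ["together", "kiln", "make", "table", "learned"]:
    print(word, count_syllables(word))
```

This sketch always reports two syllables for ‘learned’, because it only sees the spelling and has no way of knowing whether the verb or the adjective was meant.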

The differences in calculating sentence length and the number of syllables will tend to be more noticeable on shorter samples, rather than longer. Even so, while there may be differences between different analyzers, the differences should be relatively small.
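A quick back-of-the-envelope check shows why sample length matters. The Python sketch below uses invented word, sentence and syllable counts and a hypothetical helper name. It compares how far the Flesch score moves when an analyzer misses a single sentence boundary in a 4-sentence sample and in a 400-sentence sample with the same average sentence length.

```python
def flesch(num_words, num_sentences, num_syllables):
    """Flesch score computed from raw counts."""
    return (206.835
            - 1.015 * (num_words / num_sentences)
            - 84.6 * (num_syllables / num_words))


def shift_from_one_missed_boundary(num_words, num_sentences, num_syllables):
    """Absolute change in the score if one fewer sentence is detected."""
    exact = flesch(num_words, num_sentences, num_syllables)
    off_by_one = flesch(num_words, num_sentences - 1, num_syllables)
    return abs(exact - off_by_one)


# Invented samples: both average 15 words per sentence, 1.5 syllables per word.
short_shift = shift_from_one_missed_boundary(60, 4, 90)
long_shift = shift_from_one_missed_boundary(6000, 400, 9000)
print(round(short_shift, 3), round(long_shift, 3))
```

With these numbers, the one missed boundary moves the short sample by about 5 points, while the long sample moves by less than 0.04.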

** There is an active area of research in natural language processing which tries to automatically parse and understand sentences.

Which analyzer is the most ‘accurate’?

There are two types of ‘accurate’ we can consider: which analyzer comes closer to the true Flesch, Fog and Flesch-Kincaid metrics, and which one better predicts reading ease. Keep in mind that each metric is just a heuristic based on an assumption that is often true, but not always. For example, ‘kiln’ is one syllable, but it is a harder word than the simple, three-syllable ‘together’. Depending on the kind of text you are analyzing, you may find one method or score works better for your application than another.

Let’s consider two analyzers: Analyzer A, which has very good sentence boundary detection, and Analyzer B, which has a very good syllables-per-word calculator. If you were analyzing writing samples from elementary school children, you might prefer A. That’s because young children may not write grammatically correct sentences and typically don’t have a rich vocabulary, so a more complex syllables-per-word calculator wouldn’t buy you much, whereas a better sentence boundary detector may be necessary. On the other hand, if you were analyzing scientific journal articles, you might prefer B.

My suggestion is to try both analyzers to get a feel for which one is better for you and your task.

Will you share the code?

I have in the past, but only for extra special cases.

April 3, 2012

Crooked Perspective

I have discovered a not-cool interaction between my blogging software (WordPress), smart phone (iPhone) and camera (Nikon DSLR).

The Nikon, like most cameras, has an internal leveler, and can tell which way the camera is oriented to take a portrait rather than a landscape image. Along with other bits of metadata, the camera sets an orientation flag for each image. Think of the metadata as additional information about the file. The actual data in the image file, however, is still stored as a landscape, since that is what the image sensor ‘sees’. It’s like if you tilt your head and look at a glass of water. The glass appears like it’s on its side to you, but in your mind you know your head is tilted, not the glass, which is why the water isn’t spilling out.

My computer, as do most computers, renders the image according to the data in the image file. As a result, the image appears as a landscape, regardless of how the camera was oriented. I then use image editing software to rotate the image (effectively re-arranging the pixel data). The orientation flag in the metadata remains unchanged. WordPress also ignores the orientation flag. So the uploaded image appears on my blog the same way it appears on my computer.

The iPhone, however, tries to be smart. Since the orientation flag is still set, it assumes the image needs to be rotated again to display correctly. As a result, the image appears rotated an extra 90 degrees on my iPhone. But only on the iPhone, so I didn’t discover the problem until recently!
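The double rotation is easier to see with a toy model. The Python sketch below is an illustration, not how any of these programs is actually implemented: it treats an image as a small grid of letters, uses the standard EXIF orientation values (1 is normal, 3 is 180 degrees, 6 is 90 degrees clockwise, 8 is 270 degrees clockwise), and compares a viewer that ignores the flag with one that honors it. All the function names are made up for the example.

```python
# Clockwise rotation a flag-aware viewer applies for common EXIF Orientation
# values. The mirrored values (2, 4, 5, 7) are left out of this sketch.
ROTATION_FOR_ORIENTATION = {1: 0, 3: 180, 6: 90, 8: 270}


def rotate_cw(pixels, degrees):
    """Rotate a grid of pixels (a list of rows) clockwise by a multiple of 90."""
    for _ in range((degrees // 90) % 4):
        pixels = [list(row) for row in zip(*pixels[::-1])]
    return pixels


def display(pixels, orientation, honor_flag):
    """What a viewer shows: raw pixels, or pixels rotated per the flag."""
    if not honor_flag:
        return pixels
    return rotate_cw(pixels, ROTATION_FOR_ORIENTATION.get(orientation, 0))


# The sensor always records a landscape grid; the camera was held in
# portrait, so it sets orientation 6 ("rotate 90 degrees clockwise to view").
sensor = [["a", "b", "c"],
          ["d", "e", "f"]]
upright = rotate_cw(sensor, 90)

# Rotating in an image editor rearranges the pixels but leaves the flag at 6.
edited = rotate_cw(sensor, 90)

on_blog = display(edited, 6, honor_flag=False)        # flag ignored
on_phone = display(edited, 6, honor_flag=True)        # flag honored
after_strip = display(edited, 1, honor_flag=True)     # flag cleared

print(on_blog == upright, on_phone == upright, after_strip == upright)
```

In this model the viewer that ignores the flag shows the edited image upright, the viewer that honors the stale flag rotates it a second time, and clearing the flag makes both agree.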

The only way to fix it is to strip the metadata so the orientation flag isn’t set, but that means going back over all my past entries and uploading a new photo for all the crooked ones. Not Cool.

To be honest, I’m surprised there isn’t an easier way to strip the image metadata. Aside from the orientation issue, metadata can include GPS location information. It’s handy for figuring out what your photos are of years after the fact, but if you upload an image with geographical information, someone can figure out where you’ve been. You can view the geolocation data in images online for yourself.

Metadata does have its uses. Some photographers like to store copyright information in the metadata. Camera manufacturers and image processing software also like to add their mark to the metadata, as a form of free advertising for anyone looking to see how an image was created. For me, though, I wish there was an easy way to get rid of it.

Edited to add: There is an easy way to bulk strip metadata! If you’re using WordPress installed on a Unix server:

>cd wp-content/uploads
>exiftool -all= */*/*.jpg

The ‘*’ character is a wildcard. The first * matches all the year directories (2011, 2012, etc.), the second * matches the months (01, 02, etc.) and the third * matches the name of the file.
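If you want to see which files a pattern like this picks up before changing anything, a small Python sketch can list them. It builds a throwaway directory with invented file names (none of them are real uploads) and applies the same three-level pattern with `pathlib`. The helper name is made up, and Python's matching can differ from the shell's in small ways, such as how hidden files are treated.

```python
import tempfile
from pathlib import Path


def matching_uploads(uploads_dir):
    """Return paths, relative to uploads_dir, matched by */*/*.jpg."""
    root = Path(uploads_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.glob("*/*/*.jpg"))


# Build a temporary year/month layout with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in [
        "2011/12/cake.jpg",          # year/month/file: matches
        "2012/01/beads.jpg",         # year/month/file: matches
        "2012/01/notes.txt",         # wrong extension
        "2012/loose.jpg",            # only two levels deep
        "2012/01/thumbs/small.jpg",  # four levels deep
    ]:
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    matched = matching_uploads(root)

print(matched)
```

Only the two files sitting exactly at the year/month/file depth with a lowercase .jpg extension are listed, because each * stands for a single directory or file name and does not cross a slash.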

February 24, 2012

One Year Blogiversary

One year ago I took the plunge from my own homegrown blogging software to WordPress. At the time I was at a crossroads with my website. I wanted to preserve what I had built as a teenager, but the old website no longer ‘fit’ me. After much back and forth, I decided to give WordPress a try.

It’s been an exciting year, one I’m very happy to have had the chance to blog about. Over the summer I had an internship at Bing in Seattle, which meant spending the summer away from my husband. Of course, there’s the exciting news of our pregnancy, something I have been waiting for for so long. I was also pinned on Pinterest, which I’m still over the moon about! I love looking back at the highlights.

So far I don’t think I have any regular readers. I hope that changes one day. I’ve had some good advice, both over email and in comments, about the cake pops, and I know some of you have great crafting/travel/saving tips, and I’d love to hear them!

While the change to WordPress has been great in general, there’s one thing that annoys me. I often go back over old entries and fix typos. (I’m dyslexic so there are always quite a few.) Each time I re-edit a blog post, all the back links get updated to reflect the date of the edit, not the date of the first post. I guess it makes sense, since the first edition may not have had the reference, but I’d rather pretend the typos never existed.

Interestingly, I’ve gotten three requests from advertisers wanting to buy ad space on certain posts. Three! I’ve turned them down. Sure, it would be nice to supplement (or replace!) my grad school stipend with revenue from my blog, but I’m missing the two most important features for a blog from an advertiser’s perspective: an audience and a clear focus. Since so much of my blog is about saving money, I’m also not sure how accepting specific advertisements would affect my credibility. Also, I’m pretty sure these offers were spam and not legitimate requests for ad placement.

I may revisit the idea of accepting advertising or sponsorship sometime in the future, but for now I think I’ll stick with Amazon’s Affiliate Program and Google AdSense. After a year of blogging, I’ve earned a full dollar!

August 28, 2011


Have you heard of Pinterest? It’s a virtual inspiration pin board, kind of like Delicious, but with picture bookmarks. All the serious crafters, decorators and bakers that I know have accounts with Pinterest. Well, imagine my surprise when I was checking logs and found an incoming link from Pinterest! Turns out someone on Pinterest found me, and liked me! I am honored. Whoever you are out there who pinned me (and, dare I hope, maybe even more than one of you?), thank you! You made my day! I’ll have to post more craft projects, so that I can be worthy of more pins.

I have been meaning to post the tutorial for the beaded ornament net. I started doing a different pattern with a St. Petersburg Chain. It feels very Christmas-y. I am already in the holiday spirit, even though it’s only August. I may only be able to hold off on the Christmas carols a little longer.

I have a few other projects on the to-do list. I should come up with a better sugar-free cake pop recipe, since that is also one of the most popular posts on my blog. A pudding-based icing and cake from a box? Yeah, I can do better than that.

February 24, 2011

Hello world!

Lately I’ve been struggling with what to do with my website. I started my website back in 1997 (on Geocities no less, anyone remember them?), moved to my own domain in 2000 (remember ?) and started blogging in 2002. When I started my website I was an awkward teen, painfully shy and without much self-confidence. My website became my outlet, and my confidence grew as my site grew. It became a part of my identity.

I fell off the blogging bandwagon in the past few years for a variety of reasons. Time was a big one. A bigger one, however, was the site itself. I had written my own blogging software. It had all the features that were popular back in the early 2000s and that no one uses today. Then there was the content itself, which was outdated and read, unsurprisingly, as though a teenager wrote it. I was torn between trying to preserve what I had already done and wanting to update the site so it was more representative of the woman I had become.

As you can tell, I decided to update (and upgrade!). I have a copy of my old website for posterity, but from now on it’s a new look for a new way of life. You can still find all the old games I’ve written, and all the old tools, online. The fictional writing is now offline, replaced with academic papers. I have new hobbies now, including my photography, baking, crafts, and of course, enjoying my life with my new husband and family.

