Posts Tagged ‘Doing the Math’
Now that I’ve had ads up on my webapps page for a few days, I thought it was time to analyze the numbers. I assumed the amount of revenue earned each day follows a normal distribution. (Actually, this is an oversimplification that makes the math easier. I’m sure there are weekly patterns at a minimum, and possibly seasonal factors as well. And really, it should probably be a mixture of Gaussians, one component each for revenue from views and revenue from clicks, with the number of views as a hyperparameter, but I digress…)
My model shows I have a 55% probability of earning at least $100 in a year, with $118 being the expected value.
Not bad for two scripts I basically wrote in my spare time, but not great either. If it were an order of magnitude higher, I’d have great confidence that I could start my own business. With time and energy I could probably turn my webapps into something substantial. On the other hand, if it were an order of magnitude smaller, this path wouldn’t seem viable at all.
Of course, it would be possible for me to earn more if I cut out the middle man. Google takes a cut of 32%. Thus if I’m earning $118, Google’s cut is $56, and the total revenue generated from ads on my webapps is $174. This tells me I could charge $14/mo for ad space. Of course, then I’d have to find advertisers and convince them that my webapps are worth it. No, the cut Google takes is worth it.
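The back-of-the-envelope math above can be sketched in a few lines. The helper function is hypothetical; the 32% cut and $118 figure are from the post:

```python
# Back out gross ad revenue from net earnings, given the ad network's cut.
# Figures from the post: $118/yr expected net earnings, 32% revenue share.
def gross_revenue(net_earnings, network_cut):
    """Total ad revenue before the network takes its share."""
    return net_earnings / (1 - network_cut)

net = 118.0
gross = gross_revenue(net, 0.32)   # ~$174/yr generated in total
network_share = gross - net        # ~$56/yr kept by Google
monthly_rate = gross / 12          # ~$14.5/mo to match by selling ad space directly
print(round(gross), round(network_share), round(monthly_rate, 1))
```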
On the job front, I did find a local data scientist job to apply for. It’s not the same data I’m used to, but I’d love to branch out. Domingo and I have often joked how at home I’d be in a “Black Friday war room”, analyzing real time shopping data. So why not consider something new?
A little over 2 months ago I decided to start tracking Nicki’s sleep. At the time she wasn’t sleeping very well and I wanted to have a dataset I could analyze.
Histogram of the number of hours Nicki spends sleeping.
It may appear to be a left-skewed distribution, but that’s because I was using a suboptimal bedtime during the initial few weeks of my study. Without those weeks, her histogram shows a normal distribution with a mean of 11:30-12:00 hours of sleep.
For this analysis I mostly looked at correlation. Correlation measures the statistical relationship between two sets of numbers. It ranges from -1 to 1. A negative correlation [-1, 0) means two variables are inversely related: as one increases, the other decreases. A positive correlation means two variables tend to increase or decrease together. The closer to 0, the weaker the relationship.
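As a concrete illustration, here’s a minimal Pearson correlation in Python. The bedtime numbers below are made up for illustration; the real dataset isn’t published:

```python
# A minimal Pearson correlation, the statistic used throughout this post.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical example: bedtime (hours past 6 pm) vs. hours slept.
bedtime = [1.0, 1.5, 1.25, 2.0, 1.75]
hours_slept = [12.0, 11.0, 11.5, 10.0, 10.75]
print(round(pearson(bedtime, hours_slept), 2))  # negative: later bedtime, less sleep
```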
Correlation(Time put down, Time spent asleep) = -.72
When I put her to bed earlier, she tends to sleep longer.
The time I put Nicki down for bed is correlated with how long she sleeps – earlier bedtimes mean more sleeping! That makes intuitive sense. My circadian rhythm wakes me up at certain points, provided I’ve slept a decent amount. I’m now in the habit of waking up at 7:00 am, regardless of what time Nicki wakes up. (Mommy misses sleeping in until noon on the weekend.) Nicki could be the same way. Earlier bedtimes mean there are more hours between when she goes down and when she typically gets up, which could correspond to longer sleep intervals.
Every baby book I own says “Early to Bed Late To Rise“. In other words, put the baby to sleep early and she will sleep in longer. What do my numbers show?
Correlation(Time put down, Time Woken Up) = -.21
When I put her to bed earlier, she tends to wake up later.
So yes, she does tend to sleep in longer on days she goes down earlier, but it’s a weak correlation. It could be that the relationship is weak, or that there are other factors at play. One possible factor is daylight saving time, or more specifically, the position of the sun. We’re in the middle of Spring, sunrise is getting earlier, and Nicki tends to wake up around sunrise. If I take a weekly average of her wake up time, I see it inching forward for the first three weeks.
Another aspect of sleep I care about is how long it takes her to fall asleep. The books all say over tired babies have a harder time falling asleep. Was it true for Nicki?
Correlation(Time put down, Number of Minutes needed to fall asleep) = 0.4
When I put her to bed earlier, she takes less time to fall asleep.
My analysis shows that, at least for Nicki, earlier bed times lead to better sleeping.
Still asleep after sunrise. Love the bear on her butt!
Of course, Correlation does not imply Causation. There could be other factors at play. Our bedtime is between 7:00 and 7:30. On days she’s extra tired, she goes down a little earlier; when she’s less sleepy, bedtime is closer to 7:30. A tired baby is more likely to fall asleep quickly and sleep longer.
If I were to do a true study I’d have to randomize her bedtime. That means some nights putting a wide awake baby down, and some nights trying to keep a tired baby awake. I may love data, but even I’m not that crazy. Still, it’s neat to see Nicki’s sleep numbers.
In preparation for re-entering the work force (shameless self plug – I’m job seeking!) I closed down some of my old legacy sites that I’d been keeping around for posterity. One of them was my consulting site, bluecentauri.com. Even though I haven’t updated it in years, the tools were still in use, especially the Writing Sample Readability Analyzer. So I moved it to my resume website, sarahktyler.com.
Since I’m getting a ton of emails about it lately, I thought I’d answer some of the frequently asked questions.
How does your analyzer work?
The analyzer uses the Flesch, Fog and Flesch-Kincaid metrics to predict reading ease. Each approximation makes the basic assumption that longer sentences are harder to read than shorter sentences, and words with more syllables are harder to read than words with fewer syllables. Although the underlying principle is the same, each metric is calculated slightly differently.
FleschScore = 206.835 − (1.015 × AverageSentenceLength) − (84.6 × AverageNumSyllablesPerWord)
FogScale = 0.4 × (AverageSentenceLength + PercentageOfWordsWithThreeOrMoreSyllables)
FleschKincaidScore = (0.39 × AverageSentenceLength) + (11.8 × AverageNumSyllablesPerWord) − 15.59
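For reference, here is how the three formulas translate directly to code. I’m using 0.39 as the Flesch-Kincaid sentence-length coefficient, which is the standard published value; the inputs (ASL, ASW) would come from the sentence and syllable counters discussed later:

```python
# The three readability formulas, expressed directly in code.
# ASL = average sentence length (words per sentence);
# ASW = average number of syllables per word.
def flesch_reading_ease(asl, asw):
    return 206.835 - 1.015 * asl - 84.6 * asw

def gunning_fog(asl, pct_complex_words):
    # pct_complex_words: percentage (0-100) of words with 3+ syllables
    return 0.4 * (asl + pct_complex_words)

def flesch_kincaid_grade(asl, asw):
    # 0.39 is the standard published coefficient for this metric
    return 0.39 * asl + 11.8 * asw - 15.59

# Example: 15-word sentences averaging 1.5 syllables/word, 10% complex words.
print(round(flesch_reading_ease(15, 1.5), 1))   # 64.7
print(round(gunning_fog(15, 10), 1))            # 10.0
print(round(flesch_kincaid_grade(15, 1.5), 1))  # 8.0
```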
I’ve used your analyzer and another analyzer and gotten different results, why is that?
The Flesch, Fog, and Flesch-Kincaid are well-defined metrics. If another system reports a different score for the same metric, then the input variables (either the number of sentences, or the number of syllables per word) must be calculated differently.
Calculating sentence boundaries is surprisingly less straightforward than it seems. As humans, we can identify where a sentence ends pretty easily. Since the computer can’t really parse or understand the sentence**, it can only make an educated guess based on clues like punctuation and capitalization. But not all punctuation ends sentences (think abbreviations), and not all sentences end with punctuation. This is especially true online, where sentences are often not well formed.
The same goes for computing the number of syllables in a word. It may seem simple to just create a list of syllables per word, but language is infinite and constantly evolving; such a list is not possible. Pronunciation (and the number of syllables) can differ in different parts of the world. Additionally, heteronyms, words that have the same spelling but different pronunciations, can have different numbers of syllables. The word ‘learned’ as the past tense of the verb ‘to learn’ is one syllable, but ‘learned’ as the adjective describing someone of scholastic achievement is two. Any method to calculate the number of syllables per word will involve some heuristics.
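To make the point concrete, here is a deliberately naive sketch of both heuristics. Real analyzers differ precisely in how they refine rules like these:

```python
# Naive versions of the two heuristics the post describes.
import re

def count_syllables(word):
    """Rough syllable count: runs of vowels, minus a common silent 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def split_sentences(text):
    """Rough splitter: break on ./!/? followed by whitespace and a capital."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)

print(count_syllables("together"))  # 3
print(count_syllables("learned"))   # 2, but the verb is one syllable: no static rule handles heteronyms
print(split_sentences("Dr. Smith left. She was tired."))  # wrongly splits after 'Dr.' too
```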
The differences in calculating sentence length and the number of syllables will tend to be more noticeable on shorter samples than on longer ones. Even so, while there may be differences between analyzers, the differences should be relatively small.
** There is an active area of research in natural language processing which tries to automatically parse and understand sentences.
Which analyzer is the most ‘accurate’?
There are two types of ‘accurate’ we can consider: which analyzer comes closer to the true Flesch, Fog and Flesch-Kincaid metrics, and which one better predicts reading ease. Keep in mind that each metric is just a heuristic based on an assumption that is often true, but not always. For example, ‘kiln’ is one syllable, but harder than the simple three-syllable word ‘together’. Depending on the kind of text you are analyzing, you may find one method or score works better for your application than another.
Let’s consider two Analyzers, one with a very good sentence boundary detection, Analyzer A, and one with a very good syllable per word calculator, Analyzer B. If you were analyzing writing samples from elementary school children, you may prefer A. That’s because young children may not write grammatically correct sentences and typically don’t have a rich vocabulary, so a more complex syllable per word calculator wouldn’t buy you much whereas a better sentence boundary detector may be necessary. On the other hand, if you were analyzing scientific journal articles, you may prefer B.
My suggestion is to use both analyzers to get a feel of which one is better for you and your task.
Will you share the code?
I have in the past, but only for extra special cases.
I used to hate it when someone told me how much Nicki’s changed or how big she’s gotten. The words feel like a dagger to my mom-tographer heart; I never feel like I’ve taken enough photos of Nicki. Yes, I take a ton of photos, but they’re typically the same photo with slightly different angles, and I rely on my iPhone way too much. So a week or so ago I decided to sit down and finally organize the photos I have, to see if my fears were justified. (Aside: nothing makes a mom-tographer panic like missing baby photos. I couldn’t find our Thanksgiving day photos for about 2 hours. Not fun.)
I had been thinking I only used my DSLR for formal photos. The good news is that’s not really true. I loosely defined formal photos as ones done against a backdrop or involving some amount of setup other than moving clutter out of view (e.g. the newborn photos, the Christmas lights photo and the Halloween photo). I took 3532 non-formal and 5219 formal photos.
What about Duplicates?
Well, 2084 of those photos were newborn photos. About 3/5ths of those are sleeping baby photos and there’s only so many different sleeping baby photos one can take. There were 392 Nicki and Phia photos. In my defense, I wanted a canvas print, and I wanted to be sure I had a photo that would work.
I’ve also been working on Nicki’s baby book. Of the 1191 photos I’ve taken just for the book, I plan to use 6. That means I take an average of 199 photos in order to get ONE for the book. Again, I’m obsessed with perfection.
I take monthly photos of her to track her growth, and I’m pretty good about taking Holiday photos. But how do I do on an everyday basis? For this analysis I’m only including non-holiday and non-event photos (excluding days like the day she was born).
The number of informal photos I’ve taken of Nicki each month.
My initial thinking was that I wasn’t taking enough photos of Nicki when she was 2 and 3 months old, but as we can see, that’s not the case. I did pretty well for months 1 and 2, when I was on “Maternity Leave” from grad school over the summer. Month 3 was also pretty decent at 195 photos. Months 4 and 5, however, saw just 45 and 25 photos of Nicki for the entire month. Most of those photos are also duplicates, so in month 5 I effectively only have 2 photos! On the other hand, month 5 was our trip back east, and I took 406 informal photos at Christmas time. The ironic thing: I found two different sets of monthlies for month five. Maybe subconsciously I knew I wasn’t taking enough photos and forgot which ones I had already done?
The bump at month 6 is from the 52 week project. Looks like I just needed some motivation.
The numbers work out to 17 non-formal and 25 formal photos a day from my DSLR. Whenever I feel like I’m not taking enough, I should remind myself of those numbers. Yes, they’re not all perfect, and yes there are duplicates, but that is still quite a bit. Do I wish there were more? Sure. But I don’t think having more would actually alleviate that desire. You can never have too many baby photos in my book!
Now that I’ve organized the photos a bit better, I can more easily go back and see what photos I do have. In the process of organizing my photos I also discovered some I had forgotten about.
I always get in so close to focus on the face, but sometimes it’s nice to step back and show scale. I can’t believe how tiny she was! She’s filling the rock n’ play these days!
My biggest wish, however, is that I could go back to the hospital. I took 54 photos of Nicki in the hospital. This is one time when I wish I had more duplicates!
Baby center released its list of the top 100 baby boy and girl names for 2012. What’s not on the list? Nicole. For most of the eighties Nicole was one of the top ten, but lately it’s been on the decline.
Of my friends who gave birth in the past year, three strived for unique names, and I thought they had succeeded. If I had to guess, I would have said all three picked names less common than ‘Nicole’. In reality, each of those names made the top 100 list. In fact, two of those friends, who don’t even know each other, ended up picking the exact same name (though one used it as a middle name).
This got me thinking. How unique is unique these days?
Babycenter’s list is generated from their members, which isn’t necessarily a representative sample of all babies. To get a broader perspective I turned to US census data. Here’s what I found.
What does the distribution of names look like?
We all know the popularity of given names changes all the time. In 1995 the most popular girls name was ‘Jessica’, but in 2011 ‘Jessica’ was ranked 120th. I wondered if parents looking for rare names may end up like my two friends, and settle on the same “rare” name. Maybe ‘Kendall’ (currently just three ranks below ‘Jessica’, and six below ‘Nicole’) will become the new ‘Emma’.
I looked at a variety of years, but ultimately decided to compare 2011 to 1995 as they have a similar birth rate and 2012 data isn’t available yet.
There are more names to choose from
The US census data only lists names that were given to at least 5 babies. Using the birth rate data, I can extrapolate that approximately 7% of babies in 1995 and 8% of babies in 2011 had a name given to fewer than 5 babies. So in addition to the two years having approximately the same birth rate, the US census lists for both years represent approximately the same number of babies.
Here’s the interesting thing: In 2011 there were approximately 34 thousand unique names on the US census list whereas there were only 26 thousand unique names on the list for 1995 – for approximately the same number of babies!
The ultra common names are the names on the most decline
In 1995, 44% of girls and 58% of boys (or roughly 1 in 2 babies) were given a name that made the top 100 list for that year. For 2011, only 31% of girls and 44% of boys (a little less than 2 in 5 babies) had a name that made the top 100. The use of top-100 names is clearly declining.
Yet, most parents still pick names that are in the top 1000. Despite there being over 19 thousand different girl names and 14 thousand different boy names given to babies in 2011, 67% of girls and 79% of boys (or roughly 7 in 10 babies) had a name in the top 1000. If we subtract out the top 100 baby names, we find 3 in 10 babies had a name ranked 101-1000. That’s the same rate as in 1995!
What about Names Collisions?
Parents often state they are striving for unique names out of a desire for their child to be the only child with a given name in school. In other words, they want to avoid a name collision.
How common are name collisions?
Name collisions have been on the decline. Using a Monte Carlo simulation, I was able to compute the probability of a name collision within a group of babies. Thirty years ago, a group of 40 babies would be 97 to 99% likely to have a name collision. In 1995, there was a 79.9% probability of a name collision in a group of 40 babies. In 2011, there was only a 56% probability that at least two babies in a group of 40 would have the same name.
The probability of name collisions has actually been on a decline since 1990. (Coincidentally, the World Wide Web was created in the ’90s.)
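A Monte Carlo estimate like the one described can be sketched as follows. The toy name-frequency table is hypothetical, since the post used the full census lists:

```python
# Monte Carlo estimate of name collisions: repeatedly draw a group of babies
# from a name-frequency distribution and check whether any name repeats.
import random
from itertools import accumulate

def collision_probability(name_freqs, group_size, trials=20_000, seed=42):
    rng = random.Random(seed)
    names = list(name_freqs)
    cum = list(accumulate(name_freqs.values()))  # cumulative weights, computed once
    hits = 0
    for _ in range(trials):
        group = rng.choices(names, cum_weights=cum, k=group_size)
        if len(set(group)) < group_size:  # at least two babies share a name
            hits += 1
    return hits / trials

# Toy distribution: a few very common names plus a long uniform tail.
freqs = {"Sophia": 400, "Emma": 380, "Olivia": 350}
freqs.update({f"rare{i}": 5 for i in range(2000)})
print(round(collision_probability(freqs, 40), 2))
```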
Picking a name with low probability of collision
This is actually fairly easy to calculate. Let’s use the name Kaitlyn, the 100th most popular girls’ name for 2011, as an example.
In 2011, there were 2893 Kaitlyns born (roughly 0.15% of all baby girls born that year). Let’s say Kaitlyn is going to go to school with just one other girl born in 2011. If we pick that girl at random, she has a 0.15% chance of also being named Kaitlyn and a 99.85% chance of having a different name. Now let’s say Kaitlyn is going to go to school with two other girls. Each of those girls has a 99.85% probability of not being named Kaitlyn. The probability of a name collision is one minus the probability that neither girl is named Kaitlyn. Mathematically, that’s expressed as
p(name_collision) = 1 − (1 − popularity_of_name)^number_of_students
According to the National Center for Education Statistics, the average elementary school has 482 students. According to the US census data, 48.8% of babies born in 2011 were girls. That works out to about 235 girls per school, meaning there are 234 other girls in addition to our Kaitlyn. We compute the probability of a name collision as follows.
p(name_collision) = 1 − (1 − 0.0015)^234
p(name_collision) = 29.7%
Thus there is only a 29.7% probability that our Kaitlyn will go to a school with another Kaitlyn.
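The closed-form version of this calculation is a one-liner. Note that with the rounded 0.15% input it comes out to 29.6%, a hair under the post’s figure:

```python
# The closed-form calculation: chance that at least one of n_peers
# shares the name, given the name's frequency in the population.
def name_collision_prob(popularity, n_peers):
    """popularity: fraction of babies given the name (0.0015 = 0.15%)."""
    return 1 - (1 - popularity) ** n_peers

# Kaitlyn: 0.15% of girls born in 2011, 234 other girls at school.
print(round(name_collision_prob(0.0015, 234) * 100, 1))  # 29.6
```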
You would have to pick the 38th most popular girl’s name (Anna) before there’s a 50/50 chance that another child at the same elementary school has the same name. For boys, you’d need to pick the 72nd most popular name (Ian) to have a 50/50 chance at a name collision. What if you pick a name below the top 1000? Then there’s only a 3% chance of a name collision for girls and a 2.3% chance for boys!
Of course, this analysis only considers a name collision with another child having the same spelling of the name. There could also be a Caitlyn, Katelynn, etc. According to NameNerds, there were an additional 6938 girls given a spelling variant of Kaitlyn in 2011. Including these spelling variants, the chance of a name collision increases to 70%.
So what about spelling variants?
Using the new counts from NameNerds, I found the following names are likely to have a name collision at an average-sized elementary school. The probability of the name collision is in parentheses.
For Girls: Sophia (97.2%), Isabella (94.1%), Olivia (90.9%), Emma (90.1%), Chloe (88.0%), Emily (87.8%), Ava (86.0%), Abigail (84.6%), Madison (84.5%), Kaylee (81.1%), Zoey (81.0%), Mia (78.9%), Madelyn (78.5%), Addison (78.2%), Hailey (78.1%), Lily (77.3%), Aubrey (76.0%), Riley (75.6%), Aaliyah (74.9%), Layla (74.7%), Natalie (74.3%), Arianna (73.6%), Elizabeth (72.6%), Brooklyn (71.0%), Kaitlyn (69.9%), Ella (69.4%), Makayla (68.6%), Allison (68.1%), Mackenzie (67.4%), Peyton (67.2%), Kylie (67.2%), Brianna (66.3%), Lillian (65.4%), Avery (65.1%), Leah (64.4%), Maya (63.2%), Alyssa (62.8%), Amelia (62.8%), Gabriella (62.4%), Sarah (62.3%), Katherine (62.0%), Evelyn (61.8%), Jocelyn (61.7%), Grace (60.6%), Hannah (60.0%), Jasmine (59.8%), Samantha (59.4%), Alaina (59.3%), Anna (57.8%), Nevaeh (57.6%), Victoria (57.5%), Alexis (57.0%), Camila (56.3%), Savannah (56.1%), Charlotte (54.7%), Liliana (52.9%), Ashley (52.6%), Isabelle (52.0%), Kaelyn (51.4%), Lyla (51.3%), and Kayla (50.4%)
For Boys: Aiden (97.5%), Jayden (95.6%), Jacob (92.8%), Jackson (92.3%), Mason (91.9%), Kayden (89.3%), Michael (88.5%), William (87.7%), Ethan (87.5%), Noah (87.2%), Alexander (86.5%), Daniel (84.6%), Elijah (83.6%), Matthew (83.5%), Anthony (83.0%), Christopher (82.5%), Caleb (81.4%), Joshua (81.2%), Liam (80.9%), Brayden (80.2%), James (80.1%), Andrew (80.1%), David (79.9%), Benjamin (79.8%), Joseph (79.7%), Logan (79.7%), Christian (79.7%), Jonathan (78.4%), Gabriel (78.1%), Landon (77.7%), Nicholas (77.0%), Lucas (76.4%), Ryan (76.3%), John (74.9%), Samuel (74.8%), Dylan (74.7%), Isaac (74.1%), Cameron (74.0%), Nathan (73.0%), Connor (72.5%), Isaiah (71.1%), Gavin (68.5%), Carter (67.8%), Jordan (67.1%), Tyler (66.1%), Evan (65.6%), Luke (65.5%), Owen (63.9%), Aaron (63.8%), Julian (63.6%), Jeremiah (63.5%), Brandon (63.4%), Zachary (63.4%), Jack (63.0%), Colton (61.5%), Adrian (61.5%), Wyatt (61.0%), Dominic (60.3%), Angel (60.1%), Eli (59.6%), Austin (59.2%), Hunter (58.9%), Justin (58.5%), Henry (58.4%), Jason (58.2%), Robert (56.9%), Charles (56.9%), Sebastian (56.6%), Thomas (56.6%), Brian (56.4%), Eric (56.3%), Tristan (56.1%), Jose (56.0%), Kevin (55.8%), Chase (55.7%), Levi (55.6%), Josiah (54.2%), Bentley (54.1%), Grayson (54.0%), Giovanni (53.8%), Carson (53.5%), Xavier (52.8%), Ian (51.7%), Jace (51.5%) and Brody (50.0%)
Some obvious ones are on the list, but there are definitely some surprises, including two of my three friends’ picks! In 2011, 33% of girls were given a variant of these 61 girls’ names, and 46% of boys a variant of these boys’ names.
My intuition of what constitutes a common baby name was clearly off. I fell into the trap of thinking names that were common when I was young are still common, and names that were rare are still rare. Even if you made the same mistake I did, the good news is it’s less likely to have a name collision today than 17 years ago.
One last thought. Throughout this analysis there was an implicit assumption that names were rarer because parents were deliberately choosing rarer names. But there are other possible explanations. With an increase in globalization, prospective parents get exposed to new names. In the past two years I’ve worked with more Nikhils than Johns, and more Yis than Matts. I know people who picked exotic names for their children purely because they loved the sound of the name, and not for any ethnicity or ancestry reasons. Perhaps this trend to uniqueness was inevitable, whether intentional or not.
One question I have on my mind a lot lately, as I’m sure every pregnant woman starts asking, is “what are the odds of my baby coming today?” or “in the next couple of days?” The trouble is, it’s really hard to find any kind of answer to that question online. Some babies come early, some come late, and any that come between 37 and 42 weeks are considered ‘right on time’. Well, the math nerd in me wasn’t satisfied with that answer.
I previously found this chart online, which uses a normal distribution with a mean of 40 weeks (280 days) and a standard deviation of 10 days to estimate the probability of going into labor.
A skew normal and a normal distribution are very similar near the middle (i.e. close to the due date), and less similar the further you get from the middle (i.e. further from the due date). I was really interested in knowing how likely labor was TODAY, approximately 6 weeks before my due date, so the normal distribution wasn’t going to cut it.
I wanted to estimate a skewed distribution, but how to do that without any data? Fortunately spacefem.com cites several studies which indicate the true likelihood is approximately normal near the due date, so I needed a skew normal distribution that stays close to the normal distribution there.
Five hours later…
I wanted to create my model using Excel, rather than Matlab or R, two programs specifically designed for statistics. I haven’t touched either in a while and didn’t want to re-learn them. Excel has support for normal distributions, but nothing for the skew normal. That meant I had to implement the functions on my own, and my calculus skills are only slightly less rusty than my Matlab or R skills. At some point I probably should have given up and switched over to Matlab, but I was stubborn and determined to get it! It was a matter of pride.
In the end I came up with a skew normal with location 295, scale 21 and shape -4. This distribution shows approximately 10% of babies will be premature, half of all pregnancies will deliver early while half will deliver late, and the squared error between the two distributions is small.
My model (blue) as compared to the normal distribution (red). I plotted them both assuming ’0 days’ as the due date instead of 280 to make it easier to read.
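Out of curiosity, here is a stdlib-only sketch of the same model (not the author’s Excel implementation); the skew normal parameters are the ones given above, and the CDF is approximated by simple numerical integration:

```python
# A skew normal with location 295, scale 21, and shape -4,
# over days of gestation (parameters from the post).
from math import exp, erf, pi, sqrt

LOC, SCALE, SHAPE = 295.0, 21.0, -4.0

def skewnorm_pdf(x):
    z = (x - LOC) / SCALE
    phi = exp(-z * z / 2) / sqrt(2 * pi)         # standard normal density
    skew = 0.5 * (1 + erf(SHAPE * z / sqrt(2)))  # standard normal CDF at SHAPE*z
    return 2 * phi * skew / SCALE

def prob_labor_by(day, start=150.0, step=0.1):
    """P(labor on or before `day`), by simple numerical integration."""
    return sum(skewnorm_pdf(start + i * step) * step
               for i in range(int((day - start) / step)))

print(round(prob_labor_by(280), 2))  # roughly half deliver by the due date
print(round(prob_labor_by(259), 2))  # roughly 10% arrive before 37 weeks
```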
Interesting side note: while the model shows half of women go into labor before their due date, the day with the highest probability of spontaneous labor is 7 days after the due date, which matches conventional wisdom!
So what does this mean for me? Given that zippy isn’t here yet, I have a 0.1% chance of going into labor today and a 1.36% chance of going into labor in the next seven days! That’s 30 times higher than the prediction I was getting with the normal distribution! She also has a probability of 27% of being a Gemini.
Of course this is just an estimate, and all meant to be in good fun. Without data, my model is only a guesstimate. Nevertheless, my math nerd itch has been scratched.
You can try the tool out for yourself here.
A recently engaged friend and I were discussing engagement rings, and she was shocked when I told her we did not insure our ring. “But doesn’t it have sentimental value to you?” She asked. Why yes, yes it does. But you can’t insure sentimental value, you can only insure monetary value. The insurance company won’t break out a search party should you lose your ring, or help in the police investigation if your ring gets stolen. They will write you a check to buy a new one.
Mathematically speaking, insurance doesn’t always make sense.
Insurance for possessions is similar to extended warranties. Back in 2007 I purchased a Wii for $250, and the store offered me a $10 extended warranty. Most consoles either fail right away (and are covered by the manufacturer’s warranty) or won’t fail for an extended period of time. Let’s say the store estimates the failure rate at 1 in 100. Then for every 100 warranties sold, the store expects one customer’s Wii to break, obligating it to pay that customer $250. The store’s expected profit from selling 100 warranties is then $10×100 − $250×1, or $750. This kind of average is called the expected value. When the store sells tens of thousands of warranties, a statistical property called the Law of Large Numbers makes it very unlikely that enough products fail for the store to lose money paying out on the warranties. The store is a business, after all, not a charity, and the goal of a business is profit.
The expected value (EV) can also be calculated for an individual warranty; this value represents the monetary worth of the warranty to each party. The equation is: probability of failure × monetary value of failure + probability of no failure × monetary value of no failure.
For the store:
EV(Store’s Value of the Warranty)
= 1% × (−$250 + $10) + 99% × ($10) = −$2.40 + $9.90 = $7.50
In the first expression, the term −$250 + $10 is the payout minus the revenue gained from the sale of the warranty. The monetary value of no failure, in the second expression, is simply the revenue from the sale of the warranty. Thus the store expects to earn $7.50 per warranty sold. Not coincidentally, that’s 1/100 of the expected value of selling 100 warranties.
We can also compute an individual consumer’s expected value of purchasing the warranty:
EV(Consumer’s Value of the Warranty)
= 1% × ($250 − $10) + 99% × (−$10) = $2.40 − $9.90 = −$7.50
The store’s expected gain is the customer’s expected loss. Each dollar the customer loses is a dollar the store gains.
We can also compare the expected value of the consumer not purchasing the warranty. In this case, the consumer does not pay the $10 fee, but is out of luck and must pay an additional $250 to replace the console, should it break. The consumer’s expected value of no warranty is:
EV(Consumer Value of No Warranty)
= 1% × (−$250) + 99% × ($0) = −$2.50
The expected value is still negative, but the consumer’s expected value of not purchasing the warranty is less negative than that of purchasing it. Mathematically speaking, the consumer is expected to lose less money by not purchasing the warranty.
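The three expected values above can be checked in a few lines:

```python
# The three expected values from the warranty example, computed directly.
def expected_value(p_fail, value_if_fail, value_if_ok):
    return p_fail * value_if_fail + (1 - p_fail) * value_if_ok

P_FAIL, PAYOUT, FEE = 0.01, 250, 10

store    = expected_value(P_FAIL, -PAYOUT + FEE, FEE)  # store sold a warranty
consumer = expected_value(P_FAIL, PAYOUT - FEE, -FEE)  # consumer bought one
no_cover = expected_value(P_FAIL, -PAYOUT, 0)          # consumer skipped it
print(round(store, 2), round(consumer, 2), round(no_cover, 2))  # 7.5 -7.5 -2.5
```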
Engagement ring insurance works mostly the same way. A jewelry insurance company computes the probability that my ring will get lost, damaged or stolen. Many factors go into the calculation, including facts like the crime rate where I live, or whether or not I’ve ever reported a claim. The insurance company sets their rate accordingly, so their expected value is positive. In fact, they set their rate high enough that they can expect to be profitable and still pay their staff’s wages, the fixed costs of operating the business, and the customers who file claims on their rings. Again, they’re a business, not a charity. As before, the money the company expects to make from me as a customer is equal to the money I expect to lose.
This isn’t to say that insurance is never a good idea. In fact, it’s often a very good idea! Insurance and extended warranties are designed to provide protection for the worst case scenario, not the average case. Home, auto, and health insurance all have expensive worst case scenarios, beyond the financial capabilities of most people, which is why they are considered necessary. The worst case scenario when not purchasing the extended warranty is that the Wii breaks, which at the time cost $250 to replace. If you can afford the worst case (i.e. purchase a new Wii), you are usually better off not insuring. This is sometimes referred to as self-insuring, meaning you are setting money aside and relying on yourself to cover the financial burden.
Should the worst case occur, and I lose my engagement ring, I will be okay financially speaking. Therefore, I expect to come out financially ahead by not insuring.
There was a really good article about the profitability of Etsy. Simply put, Etsy‘s profitability is dependent on the individual stores’ profitability. Creating a profitable Etsy store, however, is no easy feat. The biggest hurdle is determining how much to charge for a given item: too much and the shopkeeper loses customers, yet too little and the shopkeeper’s profits may not outweigh costs.
So how much should one charge? It’s easier to think in terms of profits and work backwards. A simplified formula for profits is
YearlyIncome = NumberOfSales * Price – Expenses
Solving for price, you get
Price = (YearlyIncome + Expenses)/NumberOfSales
You can substitute DesiredYearlyIncome to get a price target. To give a concrete example, let’s assume I’m striving to be one of those $30,000-a-year Etsy merchants as a jewelry maker, and I think I can reasonably sell 1,000 pieces a year. Then I will need to charge $30 above the materials cost for each piece I sell. This $30 is basically the cost of labor. But as Max Chafkin points out, “The vast majority of Etsy sellers are hobbyists who aren’t in it for the money and, consequently, end up charging rates for their labor that would make even a Walmart buyer blush.”
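As a quick sanity check, here is the pricing formula with the post’s numbers:

```python
# The pricing identity solved for price above materials cost.
def price_per_item(desired_income, expenses, n_sales):
    return (desired_income + expenses) / n_sales

# The post's example: $30,000/yr from 1,000 pieces, ignoring other expenses.
print(price_per_item(30_000, 0, 1_000))  # 30.0
```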
With similar products being offered at barely above the materials costs, it’s difficult to raise the price too much without impacting sales. Why buy a necklace from me if someone is selling a comparable one for $30 less? As a result, the price I can charge is basically fixed. This means the only two variables in the equation left that can change are expenses, and number of sales.
Expenses can be difficult to change. If you’re already buying wholesale, or in bulk, you’re not going to find much wiggle room. You could substitute cheaper supplies, but you run the risk of losing customers who want jewelry made of higher quality materials.
Increasing the number of sales is also not easy. In my example, I assumed 1,000 sales and needed to charge $30 per piece in labor costs. If I only want to charge $5 in labor costs, I would need to sell 6,000 pieces. That’s roughly 16.4 sales a day. Some visitors to my store front will not purchase anything. Some visitors may be other shop owners or crafters looking for inspiration. In order to get enough potential customers to my store I will have to devote time to advertising, time that won’t be spent crafting.
The way I see it, the best bet for profitability is to reduce competition, primarily by changing the product you offer. One way to do this is by offering a product that few others can: fill a niche. There may be many wire jewelry makers on Etsy, but there are far fewer casters. There will always be crafters who mimic cool designs they see, so you can’t differentiate yourself by style alone. Another approach I’ve seen recently is to take advantage of the fact that many of your shop visitors will themselves be crafters. A few Etsy shop owners offer instructions for how to create their crafts for very small sums of money. There’s no materials cost once the instructions have been created, so the $1-$2 is pure profit. It’s also a small enough sum that you’re unlikely to be greatly undercut.
Despite the difficulties, there are those who do manage a thriving business on Etsy. You can look at the numbers Max Chafkin points out and either lament that only 1,000 Etsy shops make $30k a year, or rejoice because 1,000 shops make $30k a year. If you’re in the latter camp, and eager to give your business a try, there’s lots of advice about selling on Etsy, including custom work, to help maximize your chances of being successful.
If you’re curious, there’s also a consulting rule of thumb for determining your hourly wage that also applies to crafters doing custom work. The formula is:
(DesiredYearlyIncome + YearlyExpenses)/2000.
The 2000 comes from a 40-hour work week, 50 weeks a year. Why not 52? Well, you’ll need some (paid) time off, for sick days or even just to recharge. We all need a break sometimes.
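In code, the rule of thumb is just this (the income and expense figures are hypothetical, for illustration):

```python
# The consulting rule of thumb: 40 hrs/week x 50 weeks = 2,000 hours/year.
def hourly_rate(desired_income, yearly_expenses):
    return (desired_income + yearly_expenses) / 2000

# Hypothetical figures: $30k income target, $10k yearly expenses.
print(hourly_rate(30_000, 10_000))  # 20.0
```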