Archive for the ‘Internet’ Category
A while back I discovered that someone was trying to pass off my photos of Nicki as photos of her own child. I immediately turned to Google reverse image search to see if anyone else was using my photos without permission. Entering the URL of each image from my blog was so slow and tedious that I gave up after checking just a few. There had to be a better way.
I know I’m not the first blogger who has had images of her child appear elsewhere. I’m not even the first blogger in my Twitter stream this year that this has happened to. Internet strangers doing inappropriate things with baby photos is something that keeps some bloggers up at night, and makes some of us hang up our blogging hat altogether. To make it easier to detect this sort of thing, I wrote a Duplicate Image Search utility script.
Simply give the script the URL of a webpage you want to check. The script finds all images on the page and displays them. The images are hotlinked, meaning I am not caching them or saving them to my server. My script is basically just a proxy service. Clicking on an image will open Google’s reverse image search and you can verify that only authorized sites are the ones displaying your images.
I had bloggers in particular in mind when creating this utility script. In order to make it more useful I attempt to automatically parse the page looking for a “next” button. You could start with your main index page, and slowly comb through your entire blog.
The script isn’t perfect. I wanted to parse out the search results and just return the web address of any unusual domains that might be using your photos without permission. Alas the only APIs I could find that would let me do this are prohibitively expensive. If I get enough interest in this script that I can amortize the cost, I’ll consider making the improvements in the future. In the meantime, I hope it helps you stop anyone from using your photos without your permission.
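For the curious, here is a rough sketch of the idea in Python. It is not the script itself: the names `PageScanner` and `scan_page` are made up for illustration, the "next" detection (a `rel="next"` attribute, or link text containing "next" or "older") is just one plausible heuristic, and the reverse image search address is an assumption that may not match what Google accepts today. Fetching the page is left out, so the function takes the HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urljoin

# Assumption: illustrative endpoint only; check Google's current URL format.
REVERSE_SEARCH = "https://www.google.com/searchbyimage?image_url="


class PageScanner(HTMLParser):
    """Collects image URLs and guesses which link leads to the next page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.next_url = None
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # text seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # Resolve relative paths against the page address.
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []
            # rel="next" is the clearest hint a theme can give us.
            if "next" in (attrs.get("rel") or "").lower().split():
                self.next_url = urljoin(self.base_url, self._href)

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).lower()
            # Fall back to the link text when no rel="next" was found.
            if self.next_url is None and ("next" in text or "older" in text):
                self.next_url = urljoin(self.base_url, self._href)
            self._href = None


def scan_page(html, base_url):
    """Return (image URLs, reverse-search links, guessed next-page URL)."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    searches = [REVERSE_SEARCH + quote(url, safe="") for url in scanner.images]
    return scanner.images, searches, scanner.next_url
```

Calling `scan_page` again on whatever page `next_url` points to is how you would walk back through a whole blog, one page at a time.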
I feel a bit like a grumpy ole curmudgeon for doing this, but I put up an official copyright notice. I really hate to do that, but after finding a second person using my baby photos of Nicki in as many days, I felt I had to do something. At least this time I believe it was an honest mistake, and not another person pretending photos of Nicki are photos of her child.
As I was writing the copyright notice, I kept thinking about when I first got online, and my very first websites. Like most of my peers, I would occasionally use an image I didn’t own the rights to. I wouldn’t use any photos with an obvious copyright, but the non-obvious ones? Sure, I’ve done that. I had an inkling it was wrong, but I figured I wasn’t harming anyone. I wasn’t profiting off it, and everyone does it. I figured if I was caught I could just say “I didn’t know it was copyrighted” (half true) or “my friend sent it to me to use, I thought it was hers” (a total copout.) The ultimate irony: I’ve heard both excuses from people using my images without permission.
It’s easy to make mistakes. There’s a misconception that a link back to the original source of a photo counts as fair use, or that just because a website grants Google permission to store and show its images the same privilege extends to anyone using Google. That’s like saying if you credit George R. R. Martin you can post the full contents of A Song of Ice and Fire on your website, or because George R. R. Martin granted HBO the rights to produce Game of Thrones, and you subscribe to HBO, you can make a Game of Thrones as well. Copyright law does not work that way.
But as difficult as it was to write my copyright notice, some good has come from it.
Good for you: Less ambiguity. Some people have pointed out possible copyright issues with pinning on Pinterest. In the past I’ve implied I’m ok with Pinterest, now I’ve explicitly stated it. I’ve also decided to reserve only some rights, not all! That means I’ve given permission to post my images/photos on your website/blog under certain circumstances. When in doubt, you can always ask. I’ll probably be so tickled pink that you want to use my work that I’ll say yes.
Good for me: Consistency. I agonized over what to do with this latest copyright violation. I started filling out a DMCA take down request when I started having flashbacks to my younger self. How would I have felt had my webhost informed me that I was in violation of someone else’s copyright and they had temporarily suspended my website as a result? Make no mistake, the copyright owner and my webhost would have been well within their collective rights to do so (as would I in this case.) I truly feel most people are honest, just unaware. My policy gives everyone 3 days to respond when I email them about a copyright violation. I understand that someone might be on vacation, or traveling, or otherwise occupied, so I’m just looking for a response in that time that implies they’re taking my request seriously. Absent that, or a way to contact them, I’m afraid I will have no choice but to follow through with the DMCA take down request. Seems fair, right? I feel less bad about filing the complaint if it’s a uniform policy I apply to everyone.
Will the new policy be effective? Probably not. I suspect in both cases the copyright violator found the photos using Google image search, and never visited my blog. If they don’t visit my blog, they won’t see my copyright notice. Regardless, I will feel better about taking action.
Someone asked me why not block Google from archiving my images. For the most part Google is my friend. I actually get a fair amount of traffic to my blog through Google image search. I want potential readers to be able to find me. Still, it’s a valid point. Copyright laws only work in countries that agree to enforce them, and I can’t stop someone if I don’t know they’re using my images. The only way to truly prevent copyright violations is to not let anyone have access to the photos in the first place.
Since I still intend to blog, the most effective strategy I can think of is to post small images – big enough to display in a blog post, but too small to be useful elsewhere. Actually, that’s not true. The best strategy would be to post terrible photos that no one wants to steal, but that’s not a path I want to go down. At least not intentionally.
Updated 4/13: The second person violating my copyright has voluntarily removed my content when it was pointed out to her. I did not have to file a DMCA take down request.
Yesterday evening I noticed a large bump in traffic from topix.com. Someone had posted a link to my blog on a public forum there, which was generating a ton of traffic. This happens from time to time (normally it’s a link to my labor predictor) but this time it was to my Newborn Photography page. I was so excited to see what strangers thought of my photography. External validation, how I love you.
Alas, it was not what I first thought.
The forum thread was dedicated to ‘outing’ one of their members. Apparently she had lifted some photos of babies online, one (some?) of which was mine. One of the forum members posted a link to my newborn photography page, as well as four other websites, as proof.
I was confused, to say the least. From what I could tell this person, “crazy mom”, had collected a number of photos and posted them to her Facebook page. My guess is she did a Google image search and downloaded a few photos she liked (so I guess I got that external validation after all?). Crazy mom probably never visited my blog. I don’t even think all the baby pictures were the same gender. The only thing they had in common was each baby appeared to have been born with a full head of hair. I think she was claiming she had a boy. About thirty minutes into my investigation, the forum post was deleted. I never did see which photo she was pretending was hers.
Now, I’ve been online a long time – back before the days of Google, back when being a webmaster meant slinging static HTML code onto GeoCities. Back then I built a fantasy website under the pen name ‘Aella Lei’. At the time I was into Greek mythology, and picked the name after one of the Amazons Hercules fought during his 12 labors. It turns out that ‘Aella’ is also a French name and one day someone of that name happened upon my website. This Aella took a liking to my website, and wanted credit for it. Her theory was that since I was using the name as a pen name, and it was her given name, I owed her at least partial credit. I disagreed and ignored her.
Back then I was using an internet messaging client called ICQ. ICQ is similar to Yahoo Messenger and AOL Instant Messenger except everyone had a unique number id that identified them, rather than a unique screen name. That way there could be many ‘Aella’s. Aella set her screen name to ‘Aella Lei’, her profile website to my website, and for her bio she lifted sentences straight from the “about me” page on my website. I know this because one day she forgot she wasn’t me, and messaged me thinking I was the imposter.
Compared to Aella, this crazy mom’s brand of crazy is pretty tame. I found no evidence to suggest crazy mom was obsessed with Nicki. Still, this experience does serve to remind me that there are people out there who will fixate on individuals. Nicki is too young to have an opinion on how much information about her is online. It’s my job to make sure she’s protected. Perhaps it’s time to rethink how much I share online.
If by chance you happened to stumble onto my blog from topix.com, and you know who this person is who stole my photos, I’d really appreciate it if you could fill me in.
In preparation for my re-entering the work force (shameless self plug – I’m job seeking!) I closed down some of my old legacy sites that I’ve been keeping around for posterity. One of which was my consulting site, bluecentauri.com. Even though I haven’t updated it in years, the tools were still in use, especially the Writing Sample Readability Analyzer. So I moved it to my resume website, sarahktyler.com.
Since I’m getting a ton of emails about it lately, I thought I’d answer some of the frequently asked questions.
How does your analyzer work?
The analyzer uses the Flesch, Fog and Flesch-Kincaid metrics to predict reading ease. Each approximation makes the basic assumption that longer sentences are harder to read than shorter sentences, and words with more syllables are harder to read than words with fewer syllables. Although the underlying principle is the same, each metric is calculated slightly differently.
FleschScore = 206.835 − (1.015 × AverageSentenceLength) − (84.6 × AverageNumSyllablesPerWord)
FogScale = 0.4 × (AverageSentenceLength + PercentageOfWordsWithThreeOrMoreSyllables)
FleschKincaidScore = (0.39 × AverageSentenceLength) + (11.8 × AverageNumSyllablesPerWord) − 15.59
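To make the arithmetic concrete, here is a minimal Python sketch of the three formulas. This is not the analyzer's code. The function and parameter names are made up for illustration, and the counts are assumed to be supplied by whatever sentence and syllable counting you trust. The Flesch-Kincaid line uses 0.39, the coefficient in the published grade-level formula.

```python
def flesch_score(avg_sentence_length, avg_syllables_per_word):
    """Flesch reading ease: higher scores mean easier text."""
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word


def fog_scale(avg_sentence_length, pct_complex_words):
    """Fog index; pct_complex_words is a percentage (0 to 100) of words
    with three or more syllables."""
    return 0.4 * (avg_sentence_length + pct_complex_words)


def flesch_kincaid_score(avg_sentence_length, avg_syllables_per_word):
    """Flesch-Kincaid grade level: roughly a U.S. school grade."""
    return 0.39 * avg_sentence_length + 11.8 * avg_syllables_per_word - 15.59


def readability(words, sentences, syllables, complex_words):
    """Turn raw counts into the three scores."""
    avg_sentence_length = words / sentences
    avg_syllables_per_word = syllables / words
    pct_complex_words = 100.0 * complex_words / words
    return {
        "flesch": flesch_score(avg_sentence_length, avg_syllables_per_word),
        "fog": fog_scale(avg_sentence_length, pct_complex_words),
        "flesch_kincaid": flesch_kincaid_score(
            avg_sentence_length, avg_syllables_per_word
        ),
    }
```

For a sample of 100 words in 5 sentences, with 150 syllables and 10 words of three or more syllables, the averages are 20 words per sentence and 1.5 syllables per word. That works out to a Flesch score of about 59.6, a Fog scale of 12.0, and a Flesch-Kincaid score of about 9.9.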
I’ve used your analyzer and another analyzer and gotten different results, why is that?
Flesch, Fog, and Flesch-Kincaid are well-defined metrics. If another system is reporting a different score for the same metric, then the input variables (either the number of sentences or the number of syllables per word) must be calculated differently.
Calculating sentence boundaries is surprisingly less straightforward than it seems. As humans, we can identify when a sentence ends pretty easily. Since the computer can’t really parse or understand the sentence**, it can only make an educated guess based on clues like punctuation and capitalization. But not all punctuation (think abbreviations) ends a sentence, and not all sentences end with punctuation. This is especially true online, where sentences are often not well formed.
The same goes for computing the number of syllables in a word. It may seem simple to just create a list of syllables per word, but language is infinite and constantly evolving. Such a list is not possible. Pronunciation (and the number of syllables) can differ in different parts of the world. Additionally, heteronyms, words that have the same spelling but different pronunciations, can have different numbers of syllables. The word ‘learned’ as the past tense of the verb ‘to learn’ is one syllable, but ‘learned’ the adjective to describe someone with scholastic achievement is two. Any method to calculate the number of syllables per word will involve some heuristics.
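As an illustration of that kind of educated guess, here is a small Python sketch. It is one possible heuristic, not the one my analyzer uses: the abbreviation list is a tiny made-up sample, and the rule is simply "end the sentence at ., ! or ? unless the word is a known abbreviation or the next word starts with a lowercase letter".

```python
import re

# Assumption: a deliberately tiny sample list; a real one would be much longer.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "prof.", "st.", "vs.", "etc.", "e.g.", "i.e."}

# A word that ends in ., ! or ?, optionally followed by a closing quote or bracket.
ENDS_WITH_TERMINATOR = re.compile(r"[.!?][\"')\]]*$")


def split_sentences(text):
    """Guess sentence boundaries from punctuation and capitalization."""
    sentences = []
    current = []
    tokens = text.split()
    for i, token in enumerate(tokens):
        current.append(token)
        if not ENDS_WITH_TERMINATOR.search(token):
            continue  # no sentence-ending punctuation on this word
        if token.lower() in ABBREVIATIONS:
            continue  # "Dr." and friends usually do not end a sentence
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        # Only break if the text ends here or the next word looks like a start.
        if next_token is None or next_token[0].isupper() or next_token[0] in "\"'(":
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing words with no final punctuation still count as a sentence.
        sentences.append(" ".join(current))
    return sentences
```

Given "Dr. Smith went home. She was tired. Then she slept" this returns three sentences, the last one without any closing punctuation. It is also easy to fool: a sentence that really does end in "etc." gets glued to the next one, which is exactly the sort of difference that shows up between analyzers.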
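Here is what one such heuristic might look like in Python. Again, this is a sketch rather than the analyzer's actual method, and `count_syllables` is a made-up name: it counts runs of vowels and knocks one off for a silent trailing ‘e’.

```python
import re


def count_syllables(word):
    """Estimate syllables by counting groups of consecutive vowels."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing 'e' is usually silent ("cake"), except in endings like "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    # Every real word has at least one syllable.
    return max(count, 1)
```

This gets ‘kiln’ (1), ‘together’ (3), ‘cake’ (1) and ‘table’ (2) right, but it always reports 2 for ‘learned’, because a spelling-based rule has no idea whether the word is being used as a verb or an adjective.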
The differences in calculating sentence length and the number of syllables will tend to be more noticeable on shorter samples, rather than longer. Even so, while there may be differences between different analyzers, the differences should be relatively small.
** There is an active area of research in natural language processing which tries to automatically parse and understand sentences.
Which analyzer is the most ‘accurate’?
There are two types of ‘accurate’ we can consider: which analyzer comes closer to the true Flesch, Fog and Flesch-Kincaid metrics, and which one better predicts reading ease. Keep in mind that each metric is just a heuristic based on an assumption that is often true, but not always. For example ‘kiln’ is one syllable, but harder than the simple, three syllable word ‘together’. Depending on the kind of text you are analyzing, you may find one method or score works better for your application than another.
Let’s consider two Analyzers, one with a very good sentence boundary detection, Analyzer A, and one with a very good syllable per word calculator, Analyzer B. If you were analyzing writing samples from elementary school children, you may prefer A. That’s because young children may not write grammatically correct sentences and typically don’t have a rich vocabulary, so a more complex syllable per word calculator wouldn’t buy you much whereas a better sentence boundary detector may be necessary. On the other hand, if you were analyzing scientific journal articles, you may prefer B.
My suggestion is to use both analyzers to get a feel for which one is better for you and your task.
Will you share the code?
I have in the past, but only for extra special cases.
A few months ago one of my mom’s groups discussed technology. It started as a conversation about elementary kids and cell phones, which everyone seemed to be against, but quickly grew to other forms of technology like personal computers and tablets. I was shocked to learn not a single one would allow their child to have a personal computer in their room prior to high school. Not a one. Many wouldn’t even allow it then. My fellow moms wanted to protect their kids from the Big Bad out there on the internet. At a time when even some elementary schools are considering giving their students pads as a learning device, this kind of concern seems shortsighted.
The Internet has become the great equalizer. Don’t have access to the best schools? You can take classes online like Khadijah Niazi, the eleven year old Pakistani girl who was the youngest ever to pass an online college-level physics class. She learned the material by watching YouTube videos at the same time the Pakistani government decided to block YouTube.
The bar for achievement keeps climbing higher each year. The kids at the Intel science competition are writing computer simulators to solve complex mathematical problems at 16 and 17. Like it or not, this is the level of competition. You can’t write a simulator at 16 if you’re just learning to use the computer at 15.
A few days ago I took a photo of Nicki looking at my dad’s tablet. On the surface this seems in direct conflict with the AAP recommendation of no screen time for children under 2. If you read the press release they’re largely talking about TV watching. Studies have failed to prove “educational” toddler and infant programs have any real benefit. Even ambient TV can distract a parent from engaging with a child. Of course a TV as a baby sitter isn’t good long term, but that’s not what we’re doing. Nicki is learning. Am I naive enough to think Nicki is learning shapes from the iPad app she’s looking at? Of course not. She is learning that she can interact with the iPad, that it reacts to her touches. We don’t do it every day, or even every week, but over time she’ll learn how to control the tablet. When she’s old enough to actually benefit from the educational programs, she will have already mastered the tool. That’s not a bad thing in my book.
This isn’t to say you can’t be successful without growing up with a computer. Of course you can. And it’s not to say there aren’t scary things and scary people on the Internet. Of course there are. Technology is a tool that can be used for good and bad. I personally feel the benefits outweigh the risks.
I have discovered a not cool interaction between my blogging software (WordPress), smart phone (iPhone) and camera (Nikon DSLR).
The Nikon, like most cameras, has an internal leveler, and can tell which way the camera is oriented to take a portrait rather than a landscape image. Along with other bits of metadata, the camera sets an orientation flag for each image. Think of the metadata as additional information about the file. The actual data in the image file, however, is still stored as a landscape, since that is what the image sensor ‘sees’. It’s like if you tilt your head and look at a glass of water. The glass appears like it’s on its side to you, but in your mind you know your head is tilted, not the glass, which is why the water isn’t spilling out.
My computer, like most computers, renders the image according to the data in the image file. As a result, the image appears as a landscape, regardless of how the camera was oriented. I then use image editing software to rotate the image (effectively re-arranging the pixel data). The orientation flag in the metadata remains unchanged. WordPress also ignores the orientation flag. So the uploaded image appears on my blog the same way it appears on my computer.
The iPhone, however, tries to be smart. Since the orientation flag is still set, it assumes the image needs to be rotated again to display correctly. As a result, the image appears rotated an extra 90 degrees on my iPhone. But only on the iPhone, so I didn’t discover the problem until recently!
The only way to fix it is to strip the metadata so the orientation flag isn’t set, but that means going back over all my past entries and uploading a new photo for all the crooked ones. Not Cool.
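For anyone who wants to try it by hand, here is a minimal Python sketch of what "stripping the metadata" means for a JPEG. The function name `strip_exif` is made up. It assumes a plain, well-formed file, walks the segments at the start, and drops any APP1 segment that begins with the Exif header, which is where the orientation flag (and any GPS data) lives. It does not touch the pixel data, so only run it on images that have already been rotated, or portrait shots will show up sideways everywhere.

```python
import struct

SOI = b"\xff\xd8"         # start-of-image marker that opens every JPEG
APP1 = 0xE1               # segment type that carries Exif metadata
SOS = 0xDA                # start of scan: compressed image data follows
EXIF_HEADER = b"Exif\x00\x00"


def strip_exif(jpeg_bytes):
    """Return a copy of the JPEG with its Exif APP1 segments removed."""
    if jpeg_bytes[:2] != SOI:
        raise ValueError("not a JPEG file")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("unexpected byte where a segment marker should be")
        marker = jpeg_bytes[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it untouched.
            out += jpeg_bytes[pos:]
            return bytes(out)
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        segment = jpeg_bytes[pos:pos + 2 + length]
        is_exif = marker == APP1 and segment[4:10] == EXIF_HEADER
        if not is_exif:
            out += segment
        pos += 2 + length
    # No scan segment found; keep whatever bytes are left as they are.
    out += jpeg_bytes[pos:]
    return bytes(out)
```

Reading a file with `open(path, "rb")`, passing the bytes through `strip_exif`, and writing the result to a new file gives a copy without the orientation flag. Other kinds of embedded metadata are left in place by this sketch.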
To be honest, I’m surprised there isn’t an easier way to strip the image metadata. Aside from the orientation issue, metadata can include GPS location information. It’s handy for figuring out what your photos are of years after the fact, but if you upload an image with geographical information, someone can figure out where you’ve been. You can view the geolocation data in images online for yourself.
Metadata does have its uses. Some photographers like to store copyright information in the metadata. Camera manufacturers and image processing software also like to add their mark to the metadata, as a form of free advertising for anyone looking to see how an image was created. For me, though, I wish there was an easy way to get rid of it.