Nathan Bransford, Author


Tuesday, December 21, 2010

Fun With Google Ngrams

First up, if you haven't left a comment on yesterday's post, please do so! Every comment means $1.00 more for Heifer International, and please also consider helping out the needy by making your own pledge.

Transition.

Ever wonder how popular a word or phrase or person or thing is over time? Well, wonder no more. Google recently released an incredible new tool that searches across millions of books published in the last 200 years and graphs a word or phrase's popularity over time.

It is called ngram, and it is awesome.

Want to see how it works?

Here's vampires (blue) vs. werewolves (red) vs. zombies (green):


Want to track inventions?

Here's telegraph (blue) vs. telephone (red) vs. Internet (green):


Want to see if your slang matches the time?

Here's mama (blue) vs. papa (red) vs. mom (green) vs. dad (yellow):


Authors?

Here's Faulkner (blue) vs. Fitzgerald (red) vs. Hemingway (green):


Preparation of the potato?

Here's baked potatoes (blue) vs. mashed potatoes (red) vs. french fries (green) vs. potato chips (yellow):


Honestly, I could go on like this all day. And if you want to play along, link to your favorite ngrams in the comment section!

Google ngrams






53 comments:

crow productions said...

I'd wonder about "actually" and "awesome".

Jessica said...

Now I have to try it simply for the word "like" and phrase "you know"

JohnO said...

I saw a post by a romance blogger who had tracked bosom vs. chest. Chest had swelled in recent years, while bosom ... ah well.

Anonymous said...

"Heaven" descending; "Hell" holding steady.

Alec said...

I've been using Google Ngrams to get a rough sense of how the reputations of certain writers (John Updike, Toni Morrison, Stephen King, Jacqueline Susann, and more) have risen and fallen over time. Some of the results are pretty surprising. If you're curious, I've been blogging about it here and here.

Nicole L Rivera said...

Here's the link for "write" vs. "text" vs. "blog" vs. "tweet": http://ngrams.googlelabs.com/graph?content=write%2Ctext%2Cblog%2Ctweet&year_start=1800&year_end=2008&corpus=0&smoothing=3

Munk Davis said...

Thanks Nathan, I hadn't seen this and love this kind of thing.

Matthew Rush said...

Oh my god that looks awesome. Could it be even more fun than analytics? I'm off to find out ...

... oh but before I go, why does that one graph make it look like the internet started getting slightly less popular recently? Am I trippin?

Brooke Johnson said...

looks like 1920 was a good year for potatoes.

D.G. Hudson said...

I checked mysteries, science fiction and thrillers and found that mysteries topped the charts by a WIDE margin. Interesting. Science fiction picked up in the sixties, and thrillers is just starting to increase. Sounds about right.

Thanks Nathan, ngrams are a little something for the wordsmiths.

Mira said...

Wait a minute. It's Tuesday, and Nathan is posting?!!

Okay, now you're just messing with my head, Mr. Bransford.

But I'll take it!! Yay! :)

So, Ngrams. This is the most awesome thing I've ever seen in my entire life. How do they think of these things? Obviously I'm going to have to quit my job, because I need to devote my life to playing with this new internet toy 24-7.

Speaking of which, I'm off to the Ngram site. Thanks for the early Christmas present, Nathan. :)

David said...

It's hard to foresee the impact of this on various kinds of cultural and historical studies, but I bet it will be immense.

As more and more books are published originally in digital form, the power of this technique will grow.

Anonymous said...

At first I thought, too much info.

But, if you've ever had a dispute with an editor over a word or phrase, this sort of thing can come in handy.

Sarah said...

Vampire vs. Vampyre:

http://ngrams.googlelabs.com/graph?content=vampire%2C+vampyre&year_start=1800&year_end=2000&corpus=0&smoothing=3

Sierra McConnell said...

The Nephilim are on the rise...

http://ngrams.googlelabs.com/graph?content=Nephilim&year_start=1800&year_end=2000&corpus=0&smoothing=3

>:3

The Red Angel said...

Haha, this is awesome! And very interesting. I love the vampires vs. werewolves one...

~TRA

http://xtheredangelx.blogspot.com

JES said...

Here's violin vs. guitar, 1920-2000. I was kinda surprised to see they didn't cross until almost 1990!

Heidi said...

Elves seem to be a popular 19th century topic, above fairies and unicorns: http://tinyurl.com/3ychel3

and imps spiked in 1860 way above trolls and goblins: http://tinyurl.com/2c2hqjp

Who knew?

Ghenet said...

This is cool! Thanks for sharing. :)

swampfox said...

I still say vampires are supposed to DIE in the sunlight!

Even Anne Rice had that right!

Alan Jones said...

Life is getting easier http://ngrams.googlelabs.com/graph?content=laborious&year_start=1800&year_end=2000&corpus=0&smoothing=3

Anonymous said...

Although improved from earlier attempts by Google to do this, the Ngrams database is still considered too statistically flawed to use for serious research. For instance, even though the Ngram database includes sources as far back as 1500, anything earlier than 1800 is considered unreliable as there were too few books to provide enough statistical power. In The New York Times: Five-Million-Book Google Database Gets a Workout - and Debate - in Its First Days. Ngrams might be a fun distraction, but serious writers and scholars are warning against using it for any more than that.

Anonymous said...

Oh, goody! Ngrams provides another source of flawed information in the modern world. But who cares, right? Who wants to be an informed writer or reader? Distractions are so much more fun ... and so much easier than doing rigorous research!

Philip Isles said...

Currently doing a writeup for my blog about combining Wordle and Ngrams. Basically: pump your manuscript through Wordle, then do an Ngrams search for your most commonly reoccurring words: http://ngrams.googlelabs.com/graph?content=back%2Cturn%2Clook%2Clooked%2Caround%2Clike%2Cthought%2Ceven%2Cknow%2Cjust%2Csomething%2Csee&year_start=1930&year_end=2000&corpus=4&smoothing=0. You can now assess whether your reoccurring words naturally occur more than others, and therefore zero in on which words you really are using too much, as opposed to those which just naturally reoccur more than others in the English language.
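For anyone who wants to try the idea above without Wordle, here is a rough Python sketch. It is only an illustration under assumptions: the stop-word list and the function names `top_words` and `ngram_url` are made up for this example, and the URL simply copies the shape of the links posted in this thread (the same `content`, `year_start`, `year_end`, `corpus`, and `smoothing` parameters), which may not match how the tool works today.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Hypothetical stop list for illustration; extend it for a real manuscript.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was", "he", "she", "i"}


def top_words(text, n=12, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return [word for word, _ in counts.most_common(n)]


def ngram_url(words, year_start=1930, year_end=2000, corpus=4, smoothing=0):
    """Build a graph URL shaped like the links shared in this comment thread."""
    query = urlencode({
        "content": ",".join(words),  # comma-separated terms, one line per term
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    })
    return "http://ngrams.googlelabs.com/graph?" + query


if __name__ == "__main__":
    sample = "She looked back. He looked back and looked around."
    frequent = top_words(sample, n=3)
    print(frequent)
    print(ngram_url(frequent))
```

Swap the sample string for the text of your manuscript, then paste the printed link into a browser and compare your favorite words against how often they turn up in books generally.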

Nathan Bransford said...

anon@2:11pm-

Ha - better to spend one's time writing snarky comments, I assume?

Doth not a dull blade still cut if one isn't concerned about exactness?

Mira said...

Anon 2:11: "A little nonsense, now and then, is relished by the wisest men" Roald Dahl.

So, did you know you could do just one word? Here's the graph for 'awesome':

http://ngrams.googlelabs.com/graph?content=awesome&year_start=1800&year_end=2000&corpus=0&smoothing=3

That's... awesome. :)

Mira said...

Nathan - ha! we're posting in sync. :)

Anonymous said...

Nathan,

Believe it or not, some people in the publishing world are still interested in correct and valid information. Some people feel that's important. If that's snarky, so be it.

Nathan Bransford said...

anon-

It was a matter of style rather than content. But I was snarky back so my leg is not much to stand on. All in good fun.

Anonymous said...

Mira, that's fine, if it's promoted as nonsense, rather than knowledge!

Mira said...

Anon - I think it's being promoted as an awesome 'item of interest.' Which it is.

Although, down the road, it will, most likely, eventually be fiddled with until it's an amazing and accurate research tool. The possibilities are exciting.

Well, I'm off, so I'll just leave you with this Christmas Quote:

"God Bless us, everyone"

- Tiny Tim

Have a happy.

Anonymous said...

Nathan, LOL. Yes, snarky comment back to snarky comment does not undo snark.

Between the drivel on the news and all the incorrect information on the Internet, I seriously feel that snarky intelligent commenting is important. You used to have a lot of intensely intellectual commenters here, with incredibly in-depth discussions. I guess that's not the type of discussion that's welcomed here anymore. I apologize for not realizing that sooner. Enjoy the Ngrams. Have a great holiday!

- Anon @2:11 PM

Nathan Bransford said...

anon-

All are welcome, I'm not upset or anything and didn't delete your comment. But I humbly suggest if deep intelligent discussion is what you're after I'd consider leading by example on that. Hope you have a great holiday as well.

Anonymous said...

Mira, that's not how Google announced its N-gram program: here. They announced it as a serious research tool. Enough said. Obviously, the discussion here was meant to be silly, not to discuss the actual N-gram program. My mistake. I apologize. Have a good holiday!

Anonymous said...

It turns out the Beatles never held a candle to Jesus.

Kristin Laughtin said...

Whoa, that sounds like it could be addicting, but awesomely fun. I'm surprised by potato chips vs. french fries!

J. T. Shea said...

Stop gloating, you bloodsuckers and rotters! We werewolves know there are lies, damned lies, and statistics. We'll be back. Just like telegraphs and papa and Hemingway (also called 'Papa') and baked potatoes. Just you wait for the next full moon.

Mira, my only regret is that I have but one life to give for the Internet. We'll have to get ourselves cloned, like Nathan. (You don't really believe there's just one of him, do you?)

Philip Isles, what's wrong with using a word more than it 'naturally' reoccurs in the English language? Why write anything if you just want to write the same as anybody else?

Anonymous 2:11 pm, you're right, distractions ARE so much more fun! What do you think we're all doing here? On the blog of someone whose forums have a whole section devoted to distractions? BTW, those intensely intellectual commenters were all eaten by werewolves, with baked potatoes, fava beans, and a nice Chianti.

Nathan, yes, a dull blade doth indeed still cut if one isn't concerned about exactness, but a spoon would be even better, as the Sheriff of Nottingham once said. Ok, it was really Alan Rickman pretending to be the Sheriff of Nottingham in Kevin Costner's Robin Hood movie. My research isn't very exact.

dana said...

This has nothing to do with an engram.

What does a person do when they have two good queries...but "thinks" one is better - and everyone else says "NO!" What criteria do I use to separate the wheat from the chaff?

I'm doubting myself now.

Laura said...

could anyone explain the use of 'internet' around 1900?

Anonymous said...

Man vs. Woman
http://ngrams.googlelabs.com/graph?content=woman%2C+man&year_start=1800&year_end=2000&corpus=0&smoothing=3

Nathan Bransford said...

Laura-

Google explains here

Kristi Helvig said...

I'm a little disappointed to find there is no data for space monkeys in Ngram. Other than that, it's pretty fun. :)

Anonymous said...

Oh for cripes' sake, mass culture proving mass culture is a runaway supertanker? Another duh-huh moment in the annals of human accomplishment.

Watcher55 said...

MS Word corrected me for using mankind instead of humanity. Interesting result; mankind starts out way ahead then they run neck and neck from about the turn of the century then switch around 1980. Social evolution and the advent of PC perhaps?

Lauren said...

Did usa vs. america. LARGE spike of USA somewhere in the '80s. Wish Google went past 2000, it'd be interesting to see if it went back up again after 2001.

Lauren said...

Oh, silly me, it does! And nope, no spike again. Curiouser and curiouser.

Anonymous said...

Ngrams is too much fun. “OMG” shows a huge spike (maybe due to rising popularity of YA and MG texts?), “heart” and “soul” are taking a parallel dive, and “Merry Christmas,” though showing a brief drop recently, is rising again and has consistently reigned over “Happy Holidays.”

~Merry Christmas and Happy Holidays to All~

Teralyn Rose Pilgrim said...

I've argued with many English colleagues over whether "irregardless" is a word (or should be). When I compared it to "regardless," the word "irregardless" didn't even show up. It's official: no one says it, it doesn't make sense, it is not a word. That was fun.

Jordan Summers said...

I'm genuinely surprised that zombies have surpassed werewolves. Scratches head.

Peter Dudley said...

This is definitely fun and has some legitimate uses. But before you use it for serious research, make sure you know what you're getting.

For example: My understanding is that each text was included once, and that there was no weighting for popularity. Thus, all the words in Shogun would be counted equally to all the words in OJ Simpson: Football's Record Rusher (both published in 1975).

Similarly, take the Harry Potter series. I know a ton of Americans who have started using the term "snogging" in the past few years. Presumably, the data in ngram will work itself out over time as "snogging" appears in more books in the future. But it's hard to say that the data are really clean when Harry Potter and a book that sells 300 copies are considered equal, from a cultural phenomenon standpoint. At the very least, we should step back and say the data are good only for big-picture trending.

Another example: Compare basketball, baseball, and football. Now throw in soccer. Oops! Much of the world refers to soccer as football. Can't be sure exactly what we're looking at.

Still, it's a lot of fun and in some ways instructional. And a great way to avoid working on that WIP.

David said...

Those are excellent points, Peter.

Dave said...

Here's a Facebook page we made to share interesting ngrams:

http://www.facebook.com/nteresting.ngrams

There's some really cool ones there - check them out!

Philip Isles said...

Combining Wordle.net with N-Grams, you can figure out which words in your manuscripts need to be cut, and which are common enough in the rest of historical literature to ignore.

The Writeup is here: http://philipisles.blogspot.com/2011/01/combining-ngrams-and-worldle.html
