Friday, February 14, 2014

IT'S A SNAKE

I have no insight or wisdom for you today. But I have something better: statistics!

I've examined the content of all posts to The Listserve since April 2012 - thanks to Simon Weber for the archive! I'm not sure whether April 2012 is when it began - if anyone knows how old The Listserve is, or who is behind it, I'd love to know. Anyway, I spent a couple of hours hacking on the dataset with Python, so let me hit you with some SCIENCE FACTS:

First, the average Listserve post has 384 words, and the median has 356. The longest of all time was a 1,950-word rant called "Volatile Software" in April 2012. The shortest of all time was a 2-word post in August 2013, which simply said "Be kind."
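
Here is a minimal sketch of how figures like these could be computed. It assumes each post is available as a plain string and that "words" means whitespace-separated tokens. The function name and the sample posts are hypothetical, and this is not necessarily how the numbers above were produced.

```python
import statistics

def word_count_stats(posts):
    """Summarize word counts for a list of post bodies (plain strings)."""
    # Assumption: a "word" is any whitespace-separated token.
    counts = [len(body.split()) for body in posts]
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "longest": max(counts),
        "shortest": min(counts),
    }

# Hypothetical sample data, not the real archive.
sample_posts = [
    "Be kind.",
    "I have no insight or wisdom for you today.",
    "one two three",
]
stats = word_count_stats(sample_posts)
```

The median is less sensitive than the mean to a handful of very long posts, which is one plausible reason the two figures differ.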

Here's a list of the top 10 words appearing in the Listserve, and the number of times they've appeared:
the: 9215
to: 7654
i: 6820
and: 6742
a: 6208
of: 5041
in: 3720
you: 3623
that: 3076
it: 2795
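
A word tally like the one above can be sketched with `collections.Counter`. The tokenizing rule here (lowercase everything, keep letters and an optional apostrophe part) is an assumption, and `top_words` is a hypothetical helper name; a different rule would give somewhat different counts.

```python
import re
from collections import Counter

# Assumption: a word is a run of letters, optionally followed by an
# apostrophe part, so "I'd" is counted as the single word "i'd".
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")

def top_words(posts, n=10):
    """Return the n most common lowercase words across all posts."""
    counts = Counter()
    for body in posts:
        counts.update(WORD_PATTERN.findall(body.lower()))
    return counts.most_common(n)
```

For example, `top_words(["The cat and the dog", "the end"], n=1)` counts "the" three times.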

Well, that's not very interesting. Let's filter it to words that are 4 letters or longer and I'll pick out the top nouns:
life: 874
people: 788
time: 709
love: 623
there: 591
things: 540
something: 432
world: 400
good: 378
years: 366
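
The length filter is simple to sketch on top of an existing word tally. Picking out nouns is a separate step that this sketch does not attempt; it only drops short words. The helper name is hypothetical, and the sample counts are copied from the lists above.

```python
from collections import Counter

def top_long_words(word_counts, min_length=4, n=10):
    """Keep only words with at least min_length letters, then rank by count."""
    long_words = Counter(
        {word: count for word, count in word_counts.items() if len(word) >= min_length}
    )
    return long_words.most_common(n)

# A few counts taken from the lists above, for illustration only.
sample_counts = {"the": 9215, "you": 3623, "life": 874, "people": 788}
filtered = top_long_words(sample_counts, n=2)
```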

Next I tried to parse the top locations out of the end of each email. Many will have been missed because of formatting (emails that put the location somewhere other than the end, or just formatted it differently). Here are the top 30 locations:
New York: 24
San Francisco: 20
London: 14
Chicago: 12
Los Angeles: 11
Portland: 10
Washington: 9
Brooklyn: 9
Toronto: 8
Seattle: 6
Boston: 6
Sydney: 5
California: 5
New Jersey: 4
Auckland: 4
New York City: 4
Cambridge: 4
Philadelphia: 4
Canada: 3
Minneapolis: 3
New Orleans: 3
Pennsylvania: 3
Stockholm: 3
Austin: 3
Montreal: 3
USA: 3
Vancouver: 3
Baltimore: 3
United Kingdom: 3
NYC: 3
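
One way such a parse might work is sketched below. It assumes the location is the last non-empty line of the email, written like "City, Region", and takes the part before the first comma. The helper names and the rejection rules are hypothetical guesses, not the actual parser. Because nothing is normalized, spellings such as "New York", "New York City", and "NYC" would be counted separately, as in the list above.

```python
from collections import Counter

def guess_city(body):
    """Guess a location from the last non-empty line of an email body."""
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    if not lines:
        return None
    last = lines[-1]
    # Assumption: lines with an "@" or many words are not locations.
    if "@" in last or len(last.split()) > 5:
        return None
    return last.split(",")[0].strip()

def top_locations(posts, n=30):
    """Count guessed locations across posts, skipping posts with no guess."""
    guesses = (guess_city(body) for body in posts)
    return Counter(city for city in guesses if city).most_common(n)
```

For an email ending in "Seattle, WA", `guess_city` returns "Seattle". An email that ends with a sentence or an address yields no guess at all, which matches the caveat about missed locations.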

Thanks to everyone who has written to the Listserve - I love reading your stories and [The Listserve] emails are the only ones I look forward to checking every day. Now, here's a random selection of words which have been used only once on The Listserve:
antidisanthropomorphizationism
ohgeezohgeezohcrapohgeez
antidisestablishmentarianism
evenmorefuckedupistan
inconsequentialities
electromagneticism
conglomerations
synergistically
regurgitations
interstitial
ponchartrain
scherpenzeel
chunks
duluth
weiner
perked
genome
sketch
trendy
quinoa
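
Finding words that appear exactly once is another small tally exercise. This sketch assumes words are lowercase runs of letters and uses hypothetical helper names; the optional seed is only there to make the random pick repeatable.

```python
import random
import re
from collections import Counter

def words_used_once(posts):
    """Return, in alphabetical order, every word that occurs exactly once."""
    counts = Counter()
    for body in posts:
        # Assumption: a word is a run of letters, compared case-insensitively.
        counts.update(re.findall(r"[a-z]+", body.lower()))
    return sorted(word for word, count in counts.items() if count == 1)

def random_selection(words, k=20, seed=None):
    """Pick up to k distinct words at random."""
    rng = random.Random(seed)
    return rng.sample(words, min(k, len(words)))
```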

If you'd like to see more of the results or the code I used, you can find them on my GitHub profile if you search for my name.

And now I will end with the most common 5-word phrases, which apply to this email as well:
I would love to hear: 18
I'd love to hear from: 13
love to hear from you: 11
I have a lot of: 7
I'd love to hear about: 7
would love to hear from: 7
[...]
best water skier in Luxembourg: 3
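
Phrase counts like these can be sketched by sliding a 5-word window over each post. The assumptions here are whitespace tokenization, case-sensitive matching, and windows that never cross from one post into the next; `top_phrases` is a hypothetical name. Because the windows overlap, one sentence can feed several entries, which is consistent with "I'd love to hear from" and "love to hear from you" both appearing above.

```python
from collections import Counter

def top_phrases(posts, size=5, n=6):
    """Return the n most common runs of `size` consecutive words."""
    counts = Counter()
    for body in posts:
        words = body.split()
        # Slide a window of `size` words across this post only.
        for start in range(len(words) - size + 1):
            counts[" ".join(words[start:start + size])] += 1
    return counts.most_common(n)
```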


Love,

Rob
roblourens[AT]gmail.com
Seattle, WA
