Xamarin is Zipfian
Michael, from the popular YouTube channel VSauce, recently published a video titled The Zipf Mystery, wherein he talks about several aspects of Zipf's Law:
Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.
It was a pretty interesting video, and I realized I have access to a large repository of words in the form of Xamarin's Developer Portal. So I wondered, will the contents of Xamarin's documentation [1] adhere to Zipf's Law?
As I had recently been learning the R programming language, I figured a bit of analysis would be good practice. To start, I knew that I would need a way to tokenize each piece of content into individual words. The following function simply takes a string and breaks it into a vector of individual words [2]:
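tokenize <- function(str) {
  # Lowercase, then replace everything except letters, digits, and apostrophes with a space
  str <- gsub("[^[:alnum:]'’]", " ", tolower(str))
  # Split on spaces; the empty strings this leaves behind are filtered out in getwords
  strsplit(str, " ")
}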
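To make the behaviour concrete, here is a small sketch that runs the same logic on a made-up sentence. The sample string is purely illustrative, and the curly apostrophe is written as the escape "\u2019" only to keep the snippet ASCII-safe. It shows why a later step has to filter out empty strings: every run of punctuation and spaces becomes several separators in a row.

```r
# Same logic as the tokenize function above; "\u2019" is the curly apostrophe
tokenize <- function(str) {
  str <- gsub("[^[:alnum:]'\u2019]", " ", tolower(str))
  strsplit(str, " ")
}

# strsplit returns a list with one element per input string, so take the first
tokens <- tokenize("Hello, World! It's R.")[[1]]
print(tokens)

# Drop the empty strings left behind by consecutive separators
words <- tokens[tokens != ""]
print(words)
```

Note that the apostrophe in "it's" survives, since it is one of the characters the pattern deliberately keeps.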
Next is a function that would tokenize an entire file:
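getwords <- function(path) {
  lines <- readLines(path)
  # tokenize returns a list for each line; flatten everything into one character vector
  tokenized <- sapply(lines, tokenize)
  unlisted <- unlist(tokenized, recursive = TRUE)
  # Drop the empty strings produced by consecutive separators
  words <- unlisted[unlisted != ""]
  return(words)
}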
With those two functions in place, we get a list of all the index.md Markdown files under a given directory, apply the getwords function to each one, and use the table and sort functions to count how often each word occurs and order the counts from most to least frequent.
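# Find every file whose name matches "index.md" under the documentation directory
files <- list.files("~/dev/xamarin/documentation", "index.md", recursive = TRUE, full.names = TRUE)
allwords <- unlist(sapply(files, getwords), recursive = TRUE)
# Count occurrences of each word, then sort from most to least frequent
wordcounts <- table(allwords)
tsorted <- sort(wordcounts, decreasing = TRUE)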
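For anyone unfamiliar with table and sort, here is a tiny sketch of the counting step on a toy vector. The six words are invented for illustration and are not taken from the documentation.

```r
# Toy input standing in for the real allwords vector (hypothetical data)
words <- c("the", "class", "the", "a", "the", "a")

# table counts each distinct word; sort orders the counts from high to low
counts <- sort(table(words), decreasing = TRUE)
print(counts)

# The names of the sorted table are the words in rank order
ranked <- names(counts)
print(ranked)
```

The position of a word in the sorted table is its rank, which is what Zipf's Law relates to its frequency.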
Lastly, we produce a log-log plot of the data (both axes on a logarithmic scale):
plot(tsorted, log = "xy")
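A plot is judged by eye, so one rough way to put a number on it is to fit a straight line in log-log space: under an ideal Zipf distribution the slope would be close to -1. The sketch below is not part of the original analysis, and it uses synthetic, perfectly Zipfian counts as a stand-in for the real table. A least-squares fit like this is only a quick sanity check, not a rigorous power-law test.

```r
# Synthetic stand-in for the real counts: an ideal Zipf curve (assumption, not the post's data)
rank <- 1:1000
freq <- 65009 / rank

# Fit a straight line to log(frequency) against log(rank)
fit <- lm(log(freq) ~ log(rank))

# The slope estimates the negative of the power-law exponent
slope <- unname(coef(fit)[2])
print(slope)

# With the real table, one could instead use:
#   rank <- seq_along(tsorted); freq <- as.numeric(tsorted)
```

On real word counts the slope will not be exactly -1, and the long tail of words that occur only once or twice tends to pull the fit around, so the result is best read as a ballpark figure.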

The word usage in our documentation (as of this post) follows the power-law distribution described in Zipf's Law remarkably well! OK, so what else could I pull from this data? Well, according to VSauce, the five most commonly used English words are "the", "of", "and", "to", and "a". And in our corpus?
> head(tsorted)
allwords
  the     a    to   and    in    of
65009 30381 29563 18613 16440 12840
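As a quick check of these six counts against the law as quoted earlier, the sketch below divides the top count by each rank and compares the result with what was observed. The counts are copied from the output above. Treating the exponent as exactly 1 is the idealized form of the law, so this is a rough comparison rather than a proper fit.

```r
# Observed counts copied from the head(tsorted) output above
observed <- setNames(
  c(65009, 30381, 29563, 18613, 16440, 12840),
  c("the", "a", "to", "and", "in", "of")
)
rank <- seq_along(observed)

# Ideal Zipf prediction: the count at rank r is the top count divided by r
expected <- observed[[1]] / rank

comparison <- data.frame(
  word = names(observed),
  rank = rank,
  observed = unname(observed),
  expected = round(expected),
  ratio = round(unname(observed) / expected, 2)
)
print(comparison)
```

A ratio near 1 means the word sits close to the ideal curve. The second-ranked word comes in slightly under the prediction and the next few somewhat over it.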
Stripping out all the "normal" words like "the" and "and" to find the first domain-specific word gives us a bit of a conundrum. Starting at rank #8, we have the following three words:
  for class xamarin
10427 10216   10079
"for" doesn't appear in the top 20 words as listed by VSauce, so either it ranks this highly because it is also a programming keyword (e.g. for loops), which would make it domain-specific in its own right, or the first domain-specific word is class, followed closely by Xamarin :)
Anyway, this was just a bit of fun. If you have any other ideas for insights that could be gleaned from this data, feel free to let me know on Twitter!
[1]: Technically, only conceptual documentation, recipes, and some release notes (some older ones are still in HTML format). Samples and API documentation are not included, for simplicity's sake.
[2]: Yes, this tokenize function could probably stand to be a bit more sophisticated. But some spot-testing showed that it was roughly good enough for this light analysis :)