I have been on the lookout for opportunities to learn and develop as many skills as possible lately. When I came across the Data Visualization hack-night conducted by @HasGeek, it seemed like the perfect opportunity. I could meet great people and learn some useful skills along the way as well.
The hack-night was to be preceded by workshops on D3.js, R and Pandas (the last by @sanand0, which I was especially looking forward to after his amazing session at PyCon 2012). I wanted to learn R to get a feel for statistical analysis, so I learnt a bit of it on my own.
I went to the venue a bit early, and I am really thankful for that, because I got the opportunity not only to meet the organizers, Haris and Zainab, but also to make a great friend in Mr. Govind Kanshi. In spite of a slow start, I was blown away by the visualizations built with D3. Though the session on R was a bit of a bore, I was hell-bent on using it later during the night.
But the high point of the workshops, as expected, was Anand's session on Pandas. The simplicity of his approach was almost breathtaking. Though Pandas was limited in its documentation, its power, as Anand demonstrated it, ensured that we stuck with it for most of our analysis.
Let the hacking begin…
We moved to the cosy hack-space around 5 PM. After a brief round of introductions, I teamed up with Ashwin, Niket, Kartik, and Indraneel to work on the Million Song Dataset, and we spent a while exploring the data with Pandas and R.
Unfortunately, given the huge size of the dataset and the not-so-amazing configurations of our laptops, we decided to work with only a few of the columns across the whole dataset rather than with a portion of the songs. So we went about saying:
We would love to work on danceability, but hotness will have to do for now.
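For anyone curious what "working on only a few columns" can look like in Pandas, here is a minimal sketch. It assumes the data has already been exported to CSV (the Million Song Dataset itself ships as HDF5 files), and the inline rows and column names are placeholders for illustration, not what we actually loaded that night.

```python
import io

import pandas as pd

# Placeholder CSV standing in for a hypothetical flattened export of the
# Million Song Dataset; the rows and column names are assumptions.
RAW_CSV = io.StringIO(
    "track_id,title,year,duration,song_hotttnesss,danceability,loudness\n"
    "T1,Song A,1962,150.0,0.4,0.0,-10.2\n"
    "T2,Song B,1968,260.0,0.5,0.0,-9.8\n"
)

# Only these columns are parsed into memory; every other column is skipped,
# which is what keeps the footprint small on a modest laptop.
COLUMNS = ["track_id", "year", "duration", "song_hotttnesss"]

songs = pd.read_csv(RAW_CSV, usecols=COLUMNS)
print(songs.columns.tolist())
```

The `usecols` argument trades breadth for memory: every row is still read, but each row carries only the fields the analysis needs.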
A preliminary analysis showed song hotness to be more or less the same throughout, so we moved on to song duration. That led us to discover that the average song duration increased by over a minute in the mid-1960s, which coincided bang-on with the arrival of Pink Floyd on the scene; hence the name, the Pink Floyd Effect.
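The duration trend boils down to a group-by over release year. The sketch below is an illustration under stated assumptions, not our exact notebook: it assumes a `year` column and a `duration` column in seconds, treats a year of 0 as "unknown" and drops it, and uses a handful of made-up rows.

```python
import pandas as pd

# Made-up rows for illustration only; real values would come from the
# year and duration (seconds) columns of the Million Song Dataset.
songs = pd.DataFrame(
    {
        "year": [1961, 1962, 0, 1967, 1968, 1968],
        "duration": [150.0, 170.0, 200.0, 230.0, 250.0, 270.0],
    }
)


def mean_duration_by_year(df: pd.DataFrame) -> pd.Series:
    """Average song duration in minutes for each known release year."""
    # Assumption: year == 0 marks an unknown release year, so exclude it.
    known = df[df["year"] > 0]
    # Group by year, average the durations, and convert seconds to minutes.
    return (known.groupby("year")["duration"].mean() / 60).round(2)


print(mean_duration_by_year(songs))
```

Plotting the resulting series against the year is then enough to spot a jump like the one we saw.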
As the night progressed, we started exploring D3 as well, and also moved on to a new dataset, this time on movies. Our initial analysis pointed to some obscure movies as the top-rated ones, so we decided that we needed a cut-off on the number of votes for a movie to be considered. But by this time all of us were pretty sleepy, so we ran across to the nearest beer store, bought some Red Bulls and continued our work.
We then re-validated that The Shawshank Redemption is indeed the top-rated movie with at least 20k votes. Hurrah! At least our analysis was not completely off. But most of the night went into filtering the dataset down with Pandas, which took longer given its lack of documentation.
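The vote cut-off is a boolean filter followed by a sort. Here is a minimal sketch, assuming one row per movie with hypothetical `title`, `rating` and `votes` columns; the movie names and numbers are invented to show why the cut-off matters, and only the 20k threshold comes from the post.

```python
import pandas as pd

# Invented rows: one obscure title with a perfect-looking score but very few
# votes, and two widely rated titles. Column names are assumptions.
movies = pd.DataFrame(
    {
        "title": ["Obscure Film", "Movie B", "Movie C"],
        "rating": [9.8, 9.2, 8.5],
        "votes": [12, 25_000, 40_000],
    }
)

MIN_VOTES = 20_000  # the 20k cut-off used that night


def top_rated(df: pd.DataFrame, min_votes: int = MIN_VOTES, n: int = 10) -> pd.DataFrame:
    """Return the n highest-rated movies with at least min_votes votes."""
    # Keep only movies with enough votes for the rating to be meaningful.
    eligible = df[df["votes"] >= min_votes]
    # Sort by rating, best first, and keep the top n.
    return eligible.sort_values("rating", ascending=False).head(n)


print(top_rated(movies)["title"].tolist())
```

With `min_votes=0` the obscure title floats to the top, which is exactly the effect the cut-off is meant to suppress.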
After a fun and educational but exhausting night, we gave our demo and took our leave, with amazing experiences and great connections. I would love to be a part of many more hack-nights and conferences in the future, and kudos to the HasGeek team for a great event. Below are some of the other visualizations we came up with at the end of the night.