Last Updated on June 7, 2016
Data Origami is a new website by Cameron Davidson-Pilon that provides data science screencasts. It is a cool idea and a cool site.
Cameron was kind enough to give me access to the site so that I could review it. I watched all of the videos I could and wrote up all my notes, and in this post you will get a sneak peek into Cameron’s new site Data Origami.
Data Origami is a simple idea. It provides screencasts on topics relevant to a data scientist.
Each screencast is 9-13 minutes long and covers a narrow, specific topic. All screencasts use Python and are presented in an IPython notebook that combines text, mathematical equations, code and plots. The notebooks are available for download, as are the videos themselves (for desktop and mobile), along with links to further resources and relevant datasets.
At the time of writing it is a paid service at $9 a month for access to all of the screencasts, although there is one screencast available for free.
The videos assume you know how to program (Python) and that you know statistics.
The site is clean and has a Heroku feeling to it (maybe it’s the purple and the line drawings). The videos are large and good quality and the screens are not cluttered with distractions.
Who is Cameron?
If you’re looking for indicators of authority in the domain, Cameron has them.
Cam works on Data Analytics at Shopify. He’s crunching data for a big company, 9-5.
Cameron is the author of the self-published technical book Bayesian Methods for Hackers, which introduces Bayesian methods using Python. It is all available on GitHub (and viewable through the nbviewer IPython notebook viewer) and has been popular on technical news sites such as Hacker News and Reddit (multiple times, social proof++).
Finally, Cameron is the author of lifelines, a Python package that supports survival analysis.
Both the topics of Bayesian Methods and Survival Analysis feature in his screencasts on Data Origami.
Data Science Screencasts
I slammed through all 7 screencasts and took notes. I want to respect Cam and his resource, so here is just a summary of the videos currently available:
- Bayesian Beta-Binomial Model: More maths than the others, focuses on introducing the Beta distribution and using it to model posterior distributions.
- Intro to PCA: What Principal Component Analysis is, what it is trying to achieve and what the results mean.
- Visualizing PCA’s information loss: Clever way to demonstrate this reversible projection method.
- Sorting Colours using PCA (the free one): A clever way to demonstrate a useful application of PCA.
- A/B testing conversion rates: A quantified approach to communicating uncertainty in the context of A/B test results. A must watch!
- Why should I be interested in Survival Analysis? Setting the scene for Survival Analysis.
- Estimating the Survival Function: Using the Kaplan-Meier estimator to model the survival function for a clever example problem.
Note that I used “clever” a few times. His examples are very well thought out, very cool.
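To give a flavour of the Beta-Binomial and A/B testing topics in the list above, here is a minimal sketch of my own. It is not Cameron’s code from the screencasts. It assumes a uniform Beta(1, 1) prior on each conversion rate, and the visitor and conversion counts are made up purely for illustration; the function name `prob_b_beats_a` is hypothetical.

```python
import numpy as np


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, n_samples=100_000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling from Beta posteriors.

    Assumes a uniform Beta(1, 1) prior on each conversion rate, so the
    posterior after observing `conv` conversions in `n` visitors is
    Beta(1 + conv, 1 + n - conv).
    """
    rng = np.random.default_rng(seed)
    post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=n_samples)
    post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=n_samples)
    # Fraction of posterior draws in which B's rate exceeds A's
    return float(np.mean(post_b > post_a))


# Made-up counts: 40/1000 conversions for A, 60/1000 for B
p = prob_b_beats_a(40, 1000, 60, 1000)
print(f"P(B converts better than A) is roughly {p:.3f}")
```

Reporting a probability that B beats A, rather than a bare yes or no, is one way to communicate the uncertainty in an A/B test result.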
UPDATE: There is a new screencast that appeared since I wrote the review.
Cameron knows his stuff. I found the PCA videos less interesting personally, either because I was familiar with the content or perhaps the delivery was less polished. Diving into Bayesian uncertainty and survival analysis was awesome.
Cameron’s the boss of Bayesian. He could easily divide his book up into 10-minute chunks and I would eat it all up (hint, hint).
The videos seem to be hosted on Amazon S3, but I suffered some lag while watching. It is very possible it was the time of day I decided to watch the videos, but it was annoying at the time. Not a big deal: I could have just downloaded them and watched, and I’m sure Cam will sort this out as he grows.
He is still finding his feet in terms of format. The more recent videos are a lot more polished than the early ones and a great sign of what is to come. Personally, I’d really like more “this is what we’re going to do” at the start and “this is what we did” at the end. I have to be highly caffeinated to absorb one of these videos on a first watch, even with rapid note taking. Having the screencast remind me of what we covered would be cool.
I may be somewhat of a power user. I watch all YouTube videos at 2x and take lots of notes. It would be cool if the built-in player had a 2x feature and if the account supported note taking or comments. Not a big deal, just power user features that might increment happiness.
Once he gets a lot more content in there, I can imagine checkboxes for “I’ve watched this” and even bundling of videos into content-streams.
There does not appear to be a roadmap for content at this time, really just whatever takes Cam’s fancy. This is good, in that he is passionate about whatever he’s sharing, but bad initially because we have to snap to his interests. There’s no hand holding.
Cam notes that he is releasing 2 per month, so growth of the library is bounded. This might curb the kind of burn-out that hit Ryan Bates of RailsCasts, but it is only 24 per year. I power-slammed all 7 videos in one night. I expect some appetites may not be sated.
Finally, the content is pro. Some screencasts are tagged as beginners. They’re not. You will want to know your way around data and some algorithms before diving in. If you’re still deciding what tool or library to use to run your first classifier on the iris dataset, this resource is not for you.
This is a great resource with all the signs of being a must-have, with time.
- It’s created by a real pro, a Bayesian boss.
- It’s too cheap (raise your prices, consider offering a year/lifetime pass for a few hundred/thousand bucks).
- It is really for intermediate level (or higher) practitioners, say peers of Cameron or close to it.
- It has fewer than a dozen videos so far, but will be added to monthly.
- It does not have a “follow me from a to b” roadmap, but he’s providing peeks at upcoming ‘casts.
If data is your day job, check out Data Origami and get in early to support Cameron and his vision for amazing world-class data science screencasts.
I agree the Data Origami screencasts are excellent. The sample convinced me and I’m a paying subscriber even though most of the material is familiar. I disagree that the price is too low. At $4.50 per 10-minute video, it is not inexpensive, IMHO, compared to, say, Murphy’s “Machine Learning.”