The Python conference PyCon 2014 was held recently and the videos for the conference are online.
I have been working my way through the interesting machine learning talks and will share a few of them here over the coming weeks.
A great talk if you are starting out in data science or machine learning in Python was given by Melanie Warrick, titled "How to Get Started with Machine Learning". It’s about 25 minutes long. The abstract of the talk is:
Provide an introduction to machine learning to clarify what it is, what it’s not and how it fits into this picture of all the hot topics around data analytics and big data.
Computers…ability to learn without… explicit programming
She positions machine learning as the toolkit used in Artificial Intelligence and Data Science. Relatedly, she describes big data as data beyond the ability of common technology to capture and curate. This definition sits well with me. Although the talk is an introduction to machine learning, the focus is on the application of machine learning in data science.
Melanie describes the four main data science roles as data lead, data creative, data developer and data researcher and uses a graph to indicate the amount of machine learning performed by each role. She also describes a data science project workflow.
She provides a cute example of linear regression on a 2D dataset (head size vs brain weight) using scikit-learn. Usefully, she summarizes Python tools in categories:
- Explore data: pandas, statsmodels, matplotlib, numpy, unix
- Build model: scikit-learn, numpy, pandas, scipy
- Test model: scikit-learn, matplotlib
- Data products: API, Flask, Django
- Visualize: D3, matplotlib, vincent and vega, ggplot
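The explore, build and test categories above can be illustrated with a small sketch in the spirit of the head size vs brain weight example. This is not the code from the talk: the data below is synthetic, generated from a made-up linear relationship plus noise, and the variable names and the train/test split are my own assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the dataset used in the talk): head size and
# brain weight generated from an invented linear relation plus random noise.
rng = np.random.default_rng(0)
head_size = rng.uniform(2700, 4700, size=200)
brain_weight = 325.0 + 0.26 * head_size + rng.normal(0, 70, size=200)

# scikit-learn expects a 2D feature matrix of shape (n_samples, n_features).
X = head_size.reshape(-1, 1)
y = brain_weight

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Build model: ordinary least squares fit of a straight line.
model = LinearRegression()
model.fit(X_train, y_train)

# Test model: coefficient of determination (R^2) on the held-out rows.
r2 = model.score(X_test, y_test)
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope={slope:.3f} intercept={intercept:.1f} R^2={r2:.3f}")
```

With real data, the two synthetic arrays would be replaced by columns loaded with pandas, and a matplotlib scatter plot of the points with the fitted line would cover the visualize step.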
There is also a question at the end about contrasting Python and R, and she makes the apt comment that it pays to stick with one language (i.e. Python) so you don’t need to switch languages between research and production.
The talk is on YouTube and on the pyvideo archive. You can review the slides from the talk and download the sample code from GitHub. Melanie maintains a blog at nyghtowl.io and you can review the post on her talk here.