Last Updated on September 27, 2016
The Ladder Approach That You Can Use To Become a Machine Learning Consultant
Do you want to do machine learning and get paid for it?
Be careful what you wish for.
In this post, I outline a blueprint you can use to learn enough machine learning to help small businesses and start-ups with their general data needs.
It’s not easy. You will have to work hard outside your comfort zone. You will have to talk to real people in the real world!
The blueprint in this post will take you from a passionate interest in machine learning and the dedication to learn, to being capable and confident enough to work through the general data problems of a small-to-medium business or start-up and deliver a solution.
The blueprint for this path is as follows:
- Build a foundation
- Build a portfolio
- Deliver solutions
Given your background and interests, you can tailor the roadmap to your needs.
To be clear, we are only interested in applied machine learning. We care about theory and tools only insofar as they help you better understand your problem and achieve better results on it.
This is a counter-intuitive but very productive view. Learn what you need just-in-time and focus on delivering results. It is about reliably achieving good results, not perfection.
1. Build a Foundation
You need to learn enough applied machine learning to have the confidence to work a problem from start to finish: to define it accurately and to deliver the model or report the project requires as its outcome.
- Pick and learn a process. Learn a step-by-step process that takes you from problem definition through to delivering a result. Some examples include KDD, CRISP-DM, OSEMN, and others.
- Pick and learn a tool. Learn a tool or library that you can use to carry out your selected process. I recommend one of Weka, scikit-learn, or R, depending on your interests and preferences.
- Practice on small datasets. Download small datasets on which you can practice. Spend a lot of time on the UCI Machine Learning Repository.
You are ready to move on when you are confident and capable enough to pick an arbitrary in-memory problem and use your tool to work it from start to finish.
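To make "work a problem from start to finish" a little more concrete, here is a minimal sketch, in plain Python with only the standard library, of the core of such a workflow. It prepares a small in-memory dataset, measures a naive baseline, and checks whether a simple model beats that baseline under k-fold cross-validation. This is an illustration under stated assumptions and not a prescribed method. The synthetic dataset and every function name are hypothetical stand-ins. In practice you would load a real dataset, such as one from the UCI repository, and use your chosen tool (for example, scikit-learn) instead of hand-rolled code.

```python
import random
from collections import Counter


def make_toy_dataset(n=60, seed=1):
    """Hypothetical synthetic two-class dataset standing in for a small UCI-style file."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2                      # balanced classes: 0, 1, 0, 1, ...
        centre = 0.0 if label == 0 else 3.0
        x = [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]
        rows.append((x, label))
    rng.shuffle(rows)                      # shuffle so contiguous folds are mixed
    return rows


def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds; the last fold takes any remainder."""
    fold_size = n // k
    for f in range(k):
        start = f * fold_size
        stop = n if f == k - 1 else start + fold_size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test


def zero_rule(train, _x):
    """Baseline: always predict the most common class in the training rows."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def nearest_neighbour(train, x):
    """1-NN: predict the label of the closest training row (squared Euclidean distance)."""
    best = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return best[1]


def cross_val_accuracy(rows, predict, k=5):
    """Mean accuracy of `predict` across k folds."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(rows), k):
        train = [rows[i] for i in train_idx]
        correct = sum(predict(train, rows[i][0]) == rows[i][1] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    data = make_toy_dataset()
    print(f"zero-rule baseline:  {cross_val_accuracy(data, zero_rule):.3f}")
    print(f"1-nearest neighbour: {cross_val_accuracy(data, nearest_neighbour):.3f}")
```

Measuring a zero-rule baseline first is a habit worth keeping. It tells you whether a model has learned anything at all before you spend time tuning it.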
2. Build a Portfolio
Once you have a foundational ability to work problems, you need objective indicators that others can use to evaluate your capability. You need completed projects that demonstrate your ability to deliver.
You can do this by building a portfolio of completed machine learning projects.
Interlude on Mindset
Pause for a moment and take on the mindset of a manager or small business owner with a data problem.
As such a person, you hire programmers based on their ability to deliver results on projects at other companies and in open source. You hire marketers based on their ability to lift conversions and improve the bottom line. If such a manager needed a “data guy” to deliver a report or a model, what would they look at to judge whether a candidate could deliver a result?
In that position, I would want to see evidence of completed projects. More than that, I would want to see evidence of completed projects that are very close to the result I am looking for.
- Pick a theme. This is the type of project that you want to work on. A no-brainer would be reports on customer data (high-value customers, predictions of prospects that convert, etc.).
- Find open datasets. You need to locate datasets on or close to your theme that you can practice on. Look on competition websites like Kaggle and KDD Cup as a starting point. There are a lot of publicly accessible datasets these days that you can practice on!
- Complete projects. Treat each dataset like a project with a client and apply your process to it in order to deliver a result. This may require you to assume the role of the client and take an educated guess as to the outcome they are looking for (a model, a report on a specific question, etc.).
- Write up. Write up your findings as a semi-formal work product and host it publicly online.
This last point is key, and I will elaborate on it.
Ideally, script each part of your process so that you can re-execute it at any time as you find bugs or gain insight. Consider uploading all of your code and scripts for the project to a public GitHub account.
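The following Python script (standard library only) is a minimal sketch of what "scripted so you can re-execute it any time" might look like. It chains hypothetical load, clean, and analyse stages and writes a JSON report. Several details are assumptions made for illustration: the inline customer CSV, the stage names, treating a missing order count as zero, and the `results` output folder. Every stage is a plain function, and the only randomness uses a fixed seed. As a result, rerunning the script after you fix a bug rebuilds the same report from scratch.

```python
import csv
import io
import json
import random
from pathlib import Path

# Hypothetical raw customer data; a real project would read this from a file in the repo.
RAW_CSV = """customer_id,orders,total_spend
1,3,120.50
2,,80.00
3,7,410.25
4,1,15.00
"""


def load(text):
    """Stage 1: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Stage 2: convert types; a missing order count is treated as 0 (an assumption)."""
    return [
        {
            "customer_id": int(r["customer_id"]),
            "orders": int(r["orders"] or 0),
            "total_spend": float(r["total_spend"]),
        }
        for r in rows
    ]


def analyse(rows, seed=7):
    """Stage 3: summary figures plus a seeded spot-check sample, so reruns match."""
    rng = random.Random(seed)
    top = max(rows, key=lambda r: r["total_spend"])
    return {
        "n_customers": len(rows),
        "mean_spend": round(sum(r["total_spend"] for r in rows) / len(rows), 2),
        "top_customer": top["customer_id"],
        "spot_check_ids": sorted(rng.sample([r["customer_id"] for r in rows], 2)),
    }


def run_pipeline(out_dir="results"):
    """Run every stage end to end and (re)write the report; safe to re-run at any time."""
    report = analyse(clean(load(RAW_CSV)))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "report.json").write_text(json.dumps(report, indent=2))
    return report


if __name__ == "__main__":
    print(run_pipeline())
```

Committing a script like this next to its generated report lets anyone who visits the repository regenerate your results with a single command.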
Write up the result of each project as a technical report or a PowerPoint presentation. Consider recording a short video presenting your findings. Host the report on GitHub, your blog, or elsewhere. Add the project to your public LinkedIn profile.
Your goal is to have a place you can point someone to, where they can see all of the projects you have completed at a glance, dive into any one of them, and see what you did and what you delivered.
You are ready to move on when you can objectively convince someone that you are able to deliver results on your theme. I think 3-5 modest-sized completed projects would be reasonable.
Learn more about building a portfolio of machine learning projects in the post “Build a Machine Learning Portfolio: Complete Small Focused Projects and Demonstrate Your Skills”.
3. Deliver Solutions
Now that you have the capability to deliver and evidence of that fact, it is time to seek out projects in the wild for you to complete.
You are going to have to get out there and talk to people. This step will be the great filter. This step may be a little scary and a little difficult and it will be your true test.
- Find someone you can help. Use your social network. Attend meet-ups, get introductions, etc. Look for a small company or start-up that you can meet face to face (ideally), learn about their problems, and get access to their data.
- Be honest. Tell the truth. Explain where you have come from, what you have done, and what you can do for them. Consider doing the first piece of work for free or cheaply to get your first project under your belt. Your path is an advantage: it shows you are hungry, eager to deliver, and driven. We all want to work with people who present this way.
- Deliver. Do the work. Specify the project accurately, keep the scope small and clear and deliver what you say you will deliver. Again, don’t promise something you have not done before or don’t know how to do.
Keep projects small in scope and short in time. Ideally, deliver in 1-2 weeks. You need momentum, fast results and fast learnings for your client.
As you complete real projects, add them to your portfolio in an anonymized form that respects your clients’ privacy.
In this post you discovered a roadmap that you can use to take your passionate interest in machine learning and turn it into a consulting gig.
There is not a lot of hand-holding in this approach. This makes it exciting and empowering. You can execute this approach to your level of comfort and take on some moonlighting work or a whole new career.
If you have followed this path or know someone that has, leave a comment and share your experiences.
So what kinds of portfolios are popular now? Which portfolios get you hired? I am in Silicon Valley, so the answer may depend on geography and demographics.
What about doing a blog in addition to GitHub?
To add to Jason’s nice roadmap and try to address your questions:
It just takes a handful of nice projects to get attention. In my opinion, quality matters more than quantity. Here I may disagree a little with Jason: UCI’s repository is nice for personal practice, but I think it’s more important and interesting to go beyond it and tackle “something new” or more “innovative”. An important skill of a “data scientist” or machine learning practitioner is coming up with new, interesting hypotheses to solve real-world problems. Also, don’t forget that data collection and cleaning make up a big portion of the typical KDD/ML/data mining pipeline.
About blogging: I think this goes hand in hand with GitHub. GitHub is the place where you dump your code and documentation, and the blog article is the opportunity to really present your work, explain your approach, and draw your conclusions. I see GitHub more as the “methods” section of your work, which you’d want to refer to in your blog post.
Thanks for the nice article as always. How do you think someone could display his portfolio?
Thanks Jason. I am interested in that idea. It seems unlikely that small business owners would have the resources to pay a data scientist, or am I mistaken?
It depends on whether the data scientist can work on a problem that directly impacts the bottom line.