Last Updated on June 7, 2016
Machine learning tools save you time by automating aspects of a machine learning project.
There are platforms that you can use to work through a machine learning project end-to-end. There are also libraries that provide capabilities for one piece of a machine learning project.
Using the right machine learning tools is as important as using the right machine learning algorithms. But there are so many machine learning tools to choose from.
- How do you know which tool to use?
- How do you know you are getting the most out of your tool?
- How can you show that you know how to use a tool well?
In this post you will discover 5 tactics that you can use to learn and master any machine learning tool.
What Tools Are Out There
If you don’t know what tools are out there, you may just pick the first one that you come across and it might be a terrible fit for you or your project.
You need to know what tools are out there. There are so many tools to choose from for a given problem: different tools for different programming languages and different problem types, and even tools for the same platform that offer completely different types of modeling algorithms.
It is important to take stock and know exactly what is available, and to do so often.
How To Use A Tool Well
If you don’t know how to use a tool well, you could waste a lot of time figuring it out as you go.
We have all seen the developer who, even after years, cannot drive their editor effectively. You need to know the best practices for using the tool. You need to buy into the way it manages data or lays out a project. You need to learn the keyboard shortcuts or the API quirks for the most common features.
Knowing how to drive a tool expertly will save you a lot of time. You can use this time to make more accurate predictions or move onto a new project.
Keep Track of New Tools
If you don’t keep on top of new tools you may miss big opportunities.
There are always new tools being released. New tools may include better automation for common tasks. They also almost certainly will include access to new and more powerful machine learning algorithms.
It makes a lot of sense to keep track of both updates to your tools and the arrival of new machine learning tools.
Don’t Waste Your Time
A lot of machine learning tools are side projects or not ready for prime time. You probably do not want to waste your time with these.
You need to be able to sum up the quality and power of a tool very quickly to help you decide whether or not to invest the time to learn about it.
Once you do invest time into learning how to use a tool well, you need to be laser focused on collecting the details of the tool that you can actually use in practice to build better models and make more accurate predictions.
Once you get really good at driving a specific tool, you need a way to demonstrate your skills. You can tell someone all day long that you're good at using this or that tool, but it is easier to use simple indicators that show (rather than tell) that you have mastered the tool. This can be useful for interviews.
Use a Systematic Process
You need a systematic process to discover what machine learning tools are out there that you could use or learn more about.
You need a methodology that you can use to work through all the documentation, examples and fluff for a machine learning tool and figure out quickly what it can do for you and whether you can trust it.
You need a structured way of gathering the usage information for a tool that maps onto your process for working through a machine learning project, so that you can use the tool efficiently and effectively on your next project.
You can learn any machine learning tool very quickly and to expert-level skill for your specific domains. You just need to do so using a step-by-step systematic process.
Learn Any Machine Learning Tool
There are 5 tactics that you can use to learn any machine learning tool:
1. List Tools
Create lists of machine learning tools.
Use a spreadsheet and create column headings for the details that you need to know about each tool, such as the name, URL, programming language and types of data sources it supports. Use your favorite search engine and scour the web for candidate tools that you could use. Rank each tool against your requirements and make a shortlist of tools that you may want to investigate further.
This simple, time-tested method can very quickly remove uncertainty, narrow the scope and get you started. The list does not have to be complete; it just has to be useful. Tools that are hard to find are not being talked about and are probably not as desirable or useful to you.
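If you prefer code to a spreadsheet, the ranking step can be sketched in a few lines of Python. This is a minimal sketch, assuming your requirements boil down to a preferred programming language plus a set of required data sources. The `ToolEntry` class, the `score` and `shortlist` functions and the one-point-per-requirement scoring rule are all hypothetical, and the tool names and URLs are placeholders rather than real tools.

```python
from dataclasses import dataclass, field


@dataclass
class ToolEntry:
    """One row of the tool spreadsheet (hypothetical columns)."""
    name: str
    url: str
    language: str
    data_sources: set[str] = field(default_factory=set)


def score(tool: ToolEntry, language: str, needed_sources: set[str]) -> int:
    """Assumed scoring rule: 1 point if the language matches,
    plus 1 point per required data source the tool supports."""
    points = 1 if tool.language == language else 0
    points += len(needed_sources & tool.data_sources)
    return points


def shortlist(tools, language, needed_sources, top_n=3):
    """Rank tools by score (highest first, ties keep their original order)
    and keep at most top_n tools that meet at least one requirement."""
    ranked = sorted(tools, key=lambda t: score(t, language, needed_sources), reverse=True)
    return [t for t in ranked if score(t, language, needed_sources) > 0][:top_n]


# Placeholder rows for illustration only; these are not real tools.
tools = [
    ToolEntry("Tool A", "https://example.com/a", "Python", {"csv", "sql"}),
    ToolEntry("Tool B", "https://example.com/b", "Java", {"csv", "arff"}),
    ToolEntry("Tool C", "https://example.com/c", "R", {"csv"}),
]

for t in shortlist(tools, language="Python", needed_sources={"csv", "sql"}):
    print(t.name, score(t, "Python", {"csv", "sql"}))
```

A real shortlist would probably weigh requirements differently, but even a crude score like this makes the ranking explicit and easy to repeat as new tools are added.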
For an example that is perhaps not as simple as I would advise:
2. Describe Tools
Create a description of a machine learning tool tailored to your needs.
Open a blank text document and create headings for your key questions about one tool, such as: What algorithms does it support? How does it load data? What languages does it support? Can it save models? How long ago was the last release? Dive into the documentation, samples, forums, reviews and APIs for the tool and quickly gather answers to your questions. Limit your tailored description to one page. Repeat for other tools, then compare and contrast.
This simple tactic avoids the trap of spending days (or longer) reading all of the documentation on a tool to decide if it is appropriate for your needs when you could be using that time evaluating other tools or getting started with your project.
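The one-page description can also be kept as a small template, so that every tool is described against the same questions and the descriptions are easy to compare. The sketch below is one hypothetical way to do that in Python. The `ToolDescription` class, its `render` method and the `max_lines` limit (a rough, assumed stand-in for "one page") are illustrative, and the tool name, answer and release date are placeholders that you would replace with what you find in the tool's documentation.

```python
from dataclasses import dataclass
from datetime import date

RELEASE_QUESTION = "How long ago was the last release?"

# Headings taken from the key questions in the text; edit them to suit your needs.
QUESTIONS = [
    "What algorithms does it support?",
    "How does it load data?",
    "What languages does it support?",
    "Can it save models?",
    RELEASE_QUESTION,
]


@dataclass
class ToolDescription:
    """Hypothetical one-page description of a single tool."""
    name: str
    answers: dict[str, str]  # question -> short answer from docs, forums, reviews
    last_release: date

    def days_since_release(self, today: date) -> int:
        return (today - self.last_release).days

    def render(self, today: date, max_lines: int = 60) -> str:
        """Return the description as text; unanswered questions are marked TODO."""
        lines = [self.name, ""]
        for question in QUESTIONS:
            if question == RELEASE_QUESTION:
                answer = (f"{self.days_since_release(today)} days ago "
                          f"({self.last_release.isoformat()})")
            else:
                answer = self.answers.get(question, "TODO")
            lines += [question, f"  {answer}"]
        # max_lines is an assumed proxy for "one page".
        if len(lines) > max_lines:
            raise ValueError("Description is longer than one page; trim it.")
        return "\n".join(lines)


# Placeholder tool and answer, for illustration only.
desc = ToolDescription("Tool A", {"Can it save models?": "Yes"}, date(2016, 1, 1))
print(desc.render(today=date(2016, 6, 7)))
```

The TODO markers make it obvious which questions still need answers before you compare one tool against another.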
For example, here are some descriptions of tools:
3. Proceduralize Tools
Capture tool usage information into a jump-start guide that you can use to get results very quickly.
Open a blank text document and add headings for the major tasks of a machine learning project that the tool supports. These may include loading data, analyzing data, transforming data, building a model, evaluating a model and so on. Under each heading, write a procedure describing exactly how to use the tool to get a result. Use dummy data (such as datasets from the UCI Machine Learning Repository). List multiple procedures if the tool provides multiple techniques.
You will be amazed at how valuable short recipes are when starting a new project. Copy and paste them, then modify them to use your dataset.
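As an illustration, here is what a short recipe might look like for scikit-learn, one of the libraries mentioned in this post. It is a minimal sketch, assuming scikit-learn is installed. It uses the Iris flowers dataset (originally from the UCI Machine Learning Repository and bundled with scikit-learn as `load_iris`) as dummy data, and the choice of k-nearest neighbors with standardized inputs and 10-fold cross-validation is just one reasonable default, not the only procedure.

```python
# Jump-start recipe: classification with scikit-learn (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load data: the Iris dataset ships with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)

# 2. Transform data and 3. Build a model: scale inputs, then k-nearest neighbors.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 4. Evaluate the model: 10-fold cross-validation with a fixed shuffle seed.
cv = KFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"Accuracy: {scores.mean():.3f} ({scores.std():.3f})")

# 5. Make predictions: fit on all the data, then predict for some rows.
model.fit(X, y)
predictions = model.predict(X[:3])
print(predictions)
```

Each numbered comment maps onto one heading of the jump-start guide, so you can swap in your own data loading step and keep the rest. Scaling is included because k-nearest neighbors is distance-based and sensitive to the range of each input; putting the scaler inside the pipeline means it is refit on the training folds only during cross-validation.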
For example, here are some procedures:
- Non-Linear Classification in R with Decision Trees
- How to Tune Algorithm Parameters with Scikit-Learn
4. Investigate Tools
Create demos or mini-tutorials demonstrating how to use a specific feature or capability of a tool.
Pick a feature or capability of the tool that is interesting or generally useful. Create a short post, video or tutorial on how a practitioner or beginner could use the feature. Provide step-by-step procedures, point out limitations and best practice heuristics. Post it publicly so that it can be used to help others, such as on your blog, GitHub, a forum, or YouTube.
You can use a small number of mini investigations to show credibly that you know how to use the tool to get results. Don't worry if similar tutorials already exist; use your own voice and give your own spin on how to use the feature.
For example, here are some tutorials:
- How to improve machine learning results in Weka using ensembles
- How to run your first experiment in Weka
- How to build your first classifier in Weka
5. Augment Tools
Extend or create plug-ins for tools to further automate, fill feature gaps and demonstrate mastery.
Once you have used a tool a lot in practice, you will become aware of its limitations and missing features. For libraries and command line tools, you may even create wrapper scripts and helper functions. Gather up this information and create an extension, wrapper or plug-in for the tool. Make it small and well documented with examples, and release it as open source on a platform like GitHub.
Creating an extension to a tool will often formalize processes that you need or already use privately, give back by allowing others to make use of these features, and demonstrate your deep knowledge and even mastery of the tool.
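As an example of the kind of helper function that often grows into a wrapper or plug-in, here is a hypothetical `spot_check` function for scikit-learn that evaluates several models on the same cross-validation split and ranks them. The function name, its defaults and the three models chosen are assumptions for illustration; only the scikit-learn classes and `cross_val_score` are part of the library itself. It assumes scikit-learn is installed.

```python
# Hypothetical helper function wrapping scikit-learn; assumes scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def spot_check(models, X, y, n_splits=10, seed=7, scoring="accuracy"):
    """Evaluate each named model on the same shuffled k-fold split.

    Returns {name: (mean score, std of scores)}, ordered best mean first.
    """
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring=scoring)
        results[name] = (scores.mean(), scores.std())
    # Dicts keep insertion order, so building one from sorted items gives a ranked dict.
    return dict(sorted(results.items(), key=lambda item: item[1][0], reverse=True))


# Usage on a small dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
}
results = spot_check(models, X, y)
for name, (mean, std) in results.items():
    print(f"{name}: {mean:.3f} ({std:.3f})")
```

Using one fixed split for every model keeps the comparison fair, and wrapping it in a documented function is exactly the kind of small, private convenience that can be cleaned up and released.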
Back in the day, I created a bunch of plug-ins for Weka and then later cleaned them up and released them as an open source project. They may or may not still work, but here’s the link:
- WEKA Classification Algorithms Plug-in: Includes LVQ, SOM, Neural Networks and Artificial Immune System algorithms.
You Can Learn Any Machine Learning Tool
You do not need to be a programmer. There are many machine learning tools that provide graphical user interfaces or command line interfaces allowing you to build models and make accurate predictions without writing a line of code.
You do not need to know any math. Just like you don't need math to drive Microsoft Excel, you do not need a background in mathematics to drive many, if not most, of the machine learning tools available. Figure out what capabilities you need, pick a tool and see for yourself.
You do not need to learn a specific programming language. Pick a programming language and you will discover that there are machine learning libraries available. Some libraries have been around longer and are more mature. There are also web service APIs for machine learning as a service that support a range of different languages. You don’t even need to write code to do machine learning if you don’t want to. In the end you should choose a language that best suits your project or your background if you are doing self study.
You do not need to be an expert at machine learning. In fact, I recommend that you use machine learning platforms like WEKA when getting started, to accelerate your learning, rapidly deliver results and build your confidence.
You do not need to be an expert in the tool. I see a lot of expert programmers who do not know how to use their editor or IDE very well. It slows them down. When you make the tool the subject of study, you can learn to drive it better than experts in machine learning, or even experts in the tool. Few people do this, and if you do, it will give you a huge advantage. You could even start answering questions on expert forums about how to use the tool well, because you bothered to study it when other practitioners didn't.
Do you have a question? Post it in the comments below.
You can learn any machine learning tool, from discovering what tools are out there, to choosing which tool to use, to demonstrating that you can use it well.
The 5 tactics that you can use to learn any machine learning tool are:
- List Tools: Make lists of tools that meet your needs.
- Describe Tools: Make customized descriptions of tools to answer your questions.
- Proceduralize Tools: Create recipes for common machine learning project tasks that you can use as jump-start guides.
- Investigate Tools: Create mini-tutorials and demos of tool features and capabilities as practice and to demonstrate expertise.
- Augment Tools: Create extensions, plug-ins and wrappers for tools to formalize your usage, fill feature gaps and demonstrate mastery.
If you would like to know more about the types of machine learning tools, see the post Machine Learning Tools.
Your Next Step
Is there a machine learning tool that you would like to study?
- Commit. Start studying a tool Right Now!
- Pick and apply one of the tactics described above.
- Spend no more than 1 hour.
- Report back in the comments, I’d love to see what you discovered.