Tools are a big part of machine learning, and choosing the right tool can be as important as working with the best algorithms.
In this post you will take a closer look at machine learning tools: why they are important and the types of tools that you can choose from.
Why Use Tools
Machine learning tools make applied machine learning faster, easier and more fun.
- Faster: Good tools can automate each step in the applied machine learning process. This means that the time from ideas to results is greatly shortened. The alternative is that you have to implement each capability yourself. From scratch. This can take significantly longer than choosing a tool to use off the shelf.
- Easier: You can spend your time choosing good tools instead of researching and implementing techniques yourself. The alternative is that you have to be an expert in every step of the process in order to implement it. This requires research, deeper study in order to understand the techniques, and a higher level of engineering to ensure they are implemented efficiently.
- Fun: There is a lower barrier for beginners to get good results. You can use the extra time to get better results or work on more projects. The alternative is that you will spend most of your time building your tools rather than on getting results.
Tools With a Purpose
You do not want to study and use machine learning tools for their own sake. They must serve a strong purpose.
Machine learning tools provide capabilities that you can use to deliver results in a machine learning project. You can use this as a filter when you are trying to decide whether or not to learn a new tool or a new feature of your tool. You can ask the question:
How does this serve me in delivering results in a machine learning project?
Machine learning tools are not just implementations of machine learning algorithms. They can be, but they can also provide capabilities that you can use at any step in the process of working through a machine learning problem.
Good Versus Great Tools
You want to use the best tools for the problems that you are working on. How do you tell the difference between good and great machine learning tools?
- Intuitive Interface: Great machine learning tools provide an intuitive interface to the sub-tasks of the applied machine learning process. The interface maps well onto the task and suits it.
- Best Practice: Great machine learning tools embody best practices for process, configuration and implementation. Examples include automatic configuration of machine learning algorithms and good process built into the structure of the tool.
- Trusted Resource: Great machine learning tools are well maintained, updated frequently and have a community of people around them. Look for activity around a tool as a sign it is being used.
When To Use Machine Learning Tools
Machine learning tools can save you time and help you consistently deliver good results across projects. Some examples of when you may get the most benefit from using machine learning tools include:
- Getting Started: When you are just getting started, machine learning tools guide you through the process of delivering good results quickly and give you confidence to continue on with your next project.
- Day-to-Day: When you need a good answer to a question quickly, machine learning tools can allow you to focus on the specifics of your problem rather than on the depths of the techniques you need to use to get an answer.
- Project Work: When you are working on a large project, machine learning tools can help you to prototype a solution, figure out the requirements and give you a template for the system that you may want to implement.
Platforms Versus Libraries
There are a lot of machine learning tools. Enough that a Google search can leave you feeling overwhelmed.
One useful way to think about machine learning tools is to separate them into Platforms and Libraries. A platform provides all you need to run a project, whereas a library only provides discrete capabilities or parts of what you need to complete a project.
This is not a perfect distinction because some machine learning platforms are also libraries and some libraries provide a graphical user interface. Nevertheless, it provides a good point of comparison to differentiate general-purpose tools from specific-purpose tools.
Machine Learning Platform
A machine learning platform provides capabilities to complete a machine learning project from beginning to end, namely data analysis, data preparation, modeling, and algorithm evaluation and selection.
Features of machine learning platforms are:
- They provide capabilities required at each step in a machine learning project.
- The interface may be graphical, command line, programming, or some combination of these.
- They provide a loose coupling of features, requiring that you tie the pieces together for your specific project.
- They are tailored for general-purpose use and exploration rather than speed, scalability or accuracy.
Examples of machine learning platforms are:
- WEKA Machine Learning Workbench.
- R Platform.
- A subset of the Python SciPy stack (e.g. Pandas and scikit-learn).
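To make the beginning-to-end idea more concrete, here is a minimal sketch of the steps listed above (data analysis, data preparation, modeling, and evaluation and selection) using scikit-learn. It assumes scikit-learn is installed, and the choice of the built-in iris dataset and the two candidate algorithms is only for illustration, not a recommendation.

```python
# Assumes scikit-learn is installed (pip install scikit-learn).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Data analysis: load a small built-in dataset and inspect its shape.
X, y = load_iris(return_X_y=True)
print("rows, columns:", X.shape)

# Data preparation + modeling: each candidate is a pipeline that
# standardizes the inputs before fitting the algorithm.
candidates = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": make_pipeline(
        StandardScaler(), DecisionTreeClassifier(random_state=0)),
}

# Evaluation: estimate accuracy with 5-fold cross-validation.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
for name, score in scores.items():
    print(f"{name}: mean accuracy {score:.3f}")

# Selection: keep the candidate with the best estimated accuracy.
best_name = max(scores, key=scores.get)
print("selected:", best_name)
```

Notice that you still tie the pieces together yourself, which is the loose coupling described in the features above.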
Machine Learning Library
A machine learning library provides capabilities for completing part of a machine learning project. For example, a library may provide a collection of modeling algorithms.
Features of machine learning libraries are:
- They provide a specific capability for one or more steps in a machine learning project.
- The interface is typically an application programming interface requiring programming.
- They are tailored for a specific use case, problem type or environment.
Examples of machine learning libraries are:
- scikit-learn in Python.
- JSAT in Java.
- Accord Framework in .NET.
Machine Learning Tool Interfaces
Another useful way to think about machine learning tools is by the interface they provide.
This can be confusing, because some tools provide multiple interfaces. Nevertheless, it provides a starting point and perhaps a point of differentiation to help you pick and choose a machine learning tool.
Below are some examples of common interfaces.
Graphical User Interface
Some machine learning tools provide a graphical user interface, including windows, point-and-click interaction and a focus on visualization. The benefits of a graphical user interface are:
- Allows less-technical users to work through machine learning.
- Focus on process and how to get the most from machine learning techniques.
- Structured process imposed on the user by the interface.
- Stronger focus on graphical presentations of information such as visualization.
Some examples of machine learning tools with a graphical interface include:
Command Line Interface
Some machine learning tools provide a command line interface, including command line programs, command line parameterization and a focus on input and output. The benefits of a command line interface are:
- Allows technical users that are not programmers to work through machine learning projects.
- Provides many small focused programs or program modes for specific sub-tasks of a machine learning project.
- Frames machine learning tasks in terms of the input required and output to be generated.
- Promotes reproducible results by recording or scripting commands and command line arguments.
Some examples of machine learning tools for a command line interface include:
If you like working on the command line, check out the great book on how to work through machine learning problems on the command line titled “Data Science at the Command Line: Facing the Future with Time-Tested Tools”.
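As a rough illustration of the input and output framing, here is a minimal sketch of one small, focused command line program. It is not taken from any particular tool: the script name `baseline.py`, the `--target` option and the `majority_baseline` function are all made up for this example. It reads a CSV file and writes one line containing the majority class and the accuracy you would get by always predicting it.

```python
import argparse
import csv
import sys
from collections import Counter


def majority_baseline(rows, target):
    """Return (majority_label, accuracy) for a list of dict rows."""
    labels = [row[target] for row in rows]
    if not labels:
        raise ValueError("no rows to evaluate")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def main(argv):
    parser = argparse.ArgumentParser(
        description="Majority-class baseline for a labelled CSV file.")
    parser.add_argument("csv_path", help="input CSV file with a header row")
    parser.add_argument("--target", default="label",
                        help="name of the column holding the class label")
    if not argv:
        # No arguments given: show usage instead of failing.
        parser.print_help()
        return 0
    args = parser.parse_args(argv)
    # Input: a CSV file named on the command line.
    with open(args.csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    label, accuracy = majority_baseline(rows, args.target)
    # Output: one tab-separated line that another program could read.
    print(f"{label}\t{accuracy:.3f}")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Assuming you saved it as `baseline.py`, a command such as `python baseline.py data.csv --target species > baseline.txt` can be recorded in a shell script and run again later, which is where the reproducibility benefit comes from.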
Application Programming Interface
Machine learning tools can provide an application programming interface, giving you the flexibility to decide what elements to use and exactly how to use them within your own programs. The benefits of an application programming interface are:
- You can incorporate machine learning into your own software projects.
- You can create your own machine learning tools.
- Gives you the flexibility to use your own processes and automations on machine learning projects.
- Allows you to combine your own methods with those provided by the library as well as extend provided methods.
Some examples of machine learning tools with application programming interfaces include:
- Pylearn2 for Python
- Deeplearning4j for Java
- LIBSVM for C
Local Versus Remote Machine Learning Tools
A final way to compare machine learning tools is to consider whether the tool is local or remote.
A local tool is one that you download, install and use locally, whereas a remote tool is run on a third-party server.
This distinction can also be muddy as some tools can be run in a local or remote manner. Also, if you are a good engineer, you can configure almost any tool to be a hosted solution on your own servers.
Nevertheless, this might be a useful distinction to help you understand and choose a machine learning tool.
Local Tools
A local tool is downloaded, installed and run on your local environment.
- Tailored for in-memory data and algorithms.
- Control over run configuration and parameterization.
- Integrate into your own systems to meet your needs.
Examples of local tools include:
- Shogun Library for C++
- GoLearn for Go
Remote Tools
A remote tool is hosted on a server and called from your local environment. These tools are often referred to as Machine Learning as a Service (MLaaS).
- Tailored for scale, to be run on larger datasets.
- Run across multiple systems, multiple cores and shared memory.
- Fewer algorithms because of the modifications required for running at scale.
- Simpler interfaces providing less control over run configuration and algorithm parameterization.
- Integrated into your local environment via remote procedure calls.
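To give a feel for what calling a remote tool from your local environment can look like, here is a minimal sketch that packages one row of input as a JSON request using only the Python standard library. The endpoint URL, the `features` field and the bearer-token header are all hypothetical; each real service documents its own URL, payload format and authentication.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL your provider documents.
PREDICT_URL = "https://mlaas.example.com/v1/predict"


def build_prediction_request(features, api_key):
    """Package one row of input features as a JSON POST request."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        PREDICT_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth scheme; real services differ.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_prediction_request([5.1, 3.5, 1.4, 0.2], api_key="YOUR_KEY")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending is left commented out because the endpoint above is made up:
# with urllib.request.urlopen(request, timeout=10) as response:
#     prediction = json.load(response)
```

The model and its configuration live on the server, so the local code only controls what goes into the request. This is the trade-off between simplicity and control noted in the list above.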
Examples of remote tools:
There are tools that you can use to set up your own remote solution and integrate it into your environment as a service. Examples include:
- Apache Mahout for Hadoop
- MLlib for Spark
- PredictionIO
Summary
In this post you discovered why tools are so important in applied machine learning.
You learned that without good machine learning tools you would have to implement all of the techniques from scratch, requiring expertise in the techniques and in efficient engineering practices.
You learned three structured ways to think about machine learning tools:
- Platforms versus Libraries
- Graphical User Interfaces versus Command Line Interfaces versus Application Programming Interfaces
- Local versus Remote
What machine learning tools are you using?
Leave a comment and share which machine learning tools you are currently using.
thank you!
It is really nice to know what type of tools are available out there..
No problem Faisal.
Tools are the ones helping us do ML.
Thank you mugo for your support! We wish you the best on your machine learning journey!
A very well written overview. Thank you. Ashi
I’m glad to hear that Ashi. Thanks.
Thanks for this very interesting post. According to you, in which category is SQL Server data mining?
I don’t know much about SQL server and its data mining capabilities, sorry.
Hi I am an Undergraduate computer science student starting out with Machine Learning. What should I start with and which platforms should I go for? I have basics in Python and I am undergoing a course in ML.
My best advice for getting started is here:
https://machinelearningmastery.com/start-here/#getstarted
I think the best platform for getting started with predictive modeling with machine learning for beginners is Weka:
https://machinelearningmastery.com/start-here/#weka
I have tons of info on getting started with Python for machine learning here:
https://machinelearningmastery.com/start-here/#python
I hope that helps.
Please, where can I get a dataset of Microsoft Word documents (benign and malicious)? I’m working on a machine learning project to determine whether a document is benign or malicious.
I don’t know, sorry Vicky.
Hi Jason,
Congratulations, a really good post. I want to start working with Machine Learning and I don’t know very well which is the best platform for it (I program in Matlab at a high level because I did research during my math degree, so I think that maybe R is the best for me).
Thanks in advance 🙂
I’d recommend picking one and doubling down.
Hi,
What are the different model file types that are available in industry? For example SAS, SPSS, PMML.
I don’t know. CSV is common.
Hi Jason, thanks for writing about Machine Learning and for your time. I see this article was written in 2015 and now, 3 years later in 2018, many tools/libraries have gone obsolete and many new ones have arrived. I wish there was a GitHub for these articles too :-).
Is this article outdated?
Thanks, the advice still generally applies.
What are you looking to do exactly?
Hi Jason,
I am a beginner in programming but I have a lot of interest in machine learning. Where should I start? Are there any static tools available for building up some new ideas?
You can start with Weka, as no programming is required:
https://machinelearningmastery.com/start-here/#weka
Hi Jason,
Great website, I am glad I found it. I started toying with some tools (Orange, and Weka mostly). At first I was stuck with the latter but thanks to one of your posts, I have learned the basics and I have to admit that I find it very useful and I am willing to explore all the tutorials that you’re providing on this tool. Is there a particular reason why you’re focusing on this platform (other than path dependency)?
Other than that, I quickly searched and could not find articles about hardware / computers / OS platforms for ML. Although there are many articles tackling the topic, I think that it would be helpful to some of your readers to get your take on that.
Thanks again!
Perhaps this will help:
https://machinelearningmastery.com/computer-hardware-for-machine-learning/
Hi Jason,
First of all, congratulations on creating thousands of blog posts, and thanks for writing such clear explanations. Every time I search for an issue, your posts come up in the top 5 results. You really write very well.
This post is very meaningful, but I want to ask something else. I have started working with Azure Machine Learning, where I created a multi-label decision forest classifier model on a categorical dataset. It is really good, but now I am looking for a trick or predefined module that can give me the ‘feature-importance’ for predicting the target value of EACH row of test data. ‘Permutation Feature Importance’ does give feature importance values for the complete test dataset, but I want them for individual test rows.
The reason for doing this is that I want to show the user, for each entry, which fields contributed most to predicting the result.
I have posted this issue in many places but didn’t get any positive response. I hope I get my answer here.
Thanks
Thanks!
Perhaps you can test feature selection methods:
https://machinelearningmastery.com/an-introduction-to-feature-selection/
Jason,
Your site/blog is incredible. Very well explained and intuitive. Congratulations.
Thanks!