You should pick the right tool for the job.
The specific predictive modeling problem that you are working on should dictate the specific programming language, libraries and even machine learning algorithms to use.
But, what if you are just getting started and looking for a platform to learn and practice machine learning?
In this post, you will discover that Python is the growing platform for applied machine learning, likely to outpace and topple R in terms of adoption and perhaps capability.
After reading this post you will know:
- That search volume for Python machine learning is growing fast and has already outpaced R.
- That the percentage of Python machine learning jobs is growing and has already outpaced R.
- That Python is used by nearly 50% of polled practitioners and growing.
Let’s get started.
Python for Machine Learning is Growing
Let’s look at 3 areas where we can see Python for machine learning growing:
- Search Volume.
- Job Ads.
- Professional Tool Usage.
Python Machine Learning Search Volume is Growing
Search volume is probably indicative of students, engineers and other practitioners searching for information to get started or go deeper into the topic.
Google provides a tool called Google Trends that gives insight into the search volume of keywords over time.
We can investigate the growth of “Python machine learning” from 2004 to 2016 (the last 12 years). Below is a graph of the change in search volume for this period:
We can see that the upward trend started perhaps in 2012, with a steeper rise starting in 2015, likely boosted by Python deep learning tools like TensorFlow.
We can also contrast this with the search volume for R machine learning, and we can see that from about the middle of 2015, Python machine learning has been beating out R.
Blue denotes “Python Machine Learning” and red denotes “R Machine Learning”.
Python Machine Learning Jobs are Growing
Indeed is a job search website and, like Google Trends, it shows the volume of job ads that match keywords.
We can investigate the demand for “python machine learning jobs” for the last 4 years.
We can see time along the x-axis and the percentage of job postings that match the keyword on the y-axis. The graph shows almost linear growth from 2012 to 2015, with a hockey-stick-like increase in 2016.
We can also compare the job ads for Python and R.
Blue shows “Python machine learning” and orange shows “R machine learning”.
We see a more pronounced story compared to Google search volume. The percentage of job ads on indeed.com shows that demand for Python machine learning skills has been dominating R machine learning skills since at least 2012, with the gap only widening in recent years.
KDnuggets Survey Results: More People Using Python for Machine Learning
We can get some insight into the tools used by machine learning practitioners by reviewing the KDnuggets Software Poll results.
Here’s a quote from the 2016 results:
R remains the leading tool, with 49% share, but Python grows faster and almost catches up to R.
— Gregory Piatetsky
The poll tracks the tools used by machine learning and data science professionals, where a participant can select more than one tool (which I would expect to be the norm).
Here is the growth of Python for machine learning over the last 4 years:
2016: 45.8%
2015: 30.3%
2014: 19.5%
2013: 13.3%
Below is a plot of this growth.
We can see a steady upward trend, with Python used by just under 50% of professionals in 2016.
It is important to note that the number of participants in the poll has also grown from many hundreds to thousands in recent years, and that participants are self-selected.
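As a quick check on the shape of this trend, here is a minimal Python sketch that computes the year-over-year change in percentage points from the four poll figures listed above. The dictionary and function names are illustrative only, and the figures are copied by hand from the list.

```python
# KDnuggets poll: share of respondents using Python (percent), copied from the list above.
KDNUGGETS_PYTHON_SHARE = {2013: 13.3, 2014: 19.5, 2015: 30.3, 2016: 45.8}


def year_over_year_change(shares):
    """Return the change in percentage points between consecutive years.

    Each key is the later year of a pair, e.g. 2014 holds (2014 share - 2013 share).
    Values are rounded to one decimal place to hide floating-point noise.
    """
    years = sorted(shares)
    return {
        later: round(shares[later] - shares[earlier], 1)
        for earlier, later in zip(years, years[1:])
    }


if __name__ == "__main__":
    changes = year_over_year_change(KDNUGGETS_PYTHON_SHARE)
    for year, delta in changes.items():
        print(f"{year}: +{delta} percentage points")
```

Each increment (6.2, then 10.8, then 15.5 points) is larger than the one before, so over these four polls the rise is, if anything, accelerating. With only four data points from self-selected polls, treat this as a rough reading rather than a firm trend.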
What is interesting is that scikit-learn also appears separately on the poll, accounting for 17.2%.
For more information see: KDnuggets 2016 Software Poll Results.
O’Reilly Survey Results: More People Using Python for Machine Learning
O’Reilly performs an annual Data Science Salary Survey.
They collect a lot of data from professional data scientists and machine learning practitioners and present the results in well-designed reports. For example, here is the 2016 Data Science Salary Survey report.
As with the KDnuggets poll, the survey tracks the tools used by practitioners.
Quoting from the key findings of the 2016 report, we can see that Python plays an important role in data science salaries.
Python and Spark are among the tools that contribute most to salary.
— Page 1, 2016 Data Science Salary Survey report.
Reviewing the survey results, we can see a similar growth trend in the use of the Python ecosystem for machine learning over the last 4 years.
2016: 54%
2015: 51%
2014: 42% (interpreted from graph)
2013: 40%
Again, we can plot this growth.
It’s interesting that the 2016 results are broadly similar to those from the KDnuggets poll (54% versus 45.8%).
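To make that comparison concrete, the short Python sketch below lines up the two surveys year by year and computes the gap in percentage points. The variable and function names are hypothetical, the figures are copied from the two lists above, and the 2014 O’Reilly value was itself read off a graph, so it is approximate.

```python
# Share of respondents using Python (percent), copied from the two lists above.
# The 2014 O'Reilly value was interpreted from a graph, so treat it as approximate.
KDNUGGETS = {2013: 13.3, 2014: 19.5, 2015: 30.3, 2016: 45.8}
OREILLY = {2013: 40.0, 2014: 42.0, 2015: 51.0, 2016: 54.0}


def survey_gap(first, second):
    """Return (second - first) in percentage points for years present in both surveys."""
    common_years = sorted(set(first) & set(second))
    # Round to one decimal place to hide floating-point noise in the subtraction.
    return {year: round(second[year] - first[year], 1) for year in common_years}


if __name__ == "__main__":
    for year, gap in survey_gap(KDNUGGETS, OREILLY).items():
        print(f"{year}: O'Reilly share is {gap} points above KDnuggets")
```

The gap shrinks from about 27 points in 2013 to about 8 points in 2016, which is one way to read the observation that the two surveys converge. They sample different, self-selected audiences, so the absolute levels are not directly comparable.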
Quotes
You can find quotes to support any position on the Internet.
Take quotes with a grain of salt. Nevertheless, quotes can be insightful, raising and supporting points.
Let’s first take a look at some cherry-picked quotes from news sites and blogs about the growth of Python for machine learning.
News Quotes
Python has emerged over the past few years as a leader in data science programming. While there are still plenty of folks using R, SPSS, Julia or several other popular languages, Python’s growing popularity in the field is evident in the growth of its data science libraries.
— Katharine Jarmul, Introduction To Data Science: How To “big Data” With Python, Dataconomy
Our research shows that Python is one of the most popular languages for data science analyses, in use by more than one-third (36%) of organizations.
— Dave Menninger, Big Data Grows Up at Strata+Hadoop World 2016, SmartDataCollective
… the last few years have seen a proliferation of cutting-edge, commercially usable machine learning frameworks, including the highly successful scikit-learn Python library and well-publicized releases of libraries like Tensorflow by Google and CNTK by Microsoft Research.
— Josh Schwartz, Machine Learning Is No Longer Just for Experts, Harvard Business Review
Note that scikit-learn, TensorFlow and CNTK can all be used from Python.
Python is versatile, simple, easier to learn, and powerful because of its usefulness in a variety of contexts, some of which have nothing to do with data science. R is a specialized environment that looks to optimize for data analysis, but which is harder to learn. You’ll get paid more if you stick it out with R rather than working with Python.
— Roger Huang, Data science sexiness: Your guide to Python and R, and which one is best, TheNextWeb
Quora Quotes
Below are some cherry-picked quotes regarding the use of Python for machine learning taken from Quora questions.
Python is a popular scientific language and a rising star for machine learning. I’d be surprised if it can take the data analysis mantle from R, but matrix handling in NumPy may challenge MATLAB and communication tools like IPython are very attractive and a step into the future of reproducibility. I think the SciPy stack for machine learning and data analysis can be used for one-off projects (like papers), and frameworks like scikit-learn may be mature enough to be used in production systems.
— Aswath Muralidharan, Production Engineer. In response to the Quora question “What are the top 5 programming languages for Machine Learning?”
I’d also recommend Python as it is a fantastic all-round programming language that is incredibly useful for drafting code fragments and exploring data (with the IPython shell), great for documenting steps and results in the analytical process chain (IPython Notebook), has a huge selection of libraries for almost any machine learning objective and can even be optimized for production system implementation. In my opinion there are languages that are superior to Python in any of these categories – but none of them offers this versatility.
— Benedikt Koehler, Founder & CEO DataLion. In response to the Quora question “What is the best language to use while learning machine learning for the first time?”
[…] It is because the language can make a productive environment for people that just want to get something done quickly. It is fairly easy to wrap C libraries, and C++ is doable. This gives Python access to a wide range of existing code. Also the language doesn’t get in the way when it comes time to implement things. In many ways it makes coding “fun again” for a wide range of tasks.
— Shawn Masters, VP of Engineering. In response to the Quora question “Will Python become as popular as Java, given that Python is used in Machine Learning?”
In my opinion, Python truly dominates this category. A quick search of almost any artificial intelligence, machine learning, NLP, or data analytics topic, plus ‘Python’, will return examples of useful, actively maintained libraries.
— Ryan Hill, programmer. In response to the Quora question “Which programming language has the best repository of machine learning libraries?”
Summary
In this post, you discovered that Python is the growing platform for applied machine learning.
Specifically, you learned that:
- The number of people interested in Python for machine learning is larger than for R and is growing.
- The number of jobs posted for Python machine learning skills is larger than for R and is growing.
- The number of polled data science professionals that use Python is growing year over year.
Has this influenced your decision to get started with the Python ecosystem for machine learning?
Share your thoughts in the comments below.
I agree that for the moment Python and R are the 2 platforms that you really need for machine learning. However, there is fierce competition growing in the Julia corner. I recently had to code from scratch a customised k-means++ algo for a warehouse network location problem in the US. Using the flexclust package in R would have led to severe performance problems; ditto for coding it in Python. The JIT compiler in Julia, however, proved to be nearly as fast as C: about 2 orders of magnitude faster than R (probably also than Python), and Julia is a lot easier (higher level) and more compact to code in. Sure, Julia is still in beta (0.5), but there are already more than 1000 high-quality packages available for it. As it stands, Julia is not ready yet for commercial apps, but it is certainly ready for internal projects. TensorFlow has already been ported and Keras will follow, I hope. MXNet is available too. Julia was also designed with parallel computing as a standard feature and will port from a laptop to a supercomputer cluster. This is the future of scientific computing. I intend to have Julia replace C, Python and R asap in my job.
Thanks Gerrit, some good points.
It might be the ecosystem of choice in 3-5 years, but today/this year/next year you will get a job if you can deliver results with Python.
Hi Jason, I would like to ask what your personal view is on Python vs R. Instead of the trends and stats, I would like to ask, if I may, how you use them in your daily job, which one you prefer and why. For instance, are different tasks accomplished better with one than the other? Which are those in your experience, and why?
In one of your posts I saw that you wrote that Python is for intermediate tasks and R for advanced ones. Would you mind elaborating on this a little bit?
Great question Kirk.
Here’s my opinion and usage, and others may not agree.
R has more techniques and is more powerful, but is harder to use. Sometimes it’s a real pain, but I still love it.
Python is easy and fun to use but has fewer methods. It’s a “real” programming language so any thoughts can quickly become programs – and we like this as programmers.
I use both. I use Python for quick models more and more and for production/ops models. I use R for deep R&D one-off projects.
That being said, the market is demanding skills in Python, so I try to lean that way more and more.
Does that help?
Thanks for sharing the information. It really helps with using Python in applied machine learning.
I’m glad to hear it.
Are there websites listing newly invented algorithms for Python or R?
I’m also interested in leaked, formerly secret algorithms :-).
Not that I am aware.
With Keras and TensorFlow available for R, my guess is that it’s going to get tough for Python to maintain its number one position.
Maybe. I think Python might be easier to work with which might make the difference.
Thanks a lot. I’m sure now that Python is a good tool for me to learn machine learning.
I didn’t study computer science or anything related to programming languages.
However, Python syntax is readable, and if something gets a little difficult, there is nothing wrong with a quick search to understand more and clarify things.
I think there is a huge Python community supporting each other through the web; you are the best example of the supportive Python community.
Thanks
I agree.
Is Python still considered the go-to language for applied machine learning, or has R grown much?
Yes.