Best Programming Language for Machine Learning

A question I get asked a lot is:

What is the best programming language for machine learning?

I’ve replied to this question many times now, so it’s about time I explored it further in a blog post.

Ultimately, the programming language you use for machine learning should consider your own requirements and predilections. No one can meaningfully address those concerns for you.

What Languages Are Being Used

Before I give you my opinion, it is good to have a look around to see which languages and platforms are popular in self-selected communities of data analysis and machine learning professionals.

KDnuggets has had language polls forever. A recent poll is titled “What programming/statistics languages you used for an analytics / data mining / data science work in 2013”. The trends are almost identical to the previous year. The results suggest heavy use of R and Python, with SQL for data access. SAS and MATLAB rank higher than I would have expected. I’d guess that SAS accounts for larger corporate (Fortune 500) data analysis and MATLAB for engineering, research and student use.

The most popular platforms for machine learning, taken from the KDnuggets 2013 poll.

Kaggle offer machine learning competitions and have polled their user base as to the tools and programming languages used by participants in competitions. They posted results in 2011 titled Kagglers’ Favorite Tools (also see the forum discussion). The results suggested abundant use of R. They also show good use of MATLAB and SAS, with much lower Python representation. I can attest that I prefer R over Python for competition work. It just feels as though it has more on offer in terms of data analysis and algorithm selection.

The most popular tools used on Kaggle, the machine learning competition website.

Ben Hamner, Kaggle Admin and author of the blog post above on the Kaggle blog, goes into more detail on the options for machine learning programming languages in a forum post titled “What tools do people generally use to solve problems”.

Ben comments that MATLAB/Octave is a good language for matrix operations and can be good when working with a well defined feature matrix. Python is fragmented but comprehensive and can be very slow unless you drop into C. He prefers Python when not working with a well defined feature matrix and uses Pandas and NLTK. On R, Ben comments that “As a general rule, if it’s found to be interesting for statisticians, it’s been implemented in R” (well said). He also complains about the R language itself being ugly and painful to work with. Finally, Ben comments that Julia doesn’t have much to offer in the way of libraries but is his new favorite language. He comments that it has the conciseness of languages like MATLAB and Python with the speed of C.

Anthony Goldbloom, the CEO of Kaggle, gave a presentation to the Bay Area R user group in 2011 on the popularity of R in Kaggle competitions titled Predictive modeling competitions: making data science a sport (see the powerpoint slides). The presentation slides give more detail on the use of programming languages and suggest an Other category that is nearly as large as the usage of R. It would be nice to have the raw data that was collected (why didn’t they release it to their own data community, seriously!?).

Popular programming languages on Kaggle, taken from Kaggle presentation.

John Langford on his blog Hunch has an excellent article on the properties of a programming language to consider when working with machine learning algorithms titled “Programming Languages for Machine Learning Implementations”. He divides the properties into concerns of speed and concerns of programmability (programming ease). He points to powerful industry standard implementations of algorithms, all in C, and comments that he has not used R or MATLAB (the post was written 8 years ago). Take some time and read some of the comments by academics and industry specialists alike. This is a deep and nuanced problem that really comes down to the specifics of the problem you are solving and the environment in which you are solving it.

Machine Learning Languages

I think of programming languages in the context of the machine learning activities I want to perform.

MATLAB/Octave

I think MATLAB is excellent for representing and working with matrices. As such, I think it’s an excellent language or platform to use when climbing into the linear algebra of a given method. I think it’s suited to learning about algorithms, both superficially the first time around and deeply when you are trying to figure something out. For example, it’s popular in university courses for beginners, like Andrew Ng’s Coursera Machine Learning course.

R

R is a workhorse for statistical analysis and, by extension, machine learning. Much is said about the learning curve, but I didn’t really see the problem. It is the platform to use to understand and explore your data using statistical methods and graphs. It has an enormous number of machine learning algorithms, including advanced implementations written by the developers of the algorithms themselves.

I think you can explore, model and prototype with R. I think it suits one-off projects with an artifact like a set of predictions, a report or a research paper. For example, it is the most popular platform in machine learning competitions such as those on Kaggle.

Python

Python is a popular scientific language and a rising star for machine learning. I’d be surprised if it can take the data analysis mantle from R, but matrix handling in NumPy may challenge MATLAB, and communication tools like IPython are very attractive and a step into the future of reproducibility.

I think the SciPy stack for machine learning and data analysis can be used for one-off projects (like papers), and frameworks like scikit-learn are mature enough to be used in production systems.

Java-family/C-family

Implementing a system that uses machine learning is an engineering challenge like any other. You need good design and developed requirements. Machine learning is algorithms, not magic. When it comes to serious production implementations, you need a robust library or you customize an implementation of the algorithm for your needs.

There are robust libraries; for example, Java has Weka and Mahout. Also, note that the deeper implementations of core algorithms like regression (LIBLINEAR) and SVM (LIBSVM) are written in C and leveraged by Python and other toolkits. I think if you are serious, you may prototype in R or Python, but you will implement in a heavier language for reasons such as execution speed and system reliability. For example, the backend of BigML is implemented in Clojure.

Other Concerns

  • Not a Programmer: If you are not a programmer (or not a confident programmer), I recommend playing with machine learning via a GUI like Weka.
  • One Language for Research and Ops: You may want to use the same language for prototyping and for production to reduce the risk of not effectively transferring the results.
  • Pet Language: You may have a pet language or favorite language and want to stick to that. You can implement algorithms yourself or leverage libraries. Most languages have some form of machine learning package, however primitive.

The question of machine learning programming language is popular on blogs and question and answer sites. A few choice discussions include:

What programming language do you use for machine learning and data analysis, and why do you recommend it?

I’m keen to hear your thoughts, so leave a comment.

69 Responses to Best Programming Language for Machine Learning

  1. jmgore75 June 6, 2014 at 11:49 pm

    I am admittedly new to ML but have recently had the opportunity to try it with R, python, and Matlab. You can divide up the problem into different parts. In all cases, it’s a good idea to go beyond the basic installation: for R, you want RStudio as an IDE; for python, IPython notebooks and several major libraries are a must; and Matlab is much nicer to work in than Octave.

    1. Data input, output, preprocessing, and postprocessing: Python, hands down. It’s all fine and good if you are just dealing with CSVs but that is often not the case, so in the real world python is quite handy. Frankly, there are few languages better at this than python, and it is surely a big part of its popularity.

    2. Pre-built algorithms: Looks like R, although python’s scikit-learn is better organized.

    3. Novel algorithms: Still probably R.

    4. Plotting: All have multiple excellent plotting packages. R is particularly broad.

    5. Exploration: R (with RStudio) or IPython are both very good. R is probably a bit better, since it handles matrices better. IPython makes it easy to record and rerun your efforts.

    6. Teaching: Matlab/octave has the most concise expression of matrix operations, so for many algorithms it is the one of choice. I kind of wonder about tree structures though.

    7. Sharing and dissemination: IPython notebooks are pretty nice and don’t require viewers to install anything. R vignettes are good if they have R and the proper libraries installed.

    8. Performance: I can’t really say for sure, as I have not properly tested. Python is the only one of the three in which out-of-core or online processing is particularly natural to express, thanks to generators, as far as I can tell. There are many interesting code performance initiatives in place for Python. Other languages should obviously perform better (C, java; as noted Julia is particularly interesting).
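
The generator point in item 8 can be sketched as follows. This is only an illustration under stated assumptions: `read_values`, `running_mean_variance` and the `height` column are hypothetical names, and an in-memory string stands in for a file too large to load at once. Each value is pulled through the generator one at a time and folded into a running mean and variance (Welford's method), so memory use stays constant however long the stream is.

```python
import csv
import io
from typing import Iterable, Iterator, Tuple


def read_values(handle: Iterable[str], column: str) -> Iterator[float]:
    """Yield one float at a time from a CSV stream; the data is never fully loaded."""
    for row in csv.DictReader(handle):
        yield float(row[column])


def running_mean_variance(values: Iterable[float]) -> Tuple[int, float, float]:
    """Welford's online algorithm: a single pass using constant memory."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    # Sample variance; defined as 0.0 here when fewer than two values were seen.
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# Stand-in for a large file on disk (hypothetical data); an open file handle
# could be passed to read_values in exactly the same way.
sample = io.StringIO("height\n1.0\n2.0\n3.0\n4.0\n")
count, mean, variance = running_mean_variance(read_values(sample, "height"))
```

The same pattern extends to online learning: replace the statistics update with a model update applied to each row as it is yielded.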

    • jasonb June 7, 2014 at 7:03 am

      Really great comments, thanks. R is my go-to platform when I’m looking to get the most out of a problem.

      I’ve explored using theano with Python on GPUs and played a lot with various parallel packages on R to get speed-ups. In the end, I’ve found rolling my own implementation the best when speed is the highest priority.

  2. Lifestyle Service Agency November 18, 2014 at 4:39 pm

    Thanks for the article! Weka is now a part of our toolkit.

  3. Mark Szlazak March 1, 2015 at 8:25 am

    Another language to consider is Lua. Specifically the LuaJIT implementation with Torch7. This is what Google and Facebook AI groups use, probably because they hired folks from Yann LeCun’s lab. Torch7 has been extended further with more ML stuff produced at Facebook and they have made it available to the public. Probably check out stuff on why Lua/LuaJIT over Python and LuaJIT’s interface with C code. Also, LuaJIT is used a lot by gamers and I heard LuaJIT (or was it Lua) will replace ActionScript in Adobe’s products.

  4. Frank August 26, 2015 at 1:26 pm

    Hi Jason, thanks for your nice introduction. Do you have any good books on machine learning in C?

    • Jason Brownlee August 26, 2015 at 6:58 pm

      Sorry, no, not off the top of my head. I can say that there are great libs written in c like libsvm that are often used via wrappers in python or R. Learning the native lib in c might be a fun experience!

  5. Portella October 8, 2015 at 12:18 am

    Hi Jason. I’m new to machine learning. I’ve gone through the AI online course from Berkeley and plan to go through Yaser Abu-Mostafa “Learning from Data”. It is a language agnostic course however, which, according to what was stated in some reviews, demands intense effort in implementing algorithms by ourselves, without guidance. I like this approach, since it really forces one to research and deal with real challenges of implementation, not just concepts. The problem is that my language of choice, for other reasons, is C#, which I don’t see listed, here and elsewhere, among used languages for machine learning. I have limited experience with python, from the AI and linear algebra courses, which made most of the framework available.
    The question is: how far apart is C# from Python, in terms of libraries useful for machine learning? How would it compare to Java, in the same terms?
    Should I use a language like Python to develop machine learning code and make it interact with C# code, considering it will continue to be my main developing language? What about Accord.Net? Is it any good?

  6. Will Dwinnell April 2, 2016 at 3:11 pm

    You make several good points about context. I would add that there is a dimension which runs from “scripting” (summoning existing machine learning routines) to “programming” (writing the machine learning routines oneself). Some languages lend themselves more to one of these operations more than the other. In SAS, for instance, analysts tend to call existing SAS “procs”: They are not writing logistic regression from scratch.

    If a script-writing analyst and I fit the same model form to the same data, we will get the same model parameters. The differences are that I know how and why that modeling process works (and when it won’t), and I can modify it directly when needed.

  7. Victor October 21, 2016 at 7:11 pm

    Without a doubt – Python.

    • Jason Brownlee October 22, 2016 at 6:56 am

      The Python ecosystem is growing fast and seeing great adoption.

      I tend to agree that Python is a force, Victor.

  8. Steeve Brechmann January 24, 2017 at 1:23 am

    A little update to this question 😉

    Python is leading the way.

    http://www.kdnuggets.com/2017/01/most-popular-language-machine-learning-data-science.html

  9. Nandhini February 19, 2017 at 11:59 am

    Hi Jason,
    When I started a Data Science course, I had two choices: Python or R. As I have always had a passion for programming, I chose Python and worked with it throughout the course. Though the course series preferred R for time series, I was following your blog on time series using Python.
    Some friends are suggesting Andrew Ng’s course on Coursera as a next step. But I felt that, as a newbie to the machine learning field, I should stick to one language and get used to various algorithms using it. Once comfortable, I can explore more in R and MATLAB.

    What do you Suggest?

    • Jason Brownlee February 20, 2017 at 9:26 am

      Sounds fine Nandhini, generally, you should get comfortable jumping from tool to tool or platform to platform, but not when starting out.

      Regardless of tool, the skill to focus on is working through predictive modeling problems end to end and delivering a result (model or set of predictions).

  10. Paulo August 18, 2017 at 8:23 pm

    Hi Jason,

    At this moment I’m using scikit-learn in production and it’s working with very good performance.

    I recommend scikit-learn.

  11. César Souza August 19, 2017 at 7:38 am

    Even I (the author of Accord.NET, mentioned a few comments above) use scikit-learn on a daily basis for production use at work. However, if for any reason you or any of your blog readers would like to use machine learning in contexts where Python just wouldn’t be available (such as embedded devices through Xamarin, UWP apps or even Java), please give Accord.NET a try.

    If you find issues in your application, or something that you believe should have been done better, register it at the project’s issue tracker and it should be taken care of in no time. The goal of this project is also to address platforms which have not historically been served very well by Python-only implementations.

  12. dell September 3, 2017 at 11:03 pm

    Hi Jason, how to implement competitive learning algorithm using R? thanks for your time.

  13. Levi September 27, 2017 at 6:15 pm

    Hi Jason,

    I guess we should take a look at the latest poll (to my best knowledge) from Kaggle: http://www.kdnuggets.com/2017/01/most-popular-language-machine-learning-data-science.html

    And notice that, yes, Python did take the lead.

  14. Roberto mariani November 17, 2017 at 10:55 pm

    In C++, I found dlib, and it comes with tons of very well commented examples. You can also run them on GPU.

  15. Ritesh January 7, 2018 at 10:00 am

    What about Microsoft Azure Machine Learning? I am new to the ML domain. What if I start with Azure ML? I do not have any knowledge of R or Python. Please advise.

  16. Jesús Martínez April 4, 2018 at 12:19 am

    I think that as happens with many other computer science fields, there’s an excessive focus on the language and the tools when the important stuff, really, is to know the theory well. I agree that there are languages with a richer ecosystem for data science and machine learning, as is the case of Python and R, but I think the domain of the project you want to start will lead you to a particular set of tools that are better suited for the demands of that endeavor. For instance, if you need to work with massive amounts of data, you’ll be better off using Apache Spark and Spark MLLib than, say, sklearn 🙂

    What do you think? I’d love to know!

    Thanks for the article!

  17. naima April 26, 2018 at 4:58 pm

    Shall I implement machine learning algorithms in JavaScript? Please guide me. How can I implement it?

    • Jason Brownlee April 27, 2018 at 6:01 am

      Perhaps for fun and for learning, but I would not recommend using JavaScript to implement machine learning to solve business problems. I cannot see how it could be justified.

      • Michael Vogt May 7, 2018 at 11:34 am

        I think this depends on the business. When you’re talking about Enterprise, I absolutely agree.

        But with tensorflow.js now available, having AI functionality available on phones without network dependency opens up lots of new applications for machine learning.

        • Jason Brownlee May 8, 2018 at 6:08 am

          Heavy compute is not something you want on a hand held. It kills battery.

  18. Ravi Salunkhe May 12, 2018 at 12:38 pm

    I’d probably stick to R and Java.

  19. Shalmali Bapat June 9, 2018 at 4:00 pm

    As a software developer, I know the IT world is itself quite dynamic. With new and upcoming changes in programming languages, frameworks and technologies, language trends are ever changing. We developers must keep up with these changes, so I was looking to learn some languages that will be beneficial for my future. Thank you.

  20. priya June 27, 2018 at 3:28 pm

    Java, Python, Lisp, Prolog, and C++ are the major programming languages used for artificial intelligence, capable of satisfying different needs in the development and design of different software. It is up to the developer to choose which of the AI languages will satisfy the desired functionality and features of the application requirements.

  21. Vivek June 28, 2018 at 11:39 pm

    Hi Jason,

    I have just started exploring ML but i am planning to prepare for the new offering from Oracle AI Platform for which many details are not available but it is mentioned that it will be supporting Keras, Caffe and TensorFlow.

    Shall I start exploring Python or R?

  22. Aritra Chatterjee July 6, 2018 at 2:08 pm

    It’s been 5 years, and it’s an entirely different picture now. Python has replaced R in the above images.

  23. ramakrishnan July 12, 2018 at 4:43 pm

    Thanks for sharing this information. The programming languages are very important to improve machine learning.

  24. shubhangi July 17, 2018 at 2:16 am

    Can you please give me detailed information on machine learning and data science using Python? Which one is better?

  25. Julia January 10, 2019 at 12:23 am

    Julia, She crushes them all. 🙂

  26. Abbas April 24, 2019 at 1:27 pm

    Hi, your blogs are really helpful. I have a question: how can I compare machine learning results output from different platforms? For example, suppose I have results of some models in Python. I don’t know Python, but I want to compare them with the results of my own model, which is written in Java. Is there any way to do that?

    • Jason Brownlee April 24, 2019 at 1:59 pm

      Perhaps output predictions to a file, then use a new application to load predictions from each model/platform and perform comparisons?
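
One way this suggestion might look in practice is sketched below. It assumes each platform writes its predictions to a CSV file with `id` and `prediction` columns; that file layout and the function names `load_predictions` and `agreement` are hypothetical, and in-memory strings stand in for the real files. The comparison itself does not care which language produced each file.

```python
import csv
import io
from typing import Dict, TextIO


def load_predictions(handle: TextIO) -> Dict[str, str]:
    """Map each row id to its predicted label (assumed 'id' and 'prediction' columns)."""
    return {row["id"]: row["prediction"] for row in csv.DictReader(handle)}


def agreement(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Fraction of shared ids on which the two models predict the same label."""
    shared = set(a) & set(b)
    if not shared:
        raise ValueError("the two prediction files share no ids")
    same = sum(1 for key in shared if a[key] == b[key])
    return same / len(shared)


# Hypothetical outputs of a Python model and a Java model on the same four rows.
python_preds = load_predictions(io.StringIO("id,prediction\n1,cat\n2,dog\n3,cat\n4,dog\n"))
java_preds = load_predictions(io.StringIO("id,prediction\n1,cat\n2,dog\n3,dog\n4,dog\n"))
rate = agreement(python_preds, java_preds)
```

If true labels are available, the same loader can be reused to score each model's file against them with the same metric.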

      • Abbas April 26, 2019 at 11:29 am

        Thanks for your answer, but I wanted to ask a different question. When we use ML models, we need random numbers for sampling or initialization. Is it okay to just use different languages then? Do random number generators play a big role? If so, how can we achieve the same results on different platforms? Thanks in advance.

        • Jason Brownlee April 26, 2019 at 1:58 pm

          Small differences in the implementation of an algorithm across libraries can result in differences of results.

          I recommend using one tool to prepare the data that is then used across different languages, for consistency.
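
As a rough sketch of preparing the data once, the train/test split below is made in a single tool with a fixed seed, so that the resulting id lists can be saved and loaded unchanged from any other language. The function name `split_ids` and its parameters are hypothetical. Note the limits of this approach: it fixes which rows each platform sees, but it does not make random initialization inside each library's algorithms identical.

```python
import random
from typing import List, Sequence, Tuple


def split_ids(ids: Sequence[int], test_fraction: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle with a private, seeded generator and cut into train and test id lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1.0 - test_fraction)))
    return sorted(shuffled[:cut]), sorted(shuffled[cut:])


train_ids, test_ids = split_ids(range(10), test_fraction=0.3, seed=42)
# Save both lists (for example, one id per line) and load them from Java, R or
# Python, rather than re-running a shuffle in each language.
```

Writing the ids to a file, instead of sharing only the seed, matters because different languages use different random number generators and the same seed will not produce the same shuffle.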

          • Abbas April 26, 2019 at 4:44 pm

            Thank you so much Jason. Have a nice day.

  27. Patxi Funes May 18, 2019 at 11:33 pm

    Hi Jason,

    Just a little request: in the five years since you wrote this article there have been many changes. For instance, MATLAB has evolved substantially and Python has become a standard (even, I think, on your blog)… Do you think that this article needs a new version? Many thanks!!

  28. Anthony The Koala July 26, 2019 at 1:46 pm

    Dear Dr Jason,
    Suppose you write a program in Python involving libraries for machine learning. Is there a way of converting a Python-written program into another language like C or Java in order to improve the execution speed of the program?
    Thank you,
    Anthony of Sydney

    • Jason Brownlee July 26, 2019 at 2:20 pm

      There are many options.

      You can use cython to speed up python code.

      You can use c against the same or similar enough backend libs.

      You can re-write everything from scratch.

  29. Aravind February 21, 2020 at 9:57 pm

    I am a bioinformatician and I am interested in learning machine learning through genomics, but I need to know where to begin. Are the Biopython packages the best choice or not? And is this a valuable skill for getting a job in genomics via machine learning?

  30. Gaston March 5, 2020 at 7:13 am

    Swift is becoming big with S4TF becoming popular.

  31. Jean-Christophe Chouinard February 25, 2021 at 11:51 am

    Not sure about machine learning, but I know that learning Python was far more useful to me than R.

    R was limited in available resources, whereas Python’s community has produced more reusable code that you can build upon.

    Also, Python is more flexible. Once you learn it you can build websites, automate processes, even build robots if you want. R might let you do that, but you’ll have to build most of it because the community is not as active.

    Also, Python forces structure upon you. It might seem annoying at first, but it will help you write better code over time.

    Having learned R first, and Python after, I highly recommend Python.

  32. Shaun March 11, 2023 at 9:02 am

    Is Python still the undisputed King? How do all these automated and NoCode ML tools fit into everything now?

    • James Carmichael March 11, 2023 at 10:23 am

      Hi Shaun… Python is still the way to go. We believe that it is important to continue to build skill in Python for machine learning; however, it is also beneficial to understand emerging tools to increase efficiency.
