4 Self-Study Machine Learning Projects

There are many paths into the field of machine learning and most start with theory.

If you are a programmer then you already have the skills to decompose problems into their constituent parts and to prototype small projects in order to learn new technologies, libraries and methods. These are important skills for any professional programmer and these skills can be used to get started in machine learning, today.

Self Study
Photo by gfairchild, some rights reserved

You must learn the theory to be effective in machine learning, but you can let your interests and thirst for knowledge motivate you to work from practical examples toward a mathematical understanding of the algorithms.

In this post you will learn four strategies a programmer can follow to get started in machine learning. This is the path of the technician, which is practical and empirical and will require you to perform research and complete experiments in order to build up your own intuitions.

The four strategies are:

  1. Study a Machine Learning Tool
  2. Study a Machine Learning Dataset
  3. Study a Machine Learning Algorithm
  4. Implement a Machine Learning Algorithm

Read through these strategies and select one that you feel suits you the best, then execute with abandon.

1. Study a Machine Learning Tool

Select a tool or library that you like and learn how to use it well.

I recommend you start with an environment that provides tools for data preparation, machine learning algorithms and the presentation of results. Learning an environment like this will allow you to get good at the process of machine learning end-to-end, which is more valuable to you than learning a specific data preparation technique or machine learning algorithm.

Alternatively, perhaps you are interested in a specific technique or family of techniques. You could use this as an opportunity to dive deep into a library or tool that offers these methods, and master the technique by mastering the library that provides it.

Study a Machine Learning Tool
Photo by zzpza, some rights reserved

Some tactics you could follow for this strategy are:

  • Compare and contrast candidate tools from which you could choose.
  • Summarize the capabilities of your chosen tool.
  • Read and summarize the documentation for the tool.
  • Complete text or video tutorials for the tool and summarize the key learning points for each tutorial you complete.
  • Create tutorials for features or capabilities of the tool. Select things that you don’t know much about and write up a process for getting a result, or record a 5-minute screencast on how to use the feature.

Some environments you should consider include: R, Weka, scikit-learn, Waffles and Orange.

2. Study a Machine Learning Dataset

Select a dataset and understand it intimately and discover which algorithm class or type addresses it the best.

I recommend you select a modest-sized dataset that fits into memory and, ideally, has been well studied before. There are excellent libraries of datasets available for you to browse and choose from. Your objective is to understand the underlying problem that the dataset represents, the structure in the dataset and the types of solutions that are most suited to the problem.

Use a machine learning or statistical environment to study the dataset. This will allow you to focus on the questions you are seeking to answer about the dataset rather than being distracted with learning about a given technique and learning how to implement it in code.

Study a Machine Learning Dataset
Photo by abhidg, some rights reserved

Some tactics that can help you with your study of an experimental machine learning dataset are:

  • Clearly describe the problem that the dataset represents.
  • Summarize the data using descriptive statistics.
  • Describe the structures you observe in the data and hypothesize about the relationships in the data.
  • Spot test a handful of popular machine learning algorithms on the dataset and discover which general class performs better than the others.
  • Tune the well-performing algorithms and discover the algorithm and algorithm configuration that performs well on the problem.
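The summary step above can be sketched with nothing more than the Python standard library. This is a minimal illustration, not a recommended workflow: the rows, the feature names and the helper functions `summarize` and `majority_baseline_accuracy` are all made up for the example, and in practice you would load a real dataset into a statistical environment instead. The majority-class baseline gives you a floor to compare against when you spot test algorithms.

```python
import statistics
from collections import Counter

# Hypothetical toy dataset: (height_cm, weight_kg, class_label).
# The values are invented for illustration; substitute rows loaded from a real file.
rows = [
    (5.1, 1.4, "A"),
    (4.9, 1.5, "A"),
    (5.0, 1.3, "A"),
    (6.4, 4.5, "B"),
    (6.0, 4.0, "B"),
    (5.8, 4.2, "B"),
]
feature_names = ["height_cm", "weight_kg"]


def summarize(rows, feature_names):
    """Return per-feature descriptive statistics and the class distribution."""
    summary = {}
    for i, name in enumerate(feature_names):
        column = [row[i] for row in rows]
        summary[name] = {
            "min": min(column),
            "max": max(column),
            "mean": statistics.mean(column),
            "stdev": statistics.stdev(column),  # sample standard deviation
        }
    class_counts = Counter(row[-1] for row in rows)
    return summary, class_counts


def majority_baseline_accuracy(class_counts):
    """Accuracy of always predicting the most common class."""
    return max(class_counts.values()) / sum(class_counts.values())


summary, class_counts = summarize(rows, feature_names)
for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
print(dict(class_counts), "baseline accuracy:", majority_baseline_accuracy(class_counts))
```

Any algorithm you spot test should beat the baseline accuracy printed on the last line; if it does not, that is itself a finding worth writing down.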

Some repositories of high-quality datasets you may like to consider are: UCI ML Repository, Kaggle and data.gov.

3. Study a Machine Learning Algorithm

Select an algorithm and understand it intimately and discover parameter configurations that are stable across different datasets.

I recommend that you start with an algorithm of modest complexity. Select an algorithm that is well understood, has many open source implementations for you to choose from and has few parameters for you to explore. Your objective is to build up intuitions for how the algorithm performs across a range of problems and parameter configurations.

Use a machine learning environment or library. This will allow you to focus on the behaviors of the algorithm as a “system” as opposed to concerning yourself with canonical mathematical descriptions and reference literature.

Study a Machine Learning Algorithm
Photo by Unhindered by Talent, some rights reserved

Some tactics you can use when studying your chosen machine learning algorithm are:

  • Summarize the parameters of the system and the expected influences they have on the algorithm.
  • Select a range of datasets suited to the algorithm that are likely to elicit varied behaviors.
  • Select algorithm parameter configurations that you believe will elicit varied behaviors from the system and list the behaviors you may expect from the system.
  • Consider the behaviors of an algorithm that could be monitored as the algorithm is run over iterations of the algorithm’s update process or over some other interval of time.
  • Design small experiments using one or more combinations of datasets, algorithm configurations and behavior measures in order to answer a specific question and report results.
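A small experiment of the kind described above might look like the following sketch. It assumes k-nearest neighbor classification as the algorithm under study, uses hypothetical synthetic two-class datasets whose overlap is controlled by a `spread` parameter, and measures holdout accuracy for a few values of `k`. All function names here are invented for the example, and the k-NN code is written out only to keep the sketch self-contained; in a real study you would call the implementation in your chosen library.

```python
import random
from collections import Counter


def make_blobs(seed, spread, n_per_class=40):
    """Generate a hypothetical two-class 2-D dataset; a larger spread means more class overlap."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            point = (rng.gauss(center[0], spread), rng.gauss(center[1], spread))
            data.append((point, label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k):
    """Predict by majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def holdout_accuracy(data, k, train_fraction=0.7):
    """Train on the first part of the (already shuffled) data and score on the rest."""
    split = int(len(data) * train_fraction)
    train, test = data[:split], data[split:]
    correct = sum(knn_predict(train, point, k) == label for point, label in test)
    return correct / len(test)


def run_experiment(spreads=(0.5, 1.0, 1.5), ks=(1, 5, 15)):
    """Measure accuracy for every (dataset spread, k) combination; odd k avoids tied votes."""
    results = {}
    for spread in spreads:
        data = make_blobs(seed=42, spread=spread)
        for k in ks:
            results[(spread, k)] = holdout_accuracy(data, k)
    return results


results = run_experiment()
for (spread, k), accuracy in sorted(results.items()):
    print(f"spread={spread} k={k} accuracy={accuracy:.2f}")
```

The fixed seed keeps the experiment repeatable, so a change in the reported accuracy can be attributed to the parameter you varied rather than to a different random sample.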

Your studies can be as simple or as complex as you like. At the higher end you can explore so-called heuristics or rules of thumb for applying algorithms and empirically demonstrate whether they have merit and, if so, under what circumstances they correlate with successful outcomes.

Some algorithms you may consider starting with include: least squares linear regression, logistic regression, k-nearest neighbor classification and the perceptron.

4. Implement a Machine Learning Algorithm

Select an algorithm and implement or port an existing implementation to a language of your choice.

Select an algorithm of modest complexity to implement. I recommend performing some detailed research on the algorithm you wish to implement, or selecting an implementation you like and porting it to your chosen target programming language.

Implementing an algorithm by hand from scratch is a great way to learn about the myriad of micro-decisions that have to be made in transforming an algorithm description into a functioning system. By repeating this process with multiple algorithms you will quickly gain an intuition for how to read the mathematical descriptions of algorithms in research papers and books.

Implement a Machine Learning Algorithm
Photo by Nic’s events, some rights reserved

Five tactics that may help you when implementing machine learning algorithms from scratch are:

  • Start by porting. Porting an open source algorithm implementation from one language to another will teach you how the algorithm is implemented and make it your own. It is the fastest way to get started and is highly recommended.
  • Select one algorithm description to work from and collect other algorithm descriptions to help you disambiguate the primary reference material.
  • Do not be afraid to reach out to algorithm authors, paper authors or even algorithm implementation authors to ask questions to help you disambiguate your understanding of the algorithm description.
  • Read lots of implementations of your target algorithm. Learn how different programmers interpret the algorithm description and turn it into code.
  • Do not get caught up on advanced methods. Many machine learning algorithms use advanced optimization methods in their core. Do not try to reimplement these methods unless that is the point of your project. Use a library that provides an optimization algorithm or use a simpler optimization algorithm that is easy to implement (like gradient descent) or is available to you in a library.
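To illustrate the last tactic, here is a minimal sketch of least squares linear regression with a single input, fit by plain batch gradient descent rather than an advanced optimizer. The function name, the learning rate and the number of epochs are assumptions chosen for this toy data, not recommended defaults.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = slope * x + intercept by batch gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example under the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = 2.0 / n * sum(error * x for error, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1, so the fit should recover those coefficients.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_linear_regression(xs, ys)
print(f"slope={slope:.4f} intercept={intercept:.4f}")
```

Even in a sketch this small you meet the micro-decisions mentioned earlier: how to initialize the parameters, how large a step to take and when to stop. A learning rate that is too large for the scale of your inputs will make the updates diverge, which is a useful behavior to observe for yourself.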

Small Projects Methodology

The four strategies belong to a methodology I call “small projects”. It is an approach you can use to very quickly build up practical skills in technical fields of study, like machine learning. The general idea is that you design and execute small projects that each target a specific question you want to answer.

Small projects are small in a few dimensions to ensure that they are completed, that you extract the learning benefits and that you move on to the next project. Below are constraints you should consider imposing on your projects:

  • Small in time: A project should not take any longer than 5-15 hours from inception to presentation of results. This will allow you to complete a small project in a week of nights and weekend time away from your 9-5 job.
  • Small in scope: A project should address the narrowest version of the question you are interested in that is still meaningful. For example, rather than addressing the problem “write a program that will tell me if a tweet will be retweeted” in the general case, address the problem just for a specific Twitter account for a given time period.
  • Small in resources: A project should be able to be completed on your desktop or laptop with a connection to the internet. You should not need exotic software, web infrastructure, or third-party data or services. Save the data you need to a file, load it into memory and attack your narrow question using open source tools.

Additional Project Tips

The principle of these strategies is to take action and make use of your programmer skill set. Below are three tips to help you adjust your mindset in order to take action:

  • Write down what you learn. I recommend that you have a tangible work product for every step you take. This could be a note in a journal, a tweet, a blog post or an open source project. Each work product acts as an anchor and a milestone.
  • Do not write code unless that is the purpose of the project. This tip is not obvious but may be the biggest in terms of accelerating your understanding of machine learning.
  • The goal is for you to learn something, not to create a unique resource. Perhaps no one will read your studies, tutorials or notes on an algorithm; ignore this for now. They are your perspective and your work product, and they demonstrate that you now know something.

Summary

Here are the four strategies again with a clear one-liner for each to help you choose the one that is right for you.

  1. Study a Machine Learning Tool: Select a tool or library that you like and learn how to use it well.
  2. Study a Machine Learning Dataset: Select a dataset and understand it intimately and discover which algorithm class or type addresses it the best.
  3. Study a Machine Learning Algorithm: Select an algorithm and understand it intimately and discover parameter configurations that are stable across different datasets.
  4. Implement a Machine Learning Algorithm: Select an algorithm and implement or port an existing implementation to a language of your choice.

Pick One!

Which strategy would you choose and what will be your first step? Pick one and declare your intentions in a comment below.

115 Responses to 4 Self-Study Machine Learning Projects

  1. Christal-yhy January 11, 2014 at 11:43 pm

    I am trying to learn the machine learning, but i do not know how to start it

  2. halfcrazy February 21, 2014 at 8:10 am

    Brilliant

    • jasonb February 21, 2014 at 8:24 am

      Thanks, I’m glad you like it. Let me know if you take on a project.

      • Bright April 14, 2014 at 10:29 pm

        i like it, i pick 3

  3. Ben May 8, 2014 at 10:45 am

    Very useful post, Thanks json!!!

    • jasonb May 8, 2014 at 12:03 pm

      Glad to hear it Ben, thanks.

  4. Muhammad Masood May 21, 2014 at 10:36 am

    Hi,

    Very useful and informative. I usually follow the similar pattern and always try to port first because it gives me confidence.

    Thanks

    • jasonb May 21, 2014 at 1:47 pm

      Great tip, thanks Muhammad.

  5. Laurent June 28, 2014 at 4:59 pm

    Great post Jason. Very useful set of tips. Thanks

  6. anthony August 17, 2014 at 10:18 pm

    Good stuff…

    Another small project is to scale a section of code — recently ran into scenario where sample/training/experimenting data set worked “fine” but when working with full data was too slow to be of use. So, focused simply on efficiency of something known to work and produce desired outcomes.

    cheers

  7. Anshul August 24, 2014 at 3:01 am

    Thanks Jason for such lucid description of such an interesting domain like Machine Learning. I’m mulling on all the four ways but hopefully I’m feeling starting with 1 and transitioning into 4 will be really interesting… Thanks a lot for the insight!

  8. Andreas February 6, 2015 at 11:26 pm

    Thanks for the reference in your email. I really like the small project methodology, I will definitely try it.

    Your insight that you should not write code unless it is the purpose of the project might get me over my obsession to always create reusable code. I hope it works!

    Thanks for the tips.

    • Jason Brownlee February 7, 2015 at 6:27 am

      Thanks Andreas, good luck mate!

      • Sunday March 10, 2023 at 6:02 pm

        This write up is helpful and I am encouraged that I can achieve my goal in ML. Thank you, Jason.

        • James Carmichael March 11, 2023 at 7:53 am

          You are very welcome Sunday! We appreciate the feedback and support.

  9. Henok(Ethiopia) April 3, 2015 at 9:08 pm

    Nice tips!!!! Many thanks!Nice

  10. kavs June 2, 2015 at 8:03 pm

    Great post..Thank u very much!

  11. haccks June 10, 2015 at 7:11 am

    I was lost . Thanks for an eye opener article.

  12. simran October 2, 2015 at 7:17 pm

    It is a very nice n useful
    article ..
    can u pls mail me a project on pattern recognition and machine learning preferably in python language
    I need this for a reference to my project…

    It will be very helpfull

    Thanks and regards

  13. Tweetman October 3, 2015 at 6:48 pm

    Great post I am moving towards 2

  14. Gianfranco November 7, 2015 at 10:27 pm

    Hi,
    Really great post thanks exactly what I was looking for. Moreover, I have just bought the “Small Project Methodolgy” but I have not received any download link . Please provide it.

    Regards
    Gianfranco

  15. Aman Tandon November 12, 2015 at 1:06 pm

    your every writing is awesome.

  16. MASUME November 20, 2015 at 8:42 pm

    Hello Jason
    your website is Awesome!! and i really utilize and enjoy your posts. Thanks

  17. Tebziro January 5, 2016 at 6:56 am

    Super informative stuff. Major Thanks!!!

  18. Ankit March 1, 2016 at 9:51 pm

    Helpful and the links are great places to learn from.

  19. karan March 10, 2016 at 3:26 am

    I am a begineer.Please help me decide a project

  20. Archana Chauhan March 11, 2016 at 6:25 am

    Hey Thanks.. I need the compiler dataset for my project which is “Compiler optimization using machine learning”. I have searched everywhere.. i didn’t find it… If you know then Help me please.. it’ll really helpful

  21. HoaLoThai March 22, 2016 at 1:20 pm

    I want to install one server to serve the research H2O machine learning
    are looking forward to a little help from you
    thank

    • ij888 October 23, 2016 at 5:25 pm

      Hey, have you made progress with your H2O machine learning research? How is it coming along?

  22. Shreya May 28, 2016 at 6:31 pm

    this will be my first ml project, that i’m doing by myself, the tools i decided to focus on are scikit-learn, and weka (java) i’m think i’ll build a spam filter as i am a beginner,
    which one would you advice python or java?
    i know both languages pretty well, thanks.

    • Pranjal Saxena January 9, 2017 at 2:02 pm

      python!
      Because it has vast libraries to work on!

  23. anum June 4, 2016 at 3:52 am

    can u please suggest me a small project on weka

  24. Jim Kitzmiller June 24, 2016 at 10:53 am

    This is very helpful, Jason. Thank you very much.

    I chose Weka.

  25. Usman July 8, 2016 at 10:07 pm

    Hi Jason,

    MachingLearningMastery.com show us how to be better humans.

    By sharing.

    You have advancing man and machine tremendously.

    May your children learn from you.

  26. Abdulkabir Ojulari July 14, 2016 at 1:08 pm

    Please I’m a novice, nevertheless willing to learn. I just want to enroll for my PhD and want something related to machine learning. I don’t know if anyone can help me with the research topics.
    This is a topic I have in mind; “Market value of a product using Machine Learning techniques” I don’t know if it is qualified as a topic for PhD in the field of machine learning.
    My aim is to study a particular product for the period of time and determine the future demands of such product based on materials, patronage, price etc

    • Jason Brownlee July 14, 2016 at 2:00 pm

      The best person to talk to about phd topics is your advisor.

  27. Grace July 19, 2016 at 11:46 pm

    Hello,
    Highly useful post.I am a beginner in machine learning and was looking forward for a proper strategy to follow.Now I feel greatly helped through this amazing post of yours.Thanks so much!:):):):)

  28. Eudie July 20, 2016 at 4:30 pm

    Thank you Jason, You are awesome. I will go with “Study a Machine Learning Algorithm” and that would be perceptron.

  29. Khalid Ibrahim July 28, 2016 at 4:34 pm

    Great effort Jasn;
    Anticipated Thanks. I have started learning weka.

  30. Navid Khoob August 15, 2016 at 1:57 am

    Hello Everyone and Dear Admin (Mr. Jason Brownlee)

    I am a master’s degree student in Electronics but by making use of Machine Learning Algorithm for Data Fusion. I was looking for some ans and fortunately found very useful topics in your webpage. Now, I have a question. I am working on a research project which is addressed in supervised learning structure and It is basically a Classification problem using Ensemble learning system that combines base classifiers in belief function framework (Dempster-Shafer Theory). I am looking for an applicable database compatible with my project for handling data with imperfect labels. Could you suggest a suitable database for my work that it would be new and challenging in trends ?

    Thank You in Advance

  31. Amit September 18, 2016 at 4:00 am

    Hi Jason could please help me in data fusion domain I wanted to implement one of the machine learning algorithms, any good reference. Can mail if possible?

    • Jason Brownlee September 18, 2016 at 8:01 am

      Sorry Amit, I don’t know about data fusion.

  32. Nill November 2, 2016 at 4:27 am

    highly helpful this post

  33. Mohammad Sami Usmani November 6, 2016 at 1:25 am

    Very helpful , thankyou!

  34. vishwvir December 8, 2016 at 4:42 am

    I want to build a project in machine learning please guide me any good or simple topic.

  35. Sergei S May 30, 2017 at 10:00 pm

    “Do not write code unless that is the purpose of the project” – could you please elaborate on this? Not clear for me.

    • Jason Brownlee June 2, 2017 at 12:35 pm

      Use machine learning libraries and other libraries as much as possible, do not code things from scratch unless you want to – to learn how to.

  36. Sagar Sarkar July 10, 2017 at 1:36 am

    Study a Machine Learning Tool:

  37. Abdulmahmoud Umar Adam July 16, 2017 at 5:38 pm

    I picked number 1: study a machine learning tool.

    • Jason Brownlee July 17, 2017 at 8:45 am

      Very nice!

      • Sanit Rajula September 10, 2017 at 10:17 pm

        This article is very useful and informative.
        I am a student and i chose to do project on text classifiers, could you mail me an example of text classifiers so that i can use it as reference for my project.
        It is quite tough understanding the algorithm and implementing it, so it would be a great help if u have some links where i can study about this.

        • Jason Brownlee September 11, 2017 at 12:07 pm

          I will have posts on text classification on the blog soon. They are scheduled.

  38. Shreya October 12, 2017 at 9:13 pm

    I checked how to make a small naive bayes code to fit a classifier and predict it but in that I only gave a small array of features to fit and label and the accuracy was also based on that. How do I use the UCI ml datasets for fitting a classifier.

    I will be very thankful if you can please help me. I am a novice in Machine Learning.

    Thanks

  39. Saraa November 23, 2017 at 3:03 am

    I pick the last one. I work on RNN and want to implement it.

  40. Muruganand December 12, 2017 at 1:07 am

    Now I have more confident about ML and very keen to learn ML . Please keep post more like this.

  41. Mohammed Ali January 23, 2018 at 1:02 am

    can you email a small project problem for the starters

  42. Jagruti February 3, 2018 at 4:55 am

    I have to do six month research for my last semester in college . I want to do that in text summarization in machine learning using python. I did followed some tutorials and have an idea about machine learning algorithm. Can you please mail me some projects for text summarization ? How should I proceed with the research?

  43. Jesús Martínez February 13, 2018 at 3:32 am

    I think the strategy that I find the most appealing is picking some dataset that looks promising/interesting and start from there. I also like the approach of implementing an algorithm from scratch because it is the software equivalent of disarming a device to know its inner pieces and how they work and then assembling it back again!

    Do you still use the approach described in this post even though you have many years of experience? Or have you developed quicker/better methodologies to grasp concepts more efficiently?

    • Jason Brownlee February 13, 2018 at 8:06 am

      Good question.

      Yes, I still code things from scratch to understand them. I still apply algorithms to datasets in order to learn how to use them effectively.

  44. Palak Gupta February 13, 2018 at 6:21 am

    I like your article very much . I am new to ML want to know about the tools and dataset to be used . I want to create a mini project on ML. I want to work using python language . Please guide me for this and provide me the basic ideas .
    Thank you , for sharing .

  45. Amit Kotkar May 7, 2018 at 6:02 pm

    You can find a list of machine learning projects here : https://deeplink.ml/projects/

  46. shiv June 14, 2018 at 7:25 am

    awesome work jason

  47. Ahmed July 25, 2018 at 4:58 pm

    Thanks a lot for providing us these strategies .
    I think I’m choosing the first one : Study a Machine Learning Tool.
    because I’m trying to find small real project that I can walk through using a library to understand how the machine learning really works.

  48. Jinesh Jain July 29, 2018 at 4:02 am

    thanks

  49. gouri October 24, 2018 at 5:08 pm

    Thank you so much. Feeling so motivated after reading the post. Each and every bit of the post is so precious! Thanks again.

  50. Jon November 1, 2018 at 2:25 pm

    It’s gorgeous!!!Jason…Btw I like your writting style a lot!

  51. Maheedar November 14, 2018 at 4:51 am

    Amazing website! <3

  52. Rafiu Mope Isiaka April 8, 2019 at 3:14 am

    Thank Jason for this great leverage. I will start with the number 3 option.

  53. Luciana April 13, 2019 at 6:37 am

    Thank you for all the information, you have no idea how useful are all your post and how much I’ve been learning thanks to you.

  54. Naseem Ansari November 12, 2019 at 11:09 pm

    Nice and very helpful article. It gave me a path to follow. I will work on all the four strategies and will update soon

  55. Akash Gupta November 20, 2019 at 11:38 am

    I like it. I pick 4th.

  56. Shruti pandey January 8, 2020 at 3:21 am

    Thank you so much Jason. You just gave me a path to follow and I will keep on coming to these steps over and over again until I learn the basics of all four.

    Btw, wanted to write one thing. There is a person named Jason Stephenson who had some wonderful guided meditation and helped me calm my anxiety and the next Jason (you) has helped to calm my anxiety related to Machine Learning.

    Thank you so much.

  57. Rajasekhar September 4, 2020 at 1:24 am

    You are such a wonderful guide Jason. Thanks for this valuable information.

  58. Anil Singh September 30, 2020 at 7:11 am

    Hi Jason, this is one of the best posts I have read for people wants to get into Machine learning.

    I was struggling to understand what do I need to learn to start understanding machine learning but you have made so clear.

    I am a lot more clear now.

    Thanks again for such a wonderful post.

  59. F. M. Shakirullah June 3, 2021 at 3:25 am

    Dear Jason, thanks for your nice post. Really you every post is very helpful for me.
    I am interested to start with option 2 that is studying a Machine Learning Dataset.
    Best regards

    • F. M. Shakirullah June 3, 2021 at 3:30 am

      * Your every post is helpful for me. Thank you Jason.

    • Jason Brownlee June 3, 2021 at 5:38 am

      Thanks!

  60. Anandan Subramani January 19, 2022 at 9:19 pm

    Very good suggestion,
    specially laying out a project plan at the outset that is measurable in objective, duration, scope etc
    and a tangible definition of expected outcome.
    Thank you, Jason

    • James Carmichael January 20, 2022 at 7:55 am

      You are very welcome Anandan! Keep up the great work!

  61. kavana May 23, 2023 at 11:10 pm

    very helpful, thank you.

    • James Carmichael May 24, 2023 at 8:35 am

      You are very welcome kavana!

  62. Matthew Cobbinah June 17, 2023 at 2:05 am

    Very relevant

    • James Carmichael June 17, 2023 at 10:54 am

      Thank you Matthew for your feedback and support! We appreciate it!
