Last Updated on September 5, 2016
Real-world examples make the abstract description of machine learning concrete.
In this post you will go on a tour of real-world machine learning problems. You will see how machine learning is used in fields like education, science, technology and medicine.
Each machine learning problem listed also includes a link to the publicly available dataset. This means that if a particular machine learning problem interests you, you can download the dataset and start practicing immediately.
Most Popular Kaggle Datasets
These first 10 examples of machine learning problems were taken from the competitive machine learning website Kaggle.com. Popularity was based on the number of participating teams.
- Otto Group Product Classification Challenge. Given product features, classify products into one of 9 product categories.
- Rossmann Store Sales. Given historical sales data for products across stores, forecast future sales.
- Bike Sharing Demand. Given hourly bike rental and weather records, predict future hourly bike rental demand.
- The Analytics Edge. Given details of New York Times articles, predict which newspaper articles will be popular.
- Restaurant Revenue Prediction. Given the details of a restaurant site, predict the revenue of the restaurant in a given year.
- Liberty Mutual Group: Property Inspection Prediction. Given the details of inspected properties, predict a hazard score for each property.
- Springleaf Marketing Response. Given features of customers, predict whether they are a marketing target or not.
- Higgs Boson Machine Learning Challenge. Given the description of simulated particle collisions, predict whether an event is a Higgs boson decay signal or background.
- Forest Cover Type Prediction. Given cartographic variables, predict forest cover type.
- Amazon.com Employee Access Challenge. Given historical resource access changes for employees, predict the resources required by employees.
Most Popular Research Datasets
The next 12 machine learning problems are the most popular on the University of California, Irvine (UCI) Machine Learning Repository, a website that has traditionally hosted the datasets used by the machine learning research community.
- Iris dataset. Given flower measurements in centimeters, predict the species of iris.
- Adult dataset. Given census data, predict whether an individual will earn more than $50,000 a year.
- Wine dataset. Given a chemical analysis of wines, predict the origin of the wine.
- Car evaluation dataset. Given details about cars, including their estimated safety, predict the overall acceptability of the car.
- Breast Cancer Wisconsin dataset. Given the results of a diagnostic test on breast tissue, predict whether the mass is malignant or benign.
- Abalone dataset. Given physical measurements of an abalone, predict its age.
- Wine Quality dataset. Given various measurements of wine, predict the quality of the wine.
- Heart Disease dataset. Given the results of various diagnostic tests on a patient, predict the presence of heart disease in the patient.
- Poker Hand dataset. Given a database of poker hands, predict the quality of the hand.
- Human Activity Recognition Using Smartphones dataset. Given smartphone movement data, predict the type of activity performed by the person holding the smartphone.
- Forest fires dataset. Given meteorological and other factors, predict the burned area of forest fires.
- Internet Advertisements dataset. Given the details of images on web pages, predict whether an image is an advertisement or not.
We took a whirlwind tour of 22 real-world machine learning problems.
These are actual problems posed or investigated by science and business organizations around the world.
What’s even more exciting is that these diverse problems have publicly available datasets and are also widely studied and understood.
This means you can download the data right now and explore the problem by implementing your own model, or reproduce someone else’s from a paper or blog post.
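As a rough illustration of that last step, here is a minimal sketch of a first model for the Iris problem using only the Python standard library. It assumes the comma-separated layout of the UCI iris.data file (four measurements in centimeters followed by the species name) and embeds a six-row excerpt so that it runs without a download. The helper names `load_rows` and `predict_nearest` are made up for this example, and the nearest-neighbor rule is just one simple choice of model, not a recommended method.

```python
import csv
import io
import math

# A six-row excerpt in the format of the UCI iris.data file:
# sepal length, sepal width, petal length, petal width (cm), species.
# The full file has 150 rows; in practice, open the downloaded file instead.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def load_rows(handle):
    """Parse rows of numeric features followed by a class label."""
    rows = []
    for record in csv.reader(handle):
        if not record:  # skip blank lines
            continue
        features = [float(value) for value in record[:-1]]
        rows.append((features, record[-1]))
    return rows


def predict_nearest(train, features):
    """Return the label of the training row closest in Euclidean distance."""
    nearest = min(train, key=lambda row: math.dist(row[0], features))
    return nearest[1]


train = load_rows(io.StringIO(SAMPLE))

# Hypothetical measurements for a flower we want to classify.
prediction = predict_nearest(train, [5.0, 3.4, 1.5, 0.2])
print(prediction)
```

To try it on the real data, replace `io.StringIO(SAMPLE)` with an open handle on the downloaded file, for example `open("iris.data", newline="")`, and hold back some rows to check how often the predictions are correct.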