7 Time Series Datasets for Machine Learning

By Jason Brownlee on January 1, 2021 in Time Series

Machine learning can be applied to time series datasets.

These are problems where a numeric or categorical value must be predicted, but the rows of data are ordered by time.

A problem when getting started in time series forecasting with machine learning is finding good quality standard datasets on which to practice.

In this post, you will discover 7 standard time series datasets that you can use to get started and practice time series forecasting with machine learning.

After reading this post, you will know:

4 univariate time series datasets.
3 multivariate time series datasets.
Websites that you can use to search and download more datasets.

Let’s get started.

Updated Apr/2019: Updated the links to the datasets.

Univariate Time Series Datasets

Time series datasets that only have one variable are called univariate datasets.

These datasets are a great place to get started because:

They are simple and easy to understand.
You can plot them easily in Excel or your favorite plotting tool.
You can easily plot the predictions compared to the expected results.
You can quickly try and evaluate a suite of traditional and newer methods.

There are many sources of time series datasets, such as the “Time Series Data Library” created by Rob Hyndman, Professor of Statistics at Monash University, Australia.

Below are 4 univariate time series datasets that you can download from a range of fields such as Sales, Meteorology, Physics and Demography.

Shampoo Sales Dataset

This dataset describes the monthly number of shampoo sales over a 3-year period.

The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright and Hyndman (1998).

Below is a sample of the first 5 rows of data including the header row.

"Month","Sales of shampoo over a three year period"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3

Below is a plot of the entire dataset.

Shampoo Sales Dataset

The dataset shows an increasing trend and possibly some seasonal component.

Download the dataset.
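If you want to work with the sample rows in code, a minimal sketch using only the Python standard library might look like the following. It reads the five sample rows shown above from an in-memory string rather than from the downloaded file. The `load_series` name, and the reading of a label like "1-01" as "year 1, month 01", are my assumptions rather than something the dataset documents. In practice, `pandas.read_csv` is the more common choice.

```python
import csv
import io

# First five rows of the shampoo sales file, copied from the sample above.
SAMPLE = '''\
"Month","Sales of shampoo over a three year period"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
'''


def load_series(text):
    """Return a list of (year_index, month, sales) tuples in file order."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = []
    for label, value in reader:
        # Assumption: "1-01" means year 1 of the study, month 01.
        year_index, month = (int(part) for part in label.split("-"))
        rows.append((year_index, month, float(value)))
    return rows


series = load_series(SAMPLE)
print(series[0])
```

Keeping the rows in file order matters here, because the order of observations is what makes this a time series.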

Minimum Daily Temperatures Dataset

This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia.

The units are in degrees Celsius and there are 3650 observations. The source of the data is credited as the Australian Bureau of Meteorology.

Below is a sample of the first 5 rows of data including the header row.

"Date","Daily minimum temperatures in Melbourne, Australia, 1981-1990"
"1981-01-01",20.7
"1981-01-02",17.9
"1981-01-03",18.8
"1981-01-04",14.6
"1981-01-05",15.8

Below is a plot of the entire dataset.

Minimum Daily Temperatures

The dataset shows a strong seasonality component and offers fine-grained daily detail to work with.

Download the dataset.
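With daily data, it is worth checking that the dates really are one day apart before modeling. The sketch below does this for the five sample rows shown above, again using only the standard library. The `load_daily` and `find_gaps` names are hypothetical helpers of mine, and the sample is too short to say anything about the full file.

```python
import csv
import io
from datetime import date, timedelta

# First five rows of the minimum daily temperatures file, copied from above.
SAMPLE = '''\
"Date","Daily minimum temperatures in Melbourne, Australia, 1981-1990"
"1981-01-01",20.7
"1981-01-02",17.9
"1981-01-03",18.8
"1981-01-04",14.6
"1981-01-05",15.8
'''


def load_daily(text):
    """Return a list of (date, temperature) tuples in file order."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [(date.fromisoformat(day), float(value)) for day, value in reader]


def find_gaps(observations):
    """Return (previous, current) date pairs that are not exactly one day apart."""
    gaps = []
    for (previous, _), (current, _) in zip(observations, observations[1:]):
        if current - previous != timedelta(days=1):
            gaps.append((previous, current))
    return gaps


observations = load_daily(SAMPLE)
print(find_gaps(observations))
```

An empty list means no gaps were found in the rows that were checked.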

Monthly Sunspot Dataset

This dataset describes a monthly count of the number of observed sunspots for just over 230 years (1749-1983).

The units are a count and there are 2,820 observations. The source of the dataset is credited to Andrews & Herzberg (1985).

Below is a sample of the first 5 rows of data including the header row.

"Month","Zuerich monthly sunspot numbers 1749-1983"
"1749-01",58.0
"1749-02",62.6
"1749-03",70.0
"1749-04",55.7
"1749-05",85.0

Below is a plot of the entire dataset.

Monthly Sun Spot Dataset

The dataset shows a repeating cycle with large differences between cycles.

Download the dataset.

Daily Female Births Dataset

This dataset describes the number of daily female births in California in 1959.

The units are a count and there are 365 observations. The source of the dataset is credited to Newton (1988).

Below is a sample of the first 5 rows of data including the header row.

"Date","Daily total female births in California, 1959"
"1959-01-01",35
"1959-01-02",32
"1959-01-03",30
"1959-01-04",31
"1959-01-05",44

Below is a plot of the entire dataset.

Daily Female Births Dataset

Download the dataset.

Multivariate Time Series Datasets

Multivariate datasets are generally more challenging and are the sweet spot for machine learning methods.

A great source of multivariate time series data is the UCI Machine Learning Repository.

At the time of writing, there are 63 time series datasets that you can download for free and work with.

Below is a selection of 3 recommended multivariate time series datasets from Meteorology, Medicine and Monitoring domains.

EEG Eye State Dataset

This dataset describes EEG data for an individual and whether their eyes were open or closed. The objective of the problem is to predict whether eyes are open or closed given EEG data alone.

This is a classification predictive modeling problem with a total of 14,980 observations, each with 14 input variables and a class label (15 attributes in all). The class value of ‘1’ indicates the eye-closed state and ‘0’ the eye-open state. Data is ordered by time, and observations were recorded over a period of 117 seconds.

Below is a sample of the first 5 rows with no header row.

4329.23,4009.23,4289.23,4148.21,4350.26,4586.15,4096.92,4641.03,4222.05,4238.46,4211.28,4280.51,4635.9,4393.85,0
4324.62,4004.62,4293.85,4148.72,4342.05,4586.67,4097.44,4638.97,4210.77,4226.67,4207.69,4279.49,4632.82,4384.1,0
4327.69,4006.67,4295.38,4156.41,4336.92,4583.59,4096.92,4630.26,4207.69,4222.05,4206.67,4282.05,4628.72,4389.23,0
4328.72,4011.79,4296.41,4155.9,4343.59,4582.56,4097.44,4630.77,4217.44,4235.38,4210.77,4287.69,4632.31,4396.41,0
4326.15,4011.79,4292.31,4151.28,4347.69,4586.67,4095.9,4627.69,4210.77,4244.1,4212.82,4288.21,4632.82,4398.46,0
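Because the rows are ordered by time, a model should be evaluated on observations that come after the ones it was trained on. The sketch below separates the 14 inputs from the class label for the five sample rows above, then splits them without shuffling. The `load_eeg` and `ordered_split` names and the split size of 3 are illustrative assumptions.

```python
import csv
import io

# The five sample rows shown above: 14 EEG inputs followed by the class label.
SAMPLE = '''\
4329.23,4009.23,4289.23,4148.21,4350.26,4586.15,4096.92,4641.03,4222.05,4238.46,4211.28,4280.51,4635.9,4393.85,0
4324.62,4004.62,4293.85,4148.72,4342.05,4586.67,4097.44,4638.97,4210.77,4226.67,4207.69,4279.49,4632.82,4384.1,0
4327.69,4006.67,4295.38,4156.41,4336.92,4583.59,4096.92,4630.26,4207.69,4222.05,4206.67,4282.05,4628.72,4389.23,0
4328.72,4011.79,4296.41,4155.9,4343.59,4582.56,4097.44,4630.77,4217.44,4235.38,4210.77,4287.69,4632.31,4396.41,0
4326.15,4011.79,4292.31,4151.28,4347.69,4586.67,4095.9,4627.69,4210.77,4244.1,4212.82,4288.21,4632.82,4398.46,0
'''


def load_eeg(text):
    """Return (X, y): lists of input rows and integer class labels, in time order."""
    X, y = [], []
    for row in csv.reader(io.StringIO(text)):
        *inputs, label = row  # the last field is the eye state
        X.append([float(value) for value in inputs])
        y.append(int(label))
    return X, y


def ordered_split(X, y, n_train):
    """Use the first n_train rows for training and the rest for testing."""
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


X, y = load_eeg(SAMPLE)
X_train, y_train, X_test, y_test = ordered_split(X, y, n_train=3)
print(len(X[0]), len(X_train), len(X_test))
```

A random shuffle would mix later observations into the training set, which tends to give an optimistic estimate of skill on this kind of data.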

Learn More

Occupancy Detection Dataset

This dataset describes measurements of a room and the objective is to predict whether or not the room is occupied.

There are 20,560 one-minute observations taken over the period of a few weeks. This is a classification prediction problem. There are 7 attributes including various light and climate properties of the room.

The source for the data is credited to Luis Candanedo from UMONS.

Below is a sample of the first 6 rows of data, preceded by the header row. Note that each data row begins with a quoted row number that has no matching name in the header, so the header lists 7 names while each row holds 8 fields.

"date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"
"1","2015-02-04 17:51:00",23.18,27.272,426,721.25,0.00479298817650529,1
"2","2015-02-04 17:51:59",23.15,27.2675,429.5,714,0.00478344094931065,1
"3","2015-02-04 17:53:00",23.15,27.245,426,713.5,0.00477946352442199,1
"4","2015-02-04 17:54:00",23.15,27.2,426,708.25,0.00477150882608175,1
"5","2015-02-04 17:55:00",23.1,27.2,426,704.5,0.00475699293331518,1
"6","2015-02-04 17:55:59",23.1,27.2,419,701,0.00475699293331518,1

The data is provided in 3 files that suggest the splits that may be used for training and testing a model.
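One thing to watch for when loading the occupancy sample shown above is that the header names 7 columns while each data row has 8 fields, because the rows start with a row number. The sketch below handles this by adding a name for that extra field. The name `row_id` and the `load_occupancy` helper are my own inventions for illustration, and only the first three sample rows are used.

```python
import csv
import io
from datetime import datetime

# Header plus the first three sample rows shown above.
SAMPLE = '''\
"date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"
"1","2015-02-04 17:51:00",23.18,27.272,426,721.25,0.00479298817650529,1
"2","2015-02-04 17:51:59",23.15,27.2675,429.5,714,0.00478344094931065,1
"3","2015-02-04 17:53:00",23.15,27.245,426,713.5,0.00477946352442199,1
'''


def load_occupancy(text):
    """Return one dict per row, with typed values, in file order."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # The rows carry a leading row number that the header does not name.
    names = ["row_id"] + header  # "row_id" is a hypothetical name
    records = []
    for row in reader:
        record = dict(zip(names, row))
        record["row_id"] = int(record["row_id"])
        record["date"] = datetime.strptime(record["date"], "%Y-%m-%d %H:%M:%S")
        for key in names[2:-1]:  # Temperature through HumidityRatio
            record[key] = float(record[key])
        record["Occupancy"] = int(record["Occupancy"])
        records.append(record)
    return records


records = load_occupancy(SAMPLE)
print(records[0]["date"], records[0]["Temperature"], records[0]["Occupancy"])
```

Parsing the timestamps also makes it easy to see that the observations are roughly, but not exactly, one minute apart.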

Learn More

Ozone Level Detection Dataset

This dataset describes 6 years of ground ozone concentration observations and the objective is to predict whether it is an “ozone day” or not.

The dataset contains 2,536 observations and 73 attributes. This is a classification prediction problem and the final attribute indicates the class value as “1” for an ozone day and “0” for a normal day.

Two versions of the data are provided: an eight-hour peak set and a one-hour peak set. I would suggest using the one-hour peak set for now.

Below is a sample of the first 6 rows with no header row.

1/1/1998,0.8,1.8,2.4,2.1,2,2.1,1.5,1.7,1.9,2.3,3.7,5.5,5.1,5.4,5.4,4.7,4.3,3.5,3.5,2.9,3.2,3.2,2.8,2.6,5.5,3.1,5.2,6.1,6.1,6.1,6.1,5.6,5.2,5.4,7.2,10.6,14.5,17.2,18.3,18.9,19.1,18.9,18.3,17.3,16.8,16.1,15.4,14.9,14.8,15,19.1,12.5,6.7,0.11,3.83,0.14,1612,-2.3,0.3,7.18,0.12,3178.5,-15.5,0.15,10.67,-1.56,5795,-12.1,17.9,10330,-55,0,0.
1/2/1998,2.8,3.2,3.3,2.7,3.3,3.2,2.9,2.8,3.1,3.4,4.2,4.5,4.5,4.3,5.5,5.1,3.8,3,2.6,3,2.2,2.3,2.5,2.8,5.5,3.4,15.1,15.3,15.6,15.6,15.9,16.2,16.2,16.2,16.6,17.8,19.4,20.6,21.2,21.8,22.4,22.1,20.8,19.1,18.1,17.2,16.5,16.1,16,16.2,22.4,17.8,9,0.25,-0.41,9.53,1594.5,-2.2,0.96,8.24,7.3,3172,-14.5,0.48,8.39,3.84,5805,14.05,29,10275,-55,0,0.
1/3/1998,2.9,2.8,2.6,2.1,2.2,2.5,2.5,2.7,2.2,2.5,3.1,4,4.4,4.6,5.6,5.4,5.2,4.4,3.5,2.7,2.9,3.9,4.1,4.6,5.6,3.5,16.6,16.7,16.7,16.8,16.8,16.8,16.9,16.9,17.1,17.6,19.1,21.3,21.8,22,22.1,22.2,21.3,19.8,18.6,18,18,18.2,18.3,18.4,22.2,18.7,9,0.56,0.89,10.17,1568.5,0.9,0.54,3.8,4.42,3160,-15.9,0.6,6.94,9.8,5790,17.9,41.3,10235,-40,0,0.
1/4/1998,4.7,3.8,3.7,3.8,2.9,3.1,2.8,2.5,2.4,3.1,3.3,3.1,2.3,2.1,2.2,3.8,2.8,2.4,1.9,3.2,4.1,3.9,4.5,4.3,4.7,3.2,18.3,18.2,18.3,18.4,18.6,18.6,18.5,18.7,18.6,18.8,19,19,19.3,19.4,19.6,19.2,18.9,18.8,18.6,18.5,18.3,18.5,18.8,18.9,19.6,18.7,9.9,0.89,-0.34,8.58,1546.5,3,0.77,4.17,8.11,3145.5,-16.8,0.49,8.73,10.54,5775,31.15,51.7,10195,-40,2.08,0.
1/5/1998,2.6,2.1,1.6,1.4,0.9,1.5,1.2,1.4,1.3,1.4,2.2,2,3,3,3.1,3.1,2.7,3,2.4,2.8,2.5,2.5,3.7,3.4,3.7,2.3,18.8,18.6,18.5,18.5,18.6,18.9,19.2,19.4,19.8,20.5,21.1,21.9,23.8,25.1,25.8,26,25.6,24.2,22.9,21.6,20,19.5,19.1,19.1,26,21.1,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,0.58,0.
1/6/1998,3.1,3.5,3.3,2.5,1.6,1.7,1.6,1.6,2.3,1.8,2.5,3.9,3.4,2.7,3.4,2.5,2.2,4.4,4.3,3.2,6.2,6.8,5.1,4,6.8,3.2,18.9,19.5,19.6,19.5,19.5,19.5,19.4,19.2,19.1,19.5,19.6,18.6,18.6,18.9,19.2,19.3,19.2,18.8,17.6,16.9,15.6,15.4,15.9,15.8,19.6,18.5,14.4,0.68,1.52,8.62,1499.5,4.3,0.61,9.04,10.81,3111,-11.8,0.09,11.98,11.28,5770,27.95,46.25,10120,?,5.84,0.
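Notice that some fields in the sample rows above contain a "?" instead of a number, which appears to mark a missing value. A minimal sketch for handling this when parsing is shown below. The helper names are hypothetical, the input is only the tail of the last sample row, and the leading date field is deliberately left out.

```python
def parse_numeric_fields(line):
    """Parse comma-separated numeric fields, mapping the '?' marker to None."""
    values = []
    for token in line.split(","):
        token = token.strip()
        values.append(None if token == "?" else float(token))
    return values


def count_missing(values):
    """Count how many fields were marked as missing."""
    return sum(value is None for value in values)


# Tail of the 1/6/1998 sample row above (date and earlier fields omitted).
fragment = "10120,?,5.84,0."
parsed = parse_numeric_fields(fragment)
print(parsed, count_missing(parsed))
```

What to do with the missing values afterwards (drop the row, impute, or use a model that tolerates them) is a separate decision that depends on your project.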

Learn More

Summary

In this post, you discovered a suite of standard time series forecast datasets that you can use to get started and practice time series forecasting with machine learning methods.

Specifically, you learned about:

4 univariate time series forecasting datasets.
3 multivariate time series forecasting datasets.
Two websites where you can download many more datasets.

Did you use one of the above datasets in your own project?
Share your findings in the comments below.

79 Responses to 7 Time Series Datasets for Machine Learning

R. Edwin July 6, 2017 at 3:27 am #

Hey there, great tutorial! I need your help:
I have to make a weather forecasting project for my college. It has to be based on a time series dataset I guess. But I’m having a difficult time trying to get a suitable multivariate dataset, also I would like to ask you for an ML model to use in this kind of problem. I will appreciate any resource you could provide me.

Reply
- Jason Brownlee July 6, 2017 at 10:26 am #
  
  Consider your government’s meteorological organization. Most give data freely.
  
  Reply
Parijat September 29, 2017 at 4:47 am #

Hi, I am looking for industrial time series datasets. Any suggestions.. Thanks.

Reply
- Jason Brownlee September 29, 2017 at 5:09 am #
  
  What is wrong with the examples in this post?
  
  Reply
  - Mihir August 11, 2021 at 2:05 am #
    
    In my work on a weather dataset there are 4 classes: clear, partially cloudy, overcast, and rain.
    I use an LSTM model. Which LSTM model should I use for multi-class classification?
    
    Reply
    - Jason Brownlee August 11, 2021 at 7:41 am #
      
      I recommend trying a few different model architectures and compare results to classical ML models in order to discover what works well for your specific dataset.
      
      Reply
Domenico November 4, 2017 at 12:45 am #

Hi Jason,
many thanks for your article, I found a useful dataset.
I did not find any dataset on UCI about temperature and energy consumption inside a building, I was wondering if you could help me in some way.
I hope to hear from you soon

Reply
- Jason Brownlee November 4, 2017 at 5:31 am #
  
  Sorry, I’m not aware of such a dataset off-the-cuff.
  
  Reply
Nisha Chaube January 21, 2018 at 7:28 am #

I have a multivariate-dataset with observations from day 1 to 49 for each of the almost 30 patients. The end result is whether the patient has PTSD (1) or not ( 0 ). Please suggest how am I supposed to approach this problem in terms of data pre-processing.

Reply
- Jason Brownlee January 21, 2018 at 9:14 am #
  
  Sounds like a sequence classification problem.
  
  This post might give you some ideas:
  https://machinelearningmastery.com/sequence-prediction/
  
  LSTMs might be a good fit:
  https://machinelearningmastery.com/start-here/#lstm
  
  Reply
VEERENDRA JONNALAGADDA June 1, 2018 at 5:22 am #

any sample code in python or C for time series ie preparing data via pandas(separating needed columns),analysing same for training,preparing model,training the model,applying same on test data…..

Please excuse me in case I have requested anything wrong.

Reply
- Jason Brownlee June 1, 2018 at 8:26 am #
  
  I have many examples, try searching on the blog.
  
  Reply
Florent January 20, 2019 at 7:51 pm #

Hi, I am trying to create a model that uses past data (sales volume + weather condition for example) to predict the 5 next day of sales volume but I would like to use weather prediction of the next 5 days also to forecast the volumes.

Can you tell me about the model to use (I guess RNN) and how to build my dataset.

Regards

Reply
- Jason Brownlee January 21, 2019 at 5:31 am #
  
  I recommend following this process:
  https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
  
  You can get code examples for multivariate input and multi-step output here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Avram March 8, 2019 at 11:38 pm #

Hi Jason,
My question may come to you a bit weird so that i beg your pardon in advance. I am working on short term load forecasting. As i know AEMO opens data about electricity. I can access the half-hourly load demand of past years(from 2006 through 2018) however i cannot access the half-hourly weather data(temperature and bulb) of Australian regions(QSL,VIC,NSW etc). I will make comparative analysis with journal papers so that i am looking for these data and authors of some papers did not shared their AEMO data yet. How can i get or find these data?Can you direct me on this issue?

Reply
- Jason Brownlee March 9, 2019 at 6:29 am #
  
  My best advice is to contact the authors directly, and perhaps their advisors/colleagues?
  
  Reply
fernando A gutierrez March 12, 2019 at 6:56 am #

I have a data set of shipping cost per day (in one year), however, not every day has a shipping cost. What’s the best way to deal with missing daily costs in order to make a time series analysis?

Reply
- Jason Brownlee March 12, 2019 at 7:00 am #
  
  Perhaps start by filling the missing values with the mean/average values of the series?
  
  Reply
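As a rough illustration of the mean-filling suggestion in the reply above, the sketch below replaces missing entries with the mean of the observed values. The `fill_with_mean` name and the three cost values are made up for the example, and missing days are assumed to be represented as `None`.

```python
from statistics import mean


def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [value for value in values if value is not None]
    fill = mean(observed)
    return [fill if value is None else value for value in values]


# Hypothetical daily costs with one missing day.
daily_cost = [10.0, None, 20.0]
print(fill_with_mean(daily_cost))
```

Keep in mind that a mean computed over the whole series includes future observations, so for a forecasting evaluation it may be safer to compute it from the training portion only.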
one July 3, 2019 at 1:07 pm #

I need to find data set and decompose for BTS for fault prediction from fault history
total donw time and 3 cell/ sector how it coud possible

Reply
- Jason Brownlee July 4, 2019 at 7:37 am #
  
  Perhaps check on Kaggle?
  
  Reply
- nandy October 3, 2019 at 5:05 pm #
  
  Hi one. May I get your email address please? i’m also working on similar project
  
  Reply
Abderahmane Bouziane July 23, 2019 at 6:20 am #

Do you think multivariate time series can take advantage of CNNs?
Can you combine CNNs with LSTMs?
How would you build a time series autoencoder for where each instant has 30 variables?

Reply
- Jason Brownlee July 23, 2019 at 8:17 am #
  
  Yes and yes.
  
  I have examples, perhaps start here:
  https://machinelearningmastery.com/how-to-develop-convolutional-neural-network-models-for-time-series-forecasting/
  
  And here:
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
  
  Reply
Shital September 19, 2019 at 3:59 pm #

Multivariate datasets are generally more challenging as you said. How to apply neural network algorithm on these datasets in WEKA? I am doing something wrong as I am getting the same result for yearly/monthly/weekly datasets. Please guide.

Reply
- Jason Brownlee September 20, 2019 at 5:35 am #
  
  Good question.
  
  There may be a way, I don’t have an example sorry.
  
  Reply
  - Shital Bhojani October 1, 2019 at 2:09 pm #
    
    Yes, I found a way. We can use the Overlay for training and test data using the advance configuration in time series package. We can set the single or multiple dependent parameters in overlay. While using overlay, data set is separated automatically in training and test data as per the values we have set in Evaluation tab.
    
    Reply
Shital Bhojani September 20, 2019 at 2:37 pm #

Ohhhk… Thanks for your prompt reply Jason. I am rendering around it.

Reply
- Jason Brownlee September 21, 2019 at 6:42 am #
  
  You’re welcome.
  
  Reply
Arjun November 19, 2019 at 3:35 pm #

Hi jason,
Can you help me on how to convert a txt file to csv file?

Reply
- Jason Brownlee November 20, 2019 at 6:08 am #
  
  Perhaps change the file extension from .txt to .csv?
  
  Reply
Arjun November 19, 2019 at 4:26 pm #

Is it mandatory to convert the text file into a csv file and then into a pandas dataframe for further work? Or does it provide conflict if it is not done?

Reply
- Jason Brownlee November 20, 2019 at 6:09 am #
  
  No, Pandas does not care about file extensions, only the content.
  
  Reply
adil shahzad November 27, 2019 at 8:02 pm #

Does anyone have a discrete dataset?

Reply
- Jason Brownlee November 28, 2019 at 6:34 am #
  
  For time series, yes, there are some examples of time series classification here:
  https://machinelearningmastery.com/how-to-model-human-activity-from-smartphone-data/
  
  Reply
Aashish Agarwal December 21, 2019 at 9:47 am #

Dear Jason,

Thank you for the wonderful post. I have a dataset, similar to Occupancy Detection Dataset, which you have described above.

1. Can we apply LSTMs, CNNs on these data?
2. Are these kind of data count under multivariate time series data? What I have understood till now, in time-series data there is a sequence in the rows and columns i.e. we can’t move any columns and any rows since time-series data have a sequence.
3. What kind of models can we apply to such a problem?

Regards,
Aashish

Reply
- Jason Brownlee December 22, 2019 at 6:06 am #
  
  Perhaps try a suite of algorithms and compare results.
  
  Yes, multivariate inputs. More on the types of time series problems here:
  https://machinelearningmastery.com/taxonomy-of-time-series-forecasting-problems/
  
  Reply
Rajesh December 22, 2019 at 11:29 pm #

Hey Jason, Great Post.

I deal with system and application monitoring data a lot. I am looking for production ready software that would help me store data in Timeseries Database and apply predictive analytics (RNN, S/ARIMA) continuously. I see there are couple of cool libraries like TICK stack, LoudML and Facebook prophet.

Any tutorial would be great demonstrating the deployment of such continuous predictive system.

Best Regards,
Rajesh

Reply
- Jason Brownlee December 23, 2019 at 6:49 am #
  
  Thanks.
  
  Great suggestion!
  
  Reply
Laila January 8, 2020 at 6:54 am #

Hi Jason,

Where can I get information about RNN or LSTM time series prediction datasets that need improvements, for example in terms of accuracy?

Reply
- Jason Brownlee January 8, 2020 at 8:36 am #
  
  We minimize error for time series, not accuracy.
  
  What do you mean by “need improvement”?
  
  If you want to solve real problems where people care about the outcome, perhaps start with kaggle or take on some consulting work?
  
  Reply
GKboy March 30, 2020 at 7:17 pm #

Hi there,

Is there is any solution to handle 3d data with a “traditional” ML solution?
For example, if I have a time series generated with 1000 users. In this scenario, we have 1000x time series. How can I make a generalized Varmax or Arimax model for every user, if I don’t want to use LSTM ?

Reply
- Jason Brownlee March 31, 2020 at 8:03 am #
  
  You can transform the dataset into a supervised learning problem and test a suite of standard ml algorithms:
  https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
  
  And here:
  https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
  
  Reply
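To give a feel for what "transform the dataset into a supervised learning problem" means in the reply above, here is a minimal sliding-window sketch. It uses the five shampoo sales values from the sample earlier in the post, and the `series_to_supervised` name and the choice of 2 lag values are assumptions for illustration only.

```python
def series_to_supervised(series, n_lags):
    """Turn a sequence into (X, y) pairs where X holds the n_lags prior values."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])  # the window of previous observations
        y.append(series[i])             # the value to predict
    return X, y


# The five shampoo sales values from the sample rows in the post.
sales = [266.0, 145.9, 183.1, 119.3, 180.3]
X, y = series_to_supervised(sales, n_lags=2)
for inputs, target in zip(X, y):
    print(inputs, "->", target)
```

Each row of `X` can then be fed to a standard machine learning algorithm, with the matching entry of `y` as the target.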
  - GKboy March 31, 2020 at 8:10 am #
    
    Thank you!
    
    Reply
    - Jason Brownlee March 31, 2020 at 8:20 am #
      
      You’re welcome.
      
      Reply
Remirab April 13, 2020 at 11:15 pm #

Hi there.

Do we categorize GPS trajectories as Univariate Time Series?

Reply
- Jason Brownlee April 14, 2020 at 6:19 am #
  
  Perhaps start by assigning categories to your sequences first, then explore modeling it as a time series classification task.
  
  The tutorials here will help:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Suresh Reshu April 22, 2020 at 1:34 am #

can u post some thing like “How to prepare time series dataset for machine learning” that are implemented using sklearn

Reply
- Jason Brownlee April 22, 2020 at 6:02 am #
  
  I have many such tutorials, perhaps start here:
  https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
  
  Reply
Shubhi Jain May 6, 2020 at 5:39 am #

Hi,

My data is in the format timestamp, no of customers. I want to convert it into an hourly time series. How should I do that?

Reply
- Jason Brownlee May 6, 2020 at 6:31 am #
  
  It really depends on your data, sorry, I cannot give better advice than that.
  
  Reply
Sachin Kannan August 31, 2020 at 12:19 am #

Hi Jason,

I have a dataset with columns as follows “Account Jan Feb Mar Q1 Apr May Jun Q2 Jul Aug Sep Q3 Oct Nov Dec Q4 YearTotal Year”

How am i suppose to consume this data for forecasting model as my month columns dont have any dates to them instead they have the sales figures for each account. Eg.

Account jan feb march Q1 Year
Revision 267829.5 279052.45 260298.54 807180.49 2019

My aim is to predict the Q3 and Q4 for the year 2020.

Please give your thoughts.

Reply
- Jason Brownlee August 31, 2020 at 6:16 am #
  
  Perhaps start with a persistence model, then move on to evaluate a suite of models in order to discover what works well or best for your dataset.
  
  Reply
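For readers unfamiliar with the persistence model mentioned in the reply above, the idea is to predict each value as the value observed one step earlier. A minimal sketch on the five shampoo sales values from the post is shown below. The function names are hypothetical, and mean absolute error is just one reasonable choice of score.

```python
def persistence_forecast(series):
    """Predict each value as the one observed immediately before it."""
    return series[:-1]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# The five shampoo sales values from the sample rows in the post.
sales = [266.0, 145.9, 183.1, 119.3, 180.3]
predictions = persistence_forecast(sales)  # forecasts for sales[1:]
actual = sales[1:]
mae = mean_absolute_error(actual, predictions)
print(round(mae, 3))
```

The resulting error gives a baseline: a more elaborate model is only worth keeping if it does better than this.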
  - Sachin Kannan September 1, 2020 at 1:34 am #
    
    I looked at your persistence model, which you used on the shampoo and monthly car sales data. Both are univariate datasets; in my case I have multivariate data. Can you please suggest how to approach the multivariate case?
    
    How to do time series by considering 3 to 5 columns and predict. If there is a way i can share some sample with you, if so do suggest.
    
    Reply
    - Jason Brownlee September 1, 2020 at 6:37 am #
      
      Yes, the tutorials here will get you started with multivariate time series forecasting:
      https://machinelearningmastery.com/start-here/#deep_learning_time_series
      
      Reply
Beste Karacay September 2, 2020 at 5:44 am #

Hi Jason,

What I would like to ask is this, I have a time series historical data. It is daily sales data however, I have different product id’s. For example, I have 3 different dates for product 1, but I have 8 different dates for product 2.
I am expected to build an algorithm to forecast the sales of any product for next day.
How should I proceed?

e.g.
productid date soldquantity
1 23.11.2018 0
21 30.11.2018 0
21 27.12.2018 0
21 9.01.2019 0
21 18.12.2018 0
21 5.01.2019 0
21 7.01.2019 0
21 31.12.2018 0
21 26.12.2018 0
21 25.12.2018 0
21 10.01.2019 0
31 1.12.2018 0
31 19.11.2018 0
31 11.11.2018 0
31 27.11.2018 0
31 22.11.2018 0

Reply
- Jason Brownlee September 2, 2020 at 6:34 am #
  
  I would expect each product id is a separate series.
  
  You can use a machine learning or deep learning model to learn per product or across products.
  
  Reply
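Treating each product id as a separate series, as the reply above suggests, could start with a grouping step like the sketch below. It uses a few of the rows from the question, assumes the dates are in day.month.year form, and the `split_by_product` name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# A few (productid, date, soldquantity) rows taken from the question above.
ROWS = [
    ("1", "23.11.2018", 0),
    ("21", "30.11.2018", 0),
    ("21", "27.12.2018", 0),
    ("21", "9.01.2019", 0),
    ("31", "1.12.2018", 0),
    ("31", "19.11.2018", 0),
]


def split_by_product(rows):
    """Group rows by product id and sort each group by date."""
    series = defaultdict(list)
    for product_id, day, quantity in rows:
        # Assumption: dates are written as day.month.year.
        parsed = datetime.strptime(day, "%d.%m.%Y").date()
        series[product_id].append((parsed, quantity))
    for observations in series.values():
        observations.sort()  # the rows are not in date order in the question
    return dict(series)


by_product = split_by_product(ROWS)
for product_id, observations in by_product.items():
    print(product_id, [day.isoformat() for day, _ in observations])
```

Each list can then be modeled on its own, or the groups can be combined into one training set with the product id as an extra input.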
  - Beste Karacay September 2, 2020 at 4:52 pm #
    
    Hi again. Thanks for the quick answer.
    I considered taking all products as separate series, however I have more than 10 thousand products.
    
    Which machine learning method could be used? I am very new at this.
    
    Reply
    - Jason Brownlee September 3, 2020 at 6:04 am #
      
      I recommend starting with linear models like linear regression or SARIMA then move on to more advanced methods and see if they offer a benefit.
      
      This framework will help:
      https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
      
      Reply
Gulzar January 1, 2021 at 12:03 am #

Hi! you may want to ctrl+f “At the time of writing, there are” and find that you left this sentence twice in a row. Thanks for the article! it helped me find a dataset I needed.

Reply
- Jason Brownlee January 1, 2021 at 5:29 am #
  
  Thanks, fixed!
  
  Reply
Gopal February 11, 2021 at 9:36 pm #

Jason, can you help us to understand FourierFeaturizer and how interpret it from pmdarima python package. I wanted to use it to forecast seasonal data with long seasonal periods.

Regular approach is taking good amount of time . so based https://robjhyndman.com/hyndsight/longseasonality/ exploring the usage of FourierFeaturizer.

Reply
- Jason Brownlee February 12, 2021 at 5:45 am #
  
  Thank you for the suggestion.
  
  Reply
Aashika Varma April 7, 2021 at 11:28 pm #

Hey Jason, the examples in this article look great! I’m actually looking for a signal processing dataset to apply time series modelling for a project. Could you suggest any open source datasets in this context?

Reply
- Jason Brownlee April 8, 2021 at 5:09 am #
  
  Thanks.
  
  This may help:
  https://machinelearningmastery.com/faq/single-faq/where-can-i-get-a-dataset-on-___/
  
  Reply
Hanna July 30, 2021 at 3:05 am #

I have satellite time series (multivariate-dataset) with images from day 1 to 10 with almost 7 classes . Please suggest how am I supposed to approach this problem in terms of data augmentation

Reply
- Jason Brownlee July 30, 2021 at 6:32 am #
  
  Perhaps you can use a pre-trained model with a custom CNN-LSTM type architecture.
  
  Reply
Priyanka Mohan August 8, 2021 at 8:41 pm #

Hello Jason,

I have a GPS dataset (latitude, longitude, timestamp) as a dataset. Each track (series of GPS points) of a participant is compared with another participant who walks on the same track. I want to do time series classification on this data, which kind of data can this be?

Thank you

Reply
- Jason Brownlee August 9, 2021 at 5:55 am #
  
  That sounds like a great project. It is time series classification, try a suite of models and discover what works well or best for your data.
  
  Reply
Kone December 15, 2021 at 12:05 pm #

i am working with time series about education in my AI thesis project. the values are by year from 2013 to 2021, so i have nine records. i think it is a small dataset for a PHD, what do you think ?? Any suggestions ??

Reply
- Adrian Tam December 17, 2021 at 6:51 am #
  
  9 records probably can’t help you go too far, but it should be a good start.
  
  Reply
sham February 16, 2022 at 6:36 pm #

Hello !
Brother can you provide Supply chain multi mode(Air, Truck, ocean etc) travel time prediction dataset.
I will be very thankful !

Reply
- James Carmichael February 17, 2022 at 1:29 pm #
  
  Hi Sham…I do not have such a dataset. You may want to check Kaggle or StackOverflow.
  
  Reply
Hanson February 16, 2023 at 3:37 am #

Hello!
Thanks for your post.
Can I get the reference about where datasets on the post came from?
I want to get more about each dataset.
Are they from ““Time Series Data Library” created by Rob Hyndman, Professor of Statistics at Monash University, Australia” as you said in the first part on the post as below?

“There are many sources of time series dataset, such as the “Time Series Data Library” created by Rob Hyndman, Professor of Statistics at Monash University, Australia”

Reply
- James Carmichael February 16, 2023 at 8:32 am #
  
  Hi Hanson…each dataset contains a link that you can follow as the source. Also, in some cases the author’s name is provided so that you can perform a search on the author and the datasets they have published.
  
  Reply
Hanson February 16, 2023 at 3:42 am #

Additional comment:

I read there is some description about the source, such as

“The source of the data is credited as the Australian Bureau of Meteorology.” for Minimum Daily Temperatures Dataset.

But could you let me know how to get the source of the data in detail?

Thank you!

Reply
- James Carmichael February 16, 2023 at 8:31 am #
  
  Hi Hanson…each dataset contains a link that you can follow as the source. Also, in some cases the author’s name is provided so that you can perform a search on the author and the datasets they have published.
  
  Reply
Hanson February 16, 2023 at 3:45 am #

Additional comment:

I read the part about where the source data was such as

“The source of the data is credited as the Australian Bureau of Meteorology.” for Minimum Daily Temperatures Dataset.

But could you let me know how I can get the data through Australian Bureau of Meteorology in detail?

Thank you!

Reply
- James Carmichael February 16, 2023 at 8:32 am #
  
  Hi Hanson…each dataset contains a link that you can follow as the source. Also, in some cases the author’s name is provided so that you can perform a search on the author and the datasets they have published.
  
  Reply
