# 7 Time Series Datasets for Machine Learning

Machine learning can be applied to time series datasets.

These are problems where a numeric or categorical value must be predicted, but the rows of data are ordered by time.

A problem when getting started in time series forecasting with machine learning is finding good quality standard datasets on which to practice.

In this post, you will discover 7 standard time series datasets that you can use to get started and practice time series forecasting with machine learning.

After reading this post, you will know:

• 4 univariate time series datasets.
• 3 multivariate time series datasets.
• Websites that you can use to search and download more datasets.

Let’s get started.

• Updated Apr/2019: Updated the links to the datasets.

## Univariate Time Series Datasets

Time series datasets that only have one variable are called univariate datasets.

These datasets are a great place to get started because:

• They are simple and easy to understand.
• You can plot them easily in Excel or your favorite plotting tool.
• You can easily plot the predictions compared to the expected results.
• You can quickly try and evaluate a suite of traditional and newer methods.
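As an illustration of the last two points, here is a minimal sketch of a persistence (naive) forecast, which predicts that the next value equals the last observed one, scored with RMSE. The function names and the short series are made up for illustration and are not taken from any dataset in this post.

```python
from math import sqrt


def persistence_forecast(series, n_test):
    """Walk forward over the last n_test points, predicting each as the previous value."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    split = len(series) - n_test
    history = list(series[:split])
    predictions = []
    for actual in series[split:]:
        predictions.append(history[-1])  # forecast = last observed value
        history.append(actual)           # then reveal the true value
    return predictions


def rmse(expected, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((e - p) ** 2 for e, p in zip(expected, predicted)) / len(expected))


# Illustrative values only (not real data).
series = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
predictions = persistence_forecast(series, n_test=3)
error = rmse(series[-3:], predictions)
print(predictions, round(error, 3))
```

Any other method can be dropped into the same loop in place of `history[-1]`, which makes the persistence score a convenient baseline to beat.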

There are many sources of time series datasets, such as the “Time Series Data Library” created by Rob Hyndman, Professor of Statistics at Monash University, Australia.

Below are 4 univariate time series datasets that you can download from a range of fields such as Sales, Meteorology, Physics and Demography.

### Shampoo Sales Dataset

This dataset describes the monthly number of sales of shampoo over a 3 year period.

The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright and Hyndman (1998).

Below is a sample of the first 5 rows of data including the header row.

Below is a plot of the entire dataset.

Shampoo Sales Dataset

The dataset shows an increasing trend and possibly some seasonal component.
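One way to check for a trend like this is to smooth the series with a moving average and look at the average change between consecutive months. The sketch below assumes a CSV with `Month` and `Sales` columns; both the column names and the values are placeholders, not rows from the real file.

```python
import io

import pandas as pd

# Placeholder CSV text standing in for the downloaded file; the column
# names and values are assumptions, not rows from the real dataset.
csv_text = """Month,Sales
2001-01-01,100
2001-02-01,120
2001-03-01,110
2001-04-01,150
2001-05-01,140
2001-06-01,180
"""

series = pd.read_csv(io.StringIO(csv_text), index_col="Month", parse_dates=True)["Sales"]

# A centered 3-month moving average smooths short-term noise so the trend stands out.
trend = series.rolling(window=3, center=True).mean()

# First differences give the month-to-month change; their mean is the average change.
avg_change = series.diff().mean()
print(trend.dropna().round(1).tolist(), avg_change)
```

A steadily rising moving average together with a positive average change is consistent with an upward trend; it says nothing on its own about seasonality.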

### Minimum Daily Temperatures Dataset

This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia.

The units are in degrees Celsius and there are 3650 observations. The source of the data is credited as the Australian Bureau of Meteorology.

Below is a sample of the first 5 rows of data including the header row.

Below is a plot of the entire dataset.

Minimum Daily Temperatures

The dataset shows a strong seasonal component and offers fine-grained daily detail to work with.
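A quick way to expose a yearly seasonal pattern in daily data is to average the values by calendar month. The sketch below uses a synthetic sine-wave series in place of the real readings; the series name `Temp` and all values are assumptions made for illustration.

```python
import math

import pandas as pd

# Synthetic daily series for two years: a yearly cosine wave standing in for real readings.
index = pd.date_range("2001-01-01", "2002-12-31", freq="D")
values = [10.0 + 5.0 * math.cos(2 * math.pi * day.dayofyear / 365.25) for day in index]
temps = pd.Series(values, index=index, name="Temp")

# Average by calendar month across all years to expose the seasonal profile.
monthly_profile = temps.groupby(temps.index.month).mean()
warmest = monthly_profile.idxmax()
coolest = monthly_profile.idxmin()
print(monthly_profile.round(2))
```

With real data, a large gap between the warmest and coolest monthly means relative to the day-to-day noise is a sign that a seasonal term is worth modeling.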

### Monthly Sunspot Dataset

This dataset describes a monthly count of the number of observed sunspots for just over 230 years (1749-1983).

The units are a count and there are 2,820 observations. The source of the dataset is credited to Andrews & Herzberg (1985).

Below is a sample of the first 5 rows of data including the header row.

Below is a plot of the entire dataset.

Monthly Sun Spot Dataset

The dataset shows a strong cyclical pattern (the solar cycle of roughly 11 years) with large differences in amplitude between cycles.

### Daily Female Births Dataset

This dataset describes the number of daily female births in California in 1959.

The units are a count and there are 365 observations. The source of the dataset is credited to Newton (1988).

Below is a sample of the first 5 rows of data including the header row.

Below is a plot of the entire dataset.

Daily Female Births Dataset

## Multivariate Time Series Datasets

Multivariate datasets are generally more challenging and are the sweet spot for machine learning methods.

A great source of multivariate time series data is the UCI Machine Learning Repository.

Below is a selection of 3 recommended multivariate time series datasets from Meteorology, Medicine and Monitoring domains.

### EEG Eye State Dataset

This dataset describes EEG data for an individual and whether their eyes were open or closed. The objective of the problem is to predict whether eyes are open or closed given EEG data alone.

This is a classification predictive modeling problem with a total of 14,980 observations and 15 attributes (14 EEG measurements plus the class value). The class value of ‘1’ indicates the eye-closed and ‘0’ the eye-open state. Data is ordered by time and observations were recorded over a period of 117 seconds.
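Because the rows are ordered by time, shuffling them before splitting would let a model train on readings that come after the ones it is tested on. A minimal sketch of a time-ordered split follows; the function name and the made-up rows are illustrative only.

```python
def time_ordered_split(rows, train_fraction=0.75):
    """Split rows into train/test without shuffling, so the test set is strictly later in time."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Made-up rows: (time step, one feature value, eye-state label).
rows = [(t, 4000.0 + t, t % 2) for t in range(20)]
train, test = time_ordered_split(rows, train_fraction=0.75)
print(len(train), len(test))
```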

Below is a sample of the first 5 rows with no header row.

### Occupancy Detection Dataset

This dataset describes measurements of a room and the objective is to predict whether or not the room is occupied.

There are 20,560 one-minute observations taken over the period of a few weeks. This is a classification prediction problem. There are 7 attributes including various light and climate properties of the room.

The source for the data is credited to Luis Candanedo from UMONS.

Below is a sample of the first 5 rows of data including the header row.

The data is provided in 3 files that suggest the splits that may be used for training and testing a model.

### Ozone Level Detection Dataset

This dataset describes 6 years of ground ozone concentration observations and the objective is to predict whether it is an “ozone day” or not.

The dataset contains 2,536 observations and 73 attributes. This is a classification prediction problem and the final attribute indicates the class value as “1” for an ozone day and “0” for a normal day.
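When the class value is stored in the final column of a headerless file, the inputs and the label can be separated by position. The sketch below assumes purely numeric, comma-separated rows; the rows shown are made up, and the real file may need extra handling (for example, for non-numeric or missing fields).

```python
import csv
import io
from collections import Counter


def split_features_and_label(csv_text):
    """Parse headerless CSV text where the final column is the class label."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        features.append([float(value) for value in row[:-1]])
        labels.append(int(float(row[-1])))
    return features, labels


# Made-up rows for illustration: three inputs followed by the class value.
csv_text = "0.8,1.8,2.4,0\n2.8,3.2,3.3,0\n2.9,2.8,2.6,1\n4.7,3.8,3.7,0\n"
features, labels = split_features_and_label(csv_text)
class_counts = Counter(labels)
print(class_counts)
```

Counting the labels first is a cheap way to see how imbalanced the two classes are before choosing an evaluation metric.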

Two versions of the data are provided: an eight-hour peak set and a one-hour peak set. I would suggest using the one-hour peak set for now.

Below is a sample of the first 5 rows with no header row.

## Summary

In this post, you discovered a suite of standard time series forecast datasets that you can use to get started and practice time series forecasting with machine learning methods.

• 4 univariate time series forecasting datasets.
• 3 multivariate time series forecasting datasets.

Did you use one of the above datasets in your own project?

### 79 Responses to 7 Time Series Datasets for Machine Learning

1. R. Edwin July 6, 2017 at 3:27 am #

Hey there, great tutorial! I need your help:
I have to make a weather forecasting project for my college. It has to be based on a time series dataset I guess. But I’m having a difficult time trying to get a suitable multivariate dataset, also I would like to ask you for an ML model to use in this kind of problem. I will appreciate any resource you could provide me.

• Jason Brownlee July 6, 2017 at 10:26 am #

Consider your government’s meteorological organization. Most give data freely.

2. Parijat September 29, 2017 at 4:47 am #

Hi, I am looking for industrial time series datasets. Any suggestions.. Thanks.

• Jason Brownlee September 29, 2017 at 5:09 am #

What is wrong with the examples in this post?

• Mihir August 11, 2021 at 2:05 am #

In my work on a weather dataset there are 4 classes: clear, partially cloudy, overcast and rain.
I use an LSTM model. Which LSTM model should I use for multi-class classification?

• Jason Brownlee August 11, 2021 at 7:41 am #

I recommend trying a few different model architectures and compare results to classical ML models in order to discover what works well for your specific dataset.

3. Domenico November 4, 2017 at 12:45 am #

Hi Jason,
many thanks for your article, I found a useful dataset.
I did not find any dataset on UCI about temperature and energy consumption inside a building, I was wondering if you could help me in some way.
I hope to hear from you soon

• Jason Brownlee November 4, 2017 at 5:31 am #

Sorry, I’m not aware of such a dataset off-the-cuff.

4. Nisha Chaube January 21, 2018 at 7:28 am #

I have a multivariate-dataset with observations from day 1 to 49 for each of the almost 30 patients. The end result is whether the patient has PTSD (1) or not ( 0 ). Please suggest how am I supposed to approach this problem in terms of data pre-processing.

5. VEERENDRA JONNALAGADDA June 1, 2018 at 5:22 am #

any sample code in python or C for time series ie preparing data via pandas(separating needed columns),analysing same for training,preparing model,training the model,applying same on test data…..

Please excuse me in case I have requested anything wrong.

• Jason Brownlee June 1, 2018 at 8:26 am #

I have many examples, try searching on the blog.

6. Florent January 20, 2019 at 7:51 pm #

Hi, I am trying to create a model that uses past data (sales volume + weather condition for example) to predict the 5 next day of sales volume but I would like to use weather prediction of the next 5 days also to forecast the volumes.

Can you tell me about the model to use (I guess RNN) and how to build my dataset.

Regards

7. Avram March 8, 2019 at 11:38 pm #

Hi Jason,
My question may seem a bit weird, so I beg your pardon in advance. I am working on short-term load forecasting. As far as I know, AEMO publishes open data about electricity. I can access the half-hourly load demand of past years (from 2006 through 2018), however I cannot access the half-hourly weather data (temperature and bulb) of Australian regions (QSL, VIC, NSW etc.). I will make a comparative analysis with journal papers, so I am looking for these data, and the authors of some papers have not shared their AEMO data yet. How can I get or find these data? Can you direct me on this issue?

• Jason Brownlee March 9, 2019 at 6:29 am #

My best advice is to contact the authors directly, and perhaps their advisors/colleagues?

8. fernando A gutierrez March 12, 2019 at 6:56 am #

I have a data set of shipping cost per day (in one year), however, not every day has a shipping cost. What’s the best way to deal with missing daily costs in order to do a time series analysis?

• Jason Brownlee March 12, 2019 at 7:00 am #

Perhaps start by filling the missing values with the mean/average values of the series?
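A minimal sketch of that suggestion with pandas, assuming the costs are held in a Series indexed by date; the values below are made up for illustration.

```python
import pandas as pd

# Made-up shipping costs with gaps: no rows exist for Jan 2 and Jan 4.
costs = pd.Series(
    [100.0, 140.0, 120.0],
    index=pd.to_datetime(["2019-01-01", "2019-01-03", "2019-01-05"]),
)

# Reindex to a complete daily calendar so the missing days become NaN...
daily = costs.asfreq("D")

# ...then fill the gaps with the mean of the observed values.
filled = daily.fillna(costs.mean())
print(filled)
```

Mean filling is only a starting point; `daily.interpolate()` or a forward fill may suit some series better, depending on why the values are missing.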

9. one July 3, 2019 at 1:07 pm #

I need to find a dataset and decompose it for BTS fault prediction from fault history,
total down time and 3 cells/sector. How could that be possible?

• Jason Brownlee July 4, 2019 at 7:37 am #

Perhaps check on Kaggle?

10. Abderahmane Bouziane July 23, 2019 at 6:20 am #

Do you think multivariate time series can take advantage of CNNs?
Can you combine CNNs with LSTMs?
How would you build a time series autoencoder for where each instant has 30 variables?

11. Shital September 19, 2019 at 3:59 pm #

Multivariate datasets are generally more challenging as you said. How to apply neural network algorithm on these datasets in WEKA? I am doing something wrong as I am getting the same result for yearly/monthly/weekly datasets. Please guide.

• Jason Brownlee September 20, 2019 at 5:35 am #

Good question.

There may be a way, I don’t have an example sorry.

• Shital Bhojani October 1, 2019 at 2:09 pm #

Yes, I found a way. We can use the Overlay for training and test data using the advance configuration in time series package. We can set the single or multiple dependent parameters in overlay. While using overlay, data set is separated automatically in training and test data as per the values we have set in Evaluation tab.

12. Shital Bhojani September 20, 2019 at 2:37 pm #

Ohhhk… Thanks for your prompt reply Jason. I am rendering around it.

13. Arjun November 19, 2019 at 3:35 pm #

Hi jason,
Can you help me on how to convert a txt file to csv file?

• Jason Brownlee November 20, 2019 at 6:08 am #

Perhaps change the file extension from .txt to .csv?

14. Arjun November 19, 2019 at 4:26 pm #

Is it mandatory to convert the text file into a csv file and then into a pandas dataframe for further work? Or does it provide conflict if it is not done?

• Jason Brownlee November 20, 2019 at 6:09 am #

No, Pandas does not care about file extensions, only the content.
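As a small illustration of that point, the sketch below writes comma-separated text to a file with a `.txt` extension and loads it with `pandas.read_csv`. The file name, column names and values are made up.

```python
import os
import tempfile

import pandas as pd

# Write a small comma-separated file with a .txt extension (made-up contents).
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as handle:
    handle.write("date,value\n2019-01-01,1.5\n2019-01-02,2.5\n")

# read_csv parses the content regardless of the extension; pass sep="\t" (or
# another delimiter) if the file is not comma-separated.
frame = pd.read_csv(path, parse_dates=["date"])
print(frame.dtypes)
```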

Does anyone have a discrete dataset?

16. Aashish Agarwal December 21, 2019 at 9:47 am #

Dear Jason,

Thank you for the wonderful post. I have a dataset, similar to Occupancy Detection Dataset, which you have described above.

1. Can we apply LSTMs, CNNs on these data?
2. Does this kind of data count as multivariate time series data? What I have understood so far is that in time series data there is a sequence in the rows and columns, i.e. we can’t move any columns or rows since time series data have a sequence.
3. What kind of models can we apply to such a problem?

Regards,
Aashish

17. Rajesh December 22, 2019 at 11:29 pm #

Hey Jason, Great Post.

I deal with system and application monitoring data a lot. I am looking for production ready software that would help me store data in Timeseries Database and apply predictive analytics (RNN, S/ARIMA) continuously. I see there are couple of cool libraries like TICK stack, LoudML and Facebook prophet.

Any tutorial would be great demonstrating the deployment of such continuous predictive system.

Best Regards,
Rajesh

18. Laila January 8, 2020 at 6:54 am #

Hi Jason,

Where can I get information about RNN or LSTM time series prediction datasets that need improvements, for example in terms of accuracy?

• Jason Brownlee January 8, 2020 at 8:36 am #

We minimize error for time series, not accuracy.

What do you mean by “need improvement”?

If you want to solve real problems where people care about the outcome, perhaps start with kaggle or take on some consulting work?

19. GKboy March 30, 2020 at 7:17 pm #

Hi there,

Is there is any solution to handle 3d data with a “traditional” ML solution?
For example, if I have a time series generated with 1000 users. In this scenario, we have 1000x time series. How can I make a generalized Varmax or Arimax model for every user, if I don’t want to use LSTM ?

20. Remirab April 13, 2020 at 11:15 pm #

Hi there.

Do we categorize GPS trajectories as Univariate Time Series?

21. Suresh Reshu April 22, 2020 at 1:34 am #

Can you post something like “How to prepare time series dataset for machine learning” that is implemented using sklearn?

22. Shubhi Jain May 6, 2020 at 5:39 am #

Hi,

My data is in the format timestamp, no of customers. I want to convert it into an hourly time series. How should I do that?

• Jason Brownlee May 6, 2020 at 6:31 am #

It really depends on your data, sorry, I cannot give better advice than that.

23. Sachin Kannan August 31, 2020 at 12:19 am #

Hi Jason,

I have a dataset with columns as follows “Account Jan Feb Mar Q1 Apr May Jun Q2 Jul Aug Sep Q3 Oct Nov Dec Q4 YearTotal Year”

How am I supposed to consume this data for a forecasting model, as my month columns don’t have any dates attached to them; instead they hold the sales figures for each account. E.g.

Account jan feb march Q1 Year
Revision 267829.5 279052.45 260298.54 807180.49 2019

My aim is to predict the Q3 and Q4 for the year 2020.

• Jason Brownlee August 31, 2020 at 6:16 am #

Perhaps start with a persistence model, then move on to evaluate a suite of models in order to discover what works well or best for your dataset.

• Sachin Kannan September 1, 2020 at 1:34 am #

I saw the persistence model that you used on the shampoo and monthly car sales data. They are both univariate datasets; in my case I have multivariate data. Can you please suggest how to approach a multivariate problem?

How to do time series by considering 3 to 5 columns and predict. If there is a way i can share some sample with you, if so do suggest.

24. Beste Karacay September 2, 2020 at 5:44 am #

Hi Jason,

What I would like to ask is this, I have a time series historical data. It is daily sales data however, I have different product id’s. For example, I have 3 different dates for product 1, but I have 8 different dates for product 2.
I am expected to build an algorithm to forecast the sales of any product for next day.
How should I proceed?

e.g.
productid date soldquantity
1 23.11.2018 0
21 30.11.2018 0
21 27.12.2018 0
21 9.01.2019 0
21 18.12.2018 0
21 5.01.2019 0
21 7.01.2019 0
21 31.12.2018 0
21 26.12.2018 0
21 25.12.2018 0
21 10.01.2019 0
31 1.12.2018 0
31 19.11.2018 0
31 11.11.2018 0
31 27.11.2018 0
31 22.11.2018 0

• Jason Brownlee September 2, 2020 at 6:34 am #

I would expect each product id is a separate series.

You can use a machine learning or deep learning model to learn per product or across products.
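A minimal sketch of treating each product id as its own series, using a few of the rows quoted in the comment above. The day-first date format is an assumption based on how those rows look.

```python
import pandas as pd

# A few rows copied from the comment above: (productid, date, soldquantity).
rows = [
    (1, "23.11.2018", 0),
    (21, "30.11.2018", 0),
    (21, "27.12.2018", 0),
    (31, "1.12.2018", 0),
    (31, "19.11.2018", 0),
    (31, "11.11.2018", 0),
]
frame = pd.DataFrame(rows, columns=["productid", "date", "soldquantity"])

# Assumption: dates are day.month.year.
frame["date"] = pd.to_datetime(frame["date"], format="%d.%m.%Y")

# Build one date-sorted series per product id.
series_by_product = {
    product_id: group.sort_values("date").set_index("date")["soldquantity"]
    for product_id, group in frame.groupby("productid")
}
print({product_id: len(series) for product_id, series in series_by_product.items()})
```

Each entry can then be modeled on its own, or the product id can be kept as an input feature when learning one model across products.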

• Beste Karacay September 2, 2020 at 4:52 pm #

Hi again. Thanks for the quick answer.
I considered taking all products as separate series, however I have more than 10 thousand products.

Which machine learning method could be used? I am very new at this.

25. Gulzar January 1, 2021 at 12:03 am #

Hi! you may want to ctrl+f “At the time of writing, there are” and find that you left this sentence twice in a row. Thanks for the article! it helped me find a dataset I needed.

26. Gopal February 11, 2021 at 9:36 pm #

Jason, can you help us to understand FourierFeaturizer and how interpret it from pmdarima python package. I wanted to use it to forecast seasonal data with long seasonal periods.

The regular approach is taking a good amount of time, so based on https://robjhyndman.com/hyndsight/longseasonality/ I am exploring the usage of FourierFeaturizer.

• Jason Brownlee February 12, 2021 at 5:45 am #

Thank you for the suggestion.

27. Aashika Varma April 7, 2021 at 11:28 pm #

Hey Jason, the examples in this article look great! I’m actually looking for a signal processing dataset to apply time series modelling for a project. Could you suggest any open source datasets in this context?

28. Hanna July 30, 2021 at 3:05 am #

I have satellite time series (multivariate-dataset) with images from day 1 to 10 with almost 7 classes . Please suggest how am I supposed to approach this problem in terms of data augmentation

• Jason Brownlee July 30, 2021 at 6:32 am #

Perhaps you can use a pre-trained model with a custom CNN-LSTM type architecture.

29. Priyanka Mohan August 8, 2021 at 8:41 pm #

Hello Jason,

I have a GPS dataset (latitude, longitude, timestamp) as a dataset. Each track (series of GPS points) of a participant is compared with another participant who walks on the same track. I want to do time series classification on this data, which kind of data can this be?

Thank you

• Jason Brownlee August 9, 2021 at 5:55 am #

That sounds like a great project. It is time series classification, try a suite of models and discover what works well or best for your data.

30. Kone December 15, 2021 at 12:05 pm #

i am working with time series about education in my AI thesis project. the values are by year from 2013 to 2021, so i have nine records. i think it is a small dataset for a PHD, what do you think ?? Any suggestions ??

• Adrian Tam December 17, 2021 at 6:51 am #

9 records probably can’t help you go too far, but it should be a good start.

31. sham February 16, 2022 at 6:35 pm #

Hello !
Brother can you provide Supply chain multi mode(Air, Truck, ocean etc) travel time prediction dataset.

32. sham February 16, 2022 at 6:36 pm #

Hello !
Brother can you provide Supply chain multi mode(Air, Truck, ocean etc) travel time prediction dataset.
I will be very thankful !

• James Carmichael February 17, 2022 at 1:29 pm #

Hi Sham…I do not have such a dataset. You may want to check Kaggle or StackOverflow.

33. Hanson February 16, 2023 at 3:37 am #

Hello!
Can I get the reference about where datasets on the post came from?
I want to get more about each dataset.
Are they from ““Time Series Data Library” created by Rob Hyndman, Professor of Statistics at Monash University, Australia” as you said in the first part on the post as below?

“There are many sources of time series dataset, such as the “Time Series Data Library” created by Rob Hyndman, Professor of Statistics at Monash University, Australia”

• James Carmichael February 16, 2023 at 8:32 am #

Hi Hanson…each dataset contains a link that you can follow as the source. Also, in some cases the author’s name is provided so that you can perform a search on the author and the datasets they have published.

34. Hanson February 16, 2023 at 3:42 am #

“The source of the data is credited as the Australian Bureau of Meteorology.” for Minimum Daily Temperatures Dataset.

But could you let me know how to get the source of the data in detail?

Thank you!

• James Carmichael February 16, 2023 at 8:31 am #

Hi Hanson…each dataset contains a link that you can follow as the source. Also, in some cases the author’s name is provided so that you can perform a search on the author and the datasets they have published.

35. Hanson February 16, 2023 at 3:45 am #

I read the part about where the source data was such as

“The source of the data is credited as the Australian Bureau of Meteorology.” for Minimum Daily Temperatures Dataset.

But could you let me know how I can get the data through Australian Bureau of Meteorology in detail?

Thank you!

• James Carmichael February 16, 2023 at 8:32 am #

Hi Hanson…each dataset contains a link that you can follow as the source. Also, in some cases the author’s name is provided so that you can perform a search on the author and the datasets they have published.