How to Save a NumPy Array to File for Machine Learning

Developing machine learning models in Python often requires the use of NumPy arrays.

NumPy arrays are efficient data structures for working with data in Python. Machine learning models, such as those in the scikit-learn library, and deep learning models, such as those in the Keras library, expect input data as NumPy arrays and return their predictions as NumPy arrays.

As such, it is common to need to save NumPy arrays to file.

For example, you may prepare your data with transforms like scaling and need to save it to file for later use. You may also use a model to make predictions and need to save the predictions to file for later use.

In this tutorial, you will discover how to save your NumPy arrays to file.

After completing this tutorial, you will know:

  • How to save NumPy arrays to CSV formatted files.
  • How to save NumPy arrays to NPY formatted files.
  • How to save NumPy arrays to compressed NPZ formatted files.

Let’s get started.

Photo by Chris Combe, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Save NumPy Array to .CSV File (ASCII)
  2. Save NumPy Array to .NPY File (binary)
  3. Save NumPy Array to .NPZ File (compressed)

1. Save NumPy Array to .CSV File (ASCII)

The most common file format for storing numerical data in files is the comma-separated values format, or CSV for short.

It is likely that the training data and input data for your models are stored in CSV files.

It can also be convenient to save data to CSV files, such as the predictions from a model.

You can save your NumPy arrays to CSV files using the savetxt() function. This function takes a filename and an array as arguments and saves the array as text.

The default delimiter is a single space, so to produce a CSV file you must also specify the delimiter: the character used to separate each value in the file, which for CSV is a comma. This can be set via the “delimiter” argument.
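A quick way to see the effect of the delimiter is to write the same small array twice, once with the default and once with a comma. The sketch below is illustrative only: the array values are arbitrary, and it writes to in-memory io.StringIO buffers rather than files, which savetxt() accepts in place of a filename because they behave like open text files.

```python
import io

from numpy import asarray, savetxt

# arbitrary example data: two rows of three floating point values
data = asarray([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# write to in-memory text buffers instead of files on disk
default_buf = io.StringIO()
savetxt(default_buf, data)  # default delimiter is a single space
comma_buf = io.StringIO()
savetxt(comma_buf, data, delimiter=',')  # comma delimiter gives CSV

print(default_buf.getvalue())
print(comma_buf.getvalue())
```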

1.1 Example of Saving a NumPy Array to CSV File

The example below demonstrates how to save a single NumPy array to CSV format.
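A minimal sketch of such an example follows. The values (0.0 through 9.0 in a single row) are an assumption chosen for illustration; the script also reads the file back as plain text so you can see what was written without opening an editor.

```python
from numpy import arange, savetxt

# assumed example data: a single row of 10 floating point values, shape (1, 10)
data = arange(10, dtype=float).reshape((1, 10))

# save the array to a CSV file, separating values with commas
savetxt('data.csv', data, delimiter=',')

# read the file back as plain text to inspect what was written
with open('data.csv') as f:
    text = f.read()
print(text)
```

With savetxt()'s default format, each value is written in scientific notation with 18 digits after the decimal point, so the first value appears as 0.000000000000000000e+00.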

Running the example will define a NumPy array and save it to the file ‘data.csv’.

The array has a single row of data with 10 columns. We would expect this data to be saved to a CSV file as a single row of data.

After running the example, we can inspect the contents of ‘data.csv’.

We should see the following:

We can see that the data is correctly saved as a single row and that the floating point numbers in the array were saved with full precision.
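The full precision comes from savetxt()'s default format of '%.18e'. If you need fewer digits, the fmt argument accepts a printf-style format string. A small sketch with arbitrary example values, writing to in-memory buffers rather than files:

```python
import io

from numpy import asarray, savetxt

# arbitrary example values that are exactly representable as floats
data = asarray([[0.5, 1.25, 2.125]])

# default fmt='%.18e': scientific notation, 18 digits after the decimal point
buf_default = io.StringIO()
savetxt(buf_default, data, delimiter=',')

# fmt='%.3f': fixed-point notation with three decimal places
buf_short = io.StringIO()
savetxt(buf_short, data, delimiter=',', fmt='%.3f')

print(buf_default.getvalue(), end='')
print(buf_short.getvalue(), end='')  # 0.500,1.250,2.125
```

Shorter formats give smaller, more readable files, at the cost of discarding digits you may later need.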

1.2 Example of Loading a NumPy Array from CSV File

We can load this data later as a NumPy array using the loadtxt() function, specifying the filename and the same comma delimiter.

The complete example is listed below.
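A minimal sketch of the complete loading example, assuming the same one-row array of 0.0 through 9.0. It first recreates data.csv so that it runs on its own, and it also loads the file a second time using loadtxt()'s ndmin argument.

```python
from numpy import arange, loadtxt, savetxt

# recreate data.csv so this sketch runs on its own (assumed example values)
savetxt('data.csv', arange(10, dtype=float).reshape((1, 10)), delimiter=',')

# load the array from the CSV file, using the same comma delimiter
data = loadtxt('data.csv', delimiter=',')
print(data)
print(data.shape)  # (10,): the single row is squeezed to one dimension

# ndmin=2 keeps the original two-dimensional shape
data_2d = loadtxt('data.csv', delimiter=',', ndmin=2)
print(data_2d.shape)  # (1, 10)
```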

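Running the example loads the data from the CSV file and prints the contents, matching the single row of 10 values defined in the previous example. Note that loadtxt() squeezes a file with a single row into a one-dimensional array of shape (10,); passing ndmin=2 keeps the original (1, 10) shape.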

2. Save NumPy Array to .NPY File (binary)

Sometimes we have a lot of data in NumPy arrays that we wish to save efficiently and only need to use in another Python program.

In that case, we can save the NumPy arrays in a native binary format that is efficient to both save and load.

This is common for prepared input data, such as transformed data, that will serve as the basis for testing a range of machine learning models or running many experiments in the future.

The .npy file format is appropriate for this use case and is referred to simply as the “NumPy format”.

This can be achieved using the save() NumPy function and specifying the filename and the array that is to be saved.

2.1 Example of Saving a NumPy Array to NPY File

The example below defines our two-dimensional NumPy array and saves it to a .npy file.
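A minimal sketch of this example, assuming a (1, 10) array holding 0.0 through 9.0 as the data:

```python
from numpy import arange, save

# assumed example data: a two-dimensional array with one row of 10 values
data = arange(10, dtype=float).reshape((1, 10))

# save the array to data.npy in NumPy's binary .npy format
save('data.npy', data)
```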

After running the example, you will see a new file in the directory with the name ‘data.npy’.

You cannot inspect the contents of this file directly with your text editor because it is in binary format.
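Although a text editor is of little use here, you can confirm from Python that the file is binary. Every .npy file begins with the six-byte magic string b'\x93NUMPY', so a short sketch can read and print those bytes. It recreates data.npy with assumed example values so that it runs on its own.

```python
from numpy import arange, save

# recreate data.npy so this sketch runs on its own (assumed example values)
save('data.npy', arange(10, dtype=float).reshape((1, 10)))

# read the first six bytes of the file in binary mode
with open('data.npy', 'rb') as f:
    magic = f.read(6)
print(magic)  # b'\x93NUMPY'
```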

2.2 Example of Loading a NumPy Array from NPY File

You can load this file as a NumPy array later using the load() function.

The complete example is listed below.
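A minimal sketch, assuming the same (1, 10) example array; it recreates data.npy first so that it can run on its own:

```python
from numpy import arange, load, save

# recreate data.npy so this sketch runs on its own (assumed example values)
save('data.npy', arange(10, dtype=float).reshape((1, 10)))

# load the array from the .npy file
data = load('data.npy')
print(data)
print(data.shape)  # (1, 10)
```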

Running the example will load the file and print the contents, confirming both that it was loaded correctly and that the contents match what we expect, in the same two-dimensional format.

3. Save NumPy Array to .NPZ File (compressed)

Sometimes, we prepare data for modeling that needs to be reused across multiple experiments, but the data is large.

This might be pre-processed NumPy arrays such as a corpus of text (integers) or a collection of rescaled image data (pixels). In these cases, it is desirable to save the data to file in a compressed format.

Depending on how compressible the data is, this can reduce gigabytes of data to hundreds of megabytes and make it easier to transfer to other servers or cloud computing instances for long algorithm runs.
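How much space compression saves depends heavily on the data. The sketch below measures it for one deliberately favorable case, an array of zeros; the file names uncompressed.npy and compressed.npz are hypothetical, and real data such as random floats will typically compress far less.

```python
import os

from numpy import save, savez_compressed, zeros

# deliberately compressible example data: one million zeros (about 8 MB)
data = zeros((1000, 1000))

# hypothetical file names for the two versions
save('uncompressed.npy', data)
savez_compressed('compressed.npz', data)

# compare the sizes of the two files on disk, in bytes
print('npy bytes:', os.path.getsize('uncompressed.npy'))
print('npz bytes:', os.path.getsize('compressed.npz'))
```

Running the same comparison on a sample of your own arrays is a more reliable guide than any rule of thumb.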

The .npz file format is appropriate for this case. An .npz file is a zip archive containing one .npy file per array, and it supports a compressed version of the native NumPy file format.

The savez_compressed() NumPy function allows multiple NumPy arrays to be saved to a single compressed .npz file.
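A small sketch of saving two arrays in one file follows. The names X and y and the file name dataset.npz are hypothetical. The script then opens the result with Python's zipfile module to list the stored members and check that they were written with zip's DEFLATE compression.

```python
import zipfile

from numpy import arange, savez_compressed

# two hypothetical arrays, e.g. input features X and target labels y
X = arange(20, dtype=float).reshape((10, 2))
y = arange(10)

# pass both arrays positionally to store them in one compressed file
savez_compressed('dataset.npz', X, y)

# an .npz file is a zip archive holding one .npy file per array
with zipfile.ZipFile('dataset.npz') as archive:
    names = sorted(archive.namelist())
    methods = {info.compress_type for info in archive.infolist()}
print(names)  # ['arr_0.npy', 'arr_1.npy']
print(methods == {zipfile.ZIP_DEFLATED})  # True
```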

3.1 Example of Saving a NumPy Array to NPZ File

We can use this function to save our single NumPy array to a compressed file.

The complete example is listed below.
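A minimal sketch, assuming the same (1, 10) example array of 0.0 through 9.0:

```python
from numpy import arange, savez_compressed

# assumed example data: a two-dimensional array with one row of 10 values
data = arange(10, dtype=float).reshape((1, 10))

# save the array to a compressed .npz file
savez_compressed('data.npz', data)
```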

Running the example defines the array and saves it to a file named ‘data.npz’ in compressed NumPy format.

As with the .npy format, we cannot inspect the contents of the saved file with a text editor because the file format is binary.

3.2 Example of Loading a NumPy Array from NPZ File

We can load this file later using the same load() function from the previous section.

Because the savez_compressed() function supports saving multiple arrays to a single file, the load() function may return multiple arrays.

The loaded arrays are returned from the load() function in a dict-like NpzFile object. Arrays that were passed to savez_compressed() positionally are named ‘arr_0’ for the first array, ‘arr_1’ for the second, and so on.
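Arrays passed to savez_compressed() as keyword arguments are stored under those keyword names instead, and the NpzFile's files attribute lists the stored names. The sketch below contrasts the two styles; the file names and the arrays X and y are hypothetical.

```python
from numpy import arange, load, savez_compressed

# hypothetical arrays and file names, chosen for illustration
X = arange(20, dtype=float).reshape((10, 2))
y = arange(10)

# positional arguments are stored as arr_0, arr_1, ...
savez_compressed('positional.npz', X, y)
# keyword arguments are stored under their keyword names
savez_compressed('named.npz', X=X, y=y)

with load('positional.npz') as npz:
    positional_names = sorted(npz.files)
with load('named.npz') as npz:
    named_names = sorted(npz.files)
print(positional_names)  # ['arr_0', 'arr_1']
print(named_names)  # ['X', 'y']
```

Keyword names make a multi-array file easier to read back correctly, since you no longer rely on remembering the positional order.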

The complete example of loading our single array is listed below.
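A minimal sketch of the loading example. It recreates data.npz with the assumed (1, 10) array so that it runs on its own, and it opens the NpzFile in a with block so the underlying file is closed afterwards.

```python
from numpy import arange, load, savez_compressed

# recreate data.npz so this sketch runs on its own (assumed example values)
savez_compressed('data.npz', arange(10, dtype=float).reshape((1, 10)))

# load the dict-like NpzFile and extract the first (and only) array
with load('data.npz') as dict_data:
    data = dict_data['arr_0']
print(data)
print(data.shape)  # (1, 10)
```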

Running the example loads the compressed NumPy file, which contains a dict-like collection of arrays, extracts the first array that we saved (we saved only one), and prints its contents, confirming that the values and the shape of the array match what we saved in the first place.

Summary

In this tutorial, you discovered how to save your NumPy arrays to file.

Specifically, you learned:

  • How to save NumPy arrays to CSV formatted files.
  • How to save NumPy arrays to NPY formatted files.
  • How to save NumPy arrays to compressed NPZ formatted files.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

10 Responses to How to Save a NumPy Array to File for Machine Learning

  1. Eric November 13, 2019 at 8:03 am #

    Very interesting. Is there a difference in performance among them? Especially between CSV and NPY? Unless there’s one, using the portable CSV might be more convenient.

    I think that for fast file systems NPY should be faster than NPZ, but on very large arrays and slow file systems NPZ could sometimes be faster.

    • Jason Brownlee November 13, 2019 at 1:44 pm #

      Thanks!

      Good question. I don’t have good stats on performance comparisons, although working with 10/100MB of random floats in an array would give results quickly.

      My expectation is that getting data into RAM fast, e.g. compressed would have the best performance.

      I use NPY and NPZ a lot myself.

  2. Deep Learner November 14, 2019 at 11:07 pm #

    Hi Jason,

    Thanks for the post, a very useful feature, it is a good complement to another good post on this very site which deals with models: https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/

  3. Darmawan Utomo November 15, 2019 at 7:18 pm #

    Hi Jason,

    It seems that savetxt is only for 1D or 2D.
    But no problem with npy and npz.

    Thank you.

  4. araya November 19, 2019 at 12:11 am #

    please would you mind to give me the book regarding this title

    • Jason Brownlee November 19, 2019 at 7:42 am #

      A book on saving NumPy arrays?

      What additional problems are you having exactly?

  5. Anirban Ray November 26, 2019 at 12:18 am #

    Hi!

    Can you please tell me whether it is possible to append to a .npy file?

    For example, suppose I have an numpy array x, and stored it in x.npy. If I now want to append a few elements to it, do I have to load it, append, and then save again? Or, is there a way to append directly to the x.npy file without loading it?

    Thanks for this and all the other great articles.

    • Jason Brownlee November 26, 2019 at 6:06 am #

      Maybe.

      Instead, I would recommend loading it into memory, append to the array, then save it again.
