How to Save a NumPy Array to File for Machine Learning

By Jason Brownlee on August 19, 2020 in Python Machine Learning 40

Developing machine learning models in Python often requires the use of NumPy arrays.

NumPy arrays are efficient data structures for working with data in Python, and machine learning models like those in the scikit-learn library, and deep learning models like those in the Keras library, expect input data in the format of NumPy arrays and make predictions in the format of NumPy arrays.

As such, it is common to need to save NumPy arrays to file.

For example, you may prepare your data with transforms like scaling and need to save it to file for later use. You may also use a model to make predictions and need to save the predictions to file for later use.

In this tutorial, you will discover how to save your NumPy arrays to file.

After completing this tutorial, you will know:

How to save NumPy arrays to CSV formatted files.
How to save NumPy arrays to NPY formatted files.
How to save NumPy arrays to compressed NPZ formatted files.

Kick-start your project with my new book Machine Learning Mastery With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Photo by Chris Combe, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

Save NumPy Array to .CSV File (ASCII)
Save NumPy Array to .NPY File (binary)
Save NumPy Array to .NPZ File (compressed)

1. Save NumPy Array to .CSV File (ASCII)

One of the most common file formats for storing numerical data is the comma-separated values format, or CSV for short.

It is most likely that your training data and input data to your models are stored in CSV files.

It can be convenient to save data to CSV files, such as the predictions from a model.

You can save your NumPy arrays to CSV files using the savetxt() function. This function takes a filename and array as arguments and saves the array into CSV format.

You must also specify the delimiter; this is the character used to separate each variable in the file, most commonly a comma. This can be set via the “delimiter” argument.
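One caveat worth knowing before the examples: savetxt() only accepts one- and two-dimensional arrays. The sketch below shows one possible workaround for a higher-dimensional array, which is to flatten it to 2D before saving and restore the shape after loading. The array contents, the temporary folder, and the file name 'data3d.csv' are assumptions made for illustration.

```python
# sketch: save a 3D array to CSV by flattening it to 2D first
from os.path import join
from tempfile import mkdtemp
from numpy import arange, array_equal, loadtxt, savetxt
# define a hypothetical 3D array with shape (2, 3, 4)
data = arange(24).reshape(2, 3, 4)
# write to a temporary folder so nothing in the working directory is overwritten
path = join(mkdtemp(), 'data3d.csv')
# collapse the leading dimensions into rows; the original shape is not stored in the file
savetxt(path, data.reshape(-1, data.shape[-1]), delimiter=',')
# load the 2D data and restore the shape, which must be known in advance
restored = loadtxt(path, delimiter=',').reshape(data.shape)
print(restored.shape)
print(array_equal(restored, data))
```

Because the CSV file does not record the original shape, this approach only works if the shape is stored somewhere else. The binary formats described later in this tutorial avoid the problem.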

1.1 Example of Saving a NumPy Array to CSV File

The example below demonstrates how to save a single NumPy array to CSV format.

# save numpy array as csv file
from numpy import asarray
from numpy import savetxt
# define data
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
# save to csv file
savetxt('data.csv', data, delimiter=',')

Running the example will define a NumPy array and save it to the file ‘data.csv’.

The array has a single row of data with 10 columns. We would expect this data to be saved to a CSV file as a single row of data.

After running the example, we can inspect the contents of ‘data.csv’.

We should see the following:

0.000000000000000000e+00,1.000000000000000000e+00,2.000000000000000000e+00,3.000000000000000000e+00,4.000000000000000000e+00,5.000000000000000000e+00,6.000000000000000000e+00,7.000000000000000000e+00,8.000000000000000000e+00,9.000000000000000000e+00


We can see that the data is correctly saved as a single row. Note that the integers in the array were written as floating point numbers in scientific notation, which is the default output format of savetxt().
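If compact integer values are preferred in the file, the “fmt” argument of savetxt() can be set. The following is a minimal sketch, assuming the same array as above; the temporary folder and the file name 'data_int.csv' are chosen only for illustration.

```python
# sketch: save integers to csv without scientific notation
from os.path import join
from tempfile import mkdtemp
from numpy import asarray
from numpy import savetxt
# define data
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
# write to a temporary folder (an assumption for this sketch)
path = join(mkdtemp(), 'data_int.csv')
# fmt='%d' writes each value as a plain integer
savetxt(path, data, delimiter=',', fmt='%d')
# read the raw text back to inspect it
with open(path) as handle:
    text = handle.read()
print(text)
```

The file should then contain the ten values as plain integers separated by commas.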

1.2 Example of Loading a NumPy Array from CSV File

We can load this data later as a NumPy array using the loadtxt() function and specify the filename and the same comma delimiter.

The complete example is listed below.

# load numpy array from csv file
from numpy import loadtxt
# load array
data = loadtxt('data.csv', delimiter=',')
# print the array
print(data)

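Running the example loads the data from the CSV file and prints the contents, matching the 10 values defined in the previous example. Note that loadtxt() returns floating point values by default and returns a file with a single row as a one-dimensional array.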

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
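The printed array is one-dimensional, although the array that was saved had one row and 10 columns. As a sketch, the “ndmin” argument of loadtxt() can be used to keep the two-dimensional shape; the temporary folder and file name below are assumptions for illustration.

```python
# sketch: keep the 2D shape when loading a single-row csv file
from os.path import join
from tempfile import mkdtemp
from numpy import asarray
from numpy import loadtxt
from numpy import savetxt
# save a single row with 10 columns to a temporary file
path = join(mkdtemp(), 'data.csv')
savetxt(path, asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]), delimiter=',')
# default behavior: the single row is squeezed to one dimension
flat = loadtxt(path, delimiter=',')
# ndmin=2 asks for at least two dimensions
two_d = loadtxt(path, delimiter=',', ndmin=2)
print(flat.shape)
print(two_d.shape)
```

This matters when the loaded data is passed to a model that expects input with rows and columns.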


2. Save NumPy Array to .NPY File (binary)

Sometimes we have a lot of data in NumPy arrays that we wish to save efficiently, but which we only need to use in another Python program.

Therefore, we can save the NumPy arrays into a native binary format that is efficient to both save and load.

This is common for input data that has been prepared, such as transformed data, that will need to be used as the basis for testing a range of machine learning models in the future or running many experiments.

The .npy file format is appropriate for this use case and is referred to as simply “NumPy format”.

This can be achieved using the save() NumPy function and specifying the filename and the array that is to be saved.

2.1 Example of Saving a NumPy Array to NPY File

The example below defines our two-dimensional NumPy array and saves it to a .npy file.

# save numpy array as npy file
from numpy import asarray
from numpy import save
# define data
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
# save to npy file
save('data.npy', data)

After running the example, you will see a new file in the directory with the name ‘data.npy’.

You cannot inspect the contents of this file directly with your text editor because it is in binary format.

2.2 Example of Loading a NumPy Array from NPY File

You can load this file as a NumPy array later using the load() function.

The complete example is listed below.

# load numpy array from npy file
from numpy import load
# load array
data = load('data.npy')
# print the array
print(data)

Running the example will load the file and print the contents, confirming both that it was loaded correctly and that the content matches the two-dimensional format we expect.

[[0 1 2 3 4 5 6 7 8 9]]
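Unlike CSV, the .npy format stores the shape and data type of the array, so arrays with more than two dimensions can be saved and loaded without reshaping. The sketch below uses a hypothetical four-dimensional array and a temporary file. It also shows the “mmap_mode” argument of load(), which memory-maps the file so that a slice can be read without loading the whole array into memory.

```python
# sketch: round-trip a 4D array and read part of it via memory mapping
from os.path import join
from tempfile import mkdtemp
from numpy import arange, array_equal, asarray, float32, load, save
# define a hypothetical 4D array of 32-bit floats
data = arange(120, dtype=float32).reshape(2, 3, 4, 5)
# save to a temporary folder (an assumption for this sketch)
path = join(mkdtemp(), 'data4d.npy')
save(path, data)
# a normal load restores shape and dtype
loaded = load(path)
print(loaded.shape, loaded.dtype)
# memory-mapped load: data is read from disk only when it is accessed
mapped = load(path, mmap_mode='r')
# copy just the first block into memory
first_block = asarray(mapped[0])
print(first_block.shape)
```

Memory mapping may be worth exploring when an array is too large to fit comfortably in RAM.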


3. Save NumPy Array to .NPZ File (compressed)

Sometimes, we prepare data for modeling that needs to be reused across multiple experiments, but the data is large.

This might be pre-processed NumPy arrays like a corpus of text (integers) or a collection of rescaled image data (pixels). In these cases, it is desirable to save the data to file in a compressed format.

Depending on how compressible the data is, this can allow gigabytes of data to be reduced to hundreds of megabytes, and it makes it easier to transfer the data to other servers or cloud computing instances for long algorithm runs.

The .npz file format is appropriate for this case and supports a compressed version of the native NumPy file format.

The savez_compressed() NumPy function allows multiple NumPy arrays to be saved to a single compressed .npz file.
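To get a feel for the effect of compression, the sketch below saves the same array with save() and savez_compressed() and compares the file sizes. The array of zeros is an assumption chosen because it compresses extremely well; real data will usually compress far less, and random floating point values may barely compress at all.

```python
# sketch: compare file sizes of .npy and compressed .npz
from os.path import getsize, join
from tempfile import mkdtemp
from numpy import save, savez_compressed, zeros
# define a highly compressible array (all zeros), about 8 MB in memory
data = zeros((1000, 1000))
# save both formats to a temporary folder
folder = mkdtemp()
npy_path = join(folder, 'data.npy')
npz_path = join(folder, 'data.npz')
save(npy_path, data)
savez_compressed(npz_path, data)
# report the size of each file in bytes
npy_size = getsize(npy_path)
npz_size = getsize(npz_path)
print(npy_size, npz_size)
```

Running a comparison like this on a sample of your own data is a quick way to check whether compression is worthwhile.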

3.1 Example of Saving a NumPy Array to NPZ File

We can use this function to save our single NumPy array to a compressed file.

The complete example is listed below.

# save numpy array as npz file
from numpy import asarray
from numpy import savez_compressed
# define data
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
# save to npz file
savez_compressed('data.npz', data)

Running the example defines the array and saves it into a file in compressed NumPy format with the name ‘data.npz’.

As with the .npy format, we cannot inspect the contents of the saved file with a text editor because the file format is binary.

3.2 Example of Loading a NumPy Array from NPZ File

We can load this file later using the same load() function from the previous section.

In this case, the savez_compressed() function supports saving multiple arrays to a single file. Therefore, the load() function may load multiple arrays.

The loaded arrays are returned from the load() function in a dict-like object (an NpzFile) with the names ‘arr_0’ for the first array, ‘arr_1’ for the second, and so on.

The complete example of loading our single array is listed below.

# load numpy array from npz file
from numpy import load
# load dict of arrays
dict_data = load('data.npz')
# extract the first array
data = dict_data['arr_0']
# print the array
print(data)

Running the example loads the compressed NumPy file that contains a dictionary of arrays, extracts the first array that we saved (we only saved one), then prints the contents, confirming that the values and the shape of the array match what we saved in the first place.

[[0 1 2 3 4 5 6 7 8 9]]
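The default names ‘arr_0’, ‘arr_1’, and so on can be hard to keep track of when a file holds several arrays. As a sketch, arrays can instead be passed to savez_compressed() as keyword arguments, and the names stored in a file can be listed via the “files” attribute of the loaded object. The names X and y, the array values, and the temporary file are assumptions for illustration.

```python
# sketch: save multiple named arrays and list the names on load
from os.path import join
from tempfile import mkdtemp
from numpy import arange, asarray, load, savez_compressed
# define hypothetical input and output arrays
X = arange(6).reshape(3, 2)
y = asarray([0, 1, 0])
# save both arrays under explicit names
path = join(mkdtemp(), 'dataset.npz')
savez_compressed(path, X=X, y=y)
# load the file; the with block closes it when done
with load(path) as npz:
    # list the names of the stored arrays
    names = npz.files
    X_loaded = npz['X']
    y_loaded = npz['y']
print(names)
print(X_loaded.shape, y_loaded.shape)
```

Checking the “files” attribute is also a simple way to find out what an unfamiliar .npz file contains.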


Summary

In this tutorial, you discovered how to save your NumPy arrays to file.

Specifically, you learned:

How to save NumPy arrays to CSV formatted files.
How to save NumPy arrays to NPY formatted files.
How to save NumPy arrays to compressed NPZ formatted files.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

40 Responses to How to Save a NumPy Array to File for Machine Learning

Eric November 13, 2019 at 8:03 am #

Very interesting. Is there a difference in performance among them? Especially between CSV and NPY? Unless there’s one, using the portable CSV might be more convenient.

I think that for fast file systems NPY should be faster than NPZ, but on very large arrays and slow file systems NPZ could sometimes be faster.

- Jason Brownlee November 13, 2019 at 1:44 pm #
  
  Thanks!
  
  Good question. I don’t have good stats on performance comparisons, although working with 10/100MB of random floats in an array would give results quickly.
  
  My expectation is that getting data into RAM fast, e.g. compressed would have the best performance.
  
  I use NPY and NPZ a lot myself.
  
Deep Learner November 14, 2019 at 11:07 pm #

Hi Jason,

Thanks for the post, a very useful feature, it is a good complement to another good post on this very site which deals with models: https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/

- Jason Brownlee November 15, 2019 at 7:53 am #
  
  Thanks!
  
Darmawan Utomo November 15, 2019 at 7:18 pm #

Hi Jason,

It seems that savetxt is only for 1D or 2D.
But no problem with npy and npz.

Thank you.

- Jason Brownlee November 16, 2019 at 7:22 am #
  
  Yes.
  
araya November 19, 2019 at 12:11 am #

please would you mind to give me the book regarding this title

- Jason Brownlee November 19, 2019 at 7:42 am #
  
  A book on saving NumPy arrays?
  
  What additional problems are you having exactly?
  
Anirban Ray November 26, 2019 at 12:18 am #

Hi!

Can you please tell me whether it is possible to append to a .npy file?

For example, suppose I have an numpy array x, and stored it in x.npy. If I now want to append a few elements to it, do I have to load it, append, and then save again? Or, is there a way to append directly to the x.npy file without loading it?

Thanks for this and all the other great articles.

- Jason Brownlee November 26, 2019 at 6:06 am #
  
  Maybe.
  
  Instead, I would recommend loading it into memory, append to the array, then save it again.
  
  - Eva February 12, 2020 at 11:40 pm #
    
    Great article, as always Dr. Jason!
    However, if the data is too large to fit in RAM, then loading the .npy file into memory, appending to the array, then saving it again would not be possible, I think.
    How to store very large data to .npy file then??
    
    - Jason Brownlee February 13, 2020 at 5:41 am #
      
      Perhaps explore using a memory mapped file?
      https://docs.python.org/3.6/library/mmap.html
      
Mona January 11, 2020 at 8:03 am #

How do I know what arr_0 is in any arbitrary npz file that I load?

data = dict_data['arr_0']

- Jason Brownlee January 11, 2020 at 8:14 am #
  
  Arrays are loaded in the same order that they were saved.
  
Esha January 11, 2020 at 10:41 pm #

After creating .npz file in google colab and saving it in google drive, How can I APPEND anything to the save .npz file

- Jason Brownlee January 12, 2020 at 8:02 am #
  
  You can load the array, concat the array and re-save it.
  
Akil R March 8, 2020 at 5:38 am #

I tried this to dump a large list of arrays, I wasn’t successful. Process gets killed after long wait.

- Jason Brownlee March 8, 2020 at 6:16 am #
  
  Perhaps try a subset to confirm your code works.
  
Mahshad March 31, 2020 at 10:04 am #

Thank you Jason.

I have saved an image as an array into a csv file but when I tried to display the image from the saved array it doesn’t show the picture.

This is while I can show the image from the array I made from the same picture (without saving as csv) but when it reads from csv file it doesn’t work.
I think it’s because of those dots that comes after each number in the array:

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]

Do u know how can I get rid of those dots when reading from csv?

- Jason Brownlee March 31, 2020 at 1:34 pm #
  
  Yes, you may have to change the shape of the array and scale the pixel values before displaying them using matplotlib.
  
  I do have examples of this on the blog.
  
Rucha May 2, 2020 at 10:49 pm #

This is for a 1D array. How do I convert a 2D array to .csv format?

- Jason Brownlee May 3, 2020 at 6:10 am #
  
  The code is identical.
  
  - Gill May 10, 2020 at 10:46 pm #
    
    How do you save a 3D array to a .csv file, then?
    
    - Jason Brownlee May 11, 2020 at 5:59 am #
      
      The same as a 2d array. E.g. call the same functions to save and load it.
      
      - Gill May 11, 2020 at 6:03 pm #
        
        When I run this for my array, I get this error message: Expected 1D or 2D array, got 4D array instead. So that actually means I have a 4D array, which brings me back to my question: what is the best way to save it? This is my code for saving it so far:
        # save numpy array as csv file
        from numpy import asarray
        from numpy import savetxt
        # define data
        nieuwe_array = asarray([[nieuwe_array]])
        # save to csv file
        savetxt('nieuwe_array.csv', nieuwe_array, delimiter=',')
      - Jason Brownlee May 12, 2020 at 6:41 am #
        
        That is surprising.
        
        As far as I know, saving an array is agnostic to the size and dimensionality of the array.
Souvik Mukherjee October 3, 2020 at 7:15 am #

Jason,
Your tutorials rock!! I have enjoyed and benefited from a number of them. If there’s a place to provide a recommendation for your work, just let me know.
Thanks.
Souvik

- Jason Brownlee October 3, 2020 at 7:59 am #
  
  Thanks!
  
  Yes, anything you can do to spread the word on social media helps.
  
Imdadul Haque October 4, 2020 at 3:34 am #

Is it possible to load image Dataset then convert the image dataset into csv file?

- Jason Brownlee October 4, 2020 at 6:54 am #
  
  No, we do not convert images to CSV files.
  
JoAnn Alvarez November 19, 2020 at 6:35 am #

I’m wondering whether np.save, the np.savez_compressed, or some other method (joblib, json) would be best for my situation. The array is about 150 GB in memory.

First I tried json, but it exceeded my memory:

with open('X_train_list.json', 'w') as file_handle:
    json.dump(X_train.tolist(), file_handle)

I have X_train (a np array), and I first converted it using X_train.tolist(), and then used json.dump(). I think converting it to a list and/or using json.dump() saved copies in memory before writing.

My first priority is not exceeding my memory. Then other things to consider are: speed of writing/reading, file size, universality of file type (for example, is it going to easily break if I use a new version of dill, can other programs open it).

Do you have any advice? Much appreciated for this tutorial.

- Jason Brownlee November 19, 2020 at 7:55 am #
  
  That is large!
  
  My advice is to trial a few approaches and discover which meets your requirements. Maybe do a little research into specialized methods for managing large data, e.g. memory mapped files.
  
Razi December 15, 2020 at 4:48 pm #

hello!
I want to save my all images (genuine and forgery) paths with their labels 1 as genuine and 0 as forgery, after comparison of genuine and forgery third column would be labels again which would show its genuine or forgery in the form 0 or 1, in the txt or csv file.
I am trying to do that but i couldn’t, Could you help me?

I want the below format,

Path1 Path2 labels
E/img.jpg 1 E/img.jpg 1 1
E/img.jpg 1 E/img.jpg 0 0

- Jason Brownlee December 16, 2020 at 7:44 am #
  
  Perhaps construct your data as an array in memory first, then save the array to file as a CSV.
  
Mayank Mishra March 5, 2021 at 4:44 pm #

I ran a code to store my images (2868 of them) as an array for an image classification task. The shape of the array is (2868, 224, 224, 3). I saved the array using numpy.save(). But later as I try to load the array, it gives the following error, “cannot reshape array of size 92437951 into shape (2868,224,224,3)”. How to overcome this issue? Does numpy.load() not work for multidimensional arrays?

- Jason Brownlee March 6, 2021 at 5:14 am #
  
  That is very odd.
  
  Perhaps try posting your code and error message to stackoverflow.com
  
Debajyoti Ghosh March 7, 2021 at 6:57 pm #

How to reading a NumPy matrix from CSV File and perform operations and access.
Say I want to find each row’s max

- Jason Brownlee March 8, 2021 at 4:47 am #
  
  This tutorial will show you how to load a CSV as a numpy array:
  https://machinelearningmastery.com/load-machine-learning-data-python/
  
Ankit April 3, 2021 at 4:32 pm #

Hi, Jason could you please tell me how can I store the frame pixel values of a bunch of videos into a NumPy .npz file for training the model. And I am using LSTM for predicting some classes in all the videos, but it always shows a shape error when I use the LSTM layer. Can you please help me out?

- Jason Brownlee April 4, 2021 at 6:45 am #
  
  Sorry, I don’t have examples of loading and working with video, I can’t give you useful advice.
  
  In terms of video with models, I’d recommend a CNN-LSTM and you can see an example in the LSTM book:
  https://machinelearningmastery.com/lstms-with-python/
  
