Last Updated on August 19, 2020
Developing machine learning models in Python often requires the use of NumPy arrays.
NumPy arrays are efficient data structures for working with data in Python, and machine learning models like those in the scikit-learn library, and deep learning models like those in the Keras library, expect input data in the format of NumPy arrays and make predictions in the format of NumPy arrays.
As such, it is common to need to save NumPy arrays to file.
For example, you may prepare your data with transforms like scaling and need to save it to file for later use. You may also use a model to make predictions and need to save the predictions to file for later use.
In this tutorial, you will discover how to save your NumPy arrays to file.
After completing this tutorial, you will know:
- How to save NumPy arrays to CSV formatted files.
- How to save NumPy arrays to NPY formatted files.
- How to save NumPy arrays to compressed NPZ formatted files.
Let’s get started.

How to Save a NumPy Array to File for Machine Learning
Tutorial Overview
This tutorial is divided into three parts; they are:
- Save NumPy Array to .CSV File (ASCII)
- Save NumPy Array to .NPY File (binary)
- Save NumPy Array to .NPZ File (compressed)
1. Save NumPy Array to .CSV File (ASCII)
The most common file format for storing numerical data in files is the comma-separated values format, or CSV for short.
It is most likely that your training data and input data to your models are stored in CSV files.
It can be convenient to save data to CSV files, such as the predictions from a model.
You can save your NumPy arrays to CSV files using the savetxt() function. This function takes a filename and array as arguments and saves the array into CSV format.
You must also specify the delimiter; this is the character used to separate each variable in the file, most commonly a comma. This can be set via the “delimiter” argument.
1.1 Example of Saving a NumPy Array to CSV File
The example below demonstrates how to save a single NumPy array to CSV format.
# save numpy array as csv file
from numpy import asarray
from numpy import savetxt
# define data
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
# save to csv file
savetxt('data.csv', data, delimiter=',')
Running the example will define a NumPy array and save it to the file ‘data.csv‘.
The array has a single row of data with 10 columns. We would expect this data to be saved to a CSV file as a single row of data.
After running the example, we can inspect the contents of ‘data.csv‘.
We should see the following:
0.000000000000000000e+00,1.000000000000000000e+00,2.000000000000000000e+00,3.000000000000000000e+00,4.000000000000000000e+00,5.000000000000000000e+00,6.000000000000000000e+00,7.000000000000000000e+00,8.000000000000000000e+00,9.000000000000000000e+00
We can see that the data is correctly saved as a single row and that the integer values in the array were written in floating point scientific notation with full precision, which is the default format used by savetxt().
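If the scientific notation is not what you want, savetxt() also accepts a "fmt" argument that controls how each value is written. The sketch below assumes the values are integers and uses an illustrative filename, 'data_int.csv', that is not part of the original example.

```python
# save numpy array as csv file with an explicit number format
from numpy import asarray
from numpy import savetxt
# define data
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
# save to csv file, writing each value as a plain integer
# ('data_int.csv' is an illustrative filename)
savetxt('data_int.csv', data, delimiter=',', fmt='%d')
# read the raw text back to see what was written
with open('data_int.csv') as handle:
    text = handle.read()
print(text)
```

A format such as '%.3f' could be used in the same way for floating point data, at the cost of rounding the saved values.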
1.2 Example of Loading a NumPy Array from CSV File
We can load this data later as a NumPy array using the loadtxt() function and specify the filename and the same comma delimiter.
The complete example is listed below.
# load numpy array from csv file
from numpy import loadtxt
# load array
data = loadtxt('data.csv', delimiter=',')
# print the array
print(data)
Running the example loads the data from the CSV file and prints the contents, matching the 10 values defined in the previous example. Note that loadtxt() returns the single row as a one-dimensional array of floating point values.
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
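The printed array is one-dimensional and holds floats, whereas the array we saved had one row, ten columns and integer values. If the original layout matters, one option is the "ndmin" argument of loadtxt() followed by a cast. This is a minimal sketch that first rewrites 'data.csv' so it can run on its own; the choice of 'int64' is an assumption about the type you want.

```python
# load a csv file as a two-dimensional integer array
from numpy import asarray
from numpy import loadtxt
from numpy import savetxt
# write the single-row file from the previous example
savetxt('data.csv', asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]), delimiter=',')
# ndmin=2 keeps the row dimension even though the file has a single line
data = loadtxt('data.csv', delimiter=',', ndmin=2)
# values are parsed as floats; cast back to integers (assumed target type)
data = data.astype('int64')
# report the shape and type, then the values
print(data.shape, data.dtype)
print(data)
```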
2. Save NumPy Array to .NPY File (binary)
Sometimes we have a lot of data in NumPy arrays that we wish to save efficiently, but which we only need to use in another Python program.
Therefore, we can save the NumPy arrays into a native binary format that is efficient to both save and load.
This is common for input data that has been prepared, such as transformed data, that will need to be used as the basis for testing a range of machine learning models in the future or running many experiments.
The .npy file format is appropriate for this use case and is referred to as simply “NumPy format“.
This can be achieved using the save() NumPy function and specifying the filename and the array that is to be saved.
2.1 Example of Saving a NumPy Array to NPY File
The example below defines our two-dimensional NumPy array and saves it to a .npy file.
# save numpy array as npy file
from numpy import asarray
from numpy import save
# define data
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
# save to npy file
save('data.npy', data)
After running the example, you will see a new file in the directory with the name ‘data.npy‘.
You cannot inspect the contents of this file directly with your text editor because it is in binary format.
2.2 Example of Loading a NumPy Array from NPY File
You can load this file as a NumPy array later using the load() function.
The complete example is listed below.
# load numpy array from npy file
from numpy import load
# load array
data = load('data.npy')
# print the array
print(data)
Running the example will load the file and print the contents, confirming both that it was loaded correctly and that the content matches what we expect, in the same two-dimensional format.
[[0 1 2 3 4 5 6 7 8 9]]
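Because the .npy format stores the shape and data type alongside the values, arrays with more than two dimensions or with a non-default type should survive the round trip unchanged. The sketch below checks this with an illustrative three-dimensional float32 array and an illustrative filename, 'data_3d.npy'.

```python
# check that shape and dtype survive a round trip through an npy file
from numpy import arange
from numpy import load
from numpy import save
# define a small 3D array of 32-bit floats (illustrative data)
data = arange(24, dtype='float32').reshape((2, 3, 4))
# save to npy file ('data_3d.npy' is an illustrative filename)
save('data_3d.npy', data)
# load the array back
loaded = load('data_3d.npy')
# report shape and type, then whether every value matches
print(loaded.shape, loaded.dtype)
print((loaded == data).all())
```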
3. Save NumPy Array to .NPZ File (compressed)
Sometimes, we prepare data for modeling that needs to be reused across multiple experiments, but the data is large.
This might be pre-processed NumPy arrays like a corpus of text (integers) or a collection of rescaled image data (pixels). In these cases, it is desirable to save the data to file in a compressed format.
Depending on how compressible the data is, this can allow gigabytes of data to be reduced to hundreds of megabytes and allows easy transmission to other servers or cloud computing instances for long algorithm runs.
The .npz file format is appropriate for this case and supports a compressed version of the native NumPy file format.
The savez_compressed() NumPy function allows multiple NumPy arrays to be saved to a single compressed .npz file.
3.1 Example of Saving a NumPy Array to NPZ File
We can use this function to save our single NumPy array to a compressed file.
The complete example is listed below.
# save numpy array as npz file
from numpy import asarray
from numpy import savez_compressed
# define data
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
# save to npz file
savez_compressed('data.npz', data)
Running the example defines the array and saves it into a file in compressed numpy format with the name ‘data.npz’.
As with the .npy format, we cannot inspect the contents of the saved file with a text editor because the file format is binary.
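To get a feel for how much space compression can save, you can write the same array in both formats and compare the file sizes. The sketch below uses an array of zeros, which is an assumption chosen because it compresses extremely well; data with little repetition, such as random floats, will typically shrink far less. The filenames are illustrative.

```python
# compare the size on disk of an npy file and a compressed npz file
from os.path import getsize
from numpy import save
from numpy import savez_compressed
from numpy import zeros
# define a highly repetitive array (illustrative best case for compression)
data = zeros((1000, 1000))
# save the same data in both formats
save('big.npy', data)
savez_compressed('big.npz', data)
# measure the files on disk
npy_size = getsize('big.npy')
npz_size = getsize('big.npz')
print('npy: %d bytes' % npy_size)
print('npz: %d bytes' % npz_size)
```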
3.2 Example of Loading a NumPy Array from NPZ File
We can load this file later using the same load() function from the previous section.
In this case, the savez_compressed() function supports saving multiple arrays to a single file. Therefore, the load() function may load multiple arrays.
The loaded arrays are returned from the load() function in a dict-like object (an NpzFile), with the names ‘arr_0’ for the first array, ‘arr_1’ for the second, and so on.
The complete example of loading our single array is listed below.
# load numpy array from npz file
from numpy import load
# load dict of arrays
dict_data = load('data.npz')
# extract the first array
data = dict_data['arr_0']
# print the array
print(data)
Running the example loads the compressed NumPy file that contains the saved arrays, extracts the first array that we saved (we only saved one), then prints the contents, confirming that the values and the shape of the array match what we saved in the first place.
[[0 1 2 3 4 5 6 7 8 9]]
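When saving more than one array, the default names can be hard to keep track of. The savez_compressed() function also accepts keyword arguments, in which case each array is stored under the keyword you chose. The sketch below uses the illustrative names 'X' and 'y' and an illustrative filename, 'dataset.npz'.

```python
# save two named arrays to one npz file and load them back by name
from numpy import asarray
from numpy import load
from numpy import savez_compressed
# define input and output data (illustrative values)
X = asarray([[0.1, 0.2], [0.3, 0.4]])
y = asarray([0, 1])
# save both arrays under explicit names
savez_compressed('dataset.npz', X=X, y=y)
# load the archive and retrieve each array by its name
loaded = load('dataset.npz')
print(loaded['X'])
print(loaded['y'])
```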
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Posts
- How To Load Machine Learning Data in Python
- A Gentle Introduction to NumPy Arrays in Python
- How to Index, Slice and Reshape NumPy Arrays for Machine Learning
APIs
- numpy.savetxt API
- numpy.save API
- numpy.savez API
- numpy.savez_compressed API
- numpy.load API
- numpy.loadtxt API
Summary
In this tutorial, you discovered how to save your NumPy arrays to file.
Specifically, you learned:
- How to save NumPy arrays to CSV formatted files.
- How to save NumPy arrays to NPY formatted files.
- How to save NumPy arrays to compressed NPZ formatted files.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Very interesting. Is there a difference in performance among them? Especially between CSV and NPY? Unless there’s one, using the portable CSV might be more convenient.
I think that for fast file systems NPY should be faster than NPZ, but on very large arrays and slow file systems NPZ could sometimes be faster.
Thanks!
Good question. I don’t have good stats on performance comparisons, although working with 10/100MB of random floats in an array would give results quickly.
My expectation is that whatever gets the data into RAM fastest, e.g. a compressed format, would have the best performance.
I use NPY and NPZ a lot myself.
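If you want numbers for your own machine, a rough timing comparison is easy to set up. The sketch below is only a starting point: the array size, the filenames and the timed() helper are all illustrative, and the results will depend on your hardware, file system and data.

```python
# rough timing of csv versus npy for saving and loading the same array
from time import perf_counter
from numpy import load
from numpy import loadtxt
from numpy import save
from numpy import savetxt
from numpy.random import rand

# illustrative helper: call a function and return its result and elapsed seconds
def timed(func, *args, **kwargs):
    start = perf_counter()
    result = func(*args, **kwargs)
    return result, perf_counter() - start

# define random data (illustrative size, increase it for a more realistic test)
data = rand(1000, 100)
# time saving in each format
_, csv_save = timed(savetxt, 'bench.csv', data, delimiter=',')
_, npy_save = timed(save, 'bench.npy', data)
# time loading in each format
csv_data, csv_load = timed(loadtxt, 'bench.csv', delimiter=',')
npy_data, npy_load = timed(load, 'bench.npy')
# report the timings
print('csv: save %.4fs, load %.4fs' % (csv_save, csv_load))
print('npy: save %.4fs, load %.4fs' % (npy_save, npy_load))
```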
Hi Jason,
Thanks for the post, a very useful feature, it is a good complement to another good post on this very site which deals with models: https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/
Thanks!
Hi Jason,
It seems that savetxt is only for 1D or 2D.
But no problem with npy and npz.
Thank you.
Yes.
please would you mind to give me the book regarding this title
A book on saving NumPy arrays?
What additional problems are you having exactly?
Hi!
Can you please tell me whether it is possible to append to a .npy file?
For example, suppose I have a NumPy array x and stored it in x.npy. If I now want to append a few elements to it, do I have to load it, append, and then save again? Or is there a way to append directly to the x.npy file without loading it?
Thanks for this and all the other great articles.
Maybe.
Instead, I would recommend loading it into memory, appending to the array, then saving it again.
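The load, append and save approach can be sketched as follows. The filename 'x.npy' comes from the question; the values are illustrative, and the first save() call is only there so the example runs on its own.

```python
# append to an array stored in an npy file by loading and re-saving it
from numpy import asarray
from numpy import concatenate
from numpy import load
from numpy import save
# create the existing file (illustrative starting values)
save('x.npy', asarray([1, 2, 3]))
# load the stored array into memory
x = load('x.npy')
# append the new elements
x = concatenate((x, asarray([4, 5])))
# overwrite the file with the extended array
save('x.npy', x)
# confirm the new contents
print(load('x.npy'))
```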
Great article, as always Dr. Jason!
However, if the data is too large to fit in RAM, then loading the .npy file into memory, appending to the array, then saving it again would not be possible, I think.
How to store very large data to .npy file then??
Perhaps explore using a memory mapped file?
https://docs.python.org/3.6/library/mmap.html
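NumPy also has memory mapping built into its own load() function via the "mmap_mode" argument, which may be simpler than working with the mmap module directly. In the sketch below the array is small so that it runs quickly; the filename 'large.npy' and the sizes are illustrative.

```python
# read part of an npy file without loading the whole array into memory
from numpy import arange
from numpy import load
from numpy import save
# create a file to read from (illustrative stand-in for a much larger array)
save('large.npy', arange(1000000).reshape((1000, 1000)))
# open the file as a read-only memory map; data is read from disk on access
data = load('large.npy', mmap_mode='r')
# slicing only touches the parts of the file that are needed
block = data[10:12, :5]
print(type(data).__name__)
print(block)
```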
How do I know what arr_0 is in any arbitrary npz file that I load?
data = dict_data['arr_0']
Arrays are loaded in the same order that they were saved.
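To see what an arbitrary .npz file contains, you can list the stored names through the "files" attribute of the loaded object and inspect each array in turn. The sketch below creates its own small archive, with an illustrative filename 'unknown.npz', so that it can run on its own.

```python
# list the names, shapes and types of all arrays stored in an npz file
from numpy import asarray
from numpy import load
from numpy import savez_compressed
# create an archive with two unnamed arrays (illustrative content)
savez_compressed('unknown.npz', asarray([1, 2, 3]), asarray([[4.0, 5.0]]))
# load the archive
archive = load('unknown.npz')
# report each stored array by name
for name in archive.files:
    print(name, archive[name].shape, archive[name].dtype)
```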
After creating a .npz file in Google Colab and saving it in Google Drive, how can I append anything to the saved .npz file?
You can load the array, concat the array and re-save it.
I tried this to dump a large list of arrays, I wasn’t successful. Process gets killed after long wait.
Perhaps try a subset to confirm your code works.
Thank you Jason.
I have saved an image as an array into a csv file but when I tried to display the image from the saved array it doesn’t show the picture.
This is while I can show the image from the array I made from the same picture (without saving as csv) but when it reads from csv file it doesn’t work.
I think it’s because of those dots that come after each number in the array:
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
Do you know how I can get rid of those dots when reading from csv?
Yes, you may have to change the shape of the array and scale the pixel values before displaying them using matplotlib.
I do have examples of this on the blog.
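The dots only indicate that loadtxt() returned floating point values, and a CSV file does not record the original shape or type of the array. One possible fix, sketched below, is to cast the loaded values back to 8-bit integers and restore the shape. The small array is an illustrative stand-in for a grayscale image, and 'uint8' is an assumption about the original pixel type.

```python
# restore the integer type and shape of image data loaded from a csv file
from numpy import arange
from numpy import loadtxt
from numpy import savetxt
# stand-in for a small grayscale image with 8-bit pixel values
image = arange(12, dtype='uint8').reshape((3, 4))
# save the pixels as plain integers
savetxt('image.csv', image, delimiter=',', fmt='%d')
# loading returns floating point values
loaded = loadtxt('image.csv', delimiter=',')
print(loaded.dtype)
# cast back to 8-bit integers and restore the original shape
restored = loaded.astype('uint8').reshape(image.shape)
print(restored.dtype, (restored == image).all())
```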
This is for a 1D array. How do I convert a 2D array to .csv format?
The code is identical.
How, then, do you save a 3D array to a .csv file?
The same as a 2d array. E.g. call the same functions to save and load it.
When I run this for my array I get this error message: Expected 1D or 2D array, got 4D array instead. So that actually means I have a 4D array, which brings me back to my question: what is the best way to save it? For now, this is my code for saving it:
# save numpy array as csv file
from numpy import asarray
from numpy import savetxt
# define data
nieuwe_array = asarray([[nieuwe_array]])
# save to csv file
savetxt('nieuwe_array.csv', nieuwe_array, delimiter=',')
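That error is expected: savetxt() only supports 1D and 2D arrays.
Either reshape the array to 2D before saving it to CSV, or use save() or savez_compressed(), which are agnostic to the size and dimensionality of the array.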
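Since savetxt() accepts only 1D or 2D input, one way to keep using CSV for a higher-dimensional array is to flatten everything except the first axis into columns before saving, then restore the original shape after loading. The sketch below assumes integer data and that you keep track of the original shape yourself, because the CSV file does not store it. The array and filename are illustrative.

```python
# save a 4D array to csv by reshaping it to 2D, then restore it after loading
from numpy import arange
from numpy import loadtxt
from numpy import savetxt
# define a 4D array (illustrative data)
data = arange(2 * 3 * 4 * 5).reshape((2, 3, 4, 5))
# remember the original shape; the csv file will not store it
shape = data.shape
# keep the first axis as rows and flatten the rest into columns
flat = data.reshape((shape[0], -1))
savetxt('data_4d.csv', flat, delimiter=',', fmt='%d')
# load the 2D table and restore the original dimensions
loaded = loadtxt('data_4d.csv', delimiter=',', dtype='int64').reshape(shape)
print(loaded.shape)
print((loaded == data).all())
```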
Jason,
Your tutorials rock!! I have enjoyed and benefited from a number of them. If there’s a place to provide a recommendation for your work, just let me know.
Thanks.
Souvik
Thanks!
Yes, anything you can do to spread the word on social media helps.
Is it possible to load image Dataset then convert the image dataset into csv file?
No, we do not convert images to CSV files.
I’m wondering whether np.save, the np.savez_compressed, or some other method (joblib, json) would be best for my situation. The array is about 150 GB in memory.
First I tried json, but it exceeded my memory:
with open('X_train_list.json', 'w') as file_handle:
    json.dump(X_train.tolist(), file_handle)
I have X_train (a np array), and I first converted it using X_train.tolist(), and then used json.dump(). I think converting it to a list and/or using json.dump() saved copies in memory before writing.
My first priority is not exceeding my memory. Then other things to consider are: speed of writing/reading, file size, universality of file type (for example, is it going to easily break if I use a new version of dill, can other programs open it).
Do you have any advice? Much appreciated for this tutorial.
That is large!
My advice is to trial a few approaches and discover which meets your requirements. Maybe do a little research into specialized methods for managing large data, e.g. memory mapped files.
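As one example of a memory mapped approach, NumPy provides open_memmap() in numpy.lib.format, which creates a .npy file of a given shape on disk and lets you fill it piece by piece. The sketch below is a small-scale illustration only: the sizes are tiny, the random chunks stand in for however your data is actually produced, and the filename 'X_train.npy' simply borrows the variable name from the question.

```python
# build an npy file on disk in chunks without holding the whole array in memory
from numpy import load
from numpy.lib.format import open_memmap
from numpy.random import rand
# illustrative sizes; a real dataset would be far larger
n_rows, n_cols, chunk = 1000, 50, 100
# create the npy file up front with its final shape and type
out = open_memmap('X_train.npy', mode='w+', dtype='float64', shape=(n_rows, n_cols))
# fill the file one chunk of rows at a time
for start in range(0, n_rows, chunk):
    # stand-in for however each chunk of real data is produced
    out[start:start + chunk] = rand(chunk, n_cols)
# make sure everything is written to disk and release the map
out.flush()
del out
# later, open the file as a read-only memory map instead of loading it fully
X = load('X_train.npy', mmap_mode='r')
print(X.shape, X.dtype)
```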
hello!
I want to save the paths of all my images (genuine and forgery) along with their labels, 1 for genuine and 0 for forgery. After comparing a genuine image and a forgery, a third column would again hold a label showing whether the pair is genuine or forgery, in the form 0 or 1, in a txt or csv file.
I am trying to do that but I couldn’t. Could you help me?
I want the below format,
Path1 Path2 labels
E/img.jpg 1 E/img.jpg 1 1
E/img.jpg 1 E/img.jpg 0 0
Perhaps construct your data as an array in memory first, then save the array to file as a CSV.
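A minimal sketch of that idea is shown below. The image paths are hypothetical, the column names are illustrative, and the rule for the final label (1 only when both images are genuine) is an assumption based on the two rows in the question.

```python
# build a table of image pairs with labels and save it as a csv file
from numpy import asarray
from numpy import loadtxt
from numpy import savetxt
# hypothetical image paths, each with its own genuine (1) or forgery (0) flag
pairs = [
    ('E/genuine_01.jpg', 1, 'E/genuine_02.jpg', 1),
    ('E/genuine_01.jpg', 1, 'E/forgery_01.jpg', 0),
]
# assumed rule: a pair is labelled 1 only when both images are genuine
rows = [(p1, l1, p2, l2, int(l1 == 1 and l2 == 1)) for p1, l1, p2, l2 in pairs]
# store everything as strings so paths and labels fit in one array
table = asarray(rows, dtype=str)
# write the table with a header row (comments='' avoids a leading '# ')
savetxt('pairs.csv', table, delimiter=',', fmt='%s',
        header='path1,label1,path2,label2,label', comments='')
# load it back as strings, skipping the header row
loaded = loadtxt('pairs.csv', delimiter=',', dtype=str, skiprows=1)
print(loaded)
```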
I ran a code to store my images (2868 of them) as an array for an image classification task. The shape of the array is (2868, 224, 224, 3). I saved the array using numpy.save(). But later as I try to load the array, it gives the following error, “cannot reshape array of size 92437951 into shape (2868,224,224,3)”. How to overcome this issue? Does numpy.load() not work for multidimensional arrays?
That is very odd.
Perhaps try posting your code and error message to stackoverflow.com
How do I read a NumPy matrix from a CSV file, then access it and perform operations on it?
Say I want to find each row’s max.
This tutorial will show you how to load a CSV as a numpy array:
https://machinelearningmastery.com/load-machine-learning-data-python/
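For the row maximum specifically, once the file is loaded as a two-dimensional array you can reduce along the column axis. The sketch below writes a small illustrative file, 'matrix.csv', first so that it can run on its own.

```python
# load a matrix from a csv file, then index it and compute each row's maximum
from numpy import asarray
from numpy import loadtxt
from numpy import savetxt
# write a small matrix to file (illustrative values)
savetxt('matrix.csv', asarray([[1, 7, 3], [9, 2, 4]]), delimiter=',')
# load the matrix
matrix = loadtxt('matrix.csv', delimiter=',')
# axis=1 reduces across the columns, giving one value per row
row_max = matrix.max(axis=1)
print(row_max)
# individual elements are accessed by row and column index
print(matrix[0, 1])
```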
Hi, Jason could you please tell me how can I store the frame pixel values of a bunch of videos into a NumPy .npz file for training the model. And I am using LSTM for predicting some classes in all the videos, but it always shows a shape error when I use the LSTM layer. Can you please help me out?
Sorry, I don’t have examples of loading and working with video, I can’t give you useful advice.
In terms of video with models, I’d recommend a CNN-LSTM and you can see an example in the LSTM book:
https://machinelearningmastery.com/lstms-with-python/