How to Plan and Run Machine Learning Experiments Systematically

Machine learning experiments can take a long time. Hours, days, and even weeks in some cases.

This gives you a lot of time to think and plan for additional experiments to perform.

In addition, the average applied machine learning project may require tens to hundreds of discrete experiments in order to find a data preparation model and model configuration that gives good or great performance.

The drawn-out nature of the experiments means that you need to carefully plan and manage the order and type of experiments that you run.

You need to be systematic.

In this post, you will discover a simple approach to plan and manage your machine learning experiments.

With this approach, you will be able to:

  • Stay on top of the most important questions and findings in your project.
  • Keep track of what experiments you have completed and would like to run.
  • Zoom in on the data preparations, models, and model configurations that give the best performance.

Let’s dive in.

How to Plan and Run Machine Learning Experiments Systematically

How to Plan and Run Machine Learning Experiments Systematically
Photo by Qfamily, some rights reserved.

Confusion of Hundreds of Experiments

I like to run experiments overnight. Lots of experiments.

This is so that when I wake up, I can check results, update my ideas of what is working (and what is not), and kick off the next round of experiments, then spend some time analyzing the findings.

I hate wasting time.

And I hate running experiments that do not get me closer to the goal of finding the most skillful model, given the time and resources I have available.

It is easy to lose track of where you’re up to. Especially after you have results, analysis, and findings from hundreds of experiments.

Poor management of your experiments can lead to bad situations where:

  • You’re watching experiments run.
  • You’re trying to come up with good ideas of experiments to run right after a current batch has finished.
  • You run an experiment that you had already run before.

You never want to be in any of these situations!

If you are on top of your game, then:

  • You know exactly what experiments you have run at a glance and what the findings were.
  • You have a long list of experiments to run, ordered by their expected payoff.
  • You have the time to dive into the analysis of results and think up new and wild ideas to try.

But how can we stay on top of hundreds of experiments?

Design and Run Experiments Systematically

One way that I have found to help me be systematic with experiments on a project is to use a spreadsheet.

Manage the experiments you have done, that are running, and that you want to run in a spreadsheet.

It is simple and effective.


It is simple in that I or anyone can access it from anywhere and see where we’re at.

I use Google Docs to host the spreadsheet.

There’s no code. No notebook. No fancy web app.

Just a spreadsheet.


It’s effective because it only contains the information needed with one line per experiment and one column for each piece of information to track on the experiment.

Experiments that are done can be separated from those that are planned.

Only experiments that are planned are set-up and run and their order ensures that the most important experiments are run first.

You will be surprised at how much such a simple approach can free up your time and get you thinking deeply about your project.

Example Spreadsheet

Let’s look at an example.

We can imagine a spreadsheet with the columns below.

These are just an example from the last project I worked on. I recommend adapting these to your own needs.

  • Sub-Project: A subproject may be a group of ideas you are exploring, a technique, a data preparation, and so on.
  • Context: The context may be the specific objective such as beating a baseline, tuning, a diagnostic, and so on.
  • Setup: The setup is the fixed configuration of the experiment.
  • Name: The name is the unique identifier, perhaps the filename of the script.
  • Parameter: The parameter is the thing being varied or looked at in the experiment.
  • Values: The value is the value or values of the parameter that are being explored in the experiment.
  • Status: The status is the status of the experiment, such as planned, running, or done.
  • Skill: The skill is the North Star metric that really matters on the project, like accuracy or error.
  • Question: The question is the motivating question the experiment seeks to address.
  • Finding: The finding is the one line summary of the outcome of the experiment, the answer to the question.

To make this concrete, below is a screenshot of a Google Doc spreadsheet with these column headings and a contrived example.

Systematic Experimental Record

Systematic Experimental Record

I cannot say how much time this approach has saved me. And the number of assumptions that it proved wrong in the pursuit of getting top results.

In fact, I’ve discovered that deep learning methods are often quite hostile to assumptions and defaults. Keep this in mind when designing experiments!

Get The Most Out of Your Experiments

Below are some tips that will help you get the most out of this simple approach on your project.

  • Brainstorm: Make the time to frequently review findings and list new questions and experiments to answer them.
  • Challenge: Challenge assumptions and challenge previous findings. Play the scientist and design experiments that would falsify your findings or expectations.
  • Sub-Projects: Consider the use of sub-projects to structure your investigation where you follow leads or investigate specific methods.
  • Experimental Order: Use the row order as a priority to ensure that the most important experiments are run first.
  • Deeper Analysis: Save deeper analysis of results and aggregated findings to another document; the spreadsheet is not the place.
  • Experiment Types: Don’t be afraid to mix-in different experiment types such as grid searching, spot checks, and model diagnostics.

You will know that this approach is working well when:

  • You are scouring API documentation and papers for more ideas of things to try.
  • You have far more experiments queued up than resources to run them.
  • You are thinking seriously about hiring a ton more EC2 instances.


In this post, you discovered how you can effectively manage hundreds of experiments that have run, are running, and that you want to run in a spreadsheet.

You discovered that a simple spreadsheet can help you:

  • Keep track of what experiments you have run and what you discovered.
  • Keep track of what experiments you want to run and what questions they will answer.
  • Zoom in on the most effective data preparation, model, and model configuration for your predictive modeling problem.

Do you have any questions about this approach? Have you done something similar yourself?
Let me know in the comments below.

30 Responses to How to Plan and Run Machine Learning Experiments Systematically

  1. Avatar
    Stephen August 4, 2017 at 5:55 am #

    Listing sub-projects in a spreadsheet is great for tracking the effects of single decisions, but what about when there are more? Eg. say you want to see the effect of scaling data (maybe different approaches, maybe only certain variables/columns, etc), and also imputation of missing data (maybe none vs different algorithms).

    With N such different decisions to evaluate, it turns into an N-dimensional matrix of results, not so easily represented in a spreadsheet! How do you handle this N-dimensional grid-search of analysis decisions?

    • Avatar
      Jason Brownlee August 4, 2017 at 7:05 am #

      One experiment should test one thing.

      You can test more, but cause-effect will become tangled.

      • Avatar
        Jon August 4, 2017 at 4:24 pm #

        Agreed. The same principle applies as for troubleshooting a performance problem. Alter one parameter at once and it fixes the problem, you know that’s the cause. Alter several, and any one of them could have made the difference.

        • Avatar
          Jason Brownlee August 5, 2017 at 5:44 am #

          Yep. Experimental design in the scientific method and all that.

          • Avatar
            gezmi August 7, 2017 at 5:49 pm #

            Thanks for this post, I needed it. I have started to use a table similar to this just a week ago.

            What about models with multiple hyperparameters to tune? Are those separate
            experiments or do you treat them as one?

          • Avatar
            Jason Brownlee August 8, 2017 at 7:44 am #

            Glad to hear it!

            I try to treat them as separate experiments if possible.

            The interactions may be nonlinear between some parameters, so running multivariate grid searches is still desirable.

            Perhaps add new columns or add more detail to the “what is being tested” column.

          • Avatar
            Stephen August 11, 2017 at 6:22 am #

            As you mentioned, there may be interactions between parameters (and other processing decisions).

            Running separate experiments makes sense for determining causality, but doesn’t deal with optimizing parameter/decision choices.

            How would you use a grid search systematically with your spreadsheet?

          • Avatar
            Jason Brownlee August 11, 2017 at 6:51 am #

            I rarely need results that good, and you hit a point of diminishing returns.

            Plus, I often am training deep nets that are super slow and it’s rarely possible to even use cross validation.

            If you are working on Kaggle or something, then I would recommend searching all combinations. Consider naming the run and listing all hyperparameter ranges tested in the suitable column – the same as any other run. This would work fine for random or grid searching.

    • Avatar
      Calvin De Lima November 4, 2018 at 12:25 am #

      A possible solution is to add some sort of “ID” column and then create multiple rows with values of all columns remaining the same for a particular experiment except for the Parameter and Parameter Values column which include the hyperparameters you’re trying to optimize. So if you had a model with two sets of hyperparameter values, you would include two rows associated with ID #1.


  2. Avatar
    Sebastian August 4, 2017 at 6:09 am #

    Really nice post! Would be interesting to get see how you use this systematic approach on a particular problem that you also describe (such as one of the problems you’ve worked on earlier on this blog). To see the full problem solving procedure and not just what worked well in the end!

    • Avatar
      Jason Brownlee August 4, 2017 at 7:06 am #

      I would love to, but it would be a very long post.

      I used it recently on a multvariate time series forecasting problem with LSTM and that was the reason I decided to write this up as a post.

  3. Avatar
    Eustace August 4, 2017 at 6:12 pm #

    Nice post and good approach. Would like to try it out in my next project.
    Thanks for sharing.

    • Avatar
      Jason Brownlee August 5, 2017 at 5:44 am #

      Thanks. Please do and let me know how you go. I found it invaluable myself. Massive time saver.

  4. Avatar
    Luis August 8, 2017 at 6:52 pm #

    Hi, Jason.

    I was looking info about tools to help with the iterative machine learning process. There isn’t a clean reference tool but these projects look like interesting ones:

    – Data Version Control:
    – ModelDB:

    • Avatar
      Jason Brownlee August 9, 2017 at 6:27 am #

      Interesting, thanks for sharing.

      I use my own scripts at this stage, but perhaps the time for using a platform is coming…

      • Avatar
        Luis August 9, 2017 at 7:46 pm #

        It would be very interesting to know your expert opinion about these or other similar tools.

        We’ll stay tuned 🙂

  5. Avatar
    Chris S. August 19, 2017 at 11:10 pm #

    For this exact reason I started with to have a simple api I can push my stats to and look at them.
    Since I started a few other, sadly priced, alternatives started. MIght be a worthwile path, depending on the work one does.

  6. Avatar
    Sean September 18, 2017 at 11:31 pm #

    • Avatar
      Jason Brownlee September 19, 2017 at 7:46 am #

      Thanks for sharing Sean, is this your blog?

      • Avatar
        Sean O'Connor October 2, 2017 at 3:19 pm #

        Yes, it is. I just posted some information about a perspective you can take – deep neural networks as pattern based fuzzy logic.
        And then there has been a recent paper by Luke Godfrey and Michael Gashler demonstrating fuzzy logic activation functions for neural network.

  7. Avatar
    Xu Zhang May 25, 2019 at 11:44 am #

    Reread your post and wish to see your update on this topic.
    Your method is simple and easy. But when we have to manage so many experiments at the same time, manually using the excel sheet is not feasible somehow. Would you like to share your scripts? Thanks.

  8. Avatar
    hello June 11, 2020 at 2:02 am #

    absolute trash, logging experiments on a spreadsheet? are you from the 1990’s ?

    • Avatar
      Jason Brownlee June 11, 2020 at 6:01 am #

      Thanks for your opinion. Yes, I started ml in the 90s. Some of us are old school.

      What is a better way?

    • Avatar
      Yasser Taima December 17, 2020 at 12:21 pm #

      I’m from the 1980s and use vi as a code editor. Are you going to trash talk that as well? If so, tread carefully or you’ll piss off a lot of highly competent, influential people, likely bigger and stronger in the field than you. If not, why?

  9. Avatar
    Glauber April 19, 2021 at 10:49 pm #

    Thanks for your great post!.I was akeen to install an experiment tracking API, then I read your post and changed my mind. If it’s simple and functional, take it !! That’s my rule #1. You made my day.

Leave a Reply