Bootstrapping Machine Learning: Book Review

Louis Dorard has released his book titled Bootstrapping Machine Learning. It’s a book that provides a gentle introduction to the field of machine learning targeted at developers and start-ups with a focus on prediction APIs.

I just finished reading this book and I want to share some of my thoughts. If you are interested, I have already reviewed the sample Louis provides on his webpage that covers the first two chapters.

Bootstrapping Machine Learning

Overview

The book is broken down into eight chapters, as follows.

  1. Introduction
  2. Machine Learning and Artificial Intelligence
  3. Concepts
  4. Examples
  5. Applying Machine Learning to Your Domain
  6. Prediction APIs
  7. Case Study: Priority Inbox
  8. Wrap-up

Prediction APIs

Louis provides a taxonomy of prediction APIs in Chapter 3. I’m not clear whether it is his own taxonomy or a general breakdown that is used in the field, but I found it useful nevertheless. He classifies prediction APIs as follows:

  • Specialized APIs: these are APIs that solve a specific problem such as sentiment analysis of tweets or face recognition in images.
  • Generic APIs: these are general machine learning APIs where you upload a dataset and the system returns predictions. Google Prediction API is an example of this.
  • Algorithm APIs: these are generic APIs that expose the details of the algorithms, such as their configuration parameters and the choice between algorithms. I think BigML is an example of this kind of API (CART), but Louis classifies BigML as a Generic API.

Louis motivates the need for hosted prediction APIs by suggesting that if you don’t have the time to figure out how machine learning algorithms work, you won’t have the time to figure out how to scale them. I like this point: it highlights the need for the developer or startup to focus on their core offering and to move quickly.

Problem Breakdown

Louis provides a number of example machine learning problems in Chapter 4 in the areas of business and applications. These provide a concrete motivation for the types of problems for which machine learning is suited and how to think about those problems. Louis provides a framework to think about machine learning problems in a structured way that I really like. In summary, that framework is:

  • Who: who does the example concern?
  • Description: what is the context, and what are we trying to do?
  • Question asked: how would you write the questions that the predictive model should give answers to in plain English?
  • Type of ML problem: classification or regression?
  • Input: what are we doing predictions on?
  • Features: which aspects of the inputs are we considering, and what kind of information do we have in their representation?
  • Output: what does the predictive model return?
  • Data collection: how are example input-output pairs obtained to train the predictive model?
  • How predictions are used: when are predictions being made, and what do we do once we have them?
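One way to put this framework to work is to treat it as a checklist you fill in before touching any API. The sketch below is my own illustration in Python, not code from the book: the class name, field names and the churn wording are all hypothetical. It captures one field per question and reports which ones are still unanswered.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ProblemBreakdown:
    """One field per question in the framework listed above (names are hypothetical)."""
    who: str
    description: str
    question_asked: str
    ml_problem_type: str  # "classification" or "regression"
    input: str
    features: List[str]
    output: str
    data_collection: str
    how_predictions_are_used: str

    def __post_init__(self) -> None:
        # The framework only distinguishes these two problem types.
        if self.ml_problem_type not in ("classification", "regression"):
            raise ValueError("ml_problem_type must be 'classification' or 'regression'")

    def unanswered(self) -> List[str]:
        """Return the names of the fields that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical churn example; the wording is illustrative, not taken from the book.
churn = ProblemBreakdown(
    who="A subscription business",
    description="Keep customers who are about to cancel",
    question_asked="Is this customer going to leave within the next month?",
    ml_problem_type="classification",
    input="A customer",
    features=["months subscribed", "logins last month", "support tickets"],
    output="churn or no churn",
    data_collection="",  # deliberately left blank
    how_predictions_are_used="Send a retention offer to predicted churners",
)

print(churn.unanswered())  # → ['data_collection']
```

Leaving a field blank, as with data collection here, makes it obvious which part of the problem still needs thought before any model is trained.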

Apply Machine Learning

Chapter 5 focuses on the concerns of applying machine learning to your own domain. Louis guides you through topics such as data collection, feature engineering, preparing data, sanity checks, privacy and performance.

A strong point I liked in this chapter was the thought experiment Louis suggests when approaching a machine learning problem: imagine the system achieving perfect predictions. He uses this to suggest that you think about defining success criteria, performance measures and, most importantly, whether solving the problem can yield a return on the investment. He makes this point with a concrete example of customer churn.
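To make the thought experiment concrete, here is a minimal sketch in Python. The function name and every number in it are my own hypothetical assumptions, not figures from the book. It computes the best-case net value of a churn model that identifies every churner in advance, so that retention offers go only to customers who would otherwise leave, and compares that upper bound with an assumed project cost.

```python
def value_of_perfect_predictions(
    customers: int,
    churn_rate: float,
    save_rate: float,
    value_per_customer: float,
    offer_cost: float,
) -> float:
    """Net value per period if every churner were identified in advance.

    All inputs are hypothetical. With perfect predictions, retention offers
    are sent only to customers who would otherwise leave.
    """
    churners = customers * churn_rate
    revenue_kept = churners * save_rate * value_per_customer
    offers_spent = churners * offer_cost
    return revenue_kept - offers_spent


# Hypothetical figures: 10,000 customers, 5% churn, 30% of offers succeed.
upper_bound = value_of_perfect_predictions(
    customers=10_000,
    churn_rate=0.05,
    save_rate=0.3,
    value_per_customer=200.0,
    offer_cost=20.0,
)
project_cost = 15_000.0  # hypothetical cost of building and running the system

print(f"Best-case net value: {upper_bound:.0f}")
if upper_bound > project_cost:
    print("Worth investigating further")
else:
    print("Not worth it, even with perfect predictions")
```

If the best case does not cover the cost, no real model will either, which is the point of running the numbers before building anything.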

Priority Inbox Case Study

Chapter 6 summarizes the state of prediction APIs, touching on text/Natural Language Processing, Computer Vision and examples of using BigML and the Google Prediction API. I didn’t realize that there were so many companies and such a variety of prediction APIs available at the moment. For example, Louis links to the post List of 40+ Machine Learning APIs.

Chapter 7 provides a worked case study on developing a priority inbox leveraging both Google infrastructure and the BigML platform. The thing I liked the most about this example was that it was clear and concise. I don’t like examples with an overabundance of code and this was the right mix for me.

Louis rounds out the book in Chapter 8 with a call to arms for adopting prediction APIs and Machine Learning as a Service (MLaaS) as a way of addressing the current (and expected to worsen) talent shortage in data science and machine learning. The resources at the back suggest books and courses for learning more about specific areas covered throughout the text.

Summary

I have been thinking deeply about commoditized machine learning. I think it is a market whose adoption will only grow. I think the benefits will come from finding the best ways to integrate it into businesses and offer it through them.

The book is clearly presented, with focused content that is well suited to its audience. It is not maths heavy, nor is it bogged down with pages and pages of code examples. I really like the crisp presentation of the two APIs the book focuses on, Google Prediction API and BigML, and the worked example has just the right level of detail.

You could figure out how to use the APIs on your own, but the benefit of reading Louis’ book is that he motivates the problem solving and machine learning around the available APIs. I recommend this book to any developer or startup looking to start using machine learning quickly and effectively in their web application.
