While reading Bootstrapping Machine Learning, I noticed Louis mentioned a paper that I had to go off and read. The paper is Machine Learning that Matters (PDF) by Kiri Wagstaff from JPL, published in 2012.
Kiri’s thesis is that the machine learning research community has lost its way. She suggests that much of machine learning is done for machine learning’s sake. She points to three key problems:
- Overfocus on Benchmark Data: Research focuses on datasets in the UCI repository, but very few of these results make an impact in the domain being addressed. She points to the lack of standards for experiment reproducibility, which undermines the value of standard datasets, and to the skew towards regression and classification problems. She comments that using the UCI repository is worse than using synthetic data because we don't even have control over how the data was created.
- Overfocus on Abstract Metrics: A strong focus on algorithm racing or bake-offs and the use of generic metrics like RMSE and F-measure that do not have a direct meaning in the domain.
- Lack of Follow-Through: It is really easy to download datasets and run algorithms in Weka. It is very hard to interpret the results and relate them to the domain, but that is what is required to make an impact.
The crux of the problem is that she describes machine learning work as three classes of activity (problem definition, algorithm selection and experimentation, and result interpretation), yet the "machine learning contribution" focuses on algorithm selection and experiments, ignoring problem definition and result interpretation.
Change in Mindset
Kiri suggests the research community needs to change the way it formulates, attacks and evaluates machine learning research projects. She comments on three areas to address:
- Meaningful evaluation methods: Measure the direct impact of the machine learning system in the domain. For example, dollars saved, lives preserved, time conserved or effort reduced. Selecting a direct impact measure will have a flow-on effect on the design of the experiment and the selection of the data.
- Involvement of the outside world: Involve domain experts to define the problem and data and, more importantly, to interpret the significance of the results in the domain. This avoids solving problems of little significance (such as iris plant classification) and helps develop systems that are reliable and useful enough to be adopted in practice.
- Eyes on the prize: Select research problems for their impact. Consider the status quo in the problem domain and describe the results as a level of improvement above that status quo. Engage the community and motivate adoption.
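To make the contrast between abstract metrics and direct impact concrete, here is a minimal Python sketch. It assumes a hypothetical fraud-screening task in which every cost and count is invented for illustration: two models score the same F-measure, yet under the assumed costs one loses roughly half as many dollars as the other. Both are also reported as savings over an assumed status quo that catches no fraud at all.

```python
"""Two classifiers with the same F-measure can differ a lot in dollar impact.

All costs and counts below are hypothetical, chosen only for illustration.
"""

# Hypothetical costs for a fraud-screening task (assumptions, not real data).
COST_MISSED_FRAUD = 500.0  # dollars lost per false negative
COST_FALSE_ALARM = 10.0    # dollars spent reviewing each false positive
TOTAL_FRAUD_CASES = 100    # positives in the (hypothetical) test set


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def dollars_lost(fp: int, fn: int) -> float:
    """Direct domain cost of a model's mistakes under the assumed costs."""
    return fp * COST_FALSE_ALARM + fn * COST_MISSED_FRAUD


def dollars_saved_vs_status_quo(fp: int, fn: int) -> float:
    """Improvement over an assumed status quo that catches no fraud at all."""
    status_quo_loss = TOTAL_FRAUD_CASES * COST_MISSED_FRAUD
    return status_quo_loss - dollars_lost(fp, fn)


# Confusion-matrix counts for two hypothetical models on the same test set
# (tp + fn == TOTAL_FRAUD_CASES for both).
models = {
    "model_a": {"tp": 80, "fp": 20, "fn": 20},
    "model_b": {"tp": 90, "fp": 35, "fn": 10},
}

if __name__ == "__main__":
    for name, c in models.items():
        print(
            f"{name}: F1={f_measure(**c):.2f}, "
            f"lost=${dollars_lost(c['fp'], c['fn']):,.0f}, "
            f"saved=${dollars_saved_vs_status_quo(c['fp'], c['fn']):,.0f}"
        )
```

Running it reports an F-measure of 0.80 for both models, while the dollars lost differ ($10,200 vs $5,350). The particular numbers are made up; the point is that picking the domain measure first can change which model you would choose.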
Kiri throws down the gauntlet and suggests 6 problems as examples of research projects where machine learning could make a difference:
- A law passed or legal decision made that relies on the result of an ML analysis.
- $100M saved through improved decision making provided by an ML system.
- A conflict between nations averted through high-quality translation provided by an ML system.
- A 50% reduction in cybersecurity break-ins through ML defenses.
- A human life saved through a diagnosis or intervention recommended by an ML system.
- Improvement of 10% in one country’s Human Development Index (HDI) attributable to an ML system.
She purposely left the problems open to avoid suggesting a singular problem or technical capability. Real challenges are difficult. They are examples intended to inspire rather than an exhaustive and prioritized list of problems to work on.
Finally, Kiri finishes up with a comment on the obstacles that may stand in the way of effectively addressing research problems that matter.
- Jargon: The overuse of machine learning nomenclature, which is useful shorthand within the field but largely impenetrable outside of it. More general language is needed when targeting a broader audience.
- Risk: When a machine learning system is making decisions of consequence, who is culpable when it makes mistakes? Who maintains the system going forward? (I can't help but feel that civil engineering and the safety-critical manufacturing industries have worked through similar issues.)
- Complexity: Machine learning methods are still not fire-and-forget, and a PhD is still required to understand and use them. We need better tools. (I think commoditized machine learning is moving very fast.)
I think it is a good paper that could motivate young researchers away from racing algorithms toward more impactful work. It reminds me of O'Reilly's call to arms to "work on stuff that matters". I would have liked some more concrete examples though, perhaps less idealistic and more business focused, like IBM's Watson, Siri and large-scale image classification.
I also can't help but feel that there are classes of problems where beginners can make progress and get direct personal benefits, such as classifying their own photos, organizing their documents or trading on the stock market.