
In November 2014, Bernhard Scholkopf was awarded the Milner Award by the Royal Society for his contributions to machine learning.

In accepting the award, he gave a layman’s presentation of his work on statistical and causal machine learning methods, titled “Statistical and causal approaches to machine learning”.

It’s an excellent one-hour talk and I highly recommend that you watch it.

## Statistical Learning

On the statistical side, Scholkopf talks about empirical inference and generalisation.

Early on, he makes an interesting point about hard inference problems, which motivates his work on kernel machines.

Specifically, he references the problem of classifying DNA sequences from locations, as described in Sonnenburg, et al. 2008, “Large Scale Multiple Kernel Learning”. In the paper, the authors show that algorithm performance keeps improving as the amount of available data increases.

He calls this a paradigm-changing fact and characterizes these hard inference problems as having:

- High dimensionality
- Complex regularities
- Little prior knowledge
- Requiring “big data” sets

He finishes this part of the talk on statistical learning by describing three key contributions of kernel methods. A kernel:

- Formalizes the notion of similarity
- Induces a linear representation of the data in a vector space, no matter where the original data comes from
- Encodes the function class used for learning, because solutions of kernel algorithms can be expressed as kernel expansions
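To make these three points concrete, here is a minimal sketch in plain Python. It is not taken from the talk: it assumes an RBF kernel as the similarity measure and uses a kernel perceptron as a stand-in learning algorithm, and all function names (`rbf_kernel`, `decision_value`, `train_kernel_perceptron`) are hypothetical. The learned function is a kernel expansion, a weighted sum of similarities to the training points, and it separates XOR-style data that no linear model on the raw inputs could.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Similarity of two points: 1.0 when identical, decaying towards 0 with distance."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def decision_value(x, points, labels, alphas, kernel):
    """Kernel expansion: f(x) = sum_i alpha_i * y_i * k(x_i, x)."""
    return sum(a * y * kernel(p, x) for a, y, p in zip(alphas, labels, points))


def train_kernel_perceptron(points, labels, kernel, max_epochs=10):
    """Illustrative learner (an assumption, not the method from the talk).

    Each alpha counts how often the corresponding training point was misclassified.
    """
    alphas = [0] * len(points)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x_i, y_i) in enumerate(zip(points, labels)):
            if y_i * decision_value(x_i, points, labels, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alphas


# XOR-style toy data: not linearly separable in the original 2-D space.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
labels = [-1, -1, 1, 1]

alphas = train_kernel_perceptron(points, labels, rbf_kernel)
predictions = [
    1 if decision_value(x, points, labels, alphas, rbf_kernel) > 0 else -1
    for x in points
]
print(alphas, predictions)
```

Note that the learner never touches the coordinates directly, only kernel values, which is why the same code would work for any data type for which a kernel can be defined.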

## Causal Learning

The second part of the talk covers Scholkopf’s work on causal modeling.

He describes causality, graphical models of causality and how one may infer a causal model from data.

Specifically, he touched on two new approaches to addressing the problems in inferring a causal model:

- Separating out the cause from the mechanism (independence of noise and functions)
- Restricting the functional model

The most interesting part of this discussion for me was when he touched on his work on viewing semi-supervised learning through the lens of a causal model. This was drawn from his work in “On causal and anticausal learning”, 2012.

He describes two examples:

**Example 1**: Predicting proteins from mRNA sequences. Here X (mRNA) causes Y (protein) and it is a causal problem.

**Example 2**: Predicting class membership from a handwritten digit. Here X (class membership) causes Y (handwritten digit) and it is an anti-causal problem.

The key finding is that modeling P(X) with extra unlabeled data does not help in the first problem, because we assume that P(X) is independent of P(Y|X). In the second case, however, modeling P(Y) is helpful because P(Y) is dependent on P(X|Y).
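One way to see why is to write out the factorization in the notation used here, where X is the cause and Y the effect. This is a sketch of the argument under the stated independence assumption, not a derivation taken from the talk. The joint distribution splits into the distribution of the cause and the mechanism:

$$
P(X, Y) = P(X)\,P(Y \mid X)
$$

In example 1, the prediction target is the mechanism P(Y|X) itself. Extra inputs only tell us more about P(X), which by assumption carries no information about P(Y|X). In example 2, the prediction target runs against the causal direction:

$$
P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}, \qquad P(Y) = \sum_{x} P(Y \mid X = x)\,P(X = x)
$$

Here the input distribution P(Y) is built from the same two ingredients as the target P(X|Y), so extra samples of Y can carry information about it.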

Problems like those in example 2 (predicting the cause X from the effect Y) will benefit from semi-supervised learning techniques. I’m surprised that this finding is not talked about more often. Perhaps it’s obvious to those deeper in the field.

## Summary

It’s a great video and I’m sure it will get you motivated with regard to two important areas of machine learning.

Again, you can watch the video here: “Statistical and causal approaches to machine learning”.

Thanks for pointing this out. Indeed, it is extremely important in that it defines a limitation on the capabilities of machine learning technology that is not entirely obvious.

In an intuitive sense, this means that you cannot build a machine that is capable of predicting the output behavior of another machine based on its starting inputs. All that machine learning is capable of is determining, given the outputs, which input states they “probably” emanate from.

Very illuminating indeed for me. It wasn’t obvious to me until I read this.

Thanks!

Right on, this helped me sort things right out.