In accepting the award, he gave a layman’s presentation of his work on statistical and causal machine learning methods, titled “Statistical and causal approaches to machine learning”.
It’s an excellent one hour talk and I highly recommend that you watch it.
On the statistical side, Scholkopf talks about empirical inference and generalisation.
An interesting point he makes early on concerns hard inference problems, which motivate his work on kernel machines.
Specifically, he references the problem of classifying DNA sequences from locations, as described in the Sonnenburg et al. 2008 paper “Large Scale Multiple Kernel Learning”. In the paper, the authors show that algorithm performance improves as the amount of available data grows.
He calls this a paradigm-changing fact and characterizes these hard inference problems as having:
- High dimensionality
- Complex regularities
- Little prior knowledge
- Requiring “big data” sets
He finishes this part of the talk on statistical learning by describing three key contributions of kernel methods. A kernel:
- Formalizes the notion of similarity
- Induces a linear representation of the data in a vector space, no matter where the original data comes from
- Encodes the function class used for learning; solutions of kernel algorithms can be expressed as kernel expansions
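To make these three points concrete, here is a minimal sketch using kernel ridge regression with a Gaussian (RBF) kernel. It is not taken from the talk: the toy sine data, the `gamma` and `lam` values, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Similarity between every row of A and every row of B (1.0 = identical)."""
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq_dist)

# Toy data (an assumption for illustration): noise-free samples of sin(x).
rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(40, 1))
y_train = np.sin(X_train[:, 0])

# The Gram matrix holds all pairwise similarities. It is symmetric and
# positive semi-definite, so it behaves like a matrix of inner products
# in some vector space, whatever the original inputs were.
K = rbf_kernel(X_train, X_train)

# Kernel ridge regression: solve (K + lam * I) alpha = y for the weights.
lam = 1e-3
alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)

def predict(X_new):
    """Kernel expansion: f(x) = sum_i alpha_i * k(x_i, x)."""
    return rbf_kernel(X_new, X_train) @ alpha

X_test = np.array([[0.5], [1.5]])
pred = predict(X_test)
print(pred, np.sin(X_test[:, 0]))
```

The learned function is a weighted sum of kernel evaluations against the training points, which is the kernel expansion mentioned in the last bullet.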
The second part of the talk covers Scholkopf’s work on causal modeling.
He describes causality, graphical models of causality and how one may infer a causal model from data.
Specifically, he touches on two new approaches to the problem of inferring a causal model:
- Separating out the cause from the mechanism (independence of noise and functions)
- Restricting the functional model
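One common way to restrict the functional model is an additive noise model, where the effect equals a function of the cause plus noise that is independent of the cause. The sketch below fits a regression in each direction and checks in which direction the residuals look independent of the input, using a simple HSIC dependence score. The quadratic mechanism (a deliberately easy toy case), the noise level, the polynomial regressor, the kernel bandwidth, and all function names are assumptions for illustration, not details from the talk.

```python
import numpy as np

def rbf_gram(v, bandwidth=1.0):
    """Gaussian Gram matrix of a 1-D sample, standardized first."""
    v = (v - v.mean()) / v.std()
    sq_dist = (v[:, None] - v[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth**2))

def hsic(a, b):
    """Biased HSIC estimate; close to 0 when a and b are independent."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(rbf_gram(a) @ H @ rbf_gram(b) @ H) / n**2

def residual_dependence(inputs, outputs, degree=3):
    """Regress outputs on inputs, then score dependence of residuals on inputs."""
    coeffs = np.polyfit(inputs, outputs, degree)
    residuals = outputs - np.polyval(coeffs, inputs)
    return hsic(inputs, residuals)

# Toy ground truth (assumed): X causes Y through y = x^2 + independent noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
y = x**2 + rng.normal(0.0, 0.1, size=500)

score_x_to_y = residual_dependence(x, y)  # candidate model: X -> Y
score_y_to_x = residual_dependence(y, x)  # candidate model: Y -> X

# Prefer the direction whose residuals depend less on the input.
inferred = "X -> Y" if score_x_to_y < score_y_to_x else "Y -> X"
print(score_x_to_y, score_y_to_x, inferred)
```

In the true causal direction the residuals are just the injected noise, so they carry little information about the input. In the reverse direction no additive noise model fits, and the residuals stay visibly dependent on the input.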
The most interesting part of this discussion for me was when he touched on his work on viewing semi-supervised learning through the lens of a causal model. This was drawn from his work in “On causal and anticausal learning”, 2012.
He describes two examples:
- Example 1: Predicting proteins from mRNA sequences. Here X (mRNA) causes Y (protein) and it is a causal problem.
- Example 2: Predicting class membership from a handwritten digit. Here X (class membership) causes Y (handwritten digit) and it is an anti-causal problem.
In both examples X denotes the cause and Y the effect. The key finding is that modeling P(X) with extra unlabeled data does not help in the first problem, because we assume that P(X) is independent of P(Y|X). In the second case, modeling P(Y) is helpful because P(Y) is dependent on P(X|Y).
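With X as the cause and Y as the effect, the reasoning can be written out as follows. This is my own sketch of the argument rather than a formula from the talk, and the sum assumes a discrete X.

```latex
\begin{align*}
P(X, Y) &= P(X)\,P(Y \mid X)
  && \text{cause distribution and mechanism, assumed independent} \\[4pt]
\text{Example 1 (predict } Y \text{ from } X\text{):}\quad
  & \text{target is } P(Y \mid X),\ \text{unlabeled data gives } P(X)
  && \Rightarrow\ \text{no information about the target} \\[4pt]
\text{Example 2 (predict } X \text{ from } Y\text{):}\quad
  & P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)},
    \quad P(Y) = \sum_{x} P(Y \mid x)\,P(x)
  && \Rightarrow\ P(Y) \text{ shares components with the target}
\end{align*}
```

In example 2 the unlabeled inputs are samples of the effect Y, and P(Y) is built from the same two pieces that determine P(X|Y), so learning more about P(Y) can constrain the predictor.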
Problems like those in example 2 (predicting the cause X from the effect Y) will benefit from semi-supervised learning techniques. I’m surprised that this finding isn’t talked about more often; perhaps it’s obvious to those deeper in the field.
It’s a great video and I’m sure it will get you motivated with regard to two important areas of machine learning.
Again, you can watch the video here: “Statistical and causal approaches to machine learning”.