In this post I want to show you that programmers can get into machine learning. I will show you that learning machine learning can be just like learning any other piece of high technology. We’ll compare learning machine learning to learning to program in the first place, which may have been an even larger challenge.
A Designer Wants to Code
Pretend you are a designer, say a young web designer. You make web designs in Photoshop or something and maybe snip up designs and turn them into CSS. You hang around programmers and maybe you have a little coding envy. You think you might want to learn how to code. Rightly or wrongly, you think that CSS and HTML are “practically coding”, that it’s all creative expression, that programming is just another medium or outlet for your creativity.
You jump on Quora or StackOverflow and ask a question like “I’m a designer, how can I learn how to code?”
You get answers back that are all over the map. You get seemingly seasoned and expert programmers gifting you free advice like “learn ANSI C and pointers”, “learn binary”, “buy a book on ASM”, “start with LISP”. Maybe an eloquent communicator drops by and writes a long and convincing reply that you really should buy and read Knuth’s The Art of Computer Programming, Vols. 1-3. You take their advice, buy the books on Amazon and get as far as the introduction before regretting the purchase, hating yourself for not being smart enough and giving up on your interest in learning to code, only to repeat the same cycle in 3 months.
What happened? The advice seems like good advice.
The problem is the advice is retrospective. The advice is from programmers thinking about their craft and how people who are already programmers can be better programmers. The advice does not consider an absolute beginner, an interested amateur looking to dip their toe in and see if they want to go for a swim.
Now I agree, the world is a little different now. This problem has been identified and there are great services out there tackling exactly this problem, i.e. teaching people how to code.
Sure, maybe we learn pointers, binary, ASM and LISP and even read sections of Knuth (no one really reads it cover to cover, right?), but that comes later. How did you start out? I started out hacking things together, trying things out, experimenting, creating and learning. I dove deep into the technical detail later because I wanted to create bigger, better and more powerful programs. I didn’t start in the technical detail. I think this experience might be similar for a lot of programmers out there. Was it like this for you? Leave a comment.
A Programmer Wants to do Machine Learning
Now, if you’re reading this, you’re probably a programmer or some kind of developer. Think about your interest in machine learning. Have you had a look at some of the responses from seasoned and expert machine learners gifting you free advice on how to get started?
I’ve been searching for and reading this advice and some is good, much of it is not. I’ve collected some samples below:
- You absolutely will need to be comfortable with basic linear algebra (manipulating vectors and matrices) and working with logarithmic and exponential functions.
- You’ll need to know Linear Algebra through Eigenvectors if you want things to be “easy”.
- You do want to have some familiarity with probability, linear algebra, linear programming, and multivariable calculus.
- First, you need to have a decent CS/Math background. ML is an advanced topic so most textbooks assume that you have that background.
- Statistics, Probability, distributed computing, and Statistics.
There is some really great advice in there, but is this advice appropriate for the absolute beginner? Is it appropriate for the programmer dipping their toe in the water to test the temperature?
Maybe people are asking the wrong questions. Also, I’ve cherry-picked snippets of answers that suggest one needs maths before they can get started in machine learning. The point I want to make is that a beginner is going to focus on what they don’t have and what they can’t do. They may give up before they have even tried.
I absolutely agree that having a strong grasp of linear algebra and probability will provide an excellent base for getting into machine learning. I absolutely agree that The Elements of Statistical Learning is a tremendous book on multiple levels. I just don’t think that the first step for a programmer interested in machine learning should be to take a math course or read a dense theoretical treatment of the field. I actually strongly recommend against it.
Two Machine Learning Fields
There are two sides to machine learning:
- Practical Machine Learning: This is about querying databases, cleaning data, writing scripts to transform data, gluing algorithms and libraries together, and writing custom code to squeeze reliable answers from data to satisfy difficult and ill-defined questions. It’s the mess of reality.
- Theoretical Machine Learning: This is about math and abstraction and idealized scenarios and limits and beauty and informing what is possible. It is a whole lot neater and cleaner and removed from the mess of reality.
The practical side cannot have frameworks and rigour without the theoretical side. The theoretical side does not have meaning or motivation without the practical side. This dichotomy is false; it’s really a sea of tools and equations, but stay with me.
As a programmer, you may have a penchant for the practical side, but you will reach your limits as a “technician” and will need an understanding of the theoretical side to improvise effectively. You must read mathematical treatments of algorithms, and you must work through dense textbooks. That is what it takes to do well in the field. The problem is that this is the advice practitioners freely give to beginners: it is idealized, suited to the intermediate, and inappropriate for beginners.
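To make the practical side a little more concrete, here is a minimal sketch of what “cleaning data and gluing an algorithm onto it” can look like. Everything in it is an assumption for illustration: the toy records in `RAW_ROWS` are made up, `clean` and `predict` are hypothetical helper names, and a one-nearest-neighbour rule stands in for whatever algorithm or library you would actually reach for.

```python
import math

# Made-up toy records for illustration: (height_cm, weight_kg, label).
RAW_ROWS = [
    ("150", "50", "small"),
    ("152", "", "small"),    # missing weight, dropped during cleaning
    ("155", "54", "small"),
    ("180", "82", "large"),
    ("n/a", "85", "large"),  # unparseable height, dropped during cleaning
    ("185", "90", "large"),
]


def clean(rows):
    """Keep only the rows whose feature columns all parse as numbers."""
    cleaned = []
    for *features, label in rows:
        try:
            cleaned.append(([float(value) for value in features], label))
        except ValueError:
            continue  # skip rows with missing or malformed values
    return cleaned


def predict(train, point):
    """One-nearest-neighbour: return the label of the closest training row."""
    _, label = min(train, key=lambda row: math.dist(row[0], point))
    return label


train = clean(RAW_ROWS)
print(len(train), "usable rows")
print(predict(train, [153.0, 52.0]))
print(predict(train, [182.0, 88.0]))
```

None of this needs an equation to get started, which is the point. The theory becomes useful later, for example when you want to know why a distance measure behaves badly on unscaled features.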
Programmers Like Power Tools
I think it’s useful for an experienced programmer to think of machine learning like an advanced programming topic, like threading (stay with me now).
If you want to get into threading you just write some multithreaded programs and get an idea of the power it can unleash. You make all kinds of blunders, but some of the things you prototype work and you get glimpses of what is possible. If you decide this is for you, you can read books and go deeper.
You can use libraries of existing multithreaded constructs, you can write your own, and you can go deep and learn about some of the more mathematical subjects behind the various threading constructs. You let your interest drive your learning, and eventually you can credibly build and deploy production multithreaded code. It’s a process, not a step change.
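As a rough illustration of that progression, here is a small Python sketch. The first function leans on an existing construct from the standard library (`ThreadPoolExecutor`), and the second hand-rolls threads with a lock. The function names `square_all` and `count_with_lock` and the numbers used are hypothetical, chosen only for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def square_all(numbers, max_workers=4):
    """Use a ready-made construct: a thread pool that maps a function over inputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(lambda n: n * n, numbers))


def count_with_lock(num_threads=4, increments=1000):
    """Write your own: several threads bump a shared counter, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:  # without the lock, updates from different threads can be lost
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


print(square_all([1, 2, 3]))
print(count_with_lock())
```

Removing the lock and watching what happens to the total is exactly the kind of blunder-driven experiment that teaches you why the deeper material exists.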
Now obviously, machine learning is a larger and more diverse area, but the general stepped strategy is the one I advocate and will elaborate in a future post.
You don’t go from being a beginner to running a team and putting machine learning systems into production. The danger zone is real. You can and will learn just enough to be dangerous. But the self-discipline you have learned from wielding the power to program machines (along with your process of code reviews, peers and mentors, and common sense) limits those very real dangers.
Just like learning to program, learning machine learning is a journey where the learning does not end; mastery really means continuous education. Learning to read equations, turn them into code, and write your own to frame your problems may be rest stops along the way, if you’re interested.
I’ve listed some resources if you want to keep thinking on this issue. It’s a little deep and I’m sure we can generate some good discussions.
- Scroll up and read through some of the answers to the StackOverflow questions listed above. There are answers there that discourage programmers unless they know maths, but there are also other really encouraging answers that will really lift you up.
- Why becoming a data scientist might be easier than you think: A Gigaom post that points to case studies for the general opportunity for the analytically inclined (like programmers) to start from scratch in Data Science and quickly become internationally competitive.
- Is mathematics necessary for programming?: Interestingly, I think the arguments for and against are very relevant and a useful perspective.
This is a pretty charged post and I’m really interested in what you think. Discussing this issue with friends, I do hear a lot about the danger zone and the options for progression for “the technician” machine learning apprentice. I will follow up on these two topics in future posts.
What do you think? Is there any substance to my proposed similarity between learning to program and a programmer learning machine learning?