Last Updated on August 19, 2019
Getting started in deep learning is a struggle.
It’s a struggle because deep learning is taught by academics, for academics.
If you’re a developer (or practitioner), you’re different.
You want results.
The way practitioners learn new technologies is by developing prototypes that deliver value quickly.
This is a top-down approach to learning, but it is not the way that deep learning is taught.
There is another way. A way that works for top-down practitioners like you.
In this post, you will discover this other way.
(I teach this approach and have helped more than 1,145 developers get their start in deep learning with Python. Click to learn more.)
You will believe that being successful with applied deep learning is possible. I hope it will inspire you to take your first step towards this goal.
Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.
Let’s dive in.
You Want To Get Started In Deep Learning…
But You’re Different
You don’t have a Masters or Ph.D. in advanced math.
You’re not a machine learning expert.
You’re a professional or a student with a keen interest in deep learning, and you’re eager to start using it.
Maybe You’re a Developer
- You want to know how to apply deep learning to solve complex problems.
- You want deep learning skills to improve your job prospects.
- You want to use deep learning as leverage to get into a data scientist (or similar) position.
Maybe You’re a Data Scientist
- You want to use deep learning on a future project.
- You have a sticky problem that you think deep learning can help with.
- You want deep learning skills to stay relevant and on top of your field.
Maybe You’re a Student
- You want deep learning skills to improve your prospects for getting a job.
- You have an interesting problem for which you think deep learning will be a good fit.
- You want to discover why deep learning is so popular.
Does one of these reasons fit you?
Let me know in the comments. I’d love to hear your reason.
Do you have a different reason for getting into deep learning?
Let me know in the comments and I will give you personal advice.
The reasons for getting into the field of deep learning are varied.
Regardless, you’re treated the same as everyone else. Like an academic.
Deep Learning Is For Academics… The Lie
Deep Learning is an academic field of study.
It has been this way for a long time. The field used to involve the study of small artificial neural networks. Now the focus is on much larger networks and more exotic network architectures. The breakthroughs in the field are still coming from academia. The field is young and this is to be expected.
This means that most of the information on deep learning is written by academics. And it is written for other academics, like researchers and Masters and Ph.D. students.
It is not written for developers, like us.
This is why you see bad advice like:
You need a PhD to get into deep learning.
Or comments like:
You need 3 years of advanced math before you can get into deep learning.
This is why getting started in deep learning is such a struggle. It is a challenge that developers think they can only solve by going back to school, going into debt and investing 3-to-7 years of their life.
You can work through deep learning tutorials in minutes. You can begin building a portfolio that you can use to show your growing skills in the field. And you can start today.
Programming Is Only For Computer Scientists (NOT)
Programming used to be hard and theoretical.
You needed to know a lot of math to understand programming before there were computers.
Things like computability and completeness theorems.
In the early days of programming, you had to define your own data structures and basic algorithms. This was before the low-hanging fruit of algorithms and data structures had been defined. This required a good understanding of discrete math.
Things like complexity theory.
These theoretical topics can help you be a better programmer and engineer today. They are still taught in computer science courses.
But you and I both know that you don’t need them to get started in programming. You don’t even need these topics if you’re working in most programming jobs.
You call the sort routine; you don’t derive a new sort operation from first principles.
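As a small illustration of calling the sort routine, here is a minimal sketch using Python's built-in sorted() function. The order records are hypothetical and invented purely for this example.

```python
# Hypothetical order records, invented for illustration only.
orders = [
    {"id": 3, "total": 42.50},
    {"id": 1, "total": 17.25},
    {"id": 2, "total": 99.00},
]

# Call the built-in sort routine; no sorting algorithm is derived or implemented here.
by_total = sorted(orders, key=lambda order: order["total"], reverse=True)

# Collect the ids in order of largest total first.
ids_by_total = [order["id"] for order in by_total]
print(ids_by_total)
```

The point is that you supply the data and the sort key, and the library does the rest.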
Can we stretch this analogy to deep learning?
Do you need to derive the back propagation equation from first principles and implement it from scratch? No. Instead, you can just call model.fit() on the deep learning API.
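To make the model.fit() idea concrete, here is a minimal sketch, assuming TensorFlow 2.x is installed and Keras is imported from it. The data is synthetic, and the layer sizes, epoch count and batch size are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from tensorflow import keras

# Synthetic data (an assumption for this sketch): 200 rows, 4 input features,
# with a binary label that is 1 when the features sum to more than 2.
rng = np.random.default_rng(0)
X = rng.random((200, 4)).astype("float32")
y = (X.sum(axis=1) > 2.0).astype("float32")

# A small Multilayer Perceptron; the layer sizes are arbitrary.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Back propagation happens inside fit(); nothing is derived by hand.
history = model.fit(X, y, epochs=5, batch_size=16, verbose=0)

print(history.history["loss"])
```

The history object records the loss per epoch, which is often enough to see whether the model is learning anything at all.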
Wait… What About The Top Engineers?
Yes, the top engineers can derive a new algorithm.
In fact, they are often hired to do just this. It’s their job. They can do the easy stuff and the harder stuff. They can call the sort routine and derive a new sort method for business data that is too big to fit into memory.
My point is that these capabilities do not have to come first, they can come later.
Top-down. Rather than bottom-up.
This is key.
Like practical programming in the real world.
The Top-Down Programmer (…that gets results)
Programming is fun in the beginning.
You learn this function. You learn that API. You stitch together your own programs and discover you can solve problems with your own ideas.
You’re productive early on and only get more productive with time. You can progress deeper into the theory to solve more challenging problems, or not. It’s up to you.
Being productive early is important for two reasons:
- It keeps you motivated, which keeps you engaged.
- It allows you to deliver value early, which feeds motivation.
It’s too easy to stop.
It’s too easy to give up.
It is a super power. Knowing that you can write a program to solve a specific problem. Then having the confidence to actually implement and deploy.
Code and design will be crappy, to begin with. Hard to maintain. Not viable for long-term operational use. But code gets better with experience, with mentors, with continual learning.
This is how the majority of IT operates. Top-down. Not bottom-up.
You don’t take a University course in computer language theory to learn Ruby on Rails for your next web development project. You work through some tutorials, make some mistakes and get familiar with the platform.
Repeat for the next framework, the next library. Again and again.
Repeat for Deep Learning.
Deep Learning Is NOT Just For The Academics
You can learn deep learning from the bottom-up.
It may take many years and a few higher degrees, but you will know a lot about the theory of deep learning techniques.
Even after all this effort, you may or may not know how to apply them in practice to real data. They generally don’t teach practical or vocational skills at University.
The academic textbooks, video courses, and journal papers are a fantastic resource to leverage. They are a gold mine of ideas. They are just not the place to begin when starting out with deep learning.
Focus on Delivering Value With Deep Learning
The value in deep learning to business and other fields of study is in reliable predictions.
Learn how to model problems with deep learning.
Develop (or steal) a systematic process for working through a predictive modeling problem. Then apply it again, and again and again, until you get really good at delivering this value.
Get good at applying deep learning.
We all like the things we’re good at.
If you can do this well and reliably, you will, in turn, have a valuable skill that is in great demand in the market.
Later, you will find yourself diving into the academic papers, parsing the Greek letters, and emailing or calling the authors. All to extract gold nuggets that you can use to get better model performance on your next project.
Time For You To Get Success With Deep Learning
Now, hopefully, you believe that you can get started and get good at applied deep learning.
It is time to take action. It is time to get started with deep learning.
1. Pick a Framework
I recommend the Keras platform. It supports Python. Meaning you can leverage scikit-learn and the whole SciPy ecosystem for your deep learning projects.
This is important.
You essentially get data preparation, model evaluation and hyperparameter optimization for free.
Keras also provides a practitioner-friendly API (i.e. simple and intuitive). It wraps the power (and unnecessary complexity) of the Theano and TensorFlow libraries. It gives you the speed and efficiency of bleeding edge frameworks, without the tens or hundreds of lines of code needed to make something work.
I have a ton of tutorials on Keras, as well as a free 14-day mini-course.
2. Pick a Process
Keep it simple, but pick a strong skeleton that you can add to and tailor to your preferred techniques and problem types.
A good set of general steps I like to use on predictive modeling projects is:
- Define Problem: What problem you are trying to solve and the data and framing you need to solve it.
- Prepare Data: What transforms to apply to the data to create views that best expose the structure of the prediction problem to the models.
- Evaluate Algorithms: What techniques to use to model the problem, and the metrics to filter good from bad solutions.
- Improve Results: What parameter tuning and even ensemble methods to use to get the most from what is working.
- Present Results: What results you achieved, lessons learned and the saved model or set of predictions of which you can make direct use.
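The five steps above can be sketched end to end in a few lines. This is a minimal sketch under stated assumptions, not a definitive implementation: it uses scikit-learn only, with MLPClassifier standing in for a Keras network, and the dataset is synthetic. The candidate models, parameter grid and split sizes are arbitrary choices for illustration.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Define Problem: binary classification on synthetic stand-in data.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# 2. Prepare Data: scaling sits inside each pipeline so it is refit per fold.
candidates = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=1),
    ),
}

# 3. Evaluate Algorithms: mean 5-fold cross-validation accuracy per candidate.
scores = {
    name: cross_val_score(pipe, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, pipe in candidates.items()
}

# 4. Improve Results: tune the neural network (chosen here for illustration).
grid = GridSearchCV(
    candidates["mlp"],
    param_grid={
        "mlpclassifier__alpha": [1e-4, 1e-2],
        "mlpclassifier__hidden_layer_sizes": [(8,), (16,)],
    },
    cv=3,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

# 5. Present Results: report scores and keep a serialized copy of the model.
test_accuracy = grid.score(X_test, y_test)
saved_model = pickle.dumps(grid.best_estimator_)
print(scores)
print(grid.best_params_, test_accuracy)
```

Keeping each step as its own small block makes it easy to swap in different data, models or metrics on the next project while the skeleton stays the same.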
3. Pick a Problem
You need practice. Lots of practice.
If you are interested in predictive modeling with image data, find all the standard machine learning problems with image data and work your way through them.
Text data? Video data? Use the same approach.
Learn how to get a result using your process.
Then learn how to get a good result.
Then a world class result.
The good thing about standard machine learning datasets is that you have a benchmark score to compare your results to.
Not sure about your preferences yet? Start with Multilayer Perceptrons on standard datasets from the UCI Machine Learning Repository. Then try Convolutional Neural Networks on standard object recognition problems. Move on to Recurrent Neural Networks on simple time series problems.
Later, graduate to more complex problems used in machine learning competitions like those on Kaggle. Graduate further to defining your own problems and gathering data from creative commons.
Your goal is to develop a portfolio of completed projects.
This portfolio will be a resource you can leverage as you take on larger and more challenging projects. It can also be a resource that you can use to show your growing skills with deep learning and your ability to deliver value.
You discovered that deep learning as you know it is “deep learning for academics”. Not “deep learning for developers”.
You now know that there is a whole world of libraries and tutorials designed for you and developers like you.
You discovered a simple 3-step process that you can use to succeed in deep learning as a developer, summarized as:
- Pick a Framework (like Keras).
- Pick a Process (like the one listed above).
- Pick a Problem (then develop a portfolio).
Has this changed your mind about deep learning?
Leave a comment and let me know.
Some more posts on deep learning that you might like to read include:
- Develop Your First Neural Network in Python With Keras Step-By-Step
- 8 Inspirational Applications of Deep Learning
- Crash Course On Multilayer Perceptron Neural Networks
I’m glad you found it useful Siffi.
Thanks, this is encouraging and makes sense of my impressions of the field so far. It makes sense that a new field would come from academia and gradually become more accessible as it is deployed in industry.
To back this up, machine learning is already becoming accessible via APIs and GUIs and even cloud services.
Thanks Zach, nice observation about machine learning APIs.
I loved the advice. I have done my master’s with a focus on Computer Vision and Machine Learning. I am a guy who relies on extracting features and then solving the problem, but I realised that the state of the art is the CNN, and that CNNs are way better than the old methods. I am planning to learn it so that I can use it for my future projects and also to build a good profile.
Thanks Susmit. Good luck with your computer vision projects.
I have a PhD in computer science. Your top-down vision is really helpful for me. I hope it will be for others. Thank you for sharing your thoughts and I’m looking for more complete resources (such as a complete book: deep learning-a practical approach or something similar).
I’d love that too!
I’m glad you found it useful Amir.
I do have a top-down based book on deep learning. It teaches you deep learning (MLPs, CNNs, RNNs) using tutorials and projects.
Hi Jason, thanks for your very interesting series of blog posts. I see Python tools being applied often and successfully. However, for several reasons, I would like to stick to R. Do you have any useful links for applied deep learning (in the sense that you presented above) for R users? Thank you!
I love R too, it’s great. Sorry, I am not on top of deep learning with R. It’s an area I need to invest some time in and report back on the blog. Soon I can do this, I hope.
“Keep it simple, but pick a strong skeleton that you can add to and tailor to your preferred techniques and problem types.” – Jason … Love this stuff and can’t wait to get started.
I’m so glad to hear that Johnny. Let me know how you go. Shoot me an email with any questions you have.
I appreciate your humility and honesty in your views to make machine learning more accessible to all developers as a tool. While the enthusiasm is great, I would just warn that it is still very important for the practitioner to understand how deep learning works. “If we don’t understand the technology we have no responsibility for it”
You make a very important point. We must circle back and understand the “why” of the methods. I’m just advocating that we not start with the “why” but instead start with the “how”.
I just joined this class. I am eager to learn a lot.
A major hurdle that I encountered while developing Deep Learning systems is the lack of hardware infrastructure. I had developed a Visual Question Answering model, and even the 4GB GPU based AWS servers ran out of memory in the first epoch. The CPU servers took 1.5 hours for a single epoch containing 1000 images.
Thus, it’s somewhat difficult for indy developers to play around much with the technology.
This too might be an opportunity. I think it might force you to come up with alternate strategies like transfer learning, parallel training across a cluster, lots of smaller models trained on fewer observations per epoch in an ensemble, etc.
Thank you, get it clear now 🙂
I’m glad the post was useful Matthew.
Great advice, Jason. Thanks to your tutorials, I’m getting rid of my fear of deep learning.
Can you talk about situations in which deep learning isn’t necessary? I see everyone getting (overly) excited by deep learning and I’m wondering whether classical machine learning algorithms would be made obsolete or not? Do “classical” algorithms like SVM, Logistic regression, k-means etc still have their place in the machine learning community?
I’m glad you liked it Madhav.
I would probably not use deep learning for tabular data (like you see in excel). Deep learning is really good at problems with raw data, like text, image, audio, etc.
For tabular data, where feature engineering is manual (rather than learned in deep learning), I would use XGBoost and let all other methods prove their worth against the results from XGBoost.
Classical methods do still have a place. Ideally, we want to use a model that is as simple as possible and gives predictions that are good enough for our requirements. If you can solve the problem with linear regression, then don’t even try deep learning.
Does that make sense? This is an important topic, please ask more questions if you have them
Thanks Jason. Ya, that does make sense. I’m experimenting on binary classification with LSTM vs classical methods. I’ll let you know my findings once I’m done with it.
From what I understand, deep learning works when there’s a lot of data and computing power available. How much data is “good enough” ? Do you know if there’s any empirical relationship between the size of data, required computing power and the desired accuracy ? I guess the data size is really problem dependent, but I’m looking for answers that go beyond the phrase, ” the more, the better ” or something similar.
Hi Madhav, the train-compute time will be linear with the volume of data.
The skill-to-data relationship is problem dependent.
Indeed, a key benefit of deep learning is that skill can keep rising with data, unlike other methods that seem to plateau.
Sorry, I cannot be more specific, there may be problem specific studies out there that give you some concrete numbers, but I would not expect the behavior to transfer to new domains.
wow, just another post which claims that you can take a monkey and convert it to ML expert. yes you need math if you want to do something more than copypasting tutorials from keras github. There is huge amount of monkey coders right now only for one reason – in a lot of cases business need monkeys to do stupid and boring job of moving a button 3px to the left and making a button slightly greener.
There is very little opening jobs that requires deep-learning and for 150k+ they would rather hire phd with strong math background than a monkey who all knowledge is to copy some script that somehow works.
Thanks for sharing your opinion.
There is indeed a lot of low hanging fruit in business that can be picked by practitioners without a decade of higher education or a strong math background.
There is room and opportunity enough for a range of practitioners.
All Jason is saying is that–much like most notable technology fields (embedded devices, software programming, etc)–there are alternate routes than becoming a researcher in the field.
To be as allegorical as possible, I started my career as a self-taught programmer while flying around in a C-130 in the military. I was able to build (and sell) a technology company that way. I later went to school for a formal CS degree (Penn State ’13) and can honestly tell you that, in terms of what you learn in a 4-year program, I learned very little in addition to what I had already picked up myself. Perhaps there was some helpful *aha* theory here and there, and a rehashed disdain of multivariable calculus textbooks, but nothing that was truly groundbreaking.
The reason I was able to teach myself is because my self-taught roadmap had been full of drive, passion, and the willingness to challenge myself. There are many people like that in the world. Jason is providing an invaluable resource for people who want to become practitioners of the craft. I’ve now been a senior software engineer for the last 6-7 years, and find myself continually coming back to his works to help the development of my machine learning skill set.
Very well said Collin. Thank you so much.
It is rare I get the sense that anyone understands what I’m trying to do here. Thank you!!!
Hi Jason! First of all, thank you for your writing! It is extremely good. I bought some of your books and those are great too!
I am one of the people you are addressing in this post: non-academic, self-studying Machine Learning and Deep Learning through online sources (Udacity.com in my case).
I did an image classifier using PyTorch. What do you think of it? I studied Keras as well, but the project was based on PyTorch.
Thanks, I’m glad the post helped.
Sorry, I don’t have tutorials on pytorch. I cannot give you good advice.
Hi Jason! First of all, thank you for your writing! It is extremely good.
Kindly make some tutorials on Video Summarization using Keras
Great suggestion, thanks!