You must become intimate with your data. Any machine learning models that you build are only as good as the data that you provide them. The first step in understanding your data is to actually look at some raw values and calculate some basic statistics. In this post, you will discover how you can quickly get […]
Search results for "Machine Learning"
Super Fast Crash Course in R (for developers)
As a developer, you can pick up R super fast. If you are already a developer, you don’t need to know much about a new language to be able to read and understand code snippets and write your own small scripts and programs. In this post you will discover the basic syntax, data structures and control […]
Gentle Introduction to Predictive Modeling
When you’re an absolute beginner it can be very confusing. Frustratingly so. Even ideas that seem so simple in retrospect are alien when you first encounter them. There’s a whole new language to learn. I recently received this question: So using the iris exercise as an example if I were to pluck a flower from my […]
Data Science From Scratch: Book Review
Programmers learn by implementing techniques from scratch. It is a type of learning that is perhaps slower than other types of learning, but fuller in that all of the micro decisions involved become intimate. The implementation is owned from head to tail. In this post we take a close look at Joel Grus’s popular book […]
How To Work Through A Problem Like A Data Scientist
In a 2010 post, Hilary Mason and Chris Wiggins described the OSEMN process as a taxonomy of tasks that a data scientist should feel comfortable working on. The post, titled “A Taxonomy of Data Science”, appeared on the now-defunct dataists blog. This process has also been used as the structure of a […]
Use Random Forest: Testing 179 Classifiers on 121 Datasets
If you don’t know what algorithm to use on your problem, try a few. Alternatively, you could just try Random Forest and maybe a Gaussian SVM. In a recent study, these two algorithms were shown to be the most effective when compared against nearly 200 other algorithms, averaged over more than 100 data sets. In […]
Better Naive Bayes: 12 Tips To Get The Most From The Naive Bayes Algorithm
Naive Bayes is a simple and powerful technique that you should be testing and using on your classification problems. It is simple to understand, gives good results, and is fast both to train and to make predictions. For these reasons alone you should take a closer look at the algorithm. In a recent blog post, you […]
Evaluate Yourself As a Data Scientist
What skills do you need to be a data scientist? I read an interesting data-driven approach to answering this question in the book Doing Data Science: Straight Talk from the Frontline. In this post I summarize this self-assessment approach that you can use to evaluate your strengths as a data scientist and where you might […]
Understand Your Problem and Get Better Results Using Exploratory Data Analysis
You often jump from problem to problem in applied machine learning and you need to get up to speed on a new dataset, fast. A classical and under-utilised approach that you can use to quickly build a relationship with a new data problem is Exploratory Data Analysis. In this post you will discover Exploratory Data Analysis (EDA), […]
Data Management Matters And Why You Need To Take It Seriously
We live in a world drowning in data. Internet tracking, stock market movement, genome sequencing technologies and their ilk all produce enormous amounts of data. Most of this data is someone else’s responsibility, generated by someone else, stored in someone else’s database, which is maintained and made available by… you guessed it… someone else. But. […]