Machine Learning has been commoditized into a service.
This recent trend looks set to become mainstream, just as commoditized storage and virtualization did. It is the natural next step.
In this review you will learn about BigML, a service that provides commoditized machine learning for business analysts and for application integration.
BigML was co-founded by a group of five guys in 2011. Francisco Martin seems to be active in the community, commenting and pushing out content. At the time of writing, it is a privately held company that received $1.3M in a funding round in mid-2013. I can also see one patent credited to the company, titled “Method and apparatus for visualizing and interacting with decision trees”, a good sign that they are pushing the limits of this tech.
I’ve looked at the service a few times since it launched, and each time I have noticed changes to the interface and the service. It is under active development and is responding to feedback from users.
The service focuses on decision trees, which is a smart move. They are an effective machine learning method, and domain experts can readily understand how they reach their decisions.
Tag lines for the service include “highly scalable cloud-based machine learning service”, aimed at the enterprise and its appetite for big data and cloud computing, and “predictive analytics made easy, beautiful and understandable”, aimed at the analyst who writes reports and tries to understand business processes.
About the Service
The service can be used in production mode or development mode. Development mode is free but limits the size of the tasks you can run. Production mode is paid, and credits can be purchased ad hoc in blocks or through a subscription. This is a familiar pattern from other cloud-based services such as storage or compute servers.
BigML provides three main ways to use the service:
- Web Interface: A slick web user interface that is fast and responsive. It guides the analyst through uploading data, building a descriptive or predictive model, and then evaluating the model or making predictions as needed. It is clean and, once you buy into the pipeline approach, makes a lot of sense.
- Command Line Interface: A command line tool called bigmler, built on the mature Python API for the service. It offers more flexibility than the web interface, such as the choice of making predictions against a constructed model locally or remotely, and tasks such as cross-validation to approximate model accuracy. Check out the full bigmler docs.
- API: A RESTful API that can be used directly via curl commands or through a wrapper in your favorite programming language. At the time of writing, the Python API is the most mature, but wrappers are also provided in Ruby, PHP, Java, C#, NodeJS, Clojure, Bash (curl calls), and Objective-C.
The web interface is presented as a pipeline of steps. You can choose where you want to drop in and out of the pipeline depending on what you are looking for.
- Data Sources: These are the raw data for the problem under study. A source may be a CSV file you upload, a remote data file you specify by URL, or a data store in Amazon S3. You can describe attributes, give them names and generally manage the way the data source is parsed and presented.
- Datasets: These are views on a data source that you can use as the basis for building models. Datasets specify the target attribute (the class in classification or the output in regression). Data is summarized with bar graphs and five-number summaries. You can also split a dataset into training and test sets for a controlled evaluation of a model's performance later.
- Models: These are decision trees created from a dataset. A decision tree model is interactive. You can see the confidence and support in the training data reflected in the model at each node. You can work your way through a tree and see the rule antecedents build up, which is a clever and clean presentation of the model. Models can be downloaded in your favorite language, their rules can be reviewed and alternative visualizations are provided such as the sunburst view.
- Ensembles: These are models composed of sub-models. Ensembles are less useful for descriptive models and more useful for predictive ones, ideally providing increased accuracy by combining predictions from varied perspectives on the problem domain.
- Predictions: A model can be used to generate predictions in three ways: question by question through the branches of the constructed decision tree (like a decision support tool), one instance at a time using sliders to specify the input, or as batch predictions whose results can be downloaded to a file.
- Evaluation: Evaluation estimates a model's performance on a dataset. If you split a dataset into training and test sets, you can estimate how well the model will do on unseen data using measures such as classification accuracy, precision and recall. The performance is also summarized in graphs, and the performance of models (ensembles or otherwise) can be compared side by side.
- Tasks: This is a log of the tasks performed using the service and is interesting only from a service-auditing perspective. It probably should not be given such prominence in the user interface.
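The dataset and evaluation steps above can be made concrete with a minimal stdlib-only Python sketch of the kind of numbers they produce. It computes a five-number summary of one attribute, makes a seeded train/test split, and reports accuracy, precision and recall on the held-out rows. It is not BigML's implementation. The quartile convention, the 70/30 split, the toy `petal_length` rows and the hand-written one-rule classifier are all assumptions for illustration.

```python
import random
import statistics


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum.

    Quartiles use statistics.quantiles(method="inclusive"); other tools may
    use a different quartile convention and report slightly different values.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows with a fixed seed and hold out test_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def evaluate(actual, predicted, positive):
    """Accuracy, plus precision and recall for the class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "accuracy": sum(a == p for a, p in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical rows: (petal_length, species).
rows = [(1.4, "setosa"), (4.7, "versicolor"), (1.3, "setosa"), (4.5, "versicolor"),
        (1.5, "setosa"), (4.9, "versicolor"), (1.7, "setosa"), (4.0, "versicolor"),
        (1.6, "setosa"), (3.9, "versicolor")]
train, test = train_test_split(rows)
print("five-number summary:", five_number_summary([x for x, _ in train]))

# A hand-written one-rule classifier stands in for a model built on `train`.
predicted = ["setosa" if x < 2.5 else "versicolor" for x, _ in test]
print(evaluate([y for _, y in test], predicted, positive="setosa"))
```

The split is seeded so the same rows land in the test set every run. This is what makes a later side-by-side comparison of models on that test set fair.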
The flow seems to walk an invisible line between ease of use and configurability. There are configuration options that I can’t imagine an analyst or beginner ever wanting to touch (pruning methods, for example).
The distinction between prediction and evaluation may also confuse a beginner. I can imagine a much simpler interface: data, model, predictions. The robots in the cloud would take care of figuring out how robust the model is and reporting that to me (n-fold cross-validation over all the various things that can be tuned, plus automatic model selection).
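The “robots in the cloud” idea amounts to n-fold cross-validation combined with automatic model selection. Below is a minimal sketch of that generic technique in plain Python. It assumes a single numeric feature, two hypothetical candidate learners (a majority-class baseline and a one-split decision stump) and toy data. It says nothing about what BigML actually runs behind the scenes.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_majority(train):
    """Baseline learner: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_stump(train):
    """Depth-1 decision tree on one numeric feature, split at value midpoints."""
    values = sorted({x for x, _ in train})
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for t in thresholds:
        left = [y for x, y in train if x <= t]
        right = [y for x, y in train if x > t]
        lo = Counter(left).most_common(1)[0][0]
        hi = Counter(right).most_common(1)[0][0] if right else lo
        correct = left.count(lo) + right.count(hi)
        if best is None or correct > best[0]:
            best = (correct, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x <= t else hi


def cross_val_accuracy(rows, fit, k=5):
    """Mean held-out accuracy over k folds (rows are assumed to be pre-shuffled)."""
    scores = []
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train = [r for i, r in enumerate(rows) if i not in held_out]
        model = fit(train)
        test = [rows[i] for i in test_idx]
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return sum(scores) / len(scores)


def select_model(rows, learners, k=5):
    """Score every candidate learner by cross-validation and return the best."""
    scores = {name: cross_val_accuracy(rows, fit, k) for name, fit in learners.items()}
    return max(scores, key=scores.get), scores


# Toy rows (feature, label), interleaved so every fold holds both classes.
rows = [(1.0, "no"), (5.0, "yes"), (1.2, "no"), (5.2, "yes"), (1.4, "no"),
        (5.4, "yes"), (1.6, "no"), (5.6, "yes"), (1.8, "no"), (5.8, "yes")]
best, scores = select_model(rows, {"majority": fit_majority, "stump": fit_stump})
print(best, scores)
```

A hosted service could run the same loop over many more candidates (tree depths, pruning settings, ensemble sizes) and surface only the winner. That is the simplification the paragraph argues for.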
5 Clever Features in the Web Interface
The web interface is very responsive and clearly uses modern interface design techniques. While using the web interface, I noted 5 clever features you should know about.
- 1-Click: You can complete useful tasks in one click, like macros. It makes me think of Amazon’s one-click purchasing; clever marketing. For example, from a selected dataset you can create a model, an ensemble or a training/test split with a single click. This naming convention oozes ease of use.
- Interactive Trees: Decision trees are an old favorite because you can print them out and subject matter experts can easily understand them. You can unambiguously see how a decision is made in the context of the domain (unlike opaque neural nets and SVMs). Making the trees interactive is a natural next step. You can play with the visualization all day long, performing what-ifs and relating them back to the domain.
- Downloadable Trees: You can download the rules for a model, or the tree itself, in a programming language of your choice. Very clever. You can create a descriptive or predictive model in BigML, download the code and put it to work in your application, such as a website or a decision support tool. This is useful and I love it.
- Sunburst View: The sunburst view of a model provides an innovative (at least to me) way of thinking about and exploring the rules developed in a decision tree.
- Gallery: Any model (perhaps any object) can be made publicly available in the gallery. You can explore and use other people’s models, developed on their own data and on open data. Models can be commented on, and access to your objects in the gallery can also be sold, a fascinating idea. This is really clever and way ahead of its time, and I like it. Still, I fear that, like IBM’s Many Eyes, it will be underutilized. It might have more value if it were private and curated within an organization.
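The interactive and downloadable trees rest on two operations. The first walks the tree while collecting rule antecedents, reporting support and confidence at the node reached. The second turns the same tree into native code. Below is a minimal Python sketch of both. It assumes a hypothetical nested-dict tree with made-up `income` and `age` fields and per-class training counts at every node. BigML's own model format and generated code differ, and its confidence measure may be more conservative than the plain majority-class proportion used here.

```python
# Hypothetical tree: internal nodes split on a numeric field at a threshold;
# every node keeps the per-class counts of the training rows that reached it.
TREE = {
    "counts": {"no": 54, "yes": 46}, "field": "income", "threshold": 50000,
    "left": {
        "counts": {"no": 48, "yes": 22}, "field": "age", "threshold": 30,
        "left": {"counts": {"no": 36, "yes": 4}},
        "right": {"counts": {"no": 12, "yes": 18}},
    },
    "right": {"counts": {"no": 6, "yes": 24}},
}


def describe(node):
    """Prediction, confidence (share of the majority class) and support at a node."""
    support = sum(node["counts"].values())
    label, top = max(node["counts"].items(), key=lambda kv: kv[1])
    return label, top / support, support


def predict_with_rules(tree, row):
    """Walk from the root to a leaf, recording each test as a rule antecedent."""
    antecedents, node = [], tree
    while "field" in node:
        field, t = node["field"], node["threshold"]
        if row[field] <= t:
            antecedents.append(f"{field} <= {t}")
            node = node["left"]
        else:
            antecedents.append(f"{field} > {t}")
            node = node["right"]
    label, confidence, support = describe(node)
    return label, antecedents, confidence, support


def to_python(node, indent="    "):
    """Emit the tree as nested if/else statements (the 'download as code' idea)."""
    if "field" not in node:
        return f"{indent}return {describe(node)[0]!r}\n"
    return (f"{indent}if row[{node['field']!r}] <= {node['threshold']!r}:\n"
            + to_python(node["left"], indent + "    ")
            + f"{indent}else:\n"
            + to_python(node["right"], indent + "    "))


source = "def predict(row):\n" + to_python(TREE)
print(predict_with_rules(TREE, {"income": 40000, "age": 45}))
print(source)
```

Running `exec(source, namespace)` defines a standalone `predict(row)` function with no dependency on the tree structure. That is the spirit of dropping a downloaded model into an application.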
This is a service for moderately technical business analysts and software developers. The focus of the service is ease of use (to make models quickly) and model transparency (to make models understandable by domain experts).
Below are some use cases where I believe BigML could be most useful.
- Descriptive Model: A business analyst (or some hacker) has complex data that they would like to describe. A descriptive model can be constructed to explain the relationships between attributes and the predicted attribute and to play what-if scenarios. This could be done using the website.
- Predictive Model: A business analyst (or some hacker) has a complex problem and would like to make predictions from past examples. A predictive model can be constructed on the website, and predictions can be made in batch and downloaded as a CSV file for analysis and application.
- Periodic Predictions: Like the previous scenario, but predictions are needed periodically on an ongoing basis. The model could be maintained on the BigML platform (updated when needed) and called remotely through the bigmler command line interface to make predictions as needed.
- Integration: The service could be integrated into a script, internal website or desktop application for decision support. This would require the use of the API, and the model would be best maintained on the platform.
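In the batch prediction use cases, the output is essentially the input CSV with a prediction column appended. Below is a minimal sketch of doing that locally with Python's `csv` module. It assumes a hypothetical downloaded model: the two-rule `predict` function with made-up `income` and `age` fields. It does not call the BigML API or bigmler.

```python
import csv
import io


def predict(row):
    """Hypothetical downloaded model: a small decision tree on made-up fields."""
    if float(row["income"]) <= 50000:
        return "no" if float(row["age"]) <= 30 else "yes"
    return "yes"


def batch_predict(in_file, out_file, model=predict):
    """Read input rows as CSV, append a 'prediction' column, write CSV back out."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + ["prediction"])
    writer.writeheader()
    for row in reader:
        writer.writerow({**row, "prediction": model(row)})


if __name__ == "__main__":
    inputs = io.StringIO("income,age\n40000,25\n40000,45\n90000,50\n")
    outputs = io.StringIO()
    batch_predict(inputs, outputs)
    print(outputs.getvalue())
```

With real files, open both with `newline=""`, as the `csv` module documentation recommends. In-memory `StringIO` objects are used here only to keep the sketch self-contained.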
Machine Learning Practitioner
The machine learning practitioner may be left wanting. There is no focus on model selection or performance estimation. The web interface does not let you design sophisticated multi-run experiments or estimate classification accuracy using cross-validation.
Some configuration can be performed when creating a model. For example, the tree pruning method can be set to “smart pruning”, statistical pruning, or no statistical pruning. I can only imagine that smart pruning tries a bunch of methods and picks the best resulting tree. Ensembles are limited to bagging. I see a great opportunity to abstract this further: doing boosting, stacking, thresholding and other methods behind the scenes and presenting a robot-selected best “ensemble method” for use.
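Bagging is the one ensemble method mentioned here. It trains each sub-model on a bootstrap sample of the training data and combines their predictions by majority vote. Below is a minimal sketch of that generic technique. It assumes decision stumps on a single numeric feature as the sub-models, along with toy data. BigML's own ensembles are built from its decision trees and are not reproduced here.

```python
import random
from collections import Counter


def fit_stump(train):
    """Depth-1 decision tree on one numeric feature, split at value midpoints."""
    values = sorted({x for x, _ in train})
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for t in thresholds:
        left = [y for x, y in train if x <= t]
        right = [y for x, y in train if x > t]
        lo = Counter(left).most_common(1)[0][0]
        hi = Counter(right).most_common(1)[0][0] if right else lo
        correct = left.count(lo) + right.count(hi)
        if best is None or correct > best[0]:
            best = (correct, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x <= t else hi


def fit_bagging(train, n_models=25, seed=0):
    """Bagging: fit each sub-model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = [fit_stump(rng.choices(train, k=len(train))) for _ in range(n_models)]

    def predict(x):
        # Majority vote across the sub-models.
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy rows (feature, label).
rows = [(1.0, "no"), (1.2, "no"), (1.4, "no"), (1.6, "no"), (1.8, "no"),
        (5.0, "yes"), (5.2, "yes"), (5.4, "yes"), (5.6, "yes"), (5.8, "yes")]
ensemble = fit_bagging(rows)
print(ensemble(1.1), ensemble(5.5))
```

Each bootstrap sample sees a slightly different view of the data, so the sub-models disagree at the margins. The vote smooths those differences out. Boosting and stacking would replace the resampling and the vote with different combination schemes.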
I can imagine that if I were a hard-core decision tree guy, I’d be frustrated by the lack of control over, and insight into, the choices made.
The benefit of a hosted solution is that it removes the technical details and lets large machines compute variations on the desired model and select the best one for you. It is not clear how much of this is going on behind the scenes in BigML. There may be an opportunity to do a lot more of it by hiding configuration details that will no longer be required: let the robots in the cloud handle model tuning and selection.
These are intentional design choices, and they make sense. The service is responsible for offering a “good enough” model for the problem and for allowing you to download it or query a hosted version of it on demand.
There is an opportunity here for machine learning practitioners, and that is to get something working very quickly, for example:
- Demonstration: Create a model from client data and show them in a video or presentation how decision trees work and how they have worked on their data, allowing them to gain insight into the model given their domain experience. This is much easier with BigML’s interactive decision trees rather than using static trees generated by R or scikit-learn.
- Fast Model: You need a model really, really fast. BigML will make a tree or ensemble that you could use to generate predictions. This would be a fast and generic process that could be sped up with a script using the Python API or the command line interface. Faster than R, Weka, or scikit-learn? Maybe, if it were a large dataset hosted on S3 or somewhere similar (bigger data, or even Big Data).
- In-line Model: If you are hacking on some DSL or prototype app in Java, Python or Ruby and need a temporary or proof-of-concept model for demonstration purposes only, you can create a BigML model and download it as native code. A very handy feature indeed.
- Temporary Integration: Similar to the above, but you could be working with streaming data where the model needs periodic updates. You could integrate with the BigML API and make predictions in your prototype application until you work out your own solution.
BigML is no doubt counting on its solution being good enough that you don’t need to switch it out, and it may very well be for most general-case problems.
BigML is very cool. The interface is slick, and there are some very clever ideas, like the gallery and one-click everything. It would be worth trying out on your next small or side project, just to see what it’s like to focus on the stages before and after modeling more than on the modeling stage itself.
There may be something lacking in the service though. I can’t put my finger on it. It might just need to be dialed in just that little more, either deciding it is for beginners and doing more lifting in the cloud or deciding it is for decision tree hackers and offering a cockpit of knobs and dials.
Machine learning as a service is the future: a managed black box at the complex, data-driven heart of your application. It makes sense for many internal business applications and even for exploratory side projects. Maybe these projects are just not discrete enough. You don’t see jobs on oDesk or Elance asking for a descriptive or predictive model of some dataset. This may be because the data is confidential, but it may also be because the problem cannot be decoupled from the systems, people and visualizations that are required.
Whatever the reason these platforms are not yet mainstream, I hope these guys can hang on until they are.
In this section I want to give you some pointers to where you can learn more about BigML.
BigML provides great documentation as well as examples on their blog. Check out:
- BigML Features for an overview of what BigML can do, with a focus on the web interface
- BigML API Documentation called BigML.io
- BigML developers FAQ for questions and answers on the service and on machine learning in general
BigML has a healthy YouTube channel that includes marketing videos as well as helpful tutorials. One great example shows BigML being used to model and make predictions for the StumbleUpon Evergreen Classification Challenge on Kaggle.
More on Decision Trees
If you are interested in diving deeper into decision trees, below are some classic texts you might like to take a look at: