Let me begin with a video showing how Machine Learning helps to improve our life.
The lift is called ThyssenKrupp Elevator, an example of Predictive Maintenance. For more information about it, please read an article about how the system works and the challenges of implementing it on different types of lift.
I first learnt about the term “Machine Learning” when I was taking the online Stanford AI course in 2011. The course basically taught us about the basics of Artificial Intelligence. So, I got the opportunity to learn about Game Theory, object recognition, robotic car, path planning, machine learning, etc.
Meetup in Microsoft
I was very excited to see the announcement from Azure Community Singapore saying that there would be a Big Data expert to talk about Azure Machine Learning in the community monthly meetup.
The speaker is Doli, Big Data engineer working in Malaysia iProperty Group. He gave us a good introduction to Azure Machine Learning, and then followed by Market Basket Analysis, Regression, and a recommendation system works on Azure Machine Learning.
I found the talk to be interesting, especially for those who want to know more about Big Data and Machine Learning but still new to them. I will try my best to share with you here what I have learned from Doli’s 2-hour presentation.
Ano… What is Machine Learning?
Could we make the computer to learn and behave more intelligently based on the data? For example, is it possible that from both the flight and weather data, we can know which scheduled flights are going to be delayed? Machine Learning makes it possible. Machine Learning takes historical data and make prediction about future trend.
This Sounds Similar to Data Mining
During the meetup, there was a question raised. What is the difference between Data Mining and Machine Learning?
Data Mining is normally carried out by a person to discover the pattern from a massive, complicated dataset. However, Machine Learning can be done without human guidance to predict based on previous patterns and data.
There is a very insightful discussion on Cross Validated that I recommend for those who want to understand more about Data Mining and Machine Learning.
Supervised vs. Unsupervised Learning
Two types of Machine Learning tasks are highlighted in Doli’s talk. Supervised and unsupervised learning.
In supervised learning, new data is classified based on the training data which are accompanied with labels to help the system to learn by example. The web app how-old.net which went viral recently is using supervised learning. There is an interesting discussion on Quora about how how-old.net works. In the discussion, the Microsoft Bing Senior Program Manager, Eason Wang, also shared his blog post about this how-old.net project that he works on.
Gmail is also using supervised learning to find out which emails are spam or need to be prioritized. In the slides of Introduction to Apache Mahout, it uses YouTube Recommendation an example of supervised learning. This is because the recommendation given by YouTube has taken videos explicitly liked, added to favourites, rated by the user.
Unlike supervised learning, unsupervised learning is trying to find structure in unlabeled data. Clustering, as one of the unsupervised learning techniques, is grouping data into small groups based on similarity such that data in the same group are as similar as possible and data in different groups are as different as possible. An example for unsupervised learning is called the k-means Clustering.
Clearly, the prediction of Machine Learning is not about perfect accuracy.
Azure Machine Learning: Experiment!
With Azure Machine Learning, we are now able to perform cloud-based predictive analysis.
Azure Machine Learning is a service that developer can use to build predictive analytic models with training datasets. Those models then can be deployed for consumption as web service in C#, Python, and R. Hence, the process can be summarized as follows.
- Data Collection: Understanding the problem and collecting data
- Train: Training the model
- Analyze: Validating and tuning the data
- Deploy: Exposing the model to be consumed
Collecting data is part of the Experiment stage in Machine Learning. In case some of you wonder where to get large datasets, Doli shared with us a link to a discussion on Quora about where to find those public accessible large datasets.
In fact, there are quite a number of sample datasets available in Azure Machine Learning Studio too. During the presentation, Doli also showed us how to use Reader to connect to a MS SQL server to get data.
To see the data of the dataset, we can click on the output port at the bottom of the box and then select “Visualize”.
After getting the data, we need to do pre-processing, i.e. cleaning up the data. For example, we need to remove rows which have missing data.
In addition, we will choose relevant columns from the dataset (aka features in machine learning) which will help in the prediction. Choosing columns requires a few rounds of experiments before finding a good set of features to use for a predictive model.
Train and Analyze
As mentioned earlier, Machine Learning learns from a dataset and apply it to new data. Hence, in order to evaluate an algorithm in Machine Learning, the data collected will be split into two sets, the Training Set for Machine Learning to train the algorithm and Testing Set for prediction.
Doli said that the more data we use to train the model the better. However, they are many people having different opinions. For example, there is one online discussion about the optimal ratio between the Training Set and Testing Set. Some said 3:2, some said 1:1, and some said 3:1. I don’t know much about Statistical Analysis, so I will just make it 1:1, as shown in the tutorial in Machine Learning Studio.
So, what “algorithm” are we talking about here? In Machine Learning Studio, there are many learning algorithms to choose from. I won’t go into details about which algo to choose here. =)
Finally, we just hit the “Run” button located at the command bar to train the model and make a prediction on the test dataset.
After the run is successfully completed, we can view the prediction results.
From here, we can improve the model by changing the features, properties of algorithm, or even algorithm itself.
When we are satisfied with the model, we can publish it as a web service so that we can directly use it for new data in the future. Alternatively, you can also download an Excel workbook from the Machine Learning Studio which has macro added to compute the predicted values.
Read More and Join Our Meetup!
If you would like to find out more about Azure Machine Learning, there is a detailed step-by-step guide available on Microsoft Azure documentation about how to create an experiment in Machine Learning Studio. There is also a free e-book from Microsoft about Azure Machine Learning. Please take a look!
Oh ya, in case you would like to know more about how-old.net which is using Machine Learning, please visit the homepage of Microsoft Project Oxford to find out more about the Face APIs, Speech APIs, Computer Vision APIs, and other cools APIs that you can use.
Please correct me if you spot any mistake in my post because I am still very, very new to Machine Learning. Please join our meetup too, if you would like to know more about Azure.