There has been a lot of talk about self-taught machines. With this blog we want to dispel the myth that machines are able to teach themselves.
Machines cannot teach themselves; a machine can only learn from the information provided by humans, for example engineers and programmers, using algorithms (also developed by humans). Machine ‘learning’ refers to the application of statistical models to find a solution to a particular problem, where the solution takes the form of a set of numbers called weights. Today, there are two predominant approaches to machine learning: supervised and unsupervised.
Supervised machine learning uses input variables together with an output variable to create mappings and correlations between the inputs and the output with the help of an algorithm.
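To make the idea of learned weights concrete, here is a minimal sketch of supervised learning in plain Python. It fits a straight line to a handful of input/output pairs using ordinary least squares. The data (monthly visits and monthly spend) and the names `fit_line`, `visits` and `spend` are made up for illustration; real projects would normally use a library and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How strongly the input and the output move together.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much the input varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x          # learned weight (slope)
    b = mean_y - w * mean_x     # learned weight (intercept)
    return w, b


# Hypothetical training data: each input comes with a known output.
visits = [1, 2, 3, 4, 5]                   # input variable
spend = [12.0, 19.0, 31.0, 42.0, 48.0]     # output variable

w, b = fit_line(visits, spend)

# Use the learned weights to predict the output for an unseen input.
predicted_spend = w * 6 + b
print(f"w={w:.2f}, b={b:.2f}, prediction for 6 visits={predicted_spend:.2f}")
```

The two numbers `w` and `b` are the ‘solution’ the machine has learned. Note that everything it knows comes from the examples and the algorithm we supplied.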
Unsupervised machine learning refers to a class of machine learning techniques where a machine is shown only inputs and has to come up with an answer (the outputs) on its own. The closest we are able to get to machines teaching themselves is through unsupervised machine learning.
Clustering is one example of unsupervised learning. Let’s say we want to group similar customers, but we don’t know what the similarities are. A clustering algorithm will analyse all the information we have about the customers and group together those that resemble each other; we can then inspect the groups and pick out the similarities most relevant to our purposes. A similarity could be in spending habits, products purchased, age group, location, etc.
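The customer grouping described above can be sketched with k-means, one common clustering algorithm (the text does not prescribe a particular one, so this choice is ours). The customer data, the `kmeans` function and the fixed starting centroids are assumptions made so the example stays small and reproducible; production implementations usually pick starting points randomly and scale the attributes first.

```python
import math


def kmeans(points, k, iterations=20):
    """A small k-means: returns the final centroids and a cluster label per point."""
    # Deterministic start (an assumption for reproducibility): take evenly
    # spaced points from the list as the initial centroids.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(values) / len(cluster) for values in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid

        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids

    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


# Hypothetical customers described only by inputs: (age, monthly spend).
customers = [(22, 15), (25, 18), (24, 12), (51, 80), (55, 95), (49, 88)]

centroids, labels = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> group", label)
```

The algorithm only reports which customers belong together. Deciding what each group means (for example ‘younger, low spend’) is still left to us.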
Another popular unsupervised learning technique is Principal Component Analysis (PCA). In simple terms, PCA can be used to find the predominant attributes within large data sets. It combines the original attributes into a smaller set of new ones (the principal components) that preserve as much of the variation in the data as possible, and discards the directions with less variation. For example, a product might have hundreds of attributes like shape, colour, size, weight, power, price, etc. PCA can help us find out which of these attributes contribute most to the differences between products.
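As a rough sketch of how PCA works, the plain-Python snippet below finds only the first principal component of a tiny data set by running power iteration on the covariance matrix. The product data and the function name `first_principal_component` are invented for illustration, and a real analysis would typically use a numerical library and compute several components.

```python
def first_principal_component(rows, iterations=200):
    """Return the first principal component and its share of the total variance."""
    n = len(rows)
    dims = len(rows[0])

    # Centre each attribute on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]

    # Sample covariance matrix of the attributes.
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dims)]
        for a in range(dims)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # normalise, which converges to the direction of greatest variance.
    vec = [1.0] * dims
    for _ in range(iterations):
        vec = [sum(cov[a][b] * vec[b] for b in range(dims)) for a in range(dims)]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]

    # Variance captured along that direction, as a share of the total.
    captured = sum(
        vec[a] * sum(cov[a][b] * vec[b] for b in range(dims)) for a in range(dims)
    )
    total = sum(cov[j][j] for j in range(dims))
    return vec, captured / total


# Hypothetical products: (size, weight, power). Size and weight vary a lot
# and move together; power barely changes from product to product.
products = [
    (1.0, 2.1, 5.0),
    (2.0, 3.9, 5.1),
    (3.0, 6.2, 4.9),
    (4.0, 7.8, 5.0),
    (5.0, 10.1, 5.1),
]

component, explained = first_principal_component(products)
print("component weights:", [round(v, 3) for v in component])
print("share of variance explained:", round(explained, 3))
```

In this made-up data the component puts most of its weight on size and weight and almost none on power, which is how PCA signals that power adds little information here.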
Drawbacks of unsupervised machine learning
A common problem with unsupervised learning is that its results are hard to interpret and to verify. Interpretation is difficult because the algorithm transforms the original input into a new representation (for example, in the product example above the output could be two new features, f1 and f2, standing in for the product’s predominant attributes). This is why, in most cases, the output of the unsupervised analysis is fed to another machine learning algorithm that produces the final result: an actual answer to our question, such as whether the product should go into mass production or not. Similarly, the results of unsupervised learning cannot be readily verified, because there are no known correct outputs to compare them against.
If you would like to find out more about supervised and unsupervised machine learning, then give us a call on +44 (0)203 475 7980 or email us at Salesforce@coforge.com