Challenges in Implementing Unsupervised Learning

Beyond the usual work of choosing suitable algorithms and hardware, unsupervised learning presents a unique challenge: it’s difficult to tell whether the model is actually doing its job.

In supervised learning, we define metrics that drive decisions about model tuning. Measures like precision and recall compare the model’s predictions with known labels, and you tune the model’s parameters to raise those scores. Low scores mean the model needs more work.
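To make the role of labels concrete, here is a minimal sketch of how precision and recall are computed for a binary classifier. The function name `precision_recall` and the label lists are made up for illustration; the point is only that every term in the calculation needs the true labels.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)  # true positives
    fp = sum(1 for t, p in pairs if p == positive and t != positive)  # false positives
    fn = sum(1 for t, p in pairs if p != positive and t == positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels and predictions, invented for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Without `y_true`, none of the three counts can be computed, which is exactly the gap unsupervised learning leaves.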

Since there are no labels in unsupervised learning, it’s nearly impossible to get a reasonably objective measure of how accurate your algorithm is. In clustering, for example, how can you know whether K-Means found the right clusters? Are you using the right number of clusters in the first place? In supervised learning we can look to an accuracy score; here you need to get a bit more creative.
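One common workaround, offered here as an illustration rather than something this article prescribes, is an internal metric such as the silhouette coefficient. It scores a clustering using only the data itself: how close each point is to its own cluster compared with the nearest other cluster. The sketch below is a small pure-Python version under several assumptions: the toy points are invented, the K-Means uses a deterministic farthest-point start instead of random initialization, and the helper names (`assign`, `kmeans`, `silhouette`) are hypothetical. In practice a library such as scikit-learn provides `KMeans` and `silhouette_score`.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assign(points, centroids)


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for name, other in clusters.items() if name != label)
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data invented for this example: three tight groups of five points.
centers = [(0, 0), (10, 0), (5, 9)]
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0, 0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

# Score several candidate cluster counts and keep the best one.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k:", best_k)
```

A high silhouette only says the clusters are compact and well separated. It does not say they are meaningful for your problem, so it narrows the choice of cluster count rather than settling it.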

A big part of the “will unsupervised learning work for me?” question depends on your business context. Take customer segmentation as an example: clustering will only work well if your customers actually do fit into natural groups. One of the best (but riskiest) ways to test your unsupervised learning model is to deploy it in the real world and see what happens! Designing an A/B test, with and without the clusters your algorithm produced, can be an effective way to see whether those clusters carry useful information or are simply wrong.
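As a rough sketch of how such an A/B test might be read, the example below compares conversion rates between a control group (no clusters used) and a treatment group (for instance, messaging tailored per cluster) with a two-proportion z-test. The choice of test, the function name `two_proportion_z_test`, and all of the counts are assumptions made for illustration; a real experiment would need its own design and sample-size planning.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: A ignores the clusters, B uses them.
control_conversions, control_users = 200, 5000
treatment_conversions, treatment_users = 260, 5000

z, p_value = two_proportion_z_test(
    control_conversions, control_users, treatment_conversions, treatment_users
)
print(f"z={z:.2f}, p-value={p_value:.4f}")
```

A small p-value with a positive `z` would suggest the cluster-based treatment performed better than the control on this metric. A result near zero would suggest the clusters are not adding much, at least for the way they were used.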