List the differences between supervised and unsupervised learning


Let us start with an example. Say you have data for all Netflix customers that includes the customer ID, the number of shows they have watched so far, the time they have spent on the platform, the number of sub-users on their account, the main genre of the shows they watch, etc.

Now let us say we have the two tasks below:

  1. You just want to identify which customers are similar to each other based on their behavior. In this case you are not telling the algorithm what to find; you are not asking it to produce a specific answer by saying “use these inputs and predict something.” In other words, you are leaving the algorithm/model “unsupervised”. All you want are patterns, and you let the algorithm find them. Clustering, association rule mining, topic modelling on text data, etc. are examples of classes of unsupervised models - you are just mining for patterns in these cases.
  2. In the second case, say you want to estimate the probability of customer churn. Here you build the algorithm or model so that it gives you a specific answer. In other words, you are “supervising” the algorithm by saying: here is a set of input variables (number of shows, time spent, etc.); give me an output variable (probability of churn). The output can be categorical (classification) or continuous (regression). Linear regression, logistic regression, random forests, decision trees, etc. are examples of supervised models because, given a set of input variables, they try to predict a particular output variable.
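The two tasks above can be sketched in a few lines of plain Python. The viewing hours and churn labels below are made-up illustrative values (not real Netflix data), and the one-dimensional k-means and logistic-regression loops are minimal from-scratch sketches, not production code:

```python
import math

# Hypothetical feature: hours watched per week for six customers.
hours = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]

# --- Unsupervised: 1-D k-means with k=2, no labels supplied ---
centers = [hours[0], hours[-1]]            # initialise with the two extremes
for _ in range(10):                        # a few refinement passes
    groups = [[], []]
    for h in hours:
        # assign each customer to the nearest cluster center
        groups[0 if abs(h - centers[0]) <= abs(h - centers[1]) else 1].append(h)
    centers = [sum(g) / len(g) for g in groups if g]

clusters = [0 if abs(h - centers[0]) <= abs(h - centers[1]) else 1 for h in hours]
print(clusters)  # light vs. heavy watchers emerge without any labels

# --- Supervised: predict churn from the same feature, labels supplied ---
churned = [1, 1, 1, 0, 0, 0]               # hypothetical ground-truth labels
# one-feature logistic regression fitted by plain batch gradient descent
w, b = 0.0, 0.0
for _ in range(2000):
    gw = gb = 0.0
    for h, y in zip(hours, churned):
        p = 1 / (1 + math.exp(-(w * h + b)))   # predicted churn probability
        gw += (p - y) * h                      # gradient w.r.t. weight
        gb += (p - y)                          # gradient w.r.t. bias
    w -= 0.1 * gw / len(hours)
    b -= 0.1 * gb / len(hours)

p_churn = 1 / (1 + math.exp(-(w * 1.2 + b)))   # probability for a light watcher
print(round(p_churn, 2))
```

The clustering half never sees the labels; the churn half is driven entirely by the `churned` labels supplied to it - that is the whole difference between the two settings.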

Unsupervised learning uses the entire dataset, as is, during training. In contrast, in self-supervised learning you withhold part of the data in some form, and you try to predict the rest.

I) In unsupervised learning, you try to find some ‘structure’ (clusters, densities, a latent representation) in the entire dataset, using the data in its original form.

II) In self-supervised learning, you try to learn the ‘dynamics’ of the data at its raw level. A popular self-supervised task, image colorization, feeds the model only the gray-scale version (part of the data is withheld) and asks it to predict the colors (the rest is predicted). The same idea applies to inpainting or to jigsaw puzzles built from image patches.
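As a minimal sketch of the colorization idea: the training “labels” are manufactured from the data itself by withholding the color channels. The four-pixel “image” and the nearest-neighbour “model” below are hypothetical toy illustrations of how such pretext pairs are built:

```python
# Hypothetical 4-pixel "image": each pixel is an (r, g, b) tuple.
pixels = [(200, 30, 30), (30, 200, 30), (30, 30, 200), (180, 180, 30)]

def to_gray(rgb):
    # Convert RGB to a single intensity using the standard luma weights.
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

# Build the pretext task: input = grayscale (color withheld),
# target = the original RGB triple (the part to be predicted).
pairs = [(to_gray(p), p) for p in pixels]

def predict(gray_value):
    # Trivial stand-in "model": nearest-neighbour lookup over training pairs.
    return min(pairs, key=lambda pair: abs(pair[0] - gray_value))[1]

for gray, rgb in pairs:
    print(gray, "->", rgb)
```

No human labeling was needed at any point: the supervision signal (the colors) came from the data itself, which is exactly what makes the task “self”-supervised.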