How do machines learn? Let’s explore supervised, unsupervised, and reinforcement learning approaches.

Francesco Conti : 12 November 2025 22:03

Artificial intelligence isn’t about magic, it’s about learning! This article aims to demystify the aura of mystery surrounding artificial intelligence (AI) by providing a comprehensive answer to the question, “How do machines learn?” Indeed, the “magic” behind AI lies in its learning phase. Artificial intelligence applications use vast amounts of data, from which patterns are identified to make data-driven decisions.

There are several approaches to learning, including supervised, unsupervised, and reinforcement learning. These methods differ in their objectives and the problems they solve, as well as in the type of data they rely on: labeled examples, unlabeled examples, or direct interaction with an environment, respectively.

In this article, we’ll explore these three methods and try to understand how they work! We’ll also provide an overview of modern learning mechanisms, such as active learning and reinforcement learning from human feedback!

Supervised Learning

Supervised learning is one of the most popular approaches in machine learning. Methods based on this approach rely on a training phase using data, in which each example is associated with a corresponding response, or label. The primary goal of a machine learning (ML) model in this context is to learn the relationship between data features and labels in order to make accurate predictions on new inputs. The main tasks that can be solved through supervised learning are:

  • Classification: The goal is to assign objects or instances to predefined categories or classes. For example, classification might involve labeling emails as “spam” or “not spam” or identifying images as “dog” or “cat.”
  • Regression: The prediction of a continuous numerical value based on input characteristics. For example, regression can be used to predict the price of a house based on its characteristics, such as the number of rooms, size, and location (a minimal code sketch follows this list).
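
For a concrete feel of the regression task, here is a minimal sketch using scikit-learn on invented data; the room counts, sizes, and prices below are purely illustrative and not taken from any real dataset:

```python
# A minimal regression sketch on invented data: features are
# [number_of_rooms, size_m2], the target is a hypothetical price.
from sklearn.linear_model import LinearRegression

X_train = [[2, 50], [3, 70], [4, 95], [5, 120]]   # rooms, size in m^2
y_train = [150_000, 210_000, 280_000, 350_000]    # invented prices

model = LinearRegression()
model.fit(X_train, y_train)

# Predict the price of a new 3-room, 80 m^2 house
print(model.predict([[3, 80]]))
```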

To gain a clearer understanding, let’s consider a classification example in which we want to train a model to predict whether a customer in an online store belongs to a “high-end” or “low-end” segment, in order to target luxury product advertising. To this end, we collect data on customers’ income and average monthly spending. Each training example is assigned a label that indicates whether the customer has responded to high-end ads in the past, associating a value of 1 with customers who responded and 0 with those who did not. In this example:

  • Features: annual income and average spending history;
  • Labels: Whether or not the customer has responded to luxury product advertisements in the past;
  • Objective: to learn, via supervised learning, a rule or function that correctly classifies customers based on the labeled data in the training dataset.

Once the rule has been learned, the model is used in a phase called inference to classify new customers and decide whether to advertise luxury products to them. In this phase, the model receives only the features of a new customer and outputs a predicted class, without any further label being provided.
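
As a rough illustration of this training-and-inference workflow, here is a minimal sketch of the customer example, assuming scikit-learn is available; all income and spending figures are invented:

```python
# A minimal sketch of the customer example: features are
# [annual_income, average_monthly_spending]; labels are
# 1 = responded to luxury ads, 0 = did not respond (all numbers invented).
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X_train = [
    [30_000, 200],     # low income, low spending    -> label 0
    [35_000, 250],     #                             -> label 0
    [90_000, 1_500],   # high income, high spending  -> label 1
    [110_000, 2_000],  #                             -> label 1
]
y_train = [0, 0, 1, 1]

# Scale the features, then fit a simple linear classifier
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

# Inference: the model sees only the features of a new customer
new_customer = [[80_000, 1_200]]
print(clf.predict(new_customer))  # expected: [1] -> show luxury ads
```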

Beyond this simple example, supervised learning is currently used successfully for problems such as:

  • Object detection: The technology that identifies, recognizes, and locates objects, animals, vehicles, or people in images. Training these models typically requires a large dataset of labeled examples, in which each image is associated with the position and class of all the objects present.
  • Sentiment analysis: This involves determining the sentiment or emotion expressed in text, such as social media posts, product reviews, or customer feedback. Supervised learning can be used to train models that classify text into different categories, such as positive, negative, or neutral. Models are trained on labeled datasets in which text samples are annotated with their corresponding sentiment labels.

Unsupervised Learning

In unsupervised learning, we have no labels or correct answers associated with the training data. The primary goal of this approach is to discover hidden patterns or structures in the data without any external guidance. The main tasks associated with unsupervised learning are:

  • Clustering: This involves grouping data points into sets based on their intrinsic similarities. For example, it can be used to group customers based on their product preferences or similar purchasing characteristics.
  • Dimensionality reduction: This involves reducing the number of variables under consideration while preserving the most relevant information. This can be useful, for example, when working with many interrelated features, to make interpretation and visualization easier.
  • Anomaly detection: This involves identifying unusual or anomalous patterns or instances in a dataset. Such approaches are used, for instance, to flag transactions that may indicate fraudulent activity, thus providing an automated fraud detection system.

Returning to the store example, in this case we might have collected information about income and spending history, but without recording whether customers responded to previous advertisements for luxury products. We then have only the features and not the labels, but we might still be interested in profiling customers to assess whether there are groups that could prove more responsive. Applying a clustering algorithm to this data groups the users into two clusters. Unsupervised learning, therefore, can still be used to extract insights from data and establish a rule for taking actions, such as targeted advertising, which in this case will be directed toward the cluster with the highest income and average spending.
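
A minimal sketch of this idea, assuming scikit-learn’s KMeans and the same invented customer data, could look like the following; the algorithm receives only the features and still separates the two profiles:

```python
# A minimal clustering sketch on the (invented) store data, this time
# without any labels: KMeans groups customers by similarity alone.
from sklearn.cluster import KMeans

X = [
    [30_000, 200], [35_000, 250], [28_000, 180],         # lower income/spending
    [90_000, 1_500], [110_000, 2_000], [95_000, 1_700],  # higher income/spending
]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster index (0 or 1) for each customer
print(kmeans.cluster_centers_)  # average income/spending of each cluster
```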

Among the most significant applications of unsupervised learning:

  • Recommendation systems: E-commerce platforms use unsupervised learning to offer personalized recommendations to customers by analyzing historical purchase data and browsing behavior to suggest related or interesting products.
  • Image compression: Unsupervised learning can be used to compress images by reducing their size without significant loss of visual quality.

Reinforcement Learning

Reinforcement Learning (RL) is a branch of artificial intelligence in which agents learn to make decisions through direct interaction with an environment. Unlike the previous approaches, RL is based on a trial-and-error learning process. Agents explore the environment and receive positive or negative rewards based on their actions. The agent’s goal is to learn an optimal strategy that maximizes the cumulative reward obtained over the long term. Through continuous iterations, the agent updates its action policy to make more intelligent decisions in the specific context.
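
To illustrate this trial-and-error loop, here is a toy tabular Q-learning sketch on an invented environment (five states in a row, with a reward only at the rightmost state); real RL applications use far richer environments and algorithms:

```python
# A toy tabular Q-learning sketch (environment invented for illustration):
# 5 states in a row, actions 0 = left / 1 = right, reward +1 only in state 4.
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

for episode in range(500):
    s = 0
    while s != 4:
        # Epsilon-greedy choice: explore sometimes, otherwise act greedily
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: nudge Q(s, a) toward reward + discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print(Q)  # the "right" action should end up with the higher value in each state
```

After enough episodes, the learned values should favor moving right in every state, which is the optimal policy for this toy world.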

RL finds application in a wide range of tasks, including controlling autonomous robots, resource management, strategic games, and action planning. For example, in robot control, the agent learns to perform actions that maximize the achievement of a specific goal, such as walking or manipulating objects. In strategic games, RL can be used to train agents capable of making tactical and strategic decisions to win complex games such as chess or video games.

It doesn’t end here!

The approaches described are the fundamental ingredients of machine learning and are necessary for understanding the functional building blocks of entire AI systems. However, some learning techniques benefit from hybrid or multi-stage training.

One example is unsupervised pretraining for computer vision tasks. In particular, when available labeled data is limited, pretraining with an unsupervised task allows a model to learn meaningful image representations from unlabeled data.

These learned features can be transferred to specific tasks, improving performance and reducing the need for labeled data. This type of learning is called transfer learning: a model is pre-trained on a task or domain and is then used as a starting point for tackling a new task.
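
As an illustrative sketch of transfer learning, assuming PyTorch and a recent version of torchvision are available, one common pattern is to freeze a backbone pre-trained on ImageNet and train only a new classification head (the two-class task below is hypothetical):

```python
# A sketch of transfer learning with a pre-trained backbone (assumes
# torch and torchvision are installed; the 2-class task is hypothetical).
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pre-trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for our 2 classes;
# only this layer's weights will be updated during fine-tuning.
model.fc = nn.Linear(model.fc.in_features, 2)

# From here, one would train model.fc on the (small) labeled dataset as usual.
```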

Approaches of this type are used to address the lack (or excessive cost) of properly labeled data. Below, we’ll discuss other methods that achieve the same goal!

Modern approaches

Other learning approaches have become very popular in recent applications:

  • Semi-supervised: Machine learning methods that combine labeled and unlabeled data to improve a model’s generalization ability. For example, imagine we need to build a model to classify emails as “spam” or “not spam.” We may only have a few emails labeled as spam or not spam, while many other emails have not yet been labeled. Using semi-supervised learning, we can use the labeled emails to teach the model spam recognition criteria, but we can also leverage the unlabeled emails to learn additional patterns that could help improve classification (a minimal code sketch follows this list).
  • Self-supervised: A category of methods in which a model learns from unlabeled data without the need for explicit labels. The model creates a sort of “artificial supervision” from unlabeled data by generating implicit learning labels. A typical example is in text processing, where a model is trained to predict missing words through masks applied to sentences. Specifically, some words are masked, and the model must predict which words are missing. Self-supervised learning is often a key learning step in NLP models.
  • Active learning: This strategy aims to reduce the cost of data annotation and improve learning efficiency by selecting which data requires additional labels from a human supervisor. Unlike supervised learning, in which labels are provided for the entire training set upfront, in active learning the model starts with a small set of labeled data. Then, instead of requiring the entire set to be labeled, active learning intelligently selects which additional data instances need to be labeled to improve model performance. You can learn more in this blog!
  • Reinforcement learning from human feedback: This is a hybrid approach that uses both RL and human feedback to improve model performance. In RL, a model learns through trial and error, receiving rewards or punishments based on its actions in the environment. However, this process can be time-consuming and costly in complex situations. To make learning more efficient and effective, human feedback is introduced into the process. People provide demonstrations, explicit feedback, or preferences about the agent’s actions, helping the model learn more quickly and achieve specific desired outcomes. In the context of Large Language Models (LLMs), RL from human feedback is used to improve text generation. People can correct the generated text, indicate preferences between different text options, or provide examples of correct text. These procedures help the LLM produce high-quality text, avoid errors, and generate consistent and accurate results. You can learn more in this blog!
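
To make the semi-supervised idea concrete (as mentioned in the first item of this list), here is a minimal sketch using scikit-learn’s SelfTrainingClassifier on a handful of invented emails; unlabeled examples are marked with -1 and are pseudo-labeled by the model during training:

```python
# A minimal semi-supervised sketch for the spam example (all emails invented):
# unlabeled examples get the placeholder label -1 and are pseudo-labeled
# by the model itself while it trains.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.semi_supervised import SelfTrainingClassifier

emails = [
    "win a free prize now",          # labeled: spam
    "meeting agenda for monday",     # labeled: not spam
    "claim your free reward today",  # unlabeled
    "lunch with the team tomorrow",  # unlabeled
]
labels = [1, 0, -1, -1]  # 1 = spam, 0 = not spam, -1 = unlabeled

model = make_pipeline(
    CountVectorizer(),
    SelfTrainingClassifier(MultinomialNB(), threshold=0.6),
)
model.fit(emails, labels)

print(model.predict(["free prize waiting for you"]))
```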

Conclusions

In this article, we explored how machines learn; the techniques illustrated represent an important framework for formalizing AI problems. Even complex systems such as image recognition and language models rely on these functional building blocks. Future articles will explore how machine learning and deep learning extract insights from data to solve various tasks.
