AI Educademy
MIT License. Open source.

AI Sprouts • Intermediate • ⏱️ 25 min read

Decision Trees: The Algorithm You Can Draw on Paper 🌳

Many machine learning algorithms behave like black boxes: you feed in data, something mathematical happens inside, and a prediction comes out. Decision trees are different. They are one of the few algorithms you can fully explain to a non-technical colleague, draw on a whiteboard, and still trust to make reasonably accurate predictions.


🎮 The 20 Questions Analogy

You've probably played 20 Questions: one person thinks of something, and others ask yes/no questions to narrow it down. "Is it alive? Is it bigger than a car? Does it live in water?" Each answer eliminates a huge swath of possibilities until the answer becomes obvious.

A decision tree works exactly like this. Given a new data point to classify, the tree asks a series of questions about its features, following the branches that match each answer, until it reaches a leaf — a final prediction.

[Figure: a decision tree for classifying animals. It first splits on 'has wings?', then on 'lives in water?', leading to leaf nodes with animal names.]
A decision tree asks a series of questions about features, narrowing down to a prediction at each leaf node.

🌿 Anatomy of a Tree

Before we get into how trees learn, let's name the parts:

  • Root node: the very top question; the split that best separates the full training set
  • Internal nodes: questions at each branch point
  • Branches: the paths taken based on yes/no (or value-range) answers
  • Leaf nodes: the endpoints; each holds a final prediction

A single data point travels from root to leaf, answering one question at each node, until it reaches a prediction.
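The root-to-leaf walk can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the `Node` and `Leaf` classes, the feature names, and the animal labels at the leaves are all hypothetical, loosely modelled on the animal figure above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: str  # final answer stored at the endpoint


@dataclass
class Node:
    feature: str  # the yes/no question asked at this node
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def predict(tree, sample):
    """Walk from the root to a leaf, answering one question per node."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if sample[node.feature] else node.no
    return node.prediction


# Hypothetical tree: the leaf labels are illustrative only.
animal_tree = Node(
    feature="has_wings",
    yes=Leaf("bird"),
    no=Node(feature="lives_in_water", yes=Leaf("fish"), no=Leaf("mammal")),
)

print(predict(animal_tree, {"has_wings": False, "lives_in_water": True}))  # fish
```

Each pass through the loop answers one question, so the number of steps equals the depth of the leaf that is reached.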


📐 How a Decision Tree Learns

The clever part: how does the algorithm decide which question to ask at each node? It tries every candidate split on every feature and greedily picks the one that best separates the data reaching that node. It then repeats the process on each child node.

Information Gain and Gini Impurity

Two common measures of "best separation":

Gini impurity measures how mixed a group is. A perfectly pure node, where all examples belong to one class, has a Gini impurity of 0. A node split evenly across its classes has the maximum impurity (0.5 when there are two classes). The algorithm prefers splits that produce the purest child nodes.
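For reference, Gini impurity is usually written as G = 1 − Σ pₖ², where pₖ is the fraction of examples in the node that belong to class k. A minimal sketch, with a hypothetical helper name `gini_impurity`:

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini_impurity(["cat", "cat", "cat", "cat"]))  # 0.0 (pure node)
print(gini_impurity(["cat", "cat", "dog", "dog"]))  # 0.5 (evenly mixed, two classes)
```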

Information gain is similar: it measures how much a split reduces uncertainty (entropy) about the class label. Higher information gain means a better split.
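As a rough sketch, information gain can be computed as the entropy of the parent node minus the size-weighted entropy of its children. The function names and the toy labels below are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))  # 1.0 (perfect split)
print(information_gain(parent, ["yes", "no"], ["yes", "no"]))  # 0.0 (useless split)
```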

Both measures ask the same underlying question: after splitting on this feature, how much more certain am I about the class?
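The "try every split, keep the best" search described in this section can be sketched as a brute-force loop. This is an illustrative version that assumes numeric features, uses midpoints between consecutive values as candidate thresholds, and scores each split by weighted Gini impurity. The function names and the toy data are hypothetical, and real implementations are considerably more optimised.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(rows, labels):
    """Try every threshold on every feature.

    Returns (feature_index, threshold, weighted_gini); the feature index and
    threshold are None if no split improves on the unsplit node.
    """
    best = (None, None, gini(labels))  # baseline: impurity without splitting
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
            if score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical toy data: [height_cm, weight_kg] -> label
rows = [[20, 4], [25, 5], [60, 30], [70, 35]]
labels = ["cat", "cat", "dog", "dog"]
print(best_split(rows, labels))  # (0, 42.5, 0.0)
```

Here both features separate the classes perfectly, so the search keeps the first perfect split it finds: feature 0 at threshold 42.5.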
🤯

The CART algorithm (Classification and Regression Trees), introduced in 1984 by Breiman, Friedman, Olshen, and Stone, is the foundation of most modern decision tree implementations. Despite being 40 years old, it remains one of the most widely used ML algorithms.


✂️ Overfitting and Pruning

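Left unconstrained, a decision tree will keep splitting until every leaf is pure, in the extreme case giving each training example its own leaf. It reaches 100% accuracy on the training data but often performs much worse on new data, because it has memorised noise rather than learned the underlying pattern. This is overfitting.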

Imagine memorising every past exam question word-for-word instead of understanding the subject. You'd ace the past papers but fail the real exam.

Two main remedies:

  1. Pre-pruning (early stopping) — set limits during training: maximum depth, minimum samples per leaf, minimum information gain threshold. The tree stops growing when it hits these limits.

  2. Post-pruning — grow the full tree, then trim back branches that don't improve performance on a validation set.
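To make the pre-pruning limits concrete, here is a minimal sketch of a recursive tree builder on a single numeric feature. The parameter names `max_depth` and `min_samples_leaf` are chosen for illustration, the toy data is hypothetical, and post-pruning is not shown.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build(xs, ys, max_depth, min_samples_leaf, depth=0):
    """Grow a tree on one numeric feature, stopping early at the given limits."""
    majority = Counter(ys).most_common(1)[0][0]
    # Pre-pruning: stop when the depth limit is reached (or the node is pure).
    if depth >= max_depth or gini(ys) == 0.0:
        return majority
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Pre-pruning: skip splits that would create a leaf that is too small.
        if min(len(left), len(right)) < min_samples_leaf:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    # No allowed split, or no split that reduces impurity: make a leaf.
    if best is None or best[0] >= gini(ys):
        return majority
    t = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build([x for x, _ in left], [y for _, y in left],
                      max_depth, min_samples_leaf, depth + 1),
        "right": build([x for x, _ in right], [y for _, y in right],
                       max_depth, min_samples_leaf, depth + 1),
    }


def count_leaves(tree):
    if not isinstance(tree, dict):
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


# Hypothetical data with one "noisy" label: the points at x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "a", "b", "a", "b", "b", "b"]

full = build(xs, ys, max_depth=10, min_samples_leaf=1)
stump = build(xs, ys, max_depth=1, min_samples_leaf=1)
print(count_leaves(full))   # 4
print(count_leaves(stump))  # 2
```

The loosely constrained tree carves out extra leaves to fit the noisy points, while the depth limit forces a single split and accepts one training error.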

🤔
Think about it:

A decision tree with depth 1 (a single question) is called a "decision stump". It's extremely simple — almost certainly underfitting. A tree of depth 100 with one sample per leaf is overfitting. How would you decide where to stop?


🌲 From Trees to Forests

A single decision tree is powerful but brittle: small changes in training data can produce very different trees. The solution: grow hundreds of trees, each trained on a random sample of the data and a random subset of the features, then combine their predictions (a majority vote for classification, an average for regression).

This is a Random Forest, one of the most reliable and widely used algorithms in machine learning. You'll cover it in depth in a later lesson. For now, remember: individual trees are interpretable, forests are robust.
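As a preview, the core idea (random resampling plus voting) can be sketched with very small trees. This is a deliberately simplified illustration, not the full Random Forest algorithm: each "tree" is a decision stump, each stump is given one randomly chosen feature (real implementations typically sample features at every split), and all function names and data are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Fit a depth-1 tree on one feature by minimising training errors."""
    default = Counter(labels).most_common(1)[0][0]
    best = (len(labels) + 1, None, default, default)  # (errors, threshold, left, right)
    values = sorted({r[feature] for r in rows})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return feature, best[1], best[2], best[3]


def stump_predict(stump, row):
    feature, t, l_lab, r_lab = stump
    if t is None:  # no usable threshold was found: always predict the default
        return l_lab
    return l_lab if row[feature] <= t else r_lab


def train_forest(rows, labels, n_trees, rng):
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))           # random feature for this tree
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest


def forest_predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(stump_predict(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [height_cm, weight_kg] -> label
rows = [[20, 4], [25, 5], [22, 4.5], [28, 6], [60, 30], [70, 35], [65, 28], [75, 40]]
labels = ["cat"] * 4 + ["dog"] * 4

forest = train_forest(rows, labels, n_trees=25, rng=random.Random(0))
print(forest_predict(forest, [24, 5]))
print(forest_predict(forest, [68, 33]))
```

An individual stump can be thrown off by an unlucky bootstrap sample, but the vote across many stumps smooths out those individual mistakes.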


✅ Strengths and ⚠️ Weaknesses

| Strengths | Weaknesses |
|---|---|
| Fully interpretable; can be visualised | Prone to overfitting without pruning |
| No need to normalise or scale features | Small data changes = very different trees |
| Handles both numerical and categorical features | Biased towards features with more values |
| Works without feature engineering | Not great at capturing linear relationships |
| Fast to train and predict | Single trees often underperform ensembles |