AI Sprouts • Intermediate • ⏱️ 25 min read

Decision Trees: The Algorithm You Can Draw on Paper 🌳

Many machine learning algorithms behave like black boxes: you feed in data, something mathematical happens inside, and a prediction comes out. Decision trees are different. They are one of the few algorithms you can fully explain to a non-technical colleague, draw on a whiteboard, and still trust to make accurate predictions.


🎮 The 20 Questions Analogy

You've probably played 20 Questions: one person thinks of something, and others ask yes/no questions to narrow it down. "Is it alive? Is it bigger than a car? Does it live in water?" Each answer eliminates a huge swath of possibilities until the answer becomes obvious.

A decision tree works exactly like this. Given a new data point to classify, the tree asks a series of questions about its features, following the branches that match each answer, until it reaches a leaf — a final prediction.

[Figure: a decision tree for classifying animals. It first splits on 'has wings?', then on 'lives in water?', leading to leaf nodes with animal names. Each question about a feature narrows the possibilities until a leaf node gives the prediction.]

🌿 Anatomy of a Tree

Before we get into how trees learn, let's name the parts:

  • Root node: the very top question, chosen because it is the single split that best separates the full training set
  • Internal nodes: questions at each branch point
  • Branches: the paths taken based on yes/no (or value-range) answers
  • Leaf nodes: the endpoints; each holds a final prediction

A single data point travels from root to leaf, answering one question at each node, until it reaches a prediction.
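The root-to-leaf walk can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the nested-dict layout, the feature names `has_wings` and `lives_in_water`, and the leaf labels are all assumptions made for the example.

```python
# A hand-written tree stored as nested dicts (a hypothetical layout).
# Internal nodes hold a yes/no question about one feature;
# leaf nodes hold the final prediction.
tree = {
    "question": "has_wings",
    "yes": {"leaf": "bird"},
    "no": {
        "question": "lives_in_water",
        "yes": {"leaf": "fish"},
        "no": {"leaf": "mammal"},
    },
}


def predict(node, sample):
    """Walk from the root to a leaf, answering one question per node."""
    while "leaf" not in node:
        answer = sample[node["question"]]
        node = node["yes"] if answer else node["no"]
    return node["leaf"]


print(predict(tree, {"has_wings": False, "lives_in_water": True}))
```

The loop visits one node per question, so the cost of a prediction grows with the depth of the tree rather than with the size of the training set.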


📐 How a Decision Tree Learns

The clever part: how does the algorithm decide which question to ask at each node? It tries every possible split on every feature and picks the one that best separates the data.

Information Gain and Gini Impurity

Two common measures of "best separation":

Gini impurity measures how mixed a group is. A perfectly pure node, where all examples belong to one class, has a Gini impurity of 0. A completely mixed node has the maximum impurity: 0.5 for two evenly split classes, and 1 - 1/k for k equally represented classes. The algorithm prefers splits that produce the purest child nodes.
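As a rough sketch, Gini impurity can be computed as one minus the sum of the squared class proportions. The function name `gini` and the toy labels below are illustrative choices, not part of any particular library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini(["cat", "cat", "cat", "cat"]))  # a pure node
print(gini(["cat", "cat", "dog", "dog"]))  # two classes, evenly mixed
```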

Information gain is similar: it measures how much a split reduces uncertainty (entropy) about the class label. Higher information gain = better split.
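One way to make this concrete is to take the entropy of the parent node and subtract the size-weighted entropy of its children. The sketch below assumes a binary split; the function names and the toy labels are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1 / p) over the classes."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
# A split that separates the classes perfectly removes all uncertainty.
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))
# A split that leaves both children as mixed as the parent gains nothing.
print(information_gain(parent, ["yes", "no"], ["yes", "no"]))
```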

Both measures ask the same underlying question: after splitting on this feature, how much more certain am I about the class?
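The "try every split on every feature" search can be sketched as follows. This is a simplified illustration that assumes numeric features, uses midpoints between neighbouring values as candidate thresholds, and scores each split by the weighted Gini impurity of its children. The toy animal measurements are invented, and real implementations are considerably more optimised.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Try every feature and every midpoint threshold; return the split
    with the lowest weighted Gini impurity across the two children."""
    best = None  # (weighted impurity, feature index, threshold)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


# Toy data (made up): columns are [height_cm, weight_kg].
rows = [[20, 4], [25, 5], [30, 4], [60, 30], [65, 28], [70, 35]]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
print(best_split(rows, labels))
```

Building a full tree repeats this search on each child's subset of the data until a stopping condition is met.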

🤯

The CART algorithm (Classification and Regression Trees), introduced in 1984 by Breiman, Friedman, Olshen, and Stone, is the foundation of most modern decision tree implementations. Despite being 40 years old, it remains one of the most widely used ML algorithms.


✂️ Overfitting and Pruning

Left unconstrained, a decision tree will keep growing until every leaf is pure, in the extreme giving each training example its own leaf. It then scores close to 100% on the training data but generalises poorly to new data. This is overfitting.

Imagine memorising every past exam question word-for-word instead of understanding the subject. You'd ace the past papers but fail the real exam.

Two main remedies:

  1. Pre-pruning (early stopping) — set limits during training: maximum depth, minimum samples per leaf, minimum information gain threshold. The tree stops growing when it hits these limits.

  2. Post-pruning — grow the full tree, then trim back branches that don't improve performance on a validation set.
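To see pre-pruning in action, here is a minimal sketch of a tree builder with two of the limits mentioned above, maximum depth and minimum samples per leaf. It assumes a single numeric feature, and the helper names and the noisy toy data are invented for the example. Post-pruning is not shown.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build(xs, ys, max_depth=None, min_samples_leaf=1, depth=0):
    """Grow a tree on one numeric feature, stopping early at the limits."""
    majority = Counter(ys).most_common(1)[0][0]
    at_depth_limit = max_depth is not None and depth >= max_depth
    if len(set(ys)) == 1 or at_depth_limit:
        return {"leaf": majority}

    best = None  # (weighted impurity, threshold)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if min(len(left), len(right)) < min_samples_leaf:
            continue  # pre-pruning: a child would be too small
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:
        return {"leaf": majority}

    t = best[1]
    lx = [x for x in xs if x <= t]
    ly = [y for x, y in zip(xs, ys) if x <= t]
    rx = [x for x in xs if x > t]
    ry = [y for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build(lx, ly, max_depth, min_samples_leaf, depth + 1),
        "right": build(rx, ry, max_depth, min_samples_leaf, depth + 1),
    }


def predict(node, x):
    while "leaf" not in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["leaf"]


def count_leaves(node):
    if "leaf" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


def accuracy(node, xs, ys):
    return sum(predict(node, x) == y for x, y in zip(xs, ys)) / len(ys)


# Toy data (made up) with two "noisy" labels near the class boundary.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "a", "b", "a", "b", "b", "b"]

full = build(xs, ys)                 # no limits: memorises the noise
shallow = build(xs, ys, max_depth=1)  # pre-pruned to a single question

print(count_leaves(full), accuracy(full, xs, ys))
print(count_leaves(shallow), accuracy(shallow, xs, ys))
```

The unconstrained tree fits the training set perfectly by carving out extra leaves for the noisy points, while the depth-limited tree accepts a training error in exchange for a simpler rule. Whether the simpler rule actually generalises better has to be checked on held-out data.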

🤔
Think about it:

A decision tree with depth 1 (a single question) is called a "decision stump". It's extremely simple — almost certainly underfitting. A tree of depth 100 with one sample per leaf is overfitting. How would you decide where to stop?


🌲 From Trees to Forests

A single decision tree is powerful but brittle: small changes in training data can produce very different trees. The solution: grow hundreds of trees, each trained on a random sample of the data and restricted to a random subset of the features, then combine their predictions (a majority vote for classification, an average for regression).

This is a Random Forest, one of the most reliable and widely used algorithms in all of machine learning. You'll cover it in depth in a later lesson. For now, remember: individual trees are interpretable, forests are robust.
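The core of the idea, resampling the data and voting, can be sketched with the standard library alone. This is a deliberately stripped-down illustration rather than a real Random Forest: it uses depth-1 stumps instead of full trees, a single numeric feature (so there is no feature subsampling), and invented toy data. All function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(xs, ys):
    """Fit a depth-1 tree (one threshold) on a single numeric feature."""
    majority = Counter(ys).most_common(1)[0][0]
    best = None  # (weighted impurity, threshold, left label, right label)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            best = (score, t, left_label, right_label)
    if best is None:  # the sample contained a single distinct x value
        return lambda x: majority
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Classification: every tree votes and the majority label wins."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two well-separated groups on one feature.
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(xs, ys)
print(predict_forest(forest, 2), predict_forest(forest, 12))
```

Because each stump sees a slightly different sample, individual thresholds vary, but the vote smooths those differences out.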


✅ Strengths and ⚠️ Weaknesses

| Strengths | Weaknesses |
|---|---|
| Fully interpretable; can be visualised | Prone to overfitting without pruning |
| No need to normalise or scale features | Small data changes = very different trees |
| Handles both numerical and categorical features | Biased towards features with more values |
| Works without feature engineering | Not great at capturing linear relationships |
| Fast to train and predict | Single trees often underperform ensembles |

