🎯
AI Branches • Advanced ⏱️ 40 min read

Fine-Tuning LLMs: Customising AI for Your Use Case 🎯

A pre-trained large language model is a generalist. It has read a substantial fraction of the internet and absorbed a staggering breadth of knowledge. But "generalist" is another word for "not specialised". If you need a model that speaks your company's voice, understands your industry's jargon, follows your specific output format, or refuses to discuss off-topic subjects — you need more than prompting.

This is where fine-tuning comes in.


🗺️ Three Ways to Specialise a Model

Before diving into fine-tuning mechanics, it helps to understand where it sits relative to the alternatives:

| Approach | How it works | Best for |
|---|---|---|
| Prompt engineering | Craft instructions that guide the base model | Quick iteration, no data required |
| RAG (Retrieval-Augmented Generation) | Retrieve relevant documents at inference time and include them in the prompt | Keeping knowledge current, large knowledge bases |
| Fine-tuning | Adjust the model's weights using task-specific examples | Consistent style/format, specialised reasoning, reducing prompt length |

Fine-tuning and RAG are not mutually exclusive — many production systems use both.


🔬 What Fine-Tuning Actually Does

A pre-trained LLM has billions of weights (parameters) — numbers that encode everything it learned from pre-training. Fine-tuning continues training the model on a smaller, curated dataset that reflects your specific task.

The model sees your examples, calculates how wrong its current weights are on them, and adjusts. The core training mechanics (forward pass, loss calculation, backpropagation) are identical to pre-training — you're just doing it again, on your data, for far fewer steps.

The result: a model whose weights have been nudged away from "generalist" towards "your specific use case".
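This "continue training, but on your data" idea can be seen in miniature with a toy model. The sketch below is a hand-rolled numpy illustration, not an LLM pipeline: a linear model is "pre-trained" on broad data, then the same gradient-descent loop simply continues for fewer steps on a small task-specific set. All data, dimensions, and hyperparameters here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, y, lr, steps):
    """Gradient descent on mean squared error — the same loop
    serves for both 'pre-training' and 'fine-tuning'."""
    for _ in range(steps):
        pred = X @ w                           # forward pass
        grad = 2 * X.T @ (pred - y) / len(y)   # backpropagation (analytic gradient)
        w = w - lr * grad                      # weight update
    return w

# "Pre-training": lots of broad, slightly noisy general data.
X_general = rng.normal(size=(1000, 3))
y_general = X_general @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, 1000)
w = train(np.zeros(3), X_general, y_general, lr=0.1, steps=200)

# "Fine-tuning": far fewer examples from a task whose target
# function differs slightly from the general one.
X_task = rng.normal(size=(50, 3))
y_task = X_task @ np.array([1.0, 2.5, 3.0])
w_ft = train(w.copy(), X_task, y_task, lr=0.05, steps=100)

print(np.round(w, 2))    # close to the general weights [1, 2, 3]
print(np.round(w_ft, 2)) # nudged towards the task weights [1, 2.5, 3]
```

The fine-tuned weights end up near the task-specific solution precisely because training resumed from the pre-trained weights rather than from scratch.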

🤯

GPT-3 had 175 billion parameters. Fine-tuning all of them on a consumer GPU is completely impractical. This is precisely why parameter-efficient fine-tuning methods like LoRA were developed — they make fine-tuning accessible without requiring a data centre.


⚡ Full Fine-Tuning vs Parameter-Efficient Fine-Tuning

Full fine-tuning updates every weight in the model. This is the most powerful approach — but also the most expensive. For a 70-billion-parameter model, you'd need multiple high-end GPUs and hours or days of training.

Parameter-efficient fine-tuning (PEFT) freezes most of the model's weights and only trains a small number of additional parameters. The two dominant techniques:

LoRA (Low-Rank Adaptation)

Instead of modifying the original weight matrices directly, LoRA adds small trainable matrices alongside the frozen weights. These small matrices capture the task-specific adjustments.

The maths: a weight update matrix that would normally be (d × d) is approximated as the product of two much smaller matrices (d × r) × (r × d), where r is a small "rank" (typically 4–64). This dramatically reduces the number of trainable parameters.

The result: you train perhaps 0.1–1% of the total parameters, at a tiny fraction of the compute and memory cost, while achieving performance close to full fine-tuning.
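The saving is easy to check with back-of-the-envelope arithmetic. A minimal sketch, using an illustrative hidden size and rank:

```python
import numpy as np

d, r = 4096, 8               # illustrative hidden size and LoRA rank

full_params = d * d          # a full (d x d) weight-update matrix
lora_params = d * r + r * d  # the two low-rank factors B (d x r) and A (r x d)

print(full_params)           # 16777216
print(lora_params)           # 65536 — roughly 0.4% of the full update

# The update dW = B @ A has the same (d x d) shape as W, so the
# adapted layer still computes x @ (W + B @ A) with no architecture change.
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d)).astype(np.float32)
B = np.zeros((d, r), dtype=np.float32)  # B starts at zero, so dW starts at zero
A = rng.normal(size=(r, d)).astype(np.float32)
assert (W + B @ A).shape == W.shape
```

Initialising B to zero is the standard LoRA choice: at step zero the adapted model behaves exactly like the frozen base model.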

QLoRA (Quantised LoRA)

QLoRA combines LoRA with quantisation — representing the frozen base model weights in 4-bit precision instead of 16-bit or 32-bit. This reduces memory usage by roughly 75%, enabling fine-tuning of very large models on a single consumer GPU.

QLoRA made it possible for researchers and developers outside large organisations to fine-tune 70B+ models. It democratised access considerably.
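The memory claim is simple arithmetic over the weights alone; this rough sketch ignores optimiser state, activations, and quantisation overhead:

```python
params = 70e9                              # a 70B-parameter model

gb = lambda bits: params * bits / 8 / 1e9  # parameter count x bits -> gigabytes

print(gb(16))  # 140.0 GB at 16-bit — far beyond any single consumer GPU
print(gb(4))   # 35.0 GB at 4-bit — the ~75% reduction in weight memory
```

In practice the full saving depends on what else must live in GPU memory, but the weights dominate, which is why 4-bit quantisation changes what hardware is viable.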

🤔
Think about it:

LoRA adds lightweight trainable matrices alongside frozen pretrained weights. When you merge them for inference, you get the same architecture with different values. Can you think of why you might want to keep the LoRA weights separate from the base model rather than merging them permanently?


📊 The Training Data Challenge

Fine-tuning is only as good as your training data. Common formats:

  • Instruction-following pairs: {"instruction": "Summarise in one sentence:", "input": "...", "output": "..."}
  • Conversational pairs: multi-turn dialogue examples
  • Domain text: for domain adaptation, plain text from your field may suffice

Quality matters far more than quantity. A curated dataset of 1,000 high-quality examples typically outperforms a noisy dataset of 100,000. Common data issues:

  • Inconsistent format or tone
  • Label errors
  • Data that's too narrow (model becomes too specialised) or too broad (minimal benefit)
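Several of these issues can be caught mechanically before any GPU time is spent. A minimal validation-and-rendering sketch — the field names match the instruction-pair format above, while the prompt template and length cap are illustrative choices:

```python
REQUIRED = ("instruction", "output")

def validate(example: dict) -> list[str]:
    """Return a list of problems with one training example."""
    problems = []
    for field in REQUIRED:
        if not example.get(field, "").strip():
            problems.append(f"missing or empty field: {field}")
    if len(example.get("output", "")) > 8000:
        problems.append("output suspiciously long")  # arbitrary illustrative cap
    return problems

def render(example: dict) -> str:
    """Render one example into a flat training string (illustrative template)."""
    parts = [f"### Instruction:\n{example['instruction']}"]
    if example.get("input"):
        parts.append(f"### Input:\n{example['input']}")
    parts.append(f"### Response:\n{example['output']}")
    return "\n\n".join(parts)

dataset = [
    {"instruction": "Summarise in one sentence:",
     "input": "LoRA trains small adapter matrices alongside frozen weights.",
     "output": "LoRA fine-tunes a model by training low-rank adapters."},
    {"instruction": "Summarise in one sentence:", "input": "..."},  # broken: no output
]

clean = [ex for ex in dataset if not validate(ex)]
print(len(clean))  # 1 — the example with no output is dropped
```

Checks like these are cheap insurance: a single malformed field repeated across thousands of examples will be faithfully learned by the model.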

🔄 RLHF: Fine-Tuning With Human Feedback

Reinforcement Learning from Human Feedback (RLHF) is a second stage of fine-tuning used to align model behaviour with human preferences. It's how ChatGPT was trained to be helpful and refuse harmful requests.

The process: humans rank model outputs for quality, a "reward model" learns to predict human preferences, and then the LLM is fine-tuned using reinforcement learning to maximise reward. It's expensive and complex, but crucial for producing models that are safe and pleasant to use.
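The reward-model stage can be sketched in one formula: given a pair of responses where humans preferred one, the model is trained with a pairwise (Bradley–Terry) loss that pushes the preferred response's score above the rejected one's. A numpy sketch of that loss alone — the scores here are placeholders for a reward model's outputs:

```python
import numpy as np

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
    Near zero when the chosen response scores much higher; large when
    the reward model ranks the pair the wrong way round."""
    return float(-np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected)))))

print(round(preference_loss(3.0, -1.0), 3))  # small: ranking agrees with humans
print(round(preference_loss(-1.0, 3.0), 3))  # large: ranking disagrees
```

Minimising this loss over many ranked pairs is what turns raw human comparisons into a differentiable reward signal the RL stage can then maximise.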


🛠️ Practical Steps

  1. Choose a base model — Llama 3, Mistral, Qwen, Gemma. Consider licence, size, and base capability.
  2. Prepare your dataset — clean, consistent, formatted as instruction-response pairs. Aim for 500–10,000 high-quality examples.
  3. Choose a method — LoRA/QLoRA for most use cases; full fine-tuning only if you have the compute and the task demands it.
  4. Train — use Hugging Face transformers + peft, or higher-level frameworks like Axolotl or Unsloth.
  5. Evaluate — test on a held-out set. Compare against the base model and a strong prompt-engineered baseline.
  6. Iterate — fine-tuning rarely works perfectly first time. Improve your data, adjust hyperparameters, repeat.
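Step 4 with the Hugging Face stack typically starts from something like the sketch below. This is a configuration sketch, not a tested recipe: the base model name, target modules, and hyperparameters are illustrative and depend on your model and GPU.

```python
# Illustrative LoRA setup with transformers + peft.
# Requires: pip install transformers peft
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Meta-Llama-3-8B"  # illustrative base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank adapters
    lora_alpha=32,                         # scaling applied to the adapter update
    target_modules=["q_proj", "v_proj"],   # attention projections are a common choice
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

From here the wrapped model drops into a standard training loop or `Trainer`; frameworks like Axolotl and Unsloth wrap this same configuration behind a YAML file or a faster kernel implementation.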

🚫 When NOT to Fine-Tune

Fine-tuning is not always the right answer:

  • Your problem can be solved with prompting — save the effort; iterate on prompts first
  • Your knowledge needs to stay current — fine-tuning bakes in knowledge at training time; RAG is better for dynamic information
  • You have very little data — fewer than a few hundred examples often hurts more than helps
  • You need fast iteration — a fine-tuning run takes hours; prompt changes take seconds
🤯

The Alpaca model (2023) was fine-tuned from LLaMA using just 52,000 instruction-following examples generated by GPT-3.5. The total data generation cost was under $500. It demonstrated that instruction-following behaviour could be learned from a surprisingly small dataset.


Lesson 14 of 14