👁️
AI Branches • Intermediate • ⏱️ 18 min read

Computer Vision - How AI Learns to See the World

You glance at a photo and instantly know it shows a dog on a beach. For a computer, that same image is nothing more than a giant grid of numbers. Computer vision is the branch of AI that teaches machines to extract meaning from those numbers - and it is already reshaping industries around you.

How Computers "See"

When you look at a photograph, your brain instantly recognises shapes, colours, and depth. A computer has none of that intuition. Instead, it works with raw numbers.

A digital image is a grid of pixels. Each pixel stores colour values - typically three channels: red, green, and blue (RGB). A 1920 × 1080 HD image contains over two million pixels, each with three values ranging from 0 to 255. Multiply those together and even a single frame contains millions of numbers.
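To make those numbers concrete, here is the arithmetic for a single HD frame as a quick sketch in plain Python (no image library needed):

```python
# One 1920 x 1080 RGB frame: three 0-255 values per pixel.
width, height, channels = 1920, 1080, 3

pixels = width * height        # 2073600 - "over two million pixels"
values = pixels * channels     # 6220800 numbers in a single frame

print(pixels, values)
```

A 30 fps video multiplies that again: roughly 186 million values per second, which is why resolution matters so much for processing cost.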

[Diagram: an image broken into a pixel grid with RGB channels]
Every image is just a grid of numbers across red, green, and blue channels.

Resolution determines how much detail the grid captures. Higher resolution means more pixels and richer detail - but also far more data for the AI to process. A 4K image has four times the pixels of HD, which means four times the computational cost.

Grayscale images have just one channel (brightness), while some specialised formats - like satellite imagery or medical scans - may have dozens of channels capturing wavelengths invisible to the human eye.

🤯

The human eye can distinguish roughly 10 million colours. A standard 8-bit RGB image can represent over 16.7 million unique colour combinations - more than we can actually perceive!
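The 16.7 million figure is straightforward arithmetic: 8 bits per channel gives 256 intensity levels, and the three channels multiply together:

```python
levels = 2 ** 8          # 256 intensity levels per 8-bit channel
colours = levels ** 3    # red x green x blue combinations

print(colours)           # 16777216 - "over 16.7 million"
```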

Convolutional Neural Networks (CNNs)

Early attempts at computer vision relied on hand-crafted rules - "look for edges here, match this template there." These brittle approaches failed whenever the scene changed. Modern systems use Convolutional Neural Networks (CNNs), which learn their own rules from thousands of labelled examples.

Think of a CNN as an assembly line of pattern detectors, each layer building on the one before it:

  1. Convolutional layers slide small filters across the image, detecting simple patterns like edges, corners, and textures.
  2. Pooling layers shrink the data down, keeping only the most important signals and discarding redundant detail.
  3. Deeper convolutional layers combine those simple patterns into more complex features - eyes, wheels, letters.
  4. Fully connected layers pull all the features together to make a final decision - "this is a cat" or "this is a tumour."

The beauty is that nobody programmes these filters by hand. The network learns them during training, starting from random noise and gradually sharpening into useful detectors.
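The first two stages of that assembly line can be sketched in a few lines of plain Python. This is a toy illustration, not a framework implementation - `convolve2d` and `max_pool2d` are hypothetical helpers written for this lesson, and the edge filter below is hand-crafted, whereas a real CNN learns its filter weights during training:

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution: no padding, stride 1.

    Like most deep-learning frameworks, this slides the kernel without
    flipping it (technically cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(
                image[y + j][x + i] * kernel[j][i]
                for j in range(kh)
                for i in range(kw)
            )
            for x in range(len(image[0]) - kw + 1)
        ]
        for y in range(len(image) - kh + 1)
    ]

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: keep only the strongest signal per window."""
    return [
        [
            max(fmap[y + j][x + i] for j in range(size) for i in range(size))
            for x in range(0, len(fmap[0]) - size + 1, size)
        ]
        for y in range(0, len(fmap) - size + 1, size)
    ]

# Tiny grayscale "image": dark (0) on the left, bright (255) on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

# A hand-written Sobel-style vertical-edge filter. In a real CNN these
# nine weights would be learned from data, not chosen by a programmer.
kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

result = convolve2d(image, kernel)   # strong response along the dark/bright edge
pooled = max_pool2d(result)          # pooling shrinks the map, keeping the peak

print(result)   # [[1020, 1020], [1020, 1020]]
print(pooled)   # [[1020]]
```

Frameworks such as PyTorch and TensorFlow provide heavily optimised versions of these operations (e.g. `torch.nn.Conv2d` and `torch.nn.MaxPool2d`); the logic is the same, just run across many filters and channels at once.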

🤔
Think about it:

When you learn to recognise a friend's face, you do not memorise every pixel - you pick up on key features like eye shape, hairstyle, and expression. CNNs do something remarkably similar. What features do you think a CNN would learn first?

Classification, Detection, and Segmentation

Computer vision tackles three progressively harder tasks:

| Task | Question it answers | Example |
|------|---------------------|---------|
| Image classification | What is in this image? | "This X-ray shows pneumonia." |
| Object detection | What is in this image and where? | Drawing boxes around every pedestrian in a street scene. |
| Semantic segmentation | Which pixels belong to which object? | Colouring every pixel of a road, pavement, car, and sky differently. |
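One way to see the difference between the three tasks is the shape of each answer. A toy sketch (the labels, box coordinates, and class ids below are invented for illustration):

```python
# 1. Image classification: one label for the whole image.
classification = "dog"

# 2. Object detection: a list of (label, bounding box) pairs,
#    box = (x, y, width, height) in pixel coordinates.
detections = [
    ("dog", (10, 20, 64, 48)),
    ("ball", (80, 60, 16, 16)),
]

# 3. Semantic segmentation: one class id per pixel.
#    0 = background, 1 = dog, 2 = ball (toy 4x4 image).
segmentation = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 0, 0, 0],
]

# Classification gives one answer, detection one per object,
# segmentation one per pixel - increasingly detailed outputs.
print(len(detections))                        # 2 objects
print(sum(len(row) for row in segmentation))  # 16 pixel labels
```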

Self-driving cars need all three simultaneously - classifying objects, locating them precisely, and understanding the full scene pixel by pixel.

Each task requires progressively more computational power and training data. Image classification reached human-level accuracy on benchmarks such as ImageNet around 2015; real-time segmentation on video remains an active area of research today.

🧠 Quiz

Which computer vision task assigns a label to every individual pixel in an image?

Real-World Applications

Computer vision is already embedded in industries you might not expect:

• Tesla Autopilot uses eight cameras and vision-based AI to detect lanes, traffic lights, and obstacles in real time - processing millions of frames per journey.
• Medical imaging - AI models now match or exceed radiologists at spotting early-stage breast cancer in mammograms, sometimes catching tumours that human readers missed.
• Quality control - factories use vision systems to inspect thousands of products per minute, catching defects far too subtle or fast for human inspectors.
• Agriculture - drones with computer vision identify diseased crops across vast fields, enabling targeted treatment that can sharply reduce pesticide use.
• Retail - Amazon Go stores use computer vision to track which products shoppers pick up, enabling checkout-free shopping.

🤯

Google's DeepMind developed an AI that can detect over 50 eye diseases from retinal scans as accurately as world-leading ophthalmologists - in seconds rather than weeks.

Ethical Concerns

Computer vision is powerful, but it raises serious questions that society is still grappling with:

• Surveillance - facial recognition enables mass tracking of citizens. Several jurisdictions, including San Francisco and parts of the EU, have banned or restricted its use by police.
• Bias - landmark studies by Joy Buolamwini at MIT showed that commercial facial recognition systems were significantly less accurate for darker-skinned faces and women, because training data has historically over-represented lighter-skinned males.
• Consent - should your face be scanned without your knowledge in shops, airports, or public spaces? Many countries are still drafting legislation to address this.
• Deepfakes - AI-generated fake images and videos can spread misinformation and damage reputations, making visual evidence less trustworthy.

🤔
Think about it:

Imagine a school installs facial recognition cameras to take attendance automatically. What are the benefits? What could go wrong? Would you be comfortable with this system?

🧠 Quiz

Why do some facial recognition systems perform worse on certain demographic groups?

Key Takeaways

• Images are grids of pixel values across colour channels - computers see numbers, not pictures.
• CNNs learn to extract features automatically through training, starting from edges and building up to complex objects.
• Classification, detection, and segmentation represent increasing levels of visual understanding.
• Computer vision drives breakthroughs from healthcare diagnostics to autonomous vehicles and precision agriculture.
• Bias in training data and surveillance concerns demand careful, ethical deployment - technology alone is never enough without responsible governance.

🧠 Quiz

In a CNN, what is the purpose of pooling layers?