AI Educademy

Free AI education for everyone, in every language.

MIT Licence. Open Source

AI Forest • Advanced • 40 min read

Open-Source AI: The Tools, Models, and Communities Shaping the Future

Why Open-Source Matters for AI

In the early days of AI, cutting-edge research was locked inside corporate labs and expensive proprietary software. Today, many of the most transformative AI tools are open-source: freely available for anyone to use, study, modify, and share.

This isn't just a philosophical choice. Open-source AI has practical consequences:

  • Transparency: You can inspect exactly how a model works
  • Reproducibility: Anyone can verify research claims
  • Innovation speed: Thousands of contributors improve tools faster than any single company
  • Accessibility: A student in Lagos has the same tools as a researcher at Stanford

Figure: Open-source AI connects researchers, developers, and learners worldwide.

The Open-Source AI Landscape

The ecosystem is vast. Here's a map of the key projects and where they fit:

┌───────────────────────────────────────────────────────┐
│                   APPLICATION LAYER                   │
│  LangChain · LlamaIndex · Haystack · Streamlit        │
├───────────────────────────────────────────────────────┤
│                      MODEL LAYER                      │
│  Llama · Mistral · Stable Diffusion · Whisper         │
├───────────────────────────────────────────────────────┤
│                    PLATFORM LAYER                     │
│  Hugging Face · Ollama · vLLM · TGI                   │
├───────────────────────────────────────────────────────┤
│                    FRAMEWORK LAYER                    │
│  PyTorch · TensorFlow · JAX · scikit-learn            │
├───────────────────────────────────────────────────────┤
│                      DATA LAYER                       │
│  Common Crawl · The Pile · LAION · RedPajama          │
└───────────────────────────────────────────────────────┘

Key Open-Source Frameworks

PyTorch: The researcher's favourite

import torch.nn as nn

# Define a simple neural network in PyTorch
class SimpleClassifier(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.layers(x)

model = SimpleClassifier(input_size=768, num_classes=10)
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")
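
The count printed by the last line can be checked by hand. The sketch below assumes each `nn.Linear` layer keeps its default bias term, so a layer holds `in_features * out_features` weights plus `out_features` biases; `ReLU` and `Dropout` add no learnable parameters. The helper name `linear_params` is made up for this illustration.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights (in x out) plus one bias per output unit."""
    return in_features * out_features + out_features


# The three Linear layers of SimpleClassifier(input_size=768, num_classes=10)
layer_shapes = [(768, 128), (128, 64), (64, 10)]

per_layer = [linear_params(i, o) for i, o in layer_shapes]
total = sum(per_layer)

print(per_layer)                  # [98432, 8256, 650]
print(f"Parameters: {total:,}")   # Parameters: 107,338
```

Almost all of the parameters sit in the first layer, because it maps the widest input (768 features) down to 128 units.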

Why PyTorch won: Pythonic API, dynamic computation graphs (easier debugging), massive community, and first-class GPU support.

Hugging Face: The GitHub of AI

Hugging Face has become the central hub for sharing AI models, datasets, and applications.

from transformers import pipeline

# Sentiment analysis in 3 lines
classifier = pipeline("sentiment-analysis")
result = classifier("Open-source AI is transforming the world!")
print(result)
# Example output (the exact score depends on the default model):
# [{'label': 'POSITIVE', 'score': 0.9998}]

# Summarisation in 3 lines
summariser = pipeline("summarization")
summary = summariser("Your long article text goes here...", max_length=50)
print(summary)

Key Hugging Face resources: the Model Hub (500,000+ models), Datasets (100,000+ datasets), Spaces (free app hosting), and the Transformers library (a unified API).

LangChain and LlamaIndex: Building AI applications

These frameworks help you build applications on top of language models, for example by chaining together prompts, retrieval over your own documents, and model calls.
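
The pattern these frameworks share is composing small steps (retrieve context, fill a prompt template, call a model) into one pipeline. The sketch below illustrates that pattern in plain Python. It does not use the LangChain or LlamaIndex APIs, and `DOCS`, `tokens`, `retrieve`, `build_prompt`, `fake_llm`, and `answer` are hypothetical stand-ins chosen for this example.

```python
import re
from typing import Callable

# A tiny in-memory "knowledge base" (illustrative sample data).
DOCS = [
    "Ollama runs open-weight models on a local machine.",
    "pgvector adds vector similarity search to PostgreSQL.",
    "Hugging Face hosts models, datasets, and Spaces.",
]


def tokens(text: str) -> set[str]:
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q = tokens(question)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Fill a fixed prompt template with the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only reports the prompt size."""
    return f"(model reply to a {len(prompt)}-character prompt)"


def answer(question: str, llm: Callable[[str], str] = fake_llm) -> str:
    """The whole chain: retrieve, build the prompt, call the model."""
    return llm(build_prompt(question, retrieve(question, DOCS)))


question = "What does pgvector add to PostgreSQL?"
print(build_prompt(question, retrieve(question, DOCS)))
print(answer(question))
```

Swapping `fake_llm` for a function that calls a real model is the only change needed to make the chain produce real answers; a framework adds ready-made versions of each step and the glue between them.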

Hugging Face was originally a chatbot company for teenagers! They pivoted to become the central platform for open-source AI when they realised the real value was in the tools they'd built to manage NLP models. A 2023 funding round valued the company at $4.5 billion.


Open-Source Models: The New Frontier

The most exciting development in open-source AI is the release of powerful open-weight models that rival proprietary alternatives.

Llama (Meta)

Meta's Llama family changed the game by releasing models competitive with GPT-3.5:

Llama Model Family
──────────────────
Llama 1   (Feb 2023) → Research-only licence, leaked widely
Llama 2   (Jul 2023) → Open licence, commercial use allowed
Llama 3   (Apr 2024) → Major quality leap, 8B and 70B variants
Llama 3.1 (Jul 2024) → 405B parameter model, multilingual

Key advantage: Can be run locally, fine-tuned for specific tasks,
and deployed without API costs.

Mistral

A French AI company that punches above its weight:

  • Mistral 7B: Outperformed Llama 2 13B despite being about half the size
  • Mixtral 8x7B: Mixture-of-experts architecture that uses only about 13B parameters per token despite having about 47B in total
  • Known for releasing models via torrent links on Twitter/X, which is unconventional but effective!
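
To see why a mixture-of-experts model touches only a fraction of its parameters per token, here is a toy sketch of top-2 routing over eight experts. It rests on simplifying assumptions: the router scores are hard-coded (in a real model they come from a learned layer), each "expert" is a one-line function rather than a feed-forward network, and the names `softmax` and `moe_forward` are invented for this illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, experts, top_k=2):
    """Route one token through its top_k experts and mix their outputs."""
    # Indices of the top_k highest-scoring experts for this token.
    chosen = sorted(range(len(experts)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Renormalise the router scores over the chosen experts only.
    weights = softmax([router_logits[i] for i in chosen])
    # Only the chosen experts are evaluated; the others cost nothing here.
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


# Eight toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: x * scale for i in range(8)]
router_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 0.2]

out, used = moe_forward(1.0, router_logits, experts)
print(used, out)  # [1, 4] 3.5
```

Six of the eight experts are never called for this token, which is the source of the gap between total and per-token parameter counts.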

Stable Diffusion

The open-source image generation model that democratised AI art:

  • Run locally on a consumer GPU
  • Thousands of community fine-tunes and extensions
  • Spawned an entire ecosystem of tools (ControlNet, LoRA adapters, ComfyUI)

Running models locally

# Using Ollama, the easiest way to run models locally
# Install: https://ollama.ai

# In a terminal: download and run Llama 3
ollama run llama3

# In a terminal: download and run Mistral
ollama run mistral

# In Python (install the client first with: pip install ollama)
import ollama

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Explain quantum computing simply"}]
)
print(response["message"]["content"])
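
The Python client talks to a local Ollama server over HTTP, so the same request can be sketched with only the standard library. This assumes the server is listening on its default local address (`http://localhost:11434`) and that the `/api/chat` endpoint is used with streaming turned off; the helper names `build_chat_request` and `chat` are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local address


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a chunk stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]


# Usage (needs a running Ollama server with the model already pulled):
#   print(chat("llama3", "Explain quantum computing simply"))
```

Because everything stays on localhost, no API key is involved and no prompt text leaves the machine.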

Open-weight vs open-source: Many "open" models release the trained weights but not the training data or full training code. This is technically "open-weight" rather than truly open-source. It's an important distinction: you can use the model, but you can't fully reproduce or audit how it was trained.
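
The distinction can be written down as a simple rule. The sketch below is a deliberately coarse illustration: the `Release` fields and the three example releases are hypothetical, and real licences add conditions that three booleans cannot capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """What a model publisher has made available (hypothetical fields)."""
    name: str
    weights: bool
    training_code: bool
    training_data: bool


def openness(release: Release) -> str:
    """Classify a release using the distinction drawn above."""
    if release.weights and release.training_code and release.training_data:
        return "open-source"
    if release.weights:
        return "open-weight"
    return "closed"


releases = [
    Release("weights-only release", True, False, False),
    Release("fully reproducible release", True, True, True),
    Release("API-only release", False, False, False),
]

for r in releases:
    print(f"{r.name}: {openness(r)}")
```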


Contributing to Open-Source AI

You don't need to be a PhD researcher to contribute. Here's how to get started:

Contribution levels

Level 1 โ€” User (start here)

  • Use open-source tools in your projects
  • Report bugs when you find them
  • Star and share projects you find useful

Level 2 โ€” Documentation

  • Fix typos, improve examples, add tutorials
  • Translate documentation (hugely valuable!)
  • Write blog posts about your experience

Level 3 โ€” Code

  • Fix small bugs (look for "good first issue" labels)
  • Add tests for untested functionality
  • Improve error messages

Level 4 โ€” Features and Research

  • Implement new features or model architectures
  • Contribute model weights and datasets
  • Reproduce and extend research papers

Think about it:

The most impactful open-source contributions are often not code at all. Clear documentation, helpful answers on forums, well-written bug reports, and translated tutorials lower the barrier for thousands of new users. If you can explain a concept clearly, you can contribute meaningfully to open-source AI today.


AI Licensing: Know the Rules

When using or contributing to open-source AI, licensing matters:

Common licences

Licence          Commercial Use    Modify    Distribute    Patent Grant
─────────────    ──────────────    ──────    ──────────    ────────────
MIT              Yes               Yes       Yes           No
Apache 2.0       Yes               Yes       Yes           Yes
GPL              Yes*              Yes       Yes*          No***
Llama Licence    Yes**             Yes       Yes**         No

* GPL requires derivative works to also be GPL (copyleft)
** Llama licence requires a separate licence from Meta for products above 700M monthly active users
*** GPLv3 adds an explicit patent grant; GPLv2 has none

Always check the licence before building a product on top of an open-source model. Some models that appear "open" have restrictions on commercial use, competition, or specific industries. When in doubt, consult your legal team.
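
As a rough illustration, the two footnoted conditions from the table can be encoded as a pre-flight check. This is a sketch under strong simplifying assumptions: `LICENCE_RULES` and `licence_flags` are hypothetical names, only the copyleft and monthly-active-user conditions mentioned above are modelled, and real licence texts contain more terms than this.

```python
# Conditions taken from the table footnotes above; everything else is omitted.
LICENCE_RULES = {
    "MIT": {"copyleft": False, "mau_limit": None},
    "Apache 2.0": {"copyleft": False, "mau_limit": None},
    "GPL": {"copyleft": True, "mau_limit": None},
    "Llama Licence": {"copyleft": False, "mau_limit": 700_000_000},
}


def licence_flags(licence: str,
                  monthly_active_users: int = 0,
                  ships_derivative: bool = False) -> list[str]:
    """Return the footnoted conditions that apply to a planned use."""
    rules = LICENCE_RULES[licence]
    flags = []
    if rules["copyleft"] and ships_derivative:
        flags.append("derivative work must also be released under the GPL")
    limit = rules["mau_limit"]
    if limit is not None and monthly_active_users > limit:
        flags.append("above 700M monthly active users: extra licence terms apply")
    return flags


print(licence_flags("MIT", ships_derivative=True))
print(licence_flags("GPL", ships_derivative=True))
print(licence_flags("Llama Licence", monthly_active_users=800_000_000))
```

An empty list only means that neither modelled condition was triggered, so the licence text itself still needs reading.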


Building with Open-Source: Composing Your AI Stack

Here's how to assemble a complete AI application from open-source components:

Example: AI-Powered Customer Support Bot
────────────────────────────────────────

Frontend:       Next.js + Tailwind CSS (open-source)
                     ↓
API Layer:      FastAPI (open-source)
                     ↓
Orchestration:  LangChain (open-source)
                     ↓
LLM:            Llama 3 via Ollama (open-source)
                     ↓
Knowledge:      pgvector on PostgreSQL (open-source)
                     ↓
Embeddings:     all-MiniLM-L6-v2 from Hugging Face (open-source)
                     ↓
Monitoring:     Langfuse (open-source)
                     ↓
Deployment:     Docker + Kubernetes (open-source)

Total licence cost: £0
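
The Knowledge and Embeddings layers work together: each support article is stored as a vector, and an incoming question is matched to the closest one. The sketch below shows that lookup with cosine similarity in plain Python. The three-dimensional vectors and the `KNOWLEDGE` entries are made up for illustration (all-MiniLM-L6-v2 produces 384-dimensional embeddings), and in the stack above the ranking would be done by a pgvector query rather than in application code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: article text -> vector.
KNOWLEDGE = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "What are your opening hours?": [0.1, 0.9, 0.1],
    "How do I cancel my order?": [0.0, 0.2, 0.9],
}


def nearest(query_vec, k=1):
    """Return the k stored articles most similar to the query vector."""
    ranked = sorted(
        KNOWLEDGE,
        key=lambda text: cosine_similarity(query_vec, KNOWLEDGE[text]),
        reverse=True,
    )
    return ranked[:k]


# Pretend this vector is the embedding of "I forgot my login details".
print(nearest([0.8, 0.2, 0.1]))  # ['How do I reset my password?']
```

The retrieved article is what the orchestration layer would paste into the prompt before calling the LLM.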

The Open vs Closed Debate

For open: Safety through transparency, democratisation, faster innovation, sovereignty

For closed: Safety through control, concentrated quality, clear accountability

Most of the industry is moving towards a spectrum rather than a binary:

Fully Closed          Hybrid               Fully Open
────────────          ──────               ──────────
GPT-4 (OpenAI)        Llama 3 (Meta)       Pythia (EleutherAI)
Claude (Anthropic)    Mistral (Mistral)    BLOOM (BigScience)
Gemini (Google)       Gemma (Google)       OLMo (AI2)

← More control                          More freedom →
← Less transparency                More transparency →

The term "open-source" in AI is hotly contested. The Open Source Initiative (OSI) argues that truly open-source AI must include the model weights, the training code, and detailed information about the training data. By this strict definition, even Meta's Llama, which many call "open-source", would be classified as "open-weight" since the training data isn't fully disclosed.


Quick Recap

  1. Open-source AI has democratised access to cutting-edge tools and models
  2. PyTorch, Hugging Face, LangChain, and LlamaIndex form the backbone of the open-source AI ecosystem
  3. Open models (Llama, Mistral, Stable Diffusion) offer powerful alternatives to proprietary APIs
  4. Contributing starts with using tools, reporting bugs, and improving documentation; no PhD required
  5. Licensing matters: MIT and Apache 2.0 are most permissive; always check custom AI licences
  6. Full AI stacks can be built entirely from open-source components at zero licence cost
  7. The open vs closed debate is evolving toward a spectrum of openness

What's Next?

You've mastered the tools and ecosystem of modern AI. In the final lesson, we'll look ahead to the future of AI, from AGI and autonomous agents to regulation and career paths. It's time to think about where AI is going and where you fit in that future.
