In the early days of AI, cutting-edge research was locked behind corporate labs and expensive proprietary software. Today, the most transformative AI tools are open-source: freely available for anyone to use, study, modify, and share.
This isn't just a philosophical choice. Open-source AI has practical consequences:
The ecosystem is vast. Here's a map of the key projects and where they fit:
┌─────────────────────────────────────────────────────┐
│  APPLICATION LAYER                                  │
│  LangChain · LlamaIndex · Haystack · Streamlit      │
├─────────────────────────────────────────────────────┤
│  MODEL LAYER                                        │
│  Llama · Mistral · Stable Diffusion · Whisper       │
├─────────────────────────────────────────────────────┤
│  PLATFORM LAYER                                     │
│  Hugging Face · Ollama · vLLM · TGI                 │
├─────────────────────────────────────────────────────┤
│  FRAMEWORK LAYER                                    │
│  PyTorch · TensorFlow · JAX · scikit-learn          │
├─────────────────────────────────────────────────────┤
│  DATA LAYER                                         │
│  Common Crawl · The Pile · LAION · RedPajama        │
└─────────────────────────────────────────────────────┘
import torch.nn as nn


# Define a simple neural network in PyTorch
class SimpleClassifier(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.layers(x)


model = SimpleClassifier(input_size=768, num_classes=10)
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")
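The parameter count printed above can be checked by hand, without PyTorch installed. A fully connected layer with n_in inputs and n_out outputs holds n_in × n_out weights plus n_out biases, while ReLU and Dropout have no learnable parameters. The helper name `linear_params` is illustrative and not part of any library.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


# Layer sizes of SimpleClassifier(input_size=768, num_classes=10)
layer_sizes = [(768, 128), (128, 64), (64, 10)]

total = sum(linear_params(n_in, n_out) for n_in, n_out in layer_sizes)
print(f"Parameters: {total:,}")  # Parameters: 107,338
```

Almost all of the parameters (98,432 of them) sit in the first layer, because that is where the 768-dimensional input is projected down.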
Why PyTorch won: Pythonic API, dynamic computation graphs (easier debugging), massive community, and first-class GPU support.
Hugging Face has become the central hub for sharing AI models, datasets, and applications.
from transformers import pipeline
# Sentiment analysis in 3 lines
classifier = pipeline("sentiment-analysis")
result = classifier("Open-source AI is transforming the world!")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9998}]
# Summarisation in 3 lines
summariser = pipeline("summarization")
summary = summariser("Your long article text goes here...", max_length=50)
print(summary)
Key Hugging Face resources: Model Hub (500,000+ models), Datasets (100,000+), Spaces (free hosting), Transformers library (unified API)
These frameworks help you build applications on top of language models:
Hugging Face was originally a chatbot company for teenagers! They pivoted to become the central platform for open-source AI when they realised the real value was in the tools they'd built to manage NLP models. Today they're valued at over $4.5 billion.
The most exciting development in open-source AI is the release of powerful open-weight models that rival proprietary alternatives.
Meta's Llama family changed the game by releasing models competitive with GPT-3.5:
Llama Model Family
──────────────────
Llama 1   (Feb 2023): Research-only licence, leaked widely
Llama 2   (Jul 2023): Open licence, commercial use allowed
Llama 3   (Apr 2024): Major quality leap, 8B and 70B variants
Llama 3.1 (Jul 2024): 405B parameter model, multilingual

Key advantage: Can be run locally, fine-tuned for specific tasks,
and deployed without API costs.
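Whether a model "can be run locally" depends largely on whether its weights fit in memory. The sketch below is a rough back-of-the-envelope estimate, assuming memory is dominated by the weights (it ignores activations, the KV cache, and runtime overhead) and using the nominal parameter counts from the list above. The function name and the precision table are illustrative assumptions.

```python
# Approximate storage cost per parameter at common precisions
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Memory needed just to hold the weights, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


models = [("Llama 3 8B", 8e9), ("Llama 3 70B", 70e9), ("Llama 3.1 405B", 405e9)]
for name, params in models:
    fp16 = weight_memory_gb(params, "fp16")
    int4 = weight_memory_gb(params, "int4")
    print(f"{name}: about {fp16:.0f} GB at fp16, about {int4:.1f} GB at 4-bit")
```

By this estimate an 8B model quantised to 4 bits needs roughly 4 GB for its weights, which is why the smaller variants are the usual choice for laptops.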
A French AI company that punches above its weight:
The open-source image generation model that democratised AI art:
# Using Ollama: the easiest way to run models locally
# Install: https://ollama.ai
# Download and run Llama 3
ollama run llama3
# Download and run Mistral
ollama run mistral
# Use in Python
# pip install ollama
import ollama
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Explain quantum computing simply"}]
)
print(response["message"]["content"])
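The `ollama` package is a thin client over a local HTTP server, so the same call can be made with only the standard library. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and uses the `/api/chat` endpoint with streaming turned off. The helper names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local address


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON reply, not a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request; needs the Ollama server running and the model pulled."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        return json.loads(resp.read())["message"]["content"]


request = build_chat_request("llama3", "Explain quantum computing simply")
print(request.get_method(), request.full_url)
```

Calling `chat("llama3", "...")` should return the same text as the `ollama.chat` example, provided the server is up. Building the request separately makes it easy to inspect the payload without a running model.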
Open-weight vs open-source: Many "open" models release the trained weights but not the training data or full training code. This is technically "open-weight" rather than truly open-source. It's an important distinction: you can use the model, but you can't fully reproduce or audit how it was trained.
You don't need to be a PhD researcher to contribute. Here's how to get started:
Level 1: User (start here)
Level 2: Documentation
Level 3: Code
Level 4: Features and Research
The most impactful open-source contributions are often not code at all. Clear documentation, helpful answers on forums, well-written bug reports, and translated tutorials lower the barrier for thousands of new users. If you can explain a concept clearly, you can contribute meaningfully to open-source AI today.
When using or contributing to open-source AI, licensing matters:
Licence         Commercial Use   Modify    Distribute   Patent Grant
─────────────   ──────────────   ──────    ──────────   ────────────
MIT             ✅ Yes           ✅ Yes    ✅ Yes       ❌ No
Apache 2.0      ✅ Yes           ✅ Yes    ✅ Yes       ✅ Yes
GPL             ✅ Yes*          ✅ Yes    ✅ Yes*      ❌ No
Llama Licence   ✅ Yes**         ✅ Yes    ✅ Yes**     ❌ No
* GPL requires derivative works to also be GPL (copyleft). Note that GPLv3, unlike GPLv2, does include an explicit patent grant.
** Llama licence restricts use above 700M monthly active users
Always check the licence before building a product on top of an open-source model. Some models that appear "open" have restrictions on commercial use, competition, or specific industries. When in doubt, consult your legal team.
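The table above can be turned into a small pre-flight check. This is a minimal sketch that encodes only what the table and its footnotes say. The names `LicenceTerms`, `LICENCES`, and `review_flags` are hypothetical, and real licences contain far more conditions than these three fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LicenceTerms:
    commercial_use: bool
    copyleft: bool                   # derivatives must keep the same licence
    mau_limit: Optional[int] = None  # monthly-active-user threshold, if any


# Values transcribed from the table above (a simplification)
LICENCES = {
    "MIT": LicenceTerms(commercial_use=True, copyleft=False),
    "Apache 2.0": LicenceTerms(commercial_use=True, copyleft=False),
    "GPL": LicenceTerms(commercial_use=True, copyleft=True),
    "Llama Licence": LicenceTerms(
        commercial_use=True, copyleft=False, mau_limit=700_000_000
    ),
}


def review_flags(licence: str, monthly_active_users: int = 0) -> List[str]:
    """Return the points from the table that deserve a closer look."""
    terms = LICENCES[licence]
    flags = []
    if not terms.commercial_use:
        flags.append("commercial use not permitted")
    if terms.copyleft:
        flags.append("copyleft: derivative works must use the same licence")
    if terms.mau_limit is not None and monthly_active_users > terms.mau_limit:
        flags.append("usage exceeds the monthly-active-user threshold")
    return flags


print(review_flags("MIT"))
print(review_flags("GPL"))
print(review_flags("Llama Licence", monthly_active_users=800_000_000))
```

An empty list means the table raises no concern for that licence. It does not replace reading the licence text itself.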
Here's how to assemble a complete AI application from open-source components:
Example: AI-Powered Customer Support Bot
────────────────────────────────────────
Frontend:       Next.js + Tailwind CSS (open-source)
                    ↓
API Layer:      FastAPI (open-source)
                    ↓
Orchestration:  LangChain (open-source)
                    ↓
LLM:            Llama 3 via Ollama (open-source)
                    ↓
Knowledge:      pgvector on PostgreSQL (open-source)
                    ↓
Embeddings:     all-MiniLM-L6-v2 from Hugging Face (open-source)
                    ↓
Monitoring:     Langfuse (open-source)
                    ↓
Deployment:     Docker + Kubernetes (open-source)

Total licence cost: £0
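The Knowledge and Embeddings layers in this stack work together: the question is embedded, the closest stored passage is retrieved, and that passage is placed in the prompt sent to the LLM. The sketch below illustrates the idea with the standard library only. The three-dimensional vectors and the support passages are made up for illustration. In the real stack the vectors would come from all-MiniLM-L6-v2 (384 dimensions), pgvector would perform the similarity search, and the prompt would be sent to Llama 3.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy knowledge base: (passage, hand-made embedding) pairs
KNOWLEDGE_BASE = [
    ("Refunds are processed within 5 working days.", [0.9, 0.1, 0.0]),
    ("Our support line is open 9am to 5pm on weekdays.", [0.1, 0.9, 0.0]),
    ("Passwords can be reset from the account settings page.", [0.0, 0.2, 0.9]),
]


def retrieve(query_vector, top_k=1):
    """Return the top_k passages most similar to the query vector."""
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, query_vector):
    """Combine the retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(query_vector))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Pretend this vector is the embedding of the customer's question
prompt = build_prompt("How long do refunds take?", [0.8, 0.2, 0.1])
print(prompt)
```

Orchestration libraries such as LangChain wrap these same steps. Writing them out once makes it clearer what each layer of the stack contributes.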
For open: Safety through transparency, democratisation, faster innovation, sovereignty
For closed: Safety through control, concentrated quality, clear accountability
Most of the industry is moving towards a spectrum rather than a binary:
Fully Closed          Hybrid               Fully Open
────────────          ──────               ──────────
GPT-4 (OpenAI)        Llama 3 (Meta)       Pythia (EleutherAI)
Claude (Anthropic)    Mistral (Mistral)    BLOOM (BigScience)
Gemini (Google)       Gemma (Google)       OLMo (AI2)

← More control                             More freedom →
← Less transparency                   More transparency →
The term "open-source" in AI is hotly contested. The Open Source Initiative (OSI) argues that truly open-source AI must include training data, training code, and model weights. By this strict definition, even Meta's Llama, which many call "open-source", would be classified as "open-weight" since the training data isn't fully disclosed.
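The strict definition described above amounts to a three-part checklist. The sketch below is a deliberately simplified reading of it, not the OSI's actual wording. The function name is hypothetical, and the flags passed in the examples are assumptions for illustration.

```python
def classify_openness(weights: bool, training_code: bool, training_data: bool) -> str:
    """Label a model release by which artefacts are published."""
    if weights and training_code and training_data:
        return "open-source"   # everything needed to reproduce and audit
    if weights:
        return "open-weight"   # usable, but not fully reproducible
    return "closed"            # available only through an API, if at all


# Weights released, training data not fully disclosed (the Llama case above)
print(classify_openness(weights=True, training_code=False, training_data=False))

# Weights, code and data all published
print(classify_openness(weights=True, training_code=True, training_data=True))

# Nothing released
print(classify_openness(weights=False, training_code=False, training_data=False))
```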
You've mastered the tools and ecosystem of modern AI. In the final lesson, we'll look ahead to the future of AI, from AGI and autonomous agents to regulation and career paths. It's time to think about where AI is going and where you fit in that future.