Generative AI moved from research curiosity to production infrastructure in under three years. This lesson surveys the current state of the field — the players, the model categories, and the architectural patterns that have stabilised.
What a Foundation Model Is
A foundation model is a large neural network pre-trained on broad, internet-scale data (text, code, images, audio) and then adapted to many downstream tasks. The term — coined by Stanford in 2021 — captures the shift from training one model per task to training one big model and steering it via prompts, fine-tuning, or adapters.
Foundation models are characterised by:
- Hundreds of billions to trillions of parameters
- Training compute measured in millions of GPU-hours
- Emergent capabilities — abilities not explicitly taught but appearing at scale
- Generality — usable across language tasks, code, summarisation, reasoning
The Major Hosted Providers
| Provider | Flagship model (2026) | Differentiators |
|---|---|---|
| OpenAI | GPT-5 series | Broadest tooling ecosystem, function calling, Assistants API |
| Anthropic | Claude 4 series (Opus, Sonnet, Haiku) | Strong reasoning, large context, careful safety training |
| Google DeepMind | Gemini 2.x (Pro, Flash, Ultra) | Native multimodal, deep Google integration, very large context |
| Meta | Llama 4 (open weights) | Best open-weight option for self-hosting |
| Mistral AI | Mistral Large 2, Codestral | Strong open-weight + hosted hybrid |
| xAI | Grok 3 | Real-time X integration, long context |
| Cohere | Command R+ family | Enterprise focus, strong retrieval-augmented generation |
Cloud platforms (AWS Bedrock, Azure OpenAI, Google Vertex AI) host third-party models alongside their own — Bedrock alone hosts Anthropic, Meta, Mistral, Cohere, Stability, and Amazon's own Titan/Nova models.
Open-Weight vs Closed
"Open-weight" means the trained model parameters are downloadable and you can self-host. The leading open-weight families:
- Llama (Meta): 8B to 405B parameters; permissive licence with restrictions
- Mistral / Mixtral: 7B base, 8x22B mixture-of-experts
- Qwen (Alibaba): Excellent multilingual, strong at code
- DeepSeek: Cost-efficient reasoning models
- Gemma (Google): 2B to 27B, designed for on-device and edge
- Phi (Microsoft): Small, capable models for laptops/mobile
Open-weight models trail the absolute frontier by ~6-12 months but are competitive on cost when self-hosted and essential for regulated/sovereign use cases.
Multimodal Models
The 2024-2026 leap is true multimodality — models that natively process text + image + audio + video. Examples: Gemini 2.x accepts an hour of video in context; GPT-5 generates and edits images natively; Claude Opus 4 analyses chart screenshots and handwritten notes.
For developers this collapses what used to be multiple model calls (OCR → text model → TTS) into a single API request, with lower latency and higher fidelity.
Specialised Models
- Code models: GitHub Copilot, GPT-5-codex, Claude Code, Codestral, DeepSeek Coder
- Image generation: DALL-E 3, Midjourney v7, Stable Diffusion XL, Flux
- Video generation: Sora, Runway Gen-3, Veo 2, Kling
- Audio/music: ElevenLabs, Suno, Udio, Stable Audio
- Embedding models: text-embedding-3, Cohere Embed v3, Voyage AI — the foundation of RAG
The Agentic Shift
Until 2024 most LLM products were chatbots — text in, text out. The 2025-2026 wave is agents — LLMs that take actions: browse, search, run code, call APIs, edit files, deploy infrastructure. Examples: Claude Computer Use, OpenAI Operator, AutoGPT, LangGraph, CrewAI.
Agents combine three primitives:
- An LLM with strong reasoning
- A set of tools (function-calling API)
- A control loop that lets the LLM iterate until done
We cover the underlying mechanics in lesson 4.
Cost Trends
| Year | GPT-4-class input price | Output price |
|---|---|---|
| 2023 | ~$30 / 1M tokens | ~$60 / 1M tokens |
| 2024 | ~$10 / 1M tokens | ~$30 / 1M tokens |
| 2025 | ~$3 / 1M tokens | ~$10 / 1M tokens |
| 2026 (frontier) | ~$1-3 / 1M tokens | ~$5-15 / 1M tokens |
Prices vary by provider; small/fast tiers (GPT-4o-mini, Claude Haiku, Gemini Flash) are 10-50× cheaper than frontier tiers.
The cost drop has shifted the calculus — most applications can afford to use GenAI everywhere, not just in premium features.
The Application Layer
By 2026, every major SaaS product embeds GenAI:
- Productivity: Microsoft 365 Copilot, Google Workspace Gemini, Notion AI
- Code: GitHub Copilot, Cursor, Windsurf, Claude Code, Replit Agent
- Customer support: ServiceNow Now Assist, Salesforce Einstein, Intercom Fin
- Search: Perplexity, You.com, Google AI Overviews, Bing Copilot
- Design: Figma AI, Canva Magic, Adobe Firefly
What's Stable and What's Still Moving
Stable (you can build on it):
- The OpenAI-style chat completions API shape — adopted by every vendor
- Function calling / tool use
- Embeddings + vector DBs for retrieval
- Streaming responses
Still moving:
- Agent frameworks — too early to standardise on one
- Evaluation methodology — measurement is the bottleneck
- Model Context Protocol (MCP) — emerging standard for tools
- Long-context reliability (1M tokens technically works; quality varies)
With the landscape mapped, the next lesson opens the hood on how these models actually work — so prompting and architecture decisions in later lessons rest on a real mental model.