Module 1: The AI Landscape in 2026

🌏 What even is "AI"?

In 2026, "AI" mostly means Large Language Models (LLMs) — software trained on enormous amounts of text that can read, write, summarise, plan, and reason in plain language. They're not sentient. They don't "think." But they're extraordinarily capable at a specific kind of task: working with language and patterns.

You've probably already used one — ChatGPT, Siri, Google's smart suggestions. But what most people are using is the equivalent of only turning the taps on halfway.

The key shift happening right now: AI is moving from chatbots (you ask, it answers) to agents (you give it a goal, it figures out how to get there). That's the difference between asking someone a question and hiring them to do a job.

▶️ Watch: What is AI — the basics

CGP Grey · "Humans Need Not Apply" — 15 min · A brilliant introduction to how AI learns, made before the LLM explosion

Context note

This video was made before GPT-3 — it talks about "algorithms" and "machine learning" rather than LLMs specifically
The underlying principles are exactly the same: software that improves by being trained on enormous amounts of data, rather than being explicitly programmed with rules
What's changed since then is scale — more data, more compute, and the transformer architecture have made these systems dramatically more capable

🏆 The main players

There are hundreds of AI models, but a handful dominate.

OpenAI

ChatGPT / GPT-5

The most famous AI, launched publicly in late 2022 and now on its fifth major version. Strong all-rounder with excellent image understanding — photograph a document and ask questions about it.

GPT-5 is fast, capable, and familiar to most people. Free tier available; ChatGPT Plus is $20 USD/month.

Anthropic

Claude

Built with an explicit focus on being genuinely helpful rather than just agreeable. Excels at long documents, nuanced writing, and complex multi-part instructions.

Often the best choice when quality of output matters. Comes in Sonnet (fast, free tier) and Opus (more powerful, paid).

Note: if you have Claude for Desktop installed, this link may open the app instead of the browser — both work fine for this exercise.

Google

Gemini

Google's AI, now at Gemini 3.1 Pro. Deeply integrated with Gmail, Docs, Drive, and Search — if you live in Google Workspace, this is the natural choice.

The free tier is genuinely generous. Gemini Flash is the fast, lightweight version; Gemini Pro is the full-power flagship.

DeepSeek (China)

DeepSeek R2

A Chinese open-source model family that shocked the Western AI industry in early 2025, when its R1 release matched leading Western models at a fraction of the reported training cost.

Extraordinarily strong at reasoning and mathematics. Completely free via deepseek.com, and free to run on your own hardware.

Meta (Facebook)

Llama 4

Meta's open-source model: the weights are publicly released, meaning anyone can download and run them. Smaller Llama versions can run entirely on your own laptop, with no internet connection and no data leaving your machine.

Full privacy, no subscription, works offline. Best accessed via Ollama (ollama.com).

Alibaba (China)

Qwen 3

Alibaba's open-source model at Qwen 3 — genuinely world-class and often overlooked in English-speaking countries. Strong multilingual ability, useful for te reo Māori contexts.

Free via chat.qwen.ai, or run locally on your own hardware. Consistently ranks near the top of major benchmarks.

Raglan starting point: Begin with Gemini — free, no credit card needed, sign in with your Google account. Graduate to Claude when you need better writing or document analysis. Use DeepSeek or Qwen when you want power that costs nothing.

There are many more models worth knowing: Kimi K2 (from Moonshot AI, a powerful long-context model), Grok (from xAI, Elon Musk's company, integrated with X/Twitter), Mistral (French open-source, an excellent European alternative), and Manus (agentic AI that completes complex multi-step tasks autonomously).

Warning about "skins": The AI space is full of apps that give themselves catchy names and branding — but are simply one of the six models above with a custom interface on top. If something calls itself a "personalised AI assistant", check whether it discloses which underlying model it's using. Often it won't.

▶️ Optional extended explanation: How LLMs really work

3Blue1Brown · "But what is a GPT? Visual intro to Transformers" — 27 min total · First 10 minutes covers the core idea

Key points from this video

LLMs work by predicting the next most likely word (or "token") over and over — that's the entire core mechanism
They don't store knowledge like a database — they learn statistical patterns from billions of examples
The "transformer" architecture (invented at Google in 2017) allows models to pay attention to different parts of a sentence simultaneously
Scale is everything: frontier models are trained on trillions of words of text, far more than any person could read in a lifetime
This is why LLMs sometimes "hallucinate" — they're generating the most probable continuation, not looking something up
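
To make the "predict the next word, over and over" idea concrete, here is a toy sketch in Python. It is not how a real LLM is built: it simply counts which word followed which in a tiny made-up training text (the text and the function names are invented for illustration), then generates by always picking the most frequent follower. Real models use a transformer and billions of examples, but the generate-one-token-then-repeat loop has the same shape.

```python
from collections import Counter, defaultdict

# Tiny made-up training text; a real LLM learns from billions of examples.
TRAINING_TEXT = (
    "the surf is good today . "
    "the surf is good today . "
    "the cafe is open today ."
)

def train_follower_counts(text):
    """Count, for each word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts

def generate(counts, start_word, max_words=5):
    """Repeatedly append the most frequent follower of the last word."""
    output = [start_word]
    for _ in range(max_words):
        followers = counts.get(output[-1])
        if not followers:
            break  # this word was never followed by anything in training
        output.append(followers.most_common(1)[0][0])
    return " ".join(output)

counts = train_follower_counts(TRAINING_TEXT)
print(generate(counts, "the"))              # the surf is good today .
print(generate(counts, "cafe", max_words=3))  # cafe is good today
```

Notice the second result: the training text only ever said the cafe "is open today", yet the toy model produces "cafe is good today" because "good" is the most common word after "is". That is a miniature version of a hallucination: a probable continuation, not a looked-up fact.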

⚠️ This video goes deeper than you need for the workshop. Watch 10 minutes for the core concept, then move on.

💰 Free vs Paid — the honest answer

Most people never need to pay. Here's the honest breakdown:

Free is genuinely good — Gemini Free, Claude.ai free tier, ChatGPT free tier all let you do real work.
Paid unlocks more — Faster responses, longer conversations, access to newer models, higher daily limits.
$20–$30/month gets you Claude Pro or Gemini Advanced — both are significantly more powerful.
API credits are the power user's route — pay only for what you use, access any model. We cover this in Module 6.
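
If you're curious how "pay only for what you use" compares with a flat subscription, the arithmetic can be sketched in a few lines of Python. API usage is typically billed per token (roughly, per word fragment) sent and received. The two prices below are hypothetical placeholders, not any provider's real rates, and the usage figures are invented for the example; check the provider's pricing page before relying on any number.

```python
# HYPOTHETICAL prices in USD per million tokens. Real prices differ by
# provider and model, and change often.
PRICE_PER_MILLION_INPUT = 3.00
PRICE_PER_MILLION_OUTPUT = 15.00

def request_cost(input_tokens, output_tokens):
    """Cost in USD of one request, given tokens sent and tokens received."""
    return (
        input_tokens * PRICE_PER_MILLION_INPUT
        + output_tokens * PRICE_PER_MILLION_OUTPUT
    ) / 1_000_000

def monthly_cost(requests_per_day, input_tokens, output_tokens, days=30):
    """Cost in USD of a month of similar-sized requests."""
    return requests_per_day * days * request_cost(input_tokens, output_tokens)

# Invented usage: 20 requests a day, about 500 tokens in and 500 tokens out.
estimate = monthly_cost(requests_per_day=20, input_tokens=500, output_tokens=500)
print(f"Estimated API cost: ${estimate:.2f} per month")
```

Under these assumed prices, light use comes out well below a $20 subscription, while heavy use with long documents can cost more. The point of the sketch is the shape of the calculation, not the specific figure.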

For this workshop series: We'll always show you the free path first. If paid is significantly better for your use case, we'll tell you and explain why.

Exercise 1.1

Try three models side by side

Open three browser tabs at once: claude.ai, gemini.google.com, chat.openai.com
Pick one of these prompts (or write your own) and paste the exact same text into all three:

Option A (local business): "Write three Instagram captions for a Raglan surf school. Keep it local and unpretentious. No hashtags."

Option B (everyday task): "Write a short out-of-office email for a small café closed this Saturday for a staff event. Keep it warm and local."

Option C (local knowledge test): "Give me 5 ideas for a date night in a small coastal New Zealand town with no cinema or shopping mall."

Compare the results across all three: tone, quality, personality, local specificity. Which one feels most like something you'd actually use?
Note your preference and why. There's no right answer — this is about building your own intuition.

✓ That's the whole skill in miniature. You just did a model evaluation.

📊 How models compare — benchmarks explained

AI models are tested against standardised benchmarks — exam-style tests across different domains. Here's what the main benchmarks measure and how the top models stack up in 2026:

MMLU

Massive Multitask Language Understanding — 57 subjects from maths to law. Tests general knowledge breadth.

MATH

Competition-level mathematics problems. Tests multi-step reasoning and problem solving.

HumanEval

Real programming tasks — write working code to solve a problem. Tests coding ability.

GPQA

Graduate-level science questions (biology, chemistry, physics). Tests deep expert reasoning.

Model	Tier	MMLU	MATH	Coding
Claude Opus 4.6	flagship	93%	92%	95%
Gemini 3.1 Pro	flagship	92%	95%	89%
GPT-5	flagship	92%	91%	93%
DeepSeek R2	free	90%	97%	91%
Claude Sonnet 4.6	mid-tier	90%	90%	93%
Qwen 3	free	89%	93%	88%
Gemini Flash 3.1	fast / free	86%	83%	82%
Llama 4	local / free	85%	82%	85%

Figures are approximate and change with every model release. Source: public benchmark leaderboards (LMSYS, Hugging Face). Benchmarks don't capture tone, NZ-specific context, or how natural a model sounds, which is why the exercise above matters more than these numbers.

The key takeaway: The top-tier models (Opus, Gemini Pro, GPT-5) are remarkably close. The free models (DeepSeek, Qwen, Gemini Flash) punch well above their price. Llama runs on your own machine. Pick based on your workflow, not just the leaderboard.
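
For anyone who likes to check a claim for themselves, the table above can be put into a small Python dictionary and queried. This is only a sketch: the scores are copied from the approximate figures in the table, and the helper names (`best_model`, `spread`) are made up for this example.

```python
# Approximate scores (percent) copied from the comparison table above.
SCORES = {
    "Claude Opus 4.6":   {"MMLU": 93, "MATH": 92, "Coding": 95},
    "Gemini 3.1 Pro":    {"MMLU": 92, "MATH": 95, "Coding": 89},
    "GPT-5":             {"MMLU": 92, "MATH": 91, "Coding": 93},
    "DeepSeek R2":       {"MMLU": 90, "MATH": 97, "Coding": 91},
    "Claude Sonnet 4.6": {"MMLU": 90, "MATH": 90, "Coding": 93},
    "Qwen 3":            {"MMLU": 89, "MATH": 93, "Coding": 88},
    "Gemini Flash 3.1":  {"MMLU": 86, "MATH": 83, "Coding": 82},
    "Llama 4":           {"MMLU": 85, "MATH": 82, "Coding": 85},
}

FLAGSHIPS = ["Claude Opus 4.6", "Gemini 3.1 Pro", "GPT-5"]

def best_model(benchmark):
    """Name of the model with the highest score on one benchmark."""
    return max(SCORES, key=lambda model: SCORES[model][benchmark])

def spread(models, benchmark):
    """Gap in percentage points between the best and worst of the given models."""
    values = [SCORES[model][benchmark] for model in models]
    return max(values) - min(values)

for benchmark in ("MMLU", "MATH", "Coding"):
    print(
        f"{benchmark}: top scorer is {best_model(benchmark)}, "
        f"flagship spread is {spread(FLAGSHIPS, benchmark)} points"
    )
```

Running it shows a free model (DeepSeek R2) topping the MATH column and the three flagships sitting within a few points of each other on every column, which is the takeaway in numbers.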

🚀 Where it's going — agents

The biggest shift in AI right now isn't better chatbots. It's agents — AI that can take actions, not just answer questions.

Instead of: "Write me a reply to this email" — an agent can read your inbox, draft replies, send them, and file the threads. Instead of: "Give me ideas for my café menu" — an agent can research local suppliers, check what's in season, draft the menu, and update your website.
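
The email example can be sketched as a loop: look at what has been done so far, choose the next action, run it, repeat until the goal is met. The Python below is a toy under loud assumptions. Every function and name in it is hypothetical, the "inbox" is fake data, nothing is actually sent, and `choose_next_action` is a hard-coded stand-in for the decision a real agent would ask an LLM to make.

```python
def read_inbox(state):
    # Fake inbox data, for illustration only.
    state["inbox"] = [{"sender": "supplier@example.com", "subject": "Delivery time?"}]

def draft_replies(state):
    state["drafts"] = [
        {"to": msg["sender"], "body": f"Thanks for your email about '{msg['subject']}'."}
        for msg in state["inbox"]
    ]

def send_replies(state):
    # A real agent would call an email service here; this only records the intent.
    state["sent"] = [draft["to"] for draft in state["drafts"]]

def file_threads(state):
    state["filed"] = [msg["subject"] for msg in state["inbox"]]

# The actions the agent is allowed to take.
TOOLS = {
    "read_inbox": read_inbox,
    "draft_replies": draft_replies,
    "send_replies": send_replies,
    "file_threads": file_threads,
}

def choose_next_action(state):
    """Stand-in for the LLM: decide the next step from what is already done."""
    for action, result_key in [
        ("read_inbox", "inbox"),
        ("draft_replies", "drafts"),
        ("send_replies", "sent"),
        ("file_threads", "filed"),
    ]:
        if result_key not in state:
            return action
    return "done"

def run_agent(goal, max_steps=10):
    """Loop: choose an action, run it, repeat until done or out of steps."""
    state = {"goal": goal}
    log = []
    for _ in range(max_steps):
        action = choose_next_action(state)
        if action == "done":
            break
        TOOLS[action](state)
        log.append(action)
    return log, state

log, state = run_agent("Clear my inbox")
print(log)  # ['read_inbox', 'draft_replies', 'send_replies', 'file_threads']
```

A chatbot would stop after one answer. The agent keeps going through several actions towards the goal, and the `max_steps` limit is there so it cannot loop forever.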

We're at the very beginning of this. By Module 5 (Build Something Real) you'll be running agents that do real work on your computer.

You've finished Module 1 🌊