Day 01 — Phase 0 — 90 minutes

An LLM is a system that predicts the next token, trained on most of the internet.

The only mental model you need to start. Every chatbot, every agent, every demo on your timeline — this one sentence.

Reading time
12 minutes
— or one cup of bad coffee
§ 01 / the one-liner

Every word it writes is a guess — a ranked bet over a vocabulary of roughly 100,000 tokens.

Ask Claude to complete "The capital of France is ___". Paris wins. But every plausible next token also got a number. Here is what the model actually saw, ranked.

prompt
> The capital of France is ___

Paris       76.0%
the          8.0%
a            5.0%
Lyon         4.0%
Marseille    3.0%
in           2.0%

temperature = 0.7 · ∑ = 1.0
↑ this is the entire trick
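
A minimal sketch of the trick above, assuming the model hands back one raw score (a logit) per candidate token. The scores in `LOGITS` are invented for illustration, not real model output, and the function names are hypothetical. Dividing by the temperature before the softmax is the standard way to sharpen or flatten the ranking.

```python
import math
import random

# Hypothetical raw scores (logits) for a handful of candidate tokens.
# These numbers are invented for illustration; they are not real model output.
LOGITS = {"Paris": 5.0, "the": 2.7, "a": 2.3, "Lyon": 2.0, "Marseille": 1.8, "in": 1.4}


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(logits, temperature=1.0, rng=random):
    """Pick one token at random, weighted by its probability."""
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    for t in (0.2, 0.7, 1.5):
        probs = softmax_with_temperature(LOGITS, t)
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        print(t, [(tok, round(p, 3)) for tok, p in ranked])
    print(sample_next_token(LOGITS, temperature=0.7))
```

Low temperature piles almost all the probability onto the top token; high temperature spreads it out, so the long shots get picked more often.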
§ 02 / the only map you need

The Four Pillars.

If you remember nothing else today — this. Every AI product you've ever used is a mix of these four.

01
prompts

The instructions.

System message, role, format. The smallest pillar — 100 lines of English doing the heavy lifting.

ROLE: senior engineer
TONE: concise, direct
RULES: no emojis
OUTPUT: bullet list
felt in →
Custom GPTs · Claude system prompts
02
context

What you stuff in.

The window. RAG lives here. Codebases, PDFs, search results. 80% of product quality lives here.

0 / 200,000 tokens
felt in →
Cursor · Perplexity · NotebookLM
03
tools

What it can do.

Web search, code execution, API calls. The shift from chatbot → agent happens entirely here.

felt in →
Claude Code · Devin · Operator
04
memory

What persists.

Across sessions. Across days. ChatGPT memory, Claude Projects. The newest pillar — most under-built.

felt in →
ChatGPT memory · Claude Projects
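
One way to picture the four pillars is as four slots in a single request. The sketch below is an assumption-heavy illustration: `build_request` and its field names are hypothetical and do not follow any vendor's schema, and the sample values are made up.

```python
import json


def build_request(system_prompt, context_chunks, tools, memory_notes, question):
    """Assemble one chat request from the four pillars.

    The field names are hypothetical and do not follow any vendor's schema.
    """
    # Memory: facts carried over from earlier sessions, appended to the instructions.
    memory_block = "\n".join(f"- {note}" for note in memory_notes)
    # Context: whatever was retrieved or pasted for this specific question.
    context_block = "\n\n".join(context_chunks)
    return {
        # Pillars 01 and 04: prompts plus memory.
        "system": f"{system_prompt}\n\nKnown about the user:\n{memory_block}",
        # Pillar 03: actions the model may ask the app to run.
        "tools": tools,
        # Pillar 02: context travels with the question.
        "messages": [
            {"role": "user", "content": f"{context_block}\n\nQuestion: {question}"}
        ],
    }


request = build_request(
    system_prompt="ROLE: senior engineer. TONE: concise, direct. RULES: no emojis. OUTPUT: bullet list.",
    context_chunks=["README.md: the service exposes /health and /orders."],
    tools=[{"name": "run_tests", "description": "Run the project's test suite."}],
    memory_notes=["Prefers TypeScript examples."],
    question="Which endpoints does the service expose?",
)
print(json.dumps(request, indent=2))
```

A product's fingerprint is roughly how much work goes into filling each slot.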
decompose any product

From now on — every AI product you see, you decompose into these four.

Heavy context + tools. Light on prompts. Barely any memory — state lives in your codebase.

Cursor — pillar fingerprint
prompts: 20%
context: 90%
tools: 80%
memory: 15%
§ 03 / how we got here
~1950s–80s
Rule-based

Humans wrote every rule. Chess worked. Language didn't — language has patterns, not rules.

~1990s
ML

Stop writing rules. Show examples. Computer figures out patterns. Spam filters, recommendations, fraud detection.

~2010s
Deep learning

Stack the pattern matchers. Many layers. Depth made everything weirdly better. Nobody fully knows why even now.

2017→
GenAI

Don't just classify — generate. That's the leap. The transformer made this possible at scale.

2017

Attention Is All You Need

Google publishes the paper. Eight authors. All eight eventually left Google. Seven founded startups: Character.AI, Cohere, Adept, Essential AI, Sakana AI, Inceptive, NEAR. One joined OpenAI. Paper is 8 years old. Built a $500B industry.

8 authors → 8 companies
2020

GPT-3 surprise

OpenAI just made GPT bigger. Didn't change the architecture — just scale. Translation, few-shot learning, reasoning emerged. Nobody trained it for that.

scale alone was the secret
Nov 2022

ChatGPT

Why this one, when chatbots have existed since the 60s? UX. Free, fast, chat, minimal friction. GPT-3.5-class models had been out for months. That single UX decision is why we're here.

ux > model

"Distribution and UX > model capability. Almost always."

— the lesson from ChatGPT
§ 04 / four weird demos
demo 01 / tokens

Why your name costs 3× as much

English
Vaibhav = 3 tokens
the = 1 token
Devanagari: same meaning, 3–4× more tokens
namaste kaise hain aap: ~6 tokens
नमस्ते कैसे हैं आप: ~18 tokens
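
A rough way to see where the gap comes from, without any tokenizer installed. Byte-level BPE tokenizers, the kind used by the GPT family, start from UTF-8 bytes, and Devanagari code points take three bytes each where ASCII letters take one. Byte counts are not token counts, so treat this as a hint at the cause, not a measurement; `utf8_profile` is a hypothetical helper.

```python
def utf8_profile(text):
    """Return (code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


latin = "namaste kaise hain aap"
devanagari = "नमस्ते कैसे हैं आप"

for label, text in (("Latin", latin), ("Devanagari", devanagari)):
    chars, nbytes = utf8_profile(text)
    print(f"{label:<11} code points={chars:>3}  utf-8 bytes={nbytes:>3}")

# Ratio of raw bytes the tokenizer has to compress for the same sentence.
ratio = utf8_profile(devanagari)[1] / utf8_profile(latin)[1]
print(f"byte ratio: {ratio:.1f}x")
```

The tokenizer's learned merges then widen the gap further, because far more English than Hindi text went into building the vocabulary.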
"3–4× more tokens for the same meaning. This is why Hindi users pay more than English users. Real economic discrimination baked into the API."
demo 02 / strawberry

The model can't count letters

you: how many R's are in strawberry?
llm: There are 2 R's in strawberry.
st-raw-be-rry: "raw" and "rry"
Model sees: strawberry as token chunks, not individual letters.
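
A small sketch of the difference. Counting characters is trivial for ordinary code; the model never gets the characters, only token IDs. The split and the IDs in `HYPOTHETICAL_VOCAB` mirror the st-raw-be-rry illustration above and are made up; real tokenizers may split the word differently.

```python
word = "strawberry"

# Character level: trivial for ordinary code.
print(word.count("r"))  # 3

# Token level: a hypothetical split with made-up IDs, for illustration only.
HYPOTHETICAL_VOCAB = {"st": 1001, "raw": 1002, "be": 1003, "rry": 1004}


def encode(text, vocab):
    """Greedy longest-match split of text into token IDs."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i:]!r}")
    return ids


token_ids = encode(word, HYPOTHETICAL_VOCAB)
print(token_ids)  # [1001, 1002, 1003, 1004]
```

Nothing in `[1001, 1002, 1003, 1004]` says how many R's are inside; the model has to have learned that from patterns in its training data.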
"Half the weird LLM behavior makes sense once you get this. They're not reading characters — they're playing patterns at token level."
demo 03 / context power

Same model. Same question. Context did the work.

without context
"Klovr appears to be an indie band from Helsinki specialising in ambient electronica, releasing their debut EP in 2019…"
✗ hallucinated
with context (bio pasted)
"Klovr is a GenAI field-notes guide by Vaibhav Lodha — a practitioner's notes on building with AI, day by day."
✓ grounded
"Model didn't get smarter. You gave it the answer key. This is why RAG exists."
demo 04 / embeddings

Words as coordinates in meaning-space

PETS: dog · puppy · cat
VEHICLES: car · truck · vehicle
king − man + woman ≈ queen
arithmetic on meaning; RAG runs nearest-neighbor search over these same coordinates under the hood
→ deep dive in Day 4 / Phase 3
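
A toy version of meaning-space, assuming hand-made 4-dimensional vectors. Real embeddings are learned, have hundreds or thousands of dimensions, and carry no readable axis labels; every number in `VECTORS` is invented so the geometry is easy to follow.

```python
import math

# Hand-made vectors: [pet-ness, vehicle-ness, royalty, maleness].
# Invented for illustration; real embeddings are learned, not written by hand.
VECTORS = {
    "dog":     [0.9, 0.0, 0.0, 0.1],
    "puppy":   [0.8, 0.1, 0.0, 0.1],
    "cat":     [0.9, 0.0, 0.1, 0.0],
    "car":     [0.0, 0.9, 0.0, 0.1],
    "truck":   [0.1, 0.9, 0.0, 0.2],
    "vehicle": [0.0, 0.8, 0.1, 0.0],
    "king":    [0.0, 0.0, 0.9, 0.8],
    "queen":   [0.0, 0.0, 0.9, 0.1],
    "man":     [0.0, 0.0, 0.1, 0.9],
    "woman":   [0.0, 0.0, 0.1, 0.1],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(target, exclude=()):
    """Return the word whose vector points closest to target."""
    candidates = (w for w in VECTORS if w not in exclude)
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


# Neighbours cluster: dog sits nearer to puppy than to truck.
print(cosine(VECTORS["dog"], VECTORS["puppy"]) > cosine(VECTORS["dog"], VECTORS["truck"]))  # True

# king - man + woman, element by element.
analogy = [k - m + w for k, m, w in zip(VECTORS["king"], VECTORS["man"], VECTORS["woman"])]
print(nearest(analogy, exclude={"king", "man", "woman"}))  # queen
```

Retrieval uses the `nearest` half of this: embed the question, find the closest stored vectors.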
§ 05 / rag, briefly

Retrieval-Augmented Generation

Not a new model skill. A pre-step: find the right text first, then ask. That's why the context power demo worked: you did RAG manually by pasting the bio.

"It's not magic. It's retrieval. And you've already built it by hand today."
→ deep dive Day 04 / Phase 3
01
Ingest

Documents split into chunks → each chunk converted to a coordinate (embedding) → stored in a vector database.

02
Retrieve

Your question becomes a coordinate too → find the nearest chunks in the vector store → those are your context.

03
Generate

Top-k chunks + your original question → sent to the LLM → grounded, accurate answer. That's RAG.
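
The three steps above can be sketched end to end in plain Python. This is a toy under stated assumptions: `embed` is a word-count stand-in for a real embedding model, the "vector database" is a list, and the final LLM call is left out, so the sketch stops at the prompt you would send. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(documents, chunk_size=12):
    """Step 01: split into chunks of chunk_size words, store (chunk, vector) pairs."""
    store = []
    for doc in documents:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            store.append((chunk, embed(chunk)))
    return store


def retrieve(store, question, k=2):
    """Step 02: embed the question, return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(item[1], q), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(chunks, question):
    """Step 03: top-k chunks plus the original question, ready to send to an LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "Klovr is a GenAI field-notes guide by Vaibhav Lodha.",
    "Helsinki is the capital of Finland.",
]
store = ingest(docs)
question = "What is Klovr?"
print(build_prompt(retrieve(store, question, k=1), question))
```

Swap `embed` for a real embedding model and the list for a vector store and the shape stays the same.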

§ 06 / wait, what?

Things that'll break your brain

All tightly connected to Day 1 content. Screenshot-worthy on their own.

01 / 06
Hindi costs 3–4× more than English

Same sentence in Devanagari = 3–4× the tokens of English. Real economic discrimination baked into the API.

why it matters: Token costs are language-dependent. Non-English speakers pay a tax.
02 / 06
The model has no idea what today is

Frozen at training cutoff. Genuinely doesn't know what year it is unless told. Every 'real-time' AI product is doing retrieval or tool calls behind the scenes.

why it matters: Context is the only clock it has.
03 / 06
LLMs dream in probability

The model isn't choosing the 'right' word. It's sampling from a distribution over ~100K tokens for every single token it writes. Same prompt, different outputs: that variability is the foundation, not a bug.

why it matters: Temperature = how much randomness you inject into the sampling.
04 / 06
Models cannot say 'I don't know' by default

Trained to always predict the next token. The ability to refuse or admit uncertainty has to be trained in afterwards, typically with techniques like RLHF. Hallucination is the default.

why it matters: Refusal is a feature, not a limitation. And it costs extra to train.
05 / 06
The transformer paper has 8 authors. All eight left Google.

Every one of them walked. Seven founded startups: Character.AI, Cohere, Adept, Essential AI, Sakana AI, Inceptive, NEAR. One joined OpenAI. Google published the breakthrough and watched it walk out the door.

why it matters: Distribution and UX > model capability. The paper was never enough.
06 / 06
There are tokens the model literally can't speak

'SolidGoldMagikarp': a Reddit username that got its own token in the GPT-2/GPT-3 tokenizer vocabulary but barely appeared in the text the model was actually trained on, so its embedding was left almost untrained. A glitch token. Causes erratic behavior when triggered.

why it matters: Training data shapes the model in unexpected ways. The internet is weird.
§ 07 / the cost reality
the biggest tech revolution in 20 years is a POST request
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":1024,"messages":[{"role":"user","content":"hi"}]}'

Build a product. Do the math.

Users: 1,000
API calls / user / day: 100
Input tokens / call: 2,000
Output tokens / call: 500
Your price / user / month: $10
monthly breakdown
Input tokens: 6,000M → $18,000
Output tokens: 1,500M → $22,500
Total cost: $40,500 / mo
Your revenue: $10,000 / mo
"Charging users $10/month? You have $10,000 revenue, $40,500 cost. You die in a month."
Pricing: Claude Sonnet — $3/M input · $15/M output
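
The breakdown above, as a small calculator you can rerun with your own numbers. It assumes a 30-day month, which is what the token totals above imply, and uses the $3/M input and $15/M output prices quoted here as defaults. `monthly_economics` is a hypothetical helper, not part of any SDK.

```python
def monthly_economics(users, calls_per_user_per_day, input_tokens_per_call,
                      output_tokens_per_call, price_per_user_per_month,
                      input_usd_per_million=3.0, output_usd_per_million=15.0,
                      days_per_month=30):
    """Return monthly token volume, API cost, revenue and margin in USD."""
    calls = users * calls_per_user_per_day * days_per_month
    input_millions = calls * input_tokens_per_call / 1_000_000
    output_millions = calls * output_tokens_per_call / 1_000_000
    input_cost = input_millions * input_usd_per_million
    output_cost = output_millions * output_usd_per_million
    total_cost = input_cost + output_cost
    revenue = users * price_per_user_per_month
    return {
        "input_millions": input_millions,
        "output_millions": output_millions,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": total_cost,
        "revenue": revenue,
        "margin": revenue - total_cost,
        # Price per user at which revenue exactly covers API cost.
        "break_even_price": total_cost / users,
    }


result = monthly_economics(
    users=1_000,
    calls_per_user_per_day=100,
    input_tokens_per_call=2_000,
    output_tokens_per_call=500,
    price_per_user_per_month=10,
)
print(f"total cost: ${result['total_cost']:,.0f}")            # total cost: $40,500
print(f"revenue:    ${result['revenue']:,.0f}")               # revenue:    $10,000
print(f"break-even: ${result['break_even_price']:.2f}/user")  # break-even: $40.50/user
```

At these volumes the price would need to be about four times higher, or the calls and tokens per user about four times lower, just to cover the API bill.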
"every token costs money — pricing AI products is hard"
§ 08 / recap
01
the mental model
"Predicts the next token, trained on most of the internet."
← memorize this
02
the four pillars
01
Prompts
02
Context
03
Tools
04
Memory
← every AI product, decomposed
§ 09 / homework

Two things before next session.

These are the only things that matter before Day 2. Don't skip them.

01

Pick an AI product you use. Tell me which pillar it leans on hardest, and where it's weak.

Cursor, ChatGPT, Perplexity, Notion AI, Copilot — anything. Come with an opinion.

02

Find one AI fail in the news. There's at least one every week. Bring it.

Hallucination, bias, cost spiral, UX disaster — any of these count. Bring the link.

next up
Day 2 → Tokens, embeddings, transformers, self-attention.
Math-free version of all the math that runs everything. Temperature, sampling, top-k.
Tokenization deep dive
Embedding space walkthrough
Transformer architecture (visual)
Attention mechanism (no equations)
Sampling: temperature, top-k, top-p