Words as Numbers.
Watch words become vectors in space — and discover that math can capture meaning.
After this lesson, you'll know:
- How words become vectors (lists of numbers)
- Why similar words cluster together in space
- The famous king - man + woman = queen equation
- Why embeddings are the foundation of modern AI
Words have coordinates in a universe of meaning.
AI cannot read words the way you do. It reads numbers. So every word gets converted into a list of numbers that captures its meaning. This list of numbers is called a vector — think of it as GPS coordinates, but instead of pinpointing a place on Earth, they pinpoint a word's meaning in a universe of concepts.
"Cat" might have coordinates like [0.2, 0.8, -0.1, ...] across hundreds of dimensions. A dimension is just one aspect of meaning — one might roughly capture "is it alive?", another "is it big?", another "is it domestic?" The more dimensions, the more nuance the AI can express.
The magic: words that mean similar things end up close together on this map. "Happy," "joyful," and "delighted" are neighbors. "Sad" is far away. The map itself encodes meaning — and AI uses this map to understand language.
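To make "close together" concrete, here is a minimal sketch using cosine similarity, the standard way to measure how aligned two vectors are. The 3-dimensional vectors below are made up for illustration; real embeddings have hundreds of dimensions and are learned, not hand-picked.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity of two vectors: near 1 = same direction, near 0 or negative = unrelated."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional embeddings (illustrative values, not from a real model).
# Dimensions here loosely mean: [positive emotion, intensity, aliveness]
embeddings = {
    "happy":     np.array([ 0.9, 0.7, 0.5]),
    "joyful":    np.array([ 0.8, 0.9, 0.5]),
    "delighted": np.array([ 0.9, 0.8, 0.5]),
    "sad":       np.array([-0.8, 0.6, 0.5]),
}

print(f"happy vs joyful: {cosine_similarity(embeddings['happy'], embeddings['joyful']):.2f}")  # ~0.99
print(f"happy vs sad:    {cosine_similarity(embeddings['happy'], embeddings['sad']):.2f}")     # ~-0.04
```

"Happy" and "joyful" score near 1 because their vectors point the same way; "happy" and "sad" score near zero because they don't.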
Three ways machines have tried to read words.
Before embeddings, researchers tried simpler approaches. Understanding these older methods makes it clear why embeddings are such a breakthrough:
One-hot encoding. Assign each word a unique position in a giant vector. "Cat" = [1, 0, 0, ...], "Dog" = [0, 1, 0, ...]. The problem: every word is equally distant from every other word. "Cat" is no closer to "dog" than to "algebra." And with 100,000 words, each vector is 100,000 numbers long with 99,999 zeros. Wasteful and meaningless.
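A quick sketch shows why one-hot vectors carry no meaning (the three-word vocabulary is just for illustration):

```python
import numpy as np

vocab = ["cat", "dog", "algebra"]  # tiny vocabulary for illustration

def one_hot(word):
    """Each word gets a vector with a single 1 at its own position."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

cat, dog, algebra = one_hot("cat"), one_hot("dog"), one_hot("algebra")

# Every pair of distinct words is exactly the same distance apart:
print(np.linalg.norm(cat - dog))      # 1.414...
print(np.linalg.norm(cat - algebra))  # 1.414... identical, so no meaning is captured
```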
Bag of words. Count how many times each word appears in a document. "The cat sat on the mat" becomes {the: 2, cat: 1, sat: 1, on: 1, mat: 1}. Better than one-hot, but it ignores word order entirely. "Dog bites man" and "man bites dog" have the exact same representation despite meaning completely different things.
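The same idea in a few lines of Python, using the standard library's Counter (a sketch, not any particular library's implementation):

```python
from collections import Counter

def bag_of_words(text):
    """Count word frequencies, discarding word order."""
    return Counter(text.lower().split())

print(bag_of_words("The cat sat on the mat"))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})

# Word order vanishes: opposite meanings, identical representation.
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True
```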
Learned embeddings. Instead of hand-crafting representations, let the AI learn them from data. Train a model on billions of sentences, and words that appear in similar contexts develop similar vectors. "Cat" and "dog" end up near each other because they appear in similar sentences. "Cat" and "algebra" end up far apart. Dense, compact, and meaningful: this is the modern approach.
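This is where the famous king − man + woman ≈ queen arithmetic from the lesson goals comes from. Below is a sketch with hand-picked 4-dimensional toy vectors so the arithmetic is easy to see; a real model learns such regularities from billions of sentences rather than having them hand-coded:

```python
import numpy as np

# Hand-picked toy vectors (illustrative, not learned).
# Dimensions, roughly: [royalty, masculinity, femininity, humanness]
words = {
    "king":  np.array([0.9, 0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1, 0.8]),
    "woman": np.array([0.1, 0.1, 0.9, 0.8]),
    "queen": np.array([0.9, 0.1, 0.9, 0.8]),
}

# Subtract "maleness", add "femaleness", keep "royalty":
result = words["king"] - words["man"] + words["woman"]

# Find the vocabulary word whose vector is closest to the result:
closest = min(words, key=lambda w: np.linalg.norm(words[w] - result))
print(closest)  # queen
```

In real embedding spaces the match is approximate rather than exact, which is why the relationship is usually written with ≈ instead of =.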
Modern AI also uses tokenization — splitting text into subword chunks before embedding:
TOKENIZATION — HOW AI READS TEXT
Word             Tokens               Why
──────────────   ──────────────────   ──────────────────────
"cat"            [cat]                Common word = 1 token
"unbelievable"   [un, believ, able]   Rare word = 3 tokens
"ChatGPT"        [Chat, G, PT]        Brand name = 3 tokens
"123"            [1, 2, 3]            Numbers = 1 token each
" "              [ ]                  Spaces = tokens too
Rule of thumb: 1 token ≈ 4 characters ≈ 0.75 words
100 tokens ≈ 75 words ≈ one short paragraph
A 1-page document ≈ 300-400 tokens
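If you want to see real token counts for yourself, the open-source tiktoken library (used with OpenAI models) can show them. Note that exact splits vary from tokenizer to tokenizer, so your output may differ from the illustrative table above:

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common GPT-era tokenizer

for text in ["cat", "unbelievable", "ChatGPT", "123"]:
    tokens = enc.encode(text)
    print(f"{text!r} -> {len(tokens)} token(s): {[enc.decode([t]) for t in tokens]}")
```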