Words as Numbers.
Watch words become vectors in space — and discover that math can capture meaning.
After this lesson, you'll know:
- How words become vectors (lists of numbers)
- Why similar words cluster together in space
- The famous king - man + woman = queen equation
- Why embeddings are the foundation of modern AI
Words have coordinates in a universe of meaning.
AI cannot read words the way you do. It reads numbers. So every word gets converted into a list of numbers that captures its meaning. This list of numbers is called a vector — think of it as GPS coordinates, but instead of pinpointing a place on Earth, they pinpoint a word's meaning in a universe of concepts.
"Cat" might have coordinates like [0.2, 0.8, -0.1, ...] across hundreds of dimensions. A dimension is just one aspect of meaning — one might roughly capture "is it alive?", another "is it big?", another "is it domestic?" The more dimensions, the more nuance the AI can express.
The magic: words that mean similar things end up close together on this map. "Happy," "joyful," and "delighted" are neighbors. "Sad" is far away. The map itself encodes meaning — and AI uses this map to understand language.
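To make "close together" concrete, here is a minimal sketch using cosine similarity, the standard way to measure how aligned two vectors are. The 3-dimensional vectors below are made up for illustration; real embeddings have hundreds of dimensions and are learned, not hand-picked.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity of two vectors: near 1 = same direction, near 0 or negative = unrelated."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional embeddings (illustrative values, not from a real model).
# Dimensions here loosely mean: [positive emotion, intensity, aliveness]
embeddings = {
    "happy":     np.array([ 0.9, 0.7, 0.5]),
    "joyful":    np.array([ 0.8, 0.9, 0.5]),
    "delighted": np.array([ 0.9, 0.8, 0.5]),
    "sad":       np.array([-0.8, 0.6, 0.5]),
}

print(f"happy vs joyful: {cosine_similarity(embeddings['happy'], embeddings['joyful']):.2f}")  # ~0.99
print(f"happy vs sad:    {cosine_similarity(embeddings['happy'], embeddings['sad']):.2f}")     # ~-0.04
```

"Happy" and "joyful" score near 1 because their vectors point the same way; "happy" and "sad" score near zero because they don't.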
Three ways machines have tried to read words.
Before embeddings, researchers tried simpler approaches. Understanding these older methods makes it clear why embeddings are such a breakthrough:
One-hot encoding. Assign each word a unique position in a giant vector. "Cat" = [1, 0, 0, ...], "Dog" = [0, 1, 0, ...]. The problem: every word is equally distant from every other word. "Cat" is no closer to "dog" than to "algebra." And with 100,000 words, each vector is 100,000 numbers long with 99,999 zeros. Wasteful and meaningless.
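A quick sketch shows why one-hot vectors carry no meaning (the three-word vocabulary is just for illustration):

```python
import numpy as np

vocab = ["cat", "dog", "algebra"]  # tiny vocabulary for illustration

def one_hot(word):
    """Each word gets a vector with a single 1 at its own position."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

cat, dog, algebra = one_hot("cat"), one_hot("dog"), one_hot("algebra")

# Every pair of distinct words is exactly the same distance apart:
print(np.linalg.norm(cat - dog))      # 1.414...
print(np.linalg.norm(cat - algebra))  # 1.414... identical, so no meaning is captured
```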
Bag of words. Count how many times each word appears in a document. "The cat sat on the mat" becomes {the: 2, cat: 1, sat: 1, on: 1, mat: 1}. Better than one-hot, but it ignores word order entirely. "Dog bites man" and "man bites dog" have the exact same representation despite meaning completely different things.
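The same idea in a few lines of Python, using the standard library's Counter (a sketch, not any particular library's implementation):

```python
from collections import Counter

def bag_of_words(text):
    """Count word frequencies, discarding word order."""
    return Counter(text.lower().split())

print(bag_of_words("The cat sat on the mat"))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})

# Word order vanishes: opposite meanings, identical representation.
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True
```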
Learned embeddings. Instead of hand-crafting representations, let the AI learn them from data. Train a model on billions of sentences, and words that appear in similar contexts develop similar vectors. "Cat" and "dog" end up near each other because they appear in similar sentences. "Cat" and "algebra" end up far apart. Dense, compact, and meaningful: this is the modern approach.
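This is where the famous king − man + woman ≈ queen arithmetic from the lesson goals comes from. Below is a sketch with hand-picked 4-dimensional toy vectors so the arithmetic is easy to see; a real model learns such regularities from billions of sentences rather than having them hand-coded:

```python
import numpy as np

# Hand-picked toy vectors (illustrative, not learned).
# Dimensions, roughly: [royalty, masculinity, femininity, humanness]
words = {
    "king":  np.array([0.9, 0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1, 0.8]),
    "woman": np.array([0.1, 0.1, 0.9, 0.8]),
    "queen": np.array([0.9, 0.1, 0.9, 0.8]),
}

# Subtract "maleness", add "femaleness", keep "royalty":
result = words["king"] - words["man"] + words["woman"]

# Find the vocabulary word whose vector is closest to the result:
closest = min(words, key=lambda w: np.linalg.norm(words[w] - result))
print(closest)  # queen
```

In real embedding spaces the match is approximate rather than exact, which is why the relationship is usually written with ≈ instead of =.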
Modern AI also uses tokenization — splitting text into subword chunks before embedding:
TOKENIZATION — HOW AI READS TEXT
Word             Tokens               Why
──────────────   ──────────────────   ──────────────────────
"cat"            [cat]                Common word = 1 token
"unbelievable"   [un, believ, able]   Rare word = 3 tokens
"ChatGPT"        [Chat, G, PT]        Brand name = 3 tokens
"123"            [1, 2, 3]            Numbers = 1 token each
" "              [ ]                  Spaces = tokens too
Rule of thumb: 1 token ≈ 4 characters ≈ 0.75 words
100 tokens ≈ 75 words ≈ one short paragraph
A 1-page document ≈ 300-400 tokens
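If you want to see real token counts for yourself, the open-source tiktoken library (used with OpenAI models) can show them. Note that exact splits vary from tokenizer to tokenizer, so your output may differ from the illustrative table above:

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common GPT-era tokenizer

for text in ["cat", "unbelievable", "ChatGPT", "123"]:
    tokens = enc.encode(text)
    print(f"{text!r} -> {len(tokens)} token(s): {[enc.decode([t]) for t in tokens]}")
```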