What Are Embeddings?
Every word, sentence, and document can become a list of numbers — a vector — that captures its meaning. This is the foundation of everything in RAG.
The core idea: an embedding converts text into a vector (a list of numbers) such that similar meanings end up close together in space. "Happy" and "joyful" land near each other, while "happy" and "database" are far apart.
Why does this matter? Imagine you have a library of 10,000 documents and a user asks a question. You need to find the right answer — fast. That's what RAG (Retrieval-Augmented Generation) does: it fetches the most relevant documents and hands them to an AI. Embeddings are what make that search smart — finding answers by meaning, not just matching keywords.
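The retrieval step can be sketched in a few lines. The document titles and 2D vectors below are made up for illustration; a real system would get its vectors from an embedding model and search them with an approximate nearest-neighbor index, but the ranking logic is the same:

```python
import math

# Toy "embeddings": hand-picked 2D vectors standing in for a model's output.
doc_vectors = {
    "How to reset your password": [0.9, 0.1],
    "Quarterly sales report": [0.1, 0.9],
    "Troubleshooting login errors": [0.8, 0.3],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_vector, k=2):
    """Return the k documents whose vectors point most nearly the same way."""
    ranked = sorted(doc_vectors,
                    key=lambda d: cosine(query_vector, doc_vectors[d]),
                    reverse=True)
    return ranked[:k]

# A query like "I can't log in" would embed near the password/login documents,
# even though it shares no keywords with "How to reset your password".
print(retrieve([0.85, 0.2]))
```

This is search by direction in space rather than by keyword overlap, which is why a query with entirely different wording can still surface the right document.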
Key Concepts
Embedding Dimensions
Real embeddings typically have 768-3072 dimensions. We show 2D for visualization.
OpenAI's text-embedding-3-small uses 1536 dimensions. Each dimension captures a different aspect of meaning — sentiment, topic, formality, etc. More dimensions = more nuance.
Cosine Similarity
Measures how similar two vectors are by checking whether they point in the same direction: 1.0 means the same direction (identical meaning), 0 means orthogonal (unrelated), and -1 means opposite. It ignores length and focuses purely on direction.
cos(A,B) = (A · B) / (||A|| × ||B||). We use the angle, not distance, because it's magnitude-independent. A long and short vector pointing the same direction are still "similar."
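The formula translates directly into code (pure stdlib, no assumptions beyond the definition above):

```python
import math

def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (||A|| x ||B||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Direction matters, magnitude doesn't: a vector and a scaled copy score ~1.0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal -> 0.0
```

Note the division by both norms: that is what makes the score magnitude-independent, so `[1, 2, 3]` and `[2, 4, 6]` count as identical in meaning.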
Semantic Space
The multi-dimensional space where embeddings live. Nearby = related meaning.
Embedding models learn this space from billions of text examples. They discover that "king - man + woman ≈ queen" and similar analogies emerge naturally from the geometry, without ever being taught them explicitly.
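The analogy can be demonstrated with toy vectors. These 3D vectors are hand-made for illustration (real model output would have hundreds of dimensions), chosen so the arithmetic is easy to follow:

```python
# Toy 3D "embeddings": dimensions loosely read as (royalty, masculinity, femininity).
words = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],  # an unrelated word, to make the search non-trivial
}

def nearest(vector, exclude=()):
    """Find the stored word closest (by squared Euclidean distance) to a target."""
    def dist(w):
        return sum((a - b) ** 2 for a, b in zip(vector, words[w]))
    return min((w for w in words if w not in exclude), key=dist)

# king - man + woman: subtract the "male" direction, add the "female" one.
target = [k - m + w for k, m, w in zip(words["king"], words["man"], words["woman"])]
print(nearest(target, exclude={"king", "man", "woman"}))  # -> queen
```

Excluding the input words is the usual convention, since the nearest vector to `king - man + woman` is often `king` itself.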
Embedding Models
Neural networks trained to map text to vectors that preserve meaning.
Popular models: OpenAI text-embedding-3, Cohere embed-v3, BGE, E5. They're trained with contrastive learning — pulling similar texts together and pushing dissimilar texts apart.
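The "pull together, push apart" objective can be sketched as a softmax over similarities. This is a simplified InfoNCE-style loss in pure Python; real training computes it over large batches of learned vectors, but the shape of the pressure is the same:

```python
import math

def contrastive_loss(anchor_sims, positive_index, temperature=0.1):
    """InfoNCE-style loss: -log softmax of the positive pair's similarity.

    anchor_sims: similarities between an anchor text and each candidate,
    where anchor_sims[positive_index] is the matching (positive) text.
    The loss is small when the positive similarity dominates the negatives.
    """
    logits = [s / temperature for s in anchor_sims]
    max_l = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_l) for l in logits]
    return -math.log(exps[positive_index] / sum(exps))

# Positive pair already close (0.9) vs. negatives (0.2, 0.1): small loss.
good = contrastive_loss([0.9, 0.2, 0.1], positive_index=0)
# Positive pair not yet close: large loss, pushing the model to fix it.
bad = contrastive_loss([0.3, 0.2, 0.1], positive_index=0)
print(good < bad)  # -> True
```

Minimizing this loss is what pulls similar texts together and pushes dissimilar texts apart in the embedding space.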