Apple's Foundation Models framework gives you direct access to the on-device large language model that powers Apple Intelligence: a roughly 3-billion-parameter model running on the Neural Engine with a native Swift API. No API keys. No cloud costs. No data leaving the device. Just import the framework and start generating.
We integrated Foundation Models into our production Like One server (a Swift/Vapor application running on an M3 Max) and deployed five AI endpoints in a single session. Text generation in 1.1 seconds. Structured quiz output in 3.8 seconds. Type-safe study notes with flashcards in 4.8 seconds. All on-device, all free. This guide shows you exactly how we did it.
If you are new to building with AI APIs, start with our Claude API guide. If you want to understand how Apple Intelligence compares to cloud models, keep reading — we run both and can tell you exactly where each one wins.
Getting Started
Foundation Models ships with macOS 26, iOS 26, and iPadOS 26. If your Mac runs Apple Silicon and you have Xcode 26 or later, you can start right now. A complete request takes four lines:
import FoundationModels
let session = LanguageModelSession()
let response = try await session.respond(to: "What is retrieval-augmented generation?")
print(response.content)
That is a complete, working program. No API key setup. No dependency installation. No cloud account. The model runs entirely on your device's Neural Engine, optimized for Apple Silicon's unified memory architecture. On our M3 Max, that first response arrives in under 1 second.
Device Requirements
- iPhone 15 Pro or later
- Any M-series iPad
- Any M-series Mac
- Apple Intelligence must be enabled in Settings
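Because any of these requirements can fail at runtime (an older device, Apple Intelligence switched off, or the model still downloading after an update), it is worth checking before you create a session. Here is a minimal sketch built on `SystemLanguageModel.default.availability`; the `availabilityMessage` helper and its message strings are our own illustration, not part of the framework.

```swift
import FoundationModels

/// Illustrative helper: turns the system model's availability into a user-facing message.
@available(macOS 26.0, iOS 26.0, *)
func availabilityMessage(for availability: SystemLanguageModel.Availability) -> String {
    switch availability {
    case .available:
        return "Ready"
    case .unavailable(.deviceNotEligible):
        return "This device does not support Apple Intelligence."
    case .unavailable(.appleIntelligenceNotEnabled):
        return "Turn on Apple Intelligence in Settings to use this feature."
    case .unavailable(.modelNotReady):
        return "The model is still downloading. Try again shortly."
    case .unavailable(let other):
        // Catches any unavailability reasons added in future OS releases.
        return "Apple Intelligence is unavailable: \(other)"
    }
}

if #available(macOS 26.0, iOS 26.0, *) {
    print(availabilityMessage(for: SystemLanguageModel.default.availability))
}
```

In an app, you would typically gate the AI feature's UI on this check and fall back to a non-AI path when the model is unavailable.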
The model is approximately 3 billion parameters — small enough to run on-device but large enough for practical tasks like summarization, classification, entity extraction, and structured output generation. For tasks requiring deeper reasoning or longer context, pair it with a cloud model like Claude.
Structured Output with @Generable
This is Apple Foundation Models' killer feature. The @Generable macro lets you define Swift structs that the model is constrained to produce. Not "asked nicely" — constrained at the token level through a technique called constrained decoding. The output is guaranteed to be valid, parseable, and type-safe.
import FoundationModels
@available(macOS 26.0, *)
@Generable
struct QuizQuestion {
    @Guide(description: "The quiz question")
    var question: String
    @Guide(description: "Exactly 4 answer options")
    var options: [String]
    @Guide(description: "Zero-based index of the correct answer")
    var correctIndex: Int
    @Guide(description: "Why the correct answer is right")
    var explanation: String
}
let session = LanguageModelSession()
let result = try await session.respond(
    to: "Generate a quiz question about the Model Context Protocol.",
    generating: QuizQuestion.self
)
print(result.content.question)
print(result.content.options)
print(result.content.correctIndex)
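Compare this to asking an LLM for free-form JSON output: you get a string that might be valid JSON, might have trailing commas, might wrap itself in markdown code fences, might hallucinate extra fields. With @Generable, you get a Swift struct whose shape is enforced during decoding, and the compiler type-checks every field you read from it. A request can still fail (for example, by exceeding the context window), but it throws an error rather than handing you malformed data.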
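The @Guide macro tells the model what each field should contain. It does more than add documentation; it steers the generation process. A description is a hint: use it to describe expected formats and nudge the model toward the output you need. For hard limits, pass a guide constraint such as .count(4) for arrays, .range(0...3) for integers, or .anyOf([...]) for a fixed set of strings; these are enforced during decoding rather than merely suggested.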
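To make the difference concrete, here is a stricter variant of the quiz type above, where the option count and the index range are guide constraints instead of prose hints. The `StrictQuizQuestion` name and the `isWellFormedQuiz` helper are hypothetical. The helper is a cheap shape check for quiz data that reaches your code some other way, such as from storage or from a different model tier.

```swift
import FoundationModels

@available(macOS 26.0, iOS 26.0, *)
@Generable
struct StrictQuizQuestion {
    @Guide(description: "The quiz question")
    var question: String

    // .count(4) is enforced during decoding, not merely suggested.
    @Guide(description: "Plausible answer options", .count(4))
    var options: [String]

    // .range keeps the index inside the four options.
    @Guide(description: "Zero-based index of the correct answer", .range(0...3))
    var correctIndex: Int

    @Guide(description: "Why the correct answer is right")
    var explanation: String
}

/// Hypothetical sanity check for quiz data arriving from storage or another backend.
/// It validates shape only; whether the marked answer is actually correct
/// still needs human or model review.
func isWellFormedQuiz(options: [String], correctIndex: Int) -> Bool {
    options.count == 4 && options.indices.contains(correctIndex)
}
```

Constraints guarantee structure, not truth: the model can still mark the wrong option as correct, so factual review remains part of the pipeline.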
Nested Structures
@Generable supports nested types, arrays, and complex hierarchies:
@available(macOS 26.0, *)
@Generable
struct StudyNotes {
    @Guide(description: "A 2-3 sentence overview")
    var overview: String
    @Guide(description: "Key concepts explained simply")
    var concepts: [String]
    @Guide(description: "Flashcard pairs for review")
    var flashcards: [Flashcard]
}
@available(macOS 26.0, *)
@Generable
struct Flashcard {
    var term: String
    var definition: String
}
This is the pattern we use in production. Our /api/v1/ai/notes endpoint takes lesson content and returns structured study notes with an overview, concept list, and flashcard deck — all type-safe, all on-device, all in under 5 seconds.
Tool Calling
Foundation Models supports tool calling — the same pattern used in agentic loops with cloud models. You define tools that the model can invoke when it needs external information or wants to take an action.
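import FoundationModels
@available(macOS 26.0, *)
struct SearchCourses: Tool {
    let name = "searchCourses"
    let description = "Search for courses by topic"
    // Arguments are a @Generable type, so the model fills them in
    // with the same constrained decoding used for structured output.
    @Generable
    struct Arguments {
        @Guide(description: "The search query")
        var query: String
    }
    func call(arguments: Arguments) async throws -> String {
        // CourseProvider is our app's own course catalog.
        let results = CourseProvider().allCourses()
            .filter { $0.title.lowercased().contains(arguments.query.lowercased()) }
        return results.map { $0.title }.joined(separator: ", ")
    }
}
// Register tools when creating the session; the model decides when to call them.
let session = LanguageModelSession(tools: [SearchCourses()])
let answer = try await session.respond(to: "Which courses cover agentic loops?")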
Tool calling on-device means your AI features can work offline (as long as the tools themselves don't need the network), respond quickly, and never expose user data to external services. This is particularly powerful for accessibility tools, education apps, and enterprise applications where data privacy is non-negotiable.
Vapor Server Integration
If you run a Swift server — Vapor, Hummingbird, or any Swift backend — you can import Foundation Models directly. No HTTP API layer, no separate inference server, no Ollama. Just native Swift calling the Neural Engine.
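import Vapor
import FoundationModels
struct GenerateRequest: Content { let prompt: String }
struct GenerateResponse: Content { let output: String }
struct AIController: RouteCollection {
    func boot(routes: RoutesBuilder) throws {
        let ai = routes.grouped("api", "v1", "ai")
        ai.post("generate", use: generate)
        ai.post("summarize", use: summarize)
    }
    @Sendable
    func generate(req: Request) async throws -> GenerateResponse {
        guard #available(macOS 26.0, *), SystemLanguageModel.default.isAvailable else {
            throw Abort(.serviceUnavailable)
        }
        let body = try req.content.decode(GenerateRequest.self)
        // One session per request: a session processes one request at a time.
        let session = LanguageModelSession()
        let result = try await session.respond(to: body.prompt)
        return GenerateResponse(output: result.content)
    }
    @Sendable
    func summarize(req: Request) async throws -> GenerateResponse {
        guard #available(macOS 26.0, *), SystemLanguageModel.default.isAvailable else {
            throw Abort(.serviceUnavailable)
        }
        let body = try req.content.decode(GenerateRequest.self)
        let session = LanguageModelSession(instructions: "Summarize the text in 3 sentences.")
        let result = try await session.respond(to: body.prompt)
        return GenerateResponse(output: result.content)
    }
}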
We run this exact pattern in production on likeone.ai. Five AI endpoints powered by Apple's Neural Engine, serving structured quiz questions, study notes, lesson summaries, and AI-powered course search. Zero API cost. The model runs on the same M3 Max that serves our website.
When to Use Foundation Models vs. Cloud Models
After running both Apple Foundation Models and Ollama (with qwen3 and llama models) on the same hardware, here is what we learned:
Apple Foundation Models Wins At
- Speed. 1.1 seconds for text generation vs. 10+ seconds for Ollama's 14B model. The Neural Engine is optimized for this exact workload.
- Structured output. @Generable guarantees valid types. No JSON parsing, no validation, no retry loops. This alone justifies the framework for any task that needs reliable structured data.
- Zero configuration. No model downloads, no GGUF files, no quantization decisions. Import and generate.
- Privacy. On-device by default. Private Cloud Compute for heavier tasks. Apple's security architecture means the data never touches a general-purpose server.
Cloud Models Win At
- Complex reasoning. Claude Opus and Sonnet handle multi-step reasoning, code generation, and nuanced analysis that exceeds a 3B model's capability. See our model comparison.
- Long context. Foundation Models supports 4,096 tokens. Claude supports 200K (and Opus up to 1M). For document analysis, conversation history, or code review, cloud models are necessary.
- Embeddings. Apple does not offer an embedding model through Foundation Models. For RAG systems and semantic search, you still need a separate embedding model.
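- Fine-grained control. Cloud APIs offer temperature, top-p, system prompts, and conversation management. Foundation Models covers the basics (instructions in place of a system prompt, temperature, greedy or top-k/top-p sampling, a response-length cap, and a per-session transcript) but with fewer knobs overall.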
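For reference, here is roughly what those knobs look like in Swift: instructions in place of a system prompt, `GenerationOptions` for sampling and length, and a specific error when a request overflows the 4,096-token window. This is a minimal sketch; `classifyDifficulty` and its prompt are hypothetical, and returning `nil` on overflow is just one way to tell the caller to send the task to another tier.

```swift
import FoundationModels

/// Hypothetical example task: label lesson text by difficulty.
/// Returns nil when the input does not fit the on-device context window.
@available(macOS 26.0, iOS 26.0, *)
func classifyDifficulty(_ text: String) async throws -> String? {
    // Instructions act like a system prompt for every turn in this session.
    let session = LanguageModelSession(
        instructions: "Classify the text as beginner, intermediate, or advanced. Answer with one word."
    )
    // Greedy sampling always picks the most likely token, so the same input
    // yields the same label; maximumResponseTokens caps the reply length.
    let options = GenerationOptions(sampling: .greedy, maximumResponseTokens: 8)
    do {
        let response = try await session.respond(to: text, options: options)
        return response.content
    } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
        // Instructions + prompt + response exceeded the 4,096-token window.
        return nil
    }
}
```

Use `GenerationOptions(temperature:)` instead of greedy sampling when you want more varied output, such as for alternative quiz phrasings.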
The Hybrid Architecture
The best production systems use both. We route tasks based on their requirements:
- Apple FM: Quiz generation, classification, entity extraction, lesson summarization, course search — anything that needs fast structured output.
- Ollama (local): Embeddings, complex reasoning, reranking, long-context tasks — anything that needs a larger model or vector output.
- Claude API: Code generation, deep analysis, multi-step reasoning, content creation — anything where quality matters more than latency.
This three-tier architecture gives you the speed of on-device inference, the depth of local large models, and the intelligence of frontier cloud models. Each tier handles what it does best.
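The routing rule itself fits in a few lines of plain Swift. The sketch below is a hypothetical version, not our production router: the `TaskKind` cases mirror the lists above, and the 1,000-token reserve is an assumed value that leaves room in Apple FM's 4,096-token window for instructions and the response.

```swift
// Hypothetical task categories mirroring the three tiers above.
enum TaskKind {
    case quiz, classification, entityExtraction, summarization, courseSearch
    case embedding, reranking, complexReasoning
    case codeGeneration, deepAnalysis, contentCreation
}

enum Backend {
    case appleFoundationModels  // on-device: fast structured output
    case ollama                 // local: embeddings, larger models, long context
    case claude                 // cloud: quality over latency
}

struct AITask {
    let kind: TaskKind
    let estimatedPromptTokens: Int
}

/// Assumed budget: the on-device window must also hold instructions and
/// the response, so 1,000 tokens are reserved by default.
func route(_ task: AITask, onDeviceBudget: Int = 4_096 - 1_000) -> Backend {
    switch task.kind {
    case .quiz, .classification, .entityExtraction, .summarization, .courseSearch:
        // Inputs that would overflow the on-device window go to a local large model.
        return task.estimatedPromptTokens <= onDeviceBudget ? .appleFoundationModels : .ollama
    case .embedding, .reranking, .complexReasoning:
        return .ollama
    case .codeGeneration, .deepAnalysis, .contentCreation:
        return .claude
    }
}

print(route(AITask(kind: .quiz, estimatedPromptTokens: 800)))  // prints "appleFoundationModels"
```

Keeping the decision in one pure function makes it easy to unit-test and to adjust the budget once you have real token counts from production traffic.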
Python SDK
Apple also provides a Python SDK for Foundation Models on macOS:
pip install apple-fm-sdk
import apple_fm_sdk as fm
import asyncio
async def main():
    model = fm.SystemLanguageModel()
    available, reason = model.is_available()
    if available:
        session = fm.LanguageModelSession()
        response = await session.respond("What is an agentic loop?")
        print(response)
asyncio.run(main())
The Python SDK is useful for evaluation, testing, and integration with Python-based ML pipelines. It accesses the same on-device model as the Swift framework. We use it in our brain infrastructure alongside our existing embedding pipeline.
Custom Adapter Training
For specialized use cases, Apple provides a LoRA adapter training toolkit. You can fine-tune the on-device model with your own data to create a domain expert — an AI that knows your product, your terminology, your style.
The training process uses Low-Rank Adaptation (LoRA), the same technique we cover in our LoRA scaling factor guide. Each adapter is approximately 160MB and loads alongside the base model at runtime.
Requirements: a Mac with Apple Silicon and at least 32GB of RAM (or Linux with a GPU), 100-5,000 training samples in JSONL format, and the Foundation Models adapter training toolkit from Apple.
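At runtime, a trained adapter is loaded by wrapping it in a `SystemLanguageModel` and passing that model to a session. A minimal sketch, assuming you have exported an adapter file from Apple's toolkit; `makeAdapterSession` and its `adapterURL` parameter are hypothetical names. As we understand Apple's documentation, each adapter is tied to a specific base-model version, so plan to retrain when the system model updates, and check the adapter docs for the entitlement required to ship adapters in an app.

```swift
import Foundation
import FoundationModels

/// Hypothetical helper: builds a session backed by a custom LoRA adapter.
/// `adapterURL` points at an adapter file exported by Apple's training toolkit.
@available(macOS 26.0, iOS 26.0, *)
func makeAdapterSession(adapterURL: URL) throws -> LanguageModelSession {
    // Throws if the adapter cannot be loaded (for example, a missing file).
    let adapter = try SystemLanguageModel.Adapter(fileURL: adapterURL)
    let model = SystemLanguageModel(adapter: adapter)
    return LanguageModelSession(model: model)
}
```

Everything else (`respond`, @Generable types, tools) works the same on the adapted session.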
We are training a custom adapter from our 53 academy courses — over 500 lessons of AI education content. The goal is an on-device model that is an expert in AI architecture, MCP, agentic systems, and persistent memory — specialized knowledge that the base 3B model does not have.
App Intents: Making Your App Siri-Ready
Apple deprecated SiriKit at WWDC 2026. App Intents is now the only way Siri talks to your app. Without App Intents, your app is invisible in an AI-first operating system.
The integration has two parts: entity schemas (what your app contains) and intent schemas (what users can do).
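import AppIntents
struct CourseEntity: AppEntity {
    static let typeDisplayRepresentation = TypeDisplayRepresentation(name: "Course")
    static let defaultQuery = CourseQuery()
    var id: String
    var title: String
    var displayRepresentation: DisplayRepresentation {
        DisplayRepresentation(title: "\(title)")
    }
}
// Lets Siri and Shortcuts resolve stored course IDs back into entities.
struct CourseQuery: EntityQuery {
    func entities(for identifiers: [CourseEntity.ID]) async throws -> [CourseEntity] {
        // Look the IDs up in your own data store; empty here as a placeholder.
        []
    }
}
struct SearchCoursesIntent: AppIntent {
    static let title: LocalizedStringResource = "Search Courses"
    static let openAppWhenRun = true
    @Parameter(title: "Search query")
    var query: String
    func perform() async throws -> some IntentResult & ProvidesDialog {
        // Replace with your real search; `matches` is a placeholder.
        let matches: [CourseEntity] = []
        return .result(dialog: "Found \(matches.count) courses matching \(query).")
    }
}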
Once your app exposes entities and intents, Siri can find your content, users can create Shortcuts automations, and Spotlight indexes your data. This is not optional in iOS 27 — it is foundational to how users will discover and interact with apps.
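For Siri and Spotlight to surface an intent before the user has built any shortcut, the app can also declare App Shortcuts with spoken phrases. A minimal sketch using `AppShortcutsProvider`; `OpenCoursesIntent` is a hypothetical stand-in kept small so the snippet compiles on its own, and the phrase and SF Symbol name are examples.

```swift
import AppIntents

// Hypothetical minimal intent used only to keep this example self-contained.
struct OpenCoursesIntent: AppIntent {
    static let title: LocalizedStringResource = "Open Courses"
    static let openAppWhenRun = true

    func perform() async throws -> some IntentResult {
        .result()
    }
}

// Declares spoken phrases; every phrase must include the app name token.
struct CourseShortcuts: AppShortcutsProvider {
    static var appShortcuts: [AppShortcut] {
        AppShortcut(
            intent: OpenCoursesIntent(),
            phrases: ["Open courses in \(.applicationName)"],
            shortTitle: "Open Courses",
            systemImageName: "book"
        )
    }
}
```

The system registers these shortcuts automatically when the app is installed, so no user setup is needed before Siri recognizes the phrase.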
Private Cloud Compute
For tasks that exceed the on-device model's capabilities, Private Cloud Compute provides access to larger Apple Foundation Models running on Apple's secure servers. The privacy guarantees are remarkable: your data is processed but never stored, never accessible to Apple, and never used for training.
If you are enrolled in the App Store Small Business Program with fewer than 2 million first-time downloads, Private Cloud Compute access is free. This means small developers get cloud-scale AI with no API costs — a significant advantage over cloud providers that charge per token.
Building with Both Apple and Claude
Apple Intelligence and Claude are not competitors in your architecture — they are complementary layers. Apple handles the fast, private, structured tasks. Claude handles the deep, nuanced, long-context tasks. Together they create AI applications that are both responsive and intelligent.
This is the architecture we run at Like One: Apple Foundation Models for on-device inference in our Vapor server and iOS app, Ollama for local embeddings and reasoning, and the Claude API for content generation and complex analysis. Three tiers, each doing what it does best, all orchestrated from a single M3 Max.
For more on the Claude side of this architecture, read our Agent SDK tutorial. For building the retrieval layer that connects them, see our RAG guide. And if you want to certify your skills across both Apple and Claude architectures, check our CCA exam prep guide.
Need Help Integrating Apple Intelligence?
From Foundation Models to App Intents to hybrid architectures — our consulting team builds AI-native iOS and macOS applications. We ship code, not slide decks.