What is the Apple Foundation Models framework?

The Foundation Models framework gives Swift developers direct access to Apple's on-device 3-billion parameter language model. It supports text generation, structured output via the @Generable macro, tool calling, and image input. The model runs on the Neural Engine with no API keys, no cloud costs, and no data leaving the device.

Which devices support Apple Foundation Models?

iPhone 15 Pro and later, any iPad with an M-series chip (plus iPad mini with A17 Pro), and any Mac with Apple Silicon. The device must run iOS 26, iPadOS 26, or macOS 26 or later, and Apple Intelligence must be enabled in Settings.

Is Apple Foundation Models free to use?

Yes. On-device inference is completely free with no usage limits. Private Cloud Compute is also free for developers enrolled in the App Store Small Business Program with fewer than 2 million first-time downloads. There are no per-token charges.

What is @Generable in Apple Foundation Models?

@Generable is a Swift macro that constrains the model to produce output matching your Swift struct definition. Unlike asking a model for JSON and hoping it is valid, @Generable uses constrained decoding at the token level to guarantee structurally correct, type-safe output every time. It is the framework's most powerful feature.

Can I use Foundation Models in a Vapor server?

Yes. Import FoundationModels in any Swift server application running on macOS 26 with Apple Silicon. We run five AI endpoints in production on a Vapor server powered by an M3 Max. The model runs on the Neural Engine alongside the web server with no separate inference process needed.

How does Apple Foundation Models compare to Ollama?

Apple Foundation Models is faster for simple tasks (1.1s vs 10+ seconds for Ollama's 14B model on the same M3 Max) and provides guaranteed structured output via @Generable. Ollama supports larger models (up to 70B+), longer context windows (128K+ vs 4K), and embeddings (which Apple does not offer). The best architecture uses both: Apple FM for fast structured output and Ollama for embeddings and complex reasoning.

Can I fine-tune Apple Foundation Models?

Yes, through LoRA adapter training. Apple provides a Python toolkit for training rank-32 adapters from your own data. Each adapter is approximately 160MB. Training requires a Mac with 32GB+ RAM or a Linux GPU machine, and 100-5,000 training samples in JSONL format.

Do I need App Intents for Apple Intelligence?

Yes. Apple deprecated SiriKit at WWDC 2026 and made App Intents mandatory for Siri integration in iOS 27. Without App Intents, your app is invisible to Siri, cannot participate in Shortcuts automations, and misses Spotlight semantic indexing. Every iOS app should adopt App Intents now.

Can Apple Foundation Models replace Claude or ChatGPT?

For simple tasks like classification, summarization, and structured output, yes — and it is faster and free. For complex reasoning, long documents, code generation, and nuanced analysis, cloud models like Claude are significantly more capable. The best approach is a hybrid architecture that uses each model for what it does best.

What is Private Cloud Compute?

Private Cloud Compute is Apple's secure server infrastructure for running larger Foundation Models. Your data is processed but never stored, never accessible to Apple, and never used for training. It provides cloud-scale AI while maintaining Apple's privacy guarantees. Free for Small Business Program developers.

Apple Foundation Models: Swift Guide

Build AI features with Apple's on-device LLM. Foundation Models framework, @Generable output, tool calling, and Vapor integration.

Apple's Foundation Models framework gives you direct access to the on-device large language model that powers Apple Intelligence — a 3-billion parameter model running on the Neural Engine with a native Swift API. No API keys. No cloud costs. No data leaving the device. Just import the framework and start generating.

We integrated Foundation Models into our production Like One server — a Swift/Vapor application running on M3 Max — and deployed five AI endpoints in a single session. Text generation in 1.1 seconds. Structured quiz output in 3.8 seconds. Type-safe study notes with flashcards in 4.8 seconds. All on-device, all free. This guide shows you exactly how we did it.

If you are new to building with AI APIs, start with our Claude API guide. If you want to understand how Apple Intelligence compares to cloud models, keep reading — we run both and can tell you exactly where each one wins.

Getting Started

Foundation Models ships with macOS 26, iOS 26, and iPadOS 26. If your Mac runs Apple Silicon and you have Xcode 26+, you can start right now. A first request takes only a few lines:

import FoundationModels

let session = LanguageModelSession()
let response = try await session.respond(to: "What is retrieval-augmented generation?")
print(response.content)

That is a complete, working program. No API key setup. No dependency installation. No cloud account. The model runs entirely on your device's Neural Engine, optimized for Apple Silicon's unified memory architecture. On our M3 Max, that first response arrives in under 1 second.

Device Requirements

iPhone 15 Pro or later
Any M-series iPad (or iPad mini with A17 Pro)
Any M-series Mac
iOS 26, iPadOS 26, or macOS 26 or later
Apple Intelligence must be enabled in Settings
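Because any of these requirements can fail at runtime, it helps to check availability before creating a session. The sketch below reads SystemLanguageModel.default.availability and maps it to a user-facing message. ModelStatus, userMessage(for:), and currentModelStatus() are names we made up for illustration, and the message wording is an assumption. The framework-specific part sits behind #if canImport so the mapping logic still compiles and tests on hosts without the SDK.

```swift
#if canImport(FoundationModels)
import FoundationModels
#endif

/// App-level status (hypothetical type), decoupled from the framework
/// so the messaging logic can be unit tested anywhere.
enum ModelStatus: Equatable {
    case ready
    case deviceNotEligible
    case appleIntelligenceOff
    case modelDownloading
    case unknown(String)
}

/// Maps a status to a message the user can act on.
func userMessage(for status: ModelStatus) -> String {
    switch status {
    case .ready:
        return "On-device model is ready."
    case .deviceNotEligible:
        return "This device does not support Apple Intelligence."
    case .appleIntelligenceOff:
        return "Turn on Apple Intelligence in Settings to use AI features."
    case .modelDownloading:
        return "The model is still downloading. Try again shortly."
    case .unknown(let detail):
        return "Model unavailable: \(detail)"
    }
}

#if canImport(FoundationModels)
/// Bridges the framework's availability value to the app-level status.
@available(macOS 26.0, iOS 26.0, *)
func currentModelStatus() -> ModelStatus {
    switch SystemLanguageModel.default.availability {
    case .available:
        return .ready
    case .unavailable(.deviceNotEligible):
        return .deviceNotEligible
    case .unavailable(.appleIntelligenceNotEnabled):
        return .appleIntelligenceOff
    case .unavailable(.modelNotReady):
        return .modelDownloading
    case .unavailable(let reason):
        // Any reason added in a later SDK lands here.
        return .unknown(String(describing: reason))
    @unknown default:
        return .unknown("unrecognized availability state")
    }
}
#endif
```

Calling currentModelStatus() at launch and hiding AI features unless it returns .ready avoids surfacing a thrown error on the first request.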

The model is approximately 3 billion parameters — small enough to run on-device but large enough for practical tasks like summarization, classification, entity extraction, and structured output generation. For tasks requiring deeper reasoning or longer context, pair it with a cloud model like Claude.

Structured Output with @Generable

This is Apple Foundation Models' killer feature. The @Generable macro lets you define Swift structs that the model is constrained to produce. Not "asked nicely" — constrained at the token level through a technique called constrained decoding. The output is guaranteed to be valid, parseable, and type-safe.

import FoundationModels

@available(macOS 26.0, *)
@Generable
struct QuizQuestion {
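    @Guide(description: "The quiz question")
    var question: String
    // .count(4) makes the option count a hard constraint, not just a hint
    @Guide(description: "Exactly 4 answer options", .count(4))
    var options: [String]
    // .range keeps the index inside the four options
    @Guide(description: "Zero-based index of the correct answer", .range(0...3))
    var correctIndex: Int
    @Guide(description: "Why the correct answer is right")
    var explanation: String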
}

let session = LanguageModelSession()
let result = try await session.respond(
    to: "Generate a quiz question about the Model Context Protocol.",
    generating: QuizQuestion.self
)

print(result.content.question)
print(result.content.options)
print(result.content.correctIndex)

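Compare this to prompting an LLM for JSON without schema enforcement: you get a string that might be valid JSON, might have trailing commas, might wrap itself in markdown code fences, might hallucinate extra fields. With @Generable, you get a populated Swift struct every time generation succeeds. The guarantee covers structure and types, and it comes from constrained decoding at runtime rather than from the compiler. It does not cover meaning: a description such as "Exactly 4 answer options" is only a hint unless you back it with a hard guide such as .count(4). The call can also still throw, for example on a guardrail violation or when the context window is exceeded.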

The @Guide macro provides hints to the model about what each field should contain. It does not just add documentation — it steers the generation process. Use it to constrain string values, describe expected formats, and guide the model toward the output you need.
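Since description guides steer the model rather than bind it, a cheap semantic check after generation is a reasonable safety net. The sketch below is one way to do that in plain Swift. QuizQuestionValues is a hypothetical mirror of the generated fields (so the check compiles without the framework), and the specific rules are our assumptions about what a valid quiz question looks like.

```swift
/// Plain mirror of the generated fields (hypothetical type for illustration).
struct QuizQuestionValues {
    var question: String
    var options: [String]
    var correctIndex: Int
    var explanation: String
}

enum QuizValidationError: Error, Equatable {
    case emptyQuestion
    case wrongOptionCount(Int)
    case duplicateOptions
    case correctIndexOutOfRange(Int)
}

/// Returns every semantic problem found; an empty array means the value is usable.
func validate(_ quiz: QuizQuestionValues, expectedOptions: Int = 4) -> [QuizValidationError] {
    var problems: [QuizValidationError] = []

    // An empty or whitespace-only question is structurally valid but useless.
    if quiz.question.allSatisfy({ $0.isWhitespace }) {
        problems.append(.emptyQuestion)
    }
    if quiz.options.count != expectedOptions {
        problems.append(.wrongOptionCount(quiz.options.count))
    }
    // Case-insensitive duplicate detection.
    if Set(quiz.options.map { $0.lowercased() }).count != quiz.options.count {
        problems.append(.duplicateOptions)
    }
    if !quiz.options.indices.contains(quiz.correctIndex) {
        problems.append(.correctIndexOutOfRange(quiz.correctIndex))
    }
    return problems
}
```

If validate returns problems, regenerating once is usually cheaper than trying to repair the value by hand.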

Nested Structures

@Generable supports nested types, arrays, and complex hierarchies:

@available(macOS 26.0, *)
@Generable
struct StudyNotes {
    @Guide(description: "A 2-3 sentence overview")
    var overview: String
    @Guide(description: "Key concepts explained simply")
    var concepts: [String]
    @Guide(description: "Flashcard pairs for review")
    var flashcards: [Flashcard]
}

@available(macOS 26.0, *)
@Generable
struct Flashcard {
    var term: String
    var definition: String
}

This is the pattern we use in production. Our /api/v1/ai/notes endpoint takes lesson content and returns structured study notes with an overview, concept list, and flashcard deck — all type-safe, all on-device, all in under 5 seconds.
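To make the endpoint description concrete, here is a minimal sketch of the generation step behind such a route. It is not our production code. It assumes the generated types are mapped to separate Codable types before they go over HTTP, which keeps the wire format independent of the model schema. The names StudyNotesDTO, FlashcardDTO, GeneratedNotes, GeneratedCard, notesPrompt, and generateNotes are all invented for this example, and the prompt wording is illustrative.

```swift
import Foundation
#if canImport(FoundationModels)
import FoundationModels
#endif

// Wire format returned to HTTP clients (hypothetical names).
struct FlashcardDTO: Codable, Equatable {
    let term: String
    let definition: String
}

struct StudyNotesDTO: Codable, Equatable {
    let overview: String
    let concepts: [String]
    let flashcards: [FlashcardDTO]
}

/// Builds the prompt sent to the model. The wording is illustrative.
func notesPrompt(forLesson lesson: String) -> String {
    "Write study notes for the following lesson.\n\n" + lesson
}

#if canImport(FoundationModels)
@available(macOS 26.0, iOS 26.0, *)
@Generable
struct GeneratedCard {
    var term: String
    var definition: String
}

@available(macOS 26.0, iOS 26.0, *)
@Generable
struct GeneratedNotes {
    @Guide(description: "A 2-3 sentence overview")
    var overview: String
    @Guide(description: "Key concepts explained simply")
    var concepts: [String]
    @Guide(description: "Flashcard pairs for review")
    var flashcards: [GeneratedCard]
}

/// Generates notes on-device and converts them to the Codable wire format.
@available(macOS 26.0, iOS 26.0, *)
func generateNotes(forLesson lesson: String) async throws -> StudyNotesDTO {
    let session = LanguageModelSession()
    let notes = try await session.respond(
        to: notesPrompt(forLesson: lesson),
        generating: GeneratedNotes.self
    ).content
    return StudyNotesDTO(
        overview: notes.overview,
        concepts: notes.concepts,
        flashcards: notes.flashcards.map {
            FlashcardDTO(term: $0.term, definition: $0.definition)
        }
    )
}
#endif
```

A route handler can then encode the returned StudyNotesDTO with any JSON encoder.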

Tool Calling

Foundation Models supports tool calling — the same pattern used in agentic loops with cloud models. You define tools that the model can invoke when it needs external information or wants to take an action.

@available(macOS 26.0, *)
struct SearchCourses: Tool {
    let name = "searchCourses"
    let description = "Search for courses by topic"

    // The Tool protocol receives its inputs through a nested Arguments type
    @Generable
    struct Arguments {
        @Guide(description: "The search query")
        var query: String
    }

    func call(arguments: Arguments) async throws -> String {
        let query = arguments.query.lowercased()
        let results = CourseProvider().allCourses()
            .filter { $0.title.lowercased().contains(query) }
        return results.map { $0.title }.joined(separator: ", ")
    }
}

Tool calling on-device means your AI features work offline, respond without a network round trip, and never expose user data to external services. This is particularly powerful for accessibility tools, education apps, and enterprise applications where data privacy is non-negotiable.
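A tool does nothing until it is attached to a session. The sketch below shows one way to wire that up by passing the tool to LanguageModelSession(tools:instructions:). CourseSearchTool, matchingTitles, and recommendCourse are hypothetical names, the catalog is passed in as plain titles to keep the example self-contained, and the instruction text is an assumption.

```swift
import Foundation
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Pure search helper (hypothetical), kept outside the framework so it is unit-testable.
func matchingTitles(in titles: [String], query: String) -> [String] {
    let needle = query.lowercased()
    return titles.filter { $0.lowercased().contains(needle) }
}

#if canImport(FoundationModels)
@available(macOS 26.0, iOS 26.0, *)
struct CourseSearchTool: Tool {
    let name = "searchCourses"
    let description = "Search the course catalog by topic"
    let catalog: [String]

    @Generable
    struct Arguments {
        @Guide(description: "The search query")
        var query: String
    }

    // The model decides when to call this and fills in the arguments.
    func call(arguments: Arguments) async throws -> String {
        let hits = matchingTitles(in: catalog, query: arguments.query)
        return hits.isEmpty ? "No matching courses." : hits.joined(separator: ", ")
    }
}

/// Creates a session that can call the tool, then asks a question.
@available(macOS 26.0, iOS 26.0, *)
func recommendCourse(for question: String, catalog: [String]) async throws -> String {
    let session = LanguageModelSession(
        tools: [CourseSearchTool(catalog: catalog)],
        instructions: "Use the searchCourses tool to find courses before answering."
    )
    return try await session.respond(to: question).content
}
#endif
```

Keeping the search logic in a plain function means the tool itself stays a thin adapter, and the part most likely to have bugs can be tested without a model.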

Vapor Server Integration

If you run a Swift server (Vapor, Hummingbird, or any Swift backend) on macOS 26 with Apple Silicon, you can import Foundation Models directly. No HTTP API layer, no separate inference server, no Ollama. Just native Swift calling the Neural Engine.

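import Vapor
import FoundationModels

// Request body shared by both routes
struct GenerateRequest: Content {
    let prompt: String
}

struct AIController: RouteCollection {
    func boot(routes: RoutesBuilder) throws {
        let ai = routes.grouped("api", "v1", "ai")
        ai.post("generate", use: generate)
        ai.post("summarize", use: summarize)
    }

    @Sendable
    func generate(req: Request) async throws -> Response {
        // Fail fast on hosts that predate the framework
        guard #available(macOS 26.0, *) else {
            throw Abort(.serviceUnavailable)
        }
        let body = try req.content.decode(GenerateRequest.self)
        // A fresh session per request keeps conversations isolated
        let session = LanguageModelSession()
        let result = try await session.respond(to: body.prompt)
        let res = Response(status: .ok)
        try res.content.encode(["output": result.content])
        return res
    }

    // Minimal stand-in for the summarize handler, which is not shown here
    @Sendable
    func summarize(req: Request) async throws -> Response {
        guard #available(macOS 26.0, *) else {
            throw Abort(.serviceUnavailable)
        }
        let body = try req.content.decode(GenerateRequest.self)
        let session = LanguageModelSession(instructions: "Summarize the user's text in 2-3 sentences.")
        let result = try await session.respond(to: body.prompt)
        let res = Response(status: .ok)
        try res.content.encode(["summary": result.content])
        return res
    }
}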

We run this exact pattern in production on likeone.ai. Five AI endpoints powered by Apple's Neural Engine, serving structured quiz questions, study notes, lesson summaries, and AI-powered course search. Zero API cost. The model runs on the same M3 Max that serves our website.

When to Use Foundation Models vs. Cloud Models

After running both Apple Foundation Models and Ollama (with qwen3 and llama models) on the same hardware, here is what we learned:

Apple Foundation Models Wins At

Speed. 1.1 seconds for text generation vs. 10+ seconds for Ollama's 14B model. The Neural Engine is optimized for this exact workload.
Structured output. @Generable guarantees valid types. No JSON parsing, no schema validation, no retry loops for malformed output. This alone justifies the framework for any task that needs reliable structured data.
Zero configuration. No model downloads, no GGUF files, no quantization decisions. Import and generate.
Privacy. On-device by default. Private Cloud Compute for heavier tasks. Apple's security architecture means the data never touches a general-purpose server.

Cloud Models Win At

Complex reasoning. Claude Opus and Sonnet handle multi-step reasoning, code generation, and nuanced analysis that exceeds a 3B model's capability. See our model comparison.
Long context. Foundation Models supports 4,096 tokens. Claude supports 200K (and Opus up to 1M). For document analysis, conversation history, or code review, cloud models are necessary.
Embeddings. Apple does not offer an embedding model through Foundation Models. For RAG systems and semantic search, you still need a separate embedding model.
Fine-grained control. Cloud APIs offer temperature, top-p, system prompts, and conversation management. Foundation Models exposes session instructions and GenerationOptions (temperature, sampling mode, maximum response tokens), but fewer knobs overall.
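The 4,096-token limit is the constraint we hit most often in practice, so it is worth budgeting for it before sending a prompt. The sketch below splits long input on paragraph boundaries so each chunk stays under a token budget. It is a rough approach under a loud assumption: the four-characters-per-token ratio is a heuristic for English text, not the model's tokenizer. estimatedTokens and chunk are hypothetical helpers.

```swift
import Foundation

/// Rough token estimate. ASSUMPTION: about 4 characters per token for English;
/// this is a heuristic, not the model's real tokenizer.
func estimatedTokens(_ text: String, charsPerToken: Int = 4) -> Int {
    (text.count + charsPerToken - 1) / charsPerToken
}

/// Greedily packs paragraphs into chunks that fit the token budget.
/// A single paragraph larger than the budget is kept whole as its own chunk.
func chunk(_ text: String, maxTokens: Int) -> [String] {
    var chunks: [String] = []
    var current = ""

    for paragraph in text.components(separatedBy: "\n\n") {
        let candidate = current.isEmpty ? paragraph : current + "\n\n" + paragraph
        if estimatedTokens(candidate) <= maxTokens {
            current = candidate
        } else {
            if !current.isEmpty { chunks.append(current) }
            current = paragraph
        }
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}
```

Leave headroom when choosing maxTokens: the window has to hold the instructions, the prompt, and the generated response together, so a budget well below 4,096 for the input is the safer choice.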

The Hybrid Architecture

The best production systems use both. We route tasks based on their requirements:

Apple FM: Quiz generation, classification, entity extraction, lesson summarization, course search — anything that needs fast structured output.
Ollama (local): Embeddings, complex reasoning, reranking, long-context tasks — anything that needs a larger model or vector output.
Claude API: Code generation, deep analysis, multi-step reasoning, content creation — anything where quality matters more than latency.

This three-tier architecture gives you the speed of on-device inference, the depth of local large models, and the intelligence of frontier cloud models. Each tier handles what it does best.
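The routing rule itself can be very small. Here is a minimal sketch of the tier selection described above, using the task lists from this section. Tier, TaskKind, and route are hypothetical names, and the fallback to Ollama when the input exceeds the on-device window is our assumption about a sensible default, not a description of our exact production router.

```swift
enum Tier {
    case appleFM   // on-device Foundation Models
    case ollama    // local larger models and embeddings
    case claude    // frontier cloud model
}

enum TaskKind {
    // Fast structured output
    case quizGeneration, classification, entityExtraction, lessonSummarization, courseSearch
    // Larger local model or vector output
    case embeddings, reranking, complexReasoning, longContext
    // Quality matters more than latency
    case codeGeneration, deepAnalysis, contentCreation
}

/// Picks a tier for a task. Inputs too large for the on-device window
/// fall back to the local Ollama tier (an assumed default).
func route(_ kind: TaskKind, estimatedInputTokens: Int = 0, onDeviceLimit: Int = 4096) -> Tier {
    switch kind {
    case .quizGeneration, .classification, .entityExtraction, .lessonSummarization, .courseSearch:
        return estimatedInputTokens > onDeviceLimit ? .ollama : .appleFM
    case .embeddings, .reranking, .complexReasoning, .longContext:
        return .ollama
    case .codeGeneration, .deepAnalysis, .contentCreation:
        return .claude
    }
}
```

Because the switch is exhaustive, adding a new TaskKind forces a compile-time decision about which tier handles it.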

Python SDK

Apple also provides a Python SDK for Foundation Models on macOS:

pip install apple-fm-sdk

import apple_fm_sdk as fm
import asyncio

async def main():
    model = fm.SystemLanguageModel()
    available, reason = model.is_available()
    if available:
        session = fm.LanguageModelSession()
        response = await session.respond("What is an agentic loop?")
        print(response)

asyncio.run(main())

The Python SDK is useful for evaluation, testing, and integration with Python-based ML pipelines. It accesses the same on-device model as the Swift framework. We use it in our brain infrastructure alongside our existing embedding pipeline.

Custom Adapter Training

For specialized use cases, Apple provides a LoRA adapter training toolkit. You can fine-tune the on-device model with your own data to create a domain expert — an AI that knows your product, your terminology, your style.

The training process uses Low-Rank Adaptation (LoRA), the same technique we cover in our LoRA scaling factor guide. Each adapter is approximately 160MB and loads alongside the base model at runtime.

Requirements: a Mac with Apple Silicon and at least 32GB of RAM (or Linux with a GPU), 100-5,000 training samples in JSONL format, and the Foundation Models adapter training toolkit from Apple.
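Preparing the JSONL file is the part you can script in Swift before handing off to the training toolkit. The sketch below writes one JSON object per line and checks the sample count against the 100-5,000 range. The field names prompt and response are placeholders we chose for illustration: the real schema is defined by Apple's toolkit documentation, so adjust TrainingSample to match it. jsonl and sampleCountIsInRange are hypothetical helpers.

```swift
import Foundation

/// One training example. ASSUMPTION: "prompt" and "response" are placeholder
/// field names, not the toolkit's documented schema.
struct TrainingSample: Codable {
    let prompt: String
    let response: String
}

/// Serializes samples as JSONL: one compact JSON object per line.
func jsonl(_ samples: [TrainingSample]) throws -> String {
    let encoder = JSONEncoder()
    // Sorted keys give stable, diff-friendly output.
    encoder.outputFormatting = [.sortedKeys, .withoutEscapingSlashes]
    return try samples
        .map { String(decoding: try encoder.encode($0), as: UTF8.self) }
        .joined(separator: "\n")
}

/// Checks the sample count against the range quoted above.
func sampleCountIsInRange(_ count: Int) -> Bool {
    (100...5000).contains(count)
}
```

Writing the result with String.write(to:atomically:encoding:) produces a file you can inspect line by line before training.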

We are training a custom adapter from our 53 academy courses — over 500 lessons of AI education content. The goal is an on-device model that is an expert in AI architecture, MCP, agentic systems, and persistent memory — specialized knowledge that the base 3B model does not have.

App Intents: Making Your App Siri-Ready

Apple deprecated SiriKit at WWDC 2026. App Intents is now the only way Siri talks to your app. Without App Intents, your app is invisible in an AI-first operating system.

The integration has two parts: entity schemas (what your app contains) and intent schemas (what users can do).

import AppIntents

struct CourseEntity: AppEntity {
    static let typeDisplayRepresentation = TypeDisplayRepresentation(name: "Course")
    static let defaultQuery = CourseQuery()

    var id: String
    var title: String

    var displayRepresentation: DisplayRepresentation {
        DisplayRepresentation(title: "\(title)")
    }
}

// Lets the system resolve CourseEntity values by ID
struct CourseQuery: EntityQuery {
    func entities(for identifiers: [String]) async throws -> [CourseEntity] {
        // Look up courses by ID here; empty placeholder result
        return []
    }
}

struct SearchCoursesIntent: AppIntent {
    static let title: LocalizedStringResource = "Search Courses"
    static let openAppWhenRun = true

    @Parameter(title: "Search query")
    var query: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // Search logic here; the dialog below is placeholder text
        return .result(dialog: "Found 3 courses matching your search.")
    }
}

Once your app exposes entities and intents, Siri can find your content, users can create Shortcuts automations, and Spotlight indexes your data. This is not optional in iOS 27 — it is foundational to how users will discover and interact with apps.

Private Cloud Compute

For tasks that exceed the on-device model's capabilities, Private Cloud Compute provides access to larger Apple Foundation Models running on Apple's secure servers. The privacy guarantees are remarkable: your data is processed but never stored, never accessible to Apple, and never used for training.

If you are enrolled in the App Store Small Business Program with fewer than 2 million first-time downloads, Private Cloud Compute access is free. This means small developers get cloud-scale AI with no API costs — a significant advantage over cloud providers that charge per token.

Building with Both Apple and Claude

Apple Intelligence and Claude are not competitors in your architecture — they are complementary layers. Apple handles the fast, private, structured tasks. Claude handles the deep, nuanced, long-context tasks. Together they create AI applications that are both responsive and intelligent.

This is the architecture we run at Like One: Apple Foundation Models for on-device inference in our Vapor server and iOS app, Ollama for local embeddings and reasoning, and the Claude API for content generation and complex analysis. Three tiers, each doing what it does best, with the two local tiers running on a single M3 Max.

For more on the Claude side of this architecture, read our Agent SDK tutorial. For building the retrieval layer that connects them, see our RAG guide. And if you want to certify your skills across both Apple and Claude architectures, check our CCA exam prep guide.
