Foundry Local for Android, developed in partnership with Microsoft, is a secure, dedicated, and flexible LLM server application that runs directly on Android devices. It centralizes the model runtime into a single, locally hosted LLM server that any client application can connect to, avoiding the common pattern where each app ships its own runtime and model copy. Sharing one server improves privacy, keeps performance consistent across apps, and eliminates redundant memory and storage usage. Combined with DeliteAI, it provides context engineering, structured session memory, tool-calling capabilities, and multi-step agentic workflows, all running entirely on-device.
The architecture has four parts:

- **Foundry Local service**: runs as the on-device LLM server, hosting the model runtime, managing the model lifecycle, and handling inference requests from multiple client applications.
- **Client SDK**: integrated into client applications, it abstracts all AIDL and IPC complexity so developers can load and interact with on-device models through a simple, high-level API (see the client sketch after this list).
- **DeliteAI**: acts as the bidirectional intelligence layer, maintaining structured context, interpreting model responses, detecting tool-calling intent, and invoking Kotlin functions through an efficient JNI bridge (see the tool-dispatch sketch below).
- **Agentic capabilities**: enable complex reasoning, session memory, and tool-driven interactions that let models perform multi-step tasks without depending on cloud connectivity (see the agent-loop sketch below).
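To make the client SDK's role concrete, here is a minimal sketch of what a high-level API of this shape could look like. Every name in it (`FoundryLocalClient`, `loadModel`, `generate`, the model id) is an illustrative assumption rather than the SDK's published surface; the point is that the AIDL binding and IPC marshalling stay hidden behind a small facade like this.

```kotlin
// Hypothetical client-side facade; names and signatures are assumptions.
// The AIDL/IPC plumbing described above would live behind this interface.
interface FoundryLocalClient {
    suspend fun loadModel(modelId: String)        // ask the server to load a model
    suspend fun generate(prompt: String): String  // run inference on the server
    fun close()                                   // release the service binding
}

// How an app might consume such an API: connect once, reuse the centrally
// hosted model, and never touch the raw binder directly.
suspend fun summarize(client: FoundryLocalClient, article: String): String {
    client.loadModel("phi-4-mini")                // model id is illustrative
    return client.generate("Summarize in two sentences:\n$article")
}
```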
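The tool-calling path can be sketched as a registry that maps tool names to plain Kotlin functions, dispatched when the intelligence layer detects tool-calling intent in a model response. The `ToolRegistry` type, the parsed argument map, and the battery tool below are assumptions made for illustration; in DeliteAI the invocation of the registered function would cross the JNI bridge from the agent runtime.

```kotlin
// Illustrative sketch of on-device tool dispatch, assuming the agent layer
// has already parsed model output into a tool name plus string arguments.
typealias Tool = (args: Map<String, String>) -> String

class ToolRegistry {
    private val tools = mutableMapOf<String, Tool>()

    fun register(name: String, tool: Tool) {
        tools[name] = tool
    }

    // Called when tool-calling intent is detected in a model response.
    fun dispatch(name: String, args: Map<String, String>): String =
        tools[name]?.invoke(args) ?: error("Unknown tool: $name")
}

fun main() {
    val registry = ToolRegistry()
    // A plain Kotlin function exposed as a tool.
    registry.register("device_battery") { _ ->
        "87%" // Stub value; a real tool would query Android's BatteryManager.
    }
    println(registry.dispatch("device_battery", emptyMap()))
}
```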
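Finally, a hedged sketch of the loop that multi-step agentic workflows imply: at each step the model either returns a final answer or requests a tool, and each tool result is appended to session memory before the next inference step. The `ModelStep` type, the `infer` and `callTool` parameters, and the step cap are simplified assumptions, not DeliteAI's actual interfaces.

```kotlin
// What the model can produce at each step of the loop.
sealed interface ModelStep {
    data class ToolCall(val name: String, val args: Map<String, String>) : ModelStep
    data class FinalAnswer(val text: String) : ModelStep
}

fun runAgent(
    infer: (transcript: List<String>) -> ModelStep,             // on-device model call
    callTool: (name: String, args: Map<String, String>) -> String,
    userGoal: String,
    maxSteps: Int = 8,                                          // guard against loops
): String {
    val memory = mutableListOf("user: $userGoal")               // structured session memory
    repeat(maxSteps) {
        when (val step = infer(memory)) {
            is ModelStep.FinalAnswer -> return step.text
            is ModelStep.ToolCall -> {
                // Feed the tool result back into memory for the next step.
                val result = callTool(step.name, step.args)
                memory += "tool(${step.name}): $result"
            }
        }
    }
    return "Stopped after $maxSteps steps without a final answer."
}
```

Because both inference and tool execution happen on the device, the entire loop completes without cloud connectivity.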