Foundry Local Android

Overview

Foundry Local for Android, developed in partnership with Microsoft, is a secure, dedicated, and flexible LLM server application that runs directly on Android devices. Its architecture lets multiple applications share a single, locally hosted model: the model runtime is centralized in one LLM server that any client application can connect to, avoiding the common pattern where each app ships its own runtime and model copy. This keeps data on-device, delivers consistent performance, and eliminates redundant memory and storage usage. Combined with DeliteAI, it provides context engineering, structured session memory, tool-calling capabilities, and multi-step agentic workflows, all running entirely on-device.

Key Highlights

  • Centralized LLM server for multiple applications
  • Eliminates redundant memory usage
  • Consistent privacy boundary with on-device processing
  • DeliteAI integration for agentic workflows

Demo Video


Features

Foundry Local Android Application (FLAA)

Runs as the on-device LLM server, hosting the model runtime, managing the model lifecycle, and handling inference requests from multiple client applications.
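
A server of this shape is typically exposed as a bound Android service whose binder implements an AIDL contract. The sketch below is illustrative only: `ILlmService` and `LlmRuntime` are hypothetical names standing in for the actual FLAA interfaces, which this document does not specify.

```kotlin
import android.app.Service
import android.content.Intent
import android.os.IBinder

class LlmServerService : Service() {

    // Single runtime instance shared by every connected client app.
    // LlmRuntime and the model id are hypothetical placeholders.
    private val runtime by lazy { LlmRuntime.load("example-model") }

    // Stub generated from a hypothetical ILlmService.aidl contract.
    private val binder = object : ILlmService.Stub() {
        override fun generate(prompt: String): String =
            runtime.complete(prompt)
    }

    // Client apps bind across process boundaries and receive this binder.
    override fun onBind(intent: Intent): IBinder = binder
}
```

Because every client binds to the same service, only one copy of the model weights needs to be resident in memory regardless of how many apps are using it.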

Foundry Local Android SDK (FLAS)

Integrated into client applications, abstracting all AIDL and IPC complexity, allowing developers to load and interact with on-device models through a simple, high-level API.
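
From the client side, the intent is that apps never touch `bindService()` or AIDL stubs directly. A minimal usage sketch, with `FoundryLocalClient` and its methods as hypothetical stand-ins for the SDK's actual high-level API:

```kotlin
// Hypothetical client-side flow; names are illustrative, not the real SDK API.
val client = FoundryLocalClient.connect(context)   // binds to FLAA over IPC
client.loadModel("example-model")                  // asks the server to make the model resident
val reply = client.generate("Summarize my unread notifications")
client.close()                                     // unbinds; the server keeps serving other apps
```

The design point is that connection management, marshalling, and process-death handling live inside the SDK, so the app only deals with prompts and responses.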

DeliteAI Integration

Acts as the bidirectional intelligence layer, maintaining structured context, interpreting model responses, detecting tool-calling intent, and invoking Kotlin functions through an efficient JNI bridge.
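
Tool calling of this kind usually means registering ordinary Kotlin functions under a name and schema the model can reference. The following is a hedged sketch; `agent.registerTool` is a hypothetical API, though `BatteryManager.BATTERY_PROPERTY_CAPACITY` is a real Android constant:

```kotlin
import android.os.BatteryManager

// Illustrative tool registration: when the intelligence layer detects a
// tool-call intent in the model's output, it dispatches to this function
// (in a real integration, via the JNI bridge) and feeds the result back.
agent.registerTool(
    name = "get_battery_level",
    description = "Returns the device battery percentage",
) { _: Map<String, Any> ->
    batteryManager.getIntProperty(BatteryManager.BATTERY_PROPERTY_CAPACITY)
}
```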

Multi-Step Agentic Workflows

Enables complex reasoning, session memory, and tool-driven interactions that allow models to perform multi-step tasks without depending on cloud connectivity.
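
The control flow behind such workflows can be pictured as a bounded loop: each step the model either emits a final answer or requests a tool, and tool results are appended to the session memory before the next inference pass. A minimal sketch under those assumptions (all names hypothetical):

```kotlin
// Hypothetical agent loop; Step, agent, and tools are illustrative types,
// not DeliteAI's actual API.
fun runAgent(task: String, maxSteps: Int = 5): String {
    val session = agent.newSession()        // structured session memory
    session.addUserMessage(task)
    repeat(maxSteps) {
        when (val step = agent.step(session)) {   // one on-device inference pass
            is Step.ToolCall -> session.addToolResult(
                step.name, tools.invoke(step.name, step.args)
            )
            is Step.Answer -> return step.text
        }
    }
    return session.summary()                // fallback if no final answer emerged
}
```

Bounding the number of steps is a common safeguard so a misbehaving model cannot loop indefinitely on-device.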

Resources