How NimbleEdge cuts costs of real-time AI in mobile apps by >50%

Varun Khare, Nilotpal Pathak

Published on

January 16, 2025

Increasing adoption of AI and machine learning

In recent years, there has been a rapid surge in the adoption of Artificial Intelligence (AI) and Machine Learning (ML) in enterprises, with organizations implementing these technologies to enhance user experience as well as optimize operations. IBM estimates in their Global AI Adoption Index Report 2023 that ~35% of enterprises have already deployed AI in at least one business function, while another ~40% are actively exploring adoption. Adoption has been especially prominent in digital-first verticals such as E-commerce, Fintech, and Gaming, where customers’ expectations of mobile app experiences have increased steadily. For instance, customers of companies in these verticals have come to anticipate features such as personalized product recommendations and search optimization in mobile apps, which rely heavily on ML.

Lately, several enterprises have also started exploring real-time ML to provide users with even more personalized experiences by generating relevant, session-aware predictions. Industry leaders such as Instacart, DoorDash and Pinterest have recently deployed such systems at scale and reaped significant benefits. For instance, Pinterest achieved an ~11% increase in key user engagement metrics by incorporating real-time user-product interactions into its personalized home page feed generation system. Such outcomes from early adopters indicate the vast potential of real-time ML systems.

Spiraling cloud costs

Even though ML has the potential to drive tremendous upside for enterprises, the cloud storage and compute requirements for deploying it at scale are often enormous. This in turn leads to correspondingly large cloud bills, which can dampen ROI. Real-time ML exacerbates this issue, as it requires significantly more cloud compute than batch prediction systems do. The cost of the computational resources needed to train and deploy intricate real-time ML models, and to process data in real time, can often be prohibitive.

Hence, enterprises face a challenging problem: How can organizations deliver cutting-edge, hyper-personalized experiences to users without facing exorbitant cloud bills?

NimbleEdge provides an elegant solution by deploying ML workloads on mobile edge devices.

Typical cloud-based real-time ML ecosystem

*Typical cloud-based real-time ML pipeline architecture*

A cloud-based real-time ML ecosystem consists of the following major components:

Stream processing: Event streaming platforms, such as Kafka, collect real-time events and publish them to relevant topics (e.g. products viewed or search queries in an e-commerce browsing session). These streams are passed through a stream processing engine, such as Flink, which extracts the relevant real-time features for the ML model from this data.

Feature Stores: These databases store a) real-time features (e.g. product views, session duration) as well as b) batch features (e.g. user demographics, historical purchase data), and need fast compute to keep latency low.

Inference platform: Online inference platforms, usually hosted as microservices, serve ML models that return predictions based on near real-time inputs from the aforementioned feature stores.
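
As a rough illustration of how these three components fit together, the sketch below uses in-memory Python stand-ins rather than Kafka, Flink or a real feature store. All class names, function names, feature names and the scoring rule are hypothetical and chosen only to show the flow of data from events to a prediction.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    user_id: str
    kind: str        # e.g. "product_view" or "search"
    product_id: str


def extract_session_features(events):
    """Stream-processing step: aggregate raw events into per-user features."""
    features = defaultdict(
        lambda: {"views": 0, "searches": 0, "last_product": None}
    )
    for e in events:
        f = features[e.user_id]
        if e.kind == "product_view":
            f["views"] += 1
            f["last_product"] = e.product_id
        elif e.kind == "search":
            f["searches"] += 1
    return dict(features)


class FeatureStore:
    """Holds precomputed batch features and streamed real-time features."""

    def __init__(self, batch_features):
        self.batch = batch_features
        self.realtime = {}

    def write_realtime(self, features):
        self.realtime.update(features)

    def read(self, user_id):
        # Merge both feature families into a single model input.
        return {**self.batch.get(user_id, {}), **self.realtime.get(user_id, {})}


def predict(features):
    """Inference step: a toy scoring rule standing in for a served model."""
    score = 0.1 * features.get("views", 0) + 0.05 * features.get("searches", 0)
    if features.get("is_repeat_buyer"):
        score += 0.2
    return min(score, 1.0)


events = [
    Event("u1", "product_view", "p9"),
    Event("u1", "search", ""),
    Event("u1", "product_view", "p3"),
]
store = FeatureStore(batch_features={"u1": {"is_repeat_buyer": True}})
store.write_realtime(extract_session_features(events))
model_input = store.read("u1")
score = predict(model_input)
```

In a cloud deployment each of these steps runs as a separate, always-on service, which is why each one shows up as its own line item on the cloud bill.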

The components mentioned above are the key drivers of cloud costs in real-time ML. Our assessment of cloud pipelines across various customers indicates that over 40% of cloud expenses in most real-time ML use cases are driven by the need to process real-time user engagement data, such as click-streams, product views and additions to cart. The contribution of real-time user engagement data can be even higher, as exemplified by Alibaba’s on-edge recommender systems, where >90% of the features are based on real-time user-product interactions.

How NimbleEdge helps cut cloud costs of real-time ML

NimbleEdge offers an effective solution to reduce cloud costs of real-time ML in mobile apps by leveraging users’ mobile phones as edge devices for processing and storage:

Stream processing: Real-time, session-based data, such as product-user interactions, underlies about half of all features in most real-time ML systems. The NimbleEdge SDK enables processing of this data on users’ mobile devices, thereby reducing the volume of data handled by cloud-based processors such as Spark, Flink, and batch ETLs. As a result, using edge devices for this processing reduces the cost of stream processing by 40-50%.
(Source: NimbleEdge customer conversations, proprietary research)

Feature Stores: NimbleEdge provides an Edge Feature Store that manages both a) on-device real-time streaming features (e.g. product-user interactions) and b) global batch features from the cloud (e.g. time of day, user demographics), which together serve as the basis for the ML prediction engine. This leads to a significant reduction in the volume of read and write requests sent to cost-intensive, cloud-based feature stores (e.g. Amazon SageMaker Feature Store). As a result, NimbleEdge helps customers reduce the cost of feature stores for real-time ML by over 60%.
(Source: NimbleEdge customer conversations, proprietary research)

Inference platform: Devices themselves, powered by the NimbleEdge SDK, act as endpoints for delivering predictions, eliminating the need for the additional cloud-based prediction and backend services that constitute inference platforms. This helps NimbleEdge customers reduce the cloud cost of ML prediction and backend orchestration services by 90-100%.
(Source: NimbleEdge customer conversations, proprietary research)
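
The shift described in the three points above can be sketched in a few lines of Python. This is not the NimbleEdge SDK API: `EdgeFeatureStore`, `predict_on_device`, the feature names and the scoring rule are all hypothetical, and the "cloud" is reduced to a single callable. The sketch only illustrates the idea that session events are aggregated locally, batch features are fetched once and cached, and the prediction is computed on the device.

```python
class EdgeFeatureStore:
    """Hypothetical on-device feature store (not the NimbleEdge SDK API)."""

    def __init__(self, fetch_batch_features):
        self._fetch = fetch_batch_features  # callable that would hit the cloud
        self._batch = None                  # cached copy of batch features
        self._session = {"views": 0, "cart_adds": 0}
        self.cloud_reads = 0                # counts round trips to the cloud

    def record(self, event_kind):
        # Real-time features are updated locally; no event leaves the device.
        if event_kind == "product_view":
            self._session["views"] += 1
        elif event_kind == "add_to_cart":
            self._session["cart_adds"] += 1

    def read(self):
        # Batch features are pulled from the cloud once, then served from cache.
        if self._batch is None:
            self._batch = self._fetch()
            self.cloud_reads += 1
        return {**self._batch, **self._session}


def predict_on_device(features):
    """Toy scoring rule standing in for a model running on the phone."""
    score = 0.1 * features["views"] + 0.3 * features["cart_adds"]
    if features.get("is_repeat_buyer"):
        score += 0.2
    return min(score, 1.0)


# The lambda stands in for a one-time download of batch features.
store = EdgeFeatureStore(fetch_batch_features=lambda: {"is_repeat_buyer": True})

scores = []
for kind in ["product_view", "product_view", "add_to_cart"]:
    store.record(kind)
    scores.append(predict_on_device(store.read()))
```

Three predictions are produced here with a single cloud read and no cloud-side stream processing or inference call, which is the mechanism behind the savings listed above.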


A Comparison: Cloud vs NimbleEdge

*Illustrative cost breakdown of a pure cloud-based system vs a NimbleEdge-based system*

Cumulatively, using edge devices for storage and processing across stream processing, feature stores, and the inference platform results in a ~50% reduction in the overall cloud costs associated with real-time ML, as demonstrated in the illustrative schematic above (based on NimbleEdge’s work with a major e-commerce client).
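
The way the per-component savings combine into an overall figure can be checked with a short calculation. The reduction percentages below come from the sections above (midpoints where a range is quoted, the lower bound for "over 60%"). The baseline cost shares are assumptions invented for this illustration; they are not taken from the schematic or from any customer data.

```python
# Per-component reductions quoted in this article.
reductions = {
    "stream_processing": 0.45,   # article: 40-50%, midpoint used
    "feature_store": 0.60,       # article: over 60%, lower bound used
    "inference_platform": 0.95,  # article: 90-100%, midpoint used
    "other": 0.0,                # ASSUMPTION: costs the edge does not touch
}

# ASSUMPTION: share of the total real-time ML cloud bill per component.
# These numbers are hypothetical and exist only to make the sum concrete.
baseline_share = {
    "stream_processing": 0.35,
    "feature_store": 0.30,
    "inference_platform": 0.15,
    "other": 0.20,
}


def overall_reduction(shares, cuts):
    """Weighted sum of component reductions; shares must add up to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("baseline shares must sum to 1")
    return sum(shares[name] * cuts[name] for name in shares)


total = overall_reduction(baseline_share, reductions)
print(f"Overall reduction: {total:.0%}")
```

With these assumed shares the weighted sum comes out at roughly 48%, in line with the ~50% figure above. A pipeline whose bill is weighted more heavily towards inference or feature stores would see a larger overall reduction.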

If you are also struggling with the exorbitant cost of real-time cloud-based ML, get in touch with us at sales@nimbleedgehq.ai. We would be delighted to assist you in your journey to deploying real-time ML using edge. In case you have any thoughts or queries regarding this article, please also feel free to reach out to us at the email address above.
