Cut Cloud Costs for Real-Time ML on Mobile Apps by ~50% with NimbleEdge

Siddharth Mittal, Nilotpal Pathak
Published on
May 21, 2024

Increasing adoption of AI and machine learning

In recent years, there has been a rapid surge in enterprise adoption of Artificial Intelligence (AI) and Machine Learning (ML), with organizations implementing these technologies to enhance user experience and optimize operations. IBM's Global AI Adoption Index Report 2023 estimates that ~35% of enterprises have already deployed AI in at least one business function, while another ~40% are actively exploring adoption. Adoption has been especially prominent in digital-first verticals such as e-commerce, fintech, and gaming, where customers' expectations of mobile app experiences have risen steadily. For instance, customers in these verticals have come to expect features such as personalized product recommendations and search optimization in mobile apps, both of which rely heavily on ML.

Lately, several enterprises have also started exploring real-time ML to offer users even more personalized experiences through relevant, session-aware predictions. Industry leaders such as Instacart, DoorDash, and Pinterest have deployed such systems at scale and reaped significant benefits. For instance, Pinterest achieved an ~11% increase in key user engagement metrics by incorporating real-time user-product interactions into its personalized home feed generation system. Such transformative outcomes from early adopters indicate the vast potential of real-time ML systems.

Spiraling cloud costs 

Even though ML has the potential to drive tremendous upside for enterprises, the cloud storage and compute requirements for deploying it at scale are often enormous. This leads to correspondingly large cloud bills, which can dampen ROI. Real-time ML exacerbates this issue, as it demands significantly more cloud compute than batch prediction systems: the vast computational resources required for training and serving intricate real-time models, and for real-time data processing, can be prohibitively expensive.

Hence, enterprises face a challenging problem: How can organizations deliver cutting-edge, hyper-personalized experiences to users without facing exorbitant cloud bills? 

NimbleEdge provides an elegant solution by deploying ML workloads on mobile edge devices.


Typical cloud-based real-time ML ecosystem

Typical cloud-based real-time ML pipeline architecture

A cloud-based real-time ML ecosystem consists of the following major components:

  1. Stream processing: Event streaming platforms, such as Kafka, collect real-time events and publish them to relevant topics (e.g. products viewed or search queries in an e-commerce browsing session). These streams are passed through a stream processing engine, such as Flink, which extracts the real-time features relevant to the ML model.
  2. Feature stores: These databases store a) real-time features (e.g. product views, session duration) as well as b) batch features (e.g. user demographics, historical purchase data), and require fast read/write capabilities to minimize serving latency.
  3. Inference platform: Online inference platforms, usually hosted as microservices, serve ML models that return predictions based on near real-time inputs from the aforementioned feature stores.
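The flow through these three components can be sketched in simplified form. This is an illustrative, in-memory stand-in for the real systems (Kafka/Flink, a feature store, an inference microservice); all event shapes, feature names, and the toy scoring model are assumptions for illustration, not any specific vendor's API:

```python
from collections import defaultdict

# --- Stream processing (stand-in for Kafka topics + a Flink job) ---
# Raw session events, as an event-streaming platform would deliver them.
events = [
    {"user": "u1", "type": "product_view", "product": "p9"},
    {"user": "u1", "type": "search", "query": "running shoes"},
    {"user": "u1", "type": "product_view", "product": "p4"},
]

def extract_realtime_features(events):
    """Aggregate raw events into per-user real-time features."""
    feats = defaultdict(lambda: {"views": 0, "searches": 0, "last_viewed": None})
    for e in events:
        f = feats[e["user"]]
        if e["type"] == "product_view":
            f["views"] += 1
            f["last_viewed"] = e["product"]
        elif e["type"] == "search":
            f["searches"] += 1
    return dict(feats)

# --- Feature store: real-time features merged with batch features ---
batch_features = {"u1": {"age_band": "25-34", "lifetime_orders": 12}}
feature_store = {
    user: {**batch_features.get(user, {}), **rt}
    for user, rt in extract_realtime_features(events).items()
}

# --- Inference platform: a microservice would serve the model; here, a toy scorer ---
def predict_engagement(features):
    """Toy model: score rises with session activity and purchase history."""
    return (0.1 * features["views"] + 0.05 * features["searches"]
            + 0.01 * features.get("lifetime_orders", 0))

score = predict_engagement(feature_store["u1"])
```

In a cloud deployment, every step of this sketch (event ingestion, feature aggregation, feature reads/writes, and model serving) runs on metered cloud infrastructure, which is what drives the costs discussed next.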

The components mentioned above are the key drivers of cloud costs in real-time ML. Our assessment of cloud pipelines across various customers indicates that over 40% of cloud expenses in most real-time ML use cases are driven by the need to process real-time user engagement data, such as click-streams, product views, and additions to cart. The contribution of real-time user engagement data can be even higher, as exemplified by Alibaba's on-edge recommender systems, where >90% of the features are based on real-time user-product interactions.

How NimbleEdge helps cut cloud costs of real-time ML 

NimbleEdge offers an effective solution to reduce cloud costs of real-time ML in mobile apps by leveraging users’ mobile phones as edge devices for processing and storage:

  1. Stream processing: Real-time, session-based features, such as user-product interactions, serve as the basis for about half of all features in most real-time ML systems. The NimbleEdge SDK enables processing of this data on users' mobile devices, reducing the volume of data handled by cloud-based processors such as Spark, Flink, and batch ETLs. As a result, using edge devices for this processing reduces the cost of stream processing by 40-50%.
    (Source: NimbleEdge customer conversations, proprietary research)
  2. Feature stores: NimbleEdge provides an Edge Feature Store that manages both a) on-device real-time streaming features (e.g. user-product interactions) and b) global batch features from the cloud (e.g. time of day, user demographics), which together serve as inputs to the ML prediction engine. This significantly reduces the volume of read and write requests sent to cost-intensive, cloud-based feature stores (e.g. Amazon SageMaker Feature Store). As a result, NimbleEdge helps customers reduce the cost of feature stores for real-time ML by over 60%.
    (Source: NimbleEdge customer conversations, proprietary research)
  3. Inference platform: Devices themselves, powered by the NimbleEdge SDK, act as endpoints for generating predictions, eliminating the need for the cloud-based prediction and backend services that constitute inference platforms. This helps NimbleEdge customers reduce the cloud cost of ML prediction and backend orchestration services by 90-100%.
    (Source: NimbleEdge customer conversations, proprietary research)
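Conceptually, the on-device approach moves the per-user parts of the pipeline onto the phone. The sketch below illustrates that shift; the class and method names here are hypothetical, written for illustration only, and are not the actual NimbleEdge SDK API:

```python
# Hypothetical on-device pipeline (illustrative names, not the NimbleEdge SDK).
class EdgeFeatureStore:
    """Holds real-time session features on the device and caches
    global batch features synced down from the cloud."""

    def __init__(self, cloud_batch_features):
        # Batch features are synced periodically, not fetched per request.
        self.batch = dict(cloud_batch_features)
        self.realtime = {"views": 0, "searches": 0, "last_viewed": None}

    def record_event(self, event):
        # Stream processing happens here, on the phone, so raw click-stream
        # events never need to reach a cloud Kafka/Flink pipeline.
        if event["type"] == "product_view":
            self.realtime["views"] += 1
            self.realtime["last_viewed"] = event["product"]
        elif event["type"] == "search":
            self.realtime["searches"] += 1

    def features(self):
        # Merge cached batch features with on-device real-time features.
        return {**self.batch, **self.realtime}

def predict_on_device(features):
    # The model runs on the device, so no cloud inference call is made.
    return (0.1 * features["views"] + 0.05 * features["searches"]
            + 0.01 * features.get("lifetime_orders", 0))

store = EdgeFeatureStore({"age_band": "25-34", "lifetime_orders": 12})
store.record_event({"type": "product_view", "product": "p9"})
store.record_event({"type": "search", "query": "running shoes"})
score = predict_on_device(store.features())
```

The design choice this illustrates: only periodic batch-feature syncs cross the network, while per-event processing, feature storage, and inference all stay local, which is where the per-component cloud savings come from.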

A Comparison: Cloud vs NimbleEdge

Illustrative cost breakdown of a pure cloud based system vs a NimbleEdge based system

Cumulatively, using edge devices for storage and processing across stream processing, feature stores, and the inference platform results in a ~50% reduction in overall cloud costs associated with real-time ML, as shown in the illustrative schematic above (based on NimbleEdge's work with a major e-commerce client).
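As a back-of-the-envelope check, the per-component savings quoted above can be combined into an overall figure. The cost shares below are assumptions for illustration (the article does not publish a breakdown); the savings rates are midpoints of the ranges quoted above:

```python
# Illustrative check of the ~50% overall figure. Cost shares are assumed;
# savings rates come from the per-component ranges quoted in this article.
cost_share = {
    "stream_processing": 0.30,  # assumed share of total cloud spend
    "feature_store": 0.25,      # assumed
    "inference": 0.20,          # assumed
    "other": 0.25,              # unaffected workloads (training, storage, etc.)
}
savings_rate = {
    "stream_processing": 0.45,  # midpoint of 40-50%
    "feature_store": 0.60,      # "over 60%"
    "inference": 0.95,          # midpoint of 90-100%
    "other": 0.00,
}
total_savings = sum(cost_share[k] * savings_rate[k] for k in cost_share)
```

With these assumed shares the overall saving works out to roughly 48%, consistent with the ~50% headline; a customer's actual figure depends on its own cost breakdown.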

If you are also struggling with the exorbitant cost of real-time cloud-based ML, please get in touch with us. We would be delighted to assist you in your journey to deploying real-time ML on the edge. If you have any thoughts or queries regarding this article, please feel free to reach out as well.
