12 Factor Spark Application

Time-tested principles for building robust Spark applications

Spark is a distributed data processing engine that is widely used in batch and stream processing platforms. Building such platforms comes with its fair share of challenges, such as data drift, bad data, and data security, that go beyond those of continuous delivery in the microservices world.

Similar to the Twelve-Factor App, an outstanding methodology for building web apps, we at Sahaj have realized over time that a set of concepts and patterns can be applied to building data processing applications with frameworks like Spark. …


Out-of-Home (OOH) advertising has seen rapid growth in the UK. In 2019, OOH advertising revenue rose 7.6% over 2018 to £1.3bn. However, OOH has, by and large, operated as an offline channel until recently.

OOH agencies have traditionally managed and run campaigns manually. From campaign planning to media buying, pricing, placement and tracking, every step of the process is manual. The lack of industry standards adds another layer of complexity by creating information silos, which in turn complicate measurement and accountability. A wholesale reimagining of the medium has long been overdue.

Today, disruptive brands that adopt OOH…


Photo by gustavo Campos on Unsplash

There are innovative use cases currently driven by Data Science, and given the low cost of storage and cloud infrastructure, it is tempting for enterprises both big and small to start this journey. While the rest of engineering has reached a phase where Agile/Agility has become an abused term, Data Science still needs to catch up on many practices.

Here are three ways to accelerate your Data Science journey:

Iterate on the models

For any new model, start with a simple, explainable model with a measure of accuracy and roll it out to your users. Improve the model in subsequent cycles…


We currently use quite a few AWS services and use AWS X-Ray for distributed tracing. It has worked very well, as it provides an easy mechanism to record traces, visualise calls across services and analyse issues across distributed applications. It has SDKs for several languages, which makes integration trivial.

X-Ray lets us filter traces by latency, status code, failure/error and a lot more.

Default AWS Service map view

Trace Inbound HTTP Calls

We can easily configure the SDK middleware to intercept inbound requests. This adds basic information to the trace, including the HTTP method, timing and a few HTTP headers. The segment name can be…
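The idea of middleware that intercepts inbound requests can be illustrated with a small, self-contained sketch. It is not the AWS X-Ray SDK and does not use its API: `TracingMiddleware`, `RECORDED_HEADERS` and the `segments` list are hypothetical names, and each "segment" is a plain dict standing in for what a tracing SDK would send to its backend. The sketch wraps a WSGI app and records the HTTP method, path, status code, timing and a few headers for every inbound call.

```python
import time
from typing import Callable, Iterable, List

# Hypothetical: an illustrative subset of headers to copy, not the SDK's list.
RECORDED_HEADERS = ("HTTP_USER_AGENT", "HTTP_X_FORWARDED_FOR")


class TracingMiddleware:
    """Hypothetical WSGI middleware that records one segment per inbound request.

    This mimics the concept only; it is not the AWS X-Ray SDK.
    """

    def __init__(self, app: Callable, segment_name: str):
        self.app = app
        self.segment_name = segment_name
        # Stand-in for shipping segments to a tracing backend.
        self.segments: List[dict] = []

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        segment = {
            "name": self.segment_name,
            "method": environ.get("REQUEST_METHOD"),
            "path": environ.get("PATH_INFO"),
            "headers": {k: environ[k] for k in RECORDED_HEADERS if k in environ},
            "start_time": time.time(),
        }

        def recording_start_response(status, headers, exc_info=None):
            # Capture the numeric status code, e.g. "200 OK" -> 200.
            segment["status"] = int(status.split(" ", 1)[0])
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, recording_start_response)
        finally:
            # End time is taken when the app returns its response iterable,
            # so streaming a generator body is not included in the timing.
            segment["end_time"] = time.time()
            self.segments.append(segment)


if __name__ == "__main__":
    def hello_app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello"]

    traced = TracingMiddleware(hello_app, segment_name="orders-api")
    traced({"REQUEST_METHOD": "GET", "PATH_INFO": "/orders"}, lambda *args: None)
    print(traced.segments[0]["method"], traced.segments[0]["status"])
```

Wrapping the application, rather than instrumenting each handler, is what keeps this kind of integration low-effort: every inbound route gets a segment without touching handler code.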

Anay Nayak

Solution Consultant at https://sahaj.ai/
