Data ingest with fan-out
Data ingest is a critical component of streaming pipelines and other systems that receive real-time events from web pages, mobile applications, or other backend services.
Creating a reliable data ingest API gets complex quickly when each event needs to be fanned out to multiple backend services. A common pattern consists of adding a log like Kafka, then a new consumer for each destination. This causes an explosion of complexity, as each consumer needs to be operated and monitored independently. It also makes it challenging to deploy these kinds of systems on serverless platforms.
In this example, we look at how to create a reliable, high-performance data ingest API in Python with Dispatch and FastAPI, where each event received can be fanned out to a list of URLs.
Setup
If you haven’t already, make sure you have a Dispatch account and CLI installed.
Install FastAPI to set up an HTTP server. We will also install Pydantic to represent data models in our Python code.
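A minimal installation might look like the following; the SDK package name dispatch-py is assumed here, and uvicorn and httpx are added because the snippets below use them to serve the app and deliver events:

```shell
pip install dispatch-py fastapi uvicorn httpx
```

FastAPI already depends on Pydantic, so it doesn't need to be installed separately.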
Ingesting single events
The simplest model consists of receiving a single event at a time. FastAPI makes it simple to set up the frontend POST API, while Dispatch handles all the background processing of each event.
Here is an example application that achieves this.
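The sketch below follows the FastAPI integration pattern of the dispatch-py SDK (Dispatch(app), @dispatch.function, and .dispatch() to submit a call); the Event model, the forward function, the DESTINATIONS list, and the use of httpx as the HTTP client are illustrative choices, not part of the SDK:

```python
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

from dispatch.fastapi import Dispatch

# Illustrative fan-out targets; in practice these could come from configuration.
DESTINATIONS = [
    "https://service-a.example.com/events",
    "https://service-b.example.com/events",
]

app = FastAPI()
dispatch = Dispatch(app)


class Event(BaseModel):
    name: str
    payload: dict


@dispatch.function
def forward(url: str, event: dict):
    # Runs in the background under the Dispatch scheduler; a transient error
    # raised here causes this call, and only this call, to be retried.
    response = httpx.post(url, json=event)
    response.raise_for_status()


@app.post("/v1/event")
def ingest(event: Event):
    # Fan out: submit one independent function call per destination.
    for url in DESTINATIONS:
        forward.dispatch(url, event.model_dump())
    return {"status": "accepted"}
```

Each forward.dispatch() call returns as soon as the call is submitted, so the POST handler responds immediately while delivery happens in the background.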
An important aspect to take into account here is that each concurrent operation is handled by the Dispatch scheduler: whatever transient errors they observe, each call is retried independently of the others.
Note that in this example we left details like API authentication out of the code snippets, since they aren't specific to using Dispatch. That topic belongs to FastAPI and is well documented at https://fastapi.tiangolo.com/tutorial/security/
Ingesting batches of events
High-performance systems often trade latency for better throughput by grouping operations into batches. Dispatch supports this pattern as well: create a batch object, add function calls to it, and submit it to the scheduler in a single, atomic step.
Building on the previous code snippet, the application could add a new POST endpoint on /v1/batch that expects payloads encoded as JSON arrays.
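Here is a sketch of that endpoint, assuming the batch construction in dispatch-py follows the pattern described above (dispatch.batch() to create the batch, add() to append function calls, and dispatch() to submit it):

```python
@app.post("/v1/batch")
def ingest_batch(events: list[Event]):
    # Collect every (destination, event) pair into a single batch, then
    # submit the whole batch to the scheduler in one atomic step.
    batch = dispatch.batch()
    for event in events:
        for url in DESTINATIONS:
            batch.add(forward, url, event.model_dump())
    batch.dispatch()
    return {"status": "accepted", "count": len(events)}
```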
Note that while the submission to Dispatch is atomic in this case, each function call still runs concurrently; submitting operations as a batch does not create dependencies or blocking behavior between calls.
Running the program
After saving the code to a file called main.py, run it with the Dispatch CLI.
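The exact invocation may vary with your setup; assuming uvicorn as the ASGI server, it could look like this (the CLI wraps the server process and supplies the configuration it needs to talk to Dispatch):

```shell
dispatch run -- uvicorn main:app
```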
Then, to ingest data into the system, submit a POST request to /v1/event, for example with a curl command.
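Assuming uvicorn's default port of 8000 and the illustrative Event model from above:

```shell
curl -X POST http://127.0.0.1:8000/v1/event \
  -H "Content-Type: application/json" \
  -d '{"name": "page_view", "payload": {"path": "/home"}}'
```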
Or, to use the batch endpoint, submit a JSON array of events.
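For example, with the same illustrative fields:

```shell
curl -X POST http://127.0.0.1:8000/v1/batch \
  -H "Content-Type: application/json" \
  -d '[{"name": "page_view", "payload": {"path": "/home"}},
       {"name": "click", "payload": {"target": "signup"}}]'
```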
This short program shows how easy it is to create reliable applications with Dispatch in just a few lines of code. We didn't need to add extra infrastructure like queues, databases, or worker pools; the integration with Dispatch is driven entirely from the application code.