Estuary

The Real-time Data Landscape

There’s been major innovation throughout the entire real-time data landscape over the last few years. These are the major players, and how they fit together.


Welcome to the 2023 Estuary Real-time Data Landscape. Want to get started with real-time insights and data products? Here are the tools and how they fit together.

[Diagram: The Real-time Data Landscape]

There has been major innovation throughout the entire real-time data landscape over the last few years. Some of the most interesting, mature companies have emerged on the analytics side, but simpler, more powerful pipelines for getting data from sources to destinations have also appeared, enabling more companies to work with low-latency data.

The diagram above has four sections: Capture, Transport, Operational Transforms, and Analytic Transforms. In it, “hybrid” denotes an open-source product that is also provided as a managed service.

Capture

Extracting data from source systems. In the real-time landscape, most sources are databases (read through their write-ahead log) and streams, since most SaaS APIs are batch in nature.

Some SaaS APIs do support streaming.  For example, Salesforce has a streaming endpoint.
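To make log-based capture concrete, here is a minimal sketch of replaying change events against a replica. The `ChangeEvent` shape and the `apply_changes` helper are hypothetical names invented for illustration; they do not mirror the log format of any particular database or CDC tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One hypothetical log entry describing a change to a single row."""
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row contents; None for deletes


def apply_changes(replica: dict, events: list) -> dict:
    """Apply change events in log order so the replica tracks the source."""
    for event in events:
        if event.op in ("insert", "update"):
            replica[event.key] = event.row
        elif event.op == "delete":
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown op: {event.op}")
    return replica


# A toy log: two inserts, one update, one delete (illustrative data only).
change_log = [
    ChangeEvent("insert", 1, {"name": "Ada", "plan": "free"}),
    ChangeEvent("insert", 2, {"name": "Grace", "plan": "pro"}),
    ChangeEvent("update", 1, {"name": "Ada", "plan": "pro"}),
    ChangeEvent("delete", 2),
]

replica = apply_changes({}, change_log)
print(replica)
```

Because every change is applied in log order, the replica ends up matching the source table without ever re-querying it in bulk.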

Transport

Moving data from point A to B. The de facto standard here is Kafka, but there are some emerging options – almost all require engineers, maintenance, and infrastructure.

Streaming transport is complex and doesn’t usually retain historical data.  For this reason, most streaming systems can be viewed as a “buffer” of current events.  Notable exceptions here are Pulsar, Gazette, and Estuary.
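The difference between a “buffer” and a transport that retains history can be sketched with two toy classes. Both `BufferTransport` and `RetainedLogTransport` are hypothetical, in-memory stand-ins, not the API of any product named here. The count-based capacity is also a simplification, since real systems typically bound retention by time or size.

```python
from collections import deque


class BufferTransport:
    """Keeps only the most recent `capacity` events; older ones are dropped."""

    def __init__(self, capacity: int):
        self._events = deque(maxlen=capacity)

    def publish(self, event: str) -> None:
        self._events.append(event)

    def read_all(self) -> list:
        return list(self._events)


class RetainedLogTransport:
    """Append-only log: every event stays readable from any offset."""

    def __init__(self):
        self._events = []

    def publish(self, event: str) -> None:
        self._events.append(event)

    def read_from(self, offset: int = 0) -> list:
        return self._events[offset:]


buffer = BufferTransport(capacity=3)
retained = RetainedLogTransport()
for i in range(5):
    buffer.publish(f"event-{i}")
    retained.publish(f"event-{i}")

# A consumer that connects late sees different amounts of history.
late_buffer_view = buffer.read_all()
late_log_view = retained.read_from(0)
print(late_buffer_view)
print(late_log_view)
```

A late consumer of the buffer only sees the last three events, while the retained log can be replayed from the first event onward.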

Operational Transforms

An in-pipeline transformation that you use to massage data before it reaches either your production systems (as a data product) or your analytics environment.

Operational transforms in real-time systems usually come with some gotchas. Calculating things like “lifetime customer value” can be very difficult because doing so requires state that grows without bound in streaming systems. They are extremely important, though, since they get data into the right “shape” for analytics queries.
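The “lifetime customer value” gotcha is easy to see in a small sketch. The `LifetimeValue` class and the sample events below are hypothetical, and amounts are in integer cents to avoid floating-point noise. The point is only that a running total needs one piece of state per customer ever seen, and nothing ever evicts it.

```python
class LifetimeValue:
    """Running total of purchases per customer, updated one event at a time."""

    def __init__(self):
        # One entry per customer ever seen; this is the state that keeps growing.
        self.totals = {}

    def update(self, customer_id: str, amount_cents: int) -> int:
        """Fold one purchase into the state and return the new running total."""
        self.totals[customer_id] = self.totals.get(customer_id, 0) + amount_cents
        return self.totals[customer_id]


# Illustrative purchase events: (customer_id, amount in cents).
events = [("c1", 500), ("c2", 1250), ("c1", 700), ("c3", 300)]

ltv = LifetimeValue()
running = [(customer_id, ltv.update(customer_id, cents)) for customer_id, cents in events]
print(running)
print(len(ltv.totals))
```

Each new customer adds an entry that must be kept indefinitely, which is why this kind of aggregate is harder in a stream than a windowed count.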

Analytic Transforms

The real-time equivalent of a data warehouse. These are systems that can be loaded in real time and provide up-to-the-second answers to queries as you ask them.
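As a rough illustration of “load continuously, query whenever you like”, the sketch below uses Python's built-in `sqlite3` module as a stand-in. SQLite is not a real-time analytics engine, and the `page_views` table and helper names are assumptions made for this example. It only shows the interaction pattern: each query reflects every row ingested so far.

```python
import sqlite3

# In-memory database standing in for an analytics store (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, user_id TEXT)")


def ingest(page: str, user_id: str) -> None:
    """Load a single event as it arrives."""
    conn.execute("INSERT INTO page_views VALUES (?, ?)", (page, user_id))


def views_per_page() -> dict:
    """Run the analytic query against whatever has been loaded so far."""
    rows = conn.execute(
        "SELECT page, COUNT(*) FROM page_views GROUP BY page ORDER BY page"
    ).fetchall()
    return dict(rows)


ingest("/home", "u1")
ingest("/pricing", "u2")
first_answer = views_per_page()

ingest("/home", "u3")
second_answer = views_per_page()

print(first_answer)
print(second_answer)
```

The second answer already includes the event loaded a moment earlier, which is the behavior the systems in this category provide at much larger scale.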

Note: The diagram is oversimplified, and many companies straddle two or more areas. For example, we at Estuary do offer Operational Transforms because we believe a pipeline needs to be end-to-end, but our logo is in the area that most people associate us with.

Products Offered as SaaS Solutions

Company & Product | Solution | Background
Estuary | Capture, Transport & Operational Transforms | Easily capture data from systems using CDC (change data capture), transport it, transform it in motion, and sync it where you want it, such as analytics or operational systems.
Ably | Transport | Simple transportation layer for events.
Amazon Kinesis | Transport | Amazon’s Pub/Sub system. Manages events produced by one system and subscribed to by another.
Azure Web PubSub | Transport | Microsoft Azure’s Pub/Sub system.
Arcion | Capture | Low-latency captures from databases using CDC.
Bytewax | Operational Transforms | Bytewax makes it turnkey to transform streaming data using Python.
ClickHouse | Analytic Transforms | Real-time SQL transforms on ClickHouse by the team that created it.
Confluent | Transport & Operational Transforms | The original company behind Kafka with a core business model of managing Kafka.
Datacater | Transport & Operational Transforms | Managed Kafka streams with Python transformations.
Decodable | Capture & Operational Transforms | Capture using managed Debezium and transform using managed Apache Flink.
DeltaStream | Analytic & Operational Transforms | Managed service for analytic and operational transforms.
Firebolt | Analytic Transforms | Real-time analytic transforms using an improved version of managed ClickHouse.
Imply | Analytic Transforms | Real-time analytic transforms using managed Druid by the team that created it.
IOblend | Operational Transforms | Managed Spark.
Materialize | Analytic Transforms | Real-time analytic transforms using open source SQL built on top of Timely Dataflow.
Memphis.dev | Transport | Simple but powerful transport layer.
Meroxa | Capture & Operational Transforms | Capture and transform real-time data.
Google Cloud Pub/Sub | Capture | Google’s Pub/Sub system.
Google Cloud Dataflow | Transform | Managed Apache Beam, allowing you to coordinate batch and streaming transforms using your favorite transformation system.
Oracle GoldenGate | Capture | Capture data from Oracle systems using their managed, proprietary product.
Rockset | Analytic Transforms | SQL transformations in real time by the creators of RocksDB.
Redpanda | Transport | Transport data over the Kafka protocol using a full rewrite of Kafka built for greater efficiency.
SingleStore | Analytic Transforms | SQL transformations in real time.
StarTree | Analytic Transforms | SQL transformations in real time built on top of managed Apache Pinot.
StreamNative | Transport | Managed Apache Pulsar.
StreamSets | Capture & Operational Transforms | Capture and transform data through a GUI.
Striim | Capture & Operational Transforms | Capture data from databases using managed CDC and transform it in motion.
Upsolver | Operational Transforms | Transform micro-batches using SQL.
Quix | Operational Transforms | Real-time Python transformations.
Timeplus | Analytic Transforms | SQL-based analytic transforms on time series data.
Tinybird | Capture, Operational & Analytic Transforms | Managed ClickHouse for the easy creation of real-time data APIs and analytics. Some sources are available to capture from out of the box.

 

Open-Source Frameworks

Project | Solution | Background
Apache Beam | Operational Transforms | A framework that allows you to transform data from both batch and streaming systems.
Apache Druid | Analytic Transforms | A real-time analytics engine that quickly indexes streaming data, allowing for efficient, high-scale queries.
Apache Flink | Operational Transforms | A stream processing framework that is natively event-based.
Apache Kafka | Transport | A highly popular streaming system built by LinkedIn.
Apache Pinot | Analytic Transforms | A real-time analytics engine that offers real-time SQL queries on high-scale streaming data.
Apache Pulsar | Transport | A streaming system that has native cloud storage options.
Apache Spark | Operational Transforms | A stream processing framework that is natively batch-based and was expanded to near real-time micro-batches.
ClickHouse | Analytic Transforms | A real-time analytics engine that offers real-time SQL queries on high-scale streaming data.
Debezium | Capture | A framework for capturing data from databases in real time using their write-ahead log.
Flow | Capture, Transport & Operational Transforms | An end-to-end system that supports capturing data from databases in real time using their write-ahead log, transporting it, transforming it, and materializing it into destination systems.
Gazette | Transport | A streaming system that natively stores data in cloud storage, enabling unlimited lookback and direct reads by batch systems.

About the author

David Yaffe, CEO

David Yaffe is a co-founder and the CEO of Estuary. He previously served as the COO of LiveRamp and as the co-founder and CEO of Arbor, which was sold to LiveRamp in 2016. He has an extensive background in product management, having served as head of product for DoubleClick Bid Manager and Invite Media.
