
Data Storage

Data moves through four storage layers as it progresses through the logical data flow.


Raw Data

During Ingest and Parse, data is copied into cloud storage (S3, ADLS, or Google Cloud Storage) in its original form. No transformations occur; the data is simply made accessible for downstream processing.
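As a rough illustration of what "copied in its original form" means, the sketch below copies a file byte for byte and checks the copy against a checksum of the source. This is not DataForge code: `ingest_raw` and `sha256_of` are hypothetical helpers, and a local directory stands in for the cloud storage location.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ingest_raw(source_file: Path, raw_dir: Path) -> Path:
    """Copy source_file into raw_dir unchanged (hypothetical helper).

    raw_dir is a local stand-in for a cloud storage location.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / source_file.name
    shutil.copyfile(source_file, target)  # byte-for-byte copy, no parsing
    # The raw copy must be identical to the source.
    if sha256_of(source_file) != sha256_of(target):
        raise IOError(f"raw copy of {source_file} does not match the source")
    return target
```

The point of the checksum is only to make the "no transformations" property explicit: whatever arrives in the raw layer should be indistinguishable from what the source system produced.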

Data Lake

After Change Data Capture (CDC), data lands in the core processing location. The Data Lake ensures all data is in a uniform format with appropriate metadata so subsequent steps can focus strictly on processing.
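To make "uniform format with appropriate metadata" concrete, here is a minimal sketch that parses CSV or JSON text into rows of the same shape and attaches metadata columns. It is purely illustrative: the function `to_uniform_rows` and the metadata names `_source` and `_loaded_at` are assumptions for this example, not the format DataForge actually uses.

```python
import csv
import io
import json
from datetime import datetime, timezone


def to_uniform_rows(payload: str, fmt: str, source_name: str) -> list[dict]:
    """Parse CSV or JSON text into rows that share one shape (illustrative only)."""
    if fmt == "csv":
        records = list(csv.DictReader(io.StringIO(payload)))
    elif fmt == "json":
        records = json.loads(payload)  # expects a JSON array of objects
    else:
        raise ValueError(f"unsupported format: {fmt}")

    loaded_at = datetime.now(timezone.utc).isoformat()
    rows = []
    for record in records:
        # Store every value as a string so both formats yield the same types.
        row = {key: str(value) for key, value in record.items()}
        # Hypothetical metadata columns; real names will differ.
        row["_source"] = source_name
        row["_loaded_at"] = loaded_at
        rows.append(row)
    return rows
```

Once every source looks the same at this layer, later steps do not need format-specific logic.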

Data Hub

Each source has exactly one hub table, which holds its final processed data and is automatically generated and maintained by DataForge. Hub tables can be queried directly in Databricks or Snowflake and serve as an exposure point for data exploration. For use cases that only require exploration or data sharing, this may be the end of the flow.

Data Warehouse

The Data Warehouse is the output layer for curated, cleansed data intended for enterprise reporting and analytics. It is controlled by configured output mappings and typically resides on an external storage technology (SQL Server, Delta Lake, Snowflake, etc.).

The Outputs Mapping tab in the UI dictates how hub table data maps to the final warehouse location.
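Conceptually, an output mapping is a projection from hub table columns to warehouse columns. The sketch below shows that idea with plain Python dictionaries. The function `apply_output_mapping`, the column names, and the sample rows are all hypothetical; actual mappings are configured in the UI rather than written as code.

```python
def apply_output_mapping(hub_rows, mapping):
    """Project hub rows onto warehouse columns.

    mapping: {warehouse_column: hub_column}; hub columns not listed are dropped.
    """
    return [
        {wh_col: row[hub_col] for wh_col, hub_col in mapping.items()}
        for row in hub_rows
    ]


# Hypothetical hub table rows and mapping, for illustration only.
hub_rows = [
    {"cust_id": 1, "cust_nm": "Ada", "internal_flag": "x"},
    {"cust_id": 2, "cust_nm": "Grace", "internal_flag": "y"},
]
mapping = {"customer_id": "cust_id", "customer_name": "cust_nm"}

warehouse_rows = apply_output_mapping(hub_rows, mapping)
print(warehouse_rows[0])  # {'customer_id': 1, 'customer_name': 'Ada'}
```

Only mapped columns reach the warehouse, which is why the hub table can carry extra working columns without affecting reporting.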