Build
Design, configure, and operate data pipelines in DataForge.
Pipeline building blocks
- Sources — define inputs and parsing rules
- Connections — Salesforce, JDBC, Kafka, Unity Catalog, and more
- Processing — ingestion queue, job runs, processing queue, workflow queue
- Outputs — write to Snowflake, Databricks, and other destinations
Configuration
- Configuration overview — cluster, process, and system settings
- Cluster Configuration — compute sizing and types
- Process Configuration — per-process defaults
- System Configuration — global system settings
Reusable patterns
- Cloning — templated multi-tenant setups
- Templates and Tokens — reusable rule and mapping logic
Scheduling and lineage
- Schedules — control when ingestion runs
- Lineage — visualize how data flows
- Projects — group related sources and outputs
Advanced
- SDK — custom processing in Python, Scala, or notebooks
- Talos AI — AI-powered data assistant
- Agents — remote ingestion agents for on-prem data sources
- Users and Access — user and role administration