Build
Design, configure, and operate data pipelines in DataForge.
Pipeline building blocks
- Sources — define inputs and parsing rules
- Connections — Salesforce, JDBC, Kafka, Unity Catalog, and more
- Processing — ingestion queue, job runs, processing queue, workflow queue
- Outputs — write to Snowflake, Databricks, and other destinations
Configuration
- Configuration overview — cluster, process, and system settings
- Cluster Configuration — compute sizing and types
- Process Configuration — per-process defaults
- System Configuration — global system settings
Reusable patterns
- Cloning — templated multi-tenant setups
- Templates and Tokens — reusable rule and mapping logic
Scheduling and lineage
- Schedules — control when ingestion runs
- Lineage — visualize how data flows
- Projects — group related sources and outputs
Advanced
- SDK — custom processing in Python, Scala, or notebooks
- Talos AI — AI-powered data assistant
- Agents — remote ingestion agents for on-prem data sources
- Users and Access — user and role administration