
Creating Sources

Sources manage the full data flow, from ingestion through output. In this step, you'll create two sources from connection metadata, configure their refresh settings, and run a data pull.

Step 1: Create sources from connection metadata

Navigate to Connections, open the Sample Datasets connection, and go to the Connection Metadata tab.

Search for TPCH (or TPCH_SF1 on Snowflake), then select the checkboxes for the customer and orders tables.

Open the triple-dot menu and select Create Source(s).

In the dialog:

  • Change the naming pattern to ${table}
  • Toggle Initiate Data Pull on

Click OK, then navigate to the Sources page. Both sources begin ingesting and processing data.
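The ${table} token in the naming pattern acts as a placeholder for each selected table's name. As a rough analogy only (the product's own pattern resolution is not shown here, and the helper below is hypothetical), Python's string.Template uses the same ${name} syntax:

```python
from string import Template


def source_names(pattern: str, tables: list[str]) -> list[str]:
    """Expand a naming pattern once per selected table (hypothetical helper)."""
    template = Template(pattern)
    # substitute() raises KeyError if the pattern uses a placeholder other than ${table}
    return [template.substitute(table=name) for name in tables]


print(source_names("${table}", ["customer", "orders"]))
# ['customer', 'orders']
print(source_names("tpch_${table}", ["customer", "orders"]))
# ['tpch_customer', 'tpch_orders']
```

With the plain ${table} pattern, the sources are simply named customer and orders; the tpch_ prefix is an illustrative variant, not a product default.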

Step 2: Configure the Customer source

Open the Customer source and go to the Settings tab. Most fields are pre-populated from connection metadata.

Key settings to know:

Setting          | Purpose
Process Config   | Which compute configuration to use
Connection Type  | Type of data source (Table, File, API, etc.)
Source Query     | Query to run against the source system
Data Refresh     | How to handle incremental data (Full, Key, Timestamp, etc.)
Schedule         | When to initiate new data pulls

For full details, see Source Settings.

Change Data Refresh to Key and set the Key Column to c_custkey.

Click Save. When a popup prompts you to reset CDC, select Save Changes & Reset to reprocess the existing data with the new refresh type.
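To see why the refresh type matters, here is a minimal sketch of the general difference between Full and Key refresh, assuming the most recent pull wins for each key value. The functions and sample rows are hypothetical, and the product's actual CDC processing (for example, its handling of deleted rows or of Timestamp refresh) may differ:

```python
def full_refresh(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Full refresh: the latest pull replaces all existing rows."""
    return list(incoming)


def key_refresh(existing: list[dict], incoming: list[dict], key_column: str) -> list[dict]:
    """Key refresh: an incoming row replaces the existing row with the same key;
    rows whose keys are absent from the pull are kept."""
    merged = {row[key_column]: row for row in existing}
    for row in incoming:
        merged[row[key_column]] = row  # the newer version wins
    return list(merged.values())


# Made-up sample rows for illustration only.
existing = [{"c_custkey": 1, "c_name": "A"}, {"c_custkey": 2, "c_name": "B"}]
incoming = [{"c_custkey": 2, "c_name": "B (updated)"}, {"c_custkey": 3, "c_name": "C"}]

print(full_refresh(existing, incoming))
# [{'c_custkey': 2, 'c_name': 'B (updated)'}, {'c_custkey': 3, 'c_name': 'C'}]
print(key_refresh(existing, incoming, "c_custkey"))
# [{'c_custkey': 1, 'c_name': 'A'}, {'c_custkey': 2, 'c_name': 'B (updated)'}, {'c_custkey': 3, 'c_name': 'C'}]
```

Under Full refresh, customer 1 disappears because it was not in the latest pull; under Key refresh it is retained, and customer 2 is updated in place.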

Step 3: Pull new data into Customer

On the Inputs tab, click Pull Now to ingest fresh data.

The input progresses through Ingestion → CDC → Enrichment → Refresh → Recalculation. A green checkmark appears when complete.
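Because the stages always run in this order, an input's status can be modeled as the next unfinished stage. A hypothetical sketch (the function and status strings are illustrative and do not reflect how the product reports progress internally):

```python
STAGES = ["Ingestion", "CDC", "Enrichment", "Refresh", "Recalculation"]


def input_status(completed: int) -> str:
    """Describe an input that has finished `completed` of the stages, in order."""
    if not 0 <= completed <= len(STAGES):
        raise ValueError(f"completed must be between 0 and {len(STAGES)}")
    if completed == len(STAGES):
        return "complete"  # the point at which the green checkmark would appear
    return f"running: {STAGES[completed]}"


for done in range(len(STAGES) + 1):
    print(done, input_status(done))
# 0 running: Ingestion
# 1 running: CDC
# 2 running: Enrichment
# 3 running: Refresh
# 4 running: Recalculation
# 5 complete
```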

Step 4: Configure the Orders source

Open the Orders source and change Data Refresh to Key with Key Column o_orderkey.

Click Save, then select Save Changes & Reset CDC. Data already exists in this source, so no additional pull is needed.
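No new pull is needed because resetting CDC reprocesses data the source has already ingested. A conceptual sketch of that idea, assuming the source keeps its earlier inputs and replays them in order under the new Key refresh (the function name and sample rows are hypothetical; the product's real reset behavior may differ):

```python
def replay_inputs(stored_inputs: list[list[dict]], key_column: str) -> dict:
    """Reprocess previously ingested inputs, oldest first, under Key refresh."""
    current: dict = {}
    for batch in stored_inputs:  # each batch stands in for one earlier data pull
        for row in batch:
            current[row[key_column]] = row  # later inputs overwrite earlier rows with the same key
    return current


# Made-up sample inputs for illustration only.
stored = [
    [{"o_orderkey": 1, "o_orderstatus": "O"}],
    [{"o_orderkey": 1, "o_orderstatus": "F"}, {"o_orderkey": 2, "o_orderstatus": "O"}],
]
result = replay_inputs(stored, "o_orderkey")
print({key: row["o_orderstatus"] for key, row in result.items()})
# {1: 'F', 2: 'O'}
```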

Continue to Creating Relations and Rules.