Creating Relations and Rules

Relations define connections between sources (like SQL JOINs) and enable cross-source rule logic and output mappings. Rules are either Validations (boolean data quality checks) or Enrichments (computed columns using any Spark SQL expression).

Step 1: Create a relation between Customer and Orders

Open the Customer source and go to the Relations tab. Click New +.

Configure the relation:

  • Name: Customer - Orders - custkey
  • [Related] Source: select the Orders source

Enter the relation expression:

[This].c_custkey = [Related].o_custkey

[This] refers to the current source (Customer). [Related] refers to the source selected in the [Related] Source dropdown. Click Save.

After saving, DataForge calculates the relation's cardinality. Here it is 1 to Many, since c_custkey is unique in Customer but not in Orders.
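
To make the cardinality reasoning concrete, here is a minimal Python sketch that classifies a relation by checking whether the join key is unique on each side. It illustrates the idea only and is not DataForge's actual implementation; the function name infer_cardinality and the sample key values are assumptions made for this example.

```python
from collections import Counter


def infer_cardinality(this_keys, related_keys):
    """Classify a relation by whether each side's join key is unique.

    Hypothetical helper for illustration; not a DataForge API.
    """
    this_unique = all(n == 1 for n in Counter(this_keys).values())
    related_unique = all(n == 1 for n in Counter(related_keys).values())
    left = "1" if this_unique else "Many"
    right = "1" if related_unique else "Many"
    return f"{left} to {right}"


# Made-up sample keys: each customer appears once, orders repeat customers.
customer_custkeys = [1, 2, 3]       # [This].c_custkey
order_custkeys = [1, 1, 2, 3, 3]    # [Related].o_custkey

print(infer_cardinality(customer_custkeys, order_custkeys))
```

Because c_custkey never repeats on the Customer side while o_custkey does on the Orders side, the sketch reports the same 1 to Many result described above.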

Step 2: Create a validation rule

Validations flag records as passed, warned, or failed based on a boolean expression.

Navigate to the Rules tab and click New +. Select type Validation.

Configure:

  • Name: Market Segment is Automobile
  • Description: Flag when Market Segment is Automobile
  • When expression is false, set to: Warn
  • Expression:
[This].c_mktsegment = 'AUTOMOBILE'

Click Save.
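
With the settings above, any record whose c_mktsegment is not 'AUTOMOBILE' makes the expression false and is therefore set to Warn. The following Python sketch models that behavior on a few made-up records. The function name validate and the lowercase status labels are assumptions for illustration, and the sketch does not model how DataForge treats null values.

```python
def validate(record, predicate, on_false="warned"):
    """Return 'passed' when the predicate holds, else the configured status.

    Hypothetical helper; mirrors 'When expression is false, set to: Warn'.
    """
    return "passed" if predicate(record) else on_false


def is_automobile(record):
    # Stands in for: [This].c_mktsegment = 'AUTOMOBILE'
    return record["c_mktsegment"] == "AUTOMOBILE"


# Made-up sample rows from the Customer source.
customers = [
    {"c_custkey": 1, "c_mktsegment": "AUTOMOBILE"},
    {"c_custkey": 2, "c_mktsegment": "BUILDING"},
]

for customer in customers:
    print(customer["c_custkey"], validate(customer, is_automobile))
```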

Tip

Type ` (backtick) in the expression editor to see all available Spark SQL functions. DataForge also supports Rule Templates for reusable rule patterns.

Step 3: Create an enrichment rule

Enrichments add computed columns, optionally referencing related sources.

Click New + on the Rules tab and configure:

  • Name: Total Orders Price
  • Description: Sum total price of orders per customer from Orders Source
  • Recalculation Mode: Snapshot (calculate only on ingestion, not retroactively)
  • Expression:
SUM([orders].o_totalprice)

The [orders] reference uses the relation created in Step 1. When referencing other sources, a relation path dropdown appears in Expression Parameters.

For more on rule parameters, see Rules and Relations.

Click Save.
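
As a rough mental model of what this enrichment computes, the Python sketch below sums o_totalprice per customer across the 1 to Many relation and attaches the result to each Customer record. This is an illustration under assumptions, not DataForge's engine: the output column name total_orders_price and the sample rows are made up, and returning None for a customer with no orders simply mirrors how SQL SUM behaves over an empty set.

```python
from collections import defaultdict


def total_orders_price(customers, orders):
    """Attach the summed order price to each customer record.

    Hypothetical helper modeling SUM([orders].o_totalprice) over the
    relation [This].c_custkey = [Related].o_custkey.
    """
    totals = defaultdict(float)
    for order in orders:
        totals[order["o_custkey"]] += order["o_totalprice"]
    # dict.get does not create missing keys, so customers without
    # orders receive None rather than 0.0.
    return [
        {**c, "total_orders_price": totals.get(c["c_custkey"])}
        for c in customers
    ]


# Made-up sample rows.
customers = [{"c_custkey": 1}, {"c_custkey": 2}]
orders = [
    {"o_custkey": 1, "o_totalprice": 100.0},
    {"o_custkey": 1, "o_totalprice": 50.5},
]

for row in total_orders_price(customers, orders):
    print(row)
```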

Step 4: Recalculate rules

Since data was ingested before these rules existed, you need to recalculate. Go to the Inputs tab, open the triple-dot header menu, and select Recalculate.

Wait for the green checkmark to confirm completion.

Step 5: Verify the results

Open the Data View tab to query or preview the source data, which now includes your new rule columns.

Every source in DataForge has a hub table and a source view you can query directly:

Object        Catalog/Database             Schema                           Name
Hub table     System > datalake-db-name    System > datalake-schema-name    hub_<source_id>
Source view   System > datalake-db-name    Project > Schema Name            Source Settings > Hub view name
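
If you query the hub table from a notebook or script, a small helper can assemble its name from these settings. The Python sketch below is only an illustration: the catalog, schema, and source ID values are placeholders, the helper names are hypothetical, and the three-part catalog.schema.name form with backtick quoting is an assumption that may not match your platform.

```python
def hub_table_name(source_id: int) -> str:
    """Build the hub table name following the hub_<source_id> pattern."""
    return f"hub_{source_id}"


def qualified_name(catalog: str, schema: str, name: str) -> str:
    """Join the three parts into a backtick-quoted identifier (assumed form)."""
    return ".".join(f"`{part}`" for part in (catalog, schema, name))


# Placeholder values; substitute the ones configured in your environment.
table = qualified_name("my_catalog", "my_schema", hub_table_name(42))
query = f"SELECT * FROM {table} LIMIT 10"
print(query)
```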

Continue to Creating Outputs.