
Getting Familiar with the Data

The first step is to explore the TPC-H sample dataset you'll use throughout this tutorial. You'll work with two tables, Customer and Orders, which join on the customer key: c_custkey in Customer matches o_custkey in Orders.

If your workspace uses Databricks, follow the steps below. If your workspace uses Snowflake, skip to the Snowflake steps.

Databricks

Step 1: Open the Databricks Catalog Explorer

From the DataForge main menu, select Databricks to launch the attached workspace. Then select Catalog in the left-hand menu.

Step 2: Query the TPCH tables

Expand Samples > TPCH, click the Customer table, then select Create > Query to open a new SQL editor.

Attach a SQL Warehouse if prompted, then run the query to preview customer data.

Add a second query tab and run:

SELECT * FROM samples.tpch.orders
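To see how the two tables relate, a short sketch like the following may help. It assumes the standard TPC-H column names (c_custkey, c_name, o_orderkey, o_custkey); the LIMIT values and the order_count alias are illustrative choices, not part of the tutorial.

```sql
-- Preview a handful of rows instead of scanning the whole table
SELECT * FROM samples.tpch.orders LIMIT 10;

-- Count orders per customer by joining on the shared customer key
-- (column names assume the standard TPC-H schema)
SELECT
  c.c_custkey,
  c.c_name,
  COUNT(o.o_orderkey) AS order_count
FROM samples.tpch.customer AS c
LEFT JOIN samples.tpch.orders AS o
  ON o.o_custkey = c.c_custkey
GROUP BY c.c_custkey, c.c_name
ORDER BY order_count DESC
LIMIT 10;
```

The LEFT JOIN keeps customers that have no orders, so they show up with an order_count of 0 rather than disappearing from the result.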

Scan both tables to get familiar with the data and column names. When ready, proceed to Setting up Connections.

Snowflake

Step 1: Open the Snowflake workspace

From the DataForge main menu, select Snowflake to open the workspace, then navigate to Projects > Workspaces.

Step 2: Query the TPCH_SF1 tables

Expand SNOWFLAKE_SAMPLE_DATA > TPCH_SF1, right-click the Customer table and select Preview Table.

Add a new query tab and run:

SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS
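As a rough sketch of how to inspect the columns and try the join in Snowflake, something like the following should work. It assumes the standard TPC-H column names (C_CUSTKEY, C_NAME, O_ORDERKEY, O_CUSTKEY, O_ORDERDATE, O_TOTALPRICE); the selected columns and the LIMIT are illustrative.

```sql
-- List the column names and types for each table
DESCRIBE TABLE SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER;
DESCRIBE TABLE SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS;

-- Pair each order with its customer via the shared customer key
-- (column names assume the standard TPC-H schema)
SELECT
  c.C_CUSTKEY,
  c.C_NAME,
  o.O_ORDERKEY,
  o.O_ORDERDATE,
  o.O_TOTALPRICE
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER AS c
JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS AS o
  ON o.O_CUSTKEY = c.C_CUSTKEY
LIMIT 10;
```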

Scan both tables to get familiar with the data and column names. When ready, proceed to Setting up Connections.