Using Multiple Languages in a Databricks Notebook¶

Working With Multiple Languages¶

Custom processes require a DataForge SDK-supported language, but Databricks notebooks natively support Scala, Python, R, and SQL — all usable together. For example, you can run ML preprocessing in Python or R and then ingest the result into DataForge using Scala.

Begin each cell with the language magic command %<language>.

Reference data across languages using Temporary Views: create a GlobalTempView in one cell and read it with spark.table("global_temp.<name>") in the next.

Cell 1:

%python

#Creating a dataframe, Dataset would be referencing an existing dataset
pyDF = spark.createDataFrame(Dataset, schema)

#Turn dataframe into Temp View to use across languages
pyDF.createOrReplaceGlobalTempView('pyDF_temp')

Cell 2:

%scala

//Reference temp view from Python cell into Scala cell df1 dataframe for further use
val df1 = spark.table("global_temp.pyDF_temp")