8.0.x Upgrade Guide (AWS)¶
Pre-Upgrade Process¶
DataForge¶
DataForge no longer allows certain type casts directly in output column mappings, preventing data loss from mismatched data types. Reach out to DataForge support for a query that identifies the data type changes needed. These changes can be made before or after the upgrade, but output processes will fail until they are made.
Databricks¶
The previous SDK is no longer supported. All custom notebooks must now use the DataForge SDK. Custom processes that do not use the DataForge SDK will fail.
Follow the DataForge SDK migration guide for more information on switching notebook and cluster references.
Terraform and Databricks¶
If you previously ran Terraform using Databricks username and password variables, you'll need to migrate to a Service Principal and secret. Databricks stopped supporting Databricks-managed passwords on July 10th. Follow the Updating Terraform for Databricks Authentication guide to complete this transition. If this is not completed, Terraform applies will fail with authentication errors.
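As a sketch of the end state, the Databricks Terraform provider can authenticate with a service principal through environment variables (the variable names below are documented by the provider; the values are placeholders for your workspace):

```shell
# Replace password-based auth with the service principal's OAuth credentials.
# Placeholder values are illustrative; use your workspace URL and the
# service principal's application ID and OAuth secret.
export DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
export DATABRICKS_CLIENT_ID="<service-principal-application-id>"
export DATABRICKS_CLIENT_SECRET="<oauth-secret>"

# Remove any Databricks username/password variables from your tfvars,
# then verify the provider can authenticate:
terraform plan
```

Follow the guide referenced above for the full migration steps; this only illustrates the credential handoff.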
Docker¶
The DataForge team will need to invite your original Docker user to a new Docker hub. You will need to accept the Docker invitation from the original email address used in Terraform or sign up for a new Docker account so the DataForge team can invite the new email address.
AWS IAM¶
For the instance profile role in IAM (named like db-instance-profile), copy the existing S3 policy and create a second S3 policy from the copied contents. Attach the second policy to the instance-profile role. The original S3 policy will be overwritten as part of the upgrade, so any buckets listed beyond those DataForge manages will be removed from it.
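The copy can be made in the IAM console or, as a sketch, with the AWS CLI. The policy names and account ID below are placeholders; check the policy's current default version ID with `aws iam get-policy` first:

```shell
# Export the existing S3 policy document (v1 is a placeholder; use the
# policy's default version ID).
aws iam get-policy-version \
  --policy-arn "arn:aws:iam::<account-id>:policy/<existing-s3-policy>" \
  --version-id v1 \
  --query 'PolicyVersion.Document' > s3-policy-copy.json

# Create the second policy from the copied document and attach it to the
# instance profile role.
aws iam create-policy \
  --policy-name <s3-policy-copy-name> \
  --policy-document file://s3-policy-copy.json
aws iam attach-role-policy \
  --role-name db-instance-profile \
  --policy-arn "arn:aws:iam::<account-id>:policy/<s3-policy-copy-name>"
```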
AWS Elastic Container Service¶
DataForge recommends turning off the ECS containers for API, Core, and Agent prior to starting the upgrade. The Postgres metadata database will be upgraded to version 16.1, and stopping the ECS containers ensures no processes attempt to run during the database upgrade.
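Scaling each service to zero desired tasks is one way to do this; the cluster and service names below are placeholders for your environment, and you should note the original desired counts so they can be restored after the upgrade:

```shell
# Scale each DataForge ECS service to zero so no tasks run during the
# Postgres upgrade; restore the original desired counts afterwards.
for svc in <api-service> <core-service> <agent-service>; do
  aws ecs update-service \
    --cluster <dataforge-cluster> \
    --service "$svc" \
    --desired-count 0
done
```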
Upgrade Process¶
After completing the pre-upgrade steps, follow the standard upgrade guide, then proceed to post-upgrade.
Post-Upgrade Process¶
Open Cluster Configurations (System Configurations -> Cluster Configurations) and check for any clusters renamed with a "CHECK DATABRICKS VERSION ..." prefix. For each such cluster, open Parameters -> Cluster Configuration, change the Spark Version to "14.3.x-scala2.12", and save the changes.
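If the Databricks CLI is configured for the workspace, one way to spot the renamed clusters from a terminal might be (output flags vary by CLI version; this assumes a version that supports JSON output):

```shell
# List clusters whose names were flagged during the upgrade.
databricks clusters list --output json | grep "CHECK DATABRICKS VERSION"
```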
Confirm the environment is up and functioning normally. Submit a support request if anything is not working as expected.