
Cleanup Help

When the Cleanup process starts, it locks all Source hub tables to avoid data issues.

If a source is still processing when Cleanup starts, Cleanup skips that source and retries it on the next run. Repeated skips lead to steadily increasing storage costs. This most commonly happens when source schedules overlap with Cleanup. The Cleanup process logs list the IDs of skipped sources; use them to identify scheduling conflicts.

DataForge recommends keeping a 1-hour window each day with no sources scheduled, and scheduling sources to start no later than 30 minutes before Cleanup. This gives sources that are already running time to finish before Cleanup locks the hub tables. Adjust Schedules to eliminate any overlap.
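One way to audit existing schedules against this guidance is a small script. The sketch below is a minimal Python illustration, not a DataForge API or setting. It assumes each source has a single daily start time and reads the guidance as a 60-minute quiet window that opens 30 minutes before Cleanup starts. The function name `conflicting_sources` and the source names in any example are hypothetical. It only inspects start times, so a source that starts earlier but runs long can still overlap with Cleanup.

```python
from datetime import time

# Assumption: the quiet window opens 30 minutes before Cleanup starts and
# lasts 60 minutes in total. This is one reading of the guidance, not a
# DataForge configuration value.
BUFFER_MIN = 30
WINDOW_MIN = 60
DAY_MIN = 24 * 60


def to_minutes(t: time) -> int:
    """Convert a wall-clock time to minutes after midnight."""
    return t.hour * 60 + t.minute


def conflicting_sources(cleanup_start: time, source_starts: dict) -> list:
    """Return the (hypothetical) source names whose daily start time falls
    inside the quiet window around Cleanup, sorted alphabetically."""
    # Start of the quiet window, wrapped into the range [0, DAY_MIN).
    window_open = (to_minutes(cleanup_start) - BUFFER_MIN) % DAY_MIN
    conflicts = []
    for name, start in source_starts.items():
        # Minutes from the window opening to the source start, wrapping
        # past midnight so windows like 23:40-00:40 are handled.
        offset = (to_minutes(start) - window_open) % DAY_MIN
        if offset < WINDOW_MIN:
            conflicts.append(name)
    return sorted(conflicts)
```

For example, with Cleanup at 02:00 the assumed quiet window runs from 01:30 to 02:30, so sources starting at 01:45 or 02:15 are reported, while sources starting at 01:00 or 02:30 are not. The modulo arithmetic lets the same check work when the window crosses midnight.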

If schedules can't be adjusted due to SLAs, run Cleanup manually from the Service Configurations page.

Run Cleanup at least once per week: it removes unneeded datalake objects and keeps cloud costs under control. A source that has not had Cleanup run on it in the last 7 days shows a garbage can icon in its status.
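The 7-day rule can be expressed as a simple staleness check. This is a hypothetical Python sketch, not how DataForge computes the icon internally: it assumes you have each source's last Cleanup timestamp (or `None` if Cleanup has never run on it), and it treats exactly 7 days as stale, which is an assumed boundary.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Threshold taken from the weekly guidance; the exact boundary
# (7 days counts as stale) is an assumption.
CLEANUP_MAX_AGE = timedelta(days=7)


def sources_needing_cleanup(
    last_cleanup: Dict[str, Optional[datetime]], now: datetime
) -> List[str]:
    """Return the (hypothetical) source names whose last Cleanup is at least
    7 days old, or that have never been cleaned up, sorted alphabetically."""
    stale = []
    for name, ts in last_cleanup.items():
        # A source with no recorded Cleanup is treated as stale.
        if ts is None or now - ts >= CLEANUP_MAX_AGE:
            stale.append(name)
    return sorted(stale)
```

Sources returned by a check like this are candidates for a manual Cleanup run from the Service Configurations page.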