#724 Switch the behavior of Hive repair table on reruns. Do it only if explicitly asked.#725
#724 Switch the behavior of Hive repair table on reruns. Do it only if explicitly asked.#725
Conversation
…f explicitly asked.
WalkthroughA new configuration flag Changes
Sequence DiagramsequenceDiagram
participant CLI as CLI Arguments
participant CmdLine as CmdLineConfig
participant Config as RuntimeConfig
participant TaskRunner as TaskRunnerBase
participant Hive as Hive Table Operations
CLI->>CmdLine: --force-recreate-hive-tables
CmdLine->>CmdLine: Parse option to forceReCreateHiveTables
CmdLine->>Config: applyCmdLineToConfig(forceReCreateHiveTables)
Config->>Config: Set forceReCreateHiveTables flag
TaskRunner->>Config: Check runtimeConfig.forceReCreateHiveTables
alt forceReCreateHiveTables == true
TaskRunner->>Hive: createOrRefreshHiveTable(schema, date, recreate=true)
else forceReCreateHiveTables == false
TaskRunner->>Hive: createOrRefreshHiveTable(schema, date, recreate=schemaChanged)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pramen/core/src/main/scala/za/co/absa/pramen/core/cmd/CmdLineConfig.scala (1)
141-142: Minor: Inconsistent variable naming.The local variable
forcereCreateHiveTablesuses lowercase 'r' in "recreate", while the field and constant useforceReCreateHiveTableswith uppercase 'R'. Consider aligning the casing for consistency.✏️ Suggested fix
- for (forcereCreateHiveTables <- cmd.forceReCreateHiveTables) - accumulatedConfig = accumulatedConfig.withValue(FORCE_RECREATE_HIVE_TABLES, ConfigValueFactory.fromAnyRef(forcereCreateHiveTables)) + for (forceReCreateHiveTables <- cmd.forceReCreateHiveTables) + accumulatedConfig = accumulatedConfig.withValue(FORCE_RECREATE_HIVE_TABLES, ConfigValueFactory.fromAnyRef(forceReCreateHiveTables))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pramen/core/src/main/scala/za/co/absa/pramen/core/cmd/CmdLineConfig.scala` around lines 141 - 142, Rename the local pattern variable forcereCreateHiveTables to match the casing used elsewhere (forceReCreateHiveTables) so naming is consistent; update the for-comprehension binding in CmdLineConfig (the for (...) <- cmd.forceReCreateHiveTables) and any references such as accumulatedConfig.withValue(FORCE_RECREATE_HIVE_TABLES, ConfigValueFactory.fromAnyRef(...)) to use forceReCreateHiveTables.pramen/core/src/main/scala/za/co/absa/pramen/core/app/config/RuntimeConfig.scala (1)
168-169: Minor: Extra space in assignment.There's a double space before
ConfigUtilswhich appears to be a typo.✏️ Suggested fix
maxAttempts, - forceReCreateHiveTables = ConfigUtils.getOptionBoolean(conf, FORCE_RECREATE_HIVE_TABLES).getOrElse(false) + forceReCreateHiveTables = ConfigUtils.getOptionBoolean(conf, FORCE_RECREATE_HIVE_TABLES).getOrElse(false)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pramen/core/src/main/scala/za/co/absa/pramen/core/app/config/RuntimeConfig.scala` around lines 168 - 169, Fix the minor whitespace typo in RuntimeConfig.scala by removing the extra space before the ConfigUtils call in the forceReCreateHiveTables assignment so it reads with a single space between the equals sign and the call (symbol: forceReCreateHiveTables and ConfigUtils.getOptionBoolean). Ensure no other spacing changes are introduced around that assignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@pramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scala`:
- Around line 413-415: Test coverage is missing for the new recreate logic in
TaskRunnerBase: add unit tests that call the code path that reaches
task.job.createOrRefreshHiveTable and assert the boolean passed for the recreate
parameter; specifically, add one test where isRerun = true,
runtimeConfig.forceReCreateHiveTables = false and both
schemaChangesBeforeTransform and schemaChangesAfterTransform are empty and
assert createOrRefreshHiveTable was called with recreate = false, and another
where runtimeConfig.forceReCreateHiveTables = true and assert recreate = true;
target the TaskRunnerBase behavior (mock task.job and verify the
createOrRefreshHiveTable(...) call and its recreate argument) so changes to
recreate logic are validated.
---
Nitpick comments:
In
`@pramen/core/src/main/scala/za/co/absa/pramen/core/app/config/RuntimeConfig.scala`:
- Around line 168-169: Fix the minor whitespace typo in RuntimeConfig.scala by
removing the extra space before the ConfigUtils call in the
forceReCreateHiveTables assignment so it reads with a single space between the
equals sign and the call (symbol: forceReCreateHiveTables and
ConfigUtils.getOptionBoolean). Ensure no other spacing changes are introduced
around that assignment.
In `@pramen/core/src/main/scala/za/co/absa/pramen/core/cmd/CmdLineConfig.scala`:
- Around line 141-142: Rename the local pattern variable forcereCreateHiveTables
to match the casing used elsewhere (forceReCreateHiveTables) so naming is
consistent; update the for-comprehension binding in CmdLineConfig (the for (...)
<- cmd.forceReCreateHiveTables) and any references such as
accumulatedConfig.withValue(FORCE_RECREATE_HIVE_TABLES,
ConfigValueFactory.fromAnyRef(...)) to use forceReCreateHiveTables.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 22f7ec76-144f-4f47-bca9-b088bc7b0f5e
📒 Files selected for processing (6)
pramen/core/src/main/resources/reference.confpramen/core/src/main/scala/za/co/absa/pramen/core/app/config/RuntimeConfig.scalapramen/core/src/main/scala/za/co/absa/pramen/core/cmd/CmdLineConfig.scalapramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scalapramen/core/src/test/scala/za/co/absa/pramen/core/RuntimeConfigFactory.scalapramen/core/src/test/scala/za/co/absa/pramen/core/cmd/CmdLineConfigSuite.scala
| val hiveWarnings = if (task.job.outputTable.hiveTable.nonEmpty) { | ||
| val recreate = schemaChangesBeforeTransform.nonEmpty || schemaChangesAfterTransform.nonEmpty || task.reason == TaskRunReason.Rerun | ||
| val recreate = schemaChangesBeforeTransform.nonEmpty || schemaChangesAfterTransform.nonEmpty || runtimeConfig.forceReCreateHiveTables | ||
| task.job.createOrRefreshHiveTable(dfTransformed.schema, task.infoDate, recreate) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for tests that verify the recreate parameter value in createOrRefreshHiveTable calls
echo "=== Searching for tests that check recreate parameter ==="
rg -n "recreate" --type scala pramen/core/src/test/
echo ""
echo "=== Check JobSpy implementation for recreate handling ==="
rg -n -A5 "createOrRefreshHiveTable" pramen/core/src/test/scala/za/co/absa/pramen/core/mocks/job/JobSpy.scalaRepository: AbsaOSS/pramen
Length of output: 3190
🏁 Script executed:
#!/bin/bash
echo "=== Search for TaskRunnerBaseSuite ==="
fd -t f "TaskRunnerBaseSuite" pramen/core/src/test/
echo ""
echo "=== Search for tests that verify recreate behavior based on schema changes or rerun reason ==="
rg -n "schemaChanges|forceReCreateHiveTables|recreate.*true|recreate.*false" --type scala pramen/core/src/test/ | grep -i "task.*runner\|runner.*test" | head -30
echo ""
echo "=== Check if TaskRunnerBaseSuite exists and its content ==="
find pramen/core/src/test -name "*TaskRunner*" -type fRepository: AbsaOSS/pramen
Length of output: 812
🏁 Script executed:
#!/bin/bash
echo "=== Read TaskRunnerBaseSuite to examine test coverage ==="
wc -l pramen/core/src/test/scala/za/co/absa/pramen/core/tests/runner/task/TaskRunnerBaseSuite.scala
echo ""
echo "=== Show content with line numbers ==="
cat -n pramen/core/src/test/scala/za/co/absa/pramen/core/tests/runner/task/TaskRunnerBaseSuite.scalaRepository: AbsaOSS/pramen
Length of output: 33640
Core behavioral change looks correct, but test coverage gap should be addressed.
The change correctly implements the PR objective: Hive table recreation now occurs only when schema changes are detected OR when explicitly requested via runtimeConfig.forceReCreateHiveTables, rather than automatically on every rerun.
However, existing tests only verify that createOrRefreshHiveTable is called, not the value of the recreate parameter. Add tests that explicitly verify:
recreate = falsewhenisRerun = true,forceReCreateHiveTables = false, and no schema changesrecreate = truewhenforceReCreateHiveTables = true
This ensures the behavioral change is properly validated and prevents regressions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@pramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scala`
around lines 413 - 415, Test coverage is missing for the new recreate logic in
TaskRunnerBase: add unit tests that call the code path that reaches
task.job.createOrRefreshHiveTable and assert the boolean passed for the recreate
parameter; specifically, add one test where isRerun = true,
runtimeConfig.forceReCreateHiveTables = false and both
schemaChangesBeforeTransform and schemaChangesAfterTransform are empty and
assert createOrRefreshHiveTable was called with recreate = false, and another
where runtimeConfig.forceReCreateHiveTables = true and assert recreate = true;
target the TaskRunnerBase behavior (mock task.job and verify the
createOrRefreshHiveTable(...) call and its recreate argument) so changes to
recreate logic are validated.
Unit Test Coverage
Files
|
Closes #724
Summary by CodeRabbit
New Features
--force-recreate-hive-tablesCLI option to control Hive table recreation behaviorpramen.runtime.hive.force.recreate(defaults to disabled)Tests