#724 Switch the behavior of Hive repair table on reruns. Do it only if explicitly asked. #725

Open
yruslan wants to merge 1 commit into main from feature/724-hive-re-create-only-on-schema-change

Conversation

yruslan (Collaborator) commented Mar 20, 2026

Closes #724

Summary by CodeRabbit

  • New Features

    • Added --force-recreate-hive-tables CLI option to control Hive table recreation behavior
    • New runtime configuration property pramen.runtime.hive.force.recreate (defaults to disabled)
  • Tests

    • Added test coverage for new CLI option

coderabbitai bot commented Mar 20, 2026

Walkthrough

A new configuration flag forceReCreateHiveTables is introduced to control Hive table re-creation behavior independently of the rerun trigger. The flag propagates through CLI arguments, runtime configuration, and ultimately determines whether tables are forcefully re-created in the task runner.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & CLI: pramen/core/src/main/resources/reference.conf, pramen/core/src/main/scala/za/co/absa/pramen/core/app/config/RuntimeConfig.scala, pramen/core/src/main/scala/za/co/absa/pramen/core/cmd/CmdLineConfig.scala | Added new boolean configuration property pramen.runtime.hive.force.recreate (default false), integrated into RuntimeConfig as public field forceReCreateHiveTables, and exposed via CLI option --force-recreate-hive-tables. |
| Task Execution Logic: pramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scala | Modified table recreation decision logic to depend on runtimeConfig.forceReCreateHiveTables instead of TaskRunReason.Rerun trigger, allowing schema-based decisions to take precedence. |
| Test Support & Coverage: pramen/core/src/test/scala/za/co/absa/pramen/core/RuntimeConfigFactory.scala, pramen/core/src/test/scala/za/co/absa/pramen/core/cmd/CmdLineConfigSuite.scala | Extended test factory to accept the new parameter and added test case validating CLI option parsing and propagation to runtime config. |
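
The propagation summarized in the changes above can be sketched without Typesafe Config as follows. This is a minimal illustration and not the PR's code: `CmdLine`, `applyCmdLine`, and the `Map`-backed configuration are hypothetical stand-ins. Only the key `pramen.runtime.hive.force.recreate`, the field name `forceReCreateHiveTables`, and the `false` default are taken from the summary.

```scala
object ForceRecreatePropagation {
  // Configuration key named in the PR summary.
  val ForceRecreateHiveTables = "pramen.runtime.hive.force.recreate"

  // Hypothetical stand-in for CmdLineConfig: None means the CLI option was not given.
  final case class CmdLine(forceReCreateHiveTables: Option[Boolean] = None)

  // Overlay the CLI value on the configuration only when the option is present.
  def applyCmdLine(cmd: CmdLine, conf: Map[String, Boolean]): Map[String, Boolean] =
    cmd.forceReCreateHiveTables.fold(conf)(value => conf.updated(ForceRecreateHiveTables, value))

  // Read the flag, falling back to the documented default of false.
  def forceReCreateHiveTables(conf: Map[String, Boolean]): Boolean =
    conf.getOrElse(ForceRecreateHiveTables, false)
}
```

Modelling the CLI value as an `Option` keeps "not specified" distinct from "explicitly false", so an absent option leaves whatever the configuration file says untouched.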

Sequence Diagram

sequenceDiagram
    participant CLI as CLI Arguments
    participant CmdLine as CmdLineConfig
    participant Config as RuntimeConfig
    participant TaskRunner as TaskRunnerBase
    participant Hive as Hive Table Operations

    CLI->>CmdLine: --force-recreate-hive-tables
    CmdLine->>CmdLine: Parse option to forceReCreateHiveTables
    CmdLine->>Config: applyCmdLineToConfig(forceReCreateHiveTables)
    Config->>Config: Set forceReCreateHiveTables flag
    TaskRunner->>Config: Check runtimeConfig.forceReCreateHiveTables
    alt forceReCreateHiveTables == true
        TaskRunner->>Hive: createOrRefreshHiveTable(schema, date, recreate=true)
    else forceReCreateHiveTables == false
        TaskRunner->>Hive: createOrRefreshHiveTable(schema, date, recreate=schemaChanged)
    end
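
The decision shown in the diagram can be contrasted with the previous behavior in a small sketch. The names `RunReason`, `recreateBefore`, and `recreateAfter` are hypothetical and exist only for illustration; the two boolean expressions mirror the old and new conditions in TaskRunnerBase, with the two schema-change checks collapsed into a single `schemaChanged` argument.

```scala
object HiveRecreateDecision {
  // Simplified stand-in for TaskRunReason; only the two cases needed here.
  sealed trait RunReason
  case object New extends RunReason
  case object Rerun extends RunReason

  // Before the PR: every rerun forced a re-creation of the Hive table.
  def recreateBefore(schemaChanged: Boolean, reason: RunReason): Boolean =
    schemaChanged || reason == Rerun

  // After the PR: only a schema change or the explicit flag forces it.
  def recreateAfter(schemaChanged: Boolean, forceReCreateHiveTables: Boolean): Boolean =
    schemaChanged || forceReCreateHiveTables
}
```

With no schema change, a rerun used to yield `true` from the old condition, while the new condition yields `false` unless the flag is set.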

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A flag hops into being, so tiny and true,
No more forced tables on reruns we do,
Schema changes lead the way now instead,
The HMS sighs with relief, no longer in dread! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title accurately describes the main change: switching Hive table repair behavior from automatic on reruns to explicit/schema-change-based. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #724 are met: removed automatic Hive re-creation on rerun, added explicit config flag, preserved schema-change detection. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to the primary objective of controlling Hive table re-creation behavior; no out-of-scope modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
pramen/core/src/main/scala/za/co/absa/pramen/core/cmd/CmdLineConfig.scala (1)

141-142: Minor: Inconsistent variable naming.

The local variable forcereCreateHiveTables uses lowercase 'r' in "recreate", while the field and constant use forceReCreateHiveTables with uppercase 'R'. Consider aligning the casing for consistency.

✏️ Suggested fix
-    for (forcereCreateHiveTables <- cmd.forceReCreateHiveTables)
-      accumulatedConfig = accumulatedConfig.withValue(FORCE_RECREATE_HIVE_TABLES, ConfigValueFactory.fromAnyRef(forcereCreateHiveTables))
+    for (forceReCreateHiveTables <- cmd.forceReCreateHiveTables)
+      accumulatedConfig = accumulatedConfig.withValue(FORCE_RECREATE_HIVE_TABLES, ConfigValueFactory.fromAnyRef(forceReCreateHiveTables))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pramen/core/src/main/scala/za/co/absa/pramen/core/cmd/CmdLineConfig.scala`
around lines 141 - 142, Rename the local pattern variable
forcereCreateHiveTables to match the casing used elsewhere
(forceReCreateHiveTables) so naming is consistent; update the for-comprehension
binding in CmdLineConfig (the for (...) <- cmd.forceReCreateHiveTables) and any
references such as accumulatedConfig.withValue(FORCE_RECREATE_HIVE_TABLES,
ConfigValueFactory.fromAnyRef(...)) to use forceReCreateHiveTables.
pramen/core/src/main/scala/za/co/absa/pramen/core/app/config/RuntimeConfig.scala (1)

168-169: Minor: Extra space in assignment.

There's a double space before ConfigUtils which appears to be a typo.

✏️ Suggested fix
       maxAttempts,
-      forceReCreateHiveTables =  ConfigUtils.getOptionBoolean(conf, FORCE_RECREATE_HIVE_TABLES).getOrElse(false)
+      forceReCreateHiveTables = ConfigUtils.getOptionBoolean(conf, FORCE_RECREATE_HIVE_TABLES).getOrElse(false)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pramen/core/src/main/scala/za/co/absa/pramen/core/app/config/RuntimeConfig.scala`
around lines 168 - 169, Fix the minor whitespace typo in RuntimeConfig.scala by
removing the extra space before the ConfigUtils call in the
forceReCreateHiveTables assignment so it reads with a single space between the
equals sign and the call (symbol: forceReCreateHiveTables and
ConfigUtils.getOptionBoolean). Ensure no other spacing changes are introduced
around that assignment.

📥 Commits

Reviewing files that changed from the base of the PR and between 253f431 and c6a2ed6.

📒 Files selected for processing (6)
  • pramen/core/src/main/resources/reference.conf
  • pramen/core/src/main/scala/za/co/absa/pramen/core/app/config/RuntimeConfig.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/cmd/CmdLineConfig.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/RuntimeConfigFactory.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/cmd/CmdLineConfigSuite.scala

Comment on lines 413 to 415
       val hiveWarnings = if (task.job.outputTable.hiveTable.nonEmpty) {
-        val recreate = schemaChangesBeforeTransform.nonEmpty || schemaChangesAfterTransform.nonEmpty || task.reason == TaskRunReason.Rerun
+        val recreate = schemaChangesBeforeTransform.nonEmpty || schemaChangesAfterTransform.nonEmpty || runtimeConfig.forceReCreateHiveTables
         task.job.createOrRefreshHiveTable(dfTransformed.schema, task.infoDate, recreate)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for tests that verify the recreate parameter value in createOrRefreshHiveTable calls

echo "=== Searching for tests that check recreate parameter ==="
rg -n "recreate" --type scala pramen/core/src/test/

echo ""
echo "=== Check JobSpy implementation for recreate handling ==="
rg -n -A5 "createOrRefreshHiveTable" pramen/core/src/test/scala/za/co/absa/pramen/core/mocks/job/JobSpy.scala

Repository: AbsaOSS/pramen

Length of output: 3190


🏁 Script executed:

#!/bin/bash
echo "=== Search for TaskRunnerBaseSuite ==="
fd -t f "TaskRunnerBaseSuite" pramen/core/src/test/

echo ""
echo "=== Search for tests that verify recreate behavior based on schema changes or rerun reason ==="
rg -n "schemaChanges|forceReCreateHiveTables|recreate.*true|recreate.*false" --type scala pramen/core/src/test/ | grep -i "task.*runner\|runner.*test" | head -30

echo ""
echo "=== Check if TaskRunnerBaseSuite exists and its content ==="
find pramen/core/src/test -name "*TaskRunner*" -type f

Repository: AbsaOSS/pramen

Length of output: 812


🏁 Script executed:

#!/bin/bash
echo "=== Read TaskRunnerBaseSuite to examine test coverage ==="
wc -l pramen/core/src/test/scala/za/co/absa/pramen/core/tests/runner/task/TaskRunnerBaseSuite.scala

echo ""
echo "=== Show content with line numbers ==="
cat -n pramen/core/src/test/scala/za/co/absa/pramen/core/tests/runner/task/TaskRunnerBaseSuite.scala

Repository: AbsaOSS/pramen

Length of output: 33640


Core behavioral change looks correct, but test coverage gap should be addressed.

The change correctly implements the PR objective: Hive table recreation now occurs only when schema changes are detected OR when explicitly requested via runtimeConfig.forceReCreateHiveTables, rather than automatically on every rerun.

However, existing tests only verify that createOrRefreshHiveTable is called, not the value of the recreate parameter. Add tests that explicitly verify:

  1. recreate = false when isRerun = true, forceReCreateHiveTables = false, and no schema changes
  2. recreate = true when forceReCreateHiveTables = true

This ensures the behavioral change is properly validated and prevents regressions.
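
One way to verify the two cases listed above is a spy that records the `recreate` argument. The sketch below is not the project's test code: `HiveJobSpy`, `refreshHive`, and `verify` are hypothetical, the real `createOrRefreshHiveTable` also takes a schema and an information date, and an actual test would go through TaskRunnerBase with the existing test mocks rather than this simplified condition.

```scala
object RecreateArgumentCheck {
  // Hypothetical spy: records every `recreate` value it is called with.
  final class HiveJobSpy {
    var recreateArgs: Vector[Boolean] = Vector.empty

    def createOrRefreshHiveTable(recreate: Boolean): Unit =
      recreateArgs = recreateArgs :+ recreate
  }

  // Simplified copy of the new condition; `isRerun` is accepted but deliberately ignored.
  def refreshHive(job: HiveJobSpy, isRerun: Boolean, schemaChanged: Boolean, forceReCreateHiveTables: Boolean): Unit =
    job.createOrRefreshHiveTable(schemaChanged || forceReCreateHiveTables)

  def verify(): Unit = {
    // Case 1: a rerun without the flag and without schema changes must not recreate.
    val rerunOnly = new HiveJobSpy
    refreshHive(rerunOnly, isRerun = true, schemaChanged = false, forceReCreateHiveTables = false)
    assert(rerunOnly.recreateArgs == Vector(false))

    // Case 2: the explicit flag must recreate even without schema changes.
    val forced = new HiveJobSpy
    refreshHive(forced, isRerun = false, schemaChanged = false, forceReCreateHiveTables = true)
    assert(forced.recreateArgs == Vector(true))
  }
}
```

Asserting on the recorded argument, rather than only on the call count, is what distinguishes these checks from the existing ones.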

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@pramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scala`
around lines 413 - 415, Test coverage is missing for the new recreate logic in
TaskRunnerBase: add unit tests that call the code path that reaches
task.job.createOrRefreshHiveTable and assert the boolean passed for the recreate
parameter; specifically, add one test where isRerun = true,
runtimeConfig.forceReCreateHiveTables = false and both
schemaChangesBeforeTransform and schemaChangesAfterTransform are empty and
assert createOrRefreshHiveTable was called with recreate = false, and another
where runtimeConfig.forceReCreateHiveTables = true and assert recreate = true;
target the TaskRunnerBase behavior (mock task.job and verify the
createOrRefreshHiveTable(...) call and its recreate argument) so changes to
recreate logic are validated.

@github-actions

Unit Test Coverage

Overall Project: 84.4% 🍏
Files changed: 95.51% 🍏

| Module | Coverage |
|---|---|
| pramen:core Jacoco Report | 86.36% 🍏 |

Files

| Module | File | Coverage |
|---|---|---|
| pramen:core Jacoco Report | CmdLineConfig.scala | 95.17% -0.67% 🍏 |
| | RuntimeConfig.scala | 92.22% 🍏 |
| | TaskRunnerBase.scala | 82.74% 🍏 |

Development

Successfully merging this pull request may close these issues.

Do not re-create Hive tables on rerun, only when schema has changed