Skip to content

Conversation

@chideat
Copy link
Collaborator

@chideat chideat commented Jan 12, 2026

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive migration guide for moving clusters from OCP to ACP. Covers applicability and a multi-phase workflow (prep, backup, transform, restore, deploy, cutover), environment and tool preparation, backup/restore setup, incremental and final backups, resource transformation and post-processing patterns (image refs, storage-class/selector mapping, Route→Gateway conversion), restore procedures with modifiers, verification, deployment, cutover, and rollback; includes command examples and configuration snippets.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings January 12, 2026 07:32
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 12, 2026

Walkthrough

Adds a new, detailed migration guide at docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md documenting a multi-phase OCP→ACP migration workflow (prep, backups, transform, restore, deploy, cutover, verification, rollback) using Move2Kube and Velero (documentation-only change, +1,215 lines).

Changes

Cohort / File(s) Summary
Migration Documentation
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
New comprehensive guide (~1,215 added lines) covering applicability, environment/tool preparation, Velero/OADP setup on source/target clusters, incremental & final backups with DNS TTL notes, Move2Kube transformation details (image handling, transformer configuration, Starlark cleanup script, fsGroup injection, storageClass mapping, selector fixes), Route→Gateway post-processing script, transformation execution order, Velero restore with resource modifiers/storage-class changes, deployment/verification, cutover and rollback procedures; includes command blocks, example YAML, and step-by-step instructions.

Sequence Diagram(s)

Skipped.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • tyzhou111

Poem

🐇 I nudge the YAML, tidy and bright,
Velero hums softly through day and night,
Move2Kube dances, routes find new ways,
Little rabbit cheers on migration days,
Hops of joy — clusters reunited in light.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding a migration solution documentation from OCP to ACP, which matches the 1215+ lines of new migration guide content.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In @docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md:
- Around line 547-550: The code is currently removing all annotations by popping
the entire "annotations" dict; instead iterate the metadata["annotations"] keys
and remove only OCP-specific keys (e.g., keys that match a known list or
prefixes like "openshift.io", "service.alpha.openshift.io", or other
OCP-specific patterns) while preserving user-defined annotations; update the
dict in place (e.g., build a new dict filtered_annotations or delete matching
keys) and set modified = True only if any OCP-specific annotation was removed;
target the block that checks "if 'annotations' in metadata" and replace the
unconditional metadata.pop("annotations") with this selective-filter logic.
- Line 1034: The hardcoded "runAsUser": 1000 may mismatch a customized APP_GID
in the Starlark script; update the docs to either replace the hardcoded 1000
with a reference to APP_GID (so readers know to set runAsUser equal to APP_GID)
or add a clear note near the runAsUser field and the APP_GID definition
explaining that runAsUser must match APP_GID if users change APP_GID, and
include guidance on where APP_GID is defined in the Starlark script.
- Around line 1093-1098: The restore command risks re-creating system-generated
secrets/configmaps/serviceaccounts; update the velero restore invocation that
builds RESTORE_NAME/BACKUP_NAME to exclude those system resources (e.g., add
--exclude-resources secrets,configmaps,serviceaccounts or add specific
name/label exclusions for default-token-*, kube-root-ca.crt, builder-*,
deployer-*), or alternatively document that after running the restore with
--include-resources ... and --resource-modifier-configmap
restore-helper-modifiers you must manually remove those system resources to
avoid conflicts.
- Line 285: The current one-liner oc scale --replicas=0 dc,deploy,statefulset
--all -n ${SOURCE_NS} will unconditionally scale everything to zero; add a
pre-scale verification that lists the DeploymentConfigs/Deployments/StatefulSets
in ${SOURCE_NS} (e.g., using oc get dc,deploy,statefulset -n ${SOURCE_NS} and/or
filter by non-zero replicas) and require explicit confirmation before running oc
scale --replicas=0 dc,deploy,statefulset --all -n ${SOURCE_NS}; implement this
as an extra step or small wrapper that prints the resources to be affected and
prompts the user (or fails on a missing/incorrect ${SOURCE_NS}) before executing
the oc scale command.
🧹 Nitpick comments (4)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (4)

360-365: Consider cleanup of sed backup files.

The sed -i.bak command creates .bak backup files in the source/ directory (line 363) that are not cleaned up. These backup files will accumulate and might cause confusion.

♻️ Suggested cleanup

Add cleanup after the replacement loop:

   done
 done < image_mappings.txt

+# Clean up sed backup files
+find source/ -name "*.bak" -type f -delete
+
 echo "✅ Image Replacement Completed"

824-828: Improve port extraction and validation.

The port extraction (line 826) and validation (line 828) could fail silently or produce unexpected results:

  1. yq '.spec.port.targetPort // 80' might return a named port (string)
  2. The regex [0-9]+ check only validates format, not range (1-65535)
  3. Named ports would pass through if they happen to be numeric strings
♻️ Enhanced port handling
-    local port=$(yq '.spec.port.targetPort // 80' "$rf")
-    local tls=$(yq '.spec.tls.termination // "none"' "$rf")
-    [[ ! "$port" =~ ^[0-9]+$ ]] && port=80
+    local port=$(yq '.spec.port.targetPort // 80' "$rf")
+    local tls=$(yq '.spec.tls.termination // "none"' "$rf")
+    
+    # Handle named ports and validate numeric ports
+    if [[ "$port" =~ ^[0-9]+$ ]]; then
+        # Validate port range
+        if (( port < 1 || port > 65535 )); then
+            log "Warning: Invalid port $port for route $name, using 80"
+            port=80
+        fi
+    else
+        # Named port or invalid value
+        log "Warning: Non-numeric port '$port' for route $name, using 80"
+        port=80
+    fi

1013-1036: Consider pinning the Alpine image version.

The restore helper container uses alpine:latest (line 1016), which:

  1. Can change unpredictably between restores
  2. Might introduce breaking changes
  3. Violates immutability best practices
📌 Recommended image pinning
                 {
                   "name": "restore-data",
-                  "image": "alpine:latest",
+                  "image": "alpine:3.19",
                   "command": ["sleep", "3600"],

Or use a minimal distroless/static image for better security:

-                  "image": "alpine:latest",
+                  "image": "gcr.io/distroless/static:nonroot",
                   "command": ["sleep", "3600"],

1113-1119: Fix Markdown blockquote formatting.

There's a blank line inside the blockquote (line 1117 between lines 1113-1119), which may cause rendering issues in some Markdown processors.

♻️ Fix blockquote formatting
 > [!NOTE] **Restore Principle**:
 >
 > - Velero File System Backup requires **Running Pod** to restore data to PVC
 > - Resource Modifier replaces original app container with simple `sleep` container
 > - Velero injected restore init container runs **first** to restore data to PVC
 > - After restore completes, `sleep` container starts and keeps Pod running
-
+
 > [!WARNING] **About Pod Start Failure**:

Remove the blank line or close and reopen the blockquote.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b739efb and 39cfd00.

📒 Files selected for processing (1)
  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🧰 Additional context used
🪛 LanguageTool
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

[grammar] ~1122-~1122: Use a hyphen to join words.
Context: ...mal app will be deployed using Move2Kube converted YAMLs): ```bash # 1. Delete t...

(QB_NEW_EN_HYPHEN)


[style] ~1164-~1164: Using many exclamation marks might seem excessive (in this case: 15 exclamation marks for a text that’s 10822 characters long)
Context: ...app=my-core-app -n ${TARGET_NS} ``` > [!WARNING] app=my-core-app -> Your appli...

(EN_EXCESSIVE_EXCLAMATION)

🪛 markdownlint-cli2 (0.18.1)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

1113-1113: Blank line inside blockquote

(MD028, no-blanks-blockquote)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a comprehensive migration guide for moving applications from Red Hat OpenShift (OCP) to Alauda Container Platform (ACP). The document provides step-by-step instructions using Move2Kube for resource transformation and Velero for data migration.

Changes:

  • Added detailed migration documentation covering the entire process from preparation to cutover
  • Includes automated transformation scripts (Starlark) to convert OCP-specific resources to ACP-compatible formats
  • Provides bash scripts for Route to Gateway API conversion and resource management

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md:
- Around line 324-369: The ImageStream processing block lacks validation and
error handling causing silent failures; add checks after producing
image_mappings.txt (verify file exists and is non-empty, prompt or exit on
failure), wrap the while-read replacement loop to detect and log grep/sed errors
(check grep exit status and capture files_found, check sed exit code and
preserve .bak on error), and add a post-replacement validation step that
searches for leftover INTERNAL_REG occurrences (referencing INTERNAL_REG and
image_mappings.txt, the grep/sed loop and the while IFS='=' read -r key value
loop) to report unreplaced internal registry references and return a non-zero
exit code or warning if any remain; optionally add a --dry-run flag to run the
same logic without writing changes.
- Around line 797-894: The docs should call out API stability differences used
by convert_route: the script generates HTTPRoute with apiVersion
gateway.networking.k8s.io/v1 (created as ${name}-httproute.yaml) which is
GA/stable, while TLSRoute is generated with gateway.networking.k8s.io/v1alpha2
(created as ${name}-tlsroute.yaml) which is experimental; update the
documentation near the convert_route description to explicitly state this
maturity difference, warn about using TLSRoute in production, and suggest
alternatives (e.g., use HTTPRoute with TLS handled at the Gateway, test feature
gates, or track Gateway API releases) so readers understand the risk and can
make informed migration choices.
🧹 Nitpick comments (4)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (4)

76-89: Add version requirements and validation checks for tools.

The tool preparation section lists required tools but doesn't specify version requirements or provide validation checks. This could lead to compatibility issues during migration.

📋 Suggested improvements

Add a validation section after the download links:

# Verify tool installations and versions
echo "Checking tool versions..."

# Check oc
oc version --client || { echo "ERROR: oc not found"; exit 1; }

# Check kubectl  
kubectl version --client || { echo "ERROR: kubectl not found"; exit 1; }

# Check velero (require v1.12+)
velero version --client-only || { echo "ERROR: velero not found"; exit 1; }

# Check move2kube (require v0.3+)
move2kube version || { echo "ERROR: move2kube not found"; exit 1; }

# Check yq (require v4+)
yq --version || { echo "ERROR: yq not found"; exit 1; }

echo "✅ All tools are installed"

Also document minimum version requirements in the tool list.


1010-1034: Pin the Alpine image version instead of using 'latest' tag.

Using alpine:latest (line 1014) in the restore container is not recommended for production migrations, as the image content can change unpredictably and may introduce compatibility issues.

📦 Suggested fix

Replace with a pinned version:

               {
                 "name": "restore-data",
-                "image": "alpine:latest",
+                "image": "alpine:3.19",
                 "command": ["sleep", "3600"],

Alternatively, based on learnings from the repository, consider using an existing image from your infrastructure if available, rather than introducing external dependencies.

Based on learnings, users prefer to avoid introducing external Docker images when possible.


1043-1043: Clarify terminology: runAsUser vs UID.

The warning mentions "UID Consistency" but the actual field being discussed is runAsUser. While related, using consistent terminology would improve clarity.

✏️ Suggested rewording
-[!WARNING] **UID Consistency**: The `runAsUser: 1000` must match the `APP_GID` configured in the Starlark transformer. If you modified APP_GID, update this value accordingly.
+[!WARNING] **User ID Consistency**: The `runAsUser: 1000` must match the `APP_GID` configured in the Starlark transformer. If you modified APP_GID, update this value accordingly.

Or more technically accurate:

-[!WARNING] **UID Consistency**: The `runAsUser: 1000` must match the `APP_GID` configured in the Starlark transformer. If you modified APP_GID, update this value accordingly.
+[!WARNING] **GID/UID Consistency**: The `runAsUser: 1000` in the restore container must match the `APP_GID` value (also 1000) configured in the Starlark transformer for fsGroup. If you modified APP_GID, update this runAsUser value accordingly.

1112-1118: Minor: Remove blank line inside blockquote.

Line 1112 has a blank line inside the blockquote which violates markdown linting rules (MD028).

📝 Formatting fix
 > [!WARNING] **About Pod Start Failure**:
 >
-> Since Resource Modifier **does not clean original initContainers**, some initContainer might fail due to specific configs (e.g., missing ConfigMap dependency).
+> Since Resource Modifier **does not clean original initContainers**, some initContainer might fail due to specific configs (e.g., missing ConfigMap dependency).
 >
 > **This does not affect data restore!** Velero's restore init container executes **first**. As long as it completes successfully, data is restored to PVC. Subsequent failures in other initContainers or main container do not jeopardize restored data.

Remove the blank line after the first > on line 1113.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39cfd00 and 1a5df56.

📒 Files selected for processing (1)
  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-23T02:11:19.999Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 60
File: docs/en/solutions/How_to_Use_Pipeline_to_Train_AI_Models.md:829-833
Timestamp: 2025-09-23T02:11:19.999Z
Learning: User prefers not to introduce or maintain external Docker images in their pipeline configurations. They would rather use existing images from their infrastructure or accept some fragility rather than add new external dependencies.

Applied to files:

  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🪛 LanguageTool
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

[grammar] ~1121-~1121: Use a hyphen to join words.
Context: ...mal app will be deployed using Move2Kube converted YAMLs): ```bash # 1. Delete t...

(QB_NEW_EN_HYPHEN)


[style] ~1163-~1163: Using many exclamation marks might seem excessive (in this case: 16 exclamation marks for a text that’s 10995 characters long)
Context: ...app=my-core-app -n ${TARGET_NS} ``` > [!WARNING] app=my-core-app -> Your appli...

(EN_EXCESSIVE_EXCLAMATION)

🪛 markdownlint-cli2 (0.18.1)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

1112-1112: Blank line inside blockquote

(MD028, no-blanks-blockquote)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (5)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (5)

1-27: LGTM! Clear overview and applicability matrix.

The document structure and applicability table provide a clear understanding of supported migration scenarios and limitations.


202-291: Well-structured backup strategy with warm-up and rollback preparation.

The incremental backup approach with warm-up phase and replica snapshot provides a robust foundation for minimizing downtime and enabling quick rollback if needed.


403-793: Comprehensive Starlark transformer with proper resource cleanup and fsGroup injection.

The transformation logic correctly handles:

  • OCP-specific field removal
  • Dynamic field cleanup (clusterIP, nodePort)
  • Label/selector mismatch fixes (critical for DeploymentConfig→Deployment conversion)
  • Conditional fsGroup injection for workloads with PVCs
  • StorageClass mapping for StatefulSet volumeClaimTemplates

1141-1188: Clear verification and rollback procedures.

The verification steps provide basic health checks, and the rollback procedure correctly references the replica snapshot created during the backup phase (line 273), enabling quick recovery if issues arise.


1049-1072: StorageClass mappings are currently consistent across the documentation—no action needed.

The mappings defined in the Starlark transformer (lines 419-423) and Velero ConfigMap (lines 1059-1061) match. However, this is a documentation consistency risk: users updating one location without updating the other could cause migration failures. Consider adding a note in the documentation reminding maintainers to keep both definitions synchronized.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🤖 Fix all issues with AI agents
In @docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md:
- Around line 985-1038: Add a WARNING note before the restore-helper-modifiers
ConfigMap explaining that the current modifier.yaml's
resourceModifierRules/mergePatches replaces all containers with a single alpine
sleep container and may break multi-container/stateful apps (mentioning
initContainers, sidecars, and differing volume mounts); then include an
alternative approach that preserves existing volumes and per-container mounts by
switching from mergePatches to a patches entry that iterates over
.spec.containers and retains volumeMounts and other critical fields while only
replacing image/command, referencing resourceModifierRules, modifier.yaml, and
restore-helper-modifiers so reviewers can find and update the ConfigMap.
- Line 1040: Replace the non-standard "[!WARNING]" admonition with the
document's blockquote style: change the line containing "runAsUser: 1000" and
the note about APP_GID to a standard blockquote (e.g., prefix with ">") and keep
the same wording about ensuring runAsUser matches the APP_GID in the Starlark
transformer; reference the "runAsUser: 1000" token and "APP_GID" so reviewers
can find and verify the updated warning format.
- Around line 1172-1183: The rollback script reads replicas_backup.txt and
blindly runs oc scale, so add pre-checks and error handling: verify
replicas_backup.txt exists and is readable, validate each parsed line matches
the expected "resource=replicas" format and that replicas is a non-negative
integer, check the target resource exists in the cluster/namespace (use
SOURCE_NS) before calling oc scale, and handle oc scale failures by logging the
failing resource and continuing or aborting based on a configurable flag; ensure
exit codes reflect failure when any critical step fails.
- Around line 543-547: The current block unconditionally removes all annotations
by checking "annotations" in metadata and popping it, which can drop important
user annotations; instead, filter metadata["annotations"] to remove only
OCP/system-specific keys (e.g., keys starting with "openshift.io/",
"kubectl.kubernetes.io/", or other known OCP prefixes) while preserving
user-defined annotations, update the annotations mapping only if any keys were
removed, and set modified = True only when the annotations dict actually
changes; locate the check that references metadata, "annotations", and modified
and replace the unconditional pop with a selective-key removal routine.
- Around line 411-420: Add concrete examples and discovery commands for
STORAGECLASS_MAPPING: before the Starlark script section, insert a short note
showing how to discover storage classes on source and target clusters (commands
to run on OCP and ACP) and instruct users to update STORAGECLASS_MAPPING
accordingly; then expand the STORAGECLASS_MAPPING comment block around the
constants APP_GID and FS_GROUP_CHANGE_POLICY to include a worked example mapping
pair (e.g., "ocs-storagecluster-ceph-rbd": "sc-topolvm") and a brief note
explaining to match provisioner types and add additional mappings as needed so
users know how to populate STORAGECLASS_MAPPING in transformer.star.
- Around line 105-111: The current oc create secret generic cloud-credentials
... --from-file=cloud=<(cat <<EOF ...) pattern exposes credentials to shell
history and process listings; replace the here-doc approach by creating the
credential file on disk with restricted permissions (chmod 600) and pass that
file to the secret command (or use --from-literal with stdin-safe tooling or a
secrets manager/vault) so credentials are not visible in command history or
process listings; update the deployment docs to show using the file-based or
vault-backed flow instead of the inline here-doc.
- Line 282: The doc currently uses a broad command "oc scale --replicas=0
dc,deploy,statefulset --all -n ${SOURCE_NS}" which may unintentionally scale
infrastructure; add a clear warning paragraph immediately before this command
stating that the scale will affect ALL DeploymentConfigs, Deployments, and
StatefulSets in the namespace and advise reviewing resources first, and update
the example to recommend using a label selector (e.g., replace or supplement the
--all usage with a targeted form using --selector=<label>=<value>) so readers
are guided to target only migration-related applications when running oc scale.
- Line 1109: There is a markdown lint error caused by a blank line inside a
blockquote in the document (around line 1109); open the blockquote in
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md, remove the empty line(s)
inside the blockquote or ensure each continued line begins with '>' so the
blockquote remains contiguous (alternatively delete the blank line if it serves
no purpose) to satisfy the markdown linter.
- Around line 809-812: The loop currently deletes Ingress YAMLs silently; update
the deletion flow in the while IFS read loop to (1) create a timestamped backup
directory and copy matching files there before removing, (2) require an explicit
confirmation flag or interactive prompt (e.g., confirm variable) before
performing rm -f, and (3) add a safe "dry-run" mode that only logs which files
would be deleted; reference the existing loop and commands (the while IFS= read
-r -d '' f; do ... yq -e '.kind == "Ingress"' "$f" ... rm -f "$f"; log "Deleted:
$f"; ((count++)) ... done and the final log "Deleted $count Ingress resources")
to implement these changes so deletions are reversible and require explicit
consent.
🧹 Nitpick comments (3)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (3)

220-232: Consider adding backup validation step.

The warm-up backup process should include validation that the backup completed successfully before relying on incremental backups during the final cutover.

💡 Add validation step

Add after the backup describe command:

# Verify backup completed successfully
BACKUP_STATUS=$(velero -n openshift-adp backup describe migration-warmup-$(date +%F) --details | grep "Phase:" | awk '{print $2}')
if [ "$BACKUP_STATUS" != "Completed" ]; then
  echo "ERROR: Backup failed with status: $BACKUP_STATUS"
  exit 1
fi

# Check for errors
BACKUP_ERRORS=$(velero -n openshift-adp backup describe migration-warmup-$(date +%F) --details | grep "Errors:" | awk '{print $2}')
if [ "$BACKUP_ERRORS" != "0" ]; then
  echo "WARNING: Backup completed with $BACKUP_ERRORS errors"
  velero -n openshift-adp backup logs migration-warmup-$(date +%F)
fi

354-364: sed replacement pattern may cause unintended modifications.

The sed command uses .bak extension for backups, which creates clutter. Additionally, the regex pattern might not handle all edge cases (e.g., images with quotes, multi-line YAML).

♻️ Improved replacement with validation
 # 1.2 Execute Batch Replacement
 echo "Executing image address replacement..."
 # Backup source directory
 cp -r source source_bak_images

+# Track replacements for review
+REPLACEMENT_LOG="image_replacements.log"
+echo "Image Replacement Log - $(date)" > "$REPLACEMENT_LOG"
+
 # Iterate mappings for replacement
 # Note: Only replace exact matches after 'image: ' in yaml
 while IFS='=' read -r key value; do
   if [[ -n "$key" && -n "$value" ]]; then
      # Find files containing this key reference (Improve efficiency)
      grep -l "image: .*[\"']\?${key}[\"']\?$" source/*.yaml 2>/dev/null | sort | uniq | while read -r file; do
        # Use sed to replace: match image: followed by optional quote, then key, then optional quote, end of line
        # Replace with image: value
-       sed -i.bak "s|image: \([\"']\{0,1\}\)${key}\1$|image: $value|g" "$file"
+       sed -i "s|image: \([\"']\{0,1\}\)${key}\1$|image: $value|g" "$file"
-       echo "Updated $file: $key"
+       echo "Updated $file: $key -> $value" | tee -a "$REPLACEMENT_LOG"
      done
   fi
 done < image_mappings.txt

+# Clean up sed backup files
+find source -name "*.bak" -delete
+
 echo "✅ Image Replacement Completed"
+echo "Review replacement log: $REPLACEMENT_LOG"

816-869: Route conversion may fail for complex routing scenarios.

The script only handles basic Route configurations. Complex scenarios like weighted backends, multiple paths, custom headers, or rewrite rules are not supported.

Add validation and warnings for unsupported features:

✅ Add validation for unsupported Route features
# Add this function before convert_route
validate_route() {
    local rf="$1"
    local name=$(yq '.metadata.name' "$rf")
    local warnings=()
    
    # Check for alternate backends (weighted routing)
    local alt_backends=$(yq '.spec.alternateBackends // []' "$rf")
    [[ "$alt_backends" != "[]" ]] && warnings+=("Alternate backends not supported")
    
    # Check for TLS edge/reencrypt (only passthrough supported via TLSRoute)
    local tls=$(yq '.spec.tls.termination // "none"' "$rf")
    if [[ "$tls" == "edge" ]] || [[ "$tls" == "reencrypt" ]]; then
        warnings+=("TLS $tls termination - review TLS configuration in Gateway")
    fi
    
    # Check for custom timeout
    local timeout=$(yq '.metadata.annotations["haproxy.router.openshift.io/timeout"] // ""' "$rf")
    [[ -n "$timeout" ]] && warnings+=("Custom timeout annotation - configure in HTTPRoute timeouts")
    
    if [ ${#warnings[@]} -gt 0 ]; then
        echo -e "\033[0;33m[WARN]\033[0m Route $name has unsupported features:"
        printf '  - %s\n' "${warnings[@]}"
        echo "  Manual review required after conversion"
    fi
}

# Then call it in convert_route before conversion
convert_route() {
    local rf="$1" od="$2"
+   validate_route "$rf"
    local name=$(yq '.metadata.name' "$rf")
    # ... rest of function
}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a5df56 and 7f97a89.

📒 Files selected for processing (1)
  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-23T02:11:19.999Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 60
File: docs/en/solutions/How_to_Use_Pipeline_to_Train_AI_Models.md:829-833
Timestamp: 2025-09-23T02:11:19.999Z
Learning: User prefers not to introduce or maintain external Docker images in their pipeline configurations. They would rather use existing images from their infrastructure or accept some fragility rather than add new external dependencies.

Applied to files:

  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🪛 LanguageTool
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

[grammar] ~1118-~1118: Use a hyphen to join words.
Context: ...mal app will be deployed using Move2Kube converted YAMLs): ```bash # 1. Delete t...

(QB_NEW_EN_HYPHEN)


[style] ~1160-~1160: Using many exclamation marks might seem excessive (in this case: 16 exclamation marks for a text that’s 10992 characters long)
Context: ...app=my-core-app -n ${TARGET_NS} ``` > [!WARNING] app=my-core-app -> Your appli...

(EN_EXCESSIVE_EXCLAMATION)

🪛 markdownlint-cli2 (0.18.1)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

1109-1109: Blank line inside blockquote

(MD028, no-blanks-blockquote)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md:
- Around line 998-1022: The container definition for "restore-data" uses the
mutable image tag "alpine:latest"; update the image field in the restore-data
container to a pinned Alpine version (for example "alpine:3.19" or another
specific minor version) so the restore-data container uses a reproducible,
stable image; leave the rest of the container spec (command, resources,
securityContext, name) unchanged.
🧹 Nitpick comments (4)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (4)

104-111: Strengthen credential security guidance.

The credential secret creation uses placeholder values (YOUR_ACCESS_KEY, YOUR_SECRET_KEY). Consider adding explicit warnings about:

  • Never committing actual credentials to version control
  • Using environment variables or secret management tools
  • Ensuring credentials have minimum required permissions (principle of least privilege)
📝 Enhanced security guidance

Add a warning block before the credential creation:

> [!CAUTION] **Credential Security**
> - Replace `YOUR_ACCESS_KEY` and `YOUR_SECRET_KEY` with actual values
> - Never commit credentials to version control
> - Use environment variables: `aws_access_key_id=$S3_ACCESS_KEY`
> - Ensure S3 credentials have minimum required permissions (read/write to backup bucket only)

344-361: Clean up temporary .bak files created by sed.

The sed command at line 356 creates .bak backup files but the script doesn't clean them up afterward. This could leave many temporary files in the workspace.

🧹 Add cleanup step

After the image replacement loop (after line 360), add:

# Clean up sed backup files
find source -name "*.bak" -type f -delete
echo "Cleaned up temporary .bak files"

10-74: Consider adding pre-migration testing guidance.

While the guide is comprehensive, it would benefit from an explicit recommendation to:

  • Test the entire migration process in a non-production environment first
  • Validate tool version compatibility (oc, kubectl, velero CLI versions with cluster versions)
  • Verify network connectivity between clusters and object storage
📋 Suggested addition

Add a "Prerequisites" subsection before section 0:

## Prerequisites

> [!IMPORTANT]
> - **Test First**: Execute this migration in a staging/test environment before production
> - **Tool Versions**: Verify compatibility between CLI tools and cluster versions
>   - `velero` CLI version should match server version
>   - `oc` CLI compatible with OCP cluster version
>   - `move2kube` version: Latest stable recommended
> - **Network**: Ensure both clusters can access the shared S3/MinIO endpoint
> - **Access**: Verify cluster-admin or equivalent permissions on both clusters

1100-1100: Fix markdown blockquote formatting.

There's a blank line inside the blockquote at line 1100, which violates markdown best practices (markdownlint MD028).

🔧 Fix formatting

Remove the blank line within the > [!WARNING] blockquote or ensure it's consistently formatted. Most markdown renderers handle this, but fixing it ensures compliance with linters.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f97a89 and 6e5bca2.

📒 Files selected for processing (1)
  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-23T02:11:19.999Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 60
File: docs/en/solutions/How_to_Use_Pipeline_to_Train_AI_Models.md:829-833
Timestamp: 2025-09-23T02:11:19.999Z
Learning: User prefers not to introduce or maintain external Docker images in their pipeline configurations. They would rather use existing images from their infrastructure or accept some fragility rather than add new external dependencies.

Applied to files:

  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🪛 LanguageTool
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

[grammar] ~1109-~1109: Use a hyphen to join words.
Context: ...mal app will be deployed using Move2Kube converted YAMLs): ```bash # 1. Delete t...

(QB_NEW_EN_HYPHEN)


[style] ~1151-~1151: Using many exclamation marks might seem excessive (in this case: 16 exclamation marks for a text that’s 10983 characters long)
Context: ...app=my-core-app -n ${TARGET_NS} ``` > [!WARNING] app=my-core-app -> Your appli...

(EN_EXCESSIVE_EXCLAMATION)

🪛 markdownlint-cli2 (0.18.1)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

1100-1100: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🔇 Additional comments (8)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (8)

1-27: LGTM! Clear scope and applicable scenarios.

The overview and applicability table provide excellent guidance, especially the warnings about database and cache migrations. The distinction between what should use Velero vs. native tools is valuable.


265-286: Excellent backup strategy with rollback preparation.

The combination of incremental warm-up backups and replica snapshot export (line 270) provides robust rollback capability. The --wait flag on the final backup ensures completion before shutdown.


391-781: Well-structured Starlark transformer with comprehensive transformations.

The transformer handles multiple critical migration tasks:

  • Cleanup of OCP-specific fields
  • fsGroup injection for PVC permissions
  • StorageClass mapping for cross-cluster migration
  • Selector/label consistency fixes

The modular function structure makes it maintainable and easy to understand.

Ensure the APP_GID = 1000 value (line 402) matches the runAsUser value in the restore modifier ConfigMap (line 1019). Inconsistency here could cause permission issues during data restoration.


785-881: LGTM! Robust Route to Gateway API conversion.

The script correctly handles the conversion of OCP Routes to Gateway API resources:

  • Passthrough TLS → TLSRoute with proper port handling (defaulting to 443)
  • Other termination types → HTTPRoute
  • Configurable Gateway references via environment variables
  • Good error handling with set -euo pipefail

1031-1031: Excellent consistency check guidance.

The warning about runAsUser: 1000 matching APP_GID is crucial for avoiding permission issues during restoration. This cross-reference between configurations demonstrates good attention to detail.


1094-1106: Clear explanation of restoration mechanics.

The note about restore principles and the warning about initContainer failures demonstrate deep understanding of Velero's operation. The clarification that data restoration completes even if subsequent containers fail is crucial for troubleshooting.


1107-1127: Thorough cleanup procedure preserving critical data.

The cleanup steps correctly:

  • Delete temporary Pods while preserving PVCs
  • Remove associated temp resources (SA, Secrets, ConfigMaps)
  • Verify PVC integrity before proceeding
  • Include optional Velero config cleanup

The explicit warning that "Deleting Pod does not delete PVC and its data" (line 1127) is essential.


1161-1177: Complete rollback procedure with clear steps.

The rollback scheme provides:

  • Fast DNS cutover (60s TTL pays off here)
  • Automated replica restoration using the snapshot file
  • DNS TTL restoration to normal

This gives confidence that the migration can be reversed quickly if issues arise.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md:
- Around line 1019-1043: The container definition using image "alpine:latest"
(container name "restore-data") should be updated to pin a specific, immutable
tag (e.g., alpine:3.19) or point to an internal registry image (e.g.,
registry.internal.example.com/library/alpine:3.19) to avoid mutable tags,
improve reproducibility and security, and ensure accessibility in air-gapped
environments; update the "image" field for the restore-data container and add a
brief note recommending use of the org's internal registry if external pulls are
not allowed.
- Around line 554-557: The blanket removal of annotations via
metadata.pop("annotations") in cleanup_resource should be replaced with a
denylist-based cleanup: introduce ANNOTATION_PATTERNS_TO_REMOVE and, inside
cleanup_resource where metadata and annotations are accessed, iterate annotation
keys, collect keys that match any denylist pattern (e.g., "openshift.io/",
"pv.kubernetes.io/", "kubectl.kubernetes.io/",
"deployment.kubernetes.io/revision",
"deprecated.daemonset.template.generation"), remove only those keys from
metadata["annotations"], set modified = True when removals occur, and only call
metadata.pop("annotations") if the annotations dict becomes empty; leave all
other annotations intact.
- Line 301: Add a prerequisite note clarifying that the namespace referenced by
SOURCE_NS in the command "oc scale --replicas=0 dc,deploy,statefulset --all -n
${SOURCE_NS}" must contain only the application workloads being migrated and
must not host infrastructure components, operators, or shared/multi-tenant
services; update the docs to include a "**Prerequisites**" bullet with that
exact guidance and a warning that the command scales ALL matching resources.
Also add an optional alternative example showing how to target only specific
workloads using a label selector, e.g. demonstrate "oc scale --replicas=0
dc,deploy,statefulset --selector=app.kubernetes.io/part-of=myapp -n
${SOURCE_NS}" and explain when to use the selector approach for mixed-workload
namespaces.
🧹 Nitpick comments (9)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (9)

104-111: Consider adding secret management guidance.

The documentation shows credentials in plaintext heredocs and YAML stringData. While the placeholder values are appropriate for documentation, consider adding a note about secure secret management practices:

  • Using secret management tools (Vault, Sealed Secrets, etc.)
  • Never committing actual credentials to version control
  • Rotating credentials after migration completion

Also applies to: 154-167


369-379: Consider using yq for safer YAML manipulation.

The current approach uses sed with regex patterns to replace image references in YAML files. This is fragile because:

  • YAML has multiple valid syntaxes (quoted, unquoted, multiline, flow style)
  • The regex pattern might not handle all edge cases correctly
  • sed creates .bak files that accumulate

Since the documentation already requires yq as a tool dependency (line 88), consider using it for more robust YAML manipulation:

♻️ Alternative approach using yq
# More robust image replacement using yq
while IFS='=' read -r key value; do
  if [[ -n "$key" && -n "$value" ]]; then
    for file in source/*.yaml; do
      # Check if file contains the key and replace
      if yq eval ".. | select(has(\"image\")) | .image" "$file" 2>/dev/null | grep -qF "$key"; then
        yq eval "(.. | select(has(\"image\") and .image == \"$key\") | .image) = \"$value\"" -i "$file"
        echo "Updated $file: $key -> $value"
      fi
    done
  fi
done < image_mappings.txt

Note: This approach is safer but may be slower on large codebases.


1052-1052: Clarify the relationship between runAsUser and APP_GID.

The warning mentions "UID Consistency" between runAsUser: 1000 and APP_GID, but the relationship could be clearer. The configuration involves two distinct security context fields:

  • fsGroup (set by Starlark to APP_GID = 1000): Controls the group ownership of mounted volumes
  • runAsUser (set here to 1000): Controls which user ID the container process runs as

For proper file access, the container's user should belong to the fsGroup. Consider rewording the warning to:

[!WARNING] **User and Group Consistency**: The `runAsUser: 1000` should match or belong to the `fsGroup` (APP_GID) configured in the Starlark transformer to ensure the restore container can access volume data.

1101-1106: Reconsider including PersistentVolumes in namespace-scoped restore.

Line 1103 includes persistentvolumes in the --include-resources list. However:

  • PersistentVolumes are cluster-scoped resources (not namespace-scoped)
  • The restore uses --namespace-mappings which is for namespace-scoped resources
  • Velero automatically recreates PVs when restoring PVCs with volume data
  • Including PVs explicitly might cause conflicts or be ignored

Consider removing persistentvolumes from the include list unless there's a specific reason to include them:

velero -n cpaas-system restore create ${RESTORE_NAME} \
  --from-backup ${BACKUP_NAME} \
  --include-resources pods,configmaps,secrets,serviceaccounts,persistentvolumeclaims \
  --namespace-mappings ${SOURCE_NS}:${TARGET_NS} \
  --resource-modifier-configmap restore-helper-modifiers \
  --wait

Or add a note explaining why PVs are included if there's a specific architectural reason.


1144-1146: Exercise caution when cleaning shared Velero ConfigMaps.

Line 1145 deletes the change-storage-class and restore-helper-modifiers ConfigMaps from the cpaas-system namespace. If multiple migrations or restores are running in parallel, deleting these shared ConfigMaps could affect other restore operations.

Consider adding a note about timing:

# 4. Clean Velero Config ConfigMaps (Only if no other restores are in progress)
# Check for other active restores first:
# velero -n cpaas-system restore get --status InProgress
kubectl delete cm change-storage-class restore-helper-modifiers -n cpaas-system

Alternatively, use unique names for each migration batch to avoid conflicts.


1183-1198: Consider expanding rollback procedure documentation.

The rollback procedure covers the essential steps (DNS revert and replica restoration), but consider adding guidance for:

  1. Data consistency checks: How to verify source cluster data wasn't modified during migration window
  2. Partial migration scenarios: What if some users accessed the target cluster before rollback?
  3. Time window limitations: How long after cutover is rollback still safe?
  4. Post-rollback validation: Steps to verify source cluster is functioning correctly
  5. Target cluster cleanup: Whether to clean up resources in target cluster after rollback

Adding a rollback decision matrix or checklist would help users make informed decisions during incident response when time is critical.


1130-1130: Minor grammar: Consider hyphenating compound adjective.

The phrase "Move2Kube converted YAMLs" could be written as "Move2Kube-converted YAMLs" for better readability when "Move2Kube converted" acts as a compound adjective modifying "YAMLs".


1121-1126: Minor markdown formatting: Blank line in blockquote.

The blockquote starting at line 1122 contains a blank line (line 1124), which triggers markdownlint rule MD028. While this doesn't affect rendering, you can remove the blank line for cleaner markdown or reconsider the blockquote structure.


836-879: Document Gateway API version requirements and add prerequisite checks.

The script generates TLSRoute with apiVersion: gateway.networking.k8s.io/v1alpha2, which is still an experimental/alpha API as of 2025. The target ACP cluster must support this alpha resource, which may not be enabled by default.

Add a prerequisite section to verify Gateway API support before migration:

  • Confirm Gateway API CRDs are installed and the TLSRoute v1alpha2 resource is available
  • Verify the referenced Gateway resource exists in the specified namespace
  • Document the minimum Gateway API version required for this migration path
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e5bca2 and 4004e92.

📒 Files selected for processing (1)
  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-23T02:11:19.999Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 60
File: docs/en/solutions/How_to_Use_Pipeline_to_Train_AI_Models.md:829-833
Timestamp: 2025-09-23T02:11:19.999Z
Learning: User prefers not to introduce or maintain external Docker images in their pipeline configurations. They would rather use existing images from their infrastructure or accept some fragility rather than add new external dependencies.

Applied to files:

  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🪛 LanguageTool
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

[grammar] ~1130-~1130: Use a hyphen to join words.
Context: ...mal app will be deployed using Move2Kube converted YAMLs): ```bash # 1. Delete t...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

1121-1121: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🔇 Additional comments (2)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (2)

1-27: Clear overview and applicability matrix.

The document structure is well-organized with proper frontmatter metadata and a helpful applicability table that sets clear expectations about what scenarios are supported and which require special handling.


1-1202: Comprehensive migration guide with solid technical depth.

This is an excellent, detailed migration guide that covers the complete end-to-end process. The document demonstrates:

✅ Clear phase separation with visual flow diagram
✅ Detailed command examples with explanations
✅ Good use of warnings and tips throughout
✅ Includes both transformation and data migration aspects
✅ Rollback procedures documented

The review comments above focus on improving robustness, security, and handling edge cases. Overall, this is production-quality documentation that will serve teams well during migration planning and execution.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (4)

1022-1022: Consider using an internal image instead of external alpine:latest.

The restore helper uses alpine:latest from Docker Hub. Based on learnings, using images from your infrastructure is preferred over external dependencies.

Based on learnings, users prefer to use existing images from their infrastructure rather than introduce external dependencies.

♻️ Suggested alternative

Replace with an image that's already available in your ACP registry or internal infrastructure. For example:

-                  "image": "alpine:latest",
+                  "image": "registry.example.com/base/busybox:stable",

Or use any lightweight image that's already mirrored in your environment. The container only needs to run sleep and doesn't require specific functionality.


1052-1052: Enhance the UID consistency warning with specific reference.

The warning about runAsUser: 1000 matching APP_GID is important but could be more actionable.

💡 Suggested enhancement
-[!WARNING] **UID Consistency**: The `runAsUser: 1000` must match the `APP_GID` configured in the Starlark transformer. If you modified APP_GID, update this value accordingly.
+[!WARNING] **UID Consistency**: The `runAsUser: 1000` must match the `APP_GID` configured in the Starlark transformer (line 421). If you modified APP_GID, update this value accordingly.

This helps users quickly locate the related configuration.


1130-1130: Consider hyphenating compound adjective.

For improved readability, compound adjectives that modify a noun should be hyphenated.

📝 Suggested fix
-# 1. Delete temp Pods used for restore (PVC and Data retained)
+# 1. Delete temp Pods used for restore (PVC and data retained)

Also consider:

-Once data restore is done, delete these temp Pods (Formal app will be deployed using Move2Kube converted YAMLs):
+Once data restore is done, delete these temp Pods (Formal app will be deployed using Move2Kube-converted YAMLs):

1121-1121: Remove blank line inside blockquote.

Markdown linting detects a blank line inside the blockquote block which can cause inconsistent rendering.

📝 Suggested fix
 > [!NOTE] **Restore Principle**:
->
 > - Velero File System Backup requires **Running Pod** to restore data to PVC
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4004e92 and 4dad0be.

📒 Files selected for processing (1)
  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-23T02:11:19.999Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 60
File: docs/en/solutions/How_to_Use_Pipeline_to_Train_AI_Models.md:829-833
Timestamp: 2025-09-23T02:11:19.999Z
Learning: User prefers not to introduce or maintain external Docker images in their pipeline configurations. They would rather use existing images from their infrastructure or accept some fragility rather than add new external dependencies.

Applied to files:

  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🪛 LanguageTool
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

[grammar] ~1130-~1130: Use a hyphen to join words.
Context: ...mal app will be deployed using Move2Kube converted YAMLs): ```bash # 1. Delete t...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

1121-1121: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🔇 Additional comments (7)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (7)

10-27: Excellent scenario applicability table.

The table clearly documents supported and unsupported migration scenarios with appropriate warnings for databases and caches. This helps users understand the tool's capabilities upfront.


100-197: Well-structured Velero configuration guidance.

The side-by-side OADP (source) and BSL (target) configuration with clear emphasis on matching parameters is helpful. Credential placeholders are appropriately marked.


410-800: Well-structured Starlark transformer with comprehensive transformations.

The transformer handles multiple critical migration concerns:

  • Cleanup of OCP-specific fields (SecurityContext, PriorityClass, RuntimeClass)
  • Automatic fsGroup injection for workloads with PVCs
  • StorageClass mapping for volumeClaimTemplates
  • Selector/label mismatch fixes from DeploymentConfig conversions

The logic is sound and well-commented.


804-900: Effective Route to Gateway API conversion script.

The script correctly handles both TLS passthrough (→ TLSRoute) and standard routes (→ HTTPRoute) with appropriate port defaults and configurable Gateway references.


1101-1106: Correct Velero restore configuration.

The restore command appropriately limits resources to those needed for data restoration and applies the necessary transformations via resource modifiers and namespace mappings.


1132-1162: Proper cleanup and deployment sequence.

The cleanup correctly removes temporary restore resources while preserving PVCs, and the deployment properly restores replica counts from the backup file.


369-379: Fix regex pattern in grep command.

The grep pattern on line 372 has a regex issue: [\"']\? treats the ? as a literal character inside the bracket expression, not as a quantifier. This could cause the grep to miss files that need image reference updates, leading to unresolved image references in the target cluster.

🔧 Proposed fix for the regex pattern
-     grep -l "image: .*[\"']\?${key}[\"']\?$" source/*.yaml 2>/dev/null | sort | uniq | while read -r file; do
+     grep -l "image: .*[\"']\{0,1\}${key}[\"']\{0,1\}$" source/*.yaml 2>/dev/null | sort | uniq | while read -r file; do

Alternatively, for better readability and compatibility with extended regex:

-     grep -l "image: .*[\"']\?${key}[\"']\?$" source/*.yaml 2>/dev/null | sort | uniq | while read -r file; do
+     grep -E -l "image: .*[\"']?${key}[\"']?$" source/*.yaml 2>/dev/null | sort | uniq | while read -r file; do

This ensures the ? quantifier (0 or 1 occurrences) is correctly applied to the preceding character class.

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md:
- Around line 554-557: The current logic unconditionally removes the entire
annotations map (metadata.pop("annotations")) which can delete important non-OCP
annotations; instead implement a selective removal strategy: define an
ANNOTATIONS_TO_REMOVE list of OCP-specific prefixes, iterate the annotations
dict in the cleanup_resource function to collect keys whose names start with any
prefix and pop only those keys (marking modified = True), then if the
annotations dict becomes empty remove metadata["annotations"]; keep existing
variables metadata, annotations and modified names to integrate cleanly.
- Around line 369-379: The current loop uses grep/sed to match and replace image
keys (the while IFS='=' read -r key value; do ... sed -i.bak "s|image:
\([\"']\{0,1\}\)${key}\1$|image: $value|g" "$file"") which breaks when $key or
$value contain regex-special characters; replace this with a yq-based in-place
update for safe YAML handling (e.g., iterate files and run yq -i '..|select(. ==
"'"$key"'") = "'"$value"'"' "$file"') or, if you must keep sed, ensure you
properly escape regex metacharacters in both $key and $value before using them
in the sed pattern and use a robust sed substitution that anchors the image
field only.
🧹 Nitpick comments (6)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (6)

105-111: Add security warning about credential handling.

The inline credential creation is convenient for documentation, but consider adding a note about handling secrets securely in production environments (e.g., using sealed secrets, external secret managers, or at minimum ensuring the command history is cleared after execution).

📋 Suggested addition

Add a warning note before or after this command block:

> [!WARNING] **Security Notice**: The credentials shown are examples. In production:
> - Use a secrets management solution (Vault, AWS Secrets Manager, etc.)
> - Avoid storing credentials in shell history (use `HISTCONTROL=ignorespace` or clear history after)
> - Ensure proper RBAC on the namespace containing these secrets

301-301: Emphasize the destructive nature of this command.

This command will shut down all workloads in the namespace simultaneously. Consider adding an additional verification step or breaking this into phases for critical applications.

💡 Suggested enhancement

Add a confirmation step before the shutdown:

# 3a. Preview what will be scaled down
echo "The following resources will be scaled to 0:"
oc get dc,deploy,statefulset -n ${SOURCE_NS} -o name

# 3b. Prompt for confirmation (optional but recommended)
read -p "Proceed with shutdown? (yes/no): " confirmation
[[ "$confirmation" != "yes" ]] && echo "Aborted" && exit 1

# 3c. Execute shutdown
oc scale --replicas=0 dc,deploy,statefulset --all -n ${SOURCE_NS}

1024-1024: Pin the Alpine image version for reproducibility.

Using alpine:latest as the restore helper container can lead to unexpected behavior if the latest tag changes between restores. Pin to a specific version for consistency and reproducibility.

                 {
                   "name": "restore-data",
-                  "image": "alpine:latest",
+                  "image": "alpine:3.19",
                   "command": ["sleep", "3600"],

Note: Line 1056 already suggests replacing the image with one accessible in the environment, which is good guidance.


1160-1163: Consider applying resources in a specific order.

The kubectl apply -f command on a directory applies resources in alphabetical filename order, which may not respect dependencies (e.g., ConfigMaps should be created before Deployments that reference them).

⚡ Suggested improvements

Option 1: Apply resources in dependency order:

# Apply in logical order: configs → storage → workloads → networking
kubectl -n ${TARGET_NS} apply -f myproject/source/source-versionchanged/ --selector=config=true || true
kubectl -n ${TARGET_NS} apply -f myproject/source/source-versionchanged/configmap-*.yaml || true
kubectl -n ${TARGET_NS} apply -f myproject/source/source-versionchanged/secret-*.yaml || true
kubectl -n ${TARGET_NS} apply -f myproject/source/source-versionchanged/service-*.yaml || true
kubectl -n ${TARGET_NS} apply -f myproject/source/source-versionchanged/deployment-*.yaml || true
kubectl -n ${TARGET_NS} apply -f myproject/source/source-versionchanged/statefulset-*.yaml || true
kubectl -n ${TARGET_NS} apply -f myproject/source/source-versionchanged/httproute-*.yaml || true

Option 2: Use server-side apply with wait:

kubectl -n ${TARGET_NS} apply --server-side -f myproject/source/source-versionchanged/
kubectl -n ${TARGET_NS} wait --for=condition=available --timeout=300s deployment --all

Option 3: Convert to Kustomize (if not already structured as such):

kubectl -n ${TARGET_NS} apply -k myproject/source/source-versionchanged/

1196-1201: Add validation and error handling to rollback procedure.

The rollback script reads from replicas_backup.txt without validating the file format or handling potential errors during scaling operations. A corrupted file or failed scaling operation could leave the system in an inconsistent state.

🛡️ Enhanced rollback script
# 2. Source Recovery with validation
if [[ ! -f replicas_backup.txt ]]; then
  echo "ERROR: replicas_backup.txt not found!"
  exit 1
fi

echo "Starting rollback from backup..."
failed_resources=()

while IFS='=' read -r resource replicas; do
  # Skip empty lines and validate format
  [[ -z "$resource" || -z "$replicas" ]] && continue
  [[ ! "$replicas" =~ ^[0-9]+$ ]] && {
    echo "WARNING: Invalid replica count for $resource: $replicas"
    continue
  }
  
  echo "Scaling $resource to $replicas replicas..."
  if oc scale --replicas=$replicas $resource -n ${SOURCE_NS} 2>&1; then
    echo "  ✓ Success"
  else
    echo "  ✗ Failed"
    failed_resources+=("$resource")
  fi
done < replicas_backup.txt

# Report results
if [[ ${#failed_resources[@]} -gt 0 ]]; then
  echo ""
  echo "WARNING: Failed to scale the following resources:"
  printf '  - %s\n' "${failed_resources[@]}"
  exit 1
else
  echo ""
  echo "✓ All resources scaled successfully"
fi

1123-1123: Minor formatting improvements (optional).

Static analysis tools flagged:

  1. Line 1123: Blank line inside blockquote (markdownlint MD028)
  2. Line 1132: Consider hyphenating "Move2Kube-converted" for clarity

These are stylistic nitpicks and don't impact functionality.

Also applies to: 1132-1132

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4dad0be and 2cb1e4e.

📒 Files selected for processing (1)
  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-23T02:11:19.999Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 60
File: docs/en/solutions/How_to_Use_Pipeline_to_Train_AI_Models.md:829-833
Timestamp: 2025-09-23T02:11:19.999Z
Learning: User prefers not to introduce or maintain external Docker images in their pipeline configurations. They would rather use existing images from their infrastructure or accept some fragility rather than add new external dependencies.

Applied to files:

  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🪛 LanguageTool
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

[grammar] ~1132-~1132: Use a hyphen to join words.
Context: ...mal app will be deployed using Move2Kube converted YAMLs): ```bash # 1. Delete t...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

1123-1123: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🔇 Additional comments (2)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (2)

1-1208: Overall: Comprehensive and well-structured migration guide.

This document provides excellent coverage of the OCP to ACP migration process with thoughtful considerations for:

  • ✅ Incremental backup strategy to minimize downtime
  • ✅ Clear separation of data and application layer restoration
  • ✅ Automated resource transformation with Starlark
  • ✅ fsGroup injection for PVC permission handling
  • ✅ Rollback procedures for risk mitigation

The step-by-step approach with verification points at each phase will help teams execute migrations safely. The inclusion of both warm-up and final backup strategies demonstrates operational maturity.

Recommended next steps (based on the issues flagged above):

  1. Enhance robustness of the image reference replacement logic
  2. Make annotation removal in Starlark more selective
  3. Add Gateway API version validation to the route conversion script
  4. Strengthen rollback procedure with error handling

The document is ready for use with the suggested improvements applied to reduce operational risk.


804-901: Add Gateway API and TLSRoute availability checks before script execution.

The script uses TLSRoute apiVersion: gateway.networking.k8s.io/v1alpha2 (line 839), which remains experimental and may not be installed in all Kubernetes clusters. Additionally, it assumes a Gateway exists at ${GATEWAY_NS}/${GATEWAY_NAME} without validation. Add these checks at the start of the main() function:

Suggested validation
# Validate Gateway API CRDs are installed
if ! kubectl get crd gateways.gateway.networking.k8s.io >/dev/null 2>&1; then
    echo "ERROR: Gateway API CRDs not found. Install from:"
    echo "  https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api"
    exit 1
fi

# Validate TLSRoute support
if ! kubectl get crd tlsroutes.gateway.networking.k8s.io >/dev/null 2>&1; then
    echo "WARNING: TLSRoute CRD not found. Passthrough routes will fail."
fi

# Validate Gateway exists
if ! kubectl get gateway "${GATEWAY_NAME}" -n "${GATEWAY_NS}" >/dev/null 2>&1; then
    echo "WARNING: Gateway ${GATEWAY_NS}/${GATEWAY_NAME} not found."
    echo "  Ensure it is created before applying generated routes."
fi

@shibalu
Copy link

shibalu commented Jan 13, 2026

/test-pass

@chideat chideat requested a review from sinbadonline January 13, 2026 09:44
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (4)

105-111: Consider documenting security best practices for credential handling.

While this credential creation method is functional, the credentials are exposed in shell history and process listings. Consider adding a note about alternative secure methods.

🔒 More secure alternative

Add a note suggesting users can:

  1. Create a separate credentials file and reference it: --from-file=cloud=/path/to/secure/credentials
  2. Clear shell history after running sensitive commands: history -d $((HISTCMD-1))
  3. Use a password manager or secrets management tool to inject credentials

554-557: Review the aggressive annotation removal strategy.

The Starlark script removes ALL annotations from every resource. While this eliminates OCP-specific annotations, it also removes potentially valuable application-specific annotations (e.g., monitoring configs, custom business metadata, documentation links).

Consider either:

  1. Selective removal - Only remove known OCP-specific annotation prefixes:

    OCP_ANNOTATION_PREFIXES = [
        "openshift.io/",
        "kubectl.kubernetes.io/",
        "alpha.image.policy.openshift.io/",
        "image.openshift.io/",
    ]
    
    if "annotations" in metadata:
        annotations = metadata["annotations"]
        if annotations != None:
            for key in list(annotations.keys()):
                for prefix in OCP_ANNOTATION_PREFIXES:
                    if key.startswith(prefix):
                        annotations.pop(key)
                        modified = True
  2. Document the behavior - Add a clear note that all annotations are removed and users should re-apply custom annotations if needed after migration.


1061-1061: Clarify the relationship between runAsUser and fsGroup.

The warning states "runAsUser: 1000 must match the APP_GID" which is technically imprecise. APP_GID sets the fsGroup (Group ID), while runAsUser sets the User ID. These are different POSIX concepts:

  • fsGroup (set to APP_GID = 1000) controls the group ownership of mounted volumes
  • runAsUser controls the user ID the process runs as

They don't need to be the same value; the user just needs to be a member of the group. However, using the same value (1000) for simplicity is a common pattern.

📝 Suggested clarification

Rewrite the warning as:

UID and GID Consistency: The runAsUser: 1000 should correspond to a user that is a member of the group specified by fsGroup (set via APP_GID = 1000 in the Starlark transformer). For simplicity, this guide uses the same value (1000) for both the user ID and group ID. If you modified APP_GID, consider updating runAsUser accordingly or ensure the specified user is a member of the target group.


1139-1139: Minor grammar improvement for compound modifier.

Consider hyphenating "Move2Kube converted" to "Move2Kube-converted" when used as a compound adjective modifying "YAMLs".

-# 1. Delete temp Pods used for restore (PVC and Data retained)
+# 1. Delete temp Pods used for restore (Note: Formal app will be deployed using Move2Kube-converted YAMLs; PVC and Data retained)

Based on static analysis hints.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cb1e4e and b8788d8.

📒 Files selected for processing (1)
  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-23T02:11:19.999Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 60
File: docs/en/solutions/How_to_Use_Pipeline_to_Train_AI_Models.md:829-833
Timestamp: 2025-09-23T02:11:19.999Z
Learning: User prefers not to introduce or maintain external Docker images in their pipeline configurations. They would rather use existing images from their infrastructure or accept some fragility rather than add new external dependencies.

Applied to files:

  • docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md
🪛 LanguageTool
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

[grammar] ~1139-~1139: Use a hyphen to join words.
Context: ...mal app will be deployed using Move2Kube converted YAMLs): ```bash # 1. Delete t...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md

1130-1130: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🔇 Additional comments (2)
docs/en/solutions/How_to_Migrate_from_ocp_to_acp.md (2)

301-301: Verify the scope of scale-down operation.

The --all flag will scale down every DeploymentConfig, Deployment, and StatefulSet in the namespace. If ${SOURCE_NS} contains multiple applications or shared infrastructure components outside the migration scope, this could cause unintended downtime.

Consider using label selectors to target only the resources being migrated:

# Example: Only scale down resources with specific label
oc scale --replicas=0 dc,deploy,statefulset -l app=my-migration-app -n ${SOURCE_NS}

Alternatively, add a clear warning in the documentation that users should verify the namespace contains only resources intended for migration.


857-862: Document path matching conversion limitations.

The path type detection uses a heuristic (no trailing slash or wildcard → Exact, otherwise → PathPrefix). This works for common cases but may not handle all OpenShift Route path matching semantics, especially Routes using regex patterns or specific path matching configurations.

Add a note in the documentation advising users to:

  1. Review the generated HTTPRoute/TLSRoute resources
  2. Verify path matching behavior aligns with the original Route configuration
  3. Manually adjust path.type in HTTPRoute if needed

For complex routing scenarios, consider documenting a manual review checklist.

@chideat chideat enabled auto-merge (squash) January 14, 2026 09:03
@chideat chideat removed the request for review from sinbadonline January 14, 2026 09:04
@chideat chideat disabled auto-merge January 14, 2026 09:04
@chideat chideat enabled auto-merge (squash) January 14, 2026 09:04
@chideat chideat requested a review from sinbadonline January 14, 2026 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants