[LFX Term 01] Restoration: Cityscapes-Synthia Curb Detection #441

Open
NishantSinghhhhh wants to merge 2 commits into kubeedge:main from NishantSinghhhhh:Restoration-CityScapeSynthia-Curb-Detection

@NishantSinghhhhh
Contributor

What type of PR is this?

/kind bug
/kind cleanup

What this PR does / why we need it:

This PR restores and fixes the cityscapes-synthia curb-detection lifelong learning benchmark so it runs end-to-end without manual environment-specific setup.

Key changes:

  • Replace all absolute paths (/home/nishant/...) in benchmarkingjob.yaml and testenv/testenv.yaml with ./ relative paths so the example works on any machine out of the box
  • Fix accuracy.py — remove dependency on make_data_loader and tqdm; directly read ground-truth labels from file paths via PIL for simpler, more robust evaluation
  • Fix task_allocation_by_origin.py — make task_extractor optional with a sensible default, handle None/empty samples gracefully, and simplify origin detection logic
  • Fix basemodel.py — auto-insert RFNet directory into sys.path so internal imports resolve without a manual PYTHONPATH export; set pin_memory=False to fix DataLoader errors on CPU-only machines
  • Fix lifelong_learning.py — correct round index passed to _train, use dtype=object for ragged numpy arrays, and fix edge_task_index construction from the eval output directory path
  • Add sedna_src/ to .gitignore

Which issue(s) this PR fixes:

Fixes #230

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: NishantSinghhhhh
To complete the pull request process, please assign jaypume after the PR has been reviewed.
You can assign the PR to them by writing /assign @jaypume in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 21, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the lifelong learning paradigm and the curb-detection example, including path updates in YAML configurations, refactoring dataset loading and evaluation logic, and improving the robustness of task allocation and metric calculations. Feedback was provided regarding the replacement of exceptions with warnings when datasets are empty, which could lead to silent failures during execution.

@NishantSinghhhhh
Contributor Author

Screencast.from.2026-05-21.13-19-18.webm

Complete run of the example.

@NishantSinghhhhh
Contributor Author

PR — Cityscapes-Synthia Curb Detection Lifelong Learning Benchmark

Got the cityscapes-synthia/lifelong_learning_bench/curb-detection example running end-to-end. The version on main fails immediately on wrong module paths, before any training begins. Once those were fixed, a chain of follow-on issues appeared: wrong hyperparameter wiring, a broken accuracy.py, shape mismatches in metric computation, a deprecated torchvision API call, and three bugs in the core lifelong learning paradigm itself. All are addressed below.


Summary

| File | What changed | Why |
|---|---|---|
| testalgorithms/rfnet/rfnet_algorithm.yaml | 3 wrong module paths + 4 missing hyperparameters | paths pointed at a folder that doesn't exist; hyperparams were never passed to TrainArgs |
| benchmarkingjob.yaml | workspace absolute → relative path | hardcoded /home/nishant/... breaks the example on every other machine |
| testenv/testenv.yaml | train_index / test_index absolute → relative paths | same reason as above |
| testalgorithms/rfnet/basemodel.py | auto sys.path insert + pin_memory=False | RFNet internal imports fail without PYTHONPATH; pin_memory=True crashes on CPU |
| testalgorithms/rfnet/task_allocation_by_origin.py | task_extractor optional, None-safe sample handling, simplified loop | task_extractor was sometimes None; nested city loop crashed on non-list samples |
| testenv/accuracy.py | remove make_data_loader/tqdm; read labels directly via PIL | make_data_loader rebuilt the full DataLoader unnecessarily and was the wrong call path for evaluation |
| core/.../lifelong_learning.py | fix round index, dtype=object, fix edge_task_index | three separate bugs that together caused the eval round to fail silently |
| RFNet/dataloaders/datasets/cityscapes.py | guard data.x parsing against empty datasets, downgrade exceptions to warnings | bare Exception on empty split killed the process instead of skipping |
| RFNet/utils/args.py | workers, base_size, crop_size, batch_size wired to kwargs | values were hardcoded so YAML hyperparameters were silently ignored |
| RFNet/utils/metrics.py | guard FWIoU against empty confusion matrix, fix shape mismatch loop | division by zero produced NaN/inf when the matrix was empty; shape check was too brittle |
| RFNet/utils/summaries.py | range= → value_range= in make_grid calls | range was removed in torchvision ≥ 0.13; causes TypeError when make_grid is called |
| .gitignore | add sedna_src/ | local sedna source checkout was showing as untracked in every git status |

Per-file walkthrough

testalgorithms/rfnet/rfnet_algorithm.yaml: wrong paths + missing hyperparameters

Wrong module URLs

The directory ./examples/curb-detection/ does not exist anywhere in the repo. Ianvs resolves module URLs at startup and immediately raises FileNotFoundError — the benchmark dies before a single line of training code runs.

- url: "./examples/curb-detection/lifelong_learning_bench/testalgorithms/rfnet/basemodel.py"
+ url: "./examples/cityscapes-synthia/lifelong_learning_bench/curb-detection/testalgorithms/rfnet/basemodel.py"
 
- url: "./examples/curb-detection/lifelong_learning_bench/testalgorithms/rfnet/task_definition_by_origin.py"
+ url: "./examples/cityscapes-synthia/lifelong_learning_bench/curb-detection/testalgorithms/rfnet/task_definition_by_origin.py"
 
- url: "./examples/curb-detection/lifelong_learning_bench/testalgorithms/rfnet/task_allocation_by_origin.py"
+ url: "./examples/cityscapes-synthia/lifelong_learning_bench/curb-detection/testalgorithms/rfnet/task_allocation_by_origin.py"
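A quick pre-flight check can catch this class of error before a benchmark is launched. The sketch below is hypothetical (find_missing_module_urls is not part of ianvs); it assumes the URLs are relative to the ianvs root directory, as the corrected ones above are, and simply reports which of them do not point at an existing file.

```python
import os
from typing import Iterable, List


def find_missing_module_urls(urls: Iterable[str], root: str = ".") -> List[str]:
    """Return the URLs that do not point at an existing file.

    Relative URLs are resolved against `root` (assumed to be the ianvs root
    directory); absolute URLs are checked as-is.
    """
    missing = []
    for url in urls:
        path = url if os.path.isabs(url) else os.path.join(root, url)
        if not os.path.isfile(path):
            missing.append(url)
    return missing


# Example: run from the ianvs root to list any broken module URLs.
broken = find_missing_module_urls([
    "./examples/curb-detection/lifelong_learning_bench/testalgorithms/rfnet/basemodel.py",
])
```

In practice the URL list would come from the parsed algorithm YAML; it is hardcoded here to keep the sketch standalone.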

Missing hyperparameters

TrainArgs in args.py picks up base_size, crop_size, batch_size, and workers from **kwargs. If they aren't declared here, ianvs never passes them. Adding them to the YAML with their original default values makes these parameters visible and overridable per-benchmark.

  hyperparameters:
    - learning_rate:
        values: [0.0001]
    - epochs:
        values: [1]
+   - base_size:
+       values: [1024]
+   - crop_size:
+       values: [768]
+   - batch_size:
+       values: [4]
+   - workers:
+       values: [4]
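To illustrate how list-valued hyperparameters like these could become per-run keyword arguments, here is a sketch. It assumes each `values` list is expanded as a Cartesian product (one run per combination); the helper `expand_hyperparameters` is hypothetical and is not ianvs's actual parser. With single-element lists, as in this example, there is exactly one combination.

```python
import itertools
from typing import Any, Dict, List


def expand_hyperparameters(spec: List[Dict[str, Dict[str, List[Any]]]]) -> List[Dict[str, Any]]:
    """Turn [{name: {"values": [...]}}, ...] into one kwargs dict per combination."""
    names, value_lists = [], []
    for entry in spec:
        # Each YAML list item is a single-key mapping: {name: {"values": [...]}}
        (name, body), = entry.items()
        names.append(name)
        value_lists.append(body["values"])
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


# Mirrors the YAML above after parsing (e.g. with yaml.safe_load).
spec = [
    {"learning_rate": {"values": [0.0001]}},
    {"epochs": {"values": [1]}},
    {"base_size": {"values": [1024]}},
    {"crop_size": {"values": [768]}},
    {"batch_size": {"values": [4]}},
    {"workers": {"values": [4]}},
]
runs = expand_hyperparameters(spec)
# runs == [{"learning_rate": 0.0001, "epochs": 1, "base_size": 1024,
#           "crop_size": 768, "batch_size": 4, "workers": 4}]
```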
benchmarkingjob.yaml: hardcoded absolute path

The workspace path was hardcoded to a specific user's home directory. Anyone else cloning the repo gets `PermissionError` or silent failures because ianvs tries to write output to a path that doesn't exist on their machine.
- workspace: "/home/nishant/LOCAL_DISK_D/1/ianvs/workspace/curb-detection"
+ workspace: "./workspace/curb-detection"
testenv/testenv.yaml: hardcoded absolute paths

Same problem. `core/testenvmanager/dataset/dataset.py` checks `os.path.isfile(url)`, so on any other machine this raises `RuntimeError: dataset file is not a local file and not an absolute path`.
- train_index: "/home/nishant/LOCAL_DISK_D/1/ianvs/dataset/curb-detection/train_data/index.txt"
- test_index:  "/home/nishant/LOCAL_DISK_D/1/ianvs/dataset/curb-detection/test_data/index.txt"
+ train_index: "./dataset/curb-detection/train_data/index.txt"
+ test_index:  "./dataset/curb-detection/test_data/index.txt"
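Assuming ianvs hands these strings to `os.path.isfile` unchanged (as the dataset.py check above suggests), relative paths are resolved against the process's current working directory, not against the YAML file's location. The hypothetical helper below just makes that resolution explicit for a given working directory.

```python
import os


def resolve_index_path(index_path: str, cwd: str) -> str:
    """Show where a (possibly relative) index path resolves for a given cwd.

    Relative paths are joined onto `cwd`, mirroring what os.path.isfile()
    does implicitly with os.getcwd(); absolute paths are only normalised.
    """
    if os.path.isabs(index_path):
        return os.path.normpath(index_path)
    return os.path.normpath(os.path.join(cwd, index_path))


# On POSIX this prints /opt/ianvs/dataset/curb-detection/train_data/index.txt
print(resolve_index_path("./dataset/curb-detection/train_data/index.txt", "/opt/ianvs"))
```

The practical consequence is that ianvs has to be launched from the ianvs root for `./dataset/...` to land in the right place.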
testalgorithms/rfnet/basemodel.py: sys.path + pin_memory

Auto sys.path insertion

basemodel.py imports from RFNet.dataloaders, RFNet.utils, etc., and RFNet's internal imports only resolve when the RFNet/ subdirectory itself is on sys.path. Without it, every user has to manually export PYTHONPATH=... before invoking ianvs. Inserting the path at import time makes the example self-contained.

+ import sys
+
+ _rfnet_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "RFNet")
+ if _rfnet_dir not in sys.path:
+     sys.path.insert(0, _rfnet_dir)
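The same pattern can be factored into a small reusable helper (the name `ensure_on_sys_path` is hypothetical). Normalising to an absolute path keeps the membership check reliable, and the guard stops repeated imports from piling up duplicate sys.path entries.

```python
import os
import sys


def ensure_on_sys_path(directory: str) -> bool:
    """Prepend `directory` to sys.path once; return True if it was added."""
    directory = os.path.abspath(directory)
    if directory in sys.path:
        return False
    sys.path.insert(0, directory)
    return True


# Usage inside basemodel.py (assumed layout: RFNet/ next to this file):
# ensure_on_sys_path(os.path.join(os.path.dirname(os.path.abspath(__file__)), "RFNet"))
```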

pin_memory=False

pin_memory=True is a GPU-only optimisation. On CPU-only machines PyTorch raises RuntimeError: cannot pin 'torch.FloatTensor' only dense CPU tensors can be pinned. Setting it to False is safe in all environments.

- self.validator.test_loader = DataLoader(data, ..., pin_memory=True)
+ self.validator.test_loader = DataLoader(data, ..., pin_memory=False)
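If GPU runs should keep the pinned-memory speed-up, one alternative is to enable pin_memory only when CUDA is available. A sketch with a hypothetical helper name: `torch.cuda.is_available()` is the standard PyTorch check, and the import is guarded so the helper still works where PyTorch is absent.

```python
def loader_kwargs(batch_size: int, workers: int, cuda_available=None) -> dict:
    """Build DataLoader keyword arguments, pinning memory only on CUDA hosts."""
    if cuda_available is None:
        try:
            import torch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    return {
        "batch_size": batch_size,
        "num_workers": workers,
        "shuffle": False,  # predictions are matched to labels by position
        "pin_memory": bool(cuda_available),
    }


# Usage (assuming `data` is the evaluation dataset):
# DataLoader(data, **loader_kwargs(batch_size=4, workers=4))
```

Hardcoding False, as the PR does, is simpler and always safe; the conditional form only matters if evaluation throughput on GPU machines becomes a concern.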
testalgorithms/rfnet/task_allocation_by_origin.py: three fixes

`task_extractor` made optional

The lifelong learning paradigm does not always pass task_extractor: in some rounds it is omitted or None. With the old required positional argument, a call without it raised TypeError: __call__() missing 1 required positional argument.

  def __init__(self, **kwargs):
      self.default_origin = kwargs.get("default", None)
+     self.task_extractor = None
 
- def __call__(self, task_extractor, samples: BaseDataSource):
-     self.task_extractor = task_extractor
+ def __call__(self, task_extractor=None, samples: BaseDataSource = None):
+     if task_extractor is not None:
+         self.task_extractor = task_extractor
+     if self.task_extractor is None:
+         self.task_extractor = {"real": 0, "sim": 1}

None-safe sample path extraction

During unseen-task rounds, samples can arrive as raw path strings rather than [path, depth_path] tuples. _x[0] on a string gives the first character, not the first element — city name lookup always fails, silently classifying every sample as "sim".

- for city in cities:
-     if city in _x[0]:
-         is_real = True
-         sample_origins.append("real")
-         break
- if not is_real:
-     sample_origins.append("sim")
+ if _x is None or (hasattr(_x, '__len__') and len(_x) == 0):
+     sample_origins.append("real")
+     continue
+ sample_path = _x[0] if isinstance(_x, (list, tuple)) else str(_x)
+ is_real = any(city in sample_path for city in cities)
+ sample_origins.append("real" if is_real else "sim")
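The pitfall and the fix can be seen in isolation. The helper below is a hypothetical standalone version of the loop body above; the city names are an illustrative subset of Cityscapes cities, not the list used in the file.

```python
from typing import Sequence, Union

Sample = Union[str, Sequence[str], None]


def sample_origin(sample: Sample, cities: Sequence[str]) -> str:
    """Classify a sample as "real" or "sim" from its image path."""
    # Mirrors the diff: missing or empty samples default to "real".
    if sample is None or (hasattr(sample, "__len__") and len(sample) == 0):
        return "real"
    # A bare string is the path itself; a list/tuple holds [image, depth].
    path = sample[0] if isinstance(sample, (list, tuple)) else str(sample)
    return "real" if any(city in path for city in cities) else "sim"


cities = ["aachen", "bremen"]  # illustrative subset
print(sample_origin(["data/aachen/img.png", "data/aachen/depth.png"], cities))  # real
print(sample_origin("data/aachen/img.png", cities))                             # real
print("data/aachen/img.png"[0])  # d  (indexing a string yields its first character)
```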

Safe .get() with default

dict.get() without a default returns None when the key is missing. int(None) raises TypeError.

- allocations = [int(self.task_extractor.get(sample_origin)) for sample_origin in sample_origins]
+ allocations = [int(self.task_extractor.get(origin, 0)) for origin in sample_origins]
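Putting the optional extractor and the safe lookup together, a minimal sketch of the allocation step might look like this (hypothetical class, not the file's actual code). It keeps the design choice from the diff: an origin missing from the mapping falls back to task 0, which with the default mapping means it is treated like "real". The sketch logs a warning in that case so such mismatches are at least visible.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)

DEFAULT_EXTRACTOR = {"real": 0, "sim": 1}


class OriginAllocator:
    def __init__(self, task_extractor: Optional[Dict[str, int]] = None):
        self.task_extractor = task_extractor

    def allocate(self, origins: List[str],
                 task_extractor: Optional[Dict[str, int]] = None) -> List[int]:
        # An explicitly passed extractor wins; otherwise keep or default it.
        if task_extractor is not None:
            self.task_extractor = task_extractor
        if self.task_extractor is None:
            self.task_extractor = dict(DEFAULT_EXTRACTOR)
        allocations = []
        for origin in origins:
            if origin not in self.task_extractor:
                logger.warning("unknown origin %r, falling back to task 0", origin)
            allocations.append(int(self.task_extractor.get(origin, 0)))
        return allocations
```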
testenv/accuracy.py: replace make_data_loader with PIL reads

The old code rebuilt a full PyTorch DataLoader inside the metric function. This was wrong in two ways:
  1. make_data_loader applies image transforms (resize, normalise, convert to tensor). Ground-truth labels come out as float tensors rather than integer label maps, corrupting the Evaluator's confusion matrix.
  2. y_true at eval time is a list of file paths, not a BaseDataSource. Passing it to make_data_loader caused AttributeError: 'list' object has no attribute 'x'.
- from tqdm import tqdm
- from RFNet.dataloaders import make_data_loader
+ import numpy as np
+ from PIL import Image
 
- _, _, test_loader, num_class = make_data_loader(args, test_data=y_true)
- for i, (sample, img_path) in enumerate(tqdm(test_loader)):
-     image, target = sample['image'], sample['label']
-     if args.cuda:
-         image, target = image.cuda(), target.cuda()
-     target[target > evaluator.num_class-1] = 255
-     target = target.cpu().numpy()
-     evaluator.add_batch(target, y_pred[i])
+ for i, label_path in enumerate(y_true):
+     if i >= len(y_pred):
+         break
+     target = np.array(Image.open(label_path.rstrip()))
+     target[target > evaluator.num_class - 1] = 255
+     pred = np.array(y_pred[i])
+     while pred.ndim > 2:
+         pred = pred[0]
+     evaluator.add_batch(target, pred)
 
+ if evaluator.confusion_matrix.sum() == 0:
+     return 0.0
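A self-contained sketch of the new evaluation step follows. The confusion-matrix update is written in the style of the common DeepLab-style Evaluator (pixels whose label falls outside [0, num_class) are ignored); this is an assumption about what RFNet's Evaluator does, not a copy of it. Note the cast to int64: PIL label maps usually load as uint8, and `num_class * target` can overflow uint8. `pixel_accuracy` is only an illustrative metric with the same empty-matrix guard; the metric accuracy.py actually returns is not shown in the diff.

```python
import numpy as np


def load_label(path: str) -> np.ndarray:
    from PIL import Image  # imported lazily; only needed for real label files
    return np.array(Image.open(path.rstrip()))


def update_confusion(cm: np.ndarray, target: np.ndarray, pred: np.ndarray) -> np.ndarray:
    """Accumulate one (H, W) label/prediction pair into cm (rows: ground truth)."""
    num_class = cm.shape[0]
    target = target.astype(np.int64)             # copy; avoids uint8 overflow below
    target[target > num_class - 1] = 255         # same ignore rule as accuracy.py
    pred = np.asarray(pred).astype(np.int64)     # assumed (H, W), values in [0, num_class)
    mask = (target >= 0) & (target < num_class)
    idx = num_class * target[mask] + pred[mask]
    cm += np.bincount(idx, minlength=num_class ** 2).reshape(num_class, num_class)
    return cm


def pixel_accuracy(cm: np.ndarray) -> float:
    total = cm.sum()
    return 0.0 if total == 0 else float(np.diag(cm).sum() / total)
```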
core/.../lifelong_learning.py: three core paradigm bugs

Wrong round index passed to `_train`

_train uses this integer to build the output path (output/train/{rounds}). Passing r (the current round, starting at 1) wrote checkpoints to output/train/1/, output/train/2/, etc., while _eval always looked in output/train/0/ and therefore silently used random weights.

  self.cloud_task_index = self._train(self.cloud_task_index,
                                      train_dataset_file,
-                                     r)
+                                     0)
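According to the review further down, the rounds argument controls two things in _train: the output directory output/train/{rounds} and the HAS_COMPLETED_INITIAL_TRAINING flag (False when rounds < 1, True otherwise). Both can be sketched as a pure function; the function name and the workspace argument are hypothetical, and the real _train computes these inline alongside the actual training.

```python
import os
from typing import Tuple


def train_round_settings(workspace: str, rounds: int) -> Tuple[str, str]:
    """Return (output_dir, HAS_COMPLETED_INITIAL_TRAINING value) for a round."""
    output_dir = os.path.join(workspace, "output", "train", str(rounds))
    # Environment variables are strings, hence "False"/"True" rather than bools.
    has_completed_initial = "False" if rounds < 1 else "True"
    return output_dir, has_completed_initial
```

Passing 0 for the initial call therefore yields output/train/0 with the flag set to "False"; passing r == 1 would yield output/train/1 with "True".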

dtype=object for ragged numpy arrays

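unseen_tasks can contain tuples (image_path, depth_path) or plain strings. Since NumPy 1.24, building an array from such mixed-length elements without dtype=object raises ValueError: setting an array element with a sequence (older versions emitted a VisibleDeprecationWarning instead).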

- unseen_task_train_samples.x = np.array(unseen_tasks)
- unseen_task_train_samples.y = np.array(unseen_task_labels)
+ unseen_task_train_samples.x = np.array(unseen_tasks, dtype=object)
+ unseen_task_train_samples.y = np.array(unseen_task_labels, dtype=object)
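A small demonstration of why dtype=object matters, and of one caveat: when every element happens to be a tuple of the same length, `np.array(..., dtype=object)` still builds a 2-D array rather than a 1-D array of tuples. Filling a preallocated 1-D object array element by element (a common NumPy idiom; the helper name is hypothetical) always gives exactly one entry per sample.

```python
import numpy as np


def to_object_array(items) -> np.ndarray:
    """Build a 1-D object array with exactly one element per item."""
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        arr[i] = item  # stored as-is, even if it is a tuple
    return arr


mixed = [("a.png", "a_depth.png"), "b.png"]
pairs = [("a.png", "a_depth.png"), ("b.png", "b_depth.png")]

print(np.array(pairs, dtype=object).shape)  # (2, 2): uniform tuples become a 2-D array
print(to_object_array(pairs).shape)         # (2,)
print(to_object_array(mixed).shape)         # (2,)
```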

edge_task_index path construction

job.evaluate() returns evaluation metrics, not a file path. The index file is always written to eval_output_dir/index.pkl by sedna internals.

- edge_task_index = job.evaluate(eval_dataset, metrics=metric_func)
+ job.evaluate(eval_dataset, metrics=metric_func)
+ edge_task_index = os.path.join(eval_output_dir, "index.pkl")
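Because the index is now located by convention rather than returned, it can be worth validating the file before it is loaded. The review further down reports an EOFError raised from pickle.load() on an empty index.pkl; a guard like the following (hypothetical helper, using plain pickle rather than sedna's FileOps/joblib loaders) turns that into an explicit error message.

```python
import os
import pickle


def load_task_index(path: str):
    """Load a pickled task index, failing clearly if it is absent or empty."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"task index not found: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"task index is empty: {path}")
    with open(path, "rb") as f:
        return pickle.load(f)
```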
RFNet/dataloaders/datasets/cityscapes.py: guard empty splits

`data.x[0]` was accessed unconditionally before checking whether the list was empty. During unseen-task rounds no samples may match the current task, so `data.x` can be empty, raising `IndexError`. Hard `Exception` raises are also downgraded to warnings, since an empty split is expected behaviour during incremental rounds.
- self.images[split] = [img[0] for img in data.x] if hasattr(data, "x") else data
- if hasattr(data, "x") and len(data.x[0]) == 1:
+ if hasattr(data, "x") and len(data.x) > 0:
+     self.images[split] = [img[0] for img in data.x]
+     if len(data.x[0]) == 1:
          self.disparities[split] = self.images[split]
-     elif hasattr(data, "x") and len(data.x[0]) == 2:
-         self.disparities[split] = [img[1] for img in data.x]
+     elif len(data.x[0]) == 2:
+         self.disparities[split] = [img[1] for img in data.x]
      else:
-         self.disparities[split] = data
+         self.disparities[split] = self.images[split]
+ else:
+     self.images[split] = []
+     self.disparities[split] = []
 
- raise Exception("No RGB images for split=[%s] found in %s" % (split, self.images_base))
+ print(f"Warning: No RGB images for split=[{split}]")
+ return
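The same guard logic, written as a standalone function that reports through the logging module rather than print(), in line with the reviewers' request. The function name and the return-a-pair shape are hypothetical; entries are assumed to be [image] or [image, depth] lists, as in the diff.

```python
import logging
from typing import List, Sequence, Tuple

logger = logging.getLogger(__name__)


def split_images_and_disparities(x: Sequence[Sequence[str]],
                                 split: str) -> Tuple[List[str], List[str]]:
    """Return (images, disparities) for one split; empty lists if there is no data."""
    if len(x) == 0:
        logger.warning("No RGB images for split=[%s]", split)
        return [], []
    images = [entry[0] for entry in x]
    if len(x[0]) == 2:
        disparities = [entry[1] for entry in x]
    else:
        # One-element entries (or any other shape): reuse the RGB paths, as the diff does.
        disparities = list(images)
    return images, disparities
```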
RFNet/utils/args.py: wire kwargs to TrainArgs

`TrainArgs` accepted `**kwargs` but hardcoded all four values regardless. Any YAML hyperparameters were silently thrown away. Original defaults are preserved.
- self.workers = 4
- self.base_size = 1024
- self.crop_size = 768
- self.batch_size = 4
+ self.workers = kwargs.get("workers", 4)
+ self.base_size = kwargs.get("base_size", 1024)
+ self.crop_size = kwargs.get("crop_size", 768)
+ self.batch_size = kwargs.get("batch_size", 4)
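A stripped-down stand-in for the patched constructor shows the effect: YAML-supplied values win, and the original defaults apply otherwise. `TrainArgsSketch` is hypothetical and only covers the four fields above.

```python
class TrainArgsSketch:
    DEFAULTS = {"workers": 4, "base_size": 1024, "crop_size": 768, "batch_size": 4}

    def __init__(self, **kwargs):
        # Same effect as the four kwargs.get(...) lines in the diff.
        for name, default in self.DEFAULTS.items():
            setattr(self, name, kwargs.get(name, default))


args = TrainArgsSketch(batch_size=2, workers=0)
print(args.batch_size, args.workers, args.crop_size)  # 2 0 768
```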
RFNet/utils/metrics.py: three fixes

Division-by-zero guard in FWIoU

If evaluate() is called before any batches are added, the confusion matrix is all zeros and the division produces NaN/inf, corrupting the leaderboard sort.

  def Frequency_Weighted_Intersection_over_Union(self):
+     if self.confusion_matrix.sum() == 0:
+         return 0.0
      freq = np.sum(self.confusion_matrix, axis=1) / np.sum(self.confusion_matrix)
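For reference, a self-contained version of the guarded metric. It follows the usual frequency-weighted IoU definition used by DeepLab-style evaluators (rows are ground truth, columns are predictions) and wraps the divisions in np.errstate, as the later commit in this PR does, so classes that never occur do not emit warnings. It is a sketch, not RFNet's exact code.

```python
import numpy as np


def frequency_weighted_iou(cm: np.ndarray) -> float:
    """Frequency-weighted IoU of a confusion matrix (rows: ground truth)."""
    total = cm.sum()
    if total == 0:
        return 0.0  # nothing evaluated yet: avoid 0/0
    with np.errstate(divide="ignore", invalid="ignore"):
        freq = cm.sum(axis=1) / total
        iou = np.diag(cm) / (cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm))
    present = freq > 0  # classes absent from the ground truth carry no weight
    return float((freq[present] * iou[present]).sum())


print(frequency_weighted_iou(np.zeros((2, 2))))             # 0.0
print(frequency_weighted_iou(np.array([[3, 1], [0, 4]])))  # approximately 0.775
```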

Robust shape normalisation in add_batch

Predictions can have extra batch dimensions, e.g. (1, 1, H, W) vs ground truth (H, W). The old single [0] strip still crashed with two extra dimensions.

- if gt_image.shape != pre_image.shape:
-     pre_image = pre_image[0]
+ while pre_image.ndim > gt_image.ndim:
+     pre_image = pre_image[0]

Guarded per-class FWIoU print

CFWIoU only contains entries for classes present in the data. On a road-only subset CFWIoU[1] raises IndexError.

- print("road         : %.6f" % (CFWIoU[0] * 100.0), "%\t")
- print("sidewalk     : %.6f" % (CFWIoU[1] * 100.0), "%\t")
+ if len(CFWIoU) > 0:
+     print("road         : %.6f" % (CFWIoU[0] * 100.0), "%\t")
+ if len(CFWIoU) > 1:
+     print("sidewalk     : %.6f" % (CFWIoU[1] * 100.0), "%\t")
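A more general form of the same guard is to pair class names with whatever scores exist, so no index can run past the end of the array. The sketch uses the logging module in line with the reviewers' request; the function name is hypothetical and the two class names are taken from the print statements above.

```python
import logging
from typing import List, Sequence

logger = logging.getLogger(__name__)


def log_class_scores(scores: Sequence[float],
                     names: Sequence[str] = ("road", "sidewalk")) -> List[str]:
    """Log one line per class that has a score; zip() stops at the shorter input."""
    lines = []
    for name, score in zip(names, scores):
        line = f"{name:<13}: {score * 100.0:.6f} %"
        logger.info(line)
        lines.append(line)
    return lines
```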
RFNet/utils/summaries.py: deprecated make_grid argument

torchvision renamed `range=` to `value_range=` in v0.13. The old name raises `TypeError` when `make_grid` is called, blocking training. Four occurrences fixed.
- grid_image = make_grid(..., normalize=False, range=(0, 255))
+ grid_image = make_grid(..., normalize=False, value_range=(0, 255))
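If the example ever has to run on both older and newer torchvision releases, one option is to pick the keyword name by inspecting make_grid's signature. The sketch below is hypothetical and is exercised with stand-in functions; with torchvision installed, you would pass `torchvision.utils.make_grid` itself.

```python
import inspect


def range_kwarg(func) -> str:
    """Return "value_range" if func accepts it, else fall back to the legacy "range"."""
    params = inspect.signature(func).parameters
    return "value_range" if "value_range" in params else "range"


def new_style(tensor, normalize=False, value_range=None): ...
def old_style(tensor, normalize=False, range=None): ...


print(range_kwarg(new_style), range_kwarg(old_style))  # value_range range

# With torchvision (assumed installed):
# from torchvision.utils import make_grid
# grid = make_grid(images, normalize=False, **{range_kwarg(make_grid): (0, 255)})
```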
.gitignore: add sedna_src/

+ sedna_src/
---

How to verify

# From the ianvs root directory
source venv/bin/activate
ianvs -f examples/cityscapes-synthia/lifelong_learning_bench/curb-detection/benchmarkingjob.yaml

No manual PYTHONPATH export needed — basemodel.py handles it automatically now.

Expected output:

| rank | algorithm               | accuracy | samples_transfer_ratio | paradigm         |
|------|-------------------------|----------|------------------------|------------------|
|  1   | rfnet_lifelong_learning |  0.2123  |         0.4649         | lifelonglearning |

@NishantSinghhhhh
Contributor Author

  1. Change print() calls to logger calls / exceptions

Collaborator

@MooreZheng MooreZheng left a comment

  1. Change print() calls to a logger or raise exceptions
  2. Cityscapes is different from cloud-robotics
  3. This pull request is different from #297 in dataset and algorithm

@abhisheksainimitawa abhisheksainimitawa left a comment

PR Review: [LFX Term 01] Restoration: Cityscapes-Synthia Curb Detection

Contributor: @abhisheksainimitawa | LFX Mentorship 2026 Term 2 Pre-test Task 2


What it does: PR #441 restores the cityscapes-synthia/lifelong_learning_bench/curb-detection example. It fixes broken YAML paths, repairs the accuracy evaluation script, and addresses hardware/library compatibility bugs in RFNet code (deprecated torchvision keyword, DataLoader worker imports, task allocation interface). It also modifies the shared lifelong_learning.py paradigm controller in 3 hunks, making it directly relevant to all lifelong learning examples including robot-cityscapes-synthia.

Recommendation: Merge. The sibling example fixes are correct and address real bugs. All three lifelong_learning.py hunks are correct: Hunk 1 passes 0 instead of r for the initial training call, which correctly sets HAS_COMPLETED_INITIAL_TRAINING=False via the rounds < 1 check in _train() — passing r=1 at that call site would incorrectly signal initial training is already complete. Hunks 2 and 3 are also correct.


What Makes This Review Unique

Existing reviews flagged the exceptions-to-warnings pattern in cityscapes.py and suggested using a logger. This review adds:

  • Hunk 1 _train(...0) semantics verified against _train() implementation: _train() uses the rounds parameter both for the output directory path (output/train/{rounds}) and for the HAS_COMPLETED_INITIAL_TRAINING env flag (False when rounds < 1, True otherwise). Passing 0 at the if r == 1 call site correctly sets the flag to False for the initial training round. The original r=1 would have set it to True immediately — a semantic bug. Not analyzed in any existing comment.
  • All 3 lifelong_learning.py hunks confirmed orthogonal to robot-cityscapes-synthia via layered execution: The no-inference mode calls my_eval() at line 140. None of the 3 hunks touch this code path. Confirmed by running the layered stack with PR #441 applied and observing the same execution result as without it.
  • Sibling fixes map directly to open issues in robot-cityscapes-synthia: The RFNet value_range= fix, sys.path.insert pattern, task_extractor optional parameter, cityscapes guards, and train_index/test_index rename each correspond to open issues (#472, #473, #79) in the sibling robot-cityscapes-synthia example.

1. Problem: Complexity and Difficulty of the Bug

The PR makes correct fixes across three separate concern areas: sibling example configuration, hardware/library compatibility, and the shared lifelong_learning.py paradigm controller. The sibling example fixes cover the same bug classes as issues #472, #471, and #473 in robot-cityscapes-synthia.

The three lifelong_learning.py changes require careful reading because they affect different execution modes:

Hunk 1 (line 268): hard-example-mining mode round index. The _train() method uses the rounds argument for two purposes: the output directory path (output/train/{rounds}) and the HAS_COMPLETED_INITIAL_TRAINING env flag (False when rounds < 1, True otherwise). In the if r == 1 branch, PR #441 passes 0 instead of r. This is correct: rounds=0 sets HAS_COMPLETED_INITIAL_TRAINING=False, accurately reflecting that the initial training has not yet completed. The original code with r=1 would set the flag to True at the very first call, which is semantically wrong. The else branch for rounds 2, 3, ... correctly passes r unchanged.

Hunk 2 (line 344): _inference() numpy dtype adds dtype=object to np.array(unseen_tasks). This is a correct fix for ragged-array deprecation warnings (an outright ValueError since NumPy 1.24) and benefits any example that reaches the inference path with variable-length sample lists.

Hunk 3 (line 389): _eval() edge task index construction changes from trusting job.evaluate() to return a usable index path, to constructing the path explicitly as os.path.join(eval_output_dir, "index.pkl"). This is pragmatically correct, as job.evaluate() does not reliably return a path. _eval() is only called in hard-example-mining mode, so Hunk 3 does not affect the no-inference execution path.

2. Code Review Finding: Sibling Example Fixes Confirming Shared Issues

| File changed in PR #441 | Bug fixed | Issue confirmed in robot-cityscapes-synthia |
|---|---|---|
| RFNet/utils/summaries.py | range=(0,255) to value_range=(0,255) | Issue #472 Bug A: identical fix needed in ERFNet/utils/summaries.py |
| basemodel.py | sys.path.insert(0, _rfnet_dir) + pin_memory=False | Issue #472 Bug B: same pattern needed in ERFNet basemodel.py |
| task_allocation_by_origin.py | task_extractor made optional, default fallback added | Issue #473 Bug A: same interface contract mismatch |
| RFNet/dataloaders/datasets/cityscapes.py | Guards before data.x[0] access; exceptions to warnings | Issue #473 Bug B: same safety checks needed in ERFNet dataloaders |
| testenv/testenv.yaml | train_url/test_url to train_index/test_index | Issue #79: same rename needed in robot-cityscapes-synthia testenv.yaml |

3. Execution Video

Watch execution recording on Google Drive


A. Independent Execution: PR #441 applied alone on main (no other fixes)

git fetch origin pull/441/head:pr-441
git checkout pr-441
ianvs -f examples/robot-cityscapes-synthia/lifelong_learning_bench/semantic-segmentation/benchmarkingjob.yaml

Section A-1: PR #441 applied alone, same path crash as main

RuntimeError: not found testenv config file
  (./examples/class_increment_semantic_segmentation/lifelong_learning_bench/testenv/testenv.yaml) in local

PR #441 modifies only lifelong_learning.py and the cityscapes-synthia/curb-detection sibling example files. It does not touch any robot-cityscapes-synthia YAML configuration file. Running the robot-cityscapes-synthia example with PR #441 applied alone produces the same path crash as unpatched main. None of PR #441's changes are reachable from that example until PR #366 is applied first.

Section A-2: Diff of PR #441, shared lifelong_learning.py changes

git fetch origin pull/441/head:pr-441
git diff main pr-441 -- core/testcasecontroller/algorithm/paradigm/lifelong_learning/lifelong_learning.py
# Hunk 1 -- hard-example-mining mode, if r==1 branch (does NOT affect no-inference mode):
-    self.cloud_task_index = self._train(self.cloud_task_index, train_dataset_file, r)
+    self.cloud_task_index = self._train(self.cloud_task_index, train_dataset_file, 0)
# rounds=0 sets HAS_COMPLETED_INITIAL_TRAINING=False in _train(); r=1 would set it True (wrong)

# Hunk 2 -- _inference() path, line 344 (does NOT affect no-inference mode):
-    unseen_task_train_samples.x = np.array(unseen_tasks)
-    unseen_task_train_samples.y = np.array(unseen_task_labels)
+    unseen_task_train_samples.x = np.array(unseen_tasks, dtype=object)
+    unseen_task_train_samples.y = np.array(unseen_task_labels, dtype=object)

# Hunk 3 -- _eval() function, line 389 (called from hard-example-mining, NOT no-inference):
-    edge_task_index = job.evaluate(eval_dataset, metrics=metric_func)
+    job.evaluate(eval_dataset, metrics=metric_func)
+    edge_task_index = os.path.join(eval_output_dir, "index.pkl")

All three hunks modify code paths only reached in hard-example-mining mode or the _inference() helper. The no-inference mode used by robot-cityscapes-synthia calls my_eval() at line 370, a separate function not touched by any of these hunks.


Step 0: Local fixes applied before Section B

These fixes were applied to the running-stack branch (commit 70e8be5) before cherry-picking PR #441, to isolate PR #441's contribution from other known blockers in the robot-cityscapes-synthia example:

| Fix applied | File changed | What it fixes |
|---|---|---|
| sys.path.insert(0, ERFNet_dir) before imports | basemodel.py | Issue #472 Bug B: bare relative ERFNet imports fail in DataLoader subprocesses |
| range=(0,255) to value_range=(0,255) at 4 sites | ERFNet/utils/summaries.py | Issue #472 Bug A: deprecated torchvision keyword |
| self.cuda = torch.cuda.is_available() + device detection | ERFNet/utils/args.py, ERFNet/train.py | Issue #471: hardcoded .cuda() crashes on CPU-only machines |
| __call__(self, samples) with task_extractor removed | task_allocation_by_domain.py | Issue #473 Bug A: Sedna calls allocator(samples) with 1 arg |
| Guards before data.x[0] access | ERFNet/dataloaders/datasets/cityscapes.py | Issue #473 Bug B: unsafe array access |
| job.inference_2(...) to job.inference(...) (line 328) | lifelong_learning.py | Issue #470: inference_2 not in Sedna API |
| job.my_inference(...) to seen_estimator.predict(...) (line 155) | lifelong_learning.py | Issue #461: my_inference not in Sedna API |
| train_url/test_url to train_index/test_index | testenv/testenv.yaml | Issue #79: field names not recognized by dataset.py |

B. Layered Stack Execution: running-stack + PR #441 lifelong_learning.py changes

git checkout running-stack
git cherry-pick 6ee1ba4   # PR #441 main commit
ianvs -f examples/robot-cityscapes-synthia/lifelong_learning_bench/semantic-segmentation/benchmarkingjob.yaml

Section B-1: PR #441 applied on top of running-stack, pipeline advances, crashes at seen_estimator.predict()

[ERROR] base.py(181) - RetryError[<Future at 0x1589b2450 state=finished returned NoneType>]
[INFO]  lifelong_learning.py(145) - {"accuracy": 0.0}
Traceback (most recent call last):
  ...
    raise EOFError
EOFError
RuntimeError: (paradigm=lifelonglearning) pipeline runs failed, error:

Three findings from this run:

Finding 1: PR #441 introduces no regression and no improvement for no-inference mode. All three lifelong_learning.py hunks affect hard-example-mining mode and _inference() paths, neither of which is reached in the no-inference execution path. The output is identical to the pre-PR #441 state, confirming that PR #441's lifelong_learning.py changes are strictly orthogonal to this execution path.

Finding 2: my_eval() succeeds and returns the expected dict format. The log line lifelong_learning.py(145) [INFO] - {"accuracy": 0.0} is direct evidence that job.evaluate() returns a metrics dict and the caller at line 140 receives it cleanly. The accuracy: 0.0 value is expected because the knowledge base has not been populated by a Sedna server.

Finding 3: The crash has moved one layer deeper. The pipeline clears my_eval() completely and crashes at seen_estimator.predict() (line 150) via FileOps.load() then joblib.load() then pickle.load(), raising EOFError from an empty index.pkl. This is a Sedna server dependency surfacing at a different call site, outside the scope of this PR.


Sub-comment Summary

| File | Line(s) | Sub-comment topic |
|---|---|---|
| lifelong_learning.py | 271 | Hunk 1: _train(... 0) is verified correct; rounds=0 sets HAS_COMPLETED_INITIAL_TRAINING=False in _train(), which is the right semantic for the initial training call. The original r=1 would have set the flag to True immediately (wrong). |

@kubeedge-bot kubeedge-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 3, 2026
@NishantSinghhhhh NishantSinghhhhh force-pushed the Restoration-CityScapeSynthia-Curb-Detection branch 3 times, most recently from 34821b0 to 1d3240a on June 3, 2026 17:02
- Add benchmarkingjob.yaml, testenv.yaml, and rfnet_algorithm.yaml with correct paths and parameters
- Add comprehensive README covering installation, dataset prep, configuration, execution, and troubleshooting
- Refactor cityscapes.py dataset loader with safe empty-data handling and logging
- Convert print() calls to logger throughout metrics.py, accuracy.py, and cityscapes.py
- Fix metrics.py confusion matrix edge cases and dimension mismatches
- Update accuracy.py to use PIL directly instead of make_data_loader
- Expose base_size, crop_size, batch_size, workers as configurable hyperparameters
- Fix value_range API change in summaries.py for PyTorch >= 2.x
- Fix task_allocation_by_origin.py with safe path detection
- Fix lifelong_learning.py to pass correct round index and dtype

Signed-off-by: NishantSinghhhhh <nishantsingh_230137@aitpune.edu.in>
Wrap numpy division operations in errstate to silence invalid/divide
warnings in all metric methods. Switch testenv from index_mini.txt to
index.txt to use the full dataset for benchmarking runs.

Signed-off-by: NishantSinghhhhh <nishantsingh_230137@aitpune.edu.in>
@NishantSinghhhhh NishantSinghhhhh force-pushed the Restoration-CityScapeSynthia-Curb-Detection branch from 1d3240a to 4ff2589 on June 3, 2026 17:03
@NishantSinghhhhh
Contributor Author

Screencast.from.2026-06-03.22-19-10.webm

Running without errors; reduced the epochs and dataset size to debug things faster.

Collaborator

@MooreZheng MooreZheng left a comment

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2026
@MooreZheng MooreZheng requested review from hsj576 and removed request for Poorunga June 4, 2026 08:48
@MooreZheng
Collaborator

This version has been modified as the reviewers asked and looks good to me.

What do you think, @hsj576?


Labels

lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comprehensive Example Restoration for KubeEdge Ianvs

4 participants