
Fix AttributeError: 'DataParallel' object has no attribute 'scheduler' #705

Closed

Copilot wants to merge 3 commits into main from copilot/fix-attribute-error-scheduler

Conversation

Contributor

Copilot AI commented Feb 17, 2026

Change Description

Training crashes with AttributeError: 'DataParallel' object has no attribute 'scheduler' when idist.auto_model() wraps the model in DataParallel or DistributedDataParallel.

Solution Description

Root cause: The scheduler_step handler accesses model.scheduler directly on the wrapper. PyTorch wrappers don't forward custom attributes.
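
The failure mode is easy to reproduce outside Hyrax. A minimal sketch follows, using plain PyTorch only (all names here are illustrative, not Hyrax code): a DataParallel wrapper still exposes ordinary nn.Module machinery, but not custom attributes set on the wrapped model.

# Minimal repro sketch (plain PyTorch; illustrative names only)
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 2)
        self.scheduler = None  # custom attribute, like the one Hyrax models carry

wrapped = nn.DataParallel(TinyModel())

print(wrapped.training)          # OK: ordinary nn.Module attribute on the wrapper
print(wrapped.module.scheduler)  # OK: explicit unwrap through .module
try:
    wrapped.scheduler            # not forwarded by the wrapper
except AttributeError as err:
    print(err)                   # 'DataParallel' object has no attribute 'scheduler'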

Fix: Use already-extracted local variables that properly unwrap the model via extract_model_method():

# Before (line 659: scheduler already extracted, but not used)
scheduler = extract_model_method(model, "scheduler")

@trainer.on(HyraxEvents.HYRAX_EPOCH_COMPLETED)
def scheduler_step(trainer):
    if model.scheduler:  # AttributeError: wrapper has no 'scheduler' attribute
        model._learning_rates_history.append(...)
        model.scheduler.step()

# After
scheduler = extract_model_method(model, "scheduler")
unwrapped_model = model.module if (type(model) is DataParallel or ...) else model

@trainer.on(HyraxEvents.HYRAX_EPOCH_COMPLETED)  
def scheduler_step(trainer):
    if scheduler:  # Use extracted variable
        unwrapped_model._learning_rates_history.append(...)
        scheduler.step()

Changes:

  • src/hyrax/pytorch_ignite.py: Extract unwrapped_model and use it for _learning_rates_history attribute access (lines 660-663, 735-745)
  • tests/hyrax/test_train.py: Add test_scheduler_with_data_parallel that mocks wrapper to validate fix

Scope: Other similar patterns (model.final_training_metrics, model.log_epoch_metrics()) remain for follow-up.

Code Quality

  • I have read the Contribution Guide and agree to the Code of Conduct
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation
Original prompt

Bug

PR #652 ("LR Scheduler implemented", merged 2026-02-10 in commit 68c8e0cb) introduced an AttributeError crash during training when the model is wrapped in DataParallel or DistributedDataParallel by idist.auto_model().

Error

AttributeError: 'DataParallel' object has no attribute 'scheduler'

The traceback points to src/hyrax/pytorch_ignite.py, line 734, inside the scheduler_step inner function within create_trainer().

Root Cause

In create_trainer(), after model = idist.auto_model(model) on line 653, the model variable may be a DataParallel wrapper. Line 659 correctly uses extract_model_method(model, "scheduler") to unwrap it and store the result in the local variable scheduler. However, the scheduler_step handler on lines 732-741 accesses model.scheduler directly on the wrapper instead of using the already-extracted scheduler local variable. DataParallel does not forward attribute access to custom model attributes, so this crashes.
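
For readers unfamiliar with the helper, an unwrapping routine of this kind generally looks like the sketch below. This is a hypothetical stand-in, not the actual extract_model_method from src/hyrax/pytorch_ignite.py, which may differ in signature and behavior.

# Hypothetical sketch of an unwrapping helper; NOT the real extract_model_method
from torch.nn import DataParallel
from torch.nn.parallel import DistributedDataParallel

def get_unwrapped_attr(model, name):
    """Fetch `name` from the underlying model, reaching through a
    DataParallel/DistributedDataParallel wrapper when one is present."""
    target = model.module if isinstance(model, (DataParallel, DistributedDataParallel)) else model
    return getattr(target, name, None)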

Fix Required

In the scheduler_step inner function (lines 732-741 of src/hyrax/pytorch_ignite.py), replace all references to model.scheduler with the already-extracted scheduler local variable. Specifically:

  • Line 734: Change if model.scheduler: → if scheduler:
  • Line 737: Change epoch_lr = model.scheduler.get_last_lr() → epoch_lr = scheduler.get_last_lr()
  • Line 741: Change model.scheduler.step() → scheduler.step()

Additionally, the _learning_rates_history attribute accesses on the wrapper (lines 735-736, 739) need to be fixed to use the unwrapped model. Use extract_model_method or access the unwrapped model directly. The simplest minimal approach: introduce unwrapped_model = model.module if (type(model) is DataParallel or type(model) is DistributedDataParallel) else model near line 658 (after model = idist.auto_model(model)) and use it for the _learning_rates_history accesses in scheduler_step (a consolidated sketch of the resulting handler follows the list below):

  • Line 735: Change if not hasattr(model, "_learning_rates_history"): → if not hasattr(unwrapped_model, "_learning_rates_history"):
  • Line 736: Change model._learning_rates_history = [] → unwrapped_model._learning_rates_history = []
  • Line 739: Change model._learning_rates_history.append(epoch_lr) → unwrapped_model._learning_rates_history.append(epoch_lr)
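
Putting these edits together, the handler would end up roughly as follows. This is an illustrative consolidation of the changes listed above, written as it would sit inside create_trainer() (where model, trainer, and HyraxEvents are already in scope); the exact surrounding code may differ.

# Illustrative consolidation of the listed edits (inside create_trainer())
scheduler = extract_model_method(model, "scheduler")
unwrapped_model = (
    model.module
    if (type(model) is DataParallel or type(model) is DistributedDataParallel)
    else model
)

@trainer.on(HyraxEvents.HYRAX_EPOCH_COMPLETED)
def scheduler_step(trainer):
    if scheduler:
        if not hasattr(unwrapped_model, "_learning_rates_history"):
            unwrapped_model._learning_rates_history = []
        epoch_lr = scheduler.get_last_lr()
        unwrapped_model._learning_rates_history.append(epoch_lr)
        scheduler.step()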

Scope

This is a targeted fix to unblock training. There are other similar fragilities in this file (e.g., model.final_training_metrics, model.final_validation_metrics, model.log_epoch_metrics() accesses on the wrapper in other handlers) — those should be addressed in a follow-up PR.

Important Notes

  • DataParallel and DistributedDataParallel are already imported at the top of the file (line 23).
  • The extract_model_method helper (line 428) already handles unwrapping correctly, but using an unwrapped_model local is simpler for attribute sets/gets that aren't method calls.
  • model.eval() and model.train() calls elsewhere are safe because those are inherited nn.Module methods that DataParallel properly delegates. Do NOT change those.
  • Make sure not to change line 667 (if scheduler:) or line 668 (to_save["scheduler"] = scheduler) — those already use the local variable correctly.

This pull request was created from Copilot chat.



…allel scheduler

- Use scheduler local variable instead of model.scheduler in scheduler_step handler
- Add unwrapped_model variable for _learning_rates_history attribute access
- Add test_scheduler_with_data_parallel to validate fix works with DataParallel wrapper

Fixes issue where model.scheduler is accessed on DataParallel/DistributedDataParallel
wrappers which don't forward custom attributes, causing AttributeError during training.
Copilot AI changed the title from "[WIP] Fix AttributeError in scheduler step for DataParallel model" to "Fix AttributeError: 'DataParallel' object has no attribute 'scheduler'" on Feb 17, 2026
@gitosaurus gitosaurus marked this pull request as ready for review February 17, 2026 19:27
Copilot AI review requested due to automatic review settings February 17, 2026 19:27
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes an AttributeError crash during training when models are wrapped in DataParallel or DistributedDataParallel by the distributed training infrastructure. The bug was introduced in PR #652 when learning rate scheduler support was added.

Changes:

  • Fixed scheduler access in scheduler_step handler to use pre-extracted scheduler variable instead of accessing it through the DataParallel wrapper
  • Added unwrapped model reference to properly access _learning_rates_history attribute
  • Added comprehensive test to validate the fix works with DataParallel-wrapped models

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • src/hyrax/pytorch_ignite.py: Extracts an unwrapped_model reference and uses it, along with the pre-extracted scheduler variable, in the scheduler_step handler to avoid the AttributeError when the model is wrapped by DataParallel/DistributedDataParallel
  • tests/hyrax/test_train.py: Adds test_scheduler_with_data_parallel, which mocks idist.auto_model to wrap the model in DataParallel and validates that the scheduler functionality works correctly
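
For context on how such a test can force the wrapping on a single-device CI machine, the sketch below shows the general mocking pattern. The patch target and the test body are assumptions for illustration, not the actual contents of tests/hyrax/test_train.py.

# Sketch of the mocking pattern only; module path and test body are assumptions
from unittest.mock import patch
from torch.nn import DataParallel

def fake_auto_model(model, **kwargs):
    # Force the DataParallel wrapping that idist.auto_model would normally
    # only apply on a multi-GPU host.
    return DataParallel(model)

def test_scheduler_with_data_parallel():
    # Assumed patch target; the real test patches wherever create_trainer()
    # resolves idist.auto_model at runtime.
    with patch("hyrax.pytorch_ignite.idist.auto_model", side_effect=fake_auto_model):
        ...  # run a short training loop and assert no AttributeError is raised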

@gitosaurus gitosaurus requested a review from mtauraso February 17, 2026 19:32
@codecov

codecov bot commented Feb 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.17%. Comparing base (4c93171) to head (c1d45ef).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #705   +/-   ##
=======================================
  Coverage   64.17%   64.17%           
=======================================
  Files          61       61           
  Lines        5892     5893    +1     
=======================================
+ Hits         3781     3782    +1     
  Misses       2111     2111           

☔ View full report in Codecov by Sentry.

@mtauraso
Collaborator

I've stolen the test from this into #707. I think #707 is cleaner. I'm going to close this to reduce confusion.

@mtauraso mtauraso closed this Feb 17, 2026
@github-actions

Before [4c93171] After [71106ec] Ratio Benchmark (Parameter)
1.69G 1.76G 1.04 vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(16384, 'chromadb')
15.452620994412609 15.935010791694598 1.03 data_cache_benchmarks.DataCacheBenchmarks.track_cache_hsc1k_hyrax_size_undercount
1.91±0.1s 1.94±0.04s 1.02 benchmarks.time_train_help
1.93±0.03s 1.94±0.1s 1.01 benchmarks.time_database_connection_help
1.91±0.1s 1.93±0.04s 1.01 benchmarks.time_save_to_database_help
1.91±0.07s 1.91±0.1s 1 benchmarks.time_download_help
1.90±0.07s 1.90±0.07s 1 benchmarks.time_help
1.94±0.05s 1.94±0.05s 1 benchmarks.time_infer_help
1.93±0.05s 1.92±0.03s 1 benchmarks.time_lookup_help
1.92±0.1s 1.91±0.03s 1 benchmarks.time_umap_help

Click here to view all benchmarks.
