Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes a DLPNO + monoatomic edge case that could trigger ORCA “INVALID ORBITAL RANGE” failures and an infinite memory-troubleshooting loop.
Changes:
- Normalize the DLPNO monoatomic guard in the scheduler to actually trigger (case/normalization fix) and downgrade to the canonical (non-DLPNO) method.
- Pass monoatomic context into ESS troubleshooting.
- Reorder ORCA troubleshooting to detect DLPNO + monoatomic/H before memory-based retries.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| arc/scheduler.py | Detects DLPNO on monoatomic species and rewrites the level of theory; forwards monoatomic flag into troubleshooting. |
| arc/job/trsh.py | Adds is_monoatomic parameter and prioritizes DLPNO+monoatomic/H handling before memory retries for ORCA. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
69457e1 to
e3a59d3
Compare
|
We must add a trsh counter (or do we already have one?), so we don't do anything infinitely |
|
|
||
| elif 'orca' in software: | ||
| if 'Memory' in job_status['keywords']: | ||
| if 'dlpno' in level_of_theory.method and (is_monoatomic or is_h): |
There was a problem hiding this comment.
this shouldn't happen, if it does, it means Scheduler is buggy. I almost think we could raise an error here (so devs know)
Depends on what we are troubleshooting - liek for TS guess, we eventually try everything and then declare all methods attempted |
e3a59d3 to
da8f011
Compare
I added a counter now, and defaulted it to 10 in the settings.py |
da8f011 to
f799aba
Compare
|
I think the trsh counter could be per trsh method (not to try the same one too many times), Maybe we can add a trsh_counter dict to Species? |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #865 +/- ##
==========================================
+ Coverage 60.10% 60.20% +0.10%
==========================================
Files 102 102
Lines 31041 31052 +11
Branches 8082 8084 +2
==========================================
+ Hits 18657 18696 +39
+ Misses 10071 10033 -38
- Partials 2313 2323 +10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
I guess so. The thing is, most of our trsh methods are safe - they have limits. like rotor scans max 4 in the settings. TS guesses have a flag I implemented a year ago or more where it tries all relevant methods and then does 'all_attempted' to indicate exhaustion. Orca Mem has a guard if 'memory' is not in the ess_trsh_methods. and molpro has an elif chain of guards. The issue now gaussian memory - no 'memory' not in ess trsh method guard. can keep doubling until hiting 95% node memory. I think that was for ATLAS? and then general ESS trsh where there is no global max attempt counter - so it relies entirely on ess_trsh_methods list eventually mathcing attempted_ess_trsh_methods. So, I am not so sure having an ess trsh counter per method is really relevant here. |
|
Ok
Further investigating, I am hesitant about per trsh method cause then they means we need to set a limit for a fair few methods and the also allow the user to change it (could be overwhelming), and I know for ORCA memory - I misspoke. It doesn't have a guard cause I know we sometimes have to troubleshoot the mem multiple times as ORCA can keep saying 'okay allocate more' and then we do, then it complains for even more. |
557b52d to
46b213f
Compare
DLPNO methods are incompatible with monoatomic species. This change generalizes the previous hydrogen-specific check to all monoatomic species and automatically falls back to the canonical method by stripping the 'dlpno-' prefix. Added max ess trsh counter attempts
Generalize the H-atom-specific check to all monoatomic species when using DLPNO methods in Orca, as these methods are incompatible with single-atom systems that lack electron pairs to correlate. Added tests for trsh regard monoatomic
Added a counter now to how many times ARC will troubleshoot an ESS job. This is set in the settings.py - default is 25 times.
Correctly import and use the logging module to set the Paramiko log level in the SSHClient class, replacing an undefined logger reference.
46b213f to
dfd53e3
Compare
Bugs:
Guard 1 — Prevention (scheduler.py:1440): Case bug. 'DLPNO' in level.method but Level.init normalizes to lowercase. Dead code since the day it was written — never fired once.
Guard 2 — Troubleshooting (trsh.py:1070): Structurally unreachable. This one actually used lowercase 'dlpno' correctly, but it was an elif after the Memory branch. The error flow made it impossible to reach:
The DLPNO check at step 4 was behind elif, so it could never fire when Memory was in the keywords. Two bugs compounding — the first one prevents the problem, the second one should have caught it but couldn't due to the control flow.
Following on for why ARC did the trsh ad infinitum: