Test Results: 777 tests ±0, 765 ✅ -1, 11m 17s ⏱️ +44s. For more details on these failures, see this check. Results for commit 54080a3. ± Comparison against base commit 047cb84. ♻️ This comment has been updated with latest results.
    // TODO: turn these two into debug level
    log.info("findBestFitFor: TE {} excluded - not in stateMap", teHolder.getId());
    return false;
}
if (currentBestFit.contains(teHolder.getId())) {
    log.info("findBestFitFor: TE {} excluded - already in bestFit", teHolder.getId());
These logs will be too chatty on the main agent pool, since they fire on every schedule request. Maybe convert them to metrics and use the TE id as a tag instead.
Turn these into warn level for now.
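A minimal sketch of the reviewer's metrics suggestion, assuming a counter per exclusion reason tagged with the TE id. `ExclusionMetrics`, `recordExclusion`, `count`, and the reason strings are hypothetical names. The in-memory map only stands in for a real metrics registry such as Spectator or Micrometer, which would accept the tags directly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a metrics registry: counts TE exclusions by (reason, teId)
// tags instead of writing one log line per excluded TE on every schedule request.
final class ExclusionMetrics {
    // The key combines both tag values; a real registry would take the tags as separate fields.
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Called where the PR currently logs, e.g. recordExclusion("notInStateMap", teId).
    void recordExclusion(String reason, String teId) {
        counters.computeIfAbsent(key(reason, teId), k -> new LongAdder()).increment();
    }

    // Returns how many times this TE was excluded for this reason (0 if never).
    long count(String reason, String teId) {
        LongAdder adder = counters.get(key(reason, teId));
        return adder == null ? 0L : adder.sum();
    }

    private static String key(String reason, String teId) {
        return "reason=" + reason + ",teId=" + teId;
    }
}
```

One design consideration: tagging by TE id creates one series per executor and reason. On a large agent pool that cardinality may matter, so a reason-only tag could be the cheaper option.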
...-plane-server/src/main/java/io/mantisrx/master/resourcecluster/ExecutorStateManagerImpl.java
Force-pushed from b252107 to 54080a3.
        }
        return true;
    })
    .collect(Collectors.toList());
By the way, this function collects twice, which is a performance penalty.
We can revert it after we diagnose the issue.
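A minimal sketch of how the double collect could be folded back into one pass. It assumes the function filters the candidates into an intermediate list and then streams over that list again. The string ids, `SinglePassFilter`, and both method names are hypothetical. The two checks mirror the "not in stateMap" and "already in bestFit" exclusions in this PR.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class SinglePassFilter {
    // Two-pass shape: materializes an intermediate list, then streams over it again.
    static List<String> twoPass(List<String> candidates, Set<String> stateMapIds, Set<String> bestFit) {
        List<String> inStateMap = candidates.stream()
                .filter(stateMapIds::contains)
                .collect(Collectors.toList());
        return inStateMap.stream()
                .filter(id -> !bestFit.contains(id))
                .collect(Collectors.toList());
    }

    // Single-pass shape: both checks run in one filter, so only the final list is allocated.
    static List<String> singlePass(List<String> candidates, Set<String> stateMapIds, Set<String> bestFit) {
        Predicate<String> eligible = id -> stateMapIds.contains(id) && !bestFit.contains(id);
        return candidates.stream()
                .filter(eligible)
                .collect(Collectors.toList());
    }
}
```

Streams are lazy, so chaining several `filter` calls in one pipeline is still a single pass. The cost comes from the intermediate `collect`, not from the number of filters.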
if (noResourcesAvailable) {
    log.warn("Not all scheduling constraints had enough workers available to fulfill the request {}", request);
    log.warn("Not all scheduling constraints had enough workers for jobId={}, cluster={}",
I think you still need the workerId and schedulingConstraint info.
The worker id and constraint info are already logged in findTaskExecutorsFor before execution reaches this log line.
I can add the constraint to this log line again.
The worker id can't be output here because it is nested inside the array fields.
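A minimal sketch of adding the constraint back to the warning in a compact form, as offered above, instead of dumping the whole request. `ConstraintSummary`, `NoResourcesMessage`, and the `cpu=`/`memMB=` keys are hypothetical. `ConstraintSummary` is a trimmed-down stand-in with only two of the machine-definition fields. `String.format` stands in for the SLF4J `{}` placeholders used in the actual log call.

```java
import java.util.Locale;

// Hypothetical, trimmed-down view of a scheduling constraint; the real machine definition has more fields.
final class ConstraintSummary {
    private final double cpuCores;
    private final double memoryMB;

    ConstraintSummary(double cpuCores, double memoryMB) {
        this.cpuCores = cpuCores;
        this.memoryMB = memoryMB;
    }

    // Compact form for log lines; Locale.ROOT keeps '.' as the decimal separator.
    String compact() {
        return String.format(Locale.ROOT, "cpu=%.1f,memMB=%.1f", cpuCores, memoryMB);
    }
}

final class NoResourcesMessage {
    // Builds the warning text with jobId, cluster, and the compact constraint only.
    static String format(String jobId, String cluster, ConstraintSummary constraint) {
        return String.format(Locale.ROOT,
                "Not all scheduling constraints had enough workers for jobId=%s, cluster=%s, constraint=%s",
                jobId, cluster, constraint.compact());
    }
}
```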
Context
Currently the logs are far too long to read. They essentially just say that there aren't enough workers to fulfill a request for a single worker, and we don't need all of the machineDefinition=MachineDefinition{cpuCores=2.0, memoryMB=14336.0, networkMbps=700.0, diskMB=65536.0, numPorts=1 details in them.

Besides, there are no logs explaining why no TE can be found for the worker even though the scheduler sees 2 idle TEs. This PR adds logs that show why each TE was not selected for the worker.
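A minimal sketch of explaining why idle TEs were not selected without emitting one line per TE. It assumes the two exclusion reasons shown in the diff ("not in stateMap" and "already in bestFit") and collects them into a single summary that could be logged once per request. `ExclusionSummary`, `filterCandidates`, and the string TE ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ExclusionSummary {
    // reason -> TE ids excluded for that reason; TreeMap keeps the output order stable.
    private final Map<String, List<String>> byReason = new TreeMap<>();

    void exclude(String teId, String reason) {
        byReason.computeIfAbsent(reason, r -> new ArrayList<>()).add(teId);
    }

    int excludedCount() {
        return byReason.values().stream().mapToInt(List::size).sum();
    }

    // Applies the same two checks as the diff and records why each rejected TE was skipped.
    static List<String> filterCandidates(List<String> idleTeIds, Set<String> stateMapIds,
                                         Set<String> bestFit, ExclusionSummary summary) {
        List<String> selected = new ArrayList<>();
        for (String teId : idleTeIds) {
            if (!stateMapIds.contains(teId)) {
                summary.exclude(teId, "notInStateMap");
            } else if (bestFit.contains(teId)) {
                summary.exclude(teId, "alreadyInBestFit");
            } else {
                selected.add(teId);
            }
        }
        return selected;
    }

    // One-line form for a single log statement, e.g. {alreadyInBestFit=[te-2], notInStateMap=[te-1]}
    @Override
    public String toString() {
        return byReason.toString();
    }
}
```

Logging `summary` once after the loop keeps the per-TE detail while producing a single line per schedule request.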
Checklist
- ./gradlew build compiles code correctly
- ./gradlew test passes all tests