
Conversation

@m8pple
Contributor

@m8pple m8pple commented Jun 14, 2021

This is related to #235.
Main changes are:

  • Add CommonBase::HaveIdleWork
  • Modify CommonBase::MPISpinner to check
    whether there is idle work, and if not back off
    down to spinning at 10Hz.
  • Override HaveIdleWork in various CommonBase
    derivatives to indicate whether they have idle work.

Tested locally, though not while running on hardware. The maximum backoff corresponds to
spinning at 10Hz, though the floor could probably be raised to 100Hz or even 1kHz and still
give most of the benefit. The backoff schedule could also be adjusted.
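
For reference, the overall shape of the modified spinner loop is roughly as below. This is a
standalone sketch rather than the actual patch: HaveIdleWork/OnIdle are stubbed out here, and
the 100ms cap is just the 10Hz floor written out as a constant.

// Standalone sketch of the idle-backoff idea (not the actual Orchestrator code).
#include <mpi.h>
#include <unistd.h>
#include <algorithm>

static bool HaveIdleWork() { return false; }  // Placeholder stub
static void OnIdle() {}                       // Placeholder stub

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    const int MAX_PROBE_IDLE_BACKOFF_US = 100000;  // 100ms, i.e. the 10Hz floor
    int probe_idle_backoff_us = 0;
    bool finished = false;                         // Set when an exit message arrives
    while (!finished)
    {
        int flag = 0;
        MPI_Status status;
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
        if (flag)
        {
            probe_idle_backoff_us = 0;   // Traffic: go back to spinning flat out.
            // ... receive and dispatch the message here ...
        }
        else if (HaveIdleWork())
        {
            OnIdle();                    // Background work still pending.
        }
        else
        {
            // Nothing to do: grow the sleep by 5/4 each time, up to the 10Hz floor.
            probe_idle_backoff_us = std::min<int>(MAX_PROBE_IDLE_BACKOFF_US,
                std::max<int>(10, (probe_idle_backoff_us * 5) / 4));
            usleep(probe_idle_backoff_us);
        }
    }
    MPI_Finalize();
    return 0;
}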

(cherry picked from commit 7ce6aef)

Contributor

@mvousden mvousden left a comment


Generally happy to see this. It will have very little impact on runtime in practice, because the Mothership still spins as fast as it can. Should reduce the rate at which electrons leave smaller computers.

I approve, but would like @heliosfa to take a look before anyone merges it.

if(HaveIdleWork()){
OnIdle(); // Guess
}else{
probe_idle_backoff_us=std::min<int>(MAX_PROBE_IDLE_BACKOFF_US, std::max<int>(10, (probe_idle_backoff_us*5)/4));
Contributor


I would prefer to see some documentation about this, explaining how probe_idle_backoff_us decreases and increases incrementally as a function of workload.

Small note on styling - we generally try to adhere to the same style as the neighbouring source when introducing modifications. In this case, lines in the Orchestrator source shouldn't span more than 79 characters, and we generally prefer camel case to underscores in variable names.

We do have a style guide, at https://github.com/POETSII/Orchestrator/wiki/Code-Style, but I don't want to dissuade you from contributing!

Contributor Author


I didn't really think about the schedule; it's just exponential backoff with a slow-ish
multiplier and a reset to zero on activity. I'm afraid no particular thought went into it,
so feel free to change it to better match the workload and so on.
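
For what it's worth, with the 5/4 multiplier, the 10us starting point and an assumed 100ms
(10Hz) cap, my own rough numbers say it takes 40-odd idle probes and roughly half a second
of cumulative sleep before the loop settles at the floor:

// Quick standalone illustration (my own numbers, not part of the patch) of how
// the delay grows on consecutive idle probes.
#include <algorithm>
#include <cstdio>

int main()
{
    const int MAX_PROBE_IDLE_BACKOFF_US = 100000;   // Assumed 100ms / 10Hz cap
    int delay = 0, steps = 0;
    long total = 0;
    while (delay < MAX_PROBE_IDLE_BACKOFF_US)
    {
        delay = std::min(MAX_PROBE_IDLE_BACKOFF_US,
                         std::max(10, (delay * 5) / 4));
        total += delay;
        ++steps;
    }
    // Prints something like "44 idle probes, ~0.6s cumulative sleep".
    printf("%d idle probes, ~%.2fs cumulative sleep\n", steps, total / 1e6);
    return 0;
}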

In terms of style, I generally try to match where possible, but it is a balance
between fixing things for my ultimate goal (getting DPD working) and trying
to package the fixes up into small patches that can be easily applied. So that
probably means more work to integrate the PRs, but I don't really expect them
to be directly applicable, as I don't really understand the Orchestrator.
So feel free to improve them however you want, and I'll just merge 1.0.0-alpha
back into my copy.

return false;
}
return true;
}
Contributor


return !pCmCall->IsEmpty()? ;p

@heliosfa heliosfa added the feature New feature label Jun 14, 2021
@heliosfa
Contributor

Some care needs to be taken in implementing this and we may even need to look at making it switchable.

@DrongoTheDog will need to confirm, but I believe the design intent here is that the MPI spinners spin as fast as possible to drain the MPI network as quickly as possible given that there could be large bursts of data that overwhelm the statically sized MPI buffers.

On a beefcake of a system this is not a huge issue, but I can definitely see issues on laptops/desktops with fewer cores or on battery.

@DrongoTheDog

DrongoTheDog commented Jun 14, 2021 via email

@heliosfa heliosfa self-requested a review June 15, 2021 12:11
Contributor

@heliosfa heliosfa left a comment


Looks like a good change set overall. A couple of queries (marked with a 🍠, as per MLV's convention).

I have also had a bash at updating the Orchestrator Docs [Docx|PDF] with this info (changes on pages 10 and 15). I would appreciate it if you could have a quick read and confirm that it describes the behaviour.

void Dump(unsigned = 0,FILE * = stdout);
unsigned OnExit(PMsg_p *);
virtual void OnIdle();
virtual bool HaveIdleWork(); // Return false if there is something we want to do in OnIdle
Contributor


Should this comment be // Return true if there is ... 🍠

Contributor Author


Yeah, comment is flipped. I think originally I had it as NoIdleWork, then changed the sense.

  if (flag==0) { // Nothing there....
-     OnIdle(); // Guess
+     if(HaveIdleWork()){
+       OnIdle(); // Guess
Contributor

@heliosfa heliosfa Jun 15, 2021


Just having a think about this - shouldn't the sleep period be reset if there is idle work and OnIdle() is called? 🍠

Currently, the period is incremented any time there is no idle work, but it is not reset if there is.

Contributor Author


My mental model was that the OnIdle was mostly draining work that arrived in messages, so it was something like:

  • Message arrives, causing work to queue up, resetting delay to zero
  • HaveIdleWork returns true until all the work is finished, with delay staying at zero
  • HaveIdleWork returns false (and presumably keeps returning false), so delay slowly increases

But if idle work can arrive from elsewhere (i.e. not from a message), then I guess you'd
want any true return from HaveIdleWork to reset the counter too.
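
Something like this would cover both cases - an untested fragment of the spinner's probe
handling, reusing the flag/probe_idle_backoff_us names from the diff:

// Untested fragment: reset the backoff whenever there is a message *or* idle
// work, so work arriving from outside a message also cancels the sleep.
if (flag != 0 || HaveIdleWork())
{
    probe_idle_backoff_us = 0;       // Activity of some kind: spin flat out again.
    if (flag == 0) OnIdle();         // No message waiting, but idle work to drain.
}
else
{
    // Nothing at all to do: keep growing the sleep up to the 10Hz floor.
    probe_idle_backoff_us = std::min<int>(MAX_PROBE_IDLE_BACKOFF_US,
        std::max<int>(10, (probe_idle_backoff_us * 5) / 4));
}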

@m8pple
Contributor Author

m8pple commented Jun 15, 2021

If a flood of messages arriving is a concern (which I could see), then presumably
the correct approach is to move to a blocking MPI receive once HaveIdleWork
returns false. Maybe the documentation for HaveIdleWork could be changed to:

/*! Return true if there is something we want to do in OnIdle. If HaveIdleWork
  returns false it is guaranteed it will never return true again unless a message
  is received. */
virtual bool           HaveIdleWork();

That would allow the implementation to switch from MPI_Iprobe to MPI_Probe,
which would give the absolute minimum latency for receiving messages (so avoiding
flooding), while also maximising the efficiency of the loop.
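
Roughly like this - an untested fragment just to illustrate the idea, with the actual
message handling elided:

// Untested illustration only: block in MPI_Probe once there is no idle work,
// so the process sleeps in the MPI library until the next message arrives.
int flag = 0;
MPI_Status status;
if (HaveIdleWork())
{
    // Still have background work: poll without blocking, as now.
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
    if (!flag) OnIdle();
}
else
{
    // Guaranteed nothing to do until a message arrives, so block with zero CPU
    // use and minimum wake-up latency.
    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    flag = 1;
}
if (flag)
{
    // ... receive and dispatch the message as before ...
}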

(Not saying this should be done - it's probably not worth the effort. Just reset the
delay to 0 if HaveIdleWork returns true, and you get almost all of the CPU
reduction benefit.)

@DrongoTheDog

DrongoTheDog commented Jun 15, 2021 via email

@m8pple
Contributor Author

m8pple commented Jun 15, 2021

I agree - I don't really think it is worth shifting to blocking or doing much else
beyond the current approach, but:

    It seems to me that the only driver for this is excessive CPU load and
    GMBs feet getting hot. OK, poke a 100mS delay in the loop and let's get
    on with our lives. Dynamic delay adjustment here is not worth worrying
    about.

I don't think this is a trivial issue that only applies to people running laptops;
it should be taken seriously. Running the stock 1.0.0-alpha Orchestrator on Ayres
(which has hardware), it immediately and permanently consumes 600% of CPU until it
exits. Ayres only has 6 cores and 12 hyper-threads (IIRC it is an i7), so the
Orchestrator by default consumes 100% of the true CPU capacity of the server -
any other useful task has to compete with the spinners.

Some of those threads will probably switch to doing real things during
various stages, but it's difficult to see how it's not going to make tasks
like compose slower if all your parallel g++ processes have to compete
with the spinners. It also means that you won't get a frequency boost
on the CPUs that are doing the important work because the spinners
are reducing the possible boost frequency. I guess at least they don't
pollute the cache.

So yes - I think that the hacky delay fix is fine, but it's not just a case of avoiding
laptops heating up or Graeme's feet getting warm. Spinning CPU cores wreck
performance in a multi-core system due to lots of second order hardware
and OS effects, and should be identified and removed.

@heliosfa heliosfa changed the base branch from 1.0.0-alpha to development June 16, 2021 15:22
@mvousden
Contributor

@heliosfa @m8pple what's blocking here?

@heliosfa heliosfa self-assigned this Jul 5, 2021