trigger interrupts after long vector operations #994
Conversation
The trap count currently roughly tracks the number of procedure calls. Since these copy operations are quick, would it make sense to divide the count by some power of 2, maybe 16, to avoid calling the interrupt too frequently?
Yes, that would make sense. The intent currently is to count in words (not just bytes), and I can make it shift down by an extra bit or two. |
For operations that are atomic from the perspective of interrupts but that may work on large objects, such as `vector-append`, adjust the trap counter in proportion to the work done. That way, interrupts are dispatched in a more timely manner, especially GC interrupts. The trap-counter adjustment is 1/4 the number of words involved in the operation. The change to "7.ms" is unrelated; wrapping that test with its smaller list size in a loop could provoke a failure before these changes.
647f89e to 1c9db68
It was definitely worth a closer look at how operation sizes turned into counter adjustments, because it wasn't consistent before. With the revised commit, the counter should decrement by 1/4 the number of words involved (which is 1/32 or 1/16 the number of bytes). That ratio can be adjusted, but I'm not sure of the right one. If the various loops were written in Scheme, it would be about 1/1 (or maybe even 8/1 for bytevector operations). So, I picked 1/4, reflecting that various shortcuts and unrollings by inlined operations likely provide a small-integer-factor improvement.
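As a rough illustration of that ratio (the helper name and exact rounding here are hypothetical, not the actual internals):

```scheme
;; Hypothetical helper (not the actual internal name or code) showing the
;; ratio: fuel used is 1/4 of the words touched, i.e. bytes/32 on a 64-bit
;; build (8-byte words) or bytes/16 on a 32-bit build (4-byte words).
(define (trap-fuel-for-copy bytes word-size)
  (fxquotient (fxquotient bytes word-size) 4))

(trap-fuel-for-copy (* 4 1024 1024) 8) ; => 131072 for a 4 MB copy on 64-bit
```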
burgerrg left a comment
I tried this change on arm64osx, a6osx, and a6le against a program I use in production, and I didn't notice any significant change in boot file size, heap size, CPU usage, or number and duration of garbage collections. Thanks for this helpful change!
This commit repairs a problem created by the previous commit (#994), and it further repairs some hashtable operations along the same lines as the vector operations in that commit. The implementation of `eq?`-based hashtables turns off interrupts, but that parameter setting had no effect on the checks now inserted for `make-vector`. An interrupt can go wrong with operations like `hashtable-keys` with weakly held keys, because keys could disappear before being transferred to a newly allocated vector.

Not inserting an interrupt trap in `make-vector` when `generate-interrupt-trap` is `#f` seems like the obvious approach, but that interacts badly with wrappers for `make-vector` and similar as defined in "prims.ss". Instead, this commit adds a new procedure, `$make-vector/no-interrupt-trap`. In general, functions that need to avoid interrupts must choose the functions they call carefully anyway, and `make-vector` has become one that cannot be called.

The repaired operations `$eq-hashtable-keys`, etc., now use `$make-vector/no-interrupt-trap`, but they also use `$use-trap-fuel` after the result is created. The `$eq-hashtable-copy` and `$eq-hashtable-clear` operations similarly need `$use-trap-fuel`; they previously did not use `make-vector` or another operation that would have inserted interrupt checks.
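A rough sketch of that pattern (the surrounding structure and the argument convention for `$use-trap-fuel` are assumptions here, not taken from the actual sources):

```scheme
;; Sketch only: a $eq-hashtable-keys-style operation that allocates its
;; result without an interrupt trap while interrupts are disabled, then
;; accounts for the work afterwards. The fuel argument is a guess.
(define (keys-vector-sketch n transfer-keys!)
  (let ([vec (#%$make-vector/no-interrupt-trap n)])
    ;; Transfer the weakly held keys into vec; no interrupt (and so no
    ;; GC) can fire mid-transfer, so keys cannot disappear.
    (transfer-keys! vec)
    (#%$use-trap-fuel n) ; charge trap fuel proportional to the work done
    vec))
```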
Here's a program that should not need too much memory at its peak. The program allocates a 4 MB string and then appends to it 1000 times, but each appended string is immediately discarded. Nevertheless, running the program on a 64-bit platform before this PR will hit peak memory use of 1.5 GB or so.
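The original program isn't reproduced here; the following is a minimal sketch along the lines described (details assumed):

```scheme
;; Allocate a 4 MB string, then append to it 1000 times; each appended
;; result is immediately discarded.
(define s (make-string (* 4 1024 1024) #\a))
(do ([i 0 (fx+ i 1)])
    ((fx= i 1000))
  (string-append s "!"))
```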
The problem is that the implementation of `string-append` performs each 4 MB allocation as an atomic-seeming kernel step, including a copy via `memcpy`, as opposed to a loop in Scheme where the trap register would be decremented every time around the loop. In other words, a large amount of work is done, but it is treated as effectively constant for the purposes of deciding when to fire interrupts, including GC interrupts. The `vector-append` operation does not use `memcpy`, but it uses a hand-coded loop that (before this PR) similarly did not adjust the trap register. Operations that don't allocate, such as `bytevector-fill!`, won't create GC trouble, but infrequent timer-interrupt checking can interfere with using timers/engines for coroutines.

So, for operations that are atomic from the perspective of interrupts but that may work on large objects, such as `vector-append`, the change here adjusts the trap counter in proportion to the work done. That way, interrupts are dispatched in a more timely manner, especially GC interrupts.

(The change to "7.ms" is unrelated. Wrapping that test with its smaller list size in a loop could provoke a failure before these changes.)
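For contrast, here's what the per-iteration accounting looks like when a copy is written as an ordinary Scheme loop (a sketch, not the library's actual code):

```scheme
;; Copying element by element in Scheme: the trap register is decremented
;; each time around the loop, so a pending interrupt (including a GC
;; interrupt) is noticed after a bounded amount of work.
(define (copy-string-into! dst src n)
  (do ([i 0 (fx+ i 1)])
      ((fx= i n) dst)
    (string-set! dst i (string-ref src i))))
```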
There should be a runtime cost, but it is small. The `string-append` function turns out to sometimes run a little faster on small strings, but that's because `memcpy` is now called via an `__atomic` foreign procedure. I've observed a slowdown as large as 10% for fast operations like `(#3%vector-set/copy '#(1) 0 1)` on x86_64, but the same example has a 0% difference on AArch64, and generally the differences are in the noise.

Unsafe list operations like `#3%length` or `#3%memq` have the same issue, but since only the unsafe versions of those functions are affected (the safe versions of `length` and `memq` are normal Scheme code), and since those operations tend not to have an overall length straightforwardly available (except in the case of a `#3%length` result), there's no attempt to adjust those here.

Setting aside unsafe list operations, I'm not sure this commit covers all relevant operations. The "4.ms" changes show the ones that I found.
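For reference, a micro-benchmark along the lines of the one mentioned above can be run like this (the iteration count is arbitrary, and results will vary by machine and build):

```scheme
;; Time repeated calls to the unsafe functional vector update; compare
;; the reported CPU time with and without this change.
(time
  (do ([i 0 (fx+ i 1)])
      ((fx= i 10000000))
    (#3%vector-set/copy '#(1) 0 1)))
```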