-
Notifications
You must be signed in to change notification settings - Fork 2
Update/buffer fill do not merge before buffer copy PR #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
sjw36
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor changes for the api.
We shouldn't disable tests at this point. Really need to be adding more tests.
plugins/cuda/cuda_runtime.cpp
Outdated
| } | ||
|
|
||
| // Add the 'const' to match the header | ||
| extern "C" nxs_status NXS_API_CALL nxsFillBuffer(nxs_int buffer_id, const void *fill_value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should pass a size of bytes of the fill_value. Assuming float seems arbitrary.
And we should probably have a special case for zero that just uses cudaMemset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it fine if we just use the memcpy instead of memset given that we will probably end up replacing these cuda calls with a shader kernel call?
d8a0daf to
d8f0182
Compare
d8f0182 to
8923626
Compare
| auto buf1 = dev0.createBuffer(size, vecB.data()); | ||
| auto buf2 = dev0.createBuffer(size, vecResult_GPU.data()); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace.
|
|
||
| // 3. The "Inefficient" but Reliable Method: | ||
| // Calculate how many floats we need to fill the allocated space | ||
| size_t num_elements = size / sizeof(int); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size should be the number of bytes of value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add a check for all zeros in value, then call cudaMemset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so not how many bytes of the buffer should be filled, but the size of the input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should fill the entire buffer.
| size_t num_elements = size / sizeof(int); | ||
|
|
||
| // Create a temporary host buffer and fill it using the CPU | ||
| std::vector<int> host_gold_standard(num_elements, (int)(intptr_t)value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vector should be bytes (char) and the size should be buffer->size(). You'll need another way to set the value, like std::fill for different size bytes (ie. 1, 2, 4).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you will need to handle the case of size not aligned with buffer->size().
No description provided.