Skip to content

Conversation

@leonematt
Copy link
Member

No description provided.

@leonematt leonematt requested a review from sjw36 February 10, 2026 23:36
@leonematt leonematt changed the title Update/buffer fill Update/buffer fill do not merge before buffer copy PR Feb 10, 2026
Copy link
Collaborator

@sjw36 sjw36 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor changes for the api.

We shouldn't disable tests at this point. Really need to be adding more tests.

}

// Add the 'const' to match the header
extern "C" nxs_status NXS_API_CALL nxsFillBuffer(nxs_int buffer_id, const void *fill_value) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should pass a size of bytes of the fill_value. Assuming float seems arbitrary.

And we should probably have a special case for zero that just uses cudaMemset.

Copy link
Member Author

@leonematt leonematt Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine if we just use the memcpy instead of memset given that we will probably end up replacing these cuda calls with a shader kernel call?

@leonematt leonematt requested a review from sjw36 February 11, 2026 22:55
auto buf1 = dev0.createBuffer(size, vecB.data());
auto buf2 = dev0.createBuffer(size, vecResult_GPU.data());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace.


// 3. The "Inefficient" but Reliable Method:
// Calculate how many floats we need to fill the allocated space
size_t num_elements = size / sizeof(int);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size should be the number of bytes of value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add a check for all zeros in value, then call cudaMemset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so not how many bytes of the buffer should be filled, but the size of the input.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should fill the entire buffer.

size_t num_elements = size / sizeof(int);

// Create a temporary host buffer and fill it using the CPU
std::vector<int> host_gold_standard(num_elements, (int)(intptr_t)value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vector should be bytes (char) and the size should be buffer->size(). You'll need another way to set the value, like std::fill for different size bytes (ie. 1, 2, 4).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you will need to handle the case of size not aligned with buffer->size().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants