Skip to content

Adding entry and exit hooks for functions#9

Draft
programLyrique wants to merge 27 commits intomainfrom
entry-exit-hooks
Draft

Adding entry and exit hooks for functions#9
programLyrique wants to merge 27 commits intomainfrom
entry-exit-hooks

Conversation

@programLyrique
Copy link
Collaborator

@programLyrique programLyrique commented Feb 19, 2026

It makes it possible to look at the environment (and the return value) of a function at entry and exit.
Eventually, we can reproduce the instrumentation performed with R-dyntrace (but without having to modify the R codebase), or with inject (but much faster).

It now uses the new plugin stencils.

The plugin stencils take a data parameter but that must be a pointer that always points to the same address.
The type tracer needs to store an unknown number of elements in its trace so the design uses a manually allocated growable array with malloc to store them, and this is then put into a R external pointer type.

Limitations

  • The type tracing is only performed at the exit hook, as more of the arguments will have been forced by then. It means that we might incorrectly find out the type of an argument if this one is reassigned to another type in the body of the function. It would be possible to peek at the promises at entry and then check if the type changes but it becomes more involved.
  • Function whose name is <unknown> are not traced for types, as we would mix the types of many different functions.

Some outputs

There are 2 helpers to get the types at the end:

  • rcp_get_types_df : returns a data frame with the argument names as columns, plus the number of arguments in dots, and the return value for a given function
  • rcp_get_types: returns an environment where keys are function names and the value is a list of type results per calls. A type result itself is a named lists with 3 elements, arguments (a named list of the argument types), dots_count, and ret

Order of arguments does not matter

p <- function(x, y) {
  cat(x, y, "\n")
  y
}
p <- rcp::rcp_cmpfun(p, list(name = "p"))
p(1, "hello")
p(y=3, x="world")
print(rcp::rcp_get_types_df("p"))

Output:

          x         y dots_count       ret
1    double character          0 character
2 character    double          0    double

Dot argument is correctly handled

library(rcp)
h <- function(a, ...) {
  cat(a, ..., "\n")
}
h = rcp::rcp_cmpfun(h, list(name = "h"))
h(1, "hello")
h("world", 4, "three")
h(4L, t=89)
print(rcp::rcp_get_types_df("h"))

Output:

          a       ..1       ..2   t..1 dots_count  ret
1    double character      <NA>   <NA>          1 NULL
2 character    double character   <NA>          2 NULL
3   integer      <NA>      <NA> double          1 NULL

dots_count is the number of arguments in ... for the call. For those arguments, the displayed name is its tag name if it exists, and then its position (with the ..i convention).

TODO

  • basic entry and exit hooks
  • support collecting types
    • normal parameters
    • dots parameters (expand them to get the actual parameters inside?)
  • test it
  • benchmark it: with and without hooks, vs R-dyntrace, vs injectr

@programLyrique programLyrique self-assigned this Feb 19, 2026
@programLyrique programLyrique added the enhancement New feature or request label Feb 19, 2026
@programLyrique
Copy link
Collaborator Author

When testing with actual hooks (not NULL), it does not work because the function pointer for _RCP_INIT_HOOK does not fit into a patch hole.

@programLyrique
Copy link
Collaborator Author

It works fine with the large memory model.

Matej told me about the new plugin pencils, which sounds like a perfect fit for the entry and exit hooks I want to add.
They should get a way to add them before a particular type of bytecode, e.g. RETURN and RETURNJMP instead of only by position, to make it even easier.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a plugin-stencil based entry/exit hook mechanism to instrument compiled R functions, and builds a first “type tracing” feature on top that records argument/.../return SEXPTYPE information and exposes it through new R APIs.

Changes:

  • Introduces plugin stencil injection (with per-stencil custom data pointers) into the native code generation pipeline.
  • Adds entry/exit hook stencils and a type-tracing implementation backed by a growable malloc’d trace stored in an external pointer.
  • Adds new exported R APIs (rcp_get_types(), rcp_get_types_df()) plus a new functional test suite under rcp/tests/types.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
rcp/src/compile.c Core implementation of plugin stencil injection, coverage/type instrumentation wiring, and new C APIs for retrieving traced types.
rcp/src/stencils/stencils.c Adds CUSTOM_DATA plumbing plus entry/exit hook stencils and type collection at exit.
rcp/src/rcp_hooks.h New header defining TypeTrace/TypeRecord structures shared between compiler and stencils.
rcp/src/extractor/extract_stencils.cpp Updates relocation parsing to recognize CUSTOM_DATA and GOT-based runtime symbol relocations.
rcp/src/rcp_common.h Renames relocation kind to RELOC_RCP_CUSTOM.
rcp/src/rcp_init.c Registers new .Call entry points for type retrieval.
rcp/R/compile.R Adds exported R wrappers and roxygen docs for the new APIs.
rcp/NAMESPACE Exports rcp_get_types and rcp_get_types_df.
rcp/tests/types/basic.R Adds functional tests validating tracing across fixed args, ..., and named-arg reordering.
rcp/tests/types/Makefile Adds a simple harness to run the new types tests.
rcp/tests/Makefile Includes the new types test directory.
rcp/Makefile Adds test-functional target including the new types tests.
rcp/code.R Adds an example/scratch script demonstrating the new APIs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 1281 to 1382
@@ -1296,15 +1364,32 @@ static rcp_exec_ptrs copy_patch_internal(int bytecode[], int bytecode_size,

const Stencil *stencil = get_stencil(opcode, opargs, constpool);

stencil_variants[bc_pos] = (int)(stencil - stencils[opcode]);
uint8_t *pos = inst_start[bc_pos];
// pos is already aligned in context of native code generation, but stencils might require additional alignment, so we need to align it again here

for (; p < plugin_size && plugins[p].pos == bc_pos; p++)
{
DEBUG_PRINT("Patching plugin %d at bytecode position %d\n", p, bc_pos);
const PluginStencil *plugin = &plugins[p];
const Stencil *plugin_stencil = plugin->stencil;

pos = (uint8_t *)align_to_higher((uintptr_t)pos, plugin_stencil->alignment);

memcpy(pos, plugin_stencil->body, plugin_stencil->body_size);
for (size_t k = 0; k < plugin_stencil->holes_size; ++k)
patch(pos, pos, bc_pos, plugin_stencil, &plugin_stencil->holes[k], k, opargs, bc_pos + RCP_BC_ARG_CNT[bytecode[bc_pos]] + 1, plugin->data, &ctx);
pos += plugin_stencil->body_size;
}
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OpenMP copy/patch loop uses a shared iterator variable p to walk the plugins array. Because the loop is parallel, p becomes a data race and can cause missed/duplicated plugin patches, memory corruption, or crashes. Make plugin lookup thread-safe (e.g., precompute per-bc_pos plugin ranges, or remove p and scan plugins for each bc_pos, or move plugin patching out of the parallel region).

Copilot uses AI. Check for mistakes.
Comment on lines +2012 to +2014
coverage_registry = PROTECT(Rf_findVarInFrame(covr_ns, Rf_install(".counters")));
coverage_registry = eval(coverage_registry, covr_ns); // In case it's a promise
UNPROTECT_SAFE(coverage_registry); // Is it safe?
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coverage_registry is PROTECTed, then overwritten with eval(...), and then passed to UNPROTECT_SAFE(coverage_registry). This breaks the UNPROTECT_SAFE assertion (top-of-stack is the pre-eval object) and is confusing even when asserts are off. Keep the protected object in a separate variable and UNPROTECT the exact SEXP you PROTECTed, or PROTECT the eval result explicitly if needed.

Suggested change
coverage_registry = PROTECT(Rf_findVarInFrame(covr_ns, Rf_install(".counters")));
coverage_registry = eval(coverage_registry, covr_ns); // In case it's a promise
UNPROTECT_SAFE(coverage_registry); // Is it safe?
SEXP coverage_var = PROTECT(Rf_findVarInFrame(covr_ns, Rf_install(".counters")));
coverage_registry = eval(coverage_var, covr_ns); // In case it's a promise
UNPROTECT_SAFE(coverage_var);

Copilot uses AI. Check for mistakes.
Comment on lines +1734 to +1735
int used_expressions[len];
int used_expr_ids[len]; // store expr_ids in discovery order
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

srcref_coverage() allocates used_expressions[len] and used_expr_ids[len] as VLAs on the C stack, where len is the bytecode length. For large functions this can overflow the stack. Prefer heap/R allocators (e.g., R_alloc/S_alloc) for these arrays.

Suggested change
int used_expressions[len];
int used_expr_ids[len]; // store expr_ids in discovery order
int *used_expressions = (int *) R_alloc(len, sizeof(int));
int *used_expr_ids = (int *) R_alloc(len, sizeof(int)); // store expr_ids in discovery order

Copilot uses AI. Check for mistakes.
Comment on lines +242 to +246
// Resize if needed
if (trace->count >= trace->capacity) {
trace->capacity *= 2;
trace->types = realloc(trace->types, trace->capacity * sizeof(TypeRecord));
}
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type tracing hooks use malloc/realloc without checking for allocation failure. In particular, realloc directly assigns back to trace->types, which can lose the original pointer on failure. Use a temporary pointer, check for NULL, and error cleanly (or keep the old buffer).

Copilot uses AI. Check for mistakes.
if (opcode == GOTO_BCOP)
{
int target = imms[0] - 1;
DEBUG_PRINT("Peephole optimization: Simplifying unncessary trampoline jump from bytecode %d to target %d\n", index, target);
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in debug message string: "unncessary" -> "unnecessary".

Suggested change
DEBUG_PRINT("Peephole optimization: Simplifying unncessary trampoline jump from bytecode %d to target %d\n", index, target);
DEBUG_PRINT("Peephole optimization: Simplifying unnecessary trampoline jump from bytecode %d to target %d\n", index, target);

Copilot uses AI. Check for mistakes.
}

// get the types environment from the hooks_registry
// Should not need to protect as types is already in an enviornment known by the GC
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "enviornment" -> "environment".

Suggested change
// Should not need to protect as types is already in an enviornment known by the GC
// Should not need to protect as types is already in an environment known by the GC

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +84
options(rcp.cmpfun.entry_exit_hooks = TRUE)
library(rcp)
fib <- function(x) {
if (x == 0) 0
else if (x == 1) 1
else fib(x-2) + fib(x-1)
}

fib = rcp::rcp_cmpfun(fib, list(name="fib"))
fib(10)
print(rcp::rcp_get_types_df("fib"))

library(rcp)
test <- function(x) {
if (x == 0) x=10
else x=11
x
}

test =rcp::rcp_cmpfun(test);
test(1)

exec <- function(x) {
1
}


tmp = rcp::rcp_cmpfun(exec)



exec <- function(x) {
repeat {
next
}
}

library(rcp)
f <- function(x) {
y <- x + 1
if(y > 0){
z <- x - 1
}
else {
z <- x + 1
}
y <- z / y
z
}
f = rcp::rcp_cmpfun(f, list(name="f"))
f(14)
print(rcp::rcp_get_types_df("f"))

library(rcp)

g <- function(x, y) {
cat(x, y, "\n")
x
}
g = rcp::rcp_cmpfun(g, list(name = "g"))
g(34, "hello")
g(1L, "world!")
g("Nope", 456)
print(rcp::rcp_get_types_df("g"))

library(rcp)
h <- function(a, ...) {
cat(a, ..., "\n")
}
h = rcp::rcp_cmpfun(h, list(name = "h"))
h(1, "hello")
h("world", 4, "three")
h(4L, t=89)
print(rcp::rcp_get_types_df("h"))

library(rcp)
p <- function(x, y) {
cat(x, y, "\n")
y
}
p <- rcp::rcp_cmpfun(p, list(name = "p"))
p(1, "hello")
p(y=3, x="world")
print(rcp::rcp_get_types_df("p")) No newline at end of file
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rcp/code.R looks like an ad-hoc scratch/demo script (sets options, compiles/runs examples) and is not referenced by build/test tooling. Keeping this in the package root will ship it to users and makes maintenance harder. Consider removing it, or moving the content into a vignette, README example, or tests.

Suggested change
options(rcp.cmpfun.entry_exit_hooks = TRUE)
library(rcp)
fib <- function(x) {
if (x == 0) 0
else if (x == 1) 1
else fib(x-2) + fib(x-1)
}
fib = rcp::rcp_cmpfun(fib, list(name="fib"))
fib(10)
print(rcp::rcp_get_types_df("fib"))
library(rcp)
test <- function(x) {
if (x == 0) x=10
else x=11
x
}
test =rcp::rcp_cmpfun(test);
test(1)
exec <- function(x) {
1
}
tmp = rcp::rcp_cmpfun(exec)
exec <- function(x) {
repeat {
next
}
}
library(rcp)
f <- function(x) {
y <- x + 1
if(y > 0){
z <- x - 1
}
else {
z <- x + 1
}
y <- z / y
z
}
f = rcp::rcp_cmpfun(f, list(name="f"))
f(14)
print(rcp::rcp_get_types_df("f"))
library(rcp)
g <- function(x, y) {
cat(x, y, "\n")
x
}
g = rcp::rcp_cmpfun(g, list(name = "g"))
g(34, "hello")
g(1L, "world!")
g("Nope", 456)
print(rcp::rcp_get_types_df("g"))
library(rcp)
h <- function(a, ...) {
cat(a, ..., "\n")
}
h = rcp::rcp_cmpfun(h, list(name = "h"))
h(1, "hello")
h("world", 4, "three")
h(4L, t=89)
print(rcp::rcp_get_types_df("h"))
library(rcp)
p <- function(x, y) {
cat(x, y, "\n")
y
}
p <- rcp::rcp_cmpfun(p, list(name = "p"))
p(1, "hello")
p(y=3, x="world")
print(rcp::rcp_get_types_df("p"))
# Example/demo script for rcp.
# To run these examples manually, call run_rcp_examples().
run_rcp_examples <- function() {
options(rcp.cmpfun.entry_exit_hooks = TRUE)
library(rcp)
fib <- function(x) {
if (x == 0) 0
else if (x == 1) 1
else fib(x - 2) + fib(x - 1)
}
fib = rcp::rcp_cmpfun(fib, list(name = "fib"))
fib(10)
print(rcp::rcp_get_types_df("fib"))
library(rcp)
test <- function(x) {
if (x == 0) x = 10
else x = 11
x
}
test = rcp::rcp_cmpfun(test);
test(1)
exec <- function(x) {
1
}
tmp = rcp::rcp_cmpfun(exec)
exec <- function(x) {
repeat {
next
}
}
library(rcp)
f <- function(x) {
y <- x + 1
if (y > 0) {
z <- x - 1
}
else {
z <- x + 1
}
y <- z / y
z
}
f = rcp::rcp_cmpfun(f, list(name = "f"))
f(14)
print(rcp::rcp_get_types_df("f"))
library(rcp)
g <- function(x, y) {
cat(x, y, "\n")
x
}
g = rcp::rcp_cmpfun(g, list(name = "g"))
g(34, "hello")
g(1L, "world!")
g("Nope", 456)
print(rcp::rcp_get_types_df("g"))
library(rcp)
h <- function(a, ...) {
cat(a, ..., "\n")
}
h = rcp::rcp_cmpfun(h, list(name = "h"))
h(1, "hello")
h("world", 4, "three")
h(4L, t = 89)
print(rcp::rcp_get_types_df("h"))
library(rcp)
p <- function(x, y) {
cat(x, y, "\n")
y
}
p <- rcp::rcp_cmpfun(p, list(name = "p"))
p(1, "hello")
p(y = 3, x = "world")
print(rcp::rcp_get_types_df("p"))
}

Copilot uses AI. Check for mistakes.
if(hooks_registry != R_NilValue)
types_of_function(bytecode, bytecode_size, &plugins, hooks_registry, name, formals);

// Example of adding a plugin stencil to all stencil at beggining and end of the function:
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "beggining" -> "beginning".

Suggested change
// Example of adding a plugin stencil to all stencil at beggining and end of the function:
// Example of adding a plugin stencil to all stencil at beginning and end of the function:

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants