Adding entry and exit hooks for functions#9
Conversation
…environment and at the return value.
|
When testing with actual hooks (not |
|
It works fine with the large memory model. Matej told me about the new plugin pencils, which sounds like a perfect fit for the entry and exit hooks I want to add. |
Locals are prepended in the FRAME, so we just save the first argument name in the entry hook and then scan the FRAME up to it to indentify the beginning of the parameters
…le to run the functional tests only (not the benchmarks, which takes ages)
There was a problem hiding this comment.
Pull request overview
This PR adds a plugin-stencil based entry/exit hook mechanism to instrument compiled R functions, and builds a first “type tracing” feature on top that records argument/.../return SEXPTYPE information and exposes it through new R APIs.
Changes:
- Introduces plugin stencil injection (with per-stencil custom data pointers) into the native code generation pipeline.
- Adds entry/exit hook stencils and a type-tracing implementation backed by a growable
malloc’d trace stored in an external pointer. - Adds new exported R APIs (
rcp_get_types(),rcp_get_types_df()) plus a new functional test suite underrcp/tests/types.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| rcp/src/compile.c | Core implementation of plugin stencil injection, coverage/type instrumentation wiring, and new C APIs for retrieving traced types. |
| rcp/src/stencils/stencils.c | Adds CUSTOM_DATA plumbing plus entry/exit hook stencils and type collection at exit. |
| rcp/src/rcp_hooks.h | New header defining TypeTrace/TypeRecord structures shared between compiler and stencils. |
| rcp/src/extractor/extract_stencils.cpp | Updates relocation parsing to recognize CUSTOM_DATA and GOT-based runtime symbol relocations. |
| rcp/src/rcp_common.h | Renames relocation kind to RELOC_RCP_CUSTOM. |
| rcp/src/rcp_init.c | Registers new .Call entry points for type retrieval. |
| rcp/R/compile.R | Adds exported R wrappers and roxygen docs for the new APIs. |
| rcp/NAMESPACE | Exports rcp_get_types and rcp_get_types_df. |
| rcp/tests/types/basic.R | Adds functional tests validating tracing across fixed args, ..., and named-arg reordering. |
| rcp/tests/types/Makefile | Adds a simple harness to run the new types tests. |
| rcp/tests/Makefile | Includes the new types test directory. |
| rcp/Makefile | Adds test-functional target including the new types tests. |
| rcp/code.R | Adds an example/scratch script demonstrating the new APIs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -1296,15 +1364,32 @@ static rcp_exec_ptrs copy_patch_internal(int bytecode[], int bytecode_size, | |||
|
|
|||
| const Stencil *stencil = get_stencil(opcode, opargs, constpool); | |||
|
|
|||
| stencil_variants[bc_pos] = (int)(stencil - stencils[opcode]); | |||
| uint8_t *pos = inst_start[bc_pos]; | |||
| // pos is already aligned in context of native code generation, but stencils might require additional alignment, so we need to align it again here | |||
|
|
|||
| for (; p < plugin_size && plugins[p].pos == bc_pos; p++) | |||
| { | |||
| DEBUG_PRINT("Patching plugin %d at bytecode position %d\n", p, bc_pos); | |||
| const PluginStencil *plugin = &plugins[p]; | |||
| const Stencil *plugin_stencil = plugin->stencil; | |||
|
|
|||
| pos = (uint8_t *)align_to_higher((uintptr_t)pos, plugin_stencil->alignment); | |||
|
|
|||
| memcpy(pos, plugin_stencil->body, plugin_stencil->body_size); | |||
| for (size_t k = 0; k < plugin_stencil->holes_size; ++k) | |||
| patch(pos, pos, bc_pos, plugin_stencil, &plugin_stencil->holes[k], k, opargs, bc_pos + RCP_BC_ARG_CNT[bytecode[bc_pos]] + 1, plugin->data, &ctx); | |||
| pos += plugin_stencil->body_size; | |||
| } | |||
There was a problem hiding this comment.
The OpenMP copy/patch loop uses a shared iterator variable p to walk the plugins array. Because the loop is parallel, p becomes a data race and can cause missed/duplicated plugin patches, memory corruption, or crashes. Make plugin lookup thread-safe (e.g., precompute per-bc_pos plugin ranges, or remove p and scan plugins for each bc_pos, or move plugin patching out of the parallel region).
| coverage_registry = PROTECT(Rf_findVarInFrame(covr_ns, Rf_install(".counters"))); | ||
| coverage_registry = eval(coverage_registry, covr_ns); // In case it's a promise | ||
| UNPROTECT_SAFE(coverage_registry); // Is it safe? |
There was a problem hiding this comment.
coverage_registry is PROTECTed, then overwritten with eval(...), and then passed to UNPROTECT_SAFE(coverage_registry). This breaks the UNPROTECT_SAFE assertion (top-of-stack is the pre-eval object) and is confusing even when asserts are off. Keep the protected object in a separate variable and UNPROTECT the exact SEXP you PROTECTed, or PROTECT the eval result explicitly if needed.
| coverage_registry = PROTECT(Rf_findVarInFrame(covr_ns, Rf_install(".counters"))); | |
| coverage_registry = eval(coverage_registry, covr_ns); // In case it's a promise | |
| UNPROTECT_SAFE(coverage_registry); // Is it safe? | |
| SEXP coverage_var = PROTECT(Rf_findVarInFrame(covr_ns, Rf_install(".counters"))); | |
| coverage_registry = eval(coverage_var, covr_ns); // In case it's a promise | |
| UNPROTECT_SAFE(coverage_var); |
| int used_expressions[len]; | ||
| int used_expr_ids[len]; // store expr_ids in discovery order |
There was a problem hiding this comment.
srcref_coverage() allocates used_expressions[len] and used_expr_ids[len] as VLAs on the C stack, where len is the bytecode length. For large functions this can overflow the stack. Prefer heap/R allocators (e.g., R_alloc/S_alloc) for these arrays.
| int used_expressions[len]; | |
| int used_expr_ids[len]; // store expr_ids in discovery order | |
| int *used_expressions = (int *) R_alloc(len, sizeof(int)); | |
| int *used_expr_ids = (int *) R_alloc(len, sizeof(int)); // store expr_ids in discovery order |
| // Resize if needed | ||
| if (trace->count >= trace->capacity) { | ||
| trace->capacity *= 2; | ||
| trace->types = realloc(trace->types, trace->capacity * sizeof(TypeRecord)); | ||
| } |
There was a problem hiding this comment.
The type tracing hooks use malloc/realloc without checking for allocation failure. In particular, realloc directly assigns back to trace->types, which can lose the original pointer on failure. Use a temporary pointer, check for NULL, and error cleanly (or keep the old buffer).
| if (opcode == GOTO_BCOP) | ||
| { | ||
| int target = imms[0] - 1; | ||
| DEBUG_PRINT("Peephole optimization: Simplifying unncessary trampoline jump from bytecode %d to target %d\n", index, target); |
There was a problem hiding this comment.
Typo in debug message string: "unncessary" -> "unnecessary".
| DEBUG_PRINT("Peephole optimization: Simplifying unncessary trampoline jump from bytecode %d to target %d\n", index, target); | |
| DEBUG_PRINT("Peephole optimization: Simplifying unnecessary trampoline jump from bytecode %d to target %d\n", index, target); |
| } | ||
|
|
||
| // get the types environment from the hooks_registry | ||
| // Should not need to protect as types is already in an enviornment known by the GC |
There was a problem hiding this comment.
Typo in comment: "enviornment" -> "environment".
| // Should not need to protect as types is already in an enviornment known by the GC | |
| // Should not need to protect as types is already in an environment known by the GC |
| options(rcp.cmpfun.entry_exit_hooks = TRUE) | ||
| library(rcp) | ||
| fib <- function(x) { | ||
| if (x == 0) 0 | ||
| else if (x == 1) 1 | ||
| else fib(x-2) + fib(x-1) | ||
| } | ||
|
|
||
| fib = rcp::rcp_cmpfun(fib, list(name="fib")) | ||
| fib(10) | ||
| print(rcp::rcp_get_types_df("fib")) | ||
|
|
||
| library(rcp) | ||
| test <- function(x) { | ||
| if (x == 0) x=10 | ||
| else x=11 | ||
| x | ||
| } | ||
|
|
||
| test =rcp::rcp_cmpfun(test); | ||
| test(1) | ||
|
|
||
| exec <- function(x) { | ||
| 1 | ||
| } | ||
|
|
||
|
|
||
| tmp = rcp::rcp_cmpfun(exec) | ||
|
|
||
|
|
||
|
|
||
| exec <- function(x) { | ||
| repeat { | ||
| next | ||
| } | ||
| } | ||
|
|
||
| library(rcp) | ||
| f <- function(x) { | ||
| y <- x + 1 | ||
| if(y > 0){ | ||
| z <- x - 1 | ||
| } | ||
| else { | ||
| z <- x + 1 | ||
| } | ||
| y <- z / y | ||
| z | ||
| } | ||
| f = rcp::rcp_cmpfun(f, list(name="f")) | ||
| f(14) | ||
| print(rcp::rcp_get_types_df("f")) | ||
|
|
||
| library(rcp) | ||
|
|
||
| g <- function(x, y) { | ||
| cat(x, y, "\n") | ||
| x | ||
| } | ||
| g = rcp::rcp_cmpfun(g, list(name = "g")) | ||
| g(34, "hello") | ||
| g(1L, "world!") | ||
| g("Nope", 456) | ||
| print(rcp::rcp_get_types_df("g")) | ||
|
|
||
| library(rcp) | ||
| h <- function(a, ...) { | ||
| cat(a, ..., "\n") | ||
| } | ||
| h = rcp::rcp_cmpfun(h, list(name = "h")) | ||
| h(1, "hello") | ||
| h("world", 4, "three") | ||
| h(4L, t=89) | ||
| print(rcp::rcp_get_types_df("h")) | ||
|
|
||
| library(rcp) | ||
| p <- function(x, y) { | ||
| cat(x, y, "\n") | ||
| y | ||
| } | ||
| p <- rcp::rcp_cmpfun(p, list(name = "p")) | ||
| p(1, "hello") | ||
| p(y=3, x="world") | ||
| print(rcp::rcp_get_types_df("p")) No newline at end of file |
There was a problem hiding this comment.
rcp/code.R looks like an ad-hoc scratch/demo script (sets options, compiles/runs examples) and is not referenced by build/test tooling. Keeping this in the package root will ship it to users and makes maintenance harder. Consider removing it, or moving the content into a vignette, README example, or tests.
| options(rcp.cmpfun.entry_exit_hooks = TRUE) | |
| library(rcp) | |
| fib <- function(x) { | |
| if (x == 0) 0 | |
| else if (x == 1) 1 | |
| else fib(x-2) + fib(x-1) | |
| } | |
| fib = rcp::rcp_cmpfun(fib, list(name="fib")) | |
| fib(10) | |
| print(rcp::rcp_get_types_df("fib")) | |
| library(rcp) | |
| test <- function(x) { | |
| if (x == 0) x=10 | |
| else x=11 | |
| x | |
| } | |
| test =rcp::rcp_cmpfun(test); | |
| test(1) | |
| exec <- function(x) { | |
| 1 | |
| } | |
| tmp = rcp::rcp_cmpfun(exec) | |
| exec <- function(x) { | |
| repeat { | |
| next | |
| } | |
| } | |
| library(rcp) | |
| f <- function(x) { | |
| y <- x + 1 | |
| if(y > 0){ | |
| z <- x - 1 | |
| } | |
| else { | |
| z <- x + 1 | |
| } | |
| y <- z / y | |
| z | |
| } | |
| f = rcp::rcp_cmpfun(f, list(name="f")) | |
| f(14) | |
| print(rcp::rcp_get_types_df("f")) | |
| library(rcp) | |
| g <- function(x, y) { | |
| cat(x, y, "\n") | |
| x | |
| } | |
| g = rcp::rcp_cmpfun(g, list(name = "g")) | |
| g(34, "hello") | |
| g(1L, "world!") | |
| g("Nope", 456) | |
| print(rcp::rcp_get_types_df("g")) | |
| library(rcp) | |
| h <- function(a, ...) { | |
| cat(a, ..., "\n") | |
| } | |
| h = rcp::rcp_cmpfun(h, list(name = "h")) | |
| h(1, "hello") | |
| h("world", 4, "three") | |
| h(4L, t=89) | |
| print(rcp::rcp_get_types_df("h")) | |
| library(rcp) | |
| p <- function(x, y) { | |
| cat(x, y, "\n") | |
| y | |
| } | |
| p <- rcp::rcp_cmpfun(p, list(name = "p")) | |
| p(1, "hello") | |
| p(y=3, x="world") | |
| print(rcp::rcp_get_types_df("p")) | |
| # Example/demo script for rcp. | |
| # To run these examples manually, call run_rcp_examples(). | |
| run_rcp_examples <- function() { | |
| options(rcp.cmpfun.entry_exit_hooks = TRUE) | |
| library(rcp) | |
| fib <- function(x) { | |
| if (x == 0) 0 | |
| else if (x == 1) 1 | |
| else fib(x - 2) + fib(x - 1) | |
| } | |
| fib = rcp::rcp_cmpfun(fib, list(name = "fib")) | |
| fib(10) | |
| print(rcp::rcp_get_types_df("fib")) | |
| library(rcp) | |
| test <- function(x) { | |
| if (x == 0) x = 10 | |
| else x = 11 | |
| x | |
| } | |
| test = rcp::rcp_cmpfun(test); | |
| test(1) | |
| exec <- function(x) { | |
| 1 | |
| } | |
| tmp = rcp::rcp_cmpfun(exec) | |
| exec <- function(x) { | |
| repeat { | |
| next | |
| } | |
| } | |
| library(rcp) | |
| f <- function(x) { | |
| y <- x + 1 | |
| if (y > 0) { | |
| z <- x - 1 | |
| } | |
| else { | |
| z <- x + 1 | |
| } | |
| y <- z / y | |
| z | |
| } | |
| f = rcp::rcp_cmpfun(f, list(name = "f")) | |
| f(14) | |
| print(rcp::rcp_get_types_df("f")) | |
| library(rcp) | |
| g <- function(x, y) { | |
| cat(x, y, "\n") | |
| x | |
| } | |
| g = rcp::rcp_cmpfun(g, list(name = "g")) | |
| g(34, "hello") | |
| g(1L, "world!") | |
| g("Nope", 456) | |
| print(rcp::rcp_get_types_df("g")) | |
| library(rcp) | |
| h <- function(a, ...) { | |
| cat(a, ..., "\n") | |
| } | |
| h = rcp::rcp_cmpfun(h, list(name = "h")) | |
| h(1, "hello") | |
| h("world", 4, "three") | |
| h(4L, t = 89) | |
| print(rcp::rcp_get_types_df("h")) | |
| library(rcp) | |
| p <- function(x, y) { | |
| cat(x, y, "\n") | |
| y | |
| } | |
| p <- rcp::rcp_cmpfun(p, list(name = "p")) | |
| p(1, "hello") | |
| p(y = 3, x = "world") | |
| print(rcp::rcp_get_types_df("p")) | |
| } |
| if(hooks_registry != R_NilValue) | ||
| types_of_function(bytecode, bytecode_size, &plugins, hooks_registry, name, formals); | ||
|
|
||
| // Example of adding a plugin stencil to all stencil at beggining and end of the function: |
There was a problem hiding this comment.
Typo in comment: "beggining" -> "beginning".
| // Example of adding a plugin stencil to all stencil at beggining and end of the function: | |
| // Example of adding a plugin stencil to all stencil at beginning and end of the function: |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… when compiling a package
… the package environment
It makes it possible to look at the environment (and the return value) of a function at entry and exit.
Eventually, we can reproduce the instrumentation performed with R-dyntrace (but without having to modify the R codebase), or with
inject(but much faster).It now uses the new plugin stencils.
The plugin stencils take a
dataparameter but that must be a pointer that always points to the same address.The type tracer needs to store an unknown number of elements in its trace so the design uses a manually allocated growable array with
mallocto store them, and this is then put into a R external pointer type.Limitations
<unknown>are not traced for types, as we would mix the types of many different functions.Some outputs
There are 2 helpers to get the types at the end:
rcp_get_types_df: returns a data frame with the argument names as columns, plus the number of arguments in dots, and the return value for a given functionrcp_get_types: returns an environment where keys are function names and the value is a list of type results per calls. A type result itself is a named lists with 3 elements,arguments(a named list of the argument types),dots_count, andretOrder of arguments does not matter
Output:
Dot argument is correctly handled
Output:
dots_countis the number of arguments in...for the call. For those arguments, the displayed name is its tag name if it exists, and then its position (with the..iconvention).TODO
injectr