Adding entry and exit hooks for functions by programLyrique · Pull Request #9 · PRL-PRG/rcp

programLyrique · 2026-02-19T16:02:14Z

It makes it possible to inspect the environment (and the return value) of a function at entry and exit.
Eventually, this should let us reproduce the instrumentation performed by R-dyntrace (but without having to modify the R codebase) or by injectr (but much faster).

It now uses the new plugin stencils.

The plugin stencils take a data parameter, but it must be a pointer whose address never changes, since that address is patched into the generated code.
The type tracer needs to store an unknown number of elements in its trace. The design therefore uses a growable array, manually allocated with malloc, and stores it in an R external pointer.
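A minimal sketch of that design, for illustration only. The handle passed as plugin data is a heap-allocated struct that never moves; only its `types` buffer is reallocated as the trace grows.

The field names `types`, `count`, and `capacity` follow the stencils.c snippet quoted in the review below. The contents of `TypeRecord` and the functions `trace_new`, `trace_push`, and `trace_free` are hypothetical. In the package, the handle is wrapped in an R external pointer (`R_MakeExternalPtr`). A wrapper around `trace_free` could be registered as its finalizer with `R_RegisterCFinalizerEx`, but that is an assumption, not necessarily what the PR does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record layout; the real TypeRecord lives in rcp_hooks.h. */
typedef struct {
    int fun_id;   /* index of the traced function (illustrative) */
    int ret_type; /* SEXPTYPE code of the return value */
} TypeRecord;

/* The handle whose address is given to the plugin stencils. It is allocated
   once and never moves; only the `types` buffer it points to is reallocated. */
typedef struct {
    TypeRecord *types;
    size_t count;
    size_t capacity;
} TypeTrace;

static TypeTrace *trace_new(void) {
    TypeTrace *trace = malloc(sizeof *trace);
    if (trace == NULL)
        return NULL;
    trace->types = NULL;
    trace->count = 0;
    trace->capacity = 0;
    return trace;
}

/* Appends a record, doubling the buffer when it is full. The result of realloc
   goes into a temporary so the old buffer survives a failed allocation. */
static bool trace_push(TypeTrace *trace, TypeRecord rec) {
    assert(trace != NULL);
    if (trace->count >= trace->capacity) {
        size_t new_cap = trace->capacity ? trace->capacity * 2 : 16;
        if (new_cap < trace->capacity || new_cap > SIZE_MAX / sizeof(TypeRecord))
            return false; /* size computation would overflow */
        TypeRecord *tmp = realloc(trace->types, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return false; /* trace->types is still valid and unchanged */
        trace->types = tmp;
        trace->capacity = new_cap;
    }
    trace->types[trace->count++] = rec;
    return true;
}

static void trace_free(TypeTrace *trace) {
    if (trace != NULL) {
        free(trace->types);
        free(trace);
    }
}
```

Because the stencils only ever see the `TypeTrace *`, growing the buffer never invalidates the pointer baked into the generated code.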

Limitations

The type tracing is only performed in the exit hook, because more of the arguments will have been forced by then. As a consequence, we might record the wrong type for an argument that is reassigned to a value of a different type in the function body. It would be possible to peek at the promises at entry and then check whether the type changes, but that is more involved.
Functions whose name is <unknown> are not traced for types, as we would otherwise mix the types of many different functions.

Some outputs

There are two helpers to retrieve the types afterwards:

rcp_get_types_df: for a given function, returns a data frame with one row per call. The columns are the argument names, plus dots_count (the number of arguments passed in ...) and ret (the type of the return value).
rcp_get_types: returns an environment whose keys are function names and whose values are lists of type results, one per call. A type result is itself a named list with 3 elements: arguments (a named list of the argument types), dots_count, and ret.
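The type names in the outputs below (double, character, integer, NULL) are R's names for SEXPTYPE codes, the same strings typeof() returns. The sketch below shows that mapping for a few common types, assuming the tracer stores raw SEXPTYPE codes.

The numeric values mirror those defined in R's Rinternals.h. The enum and `sexptype_name` are hypothetical; package code would call `Rf_type2char` from the R API instead of a hand-written table.

```c
#include <assert.h>
#include <string.h>

/* SEXPTYPE codes, mirroring the values in R's Rinternals.h (subset). */
enum {
    NILSXP = 0, CLOSXP = 3, ENVSXP = 4, LGLSXP = 10, INTSXP = 13,
    REALSXP = 14, CPLXSXP = 15, STRSXP = 16, VECSXP = 19
};

/* Returns the typeof()-style name for a SEXPTYPE code, or "unknown". */
static const char *sexptype_name(int type) {
    switch (type) {
    case NILSXP:  return "NULL";
    case CLOSXP:  return "closure";
    case ENVSXP:  return "environment";
    case LGLSXP:  return "logical";
    case INTSXP:  return "integer";
    case REALSXP: return "double";
    case CPLXSXP: return "complex";
    case STRSXP:  return "character";
    case VECSXP:  return "list";
    default:      return "unknown";
    }
}
```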

Order of arguments does not matter

p <- function(x, y) {
  cat(x, y, "\n")
  y
}
p <- rcp::rcp_cmpfun(p, list(name = "p"))
p(1, "hello")
p(y=3, x="world")
print(rcp::rcp_get_types_df("p"))

Output:

          x         y dots_count       ret
1    double character          0 character
2 character    double          0    double

Dot argument is correctly handled

library(rcp)
h <- function(a, ...) {
  cat(a, ..., "\n")
}
h = rcp::rcp_cmpfun(h, list(name = "h"))
h(1, "hello")
h("world", 4, "three")
h(4L, t=89)
print(rcp::rcp_get_types_df("h"))

Output:

          a       ..1       ..2   t..1 dots_count  ret
1    double character      <NA>   <NA>          1 NULL
2 character    double character   <NA>          2 NULL
3   integer      <NA>      <NA> double          1 NULL

dots_count is the number of arguments passed in ... for that call. Each of those arguments gets its own column. The column name is the argument's tag, if it has one, followed by its position in ... using the ..i convention (e.g. ..1, or t..1 for t=89).
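A minimal sketch of that naming rule, reconstructed from the output above rather than taken from the PR's code. `dots_column_name` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Column name for the i-th (1-based) argument matched to `...`: the
   argument's tag if it has one, followed by "..i". */
static void dots_column_name(char *buf, size_t size, const char *tag, int i) {
    assert(i >= 1);
    snprintf(buf, size, "%s..%d", tag != NULL ? tag : "", i);
}
```

For h(4L, t=89), the single dots argument has tag t and position 1, giving t..1. Untagged arguments produce ..1, ..2, and so on.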

TODO

- basic entry and exit hooks
- support collecting types
  - normal parameters
  - dots parameters (expand them to get the actual parameters inside?)
- test it
- benchmark it: with and without hooks, vs R-dyntrace, vs injectr


programLyrique · 2026-02-19T16:17:24Z

When testing with actual hooks (not NULL), it does not work because the function pointer for _RCP_INIT_HOOK does not fit into a patch hole.

programLyrique · 2026-02-19T17:14:05Z

It works fine with the large memory model.
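A sketch of why the small memory model can fail here. It assumes the stencils target x86-64 and that the hole for an absolute address is a 32-bit field, as with the R_X86_64_32/R_X86_64_32S relocations the small code model permits for symbol addresses.

A function pointer into a shared library mapped high in the address space does not fit in such a hole. Under -mcmodel=large, the compiler instead materializes absolute addresses as 64-bit immediates. The `hole_kind` enum and `fits_in_hole` are hypothetical, not rcp's relocation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    HOLE_ABS32,  /* zero-extended 32-bit absolute, like R_X86_64_32 */
    HOLE_ABS32S, /* sign-extended 32-bit absolute, like R_X86_64_32S */
    HOLE_ABS64   /* 64-bit absolute, like R_X86_64_64 (large code model) */
} hole_kind;

/* Can the address `value` be stored in a hole of the given kind without
   losing bits? */
static bool fits_in_hole(uint64_t value, hole_kind kind) {
    switch (kind) {
    case HOLE_ABS32:
        return value <= UINT32_MAX;
    case HOLE_ABS32S: {
        int64_t s = (int64_t)value; /* two's complement reinterpretation */
        return s >= INT32_MIN && s <= INT32_MAX;
    }
    case HOLE_ABS64:
        return true;
    }
    return false;
}
```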

Matej told me about the new plugin stencils, which sound like a perfect fit for the entry and exit hooks I want to add.
They would be even easier to use if they could be inserted before a particular kind of bytecode instruction (e.g. RETURN and RETURNJMP), instead of only at a given position.
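A sketch of what inserting before a kind of instruction could look like. The code walks the bytecode instruction by instruction, skipping operands as the patch call quoted below does with `RCP_BC_ARG_CNT`. It collects the positions of RETURN and RETURNJMP, where exit plugins could then be attached.

The opcode numbers, the argument-count table, and `find_return_sites` are made up for the example. The real values come from R's bytecode definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode numbering and operand counts, for illustration only. */
enum { OP_NOP = 0, OP_RETURN = 1, OP_GOTO = 2, OP_LDCONST = 3, OP_RETURNJMP = 4, OP_COUNT };
static const int ARG_CNT[OP_COUNT] = { 0, 0, 1, 1, 0 };

/* Stores the positions of RETURN/RETURNJMP instructions in out (at most
   out_cap of them) and returns how many were found in total. */
static size_t find_return_sites(const int *bc, size_t size, size_t *out, size_t out_cap) {
    size_t n = 0;
    for (size_t pos = 0; pos < size; pos += (size_t)ARG_CNT[bc[pos]] + 1) {
        int op = bc[pos];
        assert(op >= 0 && op < OP_COUNT);
        if (op == OP_RETURN || op == OP_RETURNJMP) {
            if (n < out_cap)
                out[n] = pos;
            n++;
        }
    }
    return n;
}
```

Operands are never read as opcodes because each step advances past the current instruction's arguments.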


Locals are prepended to the FRAME. So we just save the first argument name in the entry hook, and in the exit hook we scan the FRAME up to it to identify the beginning of the parameters.
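A simplified model of that scan, for illustration. The frame is treated as a singly linked list of tagged bindings; in R it is a pairlist whose cells carry the variable symbol in their TAG.

When the call's environment is created, its frame holds the matched parameters. Each new local is then consed onto the front, so at exit the cell tagged with the first parameter marks where the parameters begin. `binding` and `first_param` are hypothetical. Real code would compare interned symbols by pointer rather than with strcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a frame cell: a tag plus a link to the next cell. */
typedef struct binding {
    const char *tag;
    struct binding *next;
} binding;

/* Returns the cell tagged with the first formal recorded at entry. Every cell
   before it was prepended later and is a local. NULL if not found. */
static const binding *first_param(const binding *frame, const char *first_formal) {
    for (const binding *b = frame; b != NULL; b = b->next)
        if (strcmp(b->tag, first_formal) == 0)
            return b;
    return NULL;
}
```

For f(x, w) that assigns locals y and then z, the frame at exit is z, y, x, w. The scan skips z and y and returns the cell for x, from which the parameters x and w follow.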


Copilot

Pull request overview

This PR adds a plugin-stencil based entry/exit hook mechanism to instrument compiled R functions, and builds a first “type tracing” feature on top that records argument/.../return SEXPTYPE information and exposes it through new R APIs.

Changes:

Introduces plugin stencil injection (with per-stencil custom data pointers) into the native code generation pipeline.
Adds entry/exit hook stencils and a type-tracing implementation backed by a growable malloc’d trace stored in an external pointer.
Adds new exported R APIs (rcp_get_types(), rcp_get_types_df()) plus a new functional test suite under rcp/tests/types.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 11 comments.


File	Description
rcp/src/compile.c	Core implementation of plugin stencil injection, coverage/type instrumentation wiring, and new C APIs for retrieving traced types.
rcp/src/stencils/stencils.c	Adds CUSTOM_DATA plumbing plus entry/exit hook stencils and type collection at exit.
rcp/src/rcp_hooks.h	New header defining `TypeTrace`/`TypeRecord` structures shared between compiler and stencils.
rcp/src/extractor/extract_stencils.cpp	Updates relocation parsing to recognize CUSTOM_DATA and GOT-based runtime symbol relocations.
rcp/src/rcp_common.h	Renames relocation kind to `RELOC_RCP_CUSTOM`.
rcp/src/rcp_init.c	Registers new `.Call` entry points for type retrieval.
rcp/R/compile.R	Adds exported R wrappers and roxygen docs for the new APIs.
rcp/NAMESPACE	Exports `rcp_get_types` and `rcp_get_types_df`.
rcp/tests/types/basic.R	Adds functional tests validating tracing across fixed args, `...`, and named-arg reordering.
rcp/tests/types/Makefile	Adds a simple harness to run the new types tests.
rcp/tests/Makefile	Includes the new `types` test directory.
rcp/Makefile	Adds `test-functional` target including the new types tests.
rcp/code.R	Adds an example/scratch script demonstrating the new APIs.


Copilot · 2026-02-24T17:26:06Z

rcp/src/compile.c

@@ -1296,15 +1364,32 @@ static rcp_exec_ptrs copy_patch_internal(int bytecode[], int bytecode_size,

 		const Stencil *stencil = get_stencil(opcode, opargs, constpool);

-		stencil_variants[bc_pos] = (int)(stencil - stencils[opcode]);
+		uint8_t *pos = inst_start[bc_pos];
+		// pos is already aligned in context of native code generation, but stencils might require additional alignment, so we need to align it again here
+
+		for (; p < plugin_size && plugins[p].pos == bc_pos; p++)
+		{
+			DEBUG_PRINT("Patching plugin %d at bytecode position %d\n", p, bc_pos);
+			const PluginStencil *plugin = &plugins[p];
+			const Stencil *plugin_stencil = plugin->stencil;
+
+			pos = (uint8_t *)align_to_higher((uintptr_t)pos, plugin_stencil->alignment);
+
+			memcpy(pos, plugin_stencil->body, plugin_stencil->body_size);
+			for (size_t k = 0; k < plugin_stencil->holes_size; ++k)
+				patch(pos, pos, bc_pos, plugin_stencil, &plugin_stencil->holes[k], k, opargs, bc_pos + RCP_BC_ARG_CNT[bytecode[bc_pos]] + 1, plugin->data, &ctx);
+			pos += plugin_stencil->body_size;
+		}


The OpenMP copy/patch loop uses a shared iterator variable p to walk the plugins array. Because the loop is parallel, p becomes a data race and can cause missed/duplicated plugin patches, memory corruption, or crashes. Make plugin lookup thread-safe (e.g., precompute per-bc_pos plugin ranges, or remove p and scan plugins for each bc_pos, or move plugin patching out of the parallel region).
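One way to apply the first suggestion, sketched under assumptions. As in the quoted loop, plugins are sorted by bytecode position. A counting pass plus a prefix sum then gives each bc_pos its own slice of the plugins array, so every parallel iteration uses a private index instead of the shared `p`.

`plugin_pos_t`, `plugin_ranges`, and `patch_all` are hypothetical stand-ins, not rcp code.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for PluginStencil: only the position matters here. */
typedef struct { int pos; } plugin_pos_t;

/* For plugins sorted by pos, returns start[0..bc_size] such that the plugins
   attached to position i are plugins[start[i]] .. plugins[start[i + 1] - 1].
   Caller frees. Returns NULL on allocation failure. */
static int *plugin_ranges(const plugin_pos_t *plugins, int plugin_size, int bc_size) {
    int *start = calloc((size_t)bc_size + 1, sizeof *start);
    if (start == NULL)
        return NULL;
    for (int p = 0; p < plugin_size; p++) {
        assert(plugins[p].pos >= 0 && plugins[p].pos < bc_size);
        start[plugins[p].pos + 1]++; /* count plugins per position */
    }
    for (int i = 0; i < bc_size; i++)
        start[i + 1] += start[i];    /* prefix sum turns counts into offsets */
    return start;
}

/* Each iteration only touches its own local index p, so the loop is safe to
   parallelize (the pragma is ignored when compiling without -fopenmp). */
static void patch_all(const int *start, int bc_size, int *patched_per_pos) {
    #pragma omp parallel for
    for (int bc_pos = 0; bc_pos < bc_size; bc_pos++) {
        int count = 0;
        for (int p = start[bc_pos]; p < start[bc_pos + 1]; p++)
            count++; /* real code would copy and patch plugins[p] here */
        patched_per_pos[bc_pos] = count;
    }
}
```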

Copilot · 2026-02-24T17:26:06Z

rcp/src/compile.c

+		coverage_registry = PROTECT(Rf_findVarInFrame(covr_ns, Rf_install(".counters")));
+		coverage_registry = eval(coverage_registry, covr_ns); // In case it's a promise
+		UNPROTECT_SAFE(coverage_registry);					  // Is it safe?


coverage_registry is PROTECTed, then overwritten with eval(...), and then passed to UNPROTECT_SAFE(coverage_registry). This breaks the UNPROTECT_SAFE assertion (top-of-stack is the pre-eval object) and is confusing even when asserts are off. Keep the protected object in a separate variable and UNPROTECT the exact SEXP you PROTECTed, or PROTECT the eval result explicitly if needed.

Suggested change

-coverage_registry = PROTECT(Rf_findVarInFrame(covr_ns, Rf_install(".counters")));
-coverage_registry = eval(coverage_registry, covr_ns); // In case it's a promise
-UNPROTECT_SAFE(coverage_registry); // Is it safe?
+SEXP coverage_var = PROTECT(Rf_findVarInFrame(covr_ns, Rf_install(".counters")));
+coverage_registry = eval(coverage_var, covr_ns); // In case it's a promise
+UNPROTECT_SAFE(coverage_var);

Copilot · 2026-02-24T17:26:06Z

rcp/src/compile.c

+	int used_expressions[len];
+	int used_expr_ids[len]; // store expr_ids in discovery order


srcref_coverage() allocates used_expressions[len] and used_expr_ids[len] as VLAs on the C stack, where len is the bytecode length. For large functions this can overflow the stack. Prefer heap/R allocators (e.g., R_alloc/S_alloc) for these arrays.

Suggested change

-int used_expressions[len];
-int used_expr_ids[len]; // store expr_ids in discovery order
+int *used_expressions = (int *) R_alloc(len, sizeof(int));
+int *used_expr_ids = (int *) R_alloc(len, sizeof(int)); // store expr_ids in discovery order
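A plain-C sketch of the same bookkeeping with heap storage instead of VLAs. It assumes, hypothetically, that expression ids fall in [0, len); `collect_expr_ids` is illustrative, not the PR's srcref_coverage().

Inside a `.Call` entry point, R_alloc as suggested is more convenient, since R reclaims that memory automatically when the call returns. malloc/free is used here only to keep the example independent of R.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the number of distinct ids in expr_ids[0..len), storing them in
   discovery order in a newly allocated array *order_out (caller frees), or
   -1 if allocation fails. */
static int collect_expr_ids(const int *expr_ids, int len, int **order_out) {
    *order_out = NULL;
    if (len <= 0)
        return 0;
    unsigned char *seen = calloc((size_t)len, 1); /* heap, not a VLA on the stack */
    int *order = malloc((size_t)len * sizeof *order);
    if (seen == NULL || order == NULL) {
        free(seen);
        free(order);
        return -1;
    }
    int n = 0;
    for (int i = 0; i < len; i++) {
        int id = expr_ids[i];
        assert(id >= 0 && id < len);
        if (!seen[id]) {
            seen[id] = 1;
            order[n++] = id;
        }
    }
    free(seen);
    *order_out = order;
    return n;
}
```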

Copilot · 2026-02-24T17:26:07Z

rcp/src/stencils/stencils.c

+	// Resize if needed
+	if (trace->count >= trace->capacity) {
+		trace->capacity *= 2;
+		trace->types = realloc(trace->types, trace->capacity * sizeof(TypeRecord));
+	}


The type tracing hooks use malloc/realloc without checking for allocation failure. In particular, realloc directly assigns back to trace->types, which can lose the original pointer on failure. Use a temporary pointer, check for NULL, and error cleanly (or keep the old buffer).


Copilot · 2026-02-24T17:26:08Z

rcp/src/compile.c

+	if (opcode == GOTO_BCOP)
+	{
+		int target = imms[0] - 1;
+		DEBUG_PRINT("Peephole optimization: Simplifying unncessary trampoline jump from bytecode %d to target %d\n", index, target);


Typo in debug message string: "unncessary" -> "unnecessary".

Suggested change

-DEBUG_PRINT("Peephole optimization: Simplifying unncessary trampoline jump from bytecode %d to target %d\n", index, target);
+DEBUG_PRINT("Peephole optimization: Simplifying unnecessary trampoline jump from bytecode %d to target %d\n", index, target);

Copilot · 2026-02-24T17:26:08Z

rcp/src/compile.c

+	}
+
+	// get the types environment from the hooks_registry 
+	// Should not need to protect as types is already in an enviornment known by the GC


Typo in comment: "enviornment" -> "environment".

Suggested change

-// Should not need to protect as types is already in an enviornment known by the GC
+// Should not need to protect as types is already in an environment known by the GC


Copilot · 2026-02-24T17:26:09Z

rcp/code.R

+options(rcp.cmpfun.entry_exit_hooks = TRUE)
+library(rcp)
+fib <- function(x) {
+  if (x == 0) 0
+  else if (x == 1) 1
+  else fib(x-2) + fib(x-1)
+}
+
+fib = rcp::rcp_cmpfun(fib, list(name="fib"))
+fib(10)
+print(rcp::rcp_get_types_df("fib"))
+
+library(rcp)
+test <- function(x) {
+  if (x == 0) x=10
+  else x=11
+  x
+}
+
+test =rcp::rcp_cmpfun(test);
+test(1)
+
+exec <- function(x) {
+ 1
+}
+
+
+tmp = rcp::rcp_cmpfun(exec)
+
+
+
+exec <- function(x) {
+repeat {
+                              next
+                      }
+}
+
+library(rcp)
+f <- function(x) {
+  y <- x + 1 
+  if(y > 0){
+    z <- x - 1
+  }
+  else {
+    z <- x + 1
+  }
+  y <- z / y
+  z
+}
+f = rcp::rcp_cmpfun(f, list(name="f"))
+f(14)
+print(rcp::rcp_get_types_df("f"))
+
+library(rcp)
+
+g <- function(x, y) {
+  cat(x, y, "\n")
+  x
+}
+g = rcp::rcp_cmpfun(g, list(name = "g"))
+g(34, "hello")
+g(1L, "world!")
+g("Nope", 456)
+print(rcp::rcp_get_types_df("g"))
+
+library(rcp)
+h <- function(a, ...) {
+  cat(a, ..., "\n")
+}
+h = rcp::rcp_cmpfun(h, list(name = "h"))
+h(1, "hello")
+h("world", 4, "three")
+h(4L, t=89)
+print(rcp::rcp_get_types_df("h"))
+
+library(rcp)
+p <- function(x, y) {
+  cat(x, y, "\n")
+  y
+}
+p <- rcp::rcp_cmpfun(p, list(name = "p"))
+p(1, "hello")
+p(y=3, x="world")
+print(rcp::rcp_get_types_df("p"))


rcp/code.R looks like an ad-hoc scratch/demo script (sets options, compiles/runs examples) and is not referenced by build/test tooling. Keeping this in the package root will ship it to users and makes maintenance harder. Consider removing it, or moving the content into a vignette, README example, or tests.

Suggested change

+# Example/demo script for rcp.
+# To run these examples manually, call run_rcp_examples().
+run_rcp_examples <- function() {
+  options(rcp.cmpfun.entry_exit_hooks = TRUE)
+  library(rcp)
+  fib <- function(x) {
+    if (x == 0) 0
+    else if (x == 1) 1
+    else fib(x - 2) + fib(x - 1)
+  }
+  fib = rcp::rcp_cmpfun(fib, list(name = "fib"))
+  fib(10)
+  print(rcp::rcp_get_types_df("fib"))
+
+  library(rcp)
+  test <- function(x) {
+    if (x == 0) x = 10
+    else x = 11
+    x
+  }
+  test = rcp::rcp_cmpfun(test);
+  test(1)
+
+  exec <- function(x) {
+    1
+  }
+  tmp = rcp::rcp_cmpfun(exec)
+
+  exec <- function(x) {
+    repeat {
+      next
+    }
+  }
+
+  library(rcp)
+  f <- function(x) {
+    y <- x + 1
+    if (y > 0) {
+      z <- x - 1
+    }
+    else {
+      z <- x + 1
+    }
+    y <- z / y
+    z
+  }
+  f = rcp::rcp_cmpfun(f, list(name = "f"))
+  f(14)
+  print(rcp::rcp_get_types_df("f"))
+
+  library(rcp)
+  g <- function(x, y) {
+    cat(x, y, "\n")
+    x
+  }
+  g = rcp::rcp_cmpfun(g, list(name = "g"))
+  g(34, "hello")
+  g(1L, "world!")
+  g("Nope", 456)
+  print(rcp::rcp_get_types_df("g"))
+
+  library(rcp)
+  h <- function(a, ...) {
+    cat(a, ..., "\n")
+  }
+  h = rcp::rcp_cmpfun(h, list(name = "h"))
+  h(1, "hello")
+  h("world", 4, "three")
+  h(4L, t = 89)
+  print(rcp::rcp_get_types_df("h"))
+
+  library(rcp)
+  p <- function(x, y) {
+    cat(x, y, "\n")
+    y
+  }
+  p <- rcp::rcp_cmpfun(p, list(name = "p"))
+  p(1, "hello")
+  p(y = 3, x = "world")
+  print(rcp::rcp_get_types_df("p"))
+}

Copilot · 2026-02-24T17:26:09Z

rcp/src/compile.c

+	if(hooks_registry != R_NilValue)
+		types_of_function(bytecode, bytecode_size, &plugins, hooks_registry, name, formals);
+
+	// Example of adding a plugin stencil to all stencil at beggining and end of the function:


Typo in comment: "beggining" -> "beginning".

Suggested change

-// Example of adding a plugin stencil to all stencil at beggining and end of the function:
+// Example of adding a plugin stencil to all stencil at beginning and end of the function:


MatejKocourek and others added 4 commits February 10, 2026 19:11

Optimize unnecessarry GOTO trampoline jumps

a6759ba

Plugin stencils

c42e2bf

WIP srcrefcoverage

d936e3d

Adding entry and exit hooks for functions, to be able to look at the environment and at the return value.

6ff5676

programLyrique requested review from MatejKocourek and fikovnik February 19, 2026 16:02

programLyrique self-assigned this Feb 19, 2026

programLyrique added the enhancement New feature or request label Feb 19, 2026

Made to work with covr

deea3fc

Switching to large memory model

fc2e44b

programLyrique and others added 15 commits February 20, 2026 15:09

Merge branch 'matej-code-coverage' into entry-exit-hooks

236b183

Plugin stencils more general

3194176

Put back the stats attribute

27a616a

Merge remote-tracking branch 'origin/matej-code-coverage' into entry-exit-hooks

53f0a70

Basic infrastructure to use the plugin stencils

b19457c

Exit hook to record types of arguments and return value

2cb1dce

Add functions to get the results of the type tracing

94650c6

Correctly deal with immediate binding cells in the frame of the environment

13c6d0c

Only scan the parameters in the exit hook, not the locals

ca971ff


Now also output parameter names when outputting the types

908c1a2

More examples

bbed5a6

Add types test suite and update Makefile to include it

83a5b8d

Correctly handle ... (dots) argument

8f15653

Refactor

73bb375

Add more tests for the testing and add a specific entry in the Makefile to run the functional tests only (not the benchmarks, which takes ages)

19950ed

programLyrique requested a review from Copilot February 24, 2026 17:17

Copilot started reviewing on behalf of programLyrique February 24, 2026 17:18

Copilot AI reviewed Feb 24, 2026


programLyrique and others added 6 commits February 24, 2026 18:34

Update the documentation of rcp_get_types_df

d9be954

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Improve

64d1c9b

Merge branch 'entry-exit-hooks' of github.com:PRL-PRG/rcp into entry-exit-hooks

69bae80

Better API for the rcp_get_types* functions

a729f85

Correctly update the package environment in addition to the namespace when compiling a package

dabdef7

Much shorter way of having the compiled version of a function also in the package environment

6400cc9


programLyrique wants to merge 27 commits into main from entry-exit-hooks


3 participants
