
v3.0.0b3 – Notebook Demo Update & Bug Fixes#1196

Open
jlarson4 wants to merge 21 commits into dev-3.x from dev-3.x-canary

Conversation

@jlarson4
Collaborator

Description

Updating notebooks to work with TransformerLens v3, resolving discovered bugs, and confirming that all architectures still perform as expected.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

jlarson4 and others added 20 commits February 27, 2026 13:28
TransformerBridge.add_hook() now accepts a callable filter
`(str) -> bool` as the name parameter, matching the HookedTransformer
API. When a callable is passed, the hook is added to every hook point
whose name satisfies the filter. This was already supported in
run_with_hooks() but missing from add_hook(), causing AttributeError
when migrating notebooks that use filter-based hook registration.
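The callable-filter behavior can be pictured with a minimal name-resolution sketch (pure Python, with hypothetical hook-point names; not the actual TransformerBridge implementation):

```python
def resolve_hook_points(hook_point_names, name_or_filter):
    """Mimic add_hook name resolution: an exact string matches one hook
    point; a callable (str) -> bool selects every matching hook point."""
    if callable(name_or_filter):
        return [n for n in hook_point_names if name_or_filter(n)]
    return [n for n in hook_point_names if n == name_or_filter]

# Hypothetical hook-point names, for illustration only
names = [
    "blocks.0.attn.hook_z",
    "blocks.0.hook_resid_post",
    "blocks.1.attn.hook_z",
]

# String form: one exact match
exact = resolve_hook_points(names, "blocks.0.hook_resid_post")

# Callable form: every hook point whose name satisfies the filter
filtered = resolve_hook_points(names, lambda name: name.endswith("hook_z"))
```

With this change, the same filter callables work in both `add_hook()` and `run_with_hooks()`, so notebooks can register filter-based hooks either way.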
…lding after Layer Norm application (#1188)

* Fixed bug where stale joint QKV is being used instead of the correct split weights

* Format fixes

* Fixing typing issues
* HF optimizes away the batch size information; this converts it back to the true batch size to ensure reproducible results

* Fixed bug for read and clear only
* Fixing issue with storing a 3D tensor for hook_result when a 4D tensor is expected

* Restore hook_result
#1014)

* updated loading in exploratory analysis demo to use transformer bridge

* updated loading in exploratory analysis demo to use transformer bridge

* Fix boot_transformers kwargs and clear stale outputs

- Move weight processing args (center_unembed, fold_ln, etc.) from
  boot_transformers() to enable_compatibility_mode() where they belong
- Clear stale outputs from cell with execution_count=None

Notebook blocked on missing TransformerBridge features: W_U property
delegation (Bug 6), tokens_to_residual_directions (Bug 7), and
pos_embed batch dim mismatch (Bug 3). See .claude/plans/transformer_bridge_bugs.md.

* Work on exploratory analysis demo updates

* Removed inline bug fix code in favor of systematic fixes

* Additional bug resolution

* More bug fixes

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
…dge (#1021)

* updated loading in patchscopes generation demo to use transformer bridge

* Migrate Patchscopes Generation Demo to TransformerBridge

- Replace HookedTransformer with TransformerBridge.boot_transformers()
- Fix deprecated ipython.magic() to ipython.run_line_magic()
- Clear stale outputs from unrun cells

All 20 cells pass locally.
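The deprecation fix above swaps `ipython.magic("%pip install x")` for `ipython.run_line_magic("pip", "install x")`. A small compatibility helper (`run_magic` is a hypothetical name, used only to illustrate the translation) sketches the mapping:

```python
def run_magic(ipython, magic_line):
    """Run a line magic like '%pip install foo' via the non-deprecated
    run_line_magic(name, rest) API, falling back to magic() if absent."""
    name, _, rest = magic_line.lstrip("%").partition(" ")
    if hasattr(ipython, "run_line_magic"):
        return ipython.run_line_magic(name, rest)
    return ipython.magic(magic_line)  # deprecated path, old IPython only
```

In a notebook this would be called as `run_magic(get_ipython(), "%pip install circuitsvis")`, which also silences the DeprecationWarning that `magic()` emits.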

* Fixes to ensure functionality with v3.x

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
* updated loading in exploratory analysis demo to use transformer bridge

* updated loading in exploratory analysis demo to use transformer bridge

* Fix boot_transformers kwargs and clear stale outputs

- Move weight processing args (center_unembed, fold_ln, etc.) from
  boot_transformers() to enable_compatibility_mode() where they belong
- Clear stale outputs from cell with execution_count=None

Notebook blocked on missing TransformerBridge features: W_U property
delegation (Bug 6), tokens_to_residual_directions (Bug 7), and
pos_embed batch dim mismatch (Bug 3). See .claude/plans/transformer_bridge_bugs.md.

* Work on exploratory analysis demo updates

* Removed inline bug fix code in favor of systematic fixes

* Additional bug resolution

* More bug fixes

---------

Co-authored-by: degenfabian <fabian.degen@tuta.com>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* updated loading in attribution patching demo to use transformer bridge

* updated loading in bert demo to use transformer bridge

* Update to allow NSP via bridge

* Format and type fixes

* Add import

* Attribution Patching moved to own branch

* Hiding Attribution patching until its own PR

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
* updating loading in qwen demo to use transformer bridge

* add qwen demo to CI

* Updating Qwen Notebook for TransformerLens 3.x

* Changing model to fit in CI

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
…#1011)

* updated loading in Activation Patching in TL Demo to use transformer bridge

* use undeprecated ipython code to avoid deprecation warnings

* revert metadata changes

* updated installation source

* Fix notebook CI: skip widget MIME type comparison and clear stale cell output

Add application/vnd.jupyter.widget-view+json to conftest.py skip_compare
to avoid false failures from random widget model_id values. Clear outputs
from unrun cell (execution_count=null) in Activation Patching demo.
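The conftest.py change described above follows nbval's documented pattern for skipping output MIME types; a sketch:

```python
# conftest.py (sketch) -- tell nbval not to diff widget-view outputs,
# whose random model_id values cause spurious notebook CI failures.
def pytest_collectstart(collector):
    if collector.fspath and collector.fspath.ext == ".ipynb":
        collector.skip_compare += ("application/vnd.jupyter.widget-view+json",)
```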

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Neel Plotly import does not run in CI

* Extend processing time for slow cell

* Added NBVAL_SKIP for a long-running process that can't pass CI
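nbval's skip marker is a comment on the first line of a notebook cell; a sketch (the computation below is a stand-in for the slow process, not the notebook's actual code):

```python
# NBVAL_SKIP
# nbval sees this first-line marker and skips executing the cell
# during `pytest --nbval` runs, so CI never times out on it.
result = sum(range(10_000_000))  # stand-in for the long-running computation
```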

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* updating loading in t5 demo to use transformer bridge

* add T5 demo to CI

* Adjusting system to properly account for encoder-decoder

* Cleanup

* Small repairs

* Updating doc_sanitize

* Activation patching update

* Final cleanup for activation patching

* device cleanup

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
#1013)

* updated loading in attribution patching demo to use transformer bridge

* updated loading in attribution patching demo to use transformer bridge

* Replace deprecated torchtyping import and clear stale cell outputs

Replace `from torchtyping import TensorType as TT` with a lightweight
stub class since torchtyping is not in project dependencies. Clear
outputs from cells with execution_count=null.
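A lightweight stub along these lines keeps the `TT[...]` annotations importable without adding torchtyping as a dependency (a sketch of the change described above; the annotated function is a hypothetical example):

```python
try:
    from torchtyping import TensorType as TT  # use the real package if present
except ImportError:
    class TT:
        """Minimal stand-in so annotations like TT["batch", "pos"] still
        parse; it carries no runtime type checking."""
        def __class_getitem__(cls, item):
            return cls

def logit_diff_stub(logits: TT["batch", "pos", "d_vocab"]):
    # Annotation is documentation only; the value passes through unchanged.
    return logits
```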

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Replace pysvelte with circuitsvis for attention visualization

pysvelte was never imported in the notebook. Replace
pysvelte.AttentionMulti with cv.attention.attention_heads from
circuitsvis, which is already a project dependency.

* Use run_with_cache for forward caching, clean up stale outputs

- Replace manual forward cache hooks with model.run_with_cache() which
  handles hook alias resolution automatically
- Keep manual backward hooks for gradient caching (no built-in method)
- Add alias entries for grad_cache to fix hook.name mismatch
- Clear stale stderr output (DeprecationWarning for ipython.magic)
- Clear stale error output (torchtyping ModuleNotFoundError)
- Clear stale Cell 18 output (cache counts differ with TransformerBridge)
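The grad_cache alias fix can be pictured with a small sketch (hypothetical names; not the actual bridge code): gradients arriving from backward hooks under legacy `hook.name` values are stored under the canonical names the notebook looks up.

```python
class GradCache(dict):
    """Store backward-hook gradients keyed by canonical hook name,
    resolving alias names (e.g. legacy hook.name values) first."""
    def __init__(self, aliases=None):
        super().__init__()
        self.aliases = dict(aliases or {})

    def store(self, hook_name, grad):
        # Resolve the alias so lookups by canonical name succeed
        self[self.aliases.get(hook_name, hook_name)] = grad

# Hypothetical alias: a legacy hook name maps onto the canonical one
grad_cache = GradCache(
    aliases={"blocks.0.attn.hook_result_legacy": "blocks.0.attn.hook_result"}
)
grad_cache.store("blocks.0.attn.hook_result_legacy", "grad-tensor")
```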

Note: Notebook is blocked on TransformerBridge bugs documented in
.claude/plans/transformer_bridge_bugs.md (pos_embed batch dim, cache
aliasing, MPS placeholder storage). Will revisit after upstream fixes.

* Updates to Attribution Patching notebook for TransformerLens v3

* Skip excessive forward pass test, too long for CI

* Fixing output bug

* Additional notebook changes

* Rerunning the notebook

* Running the notebook again to get correct outputs

* Another attempt

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@jlarson4 changed the title from "Dev 3.x canary" to "v3.0.0b3 – Notebook Demo Update & Bug Fixes" on Mar 11, 2026