Association Graph

Varun Pratap Bhardwaj edited this page Mar 30, 2026 · 1 revision

Association Graph: Spreading Activation and Auto-Linking

The Association Graph introduces multi-hop reasoning to SLM --- the ability to retrieve fact C when you query for fact A, even when A and C share no direct connection, because A links to B and B links to C. This is accomplished without any LLM calls at query time, using the SYNAPSE spreading activation algorithm as a 5th retrieval channel.

What It Does

Three interconnected capabilities work together:

  1. Spreading Activation (SYNAPSE algorithm) --- A 5th retrieval channel that propagates activation energy through the memory graph, reaching facts that keyword and vector search cannot find.

  2. Auto-Linking (A-MEM pattern) --- When a new fact is stored, SLM automatically finds semantically similar existing facts (cosine similarity >= 0.7) and creates association edges between them.

  3. Hebbian Strengthening --- When facts are recalled together, the edges between them are strengthened by +0.05, following the neuroscience principle: "neurons that fire together wire together."

Additionally, GraphAnalyzer computes PageRank importance scores and Label Propagation community IDs for structural analysis of the memory graph.

The SYNAPSE Algorithm

SYNAPSE (arXiv 2601.02744) defines a 5-step spreading activation algorithm adapted for SLM. With default parameters (M=7, T=3), the computation involves approximately 21 neighbor lookups --- on SQLite with proper indexes, this completes in under 5ms.

Step 1: Initialization

Seed nodes are selected from VectorStore KNN results. Each seed receives an initial activation proportional to its similarity to the query:

a_i^(0) = ALPHA * similarity(embedding_i, query_embedding)

ALPHA = 1.0 is the seed scaling factor.

Step 2: Propagation with Fan Effect

Activation energy spreads through edges to neighboring nodes:

u_i^(t+1) = DELTA * a_i^(t) + S * SUM_j [ (w_ji / deg_out(j)) * a_j^(t) ]
  • DELTA = 0.5 --- self-retention (how much activation a node keeps)
  • S = 0.8 --- spreading factor (energy diffusion rate)
  • w_ji --- weight of the edge from node j to node i
  • deg_out(j) --- out-degree normalization (fan effect: highly connected nodes spread less per edge)

Step 3: Lateral Inhibition

Only the top-M=7 highest-activation nodes survive each iteration. All others are pruned to zero. This prevents activation from diffusing uniformly across the entire graph.

Step 4: Nonlinear Sigmoid Gating

Surviving activations pass through a sigmoid function with threshold shift:

a_i^(t+1) = sigmoid(u_i^(t+1) - THETA)

where sigmoid(x) = 1 / (1 + exp(-x)), THETA = 0.5

This introduces nonlinearity, sharpening the distinction between weakly and strongly activated nodes.

Step 5: Iterate

Steps 2--4 repeat for T=3 iterations. After the final iteration, a FOK (Feeling-of-Knowing) gate rejects results where the maximum activation is below TAU_GATE = 0.12.
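The five steps above can be condensed into a short sketch. This is illustrative only: `seeds` stands in for the VectorStore KNN results and `edges` for the adjacency map built from the edge tables, neither of which is shown on this page.

```python
import math

# Default SYNAPSE parameters (see the Parameters section)
ALPHA, DELTA, S, THETA = 1.0, 0.5, 0.8, 0.5
TOP_M, MAX_ITERS, TAU_GATE = 7, 3, 0.12

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def synapse(seeds, edges):
    """seeds: {node: cosine similarity to the query} from KNN.
    edges: {node: {neighbor: weight}} adjacency over the memory graph."""
    # Step 1: seed activation proportional to query similarity
    a = {n: ALPHA * sim for n, sim in seeds.items()}
    for _ in range(MAX_ITERS):
        # Step 2: propagate; each sender's energy is divided by its
        # out-degree, so highly connected nodes spread less per edge
        u = {}
        reachable = set(a) | {nb for n in a if n in edges for nb in edges[n]}
        for i in reachable:
            incoming = sum(
                (nbrs[i] / len(nbrs)) * a[j]
                for j, nbrs in edges.items()
                if j in a and i in nbrs
            )
            u[i] = DELTA * a.get(i, 0.0) + S * incoming
        # Step 3: lateral inhibition -- only the top-M nodes survive
        survivors = sorted(u, key=u.get, reverse=True)[:TOP_M]
        # Step 4: sigmoid gating with threshold shift
        a = {i: sigmoid(u[i] - THETA) for i in survivors}
    # Final FOK gate: reject the whole result if nothing is confident enough
    if not a or max(a.values()) < TAU_GATE:
        return {}
    return a
```

Running this on a chain A -> B -> C with only A as a seed shows the multi-hop effect: C accumulates activation by the second iteration even though it never matched the query directly.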

Parameters

| Parameter | Value | Description |
|---|---|---|
| ALPHA | 1.0 | Seed scaling factor |
| DELTA | 0.5 | Self-retention per iteration |
| S (spreading_factor) | 0.8 | Energy diffusion rate |
| THETA | 0.5 | Sigmoid activation threshold |
| M (top_m) | 7 | Max active nodes per iteration (lateral inhibition) |
| T (max_iterations) | 3 | Propagation depth |
| TAU_GATE | 0.12 | FOK confidence gate |

These parameters originate from the SYNAPSE paper, which tuned on 384-dimensional embeddings (all-MiniLM-L6-v2). SLM uses 768-dimensional embeddings (nomic-embed-text). A calibration test verifies convergence at 768d; if it fails, the parameters are recalibrated.

The 5th Retrieval Channel

SLM v3.1 uses 4 retrieval channels fused via Reciprocal Rank Fusion (RRF):

  1. Semantic (Fisher-Rao similarity)
  2. BM25 (keyword)
  3. Entity Graph (structural)
  4. Temporal (time-weighted)

SLM v3.2 adds spreading activation as the 5th channel. It registers with the ChannelRegistry and participates in the same RRF fusion pipeline. When the channel is disabled (the default), it returns empty results, which RRF handles gracefully.

Channel weights are configurable per retrieval strategy:

| Strategy | Spreading Activation Weight | When to Use |
|---|---|---|
| multi_hop | 2.0 (highest) | "What relates to X?" queries |
| general | 1.0 (default) | Standard retrieval |
| temporal | 0.5 (low) | Time-focused queries |
| opinion | 0.5 (low) | Preference queries |
| factual | 0.8 | Direct fact queries |
| entity | 1.0 | Entity-centric queries |
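Weighted RRF itself is compact. A sketch, assuming the conventional RRF constant k = 60 (this page does not specify SLM's value) and a disabled channel contributing an empty ranking:

```python
def rrf_fuse(channel_rankings, weights, k=60):
    """channel_rankings: {channel: [fact_id, ...]}, best-first.
    weights: {channel: float}, defaulting to 1.0.
    Disabled channels pass empty lists and contribute nothing."""
    scores = {}
    for channel, ranking in channel_rankings.items():
        w = weights.get(channel, 1.0)
        for rank, fact_id in enumerate(ranking, start=1):
            scores[fact_id] = scores.get(fact_id, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With a multi_hop-style weight of 2.0 on the spreading activation channel, a fact it ranks highly can outrank facts favored by a single unit-weight channel.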

Auto-Linking

When a new fact is stored, the AutoLinker:

  1. Queries VectorStore for the top-N most similar existing facts
  2. Filters candidates with cosine similarity >= 0.7
  3. Creates bidirectional association_edges (type: auto_link)
  4. Triggers memory evolution: linked old facts get their contextual descriptions updated to reflect the new connection
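Steps 1-3 can be sketched as follows. The column names, the precomputed `knn_results` input, and the `top_n` default are assumptions for illustration, and the memory-evolution step 4 is omitted:

```python
import sqlite3

SIM_THRESHOLD = 0.7  # cosine similarity cutoff from step 2 above

def auto_link(db, new_fact_id, knn_results, top_n=10):
    """knn_results: [(fact_id, cosine_sim), ...] best-first from the
    vector store. Creates bidirectional auto_link association edges."""
    candidates = [(fid, sim) for fid, sim in knn_results[:top_n]
                  if sim >= SIM_THRESHOLD]
    for fid, sim in candidates:
        # One row per direction makes the edge bidirectional
        for src, dst in ((new_fact_id, fid), (fid, new_fact_id)):
            db.execute(
                "INSERT INTO association_edges (src, dst, type, weight) "
                "VALUES (?, ?, 'auto_link', ?)",
                (src, dst, sim),
            )
    db.commit()
    return [fid for fid, _ in candidates]
```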

Auto-linking creates edges in the association_edges table only --- it never writes to the existing graph_edges table. This separation is by design:

| Edge Table | Written By | Contains |
|---|---|---|
| graph_edges | GraphBuilder (existing) | Encoding-time structural edges (entity overlap, temporal proximity, semantic similarity, causal) |
| association_edges | AutoLinker (new) | Runtime behavioral edges (auto-links, Hebbian co-access, consolidation bridges) |

Spreading activation reads both tables via a UNION query, giving it full visibility across structural and behavioral connections.
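Assuming both tables expose `src`, `dst`, and `weight` columns (the actual SLM schema is not shown on this page), the combined read and the adjacency map it feeds could look like:

```python
# Hypothetical column names for illustration
COMBINED_EDGES_SQL = """
SELECT src, dst, weight FROM graph_edges
UNION ALL
SELECT src, dst, weight FROM association_edges
"""

def load_edges(db):
    """Build the adjacency map that spreading activation walks."""
    edges = {}
    for src, dst, weight in db.execute(COMBINED_EDGES_SQL):
        edges.setdefault(src, {})[dst] = weight
    return edges
```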

Hebbian Strengthening

After every recall, facts that appear together in the results have their shared edges strengthened:

new_weight = min(1.0, current_weight + 0.05)  # Capped at 1.0
co_access_count += 1

Over time, this causes frequently co-recalled facts to develop stronger connections, improving future multi-hop retrieval for related queries.

Conversely, edges that are not strengthened for 30+ days undergo exponential decay:

new_weight = current_weight * exp(-0.01 * days_inactive)
# Edges below weight 0.05 are deleted

This prevents the graph from growing without bound and ensures that stale connections naturally fade.
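Both update rules fit in a pair of helpers. A sketch, applying the 30-day grace period before any decay as described above (function names illustrative):

```python
import math

STRENGTHEN_STEP = 0.05   # Hebbian increment per co-recall
DECAY_RATE = 0.01        # exponential decay constant
PRUNE_THRESHOLD = 0.05   # edges below this weight are deleted
DECAY_GRACE_DAYS = 30    # no decay before 30 idle days

def strengthen(weight):
    """Hebbian update for an edge whose endpoints were recalled together."""
    return min(1.0, weight + STRENGTHEN_STEP)

def decay(weight, days_inactive):
    """Exponential decay after 30+ idle days; None means the edge is pruned."""
    if days_inactive < DECAY_GRACE_DAYS:
        return weight
    w = weight * math.exp(-DECAY_RATE * days_inactive)
    return None if w < PRUNE_THRESHOLD else w
```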

GraphAnalyzer: PageRank and Community Detection

The GraphAnalyzer computes two structural metrics over the combined graph (both graph_edges and association_edges):

PageRank (via networkx): Global structural importance. Facts that are highly connected and linked to by other important facts receive higher PageRank scores. This acts as a prior on fact importance, independent of any specific query.

Community Detection (Label Propagation via networkx): Groups facts into clusters based on graph structure. Community IDs enable cluster-aware retrieval and help the Consolidation engine identify related fact groups.

Both metrics are stored in the fact_importance table and recomputed during consolidation, not at query time.
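Since the page names networkx for both metrics, the consolidation-time computation might be sketched as follows (function name and input shape are illustrative):

```python
import networkx as nx
from networkx.algorithms.community import label_propagation_communities

def analyze(edge_list):
    """edge_list: (src, dst, weight) triples drawn from both edge tables.
    Returns ({fact: pagerank_score}, {fact: community_id})."""
    g = nx.DiGraph()
    g.add_weighted_edges_from(edge_list)
    pagerank = nx.pagerank(g, weight="weight")
    # Label Propagation is defined on undirected graphs in networkx
    communities = label_propagation_communities(g.to_undirected())
    community_of = {n: cid
                    for cid, members in enumerate(communities)
                    for n in members}
    return pagerank, community_of
```

Both results would then be written to fact_importance, keeping all graph analysis off the query path.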

Mode A Behavior

Without an embedding provider, the following features are disabled in Mode A:

  • Spreading Activation: Disabled (requires seed vectors from VectorStore)
  • Auto-Linking: Disabled (requires cosine similarity computation)

The following features remain active in Mode A:

  • PageRank: Enabled (pure graph math over existing edges)
  • Community Detection: Enabled (Label Propagation on graph structure)
  • Hebbian Strengthening: Partially enabled (works on manually created edges)

If Mode A has sentence-transformers installed, all features activate normally.

Configuration

Enable the association graph features:

slm config set spreading_activation.enabled true

Adjust SYNAPSE parameters (advanced):

slm config set spreading_activation.alpha 1.0
slm config set spreading_activation.delta 0.5
slm config set spreading_activation.spreading_factor 0.8
slm config set spreading_activation.theta 0.5
slm config set spreading_activation.top_m 7
slm config set spreading_activation.max_iterations 3
slm config set spreading_activation.tau_gate 0.12

Multi-Hop Example

Consider three facts in memory:

  • Fact A: "We use PostgreSQL 15 for the production database."
  • Fact B: "PostgreSQL connection pooling is configured via PgBouncer."
  • Fact C: "PgBouncer sessions should be set to transaction mode for serverless."

There is no direct connection between Fact A and Fact C. But:

  1. Fact A links to Fact B (shared entity: PostgreSQL)
  2. Fact B links to Fact C (shared entity: PgBouncer)

When you ask "What database configuration do we use?", spreading activation:

  1. Seeds with Fact A (highest similarity to "database configuration")
  2. Spreads to Fact B (neighbor of A)
  3. Spreads to Fact C (neighbor of B)
  4. Returns all three facts, ranked by activation score

This is multi-hop reasoning without an LLM call at query time.

Relationship to Existing BridgeDiscovery

SLM v3.1 includes BridgeDiscovery, which finds intermediate facts that connect disconnected retrieval results. The new SpreadingActivation is a different mechanism:

| | BridgeDiscovery (v3.1) | SpreadingActivation (v3.2) |
|---|---|---|
| Role | Post-retrieval enrichment | Retrieval channel (5th) |
| When | After initial retrieval, fills gaps | During retrieval, as part of RRF |
| Reads | graph_edges only | Both graph_edges + association_edges |
| Algorithm | Custom decay with typed mu multipliers | SYNAPSE 5-step with lateral inhibition |

Both coexist without conflict.


Part of Qualixar | Created by Varun Pratap Bhardwaj
