feat: implement local LLM client with yzma (purego) by nvandessel · Pull Request #60 · nvandessel/floop

nvandessel · 2026-02-11T07:27:02Z

Summary

Replace llama-go (CGo) with hybridgroup/yzma (purego) for local LLM embeddings
Single local.go file — no build tags, no stub file, no C++ compiler needed
go build always works; shared libs only needed at runtime (yzma install --lib)
Per-call embedding context creation (simpler for low-throughput dedup workload)
Package-level sync.Once for library init, instance-level sync.Once for model loading
Available() does cheap os.Stat checks without loading any libraries

Stacked on

PR feat: re-add local LLM foundations with yzma-ready config #58 (feat/local-llm-foundations-v2) — pure Go foundations
PR revert: remove llama-go local LLM integration #57 (revert/local-llm-llamago) — revert llama-go

Test plan

go build ./cmd/floop — builds cleanly (no yzma libs needed at build time)
go vet ./... — no issues
go test ./... — all 25 packages pass
Integration: FLOOP_TEST_LIB_PATH=/path/to/libs FLOOP_TEST_MODEL_PATH=/path/to/model.gguf go test -tags integration ./internal/llm/ -v

🤖 Generated with Claude Code

greptile-apps · 2026-02-11T07:29:59Z

Greptile Overview

Greptile Summary

Replaces llama-go (CGo) with hybridgroup/yzma (purego) for local LLM embeddings, eliminating C++ compiler requirements at build time.

Key implementation details:

Single local.go file with full yzma integration (no build tags or stub files)
Package-level sync.Once ensures llama.Load() and llama.Init() happen once per process
Instance-level sync.Once for lazy model loading on first Embed() call
Per-call embedding context creation with immediate cleanup (simpler for low-throughput dedup workload)
Thread-safe via sync.Mutex protecting all model operations
Available() performs cheap os.Stat checks without loading libraries
Comprehensive unit tests (error paths, config validation, interface compliance)
Integration tests with build tag and env-based skip conditions

The PR achieves its goal of pure Go builds while maintaining runtime flexibility for local embeddings.

Confidence Score: 4/5

This PR is safe to merge with low risk; the implementation is solid with comprehensive tests and proper resource management
Score reflects well-structured implementation with thread-safe lazy loading, proper error handling, comprehensive test coverage (both unit and integration), and adherence to Go best practices. Minor style suggestion on context padding doesn't affect correctness. The change successfully eliminates CGo dependency while maintaining all functionality.
No files require special attention - all changes follow project conventions and include proper tests

Important Files Changed

Filename	Overview
go.mod	Added `github.com/hybridgroup/yzma v1.7.0` dependency with transitive deps `ebitengine/purego` and `jupiterrider/ffi`; updated `golang.org/x/*` packages
go.sum	Updated checksums for new dependencies and updated Go standard library packages
internal/llm/local.go	Implemented full yzma-based embedding client with thread-safe lazy loading, per-call context creation, package-level library init, and proper resource cleanup; one minor style suggestion on context padding
internal/llm/local_test.go	Comprehensive unit tests covering config validation, `Available()` checks, error paths, and interface compliance; tests properly use `t.TempDir()` and `t.Setenv()`
internal/llm/local_integration_test.go	Well-structured integration tests with build tag, clear skip conditions via env vars, and realistic test cases for embeddings and similarity comparisons

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant LC as LocalClient
    participant Lib as yzma Library
    participant Model as GGUF Model

    Note over App,Model: First Call (Lazy Initialization)
    
    App->>LC: Embed(ctx, text)
    LC->>LC: loadModel() [once.Do]
    LC->>Lib: loadLib(libPath) [package-level once]
    Lib-->>LC: library loaded
    LC->>Lib: llama.ModelLoadFromFile()
    Lib->>Model: load model
    Model-->>Lib: model handle
    Lib-->>LC: model, vocab, nEmbd
    
    Note over LC: Model loaded, proceed with embedding
    
    LC->>LC: mu.Lock()
    LC->>Lib: llama.Tokenize(vocab, text)
    Lib-->>LC: tokens
    LC->>Lib: llama.InitFromModel() [per-call context]
    Lib-->>LC: lctx
    LC->>Lib: llama.SetEmbeddings(lctx, true)
    LC->>Lib: llama.Decode(lctx, batch)
    LC->>Lib: llama.GetEmbeddingsSeq(lctx, 0, nEmbd)
    Lib-->>LC: rawVec
    LC->>LC: normalize(vec)
    LC->>Lib: llama.Free(lctx)
    LC->>LC: mu.Unlock()
    LC-->>App: []float32 embedding

    Note over App,Model: Subsequent Calls (Model Already Loaded)
    
    App->>LC: CompareEmbeddings(ctx, a, b)
    LC->>LC: Embed(ctx, a) [reuses loaded model]
    LC->>LC: Embed(ctx, b) [reuses loaded model]
    LC->>LC: CosineSimilarity(embA, embB)
    LC-->>App: float64 similarity

greptile-apps

_{5 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-11T07:30:02Z

internal/llm/local.go

+	tokens := llama.Tokenize(c.vocab, text, true, true)
+
+	ctxParams := llama.ContextDefaultParams()
+	ctxParams.NCtx = uint32(len(tokens) + 16) // fit the text with padding


padding of +16 tokens may be insufficient for special tokens and context overhead

Most embedding models require additional space beyond the raw token count for BOS/EOS tokens, padding tokens, and internal context management. The current +16 may cause allocation issues with longer inputs or models with larger special token sets.

Suggested change

ctxParams.NCtx = uint32(len(tokens) + 16) // fit the text with padding

ctxParams.NCtx = uint32(len(tokens) + 64) // fit the text with padding for special tokens

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

nvandessel · 2026-02-11T16:11:53Z

@greptileai review

greptile-apps · 2026-02-11T16:28:34Z

Greptile Overview

Greptile Summary

This PR replaces the previous (CGo-based) local embedding backend with hybridgroup/yzma (purego) and wires a functional LocalClient implementation in internal/llm/local.go. The client a one-time process-global yzma library init, lazy-loads the GGUF embedding model on first use, and creates/frees a llama context per Embed() call. Unit tests were expanded for Available() and configuration fallbacks, and an integration test file (behind -tags integration) was added to exercise embedding/similarity with real libs and a real model.

The change fits into the existing llm package by keeping MergeBehaviors delegated to the existing FallbackClient while enabling embedding-based CompareBehaviors/CompareEmbeddings when the local runtime is present.

Confidence Score: 3/5

Moderately safe to merge once the local client error/cleanup issues are addressed.
Core wiring is straightforward and test coverage is improved, but Embed() currently ignores decode errors and Close() leaves a freed model handle in a state where the client can be used again unsafely due to sync.Once preventing reload.
internal/llm/local.go

Important Files Changed

Filename	Overview
go.mod	Adds hybridgroup/yzma dependency and updates indirect x/* versions for purego local embeddings.
go.sum	Updates checksums for new yzma/purego/ffi dependencies and bumps x/* module versions.
internal/llm/local.go	Implements yzma-based embedding client with global library init and per-call contexts; has a couple of correctness issues around error handling/cleanup.
internal/llm/local_integration_test.go	Adds integration tests behind build tag that exercise Available/Embed/Compare paths when libs and model are provided.
internal/llm/local_test.go	Expands unit tests for Available/config fallbacks and verifies error behavior when local backend isn't configured.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant LocalClient
    participant L as llama (yzma)

    Caller->>LocalClient: CompareBehaviors(ctx, a, b)
    LocalClient->>LocalClient: CompareEmbeddings(ctx, aText, bText)

    LocalClient->>LocalClient: Embed(ctx, aText)
    LocalClient->>LocalClient: loadModel() [sync.Once]
    LocalClient->>L: Load(libPath) [process once]
    LocalClient->>L: Init()
    LocalClient->>L: ModelLoadFromFile(modelPath)
    LocalClient->>L: ModelGetVocab(model)
    LocalClient->>L: ModelNEmbd(model)

    LocalClient->>L: Tokenize(vocab, text)
    LocalClient->>L: InitFromModel(model, ctxParams)
    LocalClient->>L: SetEmbeddings(lctx, true)
    LocalClient->>L: BatchGetOne(tokens)
    LocalClient->>L: Decode(lctx, batch)
    LocalClient->>L: GetEmbeddingsSeq(lctx, 0, nEmbd)
    LocalClient->>LocalClient: normalize(vec)
    LocalClient->>L: Free(lctx)

    LocalClient->>LocalClient: Embed(ctx, bText)
    Note over LocalClient: Same per-call context flow

    LocalClient->>LocalClient: CosineSimilarity(embA, embB)
    LocalClient-->>Caller: ComparisonResult

greptile-apps

_{5 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-11T16:28:37Z

internal/llm/local.go

+	llama.SetEmbeddings(lctx, true)
+
+	batch := llama.BatchGetOne(tokens)
+	llama.Decode(lctx, batch)


Unchecked decode errors

llama.Decode(lctx, batch) returns an error, but the current code ignores it. If decoding fails (e.g., context too small, invalid tokens, model/runtime issues), GetEmbeddingsSeq may return an error or (worse) succeed with undefined/garbage embeddings. Please capture and propagate the decode error before proceeding.

greptile-apps · 2026-02-11T16:28:38Z

internal/llm/local.go

 func (c *LocalClient) Close() error {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+
+	if c.loaded {
+		llama.ModelFree(c.model)
+		c.loaded = false
+	}


Close doesn’t reset handle

After llama.ModelFree(c.model), c.model remains non-zero and c.once prevents re-loading. If the client is closed and then used again, loadModel() will not run and Embed() will try to create a context from a freed model handle. Consider resetting c.model/c.vocab/c.nEmbd and/or allowing loadModel() to run again after Close() so post-close usage fails cleanly or reloads safely.

Replace llama-go CGo bindings with hybridgroup/yzma, which uses purego to load llama.cpp shared libraries at runtime. No CGo, no build tags, no C++ compiler needed — go build always works. - Single local.go file (no stub), package-level sync.Once for lib init - Per-call context creation in Embed() for simplicity - Available() checks lib dir + model file via os.Stat - Integration tests gated on FLOOP_TEST_LIB_PATH + FLOOP_TEST_MODEL_PATH Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Increase token context padding from +16 to +64 for special tokens - Check llama.Decode() error return instead of ignoring it - Reset model/vocab/nEmbd handles and sync.Once in Close() so post-close usage fails cleanly or reloads safely Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

nvandessel · 2026-02-12T01:52:28Z

@greptileai review

greptile-apps · 2026-02-12T01:55:22Z

Greptile Overview

Greptile Summary

This PR successfully replaces the CGo-based llama-go with hybridgroup/yzma (purego) for local LLM embeddings, eliminating C++ compiler dependencies while maintaining full functionality.

Key changes:

Replaced llama-go with yzma v1.7.0 for pure Go embedding generation
Implemented full LocalClient with thread-safe model loading and per-call context management
Added package-level sync.Once for library initialization and instance-level sync.Once for model loading
All three previous review issues addressed: decode error checking (line 190), proper Close() cleanup with once reset (line 253), and increased token padding from +16 to +64 (line 179)
Comprehensive test coverage with unit tests and integration tests (requires FLOOP_TEST_LIB_PATH and FLOOP_TEST_MODEL_PATH)
Available() performs cheap filesystem checks without loading libraries

Build improvement: go build now works without yzma libraries; shared libs only needed at runtime via yzma install --lib

Confidence Score: 4/5

Safe to merge with minor style suggestion regarding context cancellation handling
All previous review feedback addressed correctly, comprehensive error handling throughout, proper resource management with defer and mutex protection, extensive test coverage. One minor style suggestion about context cancellation timing (non-critical). The implementation is solid and follows Go best practices.
No files require special attention - all changes are well-structured and thoroughly tested

Important Files Changed

Filename	Overview
internal/llm/local.go	Implements full local embedding client with yzma (purego); addresses all previous review feedback (decode errors, close handling, token padding)
internal/llm/local_test.go	Expanded unit tests for config validation, path resolution, error cases, and YZMA_LIB fallback
internal/llm/local_integration_test.go	New integration tests for embedding, comparison, and behavior operations with real GGUF models

Sequence Diagram

sequenceDiagram
    participant Client as LocalClient
    participant Lib as yzma/llama
    participant Model as GGUF Model
    
    Note over Client: First call to Embed()
    Client->>Client: loadModel() with sync.Once
    Client->>Lib: Load(libPath) - package-level once
    Lib-->>Client: OK
    Client->>Lib: Init() - process global
    Client->>Lib: ModelLoadFromFile(path, params)
    Lib->>Model: Load GGUF
    Model-->>Lib: Model handle
    Lib-->>Client: model, vocab, nEmbd
    
    Note over Client: Per-call embedding context
    Client->>Client: Acquire mutex
    Client->>Lib: Tokenize(vocab, text)
    Lib-->>Client: tokens[]
    Client->>Lib: InitFromModel(model, ctxParams)
    Lib-->>Client: lctx (context handle)
    Client->>Lib: SetEmbeddings(lctx, true)
    Client->>Lib: Decode(lctx, batch)
    Lib->>Model: Process tokens
    Model-->>Lib: OK
    Client->>Lib: GetEmbeddingsSeq(lctx, 0, nEmbd)
    Lib-->>Client: rawVec (float32[])
    Client->>Client: Copy + L2 normalize
    Client->>Lib: Free(lctx)
    Client->>Client: Release mutex
    Client-->>Client: Return embedding vector
    
    Note over Client: On Close()
    Client->>Lib: ModelFree(model)
    Client->>Client: Reset once for reload

greptile-apps

_{5 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-12T01:55:25Z

internal/llm/local.go

+	select {
+	case <-ctx.Done():
+		return nil, ctx.Err()
+	default:
+	}


Context check happens before acquiring mutex. If blocked waiting for c.mu.Lock() and context cancelled during that wait, cancellation won't be detected until after lock acquired. Consider checking context again after lock or using select with ctx.Done() for long operations.

- Fix gofmt alignment in LocalClient struct fields - Check llama.Free and llama.ModelFree error returns (errcheck) - Suppress gosec G115 for safe int->int32/uint32 conversions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Change GPULayers field type to int32 (matches llama API) - Exclude gosec G115 (integer overflow) — no dataflow analysis, false positives on bounded values like len() - Remove unnecessary math import and bounds check wrappers Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

internal/config/config.go

 	if v := os.Getenv("FLOOP_LOCAL_GPU_LAYERS"); v != "" {
 		if n, err := strconv.Atoi(v); err == nil {
-			config.LLM.LocalGPULayers = n
+			config.LLM.LocalGPULayers = int32(min(n, math.MaxInt32))


In general, to avoid incorrect conversions between integer types, you should either parse directly into the target sized type using strconv.ParseInt/ParseUint with the correct bit size, or ensure you perform explicit bounds checks against the target type’s min/max constants before casting.

For this specific code, the best fix without changing functionality is:

Replace strconv.Atoi with strconv.ParseInt using base 10 and bit size 32.

Use the returned int64 value as already guaranteed to be within the int32 range (because we pass 32 as the bit size), then cast to int32.

Apply the existing min limiting logic on the int32 value (or on the parsed int64 if we ensure it’s within int32) without relying on Atoi’s architecture-dependent int.

Concretely, in internal/config/config.go around lines 242–245:

Change n, err := strconv.Atoi(v) to parsed, err := strconv.ParseInt(v, 10, 32).

Replace config.LLM.LocalGPULayers = int32(min(n, math.MaxInt32)) with a form that:

First converts parsed to int32.

Then applies min with an int32-typed math.MaxInt32 (or an equivalent constant), so we aren't mixing different integer types.

Because we must not modify code outside the shown snippets or invent unseen helpers, we’ll perform the clamp after converting to int32 and use a correctly typed int32 max value. No new imports are needed; math and strconv are already imported.

The IntentMatch threshold (>0.8) is too strict for small embedding models like all-MiniLM-L6-v2. Use MergeCandidate (>0.7) which is the actionable threshold for dedup and works across model sizes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

tsaucedogithub · 2026-02-14T04:20:05Z

local llm? seems like a bitter lesson tbh

nvandessel · 2026-02-19T01:31:30Z

this is gonna close soon in favor of new prs that will implement dedupe and also emebeddings for a vector search system to help bootstrap association and edges prior to them being learned through activation.

greptile-apps bot reviewed Feb 11, 2026

View reviewed changes

nvandessel force-pushed the feat/local-llm-foundations-v2 branch from c06d230 to 43db901 Compare February 11, 2026 16:04

nvandessel force-pushed the feat/local-llm-yzma branch from 188f32a to b3af4a6 Compare February 11, 2026 16:04

nvandessel force-pushed the feat/local-llm-foundations-v2 branch from 43db901 to 4dbaf80 Compare February 11, 2026 16:08

nvandessel force-pushed the feat/local-llm-yzma branch 2 times, most recently from 7023d3d to ffef5ea Compare February 11, 2026 16:11

greptile-apps bot reviewed Feb 11, 2026

View reviewed changes

nvandessel force-pushed the feat/local-llm-foundations-v2 branch from 4dbaf80 to 62e320c Compare February 11, 2026 16:28

nvandessel force-pushed the feat/local-llm-yzma branch from ffef5ea to c19e4dc Compare February 11, 2026 16:29

Base automatically changed from feat/local-llm-foundations-v2 to main February 12, 2026 01:41

nvandessel and others added 2 commits February 11, 2026 17:42

nvandessel force-pushed the feat/local-llm-yzma branch from 35c084e to b81226c Compare February 12, 2026 01:42

greptile-apps bot reviewed Feb 12, 2026

View reviewed changes

nvandessel and others added 3 commits February 11, 2026 18:00

fix: resolve lint errors in local LLM client

13d1386

- Fix gofmt alignment in LocalClient struct fields - Check llama.Free and llama.ModelFree error returns (errcheck) - Suppress gosec G115 for safe int->int32/uint32 conversions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: replace nolint comments with proper bounds checks

ac44ff1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-advanced-security bot found potential problems Feb 12, 2026

View reviewed changes

nvandessel marked this pull request as draft February 17, 2026 02:19

nvandessel mentioned this pull request Feb 17, 2026

chore: pre-article polish — visuals, changelog, and install fix #106

Merged

3 tasks

nvandessel closed this Feb 19, 2026

@@ -240,8 +240,10 @@
             		config.LLM.LocalEmbeddingModelPath = v
             	}
             	if v := os.Getenv("FLOOP_LOCAL_GPU_LAYERS"); v != "" {
-            		if n, err := strconv.Atoi(v); err == nil {
-            			config.LLM.LocalGPULayers = int32(min(n, math.MaxInt32))
+            		if parsed, err := strconv.ParseInt(v, 10, 32); err == nil {
+            			n32 := int32(parsed)
+            			const maxInt32 int32 = math.MaxInt32
+            			config.LLM.LocalGPULayers = min(n32, maxInt32)
             		}
             	}
             	if v := os.Getenv("FLOOP_LOCAL_CONTEXT_SIZE"); v != "" {

	ctxParams.NCtx = uint32(len(tokens) + 16) // fit the text with padding
	ctxParams.NCtx = uint32(len(tokens) + 64) // fit the text with padding for special tokens

Conversation

nvandessel commented Feb 11, 2026

Summary

Stacked on

Test plan

Uh oh!

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

nvandessel commented Feb 11, 2026

Uh oh!

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

nvandessel commented Feb 12, 2026

Uh oh!

greptile-apps bot commented Feb 12, 2026

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Check failure

Uh oh!

Copilot Autofix

tsaucedogithub commented Feb 14, 2026

Uh oh!

nvandessel commented Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments