Commit c2970e4

Feat/evals ready (#37)
## Pull Request Overview

This PR implements the "Evals Ready" feature by refactoring evaluation management to support per-tab evaluation agents and improving error handling with retry mechanisms.

Key changes:
- Replaced global evaluation agent pattern with per-tab EvaluationAgent instances in AIChatPanel
- Implemented robust retry logic with exponential backoff for failed evaluations
- Created comprehensive Python evaluation server implementation with browsecomp benchmark support
1 parent 7432840 commit c2970e4
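
The retry behavior described in the overview lives in the DevTools frontend code and is not shown in the diffs below. As a rough sketch of the pattern, assuming hypothetical names throughout (`run_with_retry`, `evaluation` are not the PR's actual identifiers), exponential backoff for a failed evaluation could look like:

```python
import asyncio
import random


async def run_with_retry(evaluation, max_attempts=3, base_delay=1.0):
    """Run an async evaluation, retrying failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await evaluation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Back off 1s, 2s, 4s, ... plus jitter, so concurrent per-tab
            # agents do not all retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            await asyncio.sleep(delay)
```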

File tree: 214 files changed (+9791 / -1198 lines)


eval-server/.gitignore (2 additions, 1 deletion)

```diff
@@ -1,2 +1,3 @@
 .env
-node_modules
+node_modules
+*.log
```

eval-server/README.md (219 additions, 47 deletions)

````diff
@@ -1,67 +1,239 @@
-# bo-eval-server
+# Eval-Server
 
-A WebSocket-based evaluation server for LLM agents using LLM-as-a-judge methodology.
+A WebSocket-based evaluation server for LLM agents with multiple language implementations.
 
-## Quick Start
+## Overview
+
+This directory contains two functionally equivalent implementations of the bo-eval-server:
+
+- **NodeJS** (`nodejs/`) - Full-featured implementation with YAML evaluations, HTTP API, CLI, and judge system
+- **Python** (`python/`) - Minimal library focused on core WebSocket functionality and programmatic evaluation creation
 
-1. **Install dependencies**
-   ```bash
-   npm install
-   ```
+Both implementations provide:
+- 🔌 **WebSocket Server** - Real-time agent connections
+- 🤖 **Bidirectional RPC** - JSON-RPC 2.0 for calling agent methods
+- 📚 **Programmatic API** - Create and manage evaluations in code
+- ⚡ **Concurrent Support** - Handle multiple agents simultaneously
+- 📊 **Structured Logging** - Comprehensive evaluation tracking
+
+## Quick Start
 
-2. **Configure environment**
-   ```bash
-   cp .env.example .env
-   # Edit .env and add your OPENAI_API_KEY
-   ```
+### NodeJS (Full Featured)
 
-3. **Start the server**
-   ```bash
-   npm start
-   ```
+The NodeJS implementation includes YAML evaluation loading, HTTP API wrapper, CLI tools, and LLM-as-a-judge functionality.
 
-4. **Use interactive CLI** (alternative to step 3)
-   ```bash
-   npm run cli
-   ```
+```bash
+cd nodejs/
+npm install
+npm start
+```
 
-## Features
+**Key Features:**
+- YAML evaluation file loading
+- HTTP API wrapper for REST integration
+- Interactive CLI for management
+- LLM judge system for response evaluation
+- Comprehensive documentation and examples
 
-- 🔌 WebSocket server for real-time agent connections
-- 🤖 Bidirectional RPC calls to connected agents
-- ⚖️ LLM-as-a-judge evaluation using OpenAI GPT-4
-- 📊 Structured JSON logging of all evaluations
-- 🖥️ Interactive CLI for testing and management
-- ⚡ Support for concurrent agent evaluations
+See [`nodejs/README.md`](nodejs/README.md) for detailed usage.
 
-## OpenAI Compatible API
+### Python (Lightweight Library)
 
-The server provides an OpenAI-compatible `/v1/responses` endpoint for direct API access:
+The Python implementation focuses on core WebSocket functionality with programmatic evaluation creation.
 
 ```bash
-curl -X POST 'http://localhost:8081/v1/responses' \
-  -H 'Content-Type: application/json' \
-  -d '{
-    "input": "What is 2+2?",
-    "main_model": "gpt-4.1",
-    "mini_model": "gpt-4.1-nano",
-    "nano_model": "gpt-4.1-nano",
-    "provider": "openai"
-  }'
+cd python/
+pip install -e .
+python examples/basic_server.py
 ```
 
-**Model Precedence:**
-1. **API calls** OR **individual test YAML models** (highest priority)
-2. **config.yaml defaults** (fallback when neither API nor test specify models)
+**Key Features:**
+- Minimal dependencies (websockets, loguru)
+- Full async/await support
+- Evaluation stack for LIFO queuing
+- Type hints throughout
+- Clean Pythonic API
+
+See [`python/README.md`](python/README.md) for detailed usage.
+
+## Architecture Comparison
+
+| Feature | NodeJS | Python |
+|---------|--------|--------|
+| **Core WebSocket Server** | ✅ | ✅ |
+| **JSON-RPC 2.0** | ✅ | ✅ |
+| **Client Management** | ✅ | ✅ |
+| **Programmatic Evaluations** | ✅ | ✅ |
+| **Evaluation Stack** | ✅ | ✅ |
+| **Structured Logging** | ✅ (Winston) | ✅ (Loguru) |
+| **YAML Evaluations** | ✅ | ❌ |
+| **HTTP API Wrapper** | ✅ | ❌ |
+| **CLI Interface** | ✅ | ❌ |
+| **LLM Judge System** | ✅ | ❌ |
+| **Type System** | TypeScript | Type Hints |
+
+## Choosing an Implementation
+
+**Choose NodeJS if you need:**
+- YAML-based evaluation definitions
+- HTTP REST API endpoints
+- Interactive CLI for management
+- LLM-as-a-judge evaluation
+- Comprehensive feature set
+
+**Choose Python if you need:**
+- Minimal dependencies
+- Pure programmatic approach
+- Integration with Python ML pipelines
+- Modern async/await patterns
+- Lightweight deployment
 
 ## Agent Protocol
 
-Your agent needs to:
+Both implementations use the same WebSocket protocol:
+
+### 1. Connect to WebSocket
+```javascript
+// NodeJS
+const ws = new WebSocket('ws://localhost:8080');
+
+// Python
+import websockets
+ws = await websockets.connect('ws://localhost:8080')
+```
+
+### 2. Send Registration
+```json
+{
+  "type": "register",
+  "clientId": "your-client-id",
+  "secretKey": "your-secret-key",
+  "capabilities": ["chat", "action"]
+}
+```
+
+### 3. Send Ready Signal
+```json
+{
+  "type": "ready"
+}
+```
+
+### 4. Handle RPC Calls
+Both implementations send JSON-RPC 2.0 requests with the `evaluate` method:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "method": "evaluate",
+  "params": {
+    "id": "eval_001",
+    "name": "Test Evaluation",
+    "tool": "chat",
+    "input": {"message": "Hello world"}
+  },
+  "id": "unique-call-id"
+}
+```
+
+Agents should respond with:
+```json
+{
+  "jsonrpc": "2.0",
+  "id": "unique-call-id",
+  "result": {
+    "status": "completed",
+    "output": {"response": "Hello! How can I help you?"}
+  }
+}
+```
+
+## Examples
+
+### NodeJS Example
+```javascript
+import { EvalServer } from 'bo-eval-server';
+
+const server = new EvalServer({
+  authKey: 'secret',
+  port: 8080
+});
+
+server.onConnect(async client => {
+  const result = await client.evaluate({
+    id: "test",
+    name: "Hello World",
+    tool: "chat",
+    input: {message: "Hi there!"}
+  });
+  console.log(result);
+});
+
+await server.start();
+```
+
+### Python Example
+```python
+import asyncio
+from bo_eval_server import EvalServer
+
+async def main():
+    server = EvalServer(
+        auth_key='secret',
+        port=8080
+    )
+
+    @server.on_connect
+    async def handle_client(client):
+        result = await client.evaluate({
+            "id": "test",
+            "name": "Hello World",
+            "tool": "chat",
+            "input": {"message": "Hi there!"}
+        })
+        print(result)
+
+    await server.start()
+    await server.wait_closed()
+
+asyncio.run(main())
+```
+
+## Development
+
+Each implementation has its own development setup:
+
+**NodeJS:**
+```bash
+cd nodejs/
+npm install
+npm run dev   # Watch mode
+npm test      # Run tests
+npm run cli   # Interactive CLI
+```
+
+**Python:**
+```bash
+cd python/
+pip install -e ".[dev]"
+pytest        # Run tests
+black .       # Format code
+mypy src/     # Type checking
+```
+
+## Contributing
+
+When contributing to either implementation:
+
+1. Maintain API compatibility between versions where possible
+2. Update documentation for both implementations when adding shared features
+3. Follow the existing code style and patterns
+4. Add appropriate tests and examples
+
+## License
 
-1. Connect to the WebSocket server (default: `ws://localhost:8080`)
-2. Send a `{"type": "ready"}` message when ready for evaluations
-3. Implement the `Evaluate` RPC method that accepts a string task and returns a string response
+MIT License - see individual implementation directories for details.
 
-## For more details
+---
 
-See [CLAUDE.md](./CLAUDE.md) for comprehensive documentation of the architecture and implementation.
+Both implementations provide robust, production-ready evaluation servers for LLM agents with different feature sets optimized for different use cases.
````
File renamed without changes.
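
Putting the four protocol steps from the new README together, a minimal Python agent could be sketched as follows. The message shapes are taken from the diff above; the client id, secret key, and canned response are illustrative assumptions, not part of the commit:

```python
import asyncio
import json

import websockets  # third-party: pip install websockets


async def run_agent():
    async with websockets.connect("ws://localhost:8080") as ws:
        # Steps 2-3: register with the server, then signal readiness.
        await ws.send(json.dumps({
            "type": "register",
            "clientId": "example-client",  # hypothetical id
            "secretKey": "secret",         # must match the server's auth key
            "capabilities": ["chat"],
        }))
        await ws.send(json.dumps({"type": "ready"}))

        # Step 4: answer JSON-RPC 2.0 `evaluate` calls as they arrive.
        async for raw in ws:
            message = json.loads(raw)
            if message.get("method") != "evaluate":
                continue  # ignore acks and other server notifications
            response = {
                "jsonrpc": "2.0",
                "id": message["id"],
                "result": {
                    "status": "completed",
                    "output": {"response": "Hello! How can I help you?"},
                },
            }
            await ws.send(json.dumps(response))


asyncio.run(run_agent())
```

A production agent would additionally report failures through the standard JSON-RPC 2.0 `error` member rather than a `result`.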
