diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..b8e32cf
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,144 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Commands
+
+```bash
+npm run dev # Start dev server at http://localhost:3000
+npm run lint # Run ESLint
+npm run build # Production build
+npm run format # Format TS/TSX with Prettier
+```
+
+## Architecture Overview
+
+### Tech Stack
+
+- **Framework**: Next.js 16.0.7 (App Router)
+- **React 19.2.0** + **TypeScript** (strict mode)
+- **Routing**: Next.js App Router (react-router-dom is listed as a dependency but currently unused)
+- **Styling**: Tailwind CSS 4.x + custom color system and styles defined in `/app/globals.css` for cases not supported by Tailwind
+- **Data Fetching**: Native Fetch API + SWR 2.3.6 (used selectively where required)
+- **Date/Time**: date-fns 4.1.0, date-fns-tz 3.2.0
+- **ESLint**: Used for maintaining code quality, enforcing consistent coding standards, and catching potential issues during development and build time
+
+### Directory Structure
+
+| Path | Purpose |
+| ------------------------------ | --------------------------------------------------------------------------------------------- |
+| `app/(auth)/` | Authentication-related routes (e.g., invite, verify flows) |
+| `app/(main)/` | Main application routes (dashboard-level features like datasets, evaluations, settings, etc.) |
+| `app/api/` | Backend API route handlers (Next.js route handlers acting as BFF layer) |
+| `app/components/` | App-scoped components used within routes/pages |
+| `app/components/icons/` | Hand-authored React icon components |
+| `app/hooks/` | Custom React hooks specific to app features |
+| `app/lib/` | Core shared logic and utilities across the application |
+| `app/lib/context/` | React context providers (global state handling) |
+| `app/lib/store/` | State management logic (custom/global store) |
+| `app/lib/types/` | TypeScript type definitions (shared across modules) |
+| `app/lib/utils/` | Domain-specific utility modules (e.g., evaluation, guardrails) |
+| `app/lib/data/` | Static data and validators (e.g., guardrails validators) |
+| `app/lib/apiClient.ts` | Centralized API client for forwarding requests to the backend |
+| `app/lib/authCookie.ts` | Authentication cookie utilities (get/set/remove tokens) |
+| `app/lib/configFetchers.ts` | API fetchers related to configuration modules |
+| `app/lib/constants.ts` | Global constants used across the app |
+| `app/lib/guardrailsClient.ts` | Client-side API helpers for guardrails features |
+| `app/lib/models.ts` | Data models/interfaces for structured data handling |
+| `app/lib/navConfig.ts` | Navigation configuration (sidebar/menu structure) |
+| `app/lib/promptEditorUtils.ts` | Utility functions for prompt editor logic |
+| `app/lib/utils.ts` | General utility/helper functions |
+| `public/favicon.ico` | Application favicon |
+
+## Import Aliases
+
+[tsconfig.json](./tsconfig.json) sets paths: `{ "@/*": ["./*"] }`, so imports are resolved from the project root using the `@/` prefix. Use:
+
+```
+import { apiClient } from '@/app/lib/apiClient';
+import { Providers } from '@/app/components/providers';
+import { APP_NAME } from '@/app/lib/constants';
+```
+
+SVGs follow Next.js defaults (imported as static assets via `next/image` or referenced from `/public`).
+
+## Routing & Role-Based Access
+
+Routing uses the **Next.js App Router** exclusively. Routes are organized via route groups:
+
+- `app/(auth)/` - unauthenticated flows (`/invite`, `/verify`)
+- `app/(main)/` - authenticated app surface (`/evaluations`, `/datasets`, `/configurations`, `/guardrails`, `/knowledge-base`, `/settings`, etc.)
+
+Role gating lives in `middleware.ts` and reads a `kaapi_role` cookie with two values:
+
+- `user` - standard authenticated user
+- `superuser` - admin; required for `/settings/*`
+
+The cookie is issued server-side by [authCookie.ts](app/lib/authCookie.ts) after login/verify based on `user.is_superuser`. Middleware classifies each request into one of:
+
+- `PUBLIC_ROUTES` - open to everyone (`/evaluations`, `/invite`, `/verify`, `/coming-soon/*`)
+- `GUEST_ONLY_ROUTES` - unauthenticated only (`/keystore`); authenticated users are redirected to `/evaluations`
+- `/settings/*` - superuser only
+- Everything else - any authenticated user
+
+There is no dynamic/custom role system; only the two static roles above.
+
+## Toast Notifications
+
+Toasts are managed via a React Context provider ([Toast.tsx](app/components/Toast.tsx)), mounted once in [Providers.tsx](app/components/providers/Providers.tsx). Consume them from any client component:
+
+```
+import { useToast } from '@/app/components/Toast';
+// or the re-export: import { useToast } from '@/app/hooks/useToast';
+
+function MyComponent() {
+  const toast = useToast();
+
+  toast.success('Saved successfully'); // success toast
+  toast.error('Something went wrong'); // error toast
+  toast.warning('Heads up'); // warning toast
+  toast.info('FYI'); // info toast
+
+  // Optional: override the default 5000ms auto-dismiss
+  toast.success('Saved', 3000);
+
+  // Low-level API (type + duration)
+  toast.addToast('Custom message', 'success', 4000);
+}
+```
+
+## Authentication [AuthContext.tsx](app/lib/context/AuthContext.tsx)
+
+There is no `AuthService` class. Auth state is owned by a React Context provider (`AuthProvider`) mounted in [Providers.tsx](app/components/providers/Providers.tsx), and consumed via the `useAuth()` hook:
+
+```
+import { useAuth } from '@/app/lib/context/AuthContext';
+
+function MyComponent() {
+  const {
+    isAuthenticated, isHydrated,
+    session, currentUser, googleProfile,
+    apiKeys, activeKey, addKey, removeKey, setKeys,
+    loginWithToken, logout,
+  } = useAuth();
+}
+```
+
+## App Context [AppContext.tsx](app/lib/context/AppContext.tsx)
+
+Sidebar state is managed via `AppProvider`, consumed with `useApp()`:
+
+```
+import { useApp } from '@/app/lib/context/AppContext';
+
+const { sidebarCollapsed, setSidebarCollapsed, toggleSidebar } = useApp();
+```
+
+## API Client & Error Handling
+
+The BFF layer uses [apiClient.ts](app/lib/apiClient.ts) which forwards requests from Next.js route handlers to the backend at `BACKEND_URL` (defaults to `http://localhost:8000`). Key patterns:
+
+- **Server-side (route handlers)**: Use `apiClient(request, endpoint, options)`. It relays `X-API-KEY` and `Cookie` headers automatically and returns `{ status, data, headers }`.
+- **Client-side**: Use `clientFetch(endpoint, options)`. It handles token refresh on 401, dispatches `AUTH_EXPIRED_EVENT` when refresh fails, and throws with a message extracted from `error`, `message`, or `detail` fields in the response body.
+- **Error extraction**: `extractErrorMessage(body, fallback)` reads `body.error || body.message || body.detail`; follow this pattern when adding new API routes.
+- **Auth expiry**: On 401 with failed refresh, a `CustomEvent(AUTH_EXPIRED_EVENT)` is dispatched on `window`, which `AuthContext` listens to for automatic logout.
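Reviewer note: the error-extraction rule in the CLAUDE.md hunk above (`body.error || body.message || body.detail`) can be sketched as follows. This is a minimal illustration, not the code in `app/lib/apiClient.ts`, which this diff does not show. The `ErrorBody` type and the choice to fall back on non-string values are assumptions made for the example.

```typescript
// Hypothetical sketch of the documented pattern; the real extractErrorMessage
// in app/lib/apiClient.ts is not part of this diff and may differ.
type ErrorBody = { error?: unknown; message?: unknown; detail?: unknown };

function extractErrorMessage(body: unknown, fallback: string): string {
  // Non-object bodies (null, strings, numbers) carry no named fields to read.
  if (body === null || typeof body !== "object") return fallback;

  const { error, message, detail } = body as ErrorBody;

  // Same precedence as `body.error || body.message || body.detail`:
  // the first truthy field wins.
  const candidate = error || message || detail;

  // Assumption: only non-empty strings are usable as a message;
  // anything else (objects, numbers) falls back instead of being stringified.
  return typeof candidate === "string" && candidate.length > 0
    ? candidate
    : fallback;
}
```

A route handler following this pattern would call `extractErrorMessage(data, "Request failed")` on a non-2xx backend response, so that all three backend error shapes surface the same way in the UI.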
diff --git a/app/(main)/configurations/page.tsx b/app/(main)/configurations/page.tsx
index 2a68c20..a36ab48 100644
--- a/app/(main)/configurations/page.tsx
+++ b/app/(main)/configurations/page.tsx
@@ -13,7 +13,7 @@ import { colors } from "@/app/lib/colors";
 import { usePaginatedList, useInfiniteScroll } from "@/app/hooks";
 import ConfigCard from "@/app/components/ConfigCard";
 import Loader, { LoaderBox } from "@/app/components/Loader";
-import { EvalJob } from "@/app/components/types";
+import { EvalJob } from "@/app/lib/types/evaluation";
 import {
   ConfigPublic,
   ConfigVersionItems,
diff --git a/app/(main)/evaluations/[id]/page.tsx b/app/(main)/evaluations/[id]/page.tsx
index d2e583f..517c927 100644
--- a/app/(main)/evaluations/[id]/page.tsx
+++ b/app/(main)/evaluations/[id]/page.tsx
@@ -10,19 +10,21 @@ import { useRouter, useParams } from "next/navigation";
 import { apiFetch } from "@/app/lib/apiClient";
 import { useAuth } from "@/app/lib/context/AuthContext";
 import { useApp } from "@/app/lib/context/AppContext";
-import {
+import type {
   EvalJob,
   AssistantConfig,
+  GroupedTraceItem,
+} from "@/app/lib/types/evaluation";
+import {
   hasSummaryScores,
   isNewScoreObjectV2,
   getScoreObject,
   normalizeToIndividualScores,
-  GroupedTraceItem,
   isGroupedFormat,
-} from "@/app/components/types";
+} from "@/app/lib/utils/evaluation";
 import ConfigModal from "@/app/components/ConfigModal";
 import Sidebar from "@/app/components/Sidebar";
-import DetailedResultsTable from "@/app/components/DetailedResultsTable";
+import DetailedResultsTable from "@/app/components/evaluations/DetailedResultsTable";
 import { colors } from "@/app/lib/colors";
 import { useToast } from "@/app/components/Toast";
 import Loader from "@/app/components/Loader";
@@ -126,7 +128,6 @@ export default function EvaluationReport() {
     if (isAuthenticated && jobId) fetchJobDetails();
   }, [isAuthenticated, jobId, fetchJobDetails]);
 
-  // Export grouped format CSV
   const exportGroupedCSV = (traces: GroupedTraceItem[]) => {
if (!job) return; try { @@ -391,9 +392,9 @@ export default function EvaluationReport() { > -
+

- {/* Actions */} -
+
diff --git a/app/(main)/evaluations/page.tsx b/app/(main)/evaluations/page.tsx
index d7900f3..13ca97c 100644
--- a/app/(main)/evaluations/page.tsx
+++ b/app/(main)/evaluations/page.tsx
@@ -49,12 +49,8 @@ function SimplifiedEvalContent() {
   const [duplicationFactor, setDuplicationFactor] = useState("1");
   const [uploadedFile, setUploadedFile] = useState(null);
   const [isUploading, setIsUploading] = useState(false);
-
-  // Stored datasets
   const [storedDatasets, setStoredDatasets] = useState([]);
   const [isDatasetsLoading, setIsDatasetsLoading] = useState(false);
-
-  // Evaluation config state
   const [selectedDatasetId, setSelectedDatasetId] = useState(() => {
     return searchParams.get("dataset") || "";
   });
@@ -235,6 +231,10 @@ function SimplifiedEvalContent() {
       });
 
       setIsEvaluating(false);
+      setExperimentName("");
+      setSelectedDatasetId("");
+      setSelectedConfigId("");
+      setSelectedConfigVersion(0);
       toast.success(`Evaluation created!`);
       return true;
     } catch (error: unknown) {
diff --git a/app/components/CodeBlock.tsx b/app/components/CodeBlock.tsx
new file mode 100644
index 0000000..e76d9a3
--- /dev/null
+++ b/app/components/CodeBlock.tsx
@@ -0,0 +1,13 @@
+import type { ReactNode } from "react";
+
+interface CodeBlockProps {
+  children: ReactNode;
+}
+
+export default function CodeBlock({ children }: CodeBlockProps) {
+  return (
+
+ {children} +
+ ); +} diff --git a/app/components/ConfigModal.tsx b/app/components/ConfigModal.tsx index 817b24f..0f3f412 100644 --- a/app/components/ConfigModal.tsx +++ b/app/components/ConfigModal.tsx @@ -7,7 +7,11 @@ import React, { useState, useEffect } from "react"; import { colors } from "@/app/lib/colors"; -import { EvalJob, AssistantConfig } from "./types"; +import CopyableCodeBlock from "@/app/components/CopyableCodeBlock"; +import CodeBlock from "@/app/components/CodeBlock"; +import Tag from "@/app/components/Tag"; +import { CloseIcon } from "@/app/components/icons"; +import { EvalJob, AssistantConfig } from "@/app/lib/types/evaluation"; import { useAuth } from "@/app/lib/context/AuthContext"; import { apiFetch } from "@/app/lib/apiClient"; import { @@ -35,6 +39,24 @@ interface ConfigVersionInfo { knowledge_base_ids?: string[]; } +const ConfigField = ({ + label, + children, +}: { + label: string; + children: React.ReactNode; +}) => ( +
+
+ {label} +
+ {children} +
+); + export default function ConfigModal({ isOpen, onClose, @@ -80,15 +102,14 @@ export default function ConfigModal({ const params: CompletionParams = blob?.completion?.params || ({} as CompletionParams); - // Extract knowledge base IDs from multiple sources const knowledgeBaseIds: string[] = []; - // 1. Check direct params.knowledge_base_ids + // Check direct params.knowledge_base_ids if (Array.isArray(params.knowledge_base_ids)) { knowledgeBaseIds.push(...params.knowledge_base_ids); } - // 2. Check tools array for knowledge_base_ids + // Check tools array for knowledge_base_ids if (params.tools) { const toolKbIds = params.tools .filter( @@ -100,7 +121,6 @@ export default function ConfigModal({ knowledgeBaseIds.push(...toolKbIds); } - // Remove duplicates const uniqueKbIds = [...new Set(knowledgeBaseIds)]; setConfigVersionInfo({ @@ -128,51 +148,9 @@ export default function ConfigModal({ if (!isOpen) return null; - const ConfigField = ({ - label, - children, - }: { - label: string; - children: React.ReactNode; - }) => ( -
-
- {label} -
- {children} -
- ); - - const CodeBlock = ({ children }: { children: React.ReactNode }) => ( -
- {children} -
- ); - - const Tag = ({ children }: { children: React.ReactNode }) => ( - - {children} - - ); - return (
e.stopPropagation()} > - {/* Header */}
- - - +
- {/* Content */}
{isLoadingConfig ? (
@@ -295,9 +259,11 @@ export default function ConfigModal({ {configVersionInfo?.knowledge_base_ids && configVersionInfo.knowledge_base_ids.length > 0 && ( - + {configVersionInfo.knowledge_base_ids.join("\n")} - + )} @@ -305,11 +271,17 @@ export default function ConfigModal({ assistantConfig?.instructions || job.config?.instructions) && ( - + {configVersionInfo?.instructions || assistantConfig?.instructions || job.config?.instructions} - + )} diff --git a/app/components/CopyableCodeBlock.tsx b/app/components/CopyableCodeBlock.tsx new file mode 100644 index 0000000..7578033 --- /dev/null +++ b/app/components/CopyableCodeBlock.tsx @@ -0,0 +1,49 @@ +"use client"; + +import React, { useState, useCallback } from "react"; +import { useToast } from "@/app/hooks/useToast"; +import { CheckIcon, CopyIcon } from "@/app/components/icons"; + +interface CopyableCodeBlockProps { + children: React.ReactNode; + copyText: string; +} + +export default function CopyableCodeBlock({ + children, + copyText, +}: CopyableCodeBlockProps) { + const toast = useToast(); + const [copied, setCopied] = useState(false); + + const handleCopy = useCallback(async () => { + try { + await navigator.clipboard.writeText(copyText); + setCopied(true); + toast.success("Copied to clipboard"); + setTimeout(() => setCopied(false), 2000); + } catch { + toast.error("Failed to copy"); + } + }, [copyText, toast]); + + return ( +
+
+ {children} +
+ +
+ ); +} diff --git a/app/components/DetailedResultsTable.tsx b/app/components/DetailedResultsTable.tsx deleted file mode 100644 index c4b9de0..0000000 --- a/app/components/DetailedResultsTable.tsx +++ /dev/null @@ -1,639 +0,0 @@ -/** - * DetailedResultsTable.tsx - Table view for evaluation results - * - * Displays Q&A pairs with scores in a tabular format - * Supports both row format (individual traces) and grouped format (multiple answers per question) - */ - -import React, { useState, useEffect } from "react"; -import { - TraceScore, - getScoreObject, - normalizeToIndividualScores, - hasSummaryScores, - isNewScoreObjectV2, - isGroupedFormat, - GroupedTraceItem, - EvalJob, -} from "@/app/components/types"; - -// Helper function to format score value with color -const formatScoreValue = (score: TraceScore | undefined) => { - if (!score) return { value: "N/A", color: "#737373", bg: "transparent" }; - - if (score.data_type === "CATEGORICAL") { - const catValue = String(score.value); - let color = "#171717"; - let bg = "#fafafa"; - - if (catValue === "CORRECT") { - color = "#15803d"; - bg = "#dcfce7"; - } else if (catValue === "PARTIAL") { - color = "#92400e"; - bg = "#fef3c7"; - } else if (catValue === "INCORRECT") { - color = "#dc2626"; - bg = "#fee2e2"; - } - - return { value: catValue, color, bg }; - } - - // NUMERIC - const numValue = Number(score.value); - const formattedValue = numValue.toFixed(2); - let color = "#171717"; - let bg = "transparent"; - - // Color based on value - if (numValue >= 0.7) { - color = "#15803d"; - bg = "#dcfce7"; - } else if (numValue >= 0.5) { - color = "#92400e"; - bg = "#fef3c7"; - } else { - color = "#dc2626"; - bg = "#fee2e2"; - } - - return { value: formattedValue, color, bg }; -}; - -interface DetailedResultsTableProps { - job: EvalJob; -} - -export default function DetailedResultsTable({ - job, -}: DetailedResultsTableProps) { - const [openCommentId, setOpenCommentId] = useState(null); - const [commentPos, setCommentPos] = 
useState({ top: 0, left: 0 }); - - useEffect(() => { - if (!openCommentId) return; - const handleScroll = () => setOpenCommentId(null); - window.addEventListener("scroll", handleScroll, true); - return () => { - window.removeEventListener("scroll", handleScroll, true); - }; - }, [openCommentId]); - - const scoreObject = getScoreObject(job); - - // 1. First check: Does it have summary_scores at all? - if (!scoreObject || !hasSummaryScores(scoreObject)) { - return ( -
-

- No detailed results available or using legacy format -

-
- ); - } - - // 2. Second check: Does it have traces? (NewScoreObjectV2) - if (isNewScoreObjectV2(scoreObject)) { - // Check if grouped format - if (isGroupedFormat(scoreObject.traces)) { - return ( - - ); - } - // Otherwise show row format - } - - // 3. Try to normalize to IndividualScore format - // This handles NewScoreObjectV2 (with traces) - const individual_scores = normalizeToIndividualScores(scoreObject); - - // 4. If no individual scores available (e.g., BasicScoreObject with only summary_scores) - if (!individual_scores || individual_scores.length === 0) { - return ( -
-

- No individual scores available. Only summary metrics are available for - this evaluation. -

-
- ); - } - - // Get all unique score names from the first item - const scoreNames = - individual_scores[0]?.trace_scores?.map((s) => s.name) || []; - - // Helper function to get score value by name - const getScoreByName = ( - scores: TraceScore[], - name: string, - ): TraceScore | undefined => { - if (!scores || !Array.isArray(scores)) return undefined; - return scores.find((s) => s?.name === name); - }; - - return ( -
- {/* Table Container */} -
- - {/* Table Header */} - - - - - - - {scoreNames.map((scoreName) => ( - - ))} - - - - {/* Table Body */} - - {individual_scores.map((item, index) => { - const question = item.input?.question || "N/A"; - const answer = item.output?.answer || "N/A"; - const groundTruth = item.metadata?.ground_truth || "N/A"; - - return ( - { - const row = e.currentTarget; - row.style.backgroundColor = "#fafafa"; - }} - onMouseLeave={(e) => { - const row = e.currentTarget; - row.style.backgroundColor = "#ffffff"; - }} - > - - - {/* Question */} - - - {/* Ground Truth */} - - - {/* Answer */} - - - {/* Score Columns */} - {scoreNames.map((scoreName) => { - const score = getScoreByName(item.trace_scores, scoreName); - const { value, color, bg } = formatScoreValue(score); - - return ( - - ); - })} - - ); - })} - -
- Question - - Ground Truth - - Answer - - {scoreName} -
- {index + 1} - -
- {question} -
-
-
- {groundTruth} -
-
-
- {answer} -
-
-
-
- {value} -
- {score?.comment && ( - <> -
{ - const rect = - e.currentTarget.getBoundingClientRect(); - const tooltipWidth = 300; - const centerX = rect.left + rect.width / 2; - const clampedLeft = Math.min( - Math.max(centerX - tooltipWidth / 2, 8), - window.innerWidth - tooltipWidth - 8, - ); - setCommentPos({ - top: rect.top - 8, - left: clampedLeft, - }); - setOpenCommentId(`${index}-${scoreName}`); - }} - onMouseLeave={() => setOpenCommentId(null)} - > - i -
- {openCommentId === `${index}-${scoreName}` && ( -
- {score.comment} -
- )} - - )} -
-
-
-
- ); -} - -function GroupedResultsTable({ traces }: { traces: GroupedTraceItem[] }) { - const [openCommentId, setOpenCommentId] = useState(null); - const [commentPos, setCommentPos] = useState({ top: 0, left: 0 }); - - useEffect(() => { - if (!openCommentId) return; - const handleScroll = () => setOpenCommentId(null); - window.addEventListener("scroll", handleScroll, true); - return () => { - window.removeEventListener("scroll", handleScroll, true); - }; - }, [openCommentId]); - - if (!traces || traces.length === 0) { - return ( -
-

- No grouped results available -

-
- ); - } - - // Get max answers count - const maxAnswers = Math.max(...traces.map((t) => t.llm_answers.length)); - - // Fixed column widths (in pixels) for predictable layout - const COLUMN_WIDTHS = { - qId: 60, - question: 200, - groundTruth: 200, - answer: 250, - }; - - // Calculate minimum table width based on number of answers - // This ensures horizontal scroll activates at the right point - const fixedColumnsWidth = - COLUMN_WIDTHS.qId + COLUMN_WIDTHS.question + COLUMN_WIDTHS.groundTruth; - const tableMinWidth = fixedColumnsWidth + maxAnswers * COLUMN_WIDTHS.answer; - - return ( -
- {/* Table Container - overflow-x-auto enables horizontal scroll when table exceeds viewport */} -
- - {/* Table Header - matching row format styling */} - - - - - - {Array.from({ length: maxAnswers }, (_, i) => ( - - ))} - - - - {/* Table Body */} - - {traces.map((group, index) => ( - - {/* Text row */} - - {/* Question ID */} - - - {/* Question */} - - - {/* Ground Truth */} - - - {/* Answer text only */} - {Array.from({ length: maxAnswers }, (_, answerIndex) => { - const answer = group.llm_answers[answerIndex]; - return ( - - ); - })} - - {/* Scores row */} - - {/* Empty cells for Q.ID, Question, Ground Truth */} - - ); - })} - - - ))} - -
- Q.ID - - Question - - Ground Truth - - Answer {i + 1} -
- {group.question_id} - -
- {group.question} -
-
-
- {group.ground_truth_answer} -
-
- {answer ? ( -
- {answer} -
- ) : ( - - - )} -
- - - - {/* Score cells */} - {Array.from({ length: maxAnswers }, (_, answerIndex) => { - const answerScores: TraceScore[] = - group.scores?.[answerIndex] || []; - const answer = group.llm_answers[answerIndex]; - - return ( - - {answer && answerScores.length > 0 ? ( -
- {answerScores.map( - (score: TraceScore, scoreIdx: number) => { - if (!score) return null; - const { value, color, bg } = - formatScoreValue(score); - return ( -
- - {score.name}: - -
-
- {value} -
- {score?.comment && - (() => { - const commentId = `g${index}-a${answerIndex}-s${scoreIdx}`; - return ( - <> -
{ - const rect = - e.currentTarget.getBoundingClientRect(); - const tooltipWidth = 300; - const centerX = - rect.left + rect.width / 2; - const clampedLeft = Math.min( - Math.max( - centerX - - tooltipWidth / 2, - 8, - ), - window.innerWidth - - tooltipWidth - - 8, - ); - setCommentPos({ - top: rect.top - 8, - left: clampedLeft, - }); - setOpenCommentId(commentId); - }} - onMouseLeave={() => - setOpenCommentId(null) - } - > - i -
- {openCommentId === commentId && ( -
- {score.comment} -
- )} - - ); - })()} -
-
- ); - }, - )} -
- ) : null} -
-
-
- ); -} diff --git a/app/components/InfoTooltip.tsx b/app/components/InfoTooltip.tsx index d070496..902841d 100644 --- a/app/components/InfoTooltip.tsx +++ b/app/components/InfoTooltip.tsx @@ -11,15 +11,16 @@ export default function InfoTooltip({ text }: InfoTooltipProps) {
{text} +
); diff --git a/app/components/StatusBadge.tsx b/app/components/StatusBadge.tsx index 48df1e5..12b3704 100644 --- a/app/components/StatusBadge.tsx +++ b/app/components/StatusBadge.tsx @@ -13,20 +13,14 @@ interface StatusBadgeProps { } export default function StatusBadge({ status, size = "sm" }: StatusBadgeProps) { - const colors = getStatusColor(status); + const statusColor = getStatusColor(status); const sizeClasses = size === "md" ? "px-3 py-1.5 text-sm" : "px-2 py-1 text-xs"; return (
{status.toUpperCase()}
diff --git a/app/components/Tag.tsx b/app/components/Tag.tsx new file mode 100644 index 0000000..6932e29 --- /dev/null +++ b/app/components/Tag.tsx @@ -0,0 +1,13 @@ +import type { ReactNode } from "react"; + +interface TagProps { + children: ReactNode; +} + +export default function Tag({ children }: TagProps) { + return ( + + {children} + + ); +} diff --git a/app/components/Toast.tsx b/app/components/Toast.tsx index 2951ae8..bf3217b 100644 --- a/app/components/Toast.tsx +++ b/app/components/Toast.tsx @@ -88,7 +88,7 @@ function ToastContainer({ removeToast: (id: string) => void; }) { return ( -
+
{toasts.map((toast) => ( void }) {
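Reviewer note: the relocated `DetailedResultsTable` and `GroupedResultsTable` import `formatScoreValue` from `@/app/lib/utils`, but that module is not part of this diff. The sketch below mirrors the thresholds and colors of the inline helper deleted from `app/components/DetailedResultsTable.tsx`, so it shows the behavior the callers presumably rely on. The local `TraceScore` shape, the `Palette` type, and the lookup-table layout are assumptions for illustration, not the moved implementation.

```typescript
// Assumed shape; the real TraceScore lives in app/lib/types/evaluation
// and is not shown in this diff.
interface TraceScore {
  name: string;
  value: number | string;
  data_type: "NUMERIC" | "CATEGORICAL";
  comment?: string;
}

// Hypothetical helper types for this sketch.
interface Palette {
  color: string;
  bg: string;
}
interface FormattedScore extends Palette {
  value: string;
}

// Colors copied from the deleted inline helper.
const GOOD: Palette = { color: "#15803d", bg: "#dcfce7" };
const WARN: Palette = { color: "#92400e", bg: "#fef3c7" };
const BAD: Palette = { color: "#dc2626", bg: "#fee2e2" };
const NEUTRAL: Palette = { color: "#171717", bg: "#fafafa" };

const CATEGORY_PALETTES = new Map<string, Palette>([
  ["CORRECT", GOOD],
  ["PARTIAL", WARN],
  ["INCORRECT", BAD],
]);

function formatScoreValue(score: TraceScore | undefined): FormattedScore {
  // Missing score: grey "N/A" with no background.
  if (!score) return { value: "N/A", color: "#737373", bg: "transparent" };

  if (score.data_type === "CATEGORICAL") {
    const label = String(score.value);
    // Unknown labels keep the neutral palette, as in the deleted code.
    return { value: label, ...(CATEGORY_PALETTES.get(label) ?? NEUTRAL) };
  }

  // NUMERIC: two decimals, green at >= 0.7, amber at >= 0.5, red below.
  const n = Number(score.value);
  const palette = n >= 0.7 ? GOOD : n >= 0.5 ? WARN : BAD;
  return { value: n.toFixed(2), ...palette };
}
```

Keeping the palettes in one table makes it harder for the row and grouped tables to drift apart on colors, which is presumably the point of moving the helper out of the component.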
diff --git a/app/components/evaluations/DetailedResultsTable.tsx b/app/components/evaluations/DetailedResultsTable.tsx new file mode 100644 index 0000000..9d50ebd --- /dev/null +++ b/app/components/evaluations/DetailedResultsTable.tsx @@ -0,0 +1,235 @@ +/** + * DetailedResultsTable.tsx - Table view for evaluation results + * + * Displays Q&A pairs with scores in a tabular format + * Supports both row format (individual traces) and grouped format (multiple answers per question) + */ + +import { useState, useEffect } from "react"; +import type { GroupedTraceItem, EvalJob } from "@/app/lib/types/evaluation"; +import { + getScoreObject, + normalizeToIndividualScores, + hasSummaryScores, + isNewScoreObjectV2, + isGroupedFormat, +} from "@/app/lib/utils/evaluation"; +import { formatScoreValue, getScoreByName } from "@/app/lib/utils"; +import GroupedResultsTable from "@/app/components/evaluations/GroupedResultsTable"; + +interface DetailedResultsTableProps { + job: EvalJob; +} + +export default function DetailedResultsTable({ + job, +}: DetailedResultsTableProps) { + const [openCommentId, setOpenCommentId] = useState(null); + const [commentPos, setCommentPos] = useState({ top: 0, left: 0 }); + + useEffect(() => { + if (!openCommentId) return; + const handleScroll = () => setOpenCommentId(null); + window.addEventListener("scroll", handleScroll, true); + return () => { + window.removeEventListener("scroll", handleScroll, true); + }; + }, [openCommentId]); + + const scoreObject = getScoreObject(job); + + if (!scoreObject || !hasSummaryScores(scoreObject)) { + return ( +
+

+ No detailed results available or using legacy format +

+
+ ); + } + + if (isNewScoreObjectV2(scoreObject)) { + if (isGroupedFormat(scoreObject.traces)) { + return ( + + ); + } + } + + const individual_scores = normalizeToIndividualScores(scoreObject); + + if (!individual_scores || individual_scores.length === 0) { + return ( +
+

+ No individual scores available. Only summary metrics are available for + this evaluation. +

+
+ ); + } + + // Get all unique score names from the first item + const scoreNames = + individual_scores[0]?.trace_scores?.map((s) => s.name) || []; + + const COLUMN_WIDTHS = { + index: 50, + question: 250, + groundTruth: 250, + answer: 250, + score: 160, + }; + const tableMinWidth = + COLUMN_WIDTHS.index + + COLUMN_WIDTHS.question + + COLUMN_WIDTHS.groundTruth + + COLUMN_WIDTHS.answer + + scoreNames.length * COLUMN_WIDTHS.score; + + return ( +
+
+ + + + + + + + {scoreNames.map((scoreName) => ( + + ))} + + + + + {individual_scores.map((item, index) => { + const question = item.input?.question || "N/A"; + const answer = item.output?.answer || "N/A"; + const groundTruth = item.metadata?.ground_truth || "N/A"; + + return ( + + + + + + + + + + {scoreNames.map((scoreName) => { + const score = getScoreByName(item.trace_scores, scoreName); + const { value, color, bg } = formatScoreValue(score); + + return ( + + ); + })} + + ); + })} + +
+ Question + + Ground Truth + + Answer + + {scoreName} +
+ {index + 1} + +
+ {question} +
+
+
+ {groundTruth} +
+
+
+ {answer} +
+
+
+
+ {value} +
+ {score?.comment && ( + <> +
{ + const rect = + e.currentTarget.getBoundingClientRect(); + const tooltipWidth = 300; + const centerX = rect.left + rect.width / 2; + const clampedLeft = Math.min( + Math.max(centerX - tooltipWidth / 2, 8), + window.innerWidth - tooltipWidth - 8, + ); + setCommentPos({ + top: rect.top - 8, + left: clampedLeft, + }); + setOpenCommentId(`${index}-${scoreName}`); + }} + onMouseLeave={() => setOpenCommentId(null)} + > + i +
+ {openCommentId === `${index}-${scoreName}` && ( +
+ {score.comment} +
+ )} + + )} +
+
+
+
+ ); +} diff --git a/app/components/evaluations/EvalDatasetDescription.tsx b/app/components/evaluations/EvalDatasetDescription.tsx index 4579101..b0b99a0 100644 --- a/app/components/evaluations/EvalDatasetDescription.tsx +++ b/app/components/evaluations/EvalDatasetDescription.tsx @@ -15,7 +15,7 @@ export default function EvalDatasetDescription({ return (
diff --git a/app/components/evaluations/EvalRunCard.tsx b/app/components/evaluations/EvalRunCard.tsx index 65990d1..64dba5b 100644 --- a/app/components/evaluations/EvalRunCard.tsx +++ b/app/components/evaluations/EvalRunCard.tsx @@ -2,18 +2,14 @@ import { useState } from "react"; import { useRouter } from "next/navigation"; -import { colors } from "@/app/lib/colors"; -import { - EvalJob, - AssistantConfig, - getScoreObject, -} from "@/app/components/types"; -import { getStatusColor, formatCostUSD } from "@/app/components/utils"; -import { timeAgo } from "@/app/lib/utils"; -import ConfigModal from "@/app/components/ConfigModal"; -import ScoreDisplay from "@/app/components/ScoreDisplay"; +import type { EvalJob, AssistantConfig } from "@/app/lib/types/evaluation"; +import { getScoreObject } from "@/app/lib/utils/evaluation"; +import { getStatusColor } from "@/app/components/utils"; +import { timeAgo, formatCostUSD } from "@/app/lib/utils"; +import { ConfigModal, InfoTooltip } from "@/app/components"; +import ScoreDisplay from "@/app/components/evaluations/ScoreDisplay"; import CostIcon from "@/app/components/icons/evaluations/CostIcon"; -import InfoTooltip from "@/app/components/InfoTooltip"; +import DatabaseIcon from "@/app/components/icons/evaluations/DatabaseIcon"; export interface EvalRunCardProps { job: EvalJob; @@ -33,91 +29,55 @@ export default function EvalRunCard({ return (
- {/* Row 1: Run Name (left) | Status (right) */}
-
+
{job.run_name}
{job.inserted_at && ( -
+
{timeAgo(job.inserted_at)}
)} {/* Error message (if failed) */} {job.error_message && ( -
+
{job.error_message}
)}
{job.status}
- {/* Row 2: Scores */} {scoreObj && (
)} - {/* Row 3: Dataset + Config + Cost (left) | Actions (right) */}
-
+
{job.dataset_name && ( - - - + {job.dataset_name} )} {job.assistant_id && assistantConfig?.name && ( - + {assistantConfig.name} )} {job.cost?.total_cost_usd != null && ( - + {formatCostUSD(job.cost.total_cost_usd)} )}
-
+
diff --git a/app/components/evaluations/EvaluationsTab.tsx b/app/components/evaluations/EvaluationsTab.tsx index 3fce988..d57ea05 100644 --- a/app/components/evaluations/EvaluationsTab.tsx +++ b/app/components/evaluations/EvaluationsTab.tsx @@ -4,12 +4,13 @@ import { useState, useEffect, useCallback } from "react"; import { apiFetch } from "@/app/lib/apiClient"; import { colors } from "@/app/lib/colors"; import { Dataset } from "@/app/lib/types/dataset"; -import { EvalJob, AssistantConfig } from "@/app/components/types"; +import { EvalJob, AssistantConfig } from "@/app/lib/types/evaluation"; import ConfigSelector from "@/app/components/ConfigSelector"; import Loader from "@/app/components/Loader"; import EvalRunCard from "./EvalRunCard"; import EvalDatasetDescription from "./EvalDatasetDescription"; import { useAuth } from "@/app/lib/context/AuthContext"; +import { RefreshIcon } from "@/app/components/icons"; type Tab = "datasets" | "evaluations"; @@ -390,23 +391,12 @@ export default function EvaluationsTab({
@@ -418,14 +408,12 @@ export default function EvaluationsTab({ boxShadow: "0 1px 3px rgba(0, 0, 0, 0.04)", }} > - {/* Loading */} {isLoading && evalJobs.length === 0 && (
)} - {/* Error */} {error && (
)} - {/* Empty State */} {!isLoading && evalJobs.length === 0 && !error && (
)} - {/* Runs List */} {evalJobs.length > 0 && (() => { const filteredJobs = diff --git a/app/components/evaluations/GroupedResultsTable.tsx b/app/components/evaluations/GroupedResultsTable.tsx new file mode 100644 index 0000000..1943d22 --- /dev/null +++ b/app/components/evaluations/GroupedResultsTable.tsx @@ -0,0 +1,261 @@ +/** + * GroupedResultsTable.tsx - Grouped view for evaluation results + * + * Displays multiple LLM answers per question in a grouped table format + */ + +import { useState, useEffect, Fragment } from "react"; +import { TraceScore, GroupedTraceItem } from "@/app/lib/types/evaluation"; +import { formatScoreValue } from "@/app/lib/utils"; + +export default function GroupedResultsTable({ + traces, +}: { + traces: GroupedTraceItem[]; +}) { + const [openCommentId, setOpenCommentId] = useState(null); + const [commentPos, setCommentPos] = useState({ top: 0, left: 0 }); + + useEffect(() => { + if (!openCommentId) return; + const handleScroll = () => setOpenCommentId(null); + window.addEventListener("scroll", handleScroll, true); + return () => { + window.removeEventListener("scroll", handleScroll, true); + }; + }, [openCommentId]); + + if (!traces || traces.length === 0) { + return ( +
+

No grouped results available

+
+ ); + } + + // Get max answers count + const maxAnswers = Math.max(...traces.map((t) => t.llm_answers.length)); + + // Fixed column widths (in pixels) for predictable layout + const COLUMN_WIDTHS = { + qId: 60, + question: 200, + groundTruth: 200, + answer: 250, + }; + + // Calculate minimum table width based on number of answers + // This ensures horizontal scroll activates at the right point + const fixedColumnsWidth = + COLUMN_WIDTHS.qId + COLUMN_WIDTHS.question + COLUMN_WIDTHS.groundTruth; + const tableMinWidth = fixedColumnsWidth + maxAnswers * COLUMN_WIDTHS.answer; + + return ( +
+
+ + + + + + + {Array.from({ length: maxAnswers }, (_, i) => ( + + ))} + + + + + {traces.map((group, index) => ( + + + + + + + + + {/* Answer */} + {Array.from({ length: maxAnswers }, (_, answerIndex) => { + const answer = group.llm_answers[answerIndex]; + return ( + + ); + })} + + + + ); + })} + + + ))} + +
+ Q.ID + + Question + + Ground Truth + + Answer {i + 1} +
+ {group.question_id} + +
+ {group.question} +
+
+
+ {group.ground_truth_answer} +
+
+ {answer ? ( +
+ {answer} +
+ ) : ( + - + )} +
+ + + + {Array.from({ length: maxAnswers }, (_, answerIndex) => { + const answerScores: TraceScore[] = + group.scores?.[answerIndex] || []; + const answer = group.llm_answers[answerIndex]; + + return ( + + {answer && answerScores.length > 0 ? ( +
+ {answerScores.map( + (score: TraceScore, scoreIdx: number) => { + if (!score) return null; + const { value, color, bg } = + formatScoreValue(score); + return ( +
+ + {score.name}: + +
+
+ {value} +
+ {score?.comment && + (() => { + const commentId = `g${index}-a${answerIndex}-s${scoreIdx}`; + return ( + <> +
{ + const rect = + e.currentTarget.getBoundingClientRect(); + const tooltipWidth = 300; + const centerX = + rect.left + rect.width / 2; + const clampedLeft = Math.min( + Math.max( + centerX - + tooltipWidth / 2, + 8, + ), + window.innerWidth - + tooltipWidth - + 8, + ); + setCommentPos({ + top: rect.top - 8, + left: clampedLeft, + }); + setOpenCommentId(commentId); + }} + onMouseLeave={() => + setOpenCommentId(null) + } + > + i +
+ {openCommentId === commentId && ( +
+ {score.comment} +
+ )} + + ); + })()} +
+
+ ); + }, + )} +
+ ) : null} +
+
+
+ ); +} diff --git a/app/components/ScoreDisplay.tsx b/app/components/evaluations/ScoreDisplay.tsx similarity index 94% rename from app/components/ScoreDisplay.tsx rename to app/components/evaluations/ScoreDisplay.tsx index 2f8b1db..68efa33 100644 --- a/app/components/ScoreDisplay.tsx +++ b/app/components/evaluations/ScoreDisplay.tsx @@ -5,7 +5,8 @@ "use client"; -import { ScoreObject, hasSummaryScores } from "./types"; +import type { ScoreObject } from "@/app/lib/types/evaluation"; +import { hasSummaryScores } from "@/app/lib/utils/evaluation"; interface ScoreDisplayProps { score: ScoreObject | null; @@ -16,7 +17,6 @@ export default function ScoreDisplay({ score, errorMessage, }: ScoreDisplayProps) { - // No score available if (!score) { return (
@@ -42,7 +42,6 @@ export default function ScoreDisplay({ ); } - // Separate numeric and categorical scores const numericScores = summaryScores.filter( (s) => s.data_type === "NUMERIC", ); @@ -83,7 +82,6 @@ export default function ScoreDisplay({ ); } - // Fallback for unsupported format return (
Score: diff --git a/app/components/icons/common/CopyIcon.tsx b/app/components/icons/common/CopyIcon.tsx new file mode 100644 index 0000000..ac3b372 --- /dev/null +++ b/app/components/icons/common/CopyIcon.tsx @@ -0,0 +1,20 @@ +interface IconProps { + className?: string; + style?: React.CSSProperties; +} + +export default function CopyIcon({ className, style }: IconProps) { + return ( + + + + + ); +} diff --git a/app/components/icons/common/RefreshIcon.tsx b/app/components/icons/common/RefreshIcon.tsx index fedb9e2..e244959 100644 --- a/app/components/icons/common/RefreshIcon.tsx +++ b/app/components/icons/common/RefreshIcon.tsx @@ -13,11 +13,13 @@ export default function RefreshIcon({ className, style }: IconProps) { strokeWidth={2} style={style} > - + + + ); } diff --git a/app/components/icons/index.tsx b/app/components/icons/index.tsx index e46af15..450a0ed 100644 --- a/app/components/icons/index.tsx +++ b/app/components/icons/index.tsx @@ -2,6 +2,7 @@ export { default as ArrowLeftIcon } from "./common/ArrowLeftIcon"; export { default as ChevronDownIcon } from "./common/ChevronDownIcon"; export { default as CheckIcon } from "./common/CheckIcon"; +export { default as CopyIcon } from "./common/CopyIcon"; export { default as EyeIcon } from "./common/EyeIcon"; export { default as EyeOffIcon } from "./common/EyeOffIcon"; export { default as RefreshIcon } from "./common/RefreshIcon"; diff --git a/app/components/index.ts b/app/components/index.ts index 318a498..9f5fbc4 100644 --- a/app/components/index.ts +++ b/app/components/index.ts @@ -1,5 +1,10 @@ export { default as Button } from "./Button"; +export { default as CodeBlock } from "./CodeBlock"; +export { default as ConfigModal } from "./ConfigModal"; +export { default as CopyableCodeBlock } from "./CopyableCodeBlock"; export { default as Field } from "./Field"; +export { default as InfoTooltip } from "./InfoTooltip"; export { default as Modal } from "./Modal"; export { default as PageHeader } from "./PageHeader"; 
export { default as Sidebar } from "./Sidebar"; +export { default as Tag } from "./Tag"; diff --git a/app/components/speech-to-text/EvaluationsTab.tsx b/app/components/speech-to-text/EvaluationsTab.tsx index 81dbebc..119e955 100644 --- a/app/components/speech-to-text/EvaluationsTab.tsx +++ b/app/components/speech-to-text/EvaluationsTab.tsx @@ -10,7 +10,8 @@ import Loader, { LoaderBox } from "@/app/components/Loader"; import StatusBadge from "@/app/components/StatusBadge"; import { computeWordDiff } from "./TranscriptionDiffViewer"; import { getStatusColor } from "@/app/components/utils"; -import AudioPlayerFromUrl from "./AudioPlayerFromUrl"; +import AudioPlayerFromUrl from "@/app/components/speech-to-text/AudioPlayerFromUrl"; +import { RefreshIcon } from "@/app/components/icons"; export interface EvaluationsTabProps { leftPanelWidth: number; @@ -442,22 +443,11 @@ export default function EvaluationsTab({
)} @@ -1213,27 +1203,18 @@ export default function EvaluationsTab({ return (
{/* Row 1: Run Name + Status */}
-
+
{run.run_name}
{/* Error message */} {run.error_message && ( -
+
{run.error_message}
)} diff --git a/app/components/text-to-speech/EvaluationsTab.tsx b/app/components/text-to-speech/EvaluationsTab.tsx index b4a1cce..0caa46f 100644 --- a/app/components/text-to-speech/EvaluationsTab.tsx +++ b/app/components/text-to-speech/EvaluationsTab.tsx @@ -15,6 +15,7 @@ import { useAuth } from "@/app/lib/context/AuthContext"; import { apiFetch } from "@/app/lib/apiClient"; import Loader, { LoaderBox } from "@/app/components/Loader"; import { getStatusColor } from "@/app/components/utils"; +import { RefreshIcon } from "@/app/components/icons"; import AudioPlayerFromUrl from "./AudioPlayerFromUrl"; import { useToast } from "@/app/components/Toast"; @@ -442,22 +443,11 @@ export default function EvaluationsTab({
)} @@ -1134,39 +1124,24 @@ export default function EvaluationsTab({ return (
{/* Row 1: Run Name + Status */}
-
+
{run.run_name}
{/* Error message */} {run.error_message && ( -
+
{run.error_message}
)}
{run.status} diff --git a/app/components/types.ts b/app/components/types.ts deleted file mode 100644 index b2bbc73..0000000 --- a/app/components/types.ts +++ /dev/null @@ -1,234 +0,0 @@ -/** - * Shared TypeScript types for evaluation components - */ - -export interface TraceScore { - name: string; - value: number | string; - data_type: "NUMERIC" | "CATEGORICAL"; - comment?: string; -} - -// New trace format (from evaluation-sample-3.json) -export interface TraceItem { - trace_id: string; - question: string; - llm_answer: string; - ground_truth_answer: string; - scores: TraceScore[]; -} - -export interface GroupedTraceItem { - question_id: number; - question: string; - ground_truth_answer: string; - llm_answers: string[]; - trace_ids: string[]; - scores: TraceScore[][]; -} - -// Legacy individual score format (nested structure) -export interface IndividualScore { - trace_id: string; - input?: { - question: string; - }; - output?: { - answer: string; - }; - metadata?: { - ground_truth?: string; - item_id?: string; - response_id?: string; - }; - trace_scores: TraceScore[]; -} - -export interface SummaryScore { - name: string; - avg?: number; - std?: number; - total_pairs: number; - data_type: "NUMERIC" | "CATEGORICAL"; - distribution?: Record; // For categorical data -} - -// New score object with traces array -export interface NewScoreObjectV2 { - summary_scores: SummaryScore[]; - traces: TraceItem[] | GroupedTraceItem[]; -} - -// Legacy score structure (for backward compatibility) -export interface PerItemScore { - trace_id: string; - cosine_similarity: number; -} - -export interface CosineSimilarity { - avg: number; - std: number; - total_pairs: number; - per_item_scores: PerItemScore[]; -} - -export interface LegacyScoreObject { - cosine_similarity: CosineSimilarity; -} - -// Basic score object with only summary scores (no individual scores or traces) -export interface BasicScoreObject { - summary_scores: SummaryScore[]; -} - -// Union type to support both old and 
new structures -export type ScoreObject = - | NewScoreObjectV2 - | BasicScoreObject - | LegacyScoreObject; - -export interface AssistantConfig { - name: string; - model: string; - knowledge_base_ids: string[]; - project_id: number; - organization_id: number; - updated_at: string; - deleted_at: string | null; - instructions: string; - assistant_id: string; - temperature: number; - max_num_results: number; - id: number; - inserted_at: string; - is_deleted: boolean; -} - -export interface EvalCostEntry { - model: string; - cost_usd: number; - input_tokens?: number; - output_tokens?: number; - prompt_tokens?: number; - total_tokens: number; -} - -export interface EvalCost { - response?: EvalCostEntry; - embedding?: EvalCostEntry; - total_cost_usd: number; -} - -export interface EvalJob { - id: number; - run_name: string; - dataset_name: string; - dataset_id: number; - batch_job_id: number; - embedding_batch_job_id: number | null; - status: string; - object_store_url: string | null; - total_items: number; - score?: ScoreObject | null; - scores?: ScoreObject | null; // Alternative field name - error_message: string | null; - config?: { - model?: string; - instructions?: string; - tools?: unknown[]; - include?: string[]; - temperature?: number; - }; - config_id?: string; - config_version?: number; - model?: string; - assistant_id?: string; - organization_id: number; - project_id: number; - cost?: EvalCost | null; - inserted_at: string; - updated_at: string; -} - -// Type guard functions - -// Shared guard: Check if score has summary_scores and intelligently narrow to NewScoreObjectV2 or BasicScoreObject -// Priority: If it has traces → NewScoreObjectV2, otherwise → BasicScoreObject -export function hasSummaryScores( - score: ScoreObject | null | undefined, -): score is NewScoreObjectV2 | BasicScoreObject { - if (!score) return false; - if (!("summary_scores" in score)) return false; - - // Prioritize traces format if available - if ("traces" in score) { - return true; - } 
- - // Otherwise, it's BasicScoreObject (summary_scores only, no traces, no individual_scores) - return true; -} - -export function isNewScoreObjectV2( - score: ScoreObject | null | undefined, -): score is NewScoreObjectV2 { - if (!score) return false; - return "summary_scores" in score && "traces" in score; -} - -export function isBasicScoreObject( - score: ScoreObject | null | undefined, -): score is BasicScoreObject { - if (!score) return false; - return "summary_scores" in score && !("traces" in score); -} - -export function isLegacyScoreObject( - score: ScoreObject | null | undefined, -): score is LegacyScoreObject { - if (!score) return false; - return "cosine_similarity" in score; -} - -// Helper to get score object from job -export function getScoreObject(job: EvalJob): ScoreObject | null { - return job.scores || job.score || null; -} - -export function isGroupedFormat( - traces: TraceItem[] | GroupedTraceItem[], -): traces is GroupedTraceItem[] { - if (!traces || traces.length === 0) return false; - return "llm_answers" in traces[0] && Array.isArray(traces[0].llm_answers); -} - -// Normalize traces to IndividualScore format for table display -export function normalizeToIndividualScores( - score: ScoreObject | null | undefined, -): IndividualScore[] { - if (!score) return []; - - if (isNewScoreObjectV2(score)) { - // Convert TraceItem[] to IndividualScore[] for table display - // Note: Grouped traces should be detected earlier and handled separately - return score.traces.map((trace: TraceItem | GroupedTraceItem) => { - // Handle regular TraceItem format - if ("llm_answer" in trace) { - return { - trace_id: trace.trace_id, - input: { question: trace.question }, - output: { answer: trace.llm_answer }, - metadata: { ground_truth: trace.ground_truth_answer }, - trace_scores: trace.scores, - }; - } - // Should not reach here if grouped format is handled properly - return { - trace_id: "", - trace_scores: [], - }; - }); - } - - return []; -} diff --git 
a/app/components/utils.ts b/app/components/utils.ts index f1386e6..90912ae 100644 --- a/app/components/utils.ts +++ b/app/components/utils.ts @@ -27,9 +27,11 @@ export const formatDate = (dateString?: string): string => { }; /** - * Returns color scheme based on job/evaluation status + * Returns Tailwind class names based on job/evaluation status. + * The colour tokens are defined in globals.css as @theme inline vars. + * * @param status - Status string (completed, processing, failed, etc.) - * @returns Object with bg, border, and text HSL color values + * @returns Object with bg, border, and text Tailwind class names */ export const getStatusColor = ( status: string, @@ -38,50 +40,35 @@ export const getStatusColor = ( case "completed": case "success": return { - bg: "hsl(134, 61%, 95%)", - border: "hsl(134, 61%, 70%)", - text: "hsl(134, 61%, 25%)", + bg: "bg-status-success-bg", + border: "border-status-success-border", + text: "text-status-success-text", }; case "processing": case "pending": case "queued": case "running": return { - bg: "hsl(46, 100%, 95%)", - border: "hsl(46, 100%, 80%)", - text: "hsl(46, 100%, 25%)", + bg: "bg-status-warning-bg", + border: "border-status-warning-border", + text: "text-status-warning-text", }; case "failed": case "error": return { - bg: "hsl(8, 86%, 95%)", - border: "hsl(8, 86%, 80%)", - text: "hsl(8, 86%, 40%)", + bg: "bg-status-error-bg", + border: "border-status-error-border", + text: "text-status-error-text", }; default: return { - bg: "hsl(0, 0%, 100%)", - border: "hsl(0, 0%, 85%)", - text: "hsl(330, 3%, 49%)", + bg: "bg-status-default-bg", + border: "border-status-default-border", + text: "text-status-default-text", }; } }; -/** - * Formats a USD cost value for display - * @param cost - Cost in USD - * @returns Formatted cost string (e.g., "$0.0013", "$1.25") - */ -export const formatCostUSD = (cost: number): string => { - if (!Number.isFinite(cost)) { - return "N/A"; - } - if (cost < 0.01) { - return `$${cost.toFixed(4)}`; 
- } - return `$${cost.toFixed(2)}`; -}; - /** * Calculates dynamic thresholds for color coding based on score distribution * @param scores - Array of similarity scores diff --git a/app/globals.css b/app/globals.css index 1d67ebc..05165ab 100644 --- a/app/globals.css +++ b/app/globals.css @@ -49,6 +49,34 @@ --color-status-warning: #f59e0b; } +/* Status badge colors — success */ +@theme inline { + --color-status-success-bg: hsl(134, 61%, 95%); + --color-status-success-border: hsl(134, 61%, 70%); + --color-status-success-text: hsl(134, 61%, 25%); +} + +/* Status badge colors — warning */ +@theme inline { + --color-status-warning-bg: hsl(46, 100%, 95%); + --color-status-warning-border: hsl(46, 100%, 80%); + --color-status-warning-text: hsl(46, 100%, 25%); +} + +/* Status badge colors — error */ +@theme inline { + --color-status-error-bg: hsl(8, 86%, 95%); + --color-status-error-border: hsl(8, 86%, 80%); + --color-status-error-text: hsl(8, 86%, 40%); +} + +/* Status badge colors — default */ +@theme inline { + --color-status-default-bg: hsl(0, 0%, 100%); + --color-status-default-border: hsl(0, 0%, 85%); + --color-status-default-text: hsl(330, 3%, 49%); +} + @media (prefers-color-scheme: dark) { :root { --background: #000000; diff --git a/app/lib/types/evaluation.ts b/app/lib/types/evaluation.ts new file mode 100644 index 0000000..8b01a15 --- /dev/null +++ b/app/lib/types/evaluation.ts @@ -0,0 +1,141 @@ +export interface TraceScore { + name: string; + value: number | string; + data_type: "NUMERIC" | "CATEGORICAL"; + comment?: string; +} + +export interface TraceItem { + trace_id: string; + question: string; + llm_answer: string; + ground_truth_answer: string; + scores: TraceScore[]; +} + +export interface GroupedTraceItem { + question_id: number; + question: string; + ground_truth_answer: string; + llm_answers: string[]; + trace_ids: string[]; + scores: TraceScore[][]; +} + +export interface IndividualScore { + trace_id: string; + input?: { + question: string; + }; + 
output?: { + answer: string; + }; + metadata?: { + ground_truth?: string; + item_id?: string; + response_id?: string; + }; + trace_scores: TraceScore[]; +} + +export interface SummaryScore { + name: string; + avg?: number; + std?: number; + total_pairs: number; + data_type: "NUMERIC" | "CATEGORICAL"; + distribution?: Record; // For categorical data +} + +export interface NewScoreObjectV2 { + summary_scores: SummaryScore[]; + traces: TraceItem[] | GroupedTraceItem[]; +} + +export interface PerItemScore { + trace_id: string; + cosine_similarity: number; +} + +export interface CosineSimilarity { + avg: number; + std: number; + total_pairs: number; + per_item_scores: PerItemScore[]; +} + +export interface LegacyScoreObject { + cosine_similarity: CosineSimilarity; +} + +export interface BasicScoreObject { + summary_scores: SummaryScore[]; +} + +export type ScoreObject = + | NewScoreObjectV2 + | BasicScoreObject + | LegacyScoreObject; + +export interface AssistantConfig { + name: string; + model: string; + knowledge_base_ids: string[]; + project_id: number; + organization_id: number; + updated_at: string; + deleted_at: string | null; + instructions: string; + assistant_id: string; + temperature: number; + max_num_results: number; + id: number; + inserted_at: string; + is_deleted: boolean; +} + +export interface EvalCostEntry { + model: string; + cost_usd: number; + input_tokens?: number; + output_tokens?: number; + prompt_tokens?: number; + total_tokens: number; +} + +export interface EvalCost { + response?: EvalCostEntry; + embedding?: EvalCostEntry; + total_cost_usd: number; +} + +export interface EvalJob { + id: number; + run_name: string; + dataset_name: string; + dataset_id: number; + batch_job_id: number; + embedding_batch_job_id: number | null; + status: string; + object_store_url: string | null; + total_items: number; + score?: ScoreObject | null; + scores?: ScoreObject | null; // Alternative field name + error_message: string | null; + config?: { + model?: 
string; + instructions?: string; + tools?: unknown[]; + include?: string[]; + temperature?: number; + }; + config_id?: string; + config_version?: number; + model?: string; + assistant_id?: string; + organization_id: number; + project_id: number; + cost?: EvalCost | null; + inserted_at: string; + updated_at: string; +} diff --git a/app/lib/utils.ts b/app/lib/utils.ts index 27c28c5..1ac04eb 100644 --- a/app/lib/utils.ts +++ b/app/lib/utils.ts @@ -10,6 +10,7 @@ import { import { SavedConfig, ConfigGroup } from "./types/configs"; import { isGpt5Model } from "@/app/lib/models"; import { STORAGE_KEYS } from "@/app/lib/constants"; +import { TraceScore } from "@/app/lib/types/evaluation"; export function timeAgo(dateStr: string): string { const date = @@ -193,3 +194,67 @@ export const sanitizeCSVCell = ( } return `"${sanitized}"`; }; + +export const formatScoreValue = (score: TraceScore | undefined) => { + if (!score) return { value: "N/A", color: "#737373", bg: "transparent" }; + + if (score.data_type === "CATEGORICAL") { + const catValue = String(score.value); + let color = "#171717"; + let bg = "#fafafa"; + + if (catValue === "CORRECT") { + color = "#15803d"; + bg = "#dcfce7"; + } else if (catValue === "PARTIAL") { + color = "#92400e"; + bg = "#fef3c7"; + } else if (catValue === "INCORRECT") { + color = "#dc2626"; + bg = "#fee2e2"; + } + + return { value: catValue, color, bg }; + } + + const numValue = Number(score.value); + const formattedValue = numValue.toFixed(2); + let color = "#171717"; + let bg = "transparent"; + + if (numValue >= 0.7) { + color = "#15803d"; + bg = "#dcfce7"; + } else if (numValue >= 0.5) { + color = "#92400e"; + bg = "#fef3c7"; + } else { + color = "#dc2626"; + bg = "#fee2e2"; + } + + return { value: formattedValue, color, bg }; +}; + +export const getScoreByName = ( + scores: TraceScore[], + name: string, +): TraceScore | undefined => { + if (!scores || !Array.isArray(scores)) return undefined; + return scores.find((s) => s?.name === name); +}; 
+ +/** + * Formats a USD cost value for display + * @param cost - Cost in USD + * @returns Formatted cost string (e.g., "$0.0013", "$1.25") + */ +export const formatCostUSD = (cost: number): string => { + if (!Number.isFinite(cost)) { + return "N/A"; + } + if (cost < 0.01) { + return `$${cost.toFixed(4)}`; + } + return `$${cost.toFixed(2)}`; +}; diff --git a/app/lib/utils/evaluation.ts b/app/lib/utils/evaluation.ts new file mode 100644 index 0000000..441fe18 --- /dev/null +++ b/app/lib/utils/evaluation.ts @@ -0,0 +1,53 @@ +import type { + EvalJob, + GroupedTraceItem, + IndividualScore, + NewScoreObjectV2, + BasicScoreObject, + ScoreObject, + TraceItem, +} from "@/app/lib/types/evaluation"; + +export function hasSummaryScores( + score: ScoreObject | null | undefined, +): score is NewScoreObjectV2 | BasicScoreObject { + if (!score) return false; + return "summary_scores" in score; +} + +export function isNewScoreObjectV2( + score: ScoreObject | null | undefined, +): score is NewScoreObjectV2 { + if (!score) return false; + return "summary_scores" in score && "traces" in score; +} + +export function getScoreObject(job: EvalJob): ScoreObject | null { + return job.scores || job.score || null; +} + +export function isGroupedFormat( + traces: TraceItem[] | GroupedTraceItem[], +): traces is GroupedTraceItem[] { + if (!traces || traces.length === 0) return false; + return "llm_answers" in traces[0] && Array.isArray(traces[0].llm_answers); +} + +export function normalizeToIndividualScores( + score: ScoreObject | null | undefined, +): IndividualScore[] { + if (!score || !isNewScoreObjectV2(score)) return []; + + return score.traces.map((trace: TraceItem | GroupedTraceItem) => { + if ("llm_answer" in trace) { + return { + trace_id: trace.trace_id, + input: { question: trace.question }, + output: { answer: trace.llm_answer }, + metadata: { ground_truth: trace.ground_truth_answer }, + trace_scores: trace.scores, + }; + } + return { trace_id: "", trace_scores: [] }; + }); +} diff 
--git a/app/page.tsx b/app/page.tsx index 50724ca..060f549 100644 --- a/app/page.tsx +++ b/app/page.tsx @@ -7,7 +7,6 @@ import { RefreshIcon } from "@/app/components/icons"; export default function Home() { const router = useRouter(); - // Auto-redirect to evaluations page useEffect(() => { router.push("/evaluations"); }, [router]); diff --git a/instructions/CLAUDE.md b/instructions/CLAUDE.md deleted file mode 100644 index dbe1573..0000000 --- a/instructions/CLAUDE.md +++ /dev/null @@ -1,327 +0,0 @@ -# CLAUDE.md - -This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. - -## Project Overview - -Kaapi Konsole is a Next.js 16 application by Tech4Dev for LLM development and evaluation. It provides: - -- LLM response evaluation against QnA datasets -- Git-like version control for prompt templates -- Configuration management with A/B testing -- Dataset and API key management - -The application has evolved from a simple evaluation tool into a full-featured LLM development platform. 
- -## Technology Stack - -- **Framework**: Next.js 16.0.7 (App Router) -- **React**: 19.2.0 (with hooks-based state management) -- **Routing**: Next.js App Router + React Router DOM 7.9.5 (dual system) -- **Styling**: Tailwind CSS 4.x + centralized color system in `/app/lib/colors.ts` -- **TypeScript**: 5.x (strict mode disabled) -- **Data Fetching**: SWR 2.3.6 (not widely used yet) -- **Date/Time**: date-fns 4.1.0, date-fns-tz 3.2.0 - -## Development Commands - -```bash -# Start development server (http://localhost:3000) -npm run dev - -# Build for production -npm run build - -# Start production server -npm start - -# Run linter -npm run lint -``` - -## Application Architecture - -### Route Structure - -``` -/ → Redirects to /evaluations -/evaluations → Main eval interface (upload & results) -/evaluations/[id] → Detailed evaluation report -/datasets → Dataset upload and management -/keystore → API key management (localStorage-based) -/configurations/prompt-editor → Git-like prompt version control -/test-evaluation → Mock data testing page -``` - -**Coming Soon Routes** (placeholders): - -- `/model-testing`, `/speech-to-text`, `/text-to-speech`, `/guardrails`, `/redteaming` - -### Component Organization - -**Shared Components** (`/app/components/`): - -- `Sidebar.tsx` - Main navigation (240px collapsible) -- `TabNavigation.tsx` - Reusable tab switcher -- `ConfigModal.tsx` - Modal for viewing evaluation configs -- `DetailedResultsTable.tsx` - Evaluation traces table -- `ScoreDisplay.tsx`, `StatusBadge.tsx` - Display primitives -- `types.ts` - Shared TypeScript interfaces -- `utils.ts` - Date formatting, color utilities - -**Prompt Editor Components** (`/app/components/prompt-editor/`): - -- `Header.tsx` - Top nav with branch controls -- `EditorView.tsx` - WYSIWYG prompt editor -- `DiffView.tsx` - Side-by-side diff visualization -- `HistorySidebar.tsx` - Commit history tree -- `ConfigDrawer.tsx` - Right-side configuration drawer -- `CurrentConfigTab.tsx`, 
`HistoryTab.tsx`, `ABTestTab.tsx` - Drawer tabs -- `BranchModal.tsx`, `MergeModal.tsx` - Dialogs - -### State Management Pattern - -**No global state library** - uses React `useState` exclusively: - -- Component-level state with props drilling -- LocalStorage for persistence (API keys, sidebar state) -- No Context API or Redux/Zustand - -**LocalStorage Keys:** - -- `kaapi_api_keys` - API key storage -- `sidebar-expanded-menus` - Sidebar expansion state - -### API Integration Architecture - -**Proxy Pattern**: All backend calls route through Next.js API handlers in `/app/api/`: - -``` -GET/POST /api/evaluations → List/create eval jobs -GET /api/evaluations/[id] → Get job details -GET/POST /api/evaluations/datasets → List/upload datasets -GET /api/evaluations/datasets/[dataset_id] -GET /api/assistant/[assistant_id] → Fetch assistant config -``` - -**Backend URL**: Configured via `BACKEND_URL` (default: `http://localhost:8000`) - -**Authentication**: Custom header `X-API-KEY` passed from client - -**Mock Data System**: Toggle via `USE_MOCK_DATA` flag in API routes. Mock files in `/public/mock-data/`. 
- -### Type System - -**Complex Type Hierarchies** in `/app/components/types.ts`: - -**Evaluation Types:** - -- `EvalJob` - Main evaluation job entity -- `ScoreObject` - Union type supporting 3 formats: - - `NewScoreObjectV2` (with `traces[]` array) - - `NewScoreObject` (with `individual_scores[]`) - - `LegacyScoreObject` (old cosine similarity format) -- `TraceItem` - Individual Q&A evaluation trace -- `SummaryScore` - Aggregate metrics (NUMERIC/CATEGORICAL) - -**Type Guards**: `isNewScoreObjectV2()`, `isLegacyScoreObject()` for runtime type checking - -**Prompt Editor Types** in `/app/configurations/prompt-editor/types.ts`: - -- `Commit` - Git-like commit with branch/parent relationships -- `Config` - LLM configuration blob with versioning -- `Tool` - Vector store tool definition -- `Variant` - A/B test variant configuration -- `DiffLine` - Myers diff algorithm output - -### Styling System - -**Current Design**: Vercel-style minimalist black/white theme - -**Color Management**: - -- All colors defined in `/app/lib/colors.ts` as TypeScript object -- Synchronized with CSS variables in `globals.css` -- Dark mode support via `prefers-color-scheme` media query -- See `COLOR_SCHEME.md` for quick preset options - -**Styling Approach**: - -1. Tailwind CSS for layout and spacing -2. Inline styles for colors (referencing `colors` object) -3. Hover states managed via React event handlers -4. No custom Tailwind classes or extended theme - -**Color Palette**: - -```typescript -bg: { primary: '#ffffff', secondary: '#fafafa' } -text: { primary: '#171717', secondary: '#737373' } -border: '#e5e5e5' -accent: { primary: '#171717', hover: '#404040' } -status: { success: '#16a34a', error: '#dc2626', warning: '#f59e0b' } -``` - -## Key Features - -### 1. LLM Evaluation Pipeline - -**Workflow**: - -1. Upload CSV with `question,answer` columns -2. Configure experiment (model, instructions, vector stores) -3. Backend creates evaluation job -4. Job status polled every 10 seconds -5. 
Results displayed with detailed metrics - -**Evaluation Modes**: - -- Config-based: Specify model, instructions, tools -- Assistant-based: Use pre-configured assistant ID - -**Metrics Display**: - -- Summary scores (avg ± std for numeric, distribution for categorical) -- Per-item traces with expandable Q&A pairs -- Color-coded scores with dynamic thresholds -- CSV export functionality - -### 2. Git-like Prompt Version Control - -**Core Concepts** (see `/configurations/prompt-editor/page.tsx`): - -- **Commits**: Versioned prompt snapshots with author/message/timestamp -- **Branches**: Parallel development streams (e.g., main, experiment-v2) -- **Diffs**: Myers algorithm for side-by-side change visualization -- **Merges**: Branch integration with duplicate commit detection - -**Implementation Details**: - -- All commits stored in-memory (no backend persistence yet) -- `createBranch()` preserves uncommitted changes when branching from HEAD -- `switchBranch()` loads latest commit from target branch -- `commitVersion()` creates new commit on current branch -- `mergeBranch()` prevents duplicate merges - -**IMPORTANT**: When creating a new branch from current HEAD (not a specific historical commit), uncommitted changes in the editor must persist. This matches git behavior. - -### 3. Configuration Management & A/B Testing - -**Config Structure**: - -```javascript -{ - id: string, - name: string, - version: number, // Auto-incremented per name - config_blob: { - completion: { - provider: 'openai' | 'anthropic' | 'google', - params: { model, instructions, temperature, tools[] } - } - } -} -``` - -**Features**: - -- Multi-version configs (auto-incremented) -- "Use Current Prompt" syncs from editor -- History tab shows all saved configs -- A/B testing with 2-4 variants -- Simulated test runs (1.5s delay, random scores) - -See `CONFIG_AB.md` for complete feature specification. 
- -## Key Implementation Patterns - -### TypeScript Configuration - -- Path alias `@/*` maps to project root -- Strict mode disabled (`strict: false`) -- JSX uses `react-jsx` transform -- Module resolution: `bundler` - -### Date/Time Handling - -- IST (Indian Standard Time) used throughout -- Timezone offsets manually added to UTC dates -- Format: `date-fns` with `date-fns-tz` - -### Component Patterns - -1. **Client-Side Components**: Most pages use `"use client"` for hooks and browser APIs -2. **Props Drilling**: Deep component trees pass 10+ props (no Context API) -3. **Inline Validation**: Error handling with alerts (no toast library) -4. **Loading States**: Skeleton loaders with Tailwind pulse animation - -### Data Fetching - -- Direct `fetch()` calls (no axios/react-query) -- SWR installed but minimally used -- Polling intervals for job status (10s) -- Mock data toggle for development - -## File Path Conventions - -- Use `@/` prefix for imports: `import Component from '@/app/components/Component'` -- All application code in `/app/` (App Router structure) -- Shared components: `/app/components/` -- Feature components: `/app/components/[feature]/` -- API routes: `/app/api/` -- Utilities: `/app/lib/` - -## Development Workflow Guidelines - -1. **Styling**: Use centralized colors from `/app/lib/colors.ts`, not hardcoded hex values -2. **State**: Keep state in component hierarchy, not global stores -3. **Types**: Use shared types from `/app/components/types.ts` for evaluations -4. **Colors**: Reference `colors` object for inline styles, Tailwind for layout -5. **API Calls**: Route through `/app/api/` handlers, not direct backend calls -6. 
**Date Formatting**: Use `formatDateTime()` from `/app/components/utils.ts` - -## Backend Integration - -**Environment Variables**: - -```bash -BACKEND_URL=http://localhost:8000 # Backend API base URL -``` - -**Authentication**: - -- API keys stored in localStorage -- Passed via `X-API-KEY` header -- No JWT/OAuth implementation - -**Dataset Upload**: - -- CSV format: `question,expected_answer` columns -- Duplication factor supported (1-10) -- Backend handles file processing - -## Technical Debt & Known Patterns - -1. **Dual Routing**: Next.js App Router + React Router DOM coexist (avoid confusion) -2. **Props Drilling**: Consider Context API for deeply nested props -3. **Magic Strings**: Status values, localStorage keys hardcoded -4. **Mixed Styling**: Tailwind + inline styles + CSS modules (prefer consistency) -5. **No Testing**: No test files exist (add tests for critical paths) -6. **Large Files**: Some components exceed 1000 lines (consider splitting) -7. **Type Safety**: Strict mode disabled (many `any` types exist) - -## Important Notes - -1. **React 19**: Uses bleeding-edge React version (expect occasional breaking changes) -2. **LocalStorage**: API keys stored client-side (not production-ready for sensitive data) -3. **Mock Data**: Production code includes mock system (toggle via flags) -4. **IST Timezone**: All timestamps assume Indian Standard Time -5. **No Testing**: No test infrastructure exists yet -6. 
**Component Location**: Check both `/app/components/` and feature folders for components - -## Documentation Files - -- `/CLAUDE.md` - This file (architectural guidance) -- `/COLOR_SCHEME.md` - Quick color preset guide -- `/CONFIG_AB.md` - A/B testing feature specification -- `/README.md` - Standard Next.js boilerplate diff --git a/instructions/COLOR_SCHEME.md b/instructions/COLOR_SCHEME.md deleted file mode 100644 index 616d9d2..0000000 --- a/instructions/COLOR_SCHEME.md +++ /dev/null @@ -1,65 +0,0 @@ -# Color Scheme Configuration - -This app uses a centralized color configuration for easy experimentation. - -## Configuration File - -Edit `/app/lib/colors.ts` to change the entire app's color scheme. - -## Current Colors - -```typescript -{ - bg: { - primary: '#ffffff', // Main background (white) - secondary: '#fafafa', // Secondary background (light gray) - }, - text: { - primary: '#171717', // Main text (near black) - secondary: '#737373', // Muted text (gray) - }, - border: '#e5e5e5', // All borders - accent: { - primary: '#0070f3', // Primary buttons, links, active states (Vercel blue) - hover: '#0761d1', // Hover state for accent - }, - status: { - success: '#16a34a', // Success states (green) - error: '#dc2626', // Error states (red) - warning: '#f59e0b', // Warning states (orange) - } -} -``` - -## Quick Color Scheme Presets - -### Vercel Style (Current) - -- Accent: `#0070f3` (blue) - -### Linear Style - -- Accent: `#5E6AD2` (purple-blue) -- Update `colors.accent.primary` to `#5E6AD2` -- Update `colors.accent.hover` to `#4F5CC0` - -### GitHub Style - -- Accent: `#2DA44E` (green) -- Update `colors.accent.primary` to `#2DA44E` -- Update `colors.accent.hover` to `#238636` - -### Minimal Black - -- Accent: `#171717` (black) -- Update `colors.accent.primary` to `#171717` -- Update `colors.accent.hover` to `#404040` - -## How to Change - -1. Open `/app/lib/colors.ts` -2. Modify the color values -3. Save the file -4. Refresh your browser - -That's it! 
All components use these centralized values. diff --git a/instructions/CONFIG_AB.md b/instructions/CONFIG_AB.md deleted file mode 100644 index 8b16150..0000000 --- a/instructions/CONFIG_AB.md +++ /dev/null @@ -1,277 +0,0 @@ -I need to implement a configuration drawer and A/B testing feature for a prompt version control system. Here's what needs to be built: - -## Context - -We have a React-based version control system for prompt templates (similar to Git). Users can commit prompts, create branches, view diffs, and merge. Now we need to add configuration management and A/B testing. - -## Requirements - -### 1. Configuration Drawer (Right Side, 420px width) - -**Trigger:** Floating Action Button (FAB) - "⚙️" icon, bottom-right corner, 56x56px circle, blue background - -**Drawer Structure:** - -- Slides in from right when FAB clicked -- 3 tabs: "Current" | "History" | "A/B Test" -- Close button (X) top-right -- Boxshadow for depth - -### 2. Current Config Tab - -**Fields (top to bottom):** - -1. **Config Name Selector** - - Dropdown to select existing configs (shows: "Name (vX)") - - "+ New" button next to it - - If New clicked: show text input for new config name - -2. **Provider Dropdown** - - Options: OpenAI, Anthropic, Google - - Default: openai - -3. **Model Dropdown** - - Options: gpt-4o-mini, gpt-4o, gpt-4-turbo, gpt-3.5-turbo - - Default: gpt-4o-mini - -4. **Instructions Section** - - Label: "Instructions" - - Button: "Use Current Prompt" (copies from main editor) - - Textarea: multiline, monospace font, 120px min-height - -5. **Temperature Slider** - - Label: "Temperature: {value}" - - Range: 0 to 1, step 0.1 - - Labels below: "Focused (0)" | "Balanced (0.5)" | "Creative (1)" - -6. **Tools Section** - - Label: "Tools" with "+ Add Tool" button - - Each tool shows: - - Type: File Search (hardcoded for now) - - Input: Vector Store ID - - Input: Max Results (number) - - Remove button - -7. 
**Commit Message** - - Optional text input - - Placeholder: "Describe this configuration..." - -8. **Save Button** - - Full width, green (#2da44e) - - Text: "Save Configuration" - -**Data Structure for Config:** - -```javascript -{ - id: 'cfg1', - name: 'Main Config', - version: 1, - timestamp: Date.now(), - config_blob: { - completion: { - provider: 'openai', - params: { - model: 'gpt-4o-mini', - instructions: '...', - temperature: 0.7, - tools: [ - { - type: 'file_search', - knowledge_base_ids: ['vs_abc123'], - max_num_results: 20 - } - ] - } - } - }, - commitMessage: 'Optional message' -} -``` - -### 3. History Tab - -**Display:** - -- List of all saved configs (reverse chronological) -- Each card shows: - - Config name (vX) - - Model • temp: X - - Timestamp (formatted like "2h ago", "3d ago") - - Commit message (if exists, italicized) -- Click card to load that config into Current tab -- Active config highlighted - -### 4. A/B Test Tab - -**Variant Configuration:** - -- Show 2 variants by default (A and B) -- Each variant card contains: - - Header: "Variant A/B/C/D" - - Config dropdown: Select from saved configs - - Prompt dropdown: Select from commit history (show: "#ID: message (branch)") - - Preview box (readonly): Shows model, temp, first line of prompt -- "+ Add Variant" button (max 4 variants) - -**Test Input Section:** - -- Label: "Test Input" -- Textarea for test prompt - -**Run Test Button:** - -- Full width, green -- Text: "▶ Run Test" -- Disabled if no test input - -**Results Section (appears after running):** - -- Card for each variant showing: - - Variant name - - Score (0.00-1.00 format) - - Config name • Commit message - - Latency in ms -- Highlight best performer with "🏆 Best: Variant X" in green box - -**Test Simulation:** - -```javascript -// For PoC, simulate API call: -await new Promise((resolve) => setTimeout(resolve, 1500)); -const score = 0.7 + Math.random() * 0.25; -const latency = 200 + Math.random() * 400; -``` - -### 5. 
State Management - -**New State Variables Needed:** - -```javascript -// Drawer -const [drawerOpen, setDrawerOpen] = useState(false); -const [drawerTab, setDrawerTab] = useState("config"); - -// Configs -const [configs, setConfigs] = useState([]); -const [selectedConfigId, setSelectedConfigId] = useState(""); -const [configName, setConfigName] = useState(""); -const [provider, setProvider] = useState("openai"); -const [model, setModel] = useState("gpt-4o-mini"); -const [instructions, setInstructions] = useState(""); -const [temperature, setTemperature] = useState(0.7); -const [tools, setTools] = useState([]); -const [configCommitMsg, setConfigCommitMsg] = useState(""); - -// A/B Testing -const [variants, setVariants] = useState([ - { id: "A", configId: "", commitId: "", name: "Variant A" }, - { id: "B", configId: "", commitId: "", name: "Variant B" }, -]); -const [testInput, setTestInput] = useState(""); -const [testResults, setTestResults] = useState(null); -const [isRunningTest, setIsRunningTest] = useState(false); -``` - -### 6. 
Key Functions to Implement - -```javascript -// Save new config version -const saveConfig = () => { - // Validate config name exists - // Create new config object with incremented version - // Add to configs array - // Show success alert -}; - -// Load existing config -const loadConfig = (configId) => { - // Find config by ID - // Populate all form fields - // Set as selected config -}; - -// Add/remove/update tools -const addTool = () => { - /* Add empty tool */ -}; -const removeTool = (index) => { - /* Remove by index */ -}; -const updateTool = (index, field, value) => { - /* Update specific field */ -}; - -// Run A/B test -const runABTest = async () => { - // Validate test input exists - // Set loading state - // Simulate API calls (1.5s delay) - // Generate mock scores and latencies - // Display results -}; - -// Manage variants -const addVariant = () => { - /* Max 4 variants */ -}; -const updateVariant = (index, field, value) => { - /* Update variant config */ -}; -``` - -### 7. UI/UX Details - -**Colors:** - -Use current B/W color scheme. Make sure the design system does not diverge. - -**Spacing:** - -- Drawer padding: 20px -- Section spacing: 16px bottom margin -- Input padding: 8px -- Label font: 12px, weight 600 - -**Interactions:** - -- FAB hover: scale(1.1) transform -- Drawer animation: slide in from right (can use conditional render for MVP) -- Close drawer on: X button click, overlay click (optional) - -### 8. Integration Points - -**With Existing System:** - -- Access `currentContent` from main editor for "Use Current Prompt" -- Access `commits` array for A/B test prompt selection -- Add "▶ Run A/B Test" button in header (opens drawer to A/B tab) - -### 9. Starting Point - -If you have the existing version control code, add: - -1. FAB button positioned fixed bottom-right -2. Conditional render of drawer when `drawerOpen === true` -3. Tab switching logic -4. Form fields with controlled inputs -5. 
A/B test variant management - -The drawer should NOT affect the existing version control tree, editor, or diff views. It's purely additive. - -## File Structure - -- Single React component (or can split into sub-components) -- Keep all state in parent component for MVP -- No external dependencies beyond React - -## Success Criteria - -✅ FAB opens/closes drawer -✅ Can create and save configs with all fields -✅ Can load previous configs from history -✅ Can set up 2-4 A/B test variants -✅ Can run test and see simulated results -✅ Results show winner clearly -✅ "Use Current Prompt" syncs editor content -✅ UI is clean and uncluttered diff --git a/instructions/CONFIG_API.md b/instructions/CONFIG_API.md deleted file mode 100644 index c9ade1a..0000000 --- a/instructions/CONFIG_API.md +++ /dev/null @@ -1,215 +0,0 @@ -# Config Management API Integration Instructions - -## Overview - -Integrate the Config Management APIs into an existing Next.js UI. The API manages LLM configurations with version control (similar to git commits for config changes). - -## Base URL & Auth - -- Base: `/api/v1/configs` -- Auth: Bearer token via `Authorization` header OR API key via `X-API-KEY` header - ---- - -## API Endpoints - -### 1. Configs (Parent Entity) - -#### List Configs - -``` -GET /api/v1/configs/ -Query: skip (default 0), limit (default 100, max 100) -Response: { success: boolean, data: ConfigPublic[], error?: string } -``` - -#### Create Config - -``` -POST /api/v1/configs/ -Body: ConfigCreate -Response 201: { success: boolean, data: ConfigWithVersion } -``` - -#### Get Config - -``` -GET /api/v1/configs/{config_id} -Response: { success: boolean, data: ConfigPublic } -``` - -#### Update Config (metadata only) - -``` -PATCH /api/v1/configs/{config_id} -Body: ConfigUpdate -Response: { success: boolean, data: ConfigPublic } -``` - -#### Delete Config - -``` -DELETE /api/v1/configs/{config_id} -Response: { success: boolean, data: { message: string } } -``` - -### 2. 
Config Versions (Child Entity) - -#### List Versions - -``` -GET /api/v1/configs/{config_id}/versions -Query: skip, limit -Response: { success: boolean, data: ConfigVersionItems[] } -``` - -#### Create Version - -``` -POST /api/v1/configs/{config_id}/versions -Body: ConfigVersionCreate -Response 201: { success: boolean, data: ConfigVersionPublic } -``` - -#### Get Specific Version - -``` -GET /api/v1/configs/{config_id}/versions/{version_number} -Response: { success: boolean, data: ConfigVersionPublic } -``` - -#### Delete Version - -``` -DELETE /api/v1/configs/{config_id}/versions/{version_number} -Response: { success: boolean, data: { message: string } } -``` - ---- - -## TypeScript Types - -```typescript -// Request Types -interface ConfigCreate { - name: string; // 1-128 chars, unique per project - description?: string | null; // max 512 chars - config_blob: ConfigBlob; - commit_message?: string | null; // max 512 chars -} - -interface ConfigUpdate { - name?: string | null; // 1-128 chars - description?: string | null; // max 512 chars -} - -interface ConfigVersionCreate { - config_blob: ConfigBlob; - commit_message?: string | null; // max 512 chars -} - -interface ConfigBlob { - completion: CompletionConfig; -} - -interface CompletionConfig { - provider: "openai"; // currently only "openai" - params: Record<string, any>; // provider-specific params (model, temperature, etc.) 
-} - -// Response Types -interface ConfigPublic { - id: string; // UUID - name: string; - description: string | null; - project_id: number; - inserted_at: string; // ISO datetime - updated_at: string; // ISO datetime -} - -interface ConfigWithVersion extends ConfigPublic { - version: ConfigVersionPublic; -} - -interface ConfigVersionPublic { - id: string; // UUID - config_id: string; // UUID - version: number; // starts at 1, auto-increments - config_blob: Record<string, any>; - commit_message: string | null; - inserted_at: string; - updated_at: string; -} - -interface ConfigVersionItems { - id: string; // UUID - config_id: string; // UUID - version: number; - commit_message: string | null; - inserted_at: string; - updated_at: string; - // Note: config_blob excluded for list performance -} - -interface APIResponse<T> { - success: boolean; - data: T | null; - error?: string | null; - metadata?: Record<string, any> | null; -} -``` - ---- - -## Example config_blob - -```json -{ - "completion": { - "provider": "openai", - "params": { - "model": "gpt-4o-mini", - "instructions": "You are a helpful assistant...", - "temperature": 1, - "tools": [ - { - "type": "file_search", - "knowledge_base_ids": ["vs_692d71f3f5708191b1c46525f3c1e196"], - "max_num_results": 20 - } - ] - } - } -} -``` - ---- - -## UI Implementation Notes - -1. **Config List View**: Display name, description, updated_at. Click to view versions. - -2. **Config Create Form**: - - name (required, unique) - - description (optional) - - config_blob JSON editor or structured form - - commit_message (optional, for initial version) - -3. **Version History View**: - - Show versions in descending order (newest first) - - Display version number, commit_message, timestamps - - Click version to view full config_blob - -4. **Create New Version**: - - Load current version's config_blob as starting point - - Allow editing config_blob - - Add commit_message to describe changes - - Auto-increments version number - -5. 
**Diff View** (optional enhancement): - - Compare config_blob between versions - - Highlight changes - -6. **Error Handling**: - - 422: Validation errors (check response.error) - - Duplicate name error when creating config diff --git a/instructions/TESTING_MOCK_DATA.md b/instructions/TESTING_MOCK_DATA.md deleted file mode 100644 index f62f758..0000000 --- a/instructions/TESTING_MOCK_DATA.md +++ /dev/null @@ -1,222 +0,0 @@ -# Testing with Mock Evaluation Data - -This guide explains how to test the new evaluation report UI with mock data. - -## Quick Start - -### Option 1: Using the Test Page (Easiest) - -1. Start the development server: - - ```bash - npm run dev - ``` - -2. Navigate to: **http://localhost:3000/test-evaluation** - -3. Click on either evaluation card to view the mock data - -### Option 2: Direct URL Access - -Navigate directly to the evaluation detail pages: - -- **Evaluation #43 (Hindi)**: http://localhost:3000/evaluations/43 -- **Evaluation #44 (English)**: http://localhost:3000/evaluations/44 - -## Mock Data Files - -Located in `/public/mock-data/`: - -### `evaluation-sample-1.json` (ID: 43) - -- **Language**: Hindi -- **Items**: 4 Q&A pairs -- **Scores**: - - cosine_similarity (NUMERIC) - - SNEHA correctness (NUMERIC) - - llm_judge_relevance (NUMERIC) - - response_category (CATEGORICAL) -- **Features**: Mix of CORRECT, PARTIAL, and INCORRECT responses - -### `evaluation-sample-2.json` (ID: 44) - -- **Language**: English -- **Items**: 3 Q&A pairs -- **Scores**: Same as above -- **Features**: Higher average scores, includes assistant config -- **Special**: 2 CORRECT, 1 PARTIAL (no INCORRECT) - -## What to Test - -### 1. 
Table View - -- ✅ Question, Answer, Ground Truth columns display properly -- ✅ All score columns appear dynamically -- ✅ Long text truncates with expand/collapse (details/summary) -- ✅ Score values are color-coded (green/yellow/red) -- ✅ Comments appear below scores -- ✅ No trace IDs visible (as requested) -- ✅ Row hover effects work - -### 2. Metrics Overview - -- ✅ All NUMERIC metrics show avg ± std -- ✅ CATEGORICAL metrics show distribution -- ✅ Responsive grid layout -- ✅ Proper formatting (3 decimal places for scores) - -### 3. CSV Export - -- ✅ Click "Export CSV" button -- ✅ File downloads with all columns -- ✅ Q&A pairs and scores included -- ✅ Proper CSV escaping - -### 4. Navigation - -- ✅ Back button returns to /evaluations?tab=results -- ✅ View Config button opens modal -- ✅ Sidebar navigation works - -### 5. Assistant Info - -- ✅ Evaluation #44 shows assistant badge -- ✅ Evaluation #43 shows no assistant - -## Switching Between Mock and Real Data - -### Enable Mock Data (Default) - -In `/app/api/evaluations/[id]/route.ts`: - -```typescript -const USE_MOCK_DATA = true; -``` - -### Disable Mock Data (Use Real Backend) - -```typescript -const USE_MOCK_DATA = false; -``` - -**Note**: After changing this, restart your dev server. - -## ID Mapping - -The mock API maps IDs to files: - -- **ID 43, 1, or any other number** → `evaluation-sample-1.json` -- **ID 44 or 2** → `evaluation-sample-2.json` - -You can modify this mapping in `/app/api/evaluations/[id]/route.ts` - -## Adding More Mock Data - -1. Create a new JSON file in `/public/mock-data/` -2. Follow the structure in existing samples -3. 
Update the ID mapping in the API route: - -```typescript -let mockFileName = "evaluation-sample-1.json"; -if (id === "44" || id === "2") { - mockFileName = "evaluation-sample-2.json"; -} else if (id === "45") { - mockFileName = "your-new-file.json"; // Add your mapping -} -``` - -## Expected Response Structure - -The mock data follows this structure: - -```json -{ - "id": 43, - "run_name": "...", - "dataset_name": "...", - "status": "completed", - "total_items": 4, - "scores": { - "summary_scores": [ - { - "name": "cosine_similarity", - "avg": 0.453, - "std": 0.06, - "total_pairs": 4, - "data_type": "NUMERIC" - }, - { - "name": "response_category", - "distribution": { "CORRECT": 1, "PARTIAL": 2, "INCORRECT": 1 }, - "total_pairs": 4, - "data_type": "CATEGORICAL" - } - ], - "individual_scores": [ - { - "trace_id": "...", - "input": { "question": "..." }, - "output": { "answer": "..." }, - "metadata": { "ground_truth": "..." }, - "trace_scores": [ - { - "name": "cosine_similarity", - "value": 0.452, - "data_type": "NUMERIC" - }, - { - "name": "response_category", - "value": "INCORRECT", - "data_type": "CATEGORICAL" - } - ] - } - ] - } -} -``` - -## Troubleshooting - -### Mock data not loading - -- Check console for `[MOCK MODE]` logs -- Verify files exist in `/public/mock-data/` -- Ensure `USE_MOCK_DATA = true` - -### Table not showing - -- Check browser console for errors -- Verify `scores.individual_scores` exists in JSON -- Check that all required fields are present - -### Scores not color-coded - -- Verify `data_type` is set correctly -- Check that NUMERIC values are numbers, not strings -- Ensure CATEGORICAL values match expected values - -## Production Deployment - -**IMPORTANT**: Before deploying to production: - -1. Set `USE_MOCK_DATA = false` in `/app/api/evaluations/[id]/route.ts` -2. Delete or hide `/app/test-evaluation/page.tsx` (optional) -3. 
Test with real backend to ensure everything works - -## Next Steps - -After testing with mock data and confirming the UI works: - -1. Update the backend API to return the new structure -2. Set `USE_MOCK_DATA = false` -3. Test with real evaluation data -4. Deploy to production - ---- - -**Need Help?** Check the implementation files: - -- Type definitions: `/app/components/types.ts` -- Table component: `/app/components/DetailedResultsTable.tsx` -- Detail page: `/app/evaluations/[id]/page.tsx` diff --git a/instructions/VERCEL_DESIGN_SYSTEM.md b/instructions/VERCEL_DESIGN_SYSTEM.md deleted file mode 100644 index b1c1ba9..0000000 --- a/instructions/VERCEL_DESIGN_SYSTEM.md +++ /dev/null @@ -1,708 +0,0 @@ -# Vercel/shadcn Design System Aesthetics - -A comprehensive guide to reproducing the minimalist, modern design aesthetic inspired by Vercel and shadcn/ui. - -## Philosophy - -**Minimalism First**: Every element serves a purpose. No decorative flourishes, no unnecessary effects. The design is invisible until it needs to be visible. - -**Subtle Interactions**: Transitions are quick (0.15-0.2s) and purposeful. Hover states provide immediate feedback without being distracting. - -**Hierarchy Through Restraint**: Visual hierarchy comes from careful use of weight, spacing, and subtle color variations—not bold colors or heavy effects. 
- ---- - -## Color Palette - -### Core Colors - -**Light Mode** - -``` -Backgrounds: -- Primary: #ffffff (pure white) -- Secondary: #fafafa (barely-there gray) - -Text: -- Primary: #171717 (near-black, not pure black) -- Secondary: #737373 (muted gray for less important text) - -Borders: -- Standard: #e5e5e5 (very light gray, barely visible) - -Accent: -- Primary: #171717 (same as text primary—unified system) -- Hover: #404040 (slightly lighter on hover) -``` - -**Dark Mode** - -``` -Backgrounds: -- Primary: #000000 (pure black) -- Secondary: #0a0a0a (barely-there lighter) - -Text: -- Primary: #ededed (off-white) -- Secondary: #a1a1a1 (muted gray) - -Borders: -- Standard: #262626 (subtle dark gray) -``` - -### Semantic Colors - -Used sparingly for status and feedback: - -``` -Success: #16a34a (green-600) -Error: #dc2626 (red-600) -Warning: #f59e0b (amber-500) -``` - -### Color Usage Rules - -1. **Never use pure black (#000) for text** in light mode—use #171717 instead -2. **Borders should be barely visible**—#e5e5e5 is the standard -3. **Background variations are subtle**—primary (#fff) vs secondary (#fafafa) -4. **Accent colors match text colors**—creates unified, cohesive system -5. **Status colors only appear when needed**—success/error states - ---- - -## Typography - -### Font Stack - -- **Sans-serif**: System font stack or Geist Sans (Vercel's font) -- **Monospace**: Geist Mono for code - -### Text Sizing - -``` -Extra Small: 10px (badges, labels) -Small: 12px (secondary UI, submenus) -Base: 14px (primary UI, body text) -Medium: 16px (headings, emphasized text) -Large: 20px+ (page titles, hero text) -``` - -### Font Weights - -``` -Regular: 400 (default text) -Medium: 500 (interactive elements, subheadings) -Semibold: 600 (active states, emphasis) -``` - -### Typography Rules - -1. **Use font weight for hierarchy**, not size differences -2. **Active/selected states use weight 500-600** -3. **Secondary text uses lighter weight AND color** -4. 
**Letter spacing**: -0.01em for headings (tight tracking) -5. **Line height**: Tight for UI (1.2-1.4), comfortable for body (1.5-1.6) - ---- - -## Spacing System - -### Scale (based on 4px grid) - -``` -0.5 → 2px (tight gaps) -1 → 4px (minimal spacing) -1.5 → 6px (small gaps) -2 → 8px (standard small) -2.5 → 10px (compact spacing) -3 → 12px (standard medium) -4 → 16px (comfortable spacing) -5 → 20px (generous spacing) -6 → 24px (section spacing) -``` - -### Padding Patterns - -``` -Buttons: px-3 py-2 (12px × 8px) -Inputs: px-3 py-2 (12px × 8px) -Cards: p-4 to p-6 (16px-24px) -Containers: px-6 py-6 (24px all sides) -Sections: py-8 to py-12 (32px-48px vertical) -``` - -### Margin Patterns - -``` -Between elements: 8-12px (space-y-2 to space-y-3) -Between sections: 24-32px (my-6 to my-8) -Page margins: 24px minimum (px-6) -``` - ---- - -## Components - -### Buttons - -**Primary Button** - -``` -Background: #171717 -Text: #ffffff -Padding: 12px 16px -Border: none -Radius: 6px -Font: 14px, weight 500 -Transition: all 0.2s ease - -Hover: -- Background: #404040 -- No scale/shadow effects - -Disabled: -- Background: #e5e5e5 -- Text: #a1a1a1 -- Cursor: not-allowed -``` - -**Secondary Button** - -``` -Background: transparent -Text: #171717 -Border: 1px solid #e5e5e5 -Padding: 12px 16px -Radius: 6px -Font: 14px, weight 500 - -Hover: -- Background: #fafafa -- Border: #d4d4d4 -``` - -**Ghost Button** - -``` -Background: transparent -Text: #737373 -Border: none -Padding: 8px 12px - -Hover: -- Text: #171717 -- Background: #fafafa -``` - -### Input Fields - -``` -Background: #ffffff -Border: 1px solid #e5e5e5 -Padding: 12px -Radius: 6px -Font: 14px -Text: #171717 - -Focus: -- Border: #171717 -- No glow/shadow -- Outline: none (use border instead) - -Placeholder: -- Color: #a1a1a1 -- Font style: normal (not italic) -``` - -### Cards - -``` -Background: #ffffff -Border: 1px solid #e5e5e5 -Radius: 8px -Padding: 16-24px -Shadow: none (or very subtle: 0 1px 2px rgba(0,0,0,0.05)) - 
-Hover (if interactive): -- Border: #d4d4d4 -- No shadow increase -``` - -### Navigation Items - -**Sidebar Item** - -``` -Default: -- Background: transparent -- Text: #737373 -- Font weight: 400-500 -- Padding: 8px 12px -- Radius: 6px - -Hover: -- Background: #ffffff (or primary bg) -- Text: #171717 - -Active: -- Background: #ffffff -- Text: #171717 -- Font weight: 600 -- Border: 1px solid #e5e5e5 -``` - -**Tab Navigation** - -``` -Default: -- Border bottom: 2px transparent -- Text: #737373 -- Font weight: 400 -- Padding: 12px 16px - -Active: -- Border bottom: 2px #171717 -- Text: #171717 -- Font weight: 500 -``` - -### Badges/Pills - -``` -Background: #fafafa -Text: #171717 -Padding: 4px 8px -Radius: 4px (fully rounded: 999px) -Font: 11-12px -Font weight: 500 - -Status Variants: -- Success: bg #dcfce7, text #15803d -- Error: bg #fee2e2, text #dc2626 -- Warning: bg #fef3c7, text #92400e -``` - -### Modals/Dialogs - -``` -Backdrop: -- Background: rgba(0, 0, 0, 0.4) -- Animation: fade in 0.2s - -Container: -- Background: #ffffff -- Border: 1px solid #e5e5e5 -- Radius: 12px -- Padding: 24px -- Max width: 500px -- Shadow: 0 4px 12px rgba(0, 0, 0, 0.1) -- Animation: fade + scale (0.95 → 1.0) 0.3s - -Close button: -- Position: top-right -- Size: 32px -- Icon: X mark -- Color: #737373 -- Hover: #171717 -``` - -### Tables - -``` -Container: -- Border: 1px solid #e5e5e5 -- Radius: 8px -- Overflow: hidden - -Header: -- Background: #fafafa -- Text: #171717 -- Font weight: 600 -- Padding: 12px 16px -- Border bottom: 1px solid #e5e5e5 - -Row: -- Background: #ffffff -- Border bottom: 1px solid #e5e5e5 -- Padding: 12px 16px - -Row Hover: -- Background: #fafafa - -Last row: -- No border bottom -``` - ---- - -## Layout Patterns - -### Sidebar Navigation - -``` -Width: 240px -Background: #fafafa -Border: 1px solid #e5e5e5 (right) -Height: 100vh -Flex: column - -Collapse: -- Width: 0px -- Overflow: hidden -- Transition: 0.3s ease -``` - -### Page Container - -``` -Max width: 1280px 
(or 100% for full-width) -Padding: 24px -Margin: 0 auto -``` - -### Content Sections - -``` -Background: #ffffff -Border: 1px solid #e5e5e5 -Radius: 8px -Padding: 24px -Margin: 16px 0 -``` - ---- - -## Animation & Transitions - -### Timing Functions - -``` -Standard: ease-in-out -Quick: ease (for micro-interactions) -Entry: ease-out -Exit: ease-in -``` - -### Duration Scale - -``` -Instant: 50ms (color changes) -Quick: 150ms (hover states, text color) -Standard: 200ms (backgrounds, borders) -Medium: 300ms (modals, drawers) -Slow: 500ms (layout changes) -``` - -### Common Animations - -**Fade In** - -```css -@keyframes fadeIn { - from { - opacity: 0; - transform: translateY(-4px); - } - to { - opacity: 1; - transform: translateY(0); - } -} -duration: 0.2s; -``` - -**Modal Entry** - -```css -@keyframes modalSlideUp { - from { - opacity: 0; - transform: translateY(20px) scale(0.95); - } - to { - opacity: 1; - transform: translateY(0) scale(1); - } -} -duration: 0.3s; -``` - -**Page Transition** - -```css -@keyframes pageIn { - from { - opacity: 0; - transform: translateY(8px); - } - to { - opacity: 1; - transform: translateY(0); - } -} -duration: 0.3s; -``` - -### Animation Rules - -1. **Hover transitions are 150-200ms**—fast enough to feel instant -2. **No easing curves longer than cubic-bezier**—keep it simple -3. **Entrance animations are subtle**—4-8px movement max -4. **Never animate on exit unless closing**—just fade out -5. 
**No bounce, elastic, or attention-seeking effects** - ---- - -## Interaction Patterns - -### Hover States - -**General Rules** - -- Background lightens slightly (#fafafa) -- Text darkens to primary color (#171717) -- Border darkens one shade -- No scale/transform effects -- Transition: 150ms - -### Focus States - -**Keyboard Navigation** - -- Use border color change, not glow -- Border: 2px solid #171717 -- No box-shadow outline -- Visible and clear - -### Active/Pressed States - -**On Click** - -- Slightly darker background -- No scale down -- 100ms transition (faster than hover) - -### Loading States - -**Skeleton Loaders** - -``` -Background: #fafafa -Animation: pulse (opacity 1 → 0.5 → 1) -Duration: 2s infinite -Border: same as element would have -Radius: match final element -``` - -**Spinners** - -``` -Size: 16-24px -Color: #171717 -Animation: spin 1s linear infinite -Line width: 2px -``` - ---- - -## Iconography - -### Icon Style - -- **Outline style** (not filled) -- **2px stroke width** -- **24px default size** (scale down to 16px for compact UI) -- **Rounded line caps and joins** -- **Match text color** of surrounding context - -### Icon Spacing - -- **Gap from text**: 8-10px (0.5rem to 0.625rem) -- **Icon-only buttons**: 32px × 32px touch target minimum - ---- - -## Shadows (Use Sparingly) - -``` -None: (default—no shadow) -Subtle: 0 1px 2px rgba(0, 0, 0, 0.05) -Light: 0 1px 3px rgba(0, 0, 0, 0.1) -Medium: 0 4px 6px rgba(0, 0, 0, 0.1) -Heavy: 0 10px 15px rgba(0, 0, 0, 0.1) -``` - -**When to Use Shadows** - -- Modals/dialogs: medium -- Dropdown menus: light -- Cards: none or subtle -- Buttons: never -- Popovers: light - ---- - -## Border Radius Scale - -``` -Small: 4px (badges, pills) -Default: 6px (buttons, inputs) -Medium: 8px (cards, containers) -Large: 12px (modals, large panels) -Full: 9999px (circular buttons, pills) -``` - ---- - -## Responsive Breakpoints - -``` -Mobile: < 640px -Tablet: 640px - 1024px -Desktop: 1024px+ -Wide: 1280px+ -``` - -### 
Mobile Adaptations - -- Reduce padding: 16px instead of 24px -- Collapse sidebar to overlay/drawer -- Stack horizontal layouts vertically -- Reduce font sizes slightly (13px base instead of 14px) -- Increase touch targets to 44px minimum - ---- - -## Dark Mode Considerations - -### Automatic Switching - -```css -@media (prefers-color-scheme: dark) { - /* Apply dark theme */ -} -``` - -### Dark Mode Colors - -**Backgrounds** - -- Pure black (#000) for drama -- Slightly lighter (#0a0a0a) for panels -- Very subtle borders (#262626) - -**Text** - -- Off-white (#ededed) not pure white -- Gray (#a1a1a1) for secondary - -**Borders** - -- Much darker but still subtle (#262626) - -**Key Difference**: Dark mode has higher contrast between elements to maintain readability. - ---- - -## Common Mistakes to Avoid - -1. ❌ **Heavy drop shadows**—use subtle borders instead -2. ❌ **Bold accent colors**—keep it monochrome with rare color use -3. ❌ **Complex gradients**—solid colors only -4. ❌ **Slow animations**—keep everything under 300ms -5. ❌ **Scale/transform on hover**—just color/background changes -6. ❌ **Too much border radius**—8px is usually the max -7. ❌ **Pure black text**—use #171717 in light mode -8. ❌ **Thick borders**—1px is standard, 2px for focus only -9. ❌ **Colorful UI elements**—status colors only when needed -10. 
❌ **Overly tight spacing**—respect the 4px grid - ---- - -## Design Checklist - -When implementing a new component, ensure: - -- [ ] Uses colors from centralized palette -- [ ] Border is 1px solid #e5e5e5 (or transparent) -- [ ] Border radius is 6-8px -- [ ] Padding follows 4px grid -- [ ] Font size is 14px (or 12px for compact) -- [ ] Font weight is 400-600 range -- [ ] Hover transition is 150-200ms -- [ ] No drop shadows (except modals) -- [ ] Text color is #171717 or #737373 -- [ ] Background is #ffffff or #fafafa -- [ ] Icons are 16-24px outline style -- [ ] Touch targets are 32px+ for interactive elements -- [ ] Animation is subtle and quick -- [ ] Responsive on mobile (16px padding minimum) - ---- - -## Implementation Notes - -### CSS Variables Approach - -```css -:root { - --bg-primary: #ffffff; - --bg-secondary: #fafafa; - --text-primary: #171717; - --text-secondary: #737373; - --border: #e5e5e5; - --radius: 8px; - --transition: 0.2s ease; -} -``` - -### Tailwind CSS Approach - -```javascript -// tailwind.config.js -theme: { - colors: { - bg: { primary: '#ffffff', secondary: '#fafafa' }, - text: { primary: '#171717', secondary: '#737373' }, - border: '#e5e5e5', - }, - borderRadius: { - DEFAULT: '6px', - lg: '8px', - xl: '12px', - }, - transitionDuration: { - DEFAULT: '200ms', - fast: '150ms', - } -} -``` - ---- - -## Inspiration Sources - -- **Vercel Dashboard**: vercel.com/dashboard -- **shadcn/ui**: ui.shadcn.com -- **Linear**: linear.app -- **GitHub**: github.com (2023+ design) -- **Raycast**: raycast.com - ---- - -## Summary - -The Vercel/shadcn aesthetic is defined by: - -1. **Extreme minimalism**—every pixel has purpose -2. **Near-monochrome palette**—black, white, grays -3. **Subtle borders and backgrounds**—barely visible until needed -4. **Quick, purposeful transitions**—150-200ms standard -5. **Typography-driven hierarchy**—weight and spacing over color -6. **No decorative effects**—no shadows, gradients, or transforms -7. 
**System fonts**—fast loading, native feel -8. **Generous whitespace**—let content breathe -9. **Status colors used sparingly**—only when semantically needed -10. **Dark mode as first-class**—not an afterthought - -This creates interfaces that feel fast, professional, and get out of the user's way. diff --git a/public/mock-data/evaluation-sample-1.json b/public/mock-data/evaluation-sample-1.json deleted file mode 100644 index 75dc123..0000000 --- a/public/mock-data/evaluation-sample-1.json +++ /dev/null @@ -1,211 +0,0 @@ -{ - "id": 43, - "run_name": "Hindi FAQ Evaluation - Run 1", - "dataset_name": "hindi_policy_qa_5_rows", - "config": { - "model": "gpt-4", - "instructions": "You are a helpful FAQ assistant for policy questions.", - "temperature": 0.7 - }, - "assistant_id": null, - "dataset_id": 50, - "batch_job_id": 71, - "embedding_batch_job_id": 72, - "status": "completed", - "object_store_url": "s3://ai-platform-documents-staging/evaluations/43", - "total_items": 4, - "scores": { - "summary_scores": [ - { - "name": "cosine_similarity", - "avg": 0.45267303673682135, - "std": 0.06016189626290471, - "total_pairs": 4, - "data_type": "NUMERIC" - }, - { - "name": "SNEHA correctness", - "avg": 0.25, - "std": 0.4330127018922193, - "total_pairs": 4, - "data_type": "NUMERIC" - }, - { - "name": "llm_judge_relevance", - "avg": 0.75, - "std": 0.25, - "total_pairs": 4, - "data_type": "NUMERIC" - }, - { - "name": "response_category", - "distribution": { - "CORRECT": 1, - "PARTIAL": 2, - "INCORRECT": 1 - }, - "total_pairs": 4, - "data_type": "CATEGORICAL" - } - ], - "individual_scores": [ - { - "trace_id": "97ec280e-883e-450e-a3e9-6cd8cfc0741a", - "input": { - "question": "सीएलएफ में उपसमिति के कार्य की समीक्षा कौन करता है?" 
- }, - "output": { - "answer": "सीएलएफ (CLF) में उपसमिति के कार्य की समीक्षा मुख्य रूप से सीएलएफ की प्रबंधन समिति या संबंधित अधिकृत प्रतिनिधि द्वारा की जाती है।" - }, - "metadata": { - "ground_truth": "सीएलएफ (Cluster Level Federation) में उपसमिति के कार्य की समीक्षा कार्यकारिणी समिति (Executive Committee) करती है।", - "item_id": "f444e132-6a25-403e-b0fa-9fd0c5452248", - "response_id": "resp_0a2a9f840105b5ca006923fa5855f8819680d7c280c082011f" - }, - "trace_scores": [ - { - "name": "cosine_similarity", - "value": 0.45159810574988546, - "data_type": "NUMERIC" - }, - { - "name": "SNEHA correctness", - "value": 0, - "data_type": "NUMERIC", - "comment": "The generation incorrectly claims there is no information about the specific committee" - }, - { - "name": "llm_judge_relevance", - "value": 0.5, - "data_type": "NUMERIC", - "comment": "Response is somewhat relevant but misses key details" - }, - { - "name": "response_category", - "value": "INCORRECT", - "data_type": "CATEGORICAL" - } - ] - }, - { - "trace_id": "f1833206-e9b6-49ac-864c-90dea4076fa9", - "input": { - "question": "ग्राम संगठन की बैठक में कितने सदस्यों की उपस्थिति आवश्यक है?" 
- }, - "output": { - "answer": "ग्राम संगठन की बैठक में कम से कम 50% सदस्यों की उपस्थिति आवश्यक है। यह गणपूर्ति सुनिश्चित करती है कि निर्णय लोकतांत्रिक और प्रतिनिधि हों।" - }, - "metadata": { - "ground_truth": "ग्राम संगठन की बैठक में कम से कम 50% सदस्यों की उपस्थिति आवश्यक है।", - "item_id": "a234b567-8901-234c-567d-890123456789", - "response_id": "resp_1b3c5d7e9f11a13b15c17d19e21f23a25b27c29d31e33f35a37b" - }, - "trace_scores": [ - { - "name": "cosine_similarity", - "value": 0.4654791998066623, - "data_type": "NUMERIC" - }, - { - "name": "SNEHA correctness", - "value": 1, - "data_type": "NUMERIC", - "comment": "Response correctly identifies all key points" - }, - { - "name": "llm_judge_relevance", - "value": 1, - "data_type": "NUMERIC", - "comment": "Highly relevant and accurate response" - }, - { - "name": "response_category", - "value": "CORRECT", - "data_type": "CATEGORICAL" - } - ] - }, - { - "trace_id": "b4a85769-96e2-402b-ab79-c1338e85c8b1", - "input": { - "question": "स्वयं सहायता समूह में कितनी महिलाएं होनी चाहिए?" 
- }, - "output": { - "answer": "स्वयं सहायता समूह में आमतौर पर 10-20 महिलाएं होनी चाहिए। यह आदर्श संख्या समूह की प्रभावी कार्यप्रणाली के लिए उपयुक्त मानी जाती है।" - }, - "metadata": { - "ground_truth": "स्वयं सहायता समूह में 10-20 महिलाएं होनी चाहिए। यह आदर्श संख्या प्रभावी संचालन और प्रबंधन के लिए उपयुक्त है। **Policy Source:** SHG Formation Guidelines", - "item_id": "c345d678-9012-345e-678f-901234567890", - "response_id": "resp_2c4d6e8f10a12b14c16d18e20f22a24b26c28d30e32f34a36c" - }, - "trace_scores": [ - { - "name": "cosine_similarity", - "value": 0.5311989216166836, - "data_type": "NUMERIC" - }, - { - "name": "SNEHA correctness", - "value": 0, - "data_type": "NUMERIC", - "comment": "Missing policy source and reference" - }, - { - "name": "llm_judge_relevance", - "value": 0.75, - "data_type": "NUMERIC", - "comment": "Good coverage but incomplete" - }, - { - "name": "response_category", - "value": "PARTIAL", - "data_type": "CATEGORICAL" - } - ] - }, - { - "trace_id": "51c505d8-efe0-454a-a339-f42a13fe94b8", - "input": { - "question": "समुदाय संसाधन व्यक्ति (CRP) का मुख्य कार्य क्या है?" 
- }, - "output": { - "answer": "समुदाय संसाधन व्यक्ति (CRP) का मुख्य कार्य समूहों को प्रशिक्षण देना और उन्हें तकनीकी सहायता प्रदान करना है।" - }, - "metadata": { - "ground_truth": "समुदाय संसाधन व्यक्ति (CRP) का मुख्य कार्य SHG सदस्यों को प्रशिक्षण, क्षमता निर्माण, और सामुदायिक संगठन में सहायता प्रदान करना है। वे फील्ड-स्तरीय सहायता और मेंटरशिप भी प्रदान करते हैं।", - "item_id": "d456e789-0123-456f-789g-012345678901", - "response_id": "resp_3d5e7f9g11a13b15c17d19e21f23a25b27c29d31e33f35a37d" - }, - "trace_scores": [ - { - "name": "cosine_similarity", - "value": 0.36241591977405424, - "data_type": "NUMERIC" - }, - { - "name": "SNEHA correctness", - "value": 0, - "data_type": "NUMERIC", - "comment": "Factually incomplete - misses key responsibilities" - }, - { - "name": "llm_judge_relevance", - "value": 0.5, - "data_type": "NUMERIC", - "comment": "Tangentially related but misses main point" - }, - { - "name": "response_category", - "value": "PARTIAL", - "data_type": "CATEGORICAL" - } - ] - } - ] - }, - "error_message": null, - "organization_id": 1, - "project_id": 1, - "inserted_at": "2025-11-17T11:07:44.609916", - "updated_at": "2025-11-17T11:18:44.235194" -} diff --git a/public/mock-data/evaluation-sample-2.json b/public/mock-data/evaluation-sample-2.json deleted file mode 100644 index bff4d0f..0000000 --- a/public/mock-data/evaluation-sample-2.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "id": 44, - "run_name": "English FAQ Evaluation - Test Run", - "dataset_name": "english_policy_qa_3_rows", - "config": { - "model": "gpt-4-turbo", - "instructions": "You are a helpful assistant answering policy-related questions.", - "temperature": 0.3 - }, - "assistant_id": "asst_abc123xyz", - "dataset_id": 51, - "batch_job_id": 73, - "embedding_batch_job_id": 74, - "status": "completed", - "object_store_url": "s3://ai-platform-documents-staging/evaluations/44", - "total_items": 3, - "scores": { - "summary_scores": [ - { - "name": "cosine_similarity", - "avg": 0.782, - "std": 0.123, 
- "total_pairs": 3, - "data_type": "NUMERIC" - }, - { - "name": "SNEHA correctness", - "avg": 0.667, - "std": 0.471, - "total_pairs": 3, - "data_type": "NUMERIC" - }, - { - "name": "llm_judge_relevance", - "avg": 0.833, - "std": 0.236, - "total_pairs": 3, - "data_type": "NUMERIC" - }, - { - "name": "response_category", - "distribution": { - "CORRECT": 2, - "PARTIAL": 1, - "INCORRECT": 0 - }, - "total_pairs": 3, - "data_type": "CATEGORICAL" - } - ], - "individual_scores": [ - { - "trace_id": "aaa11111-1111-1111-1111-111111111111", - "input": { - "question": "What is the minimum attendance required for a Village Organization meeting?" - }, - "output": { - "answer": "The minimum attendance required for a Village Organization meeting is 50% of the total members. This quorum ensures that decisions are representative and democratic." - }, - "metadata": { - "ground_truth": "A minimum of 50% of members must be present for a Village Organization meeting to proceed with decision-making. This is specified in the Community Operational Manual.", - "item_id": "item-001", - "response_id": "resp-eng-001" - }, - "trace_scores": [ - { - "name": "cosine_similarity", - "value": 0.89, - "data_type": "NUMERIC" - }, - { - "name": "SNEHA correctness", - "value": 1, - "data_type": "NUMERIC", - "comment": "Accurate and complete response" - }, - { - "name": "llm_judge_relevance", - "value": 1, - "data_type": "NUMERIC", - "comment": "Fully relevant with additional context" - }, - { - "name": "response_category", - "value": "CORRECT", - "data_type": "CATEGORICAL" - } - ] - }, - { - "trace_id": "bbb22222-2222-2222-2222-222222222222", - "input": { - "question": "How many women should be in a Self Help Group?" - }, - "output": { - "answer": "A Self Help Group should typically have between 10 to 20 women members." 
- }, - "metadata": { - "ground_truth": "A Self Help Group (SHG) should have 10-20 women members for effective operation and management.", - "item_id": "item-002", - "response_id": "resp-eng-002" - }, - "trace_scores": [ - { - "name": "cosine_similarity", - "value": 0.85, - "data_type": "NUMERIC" - }, - { - "name": "SNEHA correctness", - "value": 1, - "data_type": "NUMERIC", - "comment": "Correct range provided" - }, - { - "name": "llm_judge_relevance", - "value": 0.9, - "data_type": "NUMERIC", - "comment": "Directly answers the question" - }, - { - "name": "response_category", - "value": "CORRECT", - "data_type": "CATEGORICAL" - } - ] - }, - { - "trace_id": "ccc33333-3333-3333-3333-333333333333", - "input": { - "question": "What are the responsibilities of a Community Resource Person?" - }, - "output": { - "answer": "A Community Resource Person (CRP) is responsible for providing training and technical support to groups." - }, - "metadata": { - "ground_truth": "A Community Resource Person (CRP) provides training, capacity building, field-level support, mentorship, and assists in community organization activities for SHG members.", - "item_id": "item-003", - "response_id": "resp-eng-003" - }, - "trace_scores": [ - { - "name": "cosine_similarity", - "value": 0.601, - "data_type": "NUMERIC" - }, - { - "name": "SNEHA correctness", - "value": 0, - "data_type": "NUMERIC", - "comment": "Incomplete - missing key responsibilities like mentorship and capacity building" - }, - { - "name": "llm_judge_relevance", - "value": 0.6, - "data_type": "NUMERIC", - "comment": "Partially relevant but lacks detail" - }, - { - "name": "response_category", - "value": "PARTIAL", - "data_type": "CATEGORICAL" - } - ] - } - ] - }, - "error_message": null, - "organization_id": 1, - "project_id": 1, - "inserted_at": "2025-11-18T09:30:15.123456", - "updated_at": "2025-11-18T09:42:30.654321" -}