A documentation-first reference for building React frontends that talk to LLM providers safely: how to structure apps so API keys stay on the server, how to stream and cancel responses, what to log, and what to verify before production.
- Principles
- Architecture: keys stay on the server
- Minimal server route (Node / Express-style)
- Streaming from the client (Fetch + ReadableStream)
- Server-Sent Events (SSE) alternative
- Cancel streams: Stop button and navigation
- Embeddings and caching
- Model routing (cheap vs capable)
- Cost control and observability
- Production checklist
| Do | Don’t |
|---|---|
| Call providers from backend routes or server actions | Expose API keys in the browser bundle |
| Stream tokens to the UI for long answers | Buffer the full completion in memory on the client before showing anything (unless you have a good reason) |
| Use AbortController so users can stop generation | Leave long-running requests uncancelled |
| Log metadata (model, latency, rough tokens, cost), not raw prompts by default | Log full prompts/responses without a redaction policy |
```
[Browser: React] --HTTPS--> [Your API: Node/Bun/Edge] --HTTPS--> [OpenAI / Anthropic / ...]
        |                               |
   no API key                API key + rate limits
```
The React app sends user messages (and optional session IDs) to your endpoint. Your server attaches the secret key, enforces auth/rate limits, and optionally logs observability fields.
Below is a pattern only: swap `fetch` for the provider's SDK, and use the exact URL, headers, and body they require.
```js
// server/chat.js: example shape, not tied to a specific provider
import express from "express";

const app = express();
app.use(express.json({ limit: "256kb" }));

app.post("/api/chat", async (req, res) => {
  const started = Date.now();
  const { messages } = req.body;
  // TODO: auth, rate limit, validate `messages`

  res.setHeader("Content-Type", "text/plain; charset=utf-8");
  res.setHeader("Transfer-Encoding", "chunked");
  // If behind nginx/CDN: ensure buffering is off for this location (see checklist)

  const upstream = await fetch("https://api.provider.example/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.LLM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4.1-mini",
      messages,
      stream: true,
    }),
  });

  if (!upstream.ok || !upstream.body) {
    res.status(502).end("upstream_error");
    return;
  }

  // Pipe upstream chunks straight through to the client as they arrive.
  // (See §6 for aborting the upstream request when the client disconnects.)
  const reader = upstream.body.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      res.write(Buffer.from(value));
    }
  } finally {
    reader.releaseLock();
    res.end();
    const ms = Date.now() - started;
    // See §9 for structured logging (model, tokens, cost)
    console.log(JSON.stringify({ route: "/api/chat", latencyMs: ms }));
  }
});

app.listen(3001);
```

Your React app then fetches `/api/chat` and reads the streaming response body as it arrives (see the next section).
Accumulate assistant text as chunks arrive; drive UI from React state.
```ts
// hooks/useChatStream.ts
import { useCallback, useEffect, useRef, useState } from "react";

type Message = { role: "user" | "assistant"; content: string };

export function useChatStream(apiPath = "/api/chat") {
  const [messages, setMessages] = useState<Message[]>([]);
  const [assistant, setAssistant] = useState("");
  const [loading, setLoading] = useState(false);
  const abortRef = useRef<AbortController | null>(null);
  const messagesRef = useRef<Message[]>([]);
  messagesRef.current = messages;

  const send = useCallback(
    async (userText: string) => {
      const userMsg: Message = { role: "user", content: userText };
      const history = [...messagesRef.current, userMsg];
      setMessages(history);
      setAssistant("");
      setLoading(true);

      // Abort any in-flight request before starting a new one.
      abortRef.current?.abort();
      abortRef.current = new AbortController();
      const { signal } = abortRef.current;

      let fullAssistant = "";
      try {
        const res = await fetch(apiPath, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            messages: history.map(({ role, content }) => ({ role, content })),
          }),
          signal,
        });
        if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);

        const reader = res.body.getReader();
        const dec = new TextDecoder();
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          const chunk = dec.decode(value, { stream: true });
          fullAssistant += chunk;
          // Raw text deltas from the server; if you use SSE, parse lines here instead.
          setAssistant(fullAssistant);
        }
        if (fullAssistant.trim()) {
          setMessages((m) => [...m, { role: "assistant", content: fullAssistant }]);
        }
        setAssistant("");
      } catch (e: unknown) {
        if ((e as Error).name === "AbortError") return;
        console.error(e);
      } finally {
        // Skip cleanup if a newer request has already replaced this controller;
        // otherwise a superseded request would clear the new one's loading state.
        if (abortRef.current?.signal === signal) {
          setLoading(false);
          abortRef.current = null;
        }
      }
    },
    [apiPath]
  );

  const stop = useCallback(() => abortRef.current?.abort(), []);

  useEffect(() => {
    return () => abortRef.current?.abort();
  }, []);

  return { messages, assistant, loading, send, stop };
}
```

SSE delivers one event per frame and works well with `EventSource` for one-way server → client streams. For POST bodies (typical for chat), use fetch + stream (§4) or a small POST that returns an SSE stream ID.
Headers that often matter for SSE through proxies:
```
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
```

Parse `data: ...` lines on the client, or use a library that normalizes provider-specific SSE.
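If you stream SSE over a POST fetch yourself, a minimal parser splits the byte stream on blank lines and strips the `data:` prefix. A sketch, assuming the server sends standard `\n`-delimited frames and a `[DONE]` sentinel (both vary by provider; `readSse` and `onDelta` are names invented here):

```ts
// Minimal SSE frame parser over a fetch response body (sketch).
async function readSse(res: Response, onDelta: (data: string) => void) {
  if (!res.body) throw new Error("no response body");
  const reader = res.body.getReader();
  const dec = new TextDecoder();
  let buf = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buf += dec.decode(value, { stream: true });
    // Events are separated by a blank line; keep the trailing partial frame.
    const frames = buf.split("\n\n");
    buf = frames.pop() ?? "";
    for (const frame of frames) {
      for (const line of frame.split("\n")) {
        if (!line.startsWith("data:")) continue; // ignore comments/other fields
        const data = line.slice(5).trim();
        if (data === "[DONE]") return; // provider-specific end sentinel
        onDelta(data); // often a JSON delta: parse per your provider's schema
      }
    }
  }
}
```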
Wire the `stop` function from the hook (backed by `AbortController`) to a Stop control.
```tsx
function Chat() {
  const { messages, assistant, loading, send, stop } = useChatStream();
  return (
    <>
      <button type="button" disabled={!loading} onClick={stop}>
        Stop
      </button>
      {/* render messages + streaming assistant */}
    </>
  );
}
```

The `useChatStream` example above registers a `useEffect` cleanup that calls `abort()` on unmount, so in-flight streams stop when the user navigates away.
If your provider supports cancelling the upstream generation (not only closing the HTTP response), call their cancel/disconnect API from the server when the client disconnects. That usually requires passing through request IDs from the provider’s streaming API.
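A sketch of the server-side plumbing, assuming the Express-style route from §3: abort the upstream fetch when the client connection closes. `PROVIDER_URL` is the same placeholder endpoint as §3.

```ts
import express from "express";

const app = express();
app.use(express.json());

// Placeholder endpoint, as in §3.
const PROVIDER_URL = "https://api.provider.example/v1/chat/completions";

app.post("/api/chat", async (req, res) => {
  const upstreamAbort = new AbortController();
  // Fires when the client connection closes (Stop button, tab close, navigation).
  res.on("close", () => upstreamAbort.abort());

  const upstream = await fetch(PROVIDER_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.LLM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages: req.body.messages, stream: true }),
    signal: upstreamAbort.signal, // tears down the upstream connection on abort
  });
  // ...stream upstream.body to res as in §3. After an abort, reader.read()
  // rejects with AbortError, so wrap the read loop in try/catch.
  // If the provider also exposes an explicit cancel endpoint, call it here
  // using the request ID from the stream's early frames (provider-specific).
});
```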
Embeddings are deterministic for the same input and model, which makes them ideal for caching.
Server-side cache keys (example):
```ts
import crypto from "node:crypto";

function embeddingCacheKey(model: string, text: string) {
  return `emb:${model}:${crypto.createHash("sha256").update(text).digest("hex")}`;
}
```

Store results in Redis, your DB, or an in-memory LRU (with a max size). Always cap input length and normalize whitespace to avoid cache fragmentation.
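A get-or-compute wrapper makes the cache transparent to callers. A sketch with an in-memory Map and crude FIFO eviction (swap in Redis for multi-instance deployments; `embed()` is a placeholder for your provider's embeddings call, and the caps are illustrative):

```ts
import crypto from "node:crypto";

// Placeholder for the actual provider call.
declare function embed(model: string, text: string): Promise<number[]>;

const cache = new Map<string, number[]>();
const MAX_ENTRIES = 10_000; // cap memory usage

function normalize(text: string) {
  // Normalize whitespace and cap length so near-duplicates share a cache entry.
  return text.trim().replace(/\s+/g, " ").slice(0, 8_000);
}

export async function getEmbedding(model: string, text: string): Promise<number[]> {
  const norm = normalize(text);
  const key = `emb:${model}:${crypto.createHash("sha256").update(norm).digest("hex")}`;
  const hit = cache.get(key);
  if (hit) return hit;

  const vector = await embed(model, norm);
  if (cache.size >= MAX_ENTRIES) {
    // Map iterates in insertion order, so this evicts the oldest entry.
    cache.delete(cache.keys().next().value!);
  }
  cache.set(key, vector);
  return vector;
}
```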
Use small / cheap models for:
- intent detection, classification, safety triage
- extracting structured JSON with a tight schema
- routing (“needs reasoning?” → escalate)
Use larger models for:
- multi-step reasoning, long context synthesis, fragile tool use
Pseudo-flow:

```
user message → classifier (mini) → if hard: reasoning model; else: mini completes
```
Implement routing on the server so clients cannot override billing-sensitive choices without authorization.
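A sketch of that decision on the server; `classifyDifficulty()` stands in for a cheap, temperature-0 classification call, and the model names are illustrative:

```ts
type Message = { role: "user" | "assistant"; content: string };

// Placeholder: a cheap, temperature-0 call returning "easy" | "hard".
declare function classifyDifficulty(messages: Message[]): Promise<"easy" | "hard">;

async function pickModel(messages: Message[]): Promise<string> {
  const difficulty = await classifyDifficulty(messages);
  // Server-side decision: clients never select billing-sensitive models directly.
  return difficulty === "hard" ? "big-reasoning-model" : "small-fast-model";
}
```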
Per request, log (structured JSON is ideal):
| Field | Why |
|---|---|
| Model | Attribution and pricing lookup |
| Rough token counts | Input/output estimates (or provider usage fields when available) |
| Latency | SLAs, regressions, timeouts |
| Estimated cost | Rough daily totals; compare to budgets |
Example server-side log line shape:
{
"event": "llm_completion",
"model": "gpt-4.1-mini",
"inputTokensEst": 420,
"outputTokensEst": 180,
"latencyMs": 910,
"costUsdEst": 0.0012,
"userId": "anon_or_authed",
"route": "/api/chat"
}Redaction: by default log hashes or lengths of prompts, not raw text, unless you have a compliance-reviewed pipeline.
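When the provider doesn't return usage fields, a common rough heuristic for English prose is ~4 characters per token; treat the result as an estimate for dashboards, not billing truth:

```ts
// Very rough: ~4 chars/token for English text.
// Prefer the provider's usage fields whenever they are available.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);
```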
Cost levers worth building in:

- Cache aggressively: especially embeddings (§7) and idempotent classification calls run at temperature 0.
- Route by difficulty: small/cheap models for classification; larger models for hard reasoning (§8).
- Cancel early: when users navigate away or hit Stop, abort the client fetch and propagate cancellation to the server where possible (§6).
- API keys only on the server (env vars, secret manager; never `NEXT_PUBLIC_*` for provider keys)
- Rate limits + basic abuse controls (per IP / per user / per org)
- Streaming works through your CDN / reverse proxy (no surprise buffering), e.g. nginx `proxy_buffering off;` (and often `proxy_cache off;`, `gzip off;` for that `location`) on the route that streams chat; verify chunked transfer end-to-end with a real client
- Client cancel aborts upstream generation where supported (disconnect handlers, provider cancel APIs)
- Fallback UI when provider is down (cached copy, graceful message, retry)
- Redacted logging policy for prompts/responses
- Cost alerts (daily spend thresholds, anomaly detection on token spikes)
Example (nginx): disable buffering for your streaming location so chunks reach the browser promptly.
```nginx
location /api/chat {
  proxy_pass http://backend;
  proxy_http_version 1.1;
  proxy_set_header Connection "";
  proxy_buffering off;
  proxy_cache off;
  gzip off;
}
```

Tune paths and upstream names for your stack; Cloudflare and other CDNs have their own streaming/buffering knobs, so test with a slow token stream.
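One quick way to verify streaming end-to-end: `curl -N` disables curl's own output buffering, so you should see tokens trickle in rather than arrive in one burst (URL and payload below are placeholders):

```
curl -N -X POST https://your.app/api/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Count slowly to 20."}]}'
```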
Further reading
- MDN: Server-sent events
- Fetch: consuming a streaming response
- Vercel AI SDK (if using Next.js / Vercel ecosystem)
- Provider docs: OpenAI / Anthropic / Google; always verify current API surfaces and pricing