Most LLM APIs lock you into a single provider. If you want to switch from OpenAI to Anthropic or Google, you rewrite your integration, change authentication, and update response parsing. OpenGradient gives you a single unified API that wraps every major provider -- swap one enum value and everything else stays the same.
But the real differentiator is settlement. Every inference call settles on-chain via the x402 payment protocol, producing a cryptographic receipt you can use for compliance, billing audits, or dispute resolution. You choose how much data goes on-chain: just a hash (privacy), a batch digest (cost savings), or full metadata (complete transparency).
This tutorial walks through the llm.chat() API, covering non-streaming and
streaming responses, multi-provider switching, settlement modes, and function calling
-- all in one place.
```bash
pip install opengradient
```

Export your OpenGradient private key:

```bash
export OG_PRIVATE_KEY="0x..."
```

Faucet: Get free OPG tokens on Base Sepolia at https://faucet.opengradient.ai/
All x402 LLM payments currently settle on Base Sepolia using OPG tokens. If you see x402 payment errors, make sure your wallet has sufficient OPG on Base Sepolia.
Start with the simplest possible call -- send a message and get a response. Before
making any LLM calls, ensure sufficient OPG token allowance for the x402 payment
protocol using ensure_opg_approval. This only sends a transaction when the
current allowance drops below the threshold.
```python
import asyncio
import os
import sys

import opengradient as og

private_key = os.environ.get("OG_PRIVATE_KEY")
if not private_key:
    print("Error: set the OG_PRIVATE_KEY environment variable.")
    sys.exit(1)

llm = og.LLM(private_key=private_key)

# Ensure sufficient OPG allowance for x402 payments (only sends tx when below threshold).
llm.ensure_opg_approval(min_allowance=5)

async def main():
    result = await llm.chat(
        model=og.TEE_LLM.GPT_5,
        messages=[{"role": "user", "content": "What is the x402 payment protocol?"}],
        max_tokens=200,
        temperature=0.0,
    )

    # result is a TextGenerationOutput dataclass
    print(result.chat_output.get("content", ""))  # The model's text response
    print(result.finish_reason)                   # "stop", "length", or "tool_calls"
    print(result.payment_hash)                    # On-chain x402 receipt

asyncio.run(main())
```

The chat_output dictionary follows the OpenAI message format: it has role, content, and optionally tool_calls keys. The payment_hash is your on-chain settlement proof -- every call gets one.
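A small helper makes that message shape explicit. This is a sketch of our own, not an SDK function -- it assumes only the role, content, and tool_calls keys just described:

```python
# Hypothetical helper -- not part of the OpenGradient SDK. It assumes only the
# OpenAI-style assistant message keys described above (role/content/tool_calls).
def split_chat_output(chat_output: dict) -> tuple[str, list]:
    """Return (text, tool_calls) from an OpenAI-format assistant message."""
    return chat_output.get("content") or "", chat_output.get("tool_calls") or []

text, tool_calls = split_chat_output({"role": "assistant", "content": "Hi there"})
print(text)        # Hi there
print(tool_calls)  # []
```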
The model parameter accepts any og.TEE_LLM enum value. Swap the model and
everything else -- message format, authentication, response parsing -- stays
identical.
```python
# All LLM methods are async -- use await in an async function

# OpenAI
result_openai = await llm.chat(
    model=og.TEE_LLM.GPT_5,
    messages=[{"role": "user", "content": "Hello from OpenAI!"}],
    max_tokens=100,
)

# Anthropic
result_anthropic = await llm.chat(
    model=og.TEE_LLM.CLAUDE_SONNET_4_6,
    messages=[{"role": "user", "content": "Hello from Anthropic!"}],
    max_tokens=100,
)

# Google
result_google = await llm.chat(
    model=og.TEE_LLM.GEMINI_2_5_FLASH,
    messages=[{"role": "user", "content": "Hello from Google!"}],
    max_tokens=100,
)

# xAI
result_xai = await llm.chat(
    model=og.TEE_LLM.GROK_4,
    messages=[{"role": "user", "content": "Hello from xAI!"}],
    max_tokens=100,
)
```

This makes A/B testing trivial -- run the same prompt across providers and compare quality, latency, and cost without changing any infrastructure.
For chat UIs and real-time applications, pass stream=True to get tokens as they
are generated. The return value changes from a TextGenerationOutput to an async
generator that yields StreamChunk objects.
```python
stream = await llm.chat(
    model=og.TEE_LLM.GPT_5,
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "Explain TEEs in one paragraph."},
    ],
    max_tokens=300,
    temperature=0.0,
    stream=True,
)

async for chunk in stream:
    # Each chunk has a choices list. The first choice's delta
    # contains the incremental content for this token.
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

    # The final chunk has a finish_reason and optional usage stats.
    if chunk.is_final:
        print(f"\n\nModel: {chunk.model}")
        if chunk.usage:
            print(f"Tokens used: {chunk.usage.total_tokens}")
```

The StreamChunk dataclass has these fields:
| Field | Type | Description |
|---|---|---|
| choices | List[StreamChoice] | Incremental choices (usually one) |
| model | str | Model identifier |
| usage | StreamUsage or None | Token counts (final chunk only) |
| is_final | bool | True when the stream is ending |
Each StreamChoice contains a StreamDelta with optional content, role, and
tool_calls fields.
Every LLM call settles on-chain. The x402_settlement_mode parameter controls the
privacy/cost/transparency trade-off:
| Mode | On-Chain Data | Use Case |
|---|---|---|
| PRIVATE | Input/output hashes only | Privacy -- prove execution without revealing content |
| BATCH_HASHED | Batch digest of multiple calls | Cost efficiency -- lower gas per inference (default) |
| INDIVIDUAL_FULL | Full model, input, output, metadata | Transparency -- complete audit trail |
```python
# Privacy-first: only hashes stored on-chain
result_private = await llm.chat(
    model=og.TEE_LLM.CLAUDE_SONNET_4_6,
    messages=[{"role": "user", "content": "Sensitive query here."}],
    max_tokens=100,
    x402_settlement_mode=og.x402SettlementMode.PRIVATE,
)
print(f"Payment hash (PRIVATE): {result_private.payment_hash}")

# Cost-efficient: batched settlement (this is the default)
result_batch = await llm.chat(
    model=og.TEE_LLM.CLAUDE_SONNET_4_6,
    messages=[{"role": "user", "content": "Regular query."}],
    max_tokens=100,
    x402_settlement_mode=og.x402SettlementMode.BATCH_HASHED,
)
print(f"Payment hash (BATCH_HASHED): {result_batch.payment_hash}")

# Full transparency: everything on-chain
result_transparent = await llm.chat(
    model=og.TEE_LLM.CLAUDE_SONNET_4_6,
    messages=[{"role": "user", "content": "Auditable query."}],
    max_tokens=100,
    x402_settlement_mode=og.x402SettlementMode.INDIVIDUAL_FULL,
)
print(f"Payment hash (INDIVIDUAL_FULL): {result_transparent.payment_hash}")
```

All three calls return a payment_hash you can look up on-chain. The difference is how much detail the on-chain record contains. Store these hashes if you need an audit trail -- they are the on-chain receipts for each inference call.
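One lightweight way to keep that audit trail is an append-only JSONL log keyed by payment_hash. The helper below is our own sketch, not an SDK feature -- the file name, record layout, and example hash are all illustrative choices:

```python
import json
import tempfile
import time
from pathlib import Path

# Hypothetical audit-log helper -- not part of the OpenGradient SDK. It records
# each payment_hash alongside enough context to match receipts to calls later.
def log_receipt(path: Path, payment_hash: str, model: str, mode: str) -> None:
    record = {
        "ts": time.time(),
        "payment_hash": payment_hash,
        "model": model,
        "settlement_mode": mode,
    }
    with path.open("a") as f:  # append-only: one JSON object per line
        f.write(json.dumps(record) + "\n")

log_path = Path(tempfile.gettempdir()) / "x402_receipts.jsonl"
log_receipt(log_path, "0xabc123", "CLAUDE_SONNET_4_6", "PRIVATE")  # example hash

rows = [json.loads(line) for line in log_path.read_text().splitlines()]
print(rows[-1]["payment_hash"])  # 0xabc123
```

In a real run you would call log_receipt with result.payment_hash after each llm.chat call.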
You can pass tools to llm.chat() in the standard OpenAI function-calling
format. This works with any model that supports tool use.
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_token_price",
            "description": "Get the current USD price of a cryptocurrency.",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Token ticker symbol, e.g. ETH, BTC.",
                    }
                },
                "required": ["symbol"],
            },
        },
    }
]

result = await llm.chat(
    model=og.TEE_LLM.GEMINI_2_5_FLASH,
    messages=[{"role": "user", "content": "What's the current price of ETH?"}],
    max_tokens=200,
    tools=tools,
    tool_choice="auto",
)

if result.chat_output.get("tool_calls"):
    # The model decided to call a tool instead of responding with text.
    # We check for tool_calls in the message rather than relying on finish_reason,
    # since the exact finish_reason string may vary by provider.
    for tc in result.chat_output["tool_calls"]:
        func = tc["function"]
        print(f"Tool: {func['name']}, Args: {func['arguments']}")
else:
    print(result.chat_output.get("content", ""))
```

When the model returns tool calls, execute the requested functions locally, then send the results back in a follow-up llm.chat() call with a "tool" role message. See Tutorial 3 for a complete multi-turn tool-calling loop.
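The second half of that loop can be sketched as follows. This is our own illustration, not SDK code: the price lookup is a stub with made-up values, and it assumes each tool call carries an OpenAI-style id field alongside the function name and JSON-string arguments shown above:

```python
import json

# Stubbed tool implementation -- prices are made up for illustration.
def get_token_price(symbol: str) -> float:
    prices = {"ETH": 3000.0, "BTC": 60000.0}
    return prices[symbol.upper()]

def run_tool_call(tc: dict) -> dict:
    """Execute one tool-call dict and build the OpenAI-style 'tool' message."""
    func = tc["function"]
    args = json.loads(func["arguments"])  # arguments arrive as a JSON string
    result = get_token_price(**args)
    return {
        "role": "tool",
        "tool_call_id": tc["id"],  # assumes an OpenAI-style id on each call
        "content": json.dumps({"price_usd": result}),
    }

# Shaped like one entry of result.chat_output["tool_calls"]:
tc = {
    "id": "call_1",
    "function": {"name": "get_token_price", "arguments": '{"symbol": "ETH"}'},
}
msg = run_tool_call(tc)
print(msg["content"])  # {"price_usd": 3000.0}
```

You would append this message (after the assistant's tool-call message) to the conversation and call llm.chat() again so the model can phrase its final answer.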
"""Streaming Multi-Provider Chat -- complete working example."""
import asyncio
import os
import sys
import opengradient as og
# ── Initialize ────────────────────────────────────────────────────────────
private_key = os.environ.get("OG_PRIVATE_KEY")
if not private_key:
print("Error: set the OG_PRIVATE_KEY environment variable.")
sys.exit(1)
llm = og.LLM(private_key=private_key)
# Ensure sufficient OPG allowance for x402 payments (only sends tx when below threshold).
llm.ensure_opg_approval(min_allowance=5)
PROMPT = "Explain what a Trusted Execution Environment is in two sentences."
async def main():
# ── Multi-provider comparison ─────────────────────────────────────────
models = [
("GPT-5", og.TEE_LLM.GPT_5),
("Claude Sonnet 4.6", og.TEE_LLM.CLAUDE_SONNET_4_6),
("Gemini 2.5 Flash", og.TEE_LLM.GEMINI_2_5_FLASH),
("Grok 4", og.TEE_LLM.GROK_4),
]
for name, model in models:
try:
result = await llm.chat(
model=model,
messages=[{"role": "user", "content": PROMPT}],
max_tokens=200,
temperature=0.0,
)
print(f"[{name}] {result.chat_output.get('content', '')}")
print(f" Payment hash: {result.payment_hash}\n")
except Exception as e:
print(f"[{name}] Error: {e}\n")
# ── Streaming ─────────────────────────────────────────────────────────
print("--- Streaming from GPT-5 ---")
stream = await llm.chat(
model=og.TEE_LLM.GPT_5,
messages=[{"role": "user", "content": "What is x402? Keep it under 50 words."}],
max_tokens=100,
stream=True,
)
async for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
print("\n")
# ── Settlement modes ──────────────────────────────────────────────────
for mode_name, mode in [
("PRIVATE", og.x402SettlementMode.PRIVATE),
("BATCH_HASHED", og.x402SettlementMode.BATCH_HASHED),
("INDIVIDUAL_FULL", og.x402SettlementMode.INDIVIDUAL_FULL),
]:
try:
r = await llm.chat(
model=og.TEE_LLM.CLAUDE_SONNET_4_6,
messages=[{"role": "user", "content": "Say hello."}],
max_tokens=50,
x402_settlement_mode=mode,
)
print(f"[{mode_name}] payment_hash={r.payment_hash}")
except Exception as e:
print(f"[{mode_name}] Error: {e}")
# ── Function calling ──────────────────────────────────────────────────
tools = [{
"type": "function",
"function": {
"name": "get_token_price",
"description": "Get the current USD price of a cryptocurrency.",
"parameters": {
"type": "object",
"properties": {
"symbol": {"type": "string", "description": "Token ticker, e.g. ETH."}
},
"required": ["symbol"],
},
},
}]
result = await llm.chat(
model=og.TEE_LLM.GEMINI_2_5_FLASH,
messages=[{"role": "user", "content": "What is the price of ETH?"}],
max_tokens=200,
tools=tools,
tool_choice="auto",
)
if result.chat_output.get("tool_calls"):
for tc in result.chat_output["tool_calls"]:
func = tc["function"]
print(f"Tool call: {func['name']}({func['arguments']})")
else:
print(result.chat_output.get("content", ""))
asyncio.run(main())- Build a chat UI: Use the streaming API with a web framework to build a real-time chat interface backed by verifiable inference.
- Add tool calling: See Tutorial 3 for a full multi-turn agent loop with tool dispatch and result feeding.
- Build an agent: See Tutorial 1 to combine LangChain with on-chain model tools for a fully verifiable AI agent.