Commit 4b11ab5
feat(openai): add support for shell tool (#9579)
1 parent 4628afd commit 4b11ab5

File tree

7 files changed: +644 -0 lines changed

.changeset/kind-jokes-attack.md

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
---
"@langchain/openai": minor
---

feat(openai): add support for shell tool

libs/providers/langchain-openai/README.md

Lines changed: 79 additions & 0 deletions

@@ -508,6 +508,85 @@ const response = await llmWithShell.invoke(

For more information, see [OpenAI's Local Shell Documentation](https://platform.openai.com/docs/guides/tools-local-shell).

### Shell Tool

The Shell tool allows models to run shell commands through your integration. Unlike Local Shell, this tool supports executing multiple commands concurrently and is designed for `gpt-5.1`.

> **Security Warning**: Running arbitrary shell commands can be dangerous. Always sandbox execution or add strict allow/deny-lists before forwarding commands to the system shell.

**Use cases**:

- **Automating filesystem or process diagnostics** – e.g., "find the largest PDF under ~/Documents"
- **Extending model capabilities** – Using built-in UNIX utilities, the Python runtime, and other CLIs
- **Running multi-step build and test flows** – Chaining commands like `pip install` and `pytest`
- **Complex agentic coding workflows** – Combining with `apply_patch` for file operations

```typescript
import { ChatOpenAI, tools } from "@langchain/openai";
import { exec as execCb } from "node:child_process";
import { promisify } from "node:util";

// Node has no `child_process/promises` module; promisify the callback API instead.
const exec = promisify(execCb);

const model = new ChatOpenAI({ model: "gpt-5.1" });

// With an execute callback for automatic command handling
const shellTool = tools.shell({
  execute: async (action) => {
    const outputs = await Promise.all(
      action.commands.map(async (cmd) => {
        try {
          const { stdout, stderr } = await exec(cmd, {
            timeout: action.timeout_ms ?? undefined,
          });
          return {
            stdout,
            stderr,
            outcome: { type: "exit" as const, exit_code: 0 },
          };
        } catch (error: any) {
          const timedOut = error.killed && error.signal === "SIGTERM";
          return {
            stdout: error.stdout ?? "",
            stderr: error.stderr ?? String(error),
            outcome: timedOut
              ? { type: "timeout" as const }
              : { type: "exit" as const, exit_code: error.code ?? 1 },
          };
        }
      })
    );
    return {
      output: outputs,
      maxOutputLength: action.max_output_length,
    };
  },
});

const llmWithShell = model.bindTools([shellTool]);
const response = await llmWithShell.invoke(
  "Find the largest PDF file in ~/Documents"
);
```

**Action properties**: The model returns actions with these properties:

- `commands` - Array of shell commands to execute (can run concurrently)
- `timeout_ms` - Optional timeout in milliseconds (enforce your own limits)
- `max_output_length` - Optional maximum characters to return per command

**Return format**: Your execute function should return a `ShellResult`:

```typescript
interface ShellResult {
  output: Array<{
    stdout: string;
    stderr: string;
    outcome: { type: "exit"; exit_code: number } | { type: "timeout" };
  }>;
  maxOutputLength?: number | null; // Pass back from action if provided
}
```

For more information, see [OpenAI's Shell Documentation](https://platform.openai.com/docs/guides/tools-shell).
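Note that `max_output_length` is a request from the model: the API does not truncate for you, so your executor should clamp each command's output itself. A minimal sketch under that assumption (the helper name `truncateOutput` and the local `CommandOutput` type are ours, not part of the package):

```typescript
// Hypothetical helper: clamp a command's stdout/stderr to the model-requested cap.
interface CommandOutput {
  stdout: string;
  stderr: string;
  outcome: { type: "exit"; exit_code: number } | { type: "timeout" };
}

function truncateOutput(
  out: CommandOutput,
  maxOutputLength?: number | null
): CommandOutput {
  // No cap requested: return the output unchanged.
  if (maxOutputLength == null) return out;
  return {
    ...out,
    stdout: out.stdout.slice(0, maxOutputLength),
    stderr: out.stderr.slice(0, maxOutputLength),
  };
}
```

Applying this per command before building the `ShellResult` keeps responses bounded even when a command dumps megabytes of output.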
### Apply Patch Tool

The Apply Patch tool allows models to propose structured diffs that your integration applies. This enables iterative, multi-step code editing workflows where the model can create, update, and delete files in your codebase.
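The core of such an integration is applying a list of create/update/delete operations to your workspace. A minimal in-memory sketch of that loop (the `FileOp` shape below is illustrative, not the tool's wire format):

```typescript
// Hypothetical structured file operations, applied to an in-memory file map.
type FileOp =
  | { type: "create"; path: string; content: string }
  | { type: "update"; path: string; content: string }
  | { type: "delete"; path: string };

function applyOps(
  files: Map<string, string>,
  ops: FileOp[]
): Map<string, string> {
  // Copy so the caller's map is left untouched.
  const next = new Map(files);
  for (const op of ops) {
    if (op.type === "delete") next.delete(op.path);
    else next.set(op.path, op.content); // create and update both write content
  }
  return next;
}
```

A real integration would validate paths and write to disk, but the iterate-and-apply structure is the same.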

libs/providers/langchain-openai/src/tools/index.ts

Lines changed: 11 additions & 0 deletions

@@ -68,6 +68,16 @@ export type {
  LocalShellAction,
} from "./localShell.js";

import { shell } from "./shell.js";
export type {
  ShellTool,
  ShellOptions,
  ShellAction,
  ShellResult,
  ShellCommandOutput,
  ShellCallOutcome,
} from "./shell.js";

import { applyPatch } from "./applyPatch.js";
export type {
  ApplyPatchTool,
@@ -86,5 +96,6 @@ export const tools = {
  imageGeneration,
  computerUse,
  localShell,
  shell,
  applyPatch,
};
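The edit follows the file's existing barrel pattern: each tool factory lives in its own module, its types are re-exported, and the factory itself is attached to the single `tools` namespace object so callers write `tools.shell(...)`. A stripped-down sketch of that pattern (stub factories, illustrative only):

```typescript
// Illustrative barrel: stand-in factories for the per-tool modules.
const localShell = () => ({ type: "local_shell" as const });
const shell = () => ({ type: "shell" as const });
const applyPatch = () => ({ type: "apply_patch" as const });

// In the real file this object is exported as the public `tools` namespace.
const tools = { localShell, shell, applyPatch };
```

Adding a tool is then two lines: import the factory and add it to the object.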
Lines changed: 266 additions & 0 deletions

@@ -0,0 +1,266 @@
import { OpenAI as OpenAIClient } from "openai";
import { tool } from "@langchain/core/tools";

/**
 * Re-export action type from OpenAI SDK for convenience.
 * The action contains command details like the commands array, timeout, and max output length.
 */
export type ShellAction =
  OpenAIClient.Responses.ResponseFunctionShellToolCall.Action;

/**
 * Result of a single shell command execution.
 * Contains stdout, stderr, and the outcome (exit code or timeout).
 */
export type ShellCommandOutput =
  OpenAIClient.Responses.ResponseFunctionShellCallOutputContent;

/**
 * Outcome type for shell command execution - either exit with code or timeout.
 */
export type ShellCallOutcome = ShellCommandOutput["outcome"];

/**
 * Result of executing shell commands.
 * Contains an array of outputs (one per command) and the max_output_length parameter.
 */
export interface ShellResult {
  /**
   * Array of command outputs. Each entry corresponds to a command from the action.
   * The order should match the order of commands in the action.
   */
  output: ShellCommandOutput[];
  /**
   * The max_output_length from the action, which must be passed back to the API.
   * If not provided in the action, it can be omitted.
   */
  maxOutputLength?: number | null;
}

/**
 * Options for the Shell tool.
 */
export interface ShellOptions {
  /**
   * Execute function that handles shell command execution.
   * This function receives the action input containing the commands and limits,
   * and should return a ShellResult with stdout, stderr, and outcome for each command.
   *
   * @example
   * ```typescript
   * execute: async (action) => {
   *   const outputs = await Promise.all(
   *     action.commands.map(async (cmd) => {
   *       try {
   *         const { stdout, stderr } = await exec(cmd, {
   *           timeout: action.timeout_ms ?? undefined,
   *         });
   *         return {
   *           stdout,
   *           stderr,
   *           outcome: { type: "exit" as const, exit_code: 0 },
   *         };
   *       } catch (error) {
   *         const timedOut = error.killed && error.signal === "SIGTERM";
   *         return {
   *           stdout: error.stdout ?? "",
   *           stderr: error.stderr ?? String(error),
   *           outcome: timedOut
   *             ? { type: "timeout" as const }
   *             : { type: "exit" as const, exit_code: error.code ?? 1 },
   *         };
   *       }
   *     })
   *   );
   *   return {
   *     output: outputs,
   *     maxOutputLength: action.max_output_length,
   *   };
   * }
   * ```
   */
  execute: (action: ShellAction) => ShellResult | Promise<ShellResult>;
}

/**
 * OpenAI Shell tool type for the Responses API.
 */
export type ShellTool = OpenAIClient.Responses.FunctionShellTool;

const TOOL_NAME = "shell";

/**
 * Creates a Shell tool that allows models to run shell commands through your integration.
 *
 * The shell tool allows the model to interact with your local computer through a controlled
 * command-line interface. The model proposes shell commands; your integration executes them
 * and returns the outputs. This creates a simple plan-execute loop that lets models inspect
 * the system, run utilities, and gather data until they can finish the task.
 *
 * **Important**: The shell tool is available through the Responses API for use with `gpt-5.1`.
 * It is not available on other models, or via the Chat Completions API.
 *
 * **When to use**:
 * - **Automating filesystem or process diagnostics** – For example, "find the largest PDF
 *   under ~/Documents" or "show running gunicorn processes."
 * - **Extending the model's capabilities** – Using built-in UNIX utilities, the Python runtime,
 *   and other CLIs in your environment.
 * - **Running multi-step build and test flows** – Chaining commands like `pip install` and `pytest`.
 * - **Complex agentic coding workflows** – Using other tools like `apply_patch` to complete
 *   workflows that involve complex file operations.
 *
 * **How it works**:
 * The tool operates in a continuous loop:
 * 1. Model sends shell commands (`shell_call` with `commands` array)
 * 2. Your code executes the commands (can be concurrent)
 * 3. You return stdout, stderr, and outcome for each command
 * 4. Repeat until the task is complete
 *
 * **Security Warning**: Running arbitrary shell commands can be dangerous.
 * Always sandbox execution or add strict allow/deny-lists before forwarding
 * a command to the system shell.
 *
 * @see {@link https://platform.openai.com/docs/guides/tools-shell | OpenAI Shell Documentation}
 * @see {@link https://github.com/openai/codex | Codex CLI} for a reference implementation.
 *
 * @param options - Configuration for the Shell tool
 * @returns A Shell tool that can be passed to `bindTools`
 *
 * @example
 * ```typescript
 * import { ChatOpenAI, tools } from "@langchain/openai";
 * import { exec as execCb } from "node:child_process";
 * import { promisify } from "node:util";
 *
 * // Node has no `child_process/promises` module; promisify the callback API.
 * const exec = promisify(execCb);
 *
 * const model = new ChatOpenAI({ model: "gpt-5.1" });
 *
 * // With an execute callback for automatic command handling
 * const shellTool = tools.shell({
 *   execute: async (action) => {
 *     const outputs = await Promise.all(
 *       action.commands.map(async (cmd) => {
 *         try {
 *           const { stdout, stderr } = await exec(cmd, {
 *             timeout: action.timeout_ms ?? undefined,
 *           });
 *           return {
 *             stdout,
 *             stderr,
 *             outcome: { type: "exit" as const, exit_code: 0 },
 *           };
 *         } catch (error) {
 *           const timedOut = error.killed && error.signal === "SIGTERM";
 *           return {
 *             stdout: error.stdout ?? "",
 *             stderr: error.stderr ?? String(error),
 *             outcome: timedOut
 *               ? { type: "timeout" as const }
 *               : { type: "exit" as const, exit_code: error.code ?? 1 },
 *           };
 *         }
 *       })
 *     );
 *     return {
 *       output: outputs,
 *       maxOutputLength: action.max_output_length,
 *     };
 *   },
 * });
 *
 * const llmWithShell = model.bindTools([shellTool]);
 * const response = await llmWithShell.invoke(
 *   "Find the largest PDF file in ~/Documents"
 * );
 * ```
 *
 * @example
 * ```typescript
 * // Full shell loop example
 * async function shellLoop(model, task) {
 *   let response = await model.invoke(task, {
 *     tools: [tools.shell({ execute: myExecutor })],
 *   });
 *
 *   while (true) {
 *     const shellCall = response.additional_kwargs.tool_outputs?.find(
 *       (output) => output.type === "shell_call"
 *     );
 *
 *     if (!shellCall) break;
 *
 *     // Execute commands (with proper sandboxing!)
 *     const result = await executeCommands(shellCall.action);
 *
 *     // Send output back to the model
 *     response = await model.invoke(
 *       [
 *         response,
 *         {
 *           type: "shell_call_output",
 *           call_id: shellCall.call_id,
 *           output: result.output,
 *           max_output_length: result.maxOutputLength,
 *         },
 *       ],
 *       {
 *         tools: [tools.shell({ execute: myExecutor })],
 *       }
 *     );
 *   }
 *
 *   return response;
 * }
 * ```
 *
 * @remarks
 * - Only available through the Responses API (not Chat Completions)
 * - Designed for use with the `gpt-5.1` model
 * - Commands are provided as an array of strings that can be executed concurrently
 * - The action includes: `commands`, `timeout_ms`, `max_output_length`
 * - Always sandbox or validate commands before execution
 * - The `timeout_ms` from the model is only a hint—enforce your own limits
 * - If `max_output_length` exists in the action, always pass it back in the output
 * - Many CLI tools return non-zero exit codes for warnings; still capture stdout/stderr
 */
export function shell(options: ShellOptions) {
  // Wrapper that converts ShellResult to a string for LangChain tool compatibility
  const executeWrapper = async (action: ShellAction): Promise<string> => {
    const result = await options.execute(action);
    // Return a JSON string representation for the tool result
    return JSON.stringify({
      output: result.output,
      max_output_length: result.maxOutputLength,
    });
  };

  const shellTool = tool(executeWrapper, {
    name: TOOL_NAME,
    description:
      "Execute shell commands in a managed environment. Commands can be run concurrently.",
    schema: {
      type: "object",
      properties: {
        commands: {
          type: "array",
          items: { type: "string" },
          description: "Array of shell commands to execute",
        },
        timeout_ms: {
          type: "number",
          description: "Optional timeout in milliseconds for the commands",
        },
        max_output_length: {
          type: "number",
          description:
            "Optional maximum number of characters to return from each command",
        },
      },
      required: ["commands"],
    },
  });

  shellTool.extras = {
    ...(shellTool.extras ?? {}),
    providerToolDefinition: {
      type: "shell",
    } as ShellTool,
  };

  return shellTool;
}
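The security warning above suggests gating every command before it reaches the system shell, and the try/catch in the examples maps Node `exec` failures to the tool's outcome shape. Both pieces can be factored into small pure helpers; a sketch under the assumption of a simple first-token allow-list (the helper names `isAllowed` and `toOutcome` are ours):

```typescript
// Hypothetical guard: permit a command only if its first token is allow-listed.
function isAllowed(cmd: string, allowlist: string[]): boolean {
  const first = cmd.trim().split(/\s+/)[0];
  return allowlist.includes(first);
}

// Map an exec-style failure to the outcome shape the shell tool expects.
// Node's exec kills a timed-out child with SIGTERM and sets `killed`.
function toOutcome(err: {
  killed?: boolean;
  signal?: string;
  code?: number;
}): { type: "exit"; exit_code: number } | { type: "timeout" } {
  if (err.killed && err.signal === "SIGTERM") return { type: "timeout" };
  return { type: "exit", exit_code: err.code ?? 1 };
}
```

An executor would call `isAllowed` before running each entry in `action.commands` and use `toOutcome` in its catch block; a rejected command can be reported as an `exit` outcome with a non-zero code and an explanatory stderr rather than being silently dropped.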
