Improve NVIDIA TensorRT-LLM guide formatting and add Brev integration (openai#1989)

imran-binhasan · pap-openai · imran-binhasan · commit b26a375e7183 · 2025-08-05T14:03:49.000-07:00
Co-authored-by: pap-openai <pap@openai.com>
diff --git a/articles/run-nvidia.ipynb b/articles/run-nvidia.ipynb
@@ -18,7 +18,21 @@
     "- `gpt-oss-20b`\n",
     "- `gpt-oss-120b`\n",
     "\n",
-    "In this guide, we will run `gpt-oss-20b`, if you want to try the larger model or want more customization refer to [this](https://github.com/NVIDIA/TensorRT-LLM/tree/main/docs/source/blogs/tech_blog) deployment guide."
+    "In this guide, we will run `gpt-oss-20b`. If you want to try the larger model or want more customization, refer to [this](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md) deployment guide.\n",
+    "\n",
+    "Note: Your input prompts should use the [harmony response](http://cookbook.openai.com/articles/openai-harmony) format for the model to work properly, though this guide does not require it."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Launch on NVIDIA Brev\n",
+    "You can simplify the environment setup by using [NVIDIA Brev](https://developer.nvidia.com/brev). Click the button below to launch this project on a Brev instance with the necessary dependencies pre-configured.\n",
+    "\n",
+    "Once deployed, click on the \"Open Notebook\" button to get started with this guide.\n",
+    "\n",
+    "[![Launch on Brev](https://brev-assets.s3.us-west-1.amazonaws.com/nv-lb-dark.svg)](https://brev.nvidia.com/launchable/deploy?launchableID=env-30i1YjHsRWT109HL6eYxLUeHIwF)"
    ]
   },
   {
@@ -33,69 +47,45 @@
    "metadata": {},
    "source": [
     "### Hardware\n",
-    "To run the 20B model and the TensorRT-LLM build process, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n",
+    "To run the gpt-oss-20b model, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n",
     "\n",
-    "> Recommended GPUs: NVIDIA RTX 50 Series (e.g.RTX 5090), NVIDIA H100, or L40S.\n",
+    "Recommended GPUs: NVIDIA Hopper (e.g., H100, H200), NVIDIA Blackwell (e.g., B100, B200), NVIDIA RTX PRO, NVIDIA RTX 50 Series (e.g., RTX 5090).\n",
     "\n",
     "### Software\n",
     "- CUDA Toolkit 12.8 or later\n",
-    "- Python 3.12 or later\n",
-    "- Access to the Orangina model checkpoint from Hugging Face"
+    "- Python 3.12 or later"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Installling TensorRT-LLM"
+    "## Installing TensorRT-LLM\n",
+    "\n",
+    "There are multiple ways to install TensorRT-LLM. In this guide, we'll cover using a pre-built Docker container from NVIDIA NGC as well as building from source.\n",
+    "\n",
+    "If you're using NVIDIA Brev, you can skip this section."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Using NGC\n",
+    "## Using NVIDIA NGC\n",
     "\n",
-    "Pull the pre-built TensorRT-LLM container for GPT-OSS from NVIDIA NGC.\n",
+    "Pull the pre-built [TensorRT-LLM container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags) for GPT-OSS from [NVIDIA NGC](https://www.nvidia.com/en-us/gpu-cloud/).\n",
     "This is the easiest way to get started and ensures all dependencies are included.\n",
     "\n",
-    "`docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n",
-    "`docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n",
+    "```bash\n",
+    "docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev\n",
+    "docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev\n",
+    "```\n",
     "\n",
-    "## Using Docker (build from source)\n",
+    "## Using Docker (Build from Source)\n",
     "\n",
     "Alternatively, you can build the TensorRT-LLM container from source.\n",
-    "This is useful if you want to modify the source code or use a custom branch.\n",
-    "See the official instructions here: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker\n",
-    "\n",
-    "The following commands will install required dependencies, clone the repository,\n",
-    "check out the GPT-OSS feature branch, and build the Docker container:\n",
-    " ```\n",
-    "#Update package lists and install required system packages\n",
-    "sudo apt-get update && sudo apt-get -y install git git-lfs build-essential cmake\n",
-    "\n",
-    "# Initialize Git LFS (Large File Storage) for handling large model files\n",
-    "git lfs install\n",
-    "\n",
-    "# Clone the TensorRT-LLM repository\n",
-    "git clone https://github.com/NVIDIA/TensorRT-LLM.git\n",
-    "cd TensorRT-LLM\n",
-    "\n",
-    "# Check out the branch with GPT-OSS support\n",
-    "git checkout feat/gpt-oss\n",
-    "\n",
-    "# Initialize and update submodules (required for build)\n",
-    "git submodule update --init --recursive\n",
-    "\n",
-    "# Pull large files (e.g., model weights) managed by Git LFS\n",
-    "git lfs pull\n",
-    "\n",
-    "# Build the release Docker image\n",
-    "make -C docker release_build\n",
-    "\n",
-    "# Run the built Docker container\n",
-    "make -C docker release_run \n",
-    "```"
+    "This approach is useful if you want to modify the source code or use a custom branch.\n",
+    "For detailed instructions, see the [official documentation](https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker)."
    ]
   },
   {
diff --git a/registry.yaml b/registry.yaml
@@ -4,8 +4,7 @@
 # should build pages for, and indicates metadata such as tags, creation date and
 # authors for each page.
 
-
-- title: Using NVIDIA TensorRT-LLM to run the 20B model
+- title: Using NVIDIA TensorRT-LLM to run gpt-oss-20b
   path: articles/run-nvidia.ipynb
   date: 2025-08-05
   authors: