diff --git a/examples/index/demo_inline_citations_streaming.ipynb b/examples/index/demo_inline_citations_streaming.ipynb new file mode 100644 index 00000000..dacf5966 --- /dev/null +++ b/examples/index/demo_inline_citations_streaming.ipynb @@ -0,0 +1,404 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Inline Citations with LlamaCloud (Streaming)\n", + "In this notebook we show you how to stream responses with inline citations using LlamaCloud. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "Install the core packages and download the files. You will need to upload these documents to LlamaCloud." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install llama-index-core llama-cloud-services llama-index-llms-openai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create the target directory before downloading\n", + "!mkdir -p data\n", + "\n", + "# download Apple\n", + "!wget \"https://s2.q4cdn.com/470004039/files/doc_earnings/2023/q4/filing/_10-K-Q4-2023-As-Filed.pdf\" -O data/apple_2023.pdf\n", + "!wget \"https://s2.q4cdn.com/470004039/files/doc_financials/2022/q4/_10-K-2022-(As-Filed).pdf\" -O data/apple_2022.pdf\n", + "!wget \"https://s2.q4cdn.com/470004039/files/doc_financials/2021/q4/_10-K-2021-(As-Filed).pdf\" -O data/apple_2021.pdf\n", + "!wget \"https://s2.q4cdn.com/470004039/files/doc_financials/2020/ar/_10-K-2020-(As-Filed).pdf\" -O data/apple_2020.pdf\n", + "!wget \"https://www.dropbox.com/scl/fi/i6vk884ggtq382mu3whfz/apple_2019_10k.pdf?rlkey=eudxh3muxh7kop43ov4bgaj5i&dl=1\" -O data/apple_2019.pdf\n", + "\n", + "# download Tesla\n", + "!wget \"https://ir.tesla.com/_flysystem/s3/sec/000162828024002390/tsla-20231231-gen.pdf\" -O data/tesla_2023.pdf\n", + "!wget \"https://ir.tesla.com/_flysystem/s3/sec/000095017023001409/tsla-20221231-gen.pdf\" -O data/tesla_2022.pdf\n", + "!wget 
\"https://www.dropbox.com/scl/fi/ptk83fmye7lqr7pz9r6dm/tesla_2021_10k.pdf?rlkey=24kxixeajbw9nru1sd6tg3bye&dl=1\" -O data/tesla_2021.pdf\n", + "!wget \"https://ir.tesla.com/_flysystem/s3/sec/000156459021004599/tsla-10k_20201231-gen.pdf\" -O data/tesla_2020.pdf\n", + "!wget \"https://ir.tesla.com/_flysystem/s3/sec/000156459020004475/tsla-10k_20191231-gen_0.pdf\" -O data/tesla_2019.pdf" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Set up OpenAI and LlamaCloud credentials. The OpenAI LLM is used for response synthesis." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# llama-cloud-services is async-first; running async code in a notebook requires nest_asyncio\n", + "import nest_asyncio\n", + "\n", + "nest_asyncio.apply()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# API access to llama-cloud\n", + "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Using OpenAI API for embeddings/llms\n", + "os.environ[\"OPENAI_API_KEY\"] = \"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load Documents into LlamaCloud\n", + "\n", + "The first order of business is to download the five Apple and five Tesla 10-K filings and upload them into LlamaCloud.\n", + "\n", + "You can easily do this by creating a pipeline and uploading the docs via the \"Files\" mode.\n", + "\n", + "After this is done, proceed to the next section." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define NodeCitationProcessor\n", + "Add the node ID to each node's metadata so citation links can be matched back to their source nodes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from typing import List, Optional\n", + "\n", + "from llama_index.core import QueryBundle\n", + "from llama_index.core.postprocessor.types import BaseNodePostprocessor\n", + "from llama_index.core.schema import NodeWithScore\n", + "\n", + "\n", + "class NodeCitationProcessor(BaseNodePostprocessor):\n", + "    \"\"\"\n", + "    Append node_id into metadata for citation purposes.\n", + "    Use together with SYSTEM_CITATION_PROMPT (defined below) to enable inline citations.\n", + "    \"\"\"\n", + "\n", + "    def _postprocess_nodes(\n", + "        self,\n", + "        nodes: List[NodeWithScore],\n", + "        query_bundle: Optional[QueryBundle] = None,\n", + "    ) -> List[NodeWithScore]:\n", + "        for node_score in nodes:\n", + "            node_score.node.metadata[\"node_id\"] = node_score.node.node_id\n", + "        return nodes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define System Citation Prompt\n", + "Modify the system prompt so the LLM adds citation links based on the node metadata." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "SYSTEM_CITATION_PROMPT = \"\"\"You have been provided information from a knowledge base, passed to you in nodes of information.\n", + "Each node has useful metadata such as node ID, file name, page, etc.\n", + "Please add a citation to the data node for each sentence or paragraph that you reference in the provided information.\n", + "The citation format is: [citation:node_id]()\n", + "Where node_id is the unique identifier of the data node.\n", + "\n", + "Example:\n", + "We have two nodes:\n", + "    node_id: xyz\n", + "    file_name: llama.pdf\n", + "\n", + "    node_id: abc\n", + "    file_name: animal.pdf\n", + "\n", + "User question: Tell me a fun fact about Llama.\n", + "Your answer:\n", + "A baby llama is called \"Cria\" [citation:xyz]().\n", + "It often lives in the desert [citation:abc]().\n", + "It's a cute animal.\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define LlamaCloud Retriever over Documents\n", + "\n", + "In this section we define a LlamaCloud retriever over these documents." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from llama_cloud_services import LlamaCloudIndex\n", + "import os\n", + "\n", + "index = LlamaCloudIndex(\n", + "    name=\"apple_demo\",\n", + "    project_name=\"llamacloud_demo\",\n", + "    api_key=os.environ[\"LLAMA_CLOUD_API_KEY\"],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Define chunk retriever\n", + "\n", + "The chunk-level retriever performs vector search and reranks the results down to the top `rerank_top_n=5` chunks." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from llama_index.core.query_engine import RetrieverQueryEngine\n", + "from llama_index.llms.openai import OpenAI\n", + "\n", + "chunk_retriever = index.as_retriever(retrieval_mode=\"chunks\", rerank_top_n=5)\n", + "\n", + "llm = OpenAI(model=\"gpt-4o-mini\", system_prompt=SYSTEM_CITATION_PROMPT)\n", + "query_engine = RetrieverQueryEngine.from_args(\n", + "    chunk_retriever,\n", + "    llm=llm,\n", + "    response_mode=\"tree_summarize\",\n", + "    node_postprocessors=[NodeCitationProcessor()],\n", + "    streaming=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Generate final output, matching citations to page labels\n", + "Given the retrieved source nodes, look up each node's page label and build the final citation URL." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "from typing import Generator\n", + "\n", + "# Accepts [citation:ID] and [citation:ID]() (spaces/newlines allowed, case-insensitive)\n", + "_CITATION_RX = re.compile(\n", + "    r\"\\[\\s*citation\\s*:\\s*([^\\]]+?)\\s*\\]\\s*(?:\\(\\s*\\))?\", re.IGNORECASE\n", + ")\n", + "# Detects a trailing, incomplete \"[citation:\" at the END of a string\n", + "_INCOMPLETE_RX = re.compile(r\"\\[\\s*citation\\s*:\\s*[^\\]]*$\", re.IGNORECASE)\n", + "\n", + "\n", + "def stream_citations_with_sources(\n", + "    resp, check_every: int = 64\n", + ") -> Generator[str, None, None]:\n", + "    \"\"\"\n", + "    Incrementally replace [citation:ID] / [citation:ID]() with:\n", + "    [n](https://fake.url/SampleFile#page=<page_label>)\n", + "    Emits only the *new* safe prefix each time; never flushes partial tags.\n", + "    \"\"\"\n", + "\n", + "    # Build id -> page_label now (OK if empty; we'll use 'unknown')\n", + "    nodes = getattr(resp, \"source_nodes\", []) or []\n", + "    id_to_label = {str(n.id_): n.metadata.get(\"page_label\", \"unknown\") for n in 
nodes}\n", + "\n", + " order: dict[str, int] = {}\n", + " counter = [1] # mutable to avoid nonlocal\n", + "\n", + " def _link_for(cid: str) -> str:\n", + " cid = cid.strip()\n", + " if cid not in order:\n", + " order[cid] = counter[0]\n", + " counter[0] += 1\n", + " n = order[cid]\n", + " page = id_to_label.get(cid, \"unknown\")\n", + " # TODO: replace fake URL with node.metadata[\"web_url\"] when available\n", + " return f\"[{n}](https://fake.url/SampleFile#page={page})\"\n", + "\n", + " def _replace_complete(text: str) -> str:\n", + " def _repl(m: re.Match) -> str:\n", + " return _link_for(m.group(1))\n", + "\n", + " # Replace only complete tags; do NOT strip any incomplete tail here\n", + " return _CITATION_RX.sub(_repl, text)\n", + "\n", + " acc = \"\" # full accumulated text so far\n", + " emitted_upto = 0 # index in acc we've already emitted\n", + " since = 0\n", + "\n", + " for chunk in resp.response_gen:\n", + " acc += chunk\n", + " acc = _replace_complete(acc) # replace anywhere tags became complete\n", + " since += len(chunk)\n", + "\n", + " # Find safe end: don't include a trailing incomplete \"[citation:\"\n", + " safe_end = len(acc)\n", + " m = _INCOMPLETE_RX.search(acc)\n", + " if m and m.end() == len(acc):\n", + " safe_end = m.start()\n", + "\n", + " # Emit only the newly available safe prefix\n", + " if safe_end > emitted_upto and (\n", + " \"]\" in chunk or \")\" in chunk or since >= check_every\n", + " ):\n", + " yield acc[emitted_upto:safe_end]\n", + " emitted_upto = safe_end\n", + " since = 0\n", + "\n", + " # End of stream: drop any dangling start, replace once more, emit the rest\n", + " tail = acc[emitted_upto:]\n", + " if tail:\n", + " # Remove trailing incomplete start if present\n", + " m = _INCOMPLETE_RX.search(tail)\n", + " if m and m.end() == len(tail):\n", + " tail = tail[: m.start()]\n", + " tail = _replace_complete(tail)\n", + " if tail:\n", + " yield tail" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 
Query it" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query = \"What are the tiny risks for Apple?\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
The Company faces several risks that could be considered \"tiny\" or less significant in the broader context of its operations. These include:\n",
+       "\n",
+       "1. **Credit Risk**: The Company is exposed to credit risk on its trade accounts receivable and vendor non-trade receivables, particularly during periods of economic downturns. This risk is heightened when economic conditions worsen, which could lead to difficulties in collecting receivables [1](https://fake.url/SampleFile#page=19).\n",
+       "\n",
+       "2. **Dependence on Outsourcing Partners**: The Company relies on outsourcing partners for manufacturing and logistics. While this can lower operating costs, it also reduces direct control over production and distribution, which could lead to quality issues or supply disruptions [2](https://fake.url/SampleFile#page=11).\n",
+       "\n",
+       "3. **Single-Source Suppliers**: The Company depends on single-source suppliers for many components, which exposes it to supply and pricing risks. Any failure of these suppliers to perform can negatively impact the Company's operations [2](https://fake.url/SampleFile#page=11).\n",
+       "\n",
+       "4. **Volatility in Stock Price**: The Company's stock price has experienced significant volatility in the past and may continue to do so. This volatility can be influenced by factors unrelated to the Company's operating performance, which could affect investor confidence [3](https://fake.url/SampleFile#page=8).\n",
+       "\n",
+       "5. **Impact of Political and Economic Conditions**: The Company’s operations can be affected by political events, trade disputes, and other international issues, which could disrupt commerce and impact its business [3](https://fake.url/SampleFile#page=8).\n",
+       "\n",
+       "While these risks may not be the most significant compared to larger operational or financial risks, they still represent areas where the Company must maintain vigilance to mitigate potential impacts.
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from IPython.display import display, HTML\n", + "\n", + "resp = query_engine.query(query)\n", + "\n", + "buf = []\n", + "handle = display(HTML(\"
\"), display_id=True)\n",
+    "\n",
+    "for part in stream_citations_with_sources(resp):\n",
+    "    buf.append(part)\n",
+    "    html = \"\".join(buf)\n", + "    handle.update(HTML(html))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}