From e76c2f621cd1f321fac1bf36d6db21aa094d9510 Mon Sep 17 00:00:00 2001 From: "promptless[bot]" <179508745+promptless[bot]@users.noreply.github.com> Date: Tue, 17 Dec 2024 04:26:25 +0000 Subject: [PATCH] Docs update (01e1928) --- docs/docs/concepts.mdx | 13 ++++++++++++- docs/docs/how_to/index.mdx | 1 + 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/docs/docs/concepts.mdx b/docs/docs/concepts.mdx index 6cc0f135bff28..ff0216cb97830 100644 --- a/docs/docs/concepts.mdx +++ b/docs/docs/concepts.mdx @@ -422,6 +422,17 @@ That means there are two different axes along which you can customize your text For specifics on how to use text splitters, see the [relevant how-to guides here](/docs/how_to/#text-splitters). +| Type | Examples | Characters | Metadata | Description | +|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Recursive | [RecursiveCharacterTextSplitter](/docs/how_to/recursive_text_splitter/), [RecursiveJsonSplitter](/docs/how_to/recursive_json_splitter/) | A list of user defined characters | | Recursively splits text. This splitting is trying to keep related pieces of text next to each other. This is the `recommended way` to start splitting text. | +| HTML | [HTMLHeaderTextSplitter](/docs/how_to/HTML_header_metadata_splitter/), [HTMLSectionSplitter](/docs/how_to/HTML_section_aware_splitter/) | HTML specific characters | ✅ | Splits text based on HTML-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the HTML) | +| Markdown | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/), [ExperimentalMarkdownSyntaxTextSplitter](/docs/how_to/experimental_markdown_syntax_text_splitter/) | Markdown specific characters | ✅ | Splits text based on Markdown-specific characters. The `ExperimentalMarkdownSyntaxTextSplitter` retains the original whitespace and formatting, addressing issues with code blocks and nested lists. | +| Code | [many languages](/docs/how_to/code_splitter/) | Code (Python, JS) specific characters | | Splits text based on characters specific to coding languages. 15 different languages are available to choose from. | +| Token | [many classes](/docs/how_to/split_by_token/) | Tokens | | Splits text on tokens. There exist a few different ways to measure tokens. | +| Character | [CharacterTextSplitter](/docs/how_to/character_text_splitter/) | A user defined character | | Splits text based on a user defined character. One of the simpler methods. | + +For more detailed guidance on using these splitters, refer to the [How to: split Markdown with experimental syntax retention](/docs/how_to/experimental_markdown_syntax_text_splitter) guide. + ### Embedding models @@ -1038,7 +1049,7 @@ Table columns: |----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Recursive | [RecursiveCharacterTextSplitter](/docs/how_to/recursive_text_splitter/), [RecursiveJsonSplitter](/docs/how_to/recursive_json_splitter/) | A list of user defined characters | | Recursively splits text. This splitting is trying to keep related pieces of text next to each other. This is the `recommended way` to start splitting text. | | HTML | [HTMLHeaderTextSplitter](/docs/how_to/HTML_header_metadata_splitter/), [HTMLSectionSplitter](/docs/how_to/HTML_section_aware_splitter/) | HTML specific characters | ✅ | Splits text based on HTML-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the HTML) | -| Markdown | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/), | Markdown specific characters | ✅ | Splits text based on Markdown-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the Markdown) | +| Markdown | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/), [ExperimentalMarkdownSyntaxTextSplitter](/docs/how_to/experimental_markdown_syntax_text_splitter/) | Markdown specific characters | ✅ | Splits text based on Markdown-specific characters. The `ExperimentalMarkdownSyntaxTextSplitter` retains the original whitespace and formatting, addressing issues with code blocks and nested lists. | | Code | [many languages](/docs/how_to/code_splitter/) | Code (Python, JS) specific characters | | Splits text based on characters specific to coding languages. 15 different languages are available to choose from. | | Token | [many classes](/docs/how_to/split_by_token/) | Tokens | | Splits text on tokens. There exist a few different ways to measure tokens. | | Character | [CharacterTextSplitter](/docs/how_to/character_text_splitter/) | A user defined character | | Splits text based on a user defined character. One of the simpler methods. | diff --git a/docs/docs/how_to/index.mdx b/docs/docs/how_to/index.mdx index b481805eaafaf..90e493727c4d1 100644 --- a/docs/docs/how_to/index.mdx +++ b/docs/docs/how_to/index.mdx @@ -134,6 +134,7 @@ What LangChain calls [LLMs](/docs/concepts/#llms) are older forms of language mo - [How to: split by character](/docs/how_to/character_text_splitter) - [How to: split code](/docs/how_to/code_splitter) - [How to: split Markdown by headers](/docs/how_to/markdown_header_metadata_splitter) +- [How to: split Markdown with experimental syntax retention](/docs/how_to/experimental_markdown_syntax_text_splitter) - [How to: recursively split JSON](/docs/how_to/recursive_json_splitter) - [How to: split text into semantic chunks](/docs/how_to/semantic-chunker) - [How to: split by tokens](/docs/how_to/split_by_token)