hassancs91 · joel-lzb · Aug 23, 2024
diff --git a/.gitignore b/.gitignore
@@ -8,6 +8,7 @@ venv/
 /dist
 /codes
 /build
+/SimplerLLM/workflow
 /SimplerLLM.egg-info
 .pypirc 
 generate_images.py
@@ -30,5 +31,4 @@ test.csv
 test_data_agent.py
 test_sql_agent.py
 vb_ui.py
-vd_test.py
-Documentation
+vd_test.py
diff --git a/Documentation/docs/AI Agents/Getting Started.md b/Documentation/docs/AI Agents/Getting Started.md
@@ -0,0 +1,5 @@
+---
+sidebar_position: 1
+--- 
+
+# Coming Soon
diff --git a/Documentation/docs/AI Agents/_category_.json b/Documentation/docs/AI Agents/_category_.json
@@ -0,0 +1,4 @@
+{
+    "label": "AI Agents",
+    "position": 6
+}  
diff --git a/Documentation/docs/Advanced Tools/Chunking Methods.md b/Documentation/docs/Advanced Tools/Chunking Methods.md
@@ -0,0 +1,99 @@
+---
+sidebar_position: 4
+--- 
+
+# Chunking Methods
+
+This section provides detailed information on the text chunking capabilities of the SimplerLLM library. These functions split text into chunks by sentence, by paragraph, by maximum size, or by semantic similarity.
+
+Each method suits a different need, which makes them useful in tasks such as data preprocessing, content analysis, and information retrieval.
+
+All of these functions return their data as a `Text Chunks` object with the following attributes:
+- `chunk_list` (List): A list of `ChunkInfo` objects, each containing:
+    - `text` (string): The text of the chunk itself.
+    - `num_characters` (int): The number of characters in the chunk.
+    - `num_words` (int): The number of words in the chunk.
+- `num_chunks` (int): The total number of chunks returned.
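+
+The following minimal sketch makes this structure concrete by mirroring the documented attributes with plain Python dataclasses. The classes and the `to_text_chunks` helper are hypothetical stand-ins, not SimplerLLM's own code. Counting words by splitting on whitespace is also an assumption. With the real library, you read the same attribute names from the returned object.
+
+```python
+from dataclasses import dataclass, field
+from typing import List
+
+
+@dataclass
+class ChunkInfo:
+    # Hypothetical mirror of the documented ChunkInfo attributes
+    text: str
+    num_characters: int
+    num_words: int
+
+
+@dataclass
+class TextChunks:
+    # Hypothetical mirror of the documented Text Chunks attributes
+    chunk_list: List[ChunkInfo] = field(default_factory=list)
+    num_chunks: int = 0
+
+
+def to_text_chunks(pieces: List[str]) -> TextChunks:
+    """Wrap plain string chunks in the documented structure."""
+    chunks = [
+        ChunkInfo(text=p, num_characters=len(p), num_words=len(p.split()))
+        for p in pieces
+    ]
+    return TextChunks(chunk_list=chunks, num_chunks=len(chunks))
+
+
+result = to_text_chunks(["First sentence.", "Second sentence?"])
+for chunk in result.chunk_list:
+    print(chunk.num_words, chunk.num_characters, chunk.text)
+```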
+
+## chunk_by_sentences Function
+
+Splits the provided text into sentences, using punctuation marks as delimiters. It takes one parameter:
+- `text` (str): Text you want to chunk into sentences.
+
+It then returns a `Text Chunks` object. Here's a sample usage:
+
+```python
+from SimplerLLM.tools.text_chunker import chunk_by_sentences
+
+text = "First sentence. Second sentence? Third sentence!"
+
+sentences = chunk_by_sentences(text)
+
+print(sentences)
+```
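+
+The sketch below approximates sentence chunking, for intuition or as a dependency-free fallback. It splits on whitespace that follows `.`, `?`, or `!`. The `split_sentences` name is hypothetical, and this is not SimplerLLM's implementation. It returns plain strings, and abbreviations such as "e.g." would be split incorrectly.
+
+```python
+import re
+from typing import List
+
+
+def split_sentences(text: str) -> List[str]:
+    # Split on runs of whitespace that directly follow '.', '?' or '!'
+    parts = re.split(r"(?<=[.?!])\s+", text.strip())
+    # Drop empty strings (e.g. when the input is empty)
+    return [p for p in parts if p]
+
+
+print(split_sentences("First sentence. Second sentence? Third sentence!"))
+# ['First sentence.', 'Second sentence?', 'Third sentence!']
+```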
+
+## chunk_by_paragraphs Function
+
+Segments the provided text into paragraphs based on newline characters. It takes 1 parameter:
+- `text` (str): Text you want to chunk into paragraphs.
+
+It then returns a `Text Chunks` object. Here's a sample usage:
+
+```python
+from SimplerLLM.tools.text_chunker import chunk_by_paragraphs
+
+text = "First paragraph, still going.\n\nSecond paragraph starts."
+
+paragraphs = chunk_by_paragraphs(text)
+
+print(paragraphs)
+```
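+
+A comparable sketch for paragraphs treats one or more blank lines as the boundary between paragraphs. This boundary rule is an assumption, since the library may also split on single newlines. `split_paragraphs` is a hypothetical name.
+
+```python
+import re
+from typing import List
+
+
+def split_paragraphs(text: str) -> List[str]:
+    # A newline, optional whitespace, then another newline marks a paragraph break
+    parts = re.split(r"\n\s*\n", text)
+    # Trim each paragraph and drop empty ones
+    return [p.strip() for p in parts if p.strip()]
+
+
+print(split_paragraphs("First paragraph, still going.\n\nSecond paragraph starts."))
+# ['First paragraph, still going.', 'Second paragraph starts.']
+```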
+
+## chunk_by_max_chunk_size Function
+
+Splits the input text into chunks that do not exceed a specified size. Optionally, it keeps sentences intact so that no chunk cuts a sentence in the middle. It takes three parameters:
+- `text` (str): The text you want to chunk.
+- `max_chunk_size` (int): The maximum size of each chunk in characters.
+- `preserve_sentence_structure` (bool, optional): Whether to keep whole sentences together instead of cutting them mid-sentence. Defaults to `False`.
+
+It returns a `Text Chunks` object. Here's how you can use it:
+
+```python
+from SimplerLLM.tools.text_chunker import chunk_by_max_chunk_size
+
+text = "Hello world! This is an example of text chunking. Enjoy using SimplerLLM."
+
+chunks = chunk_by_max_chunk_size(text, 50, True)
+
+print(chunks)
+```
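+
+Here is one way the size limit and the sentence-preserving option could interact. This greedy sketch packs whole sentences into a chunk until the next sentence would push it past `max_chunk_size`. Without the flag, it simply cuts every `max_chunk_size` characters. This behavior is an assumption, not SimplerLLM's code, and `chunk_by_size` is a hypothetical name. In this sketch, a single sentence longer than the limit is kept whole and therefore exceeds it.
+
+```python
+import re
+from typing import List
+
+
+def chunk_by_size(text: str, max_chunk_size: int, preserve_sentences: bool = False) -> List[str]:
+    if max_chunk_size <= 0:
+        raise ValueError("max_chunk_size must be positive")
+    if not preserve_sentences:
+        # Hard cut every max_chunk_size characters
+        return [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]
+    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
+    chunks: List[str] = []
+    current = ""
+    for sentence in sentences:
+        candidate = f"{current} {sentence}" if current else sentence
+        if len(candidate) <= max_chunk_size:
+            current = candidate  # the sentence still fits in the current chunk
+        else:
+            if current:
+                chunks.append(current)
+            current = sentence  # start a new chunk (may exceed the limit if one sentence is too long)
+    if current:
+        chunks.append(current)
+    return chunks
+
+
+text = "Hello world! This is an example of text chunking. Enjoy using SimplerLLM."
+print(chunk_by_size(text, 50, preserve_sentences=True))
+# ['Hello world! This is an example of text chunking.', 'Enjoy using SimplerLLM.']
+```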
+
+## chunk_by_semantics Function
+
+Uses semantic similarity to divide text into chunks. It takes three parameters:
+- `text` (str): Text to be segmented based on semantic content.
+- `llm_embeddings_instance` [(EmbeddingsLLM)](https://docs.simplerllm.com/Vector%20Storage/Vector%20Embeddings): An instance of a language model used to generate text embeddings for semantic analysis.
+- `threshold_percentage` (int, optional): The percentile threshold used to decide where to split the text. Defaults to 90.
+
+It returns a list of `ChunkInfo` objects, each representing a semantically coherent segment of the original text. Note that the embeddings instance needs your OpenAI API key to generate the text embeddings, so add it to your `.env` file in this format:
+
+```
+OPENAI_API_KEY="your_openai_api_key"
+```
+
+Here's an example:
+
+```python
+from SimplerLLM.tools.text_chunker import chunk_by_semantics
+from SimplerLLM.language.embeddings import EmbeddingsLLM, EmbeddingsProvider
+
+text = "Discussing AI. Artificial intelligence has many applications. However, dogs like bones."
+embeddings_model = EmbeddingsLLM.create(provider=EmbeddingsProvider.OPENAI,
+                                        model_name="text-embedding-3-small")
+
+# threshold_percentage is a parameter of chunk_by_semantics, not of the embeddings model
+semantic_chunks = chunk_by_semantics(text, embeddings_model, threshold_percentage=80)
+
+print(semantic_chunks)
+```
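+
+The percentile idea can be sketched without an API key. Each sentence is embedded, and the cosine distance between each pair of neighboring sentences is measured. A new chunk starts wherever that distance exceeds the chosen percentile of all distances. This is a common approach to semantic chunking, but it is an assumption that `chunk_by_semantics` works exactly this way. `semantic_chunks` is a hypothetical name. `toy_embed` is a hypothetical bag-of-words stand-in for real embeddings such as those produced through `EmbeddingsLLM`.
+
+```python
+import math
+import re
+from typing import Callable, List, Sequence
+
+
+def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    # Treat an all-zero vector as maximally distant
+    return 1.0 - dot / norm if norm else 1.0
+
+
+def percentile(values: List[float], pct: float) -> float:
+    # Linear interpolation between the closest ranks (NumPy's default rule)
+    ordered = sorted(values)
+    pos = (len(ordered) - 1) * pct / 100
+    low, high = math.floor(pos), math.ceil(pos)
+    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)
+
+
+def semantic_chunks(text: str, embed: Callable[[str], List[float]],
+                    threshold_percentage: float = 90) -> List[str]:
+    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
+    if len(sentences) < 2:
+        return sentences
+    vectors = [embed(s) for s in sentences]
+    # Distance between each sentence and the one that follows it
+    distances = [cosine_distance(vectors[i], vectors[i + 1])
+                 for i in range(len(vectors) - 1)]
+    cutoff = percentile(distances, threshold_percentage)
+    chunks, current = [], [sentences[0]]
+    for sentence, distance in zip(sentences[1:], distances):
+        if distance > cutoff:  # unusually large topic shift: start a new chunk
+            chunks.append(" ".join(current))
+            current = []
+        current.append(sentence)
+    chunks.append(" ".join(current))
+    return chunks
+
+
+# Hypothetical bag-of-words "embedding" so the sketch runs without an API key
+VOCAB = ["ai", "applications", "growing", "dogs", "bones"]
+
+
+def toy_embed(sentence: str) -> List[float]:
+    words = re.findall(r"[a-z]+", sentence.lower())
+    return [float(words.count(word)) for word in VOCAB]
+
+
+text = "AI is growing fast. AI has many applications. Dogs like bones."
+print(semantic_chunks(text, toy_embed))
+# ['AI is growing fast. AI has many applications.', 'Dogs like bones.']
+```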
+That's how you can benefit from SimplerLLM to make Text Chunking Simpler!
diff --git a/Documentation/docs/Advanced Tools/Extract YouTube Data.md b/Documentation/docs/Advanced Tools/Extract YouTube Data.md
@@ -0,0 +1,86 @@
+---
+sidebar_position: 3
+--- 
+
+# Extract YouTube Data
+
+The functions in this section extract detailed information from YouTube videos, including metadata and transcripts. You can use them to build APIs, tools, and applications.
+
+## `get_video_meta(video_url)` Function
+
+This function takes only the `video_url` as input and fetches detailed metadata for that YouTube video. It returns a dictionary with the following keys:
+- `video_id`: The unique identifier for the video.
+- `video_title`: The title of the video.
+- `video_description`: A description of the video.
+- `video_length`: The duration of the video in seconds.
+- `video_views`: The number of times the video has been viewed.
+- `video_author`: The creator of the video.
+- `video_publish_date`: The publication date of the video.
+- `video_thumbnail_url`: The URL of the video's thumbnail.
+- `video_rating`: The average user rating of the video.
+- `video_keywords`: Keywords associated with the video.
+
+### Example Usage
+
+```python
+from SimplerLLM.tools.youtube import get_video_meta
+
+video_meta = get_video_meta("https://www.youtube.com/watch?v=r9PjzmUmk1w")
+
+print(video_meta)
+```
+
+The video meta is returned in the following format:
+```
+{'video_id': 'r9PjzmUmk1w', 'video_title': 'Build SaaS with WordPress With 3 Plugins Only!', 'video_description': None, 'video_length': 252, 'video_views': 25845, 'video_author': 'Hasan Aboul Hasan', 'video_publish_date': datetime.datetime(2024, 2, 15, 0, 0), 'video_thumbnail_url': 'https://i.ytimg.com/vi/r9PjzmUmk1w/hqdefault.jpg?sqp=-oaymwEXCJADEOABSFryq4qpAwkIARUAAIhCGAE=&rs=AOn4CLDLDIEv0NjGrhaAKQ8GL2SpvwDDng', 'video_rating': None, 'video_keywords': []}
+```
+
+For example, if you only want the video title, here's how to get it:
+
+```python
+from SimplerLLM.tools.youtube import get_video_meta
+
+video_meta = get_video_meta("https://www.youtube.com/watch?v=r9PjzmUmk1w")
+
+print(video_meta.get('video_title'))
+```
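+
+Because the result is a plain dictionary, you can post-process it with ordinary Python. The sketch below works on a trimmed copy of the sample output above, so no network call is needed. It uses a hypothetical `describe_video` helper that falls back to defaults for missing or `None` values, which the sample output shows can occur.
+
+```python
+from datetime import datetime
+
+# A trimmed copy of the sample metadata shown above (no network call needed)
+video_meta = {
+    "video_id": "r9PjzmUmk1w",
+    "video_title": "Build SaaS with WordPress With 3 Plugins Only!",
+    "video_length": 252,
+    "video_views": 25845,
+    "video_publish_date": datetime(2024, 2, 15, 0, 0),
+    "video_rating": None,
+}
+
+
+def describe_video(meta: dict) -> str:
+    # Fall back to 0 / "unknown date" when a field is missing or None
+    minutes, seconds = divmod(meta.get("video_length") or 0, 60)
+    views = meta.get("video_views") or 0
+    published = meta.get("video_publish_date")
+    date_text = published.strftime("%Y-%m-%d") if published else "unknown date"
+    return (f"{meta.get('video_title')} ({minutes}:{seconds:02d}, "
+            f"{views:,} views, published {date_text})")
+
+
+print(describe_video(video_meta))
+# Build SaaS with WordPress With 3 Plugins Only! (4:12, 25,845 views, published 2024-02-15)
+```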
+
+Use the same method to extract any value you want.
+
+## `get_youtube_transcript(video_url)` Function
+
+This function also takes only the `video_url` and returns the video's transcript as a single, readable string.
+
+### Example Usage
+
+```python
+from SimplerLLM.tools.youtube import get_youtube_transcript
+
+video_transcript = get_youtube_transcript("https://www.youtube.com/watch?v=r9PjzmUmk1w")
+
+print(video_transcript)
+```
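+
+If you already have timed segments (such as those returned by `get_youtube_transcript_with_timing` below), joining their text gives a plain string similar to what this function returns. This is an assumption about how the two outputs relate, not a description of SimplerLLM's code. `segments_to_text` is a hypothetical helper.
+
+```python
+from typing import Dict, List
+
+
+def segments_to_text(segments: List[Dict]) -> str:
+    # Normalize whitespace inside each segment, then join segments with spaces
+    return " ".join(" ".join(seg["text"].split()) for seg in segments if seg.get("text"))
+
+
+sample = [
+    {"text": "hi friends in this video I will show you", "start": 0.12, "duration": 6.08},
+    {"text": "how to turn any WordPress website into a", "start": 2.639, "duration": 7.481},
+]
+print(segments_to_text(sample))
+# hi friends in this video I will show you how to turn any WordPress website into a
+```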
+
+## `get_youtube_transcript_with_timing(video_url)` Function
+
+This function also takes only the `video_url` and retrieves the video's transcript along with timing information for each line. It returns a list of dictionaries, where each dictionary represents one segment of the transcript and contains the following keys:
+- `text`: The transcript text of a specific segment of the video.
+- `start`: The start time of the segment in seconds.
+- `duration`: The duration of the segment in seconds.
+
+### Example Usage
+
+```python
+from SimplerLLM.tools.youtube import get_youtube_transcript_with_timing
+
+video_transcript = get_youtube_transcript_with_timing("https://www.youtube.com/watch?v=r9PjzmUmk1w")
+
+print(video_transcript)
+```
+
+Here's the output format of a small section:
+```
+[{'text': 'hi friends in this video I will show you', 'start': 0.12, 'duration': 6.08}, {'text': 'how to turn any WordPress website into a', 'start': 2.639, 'duration': 7.481}, {'text': 'full SAS business using only three', 'start': 6.2, 'duration': 7.639}, {'text': 'plugins this is exactly what I did on my', 'start': 10.12, 'duration': 6.56}, {'text': 'website you will see here I have a list', 'start': 13.839, 'duration': 5.401}]
+```
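+
+Each segment carries `start` and `duration` in seconds. You can use them to turn the output into timestamped lines or to look up what is said at a given moment. The helpers below are hypothetical and run on the sample output above. The sample segments overlap in time, so `segment_at` returns the first matching segment.
+
+```python
+from typing import Dict, List, Optional
+
+# The sample segments shown above
+segments = [
+    {"text": "hi friends in this video I will show you", "start": 0.12, "duration": 6.08},
+    {"text": "how to turn any WordPress website into a", "start": 2.639, "duration": 7.481},
+    {"text": "full SAS business using only three", "start": 6.2, "duration": 7.639},
+    {"text": "plugins this is exactly what I did on my", "start": 10.12, "duration": 6.56},
+    {"text": "website you will see here I have a list", "start": 13.839, "duration": 5.401},
+]
+
+
+def format_timestamp(seconds: float) -> str:
+    # Whole seconds rendered as MM:SS
+    minutes, secs = divmod(int(seconds), 60)
+    return f"{minutes:02d}:{secs:02d}"
+
+
+def to_timestamped_lines(segs: List[Dict]) -> List[str]:
+    return [f"[{format_timestamp(s['start'])}] {s['text']}" for s in segs]
+
+
+def segment_at(segs: List[Dict], t: float) -> Optional[Dict]:
+    # First segment whose [start, start + duration) window contains t
+    for s in segs:
+        if s["start"] <= t < s["start"] + s["duration"]:
+            return s
+    return None
+
+
+print("\n".join(to_timestamped_lines(segments)))
+print(segment_at(segments, 11.0)["text"])
+```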
+
+That's how you can benefit from SimplerLLM to make extracting YouTube data Simpler!
diff --git a/Documentation/docs/Advanced Tools/File Operations.md b/Documentation/docs/Advanced Tools/File Operations.md
@@ -0,0 +1,124 @@
+---
+sidebar_position: 1
+--- 
+
+# File Operations
+
+SimplerLLM supports saving text to files and loading the content of various file types. A generic loader function makes it easy to load the content of any supported file, or even content from the internet.
+
+The file operations available are categorized into three primary areas:
+- **Saving Text to Files**: Writes text data to a file and handles errors, reporting whether the write succeeded.
+- **Loading CSV Files**: Reads any CSV file and returns its data in a structured object.
+- **Generic File Loading**: Loads the content and details of various file types, such as plain text, PDF, DOCX, web pages, and even YouTube video data.
+
+Here's how each of them works:
+
+## Saving Text to File
+
+This operation consists of a single function, `save_text_to_file`, which takes the text you want to save and the name of the file to save it to.
+
+If the file already exists, its content is overwritten; otherwise, the file is created and the input text is written to it. Here's an example:
+
+```python
+from SimplerLLM.tools.file_functions import save_text_to_file
+
+saved = save_text_to_file("This is the text saved in the file", "file.txt")
+
+print(saved)  # True if the file was written successfully
+```
+
+As you can see, it takes two parameters:
+- `text (str)`: The text content to save.
+- `filename (str)` (Optional): The destination filename. Defaults to "output.txt".
+
+It returns a bool (`True`/`False`) indicating whether the file was written successfully.
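+
+A standard-library sketch can illustrate the behavior described above: overwrite an existing file or create a new one, and return `False` instead of raising when the write fails. `save_text` is a hypothetical equivalent, not SimplerLLM's code. Writing with UTF-8 encoding is an assumption.
+
+```python
+import os
+import tempfile
+from pathlib import Path
+
+
+def save_text(text: str, filename: str = "output.txt") -> bool:
+    """Hypothetical equivalent of save_text_to_file: overwrite or create the file."""
+    try:
+        # write_text opens the file in "w" mode: it truncates or creates the file
+        Path(filename).write_text(text, encoding="utf-8")
+        return True
+    except OSError as error:
+        print(f"Could not save {filename}: {error}")
+        return False
+
+
+with tempfile.TemporaryDirectory() as folder:
+    target = os.path.join(folder, "file.txt")
+    first_ok = save_text("This is the text saved in the file", target)
+    second_ok = save_text("New content replaces the old one", target)
+    final_content = Path(target).read_text(encoding="utf-8")
+    # Writing into a directory that does not exist fails and returns False
+    bad_ok = save_text("oops", os.path.join(folder, "missing_dir", "file.txt"))
+
+print(first_ok, second_ok, bad_ok)  # True True False
+print(final_content)  # New content replaces the old one
+```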
+
+## Loading CSV Files
+
+This operation also contains a single function, `read_csv_file`, which takes the path to a CSV file and returns a `CSVDocument` object. This object gives structured access to the CSV data through the following attributes, each of which you can access independently:
+- `file_size`: The size of the CSV file in bytes.
+- `row_count`: Number of rows in the CSV.
+- `column_count`: Number of columns in the CSV.
+- `total_fields`: Total number of data fields.
+- `content`: Nested list representing rows and columns.
+- `title`: Title of the document (always `None` for this function).
+- `url_or_path`: CSV file name.
+
+Here's an example of the function in action:
+
+```python
+from SimplerLLM.tools.file_loader import read_csv_file
+
+csv_data = read_csv_file("text.csv")
+
+print(csv_data)
+```
+
+Printing `csv_data` directly shows the whole `CSVDocument` object with all its attributes. If you only want the content of the file, for example, access that attribute directly:
+
+```python
+from SimplerLLM.tools.file_loader import read_csv_file
+
+csv_data = read_csv_file("text.csv")
+
+print(csv_data.content)
+```
+
+Use the same approach to access the other attributes. For example, here's how to get the column count:
+
+```python
+from SimplerLLM.tools.file_loader import read_csv_file
+
+csv_data = read_csv_file("text.csv")
+
+print(csv_data.column_count)
+```
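+
+This standard-library sketch shows how these attributes relate to each other by computing them with Python's `csv` module. `CSVSummary` and `summarize_csv` are hypothetical. Two details are assumptions: whether the library counts the header row in `row_count`, and how it computes `column_count` for rows of uneven length.
+
+```python
+import csv
+import os
+import tempfile
+from dataclasses import dataclass
+from typing import List, Optional
+
+
+@dataclass
+class CSVSummary:  # hypothetical stand-in for the CSVDocument described above
+    file_size: int
+    row_count: int
+    column_count: int
+    total_fields: int
+    content: List[List[str]]
+    title: Optional[str]
+    url_or_path: str
+
+
+def summarize_csv(path: str) -> CSVSummary:
+    with open(path, newline="", encoding="utf-8") as handle:
+        rows = list(csv.reader(handle))
+    return CSVSummary(
+        file_size=os.path.getsize(path),  # size on disk in bytes
+        row_count=len(rows),  # every row, header included (assumption)
+        column_count=max((len(r) for r in rows), default=0),  # widest row (assumption)
+        total_fields=sum(len(r) for r in rows),
+        content=rows,
+        title=None,
+        url_or_path=path,
+    )
+
+
+with tempfile.TemporaryDirectory() as folder:
+    path = os.path.join(folder, "text.csv")
+    with open(path, "w", newline="", encoding="utf-8") as handle:
+        csv.writer(handle).writerows([["name", "age"], ["Ada", "36"], ["Alan", "41"]])
+    summary = summarize_csv(path)
+
+print(summary.row_count, summary.column_count, summary.total_fields)  # 3 2 6
+```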
+
+## Generic Loading Of Other File Types
+
+The generic loader supports the following content types:
+- Web Articles
+- YouTube video transcripts
+- Traditional formats like TXT, PDF, CSV, and DOCX.
+
+The `load_content` function takes a file path (or URL) as input and returns a `Text Document` object with the following attributes:
+- `file_size`: The size of the file in bytes.
+- `word_count`: The number of words in the file.
+- `character_count`: The number of characters in the file.
+- `content`: String representing the contents of the file.
+- `title`: Title of the document (if it has one)
+- `url_or_path`: The file path or URL that was loaded.
+
+Here's an example of the function in action:
+
+```python
+from SimplerLLM.tools.generic_loader import load_content
+
+file_data = load_content("file_name.csv")
+
+print(file_data)
+```
+
+Printing `file_data` directly shows the whole `Text Document` object with all its attributes. If you only want the content of the file, for example, access that attribute directly:
+
+```python
+from SimplerLLM.tools.generic_loader import load_content
+
+file_data = load_content("file_name.csv")
+
+print(file_data.content)
+```
+
+Use the same approach to access the other attributes. For example, here's how to get the word count:
+
+```python
+from SimplerLLM.tools.generic_loader import load_content
+
+file_data = load_content("file_name.csv")
+
+print(file_data.word_count)
+```
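+
+Here is a rough standard-library sketch of how such a loader might compute these attributes for a plain-text file. `LoadedText` and `load_text_file` are hypothetical. The real `load_content` also handles PDF, DOCX, web articles, and YouTube transcripts, which need third-party libraries or network access, so this sketch refuses those inputs. Counting words by splitting on whitespace is an assumption.
+
+```python
+import os
+import tempfile
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class LoadedText:  # hypothetical stand-in for the Text Document described above
+    file_size: int
+    word_count: int
+    character_count: int
+    content: str
+    title: Optional[str]
+    url_or_path: str
+
+
+def load_text_file(path: str) -> LoadedText:
+    extension = os.path.splitext(path)[1].lower()
+    if extension not in {".txt", ".csv", ".md"}:
+        # PDF, DOCX, web pages and YouTube need extra libraries or network access
+        raise NotImplementedError(f"this sketch only reads plain-text files: {path}")
+    with open(path, encoding="utf-8") as handle:
+        content = handle.read()
+    return LoadedText(
+        file_size=os.path.getsize(path),
+        word_count=len(content.split()),
+        character_count=len(content),
+        content=content,
+        title=None,
+        url_or_path=path,
+    )
+
+
+with tempfile.TemporaryDirectory() as folder:
+    path = os.path.join(folder, "file_name.txt")
+    with open(path, "w", encoding="utf-8") as handle:
+        handle.write("SimplerLLM makes files simpler.")
+    document = load_text_file(path)
+
+print(document.word_count, document.character_count)  # 4 31
+```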
+
+That's how you can benefit from SimplerLLM to make interaction with files Simpler!