44 commits
aa4cf22
Update .gitignore
huseinaa Aug 23, 2024
6687b80
Merge pull request #3 from huseinaa/patch-2
hassancs91 Aug 23, 2024
b288bfe
updated
hassancs91 Aug 23, 2024
ace80c1
Merge branch 'main' into main
hassancs91 Aug 23, 2024
943bac2
Merge pull request #2 from huseinaa/main
hassancs91 Aug 23, 2024
1d793dd
Update .gitignore
hassancs91 Aug 23, 2024
dfb86cd
Merge branch 'main' of https://github.com/hassancs91/SimplerLLM
hassancs91 Aug 23, 2024
10812d5
Update .gitignore
hassancs91 Aug 23, 2024
a035ccb
Update .gitignore
hassancs91 Aug 23, 2024
0d6e431
Update .gitignore
hassancs91 Aug 23, 2024
edf16da
Merge branch 'main' of https://github.com/huseinaa/SimplerLLM
huseinaa Aug 23, 2024
1f86b3f
Update main_home_page.js
hassancs91 Aug 23, 2024
28ca084
Update .gitignore
huseinaa Aug 24, 2024
18af824
Update intro.md
huseinaa Aug 24, 2024
c2da635
Create Choose the LLM.md
huseinaa Aug 24, 2024
aad0fdb
Create Getting Started.md
huseinaa Aug 24, 2024
50aa074
Update docusaurus.config.js
huseinaa Aug 24, 2024
9384112
Update main_home_page.js
huseinaa Aug 24, 2024
293be07
Merge branch 'main' into main
huseinaa Aug 24, 2024
2093c2a
Merge pull request #4 from huseinaa/main
hassancs91 Aug 24, 2024
d6a8f0c
Update docusaurus.config.js
huseinaa Aug 24, 2024
1c8343e
Merge branch 'main' of https://github.com/huseinaa/SimplerLLM
huseinaa Aug 24, 2024
0f9bcec
Merge pull request #5 from huseinaa/main
hassancs91 Aug 24, 2024
9898faa
Added all folders
huseinaa Aug 25, 2024
d78222c
updated yotube and RapidAPI
huseinaa Aug 25, 2024
b6ceef0
finished the tools section
huseinaa Aug 26, 2024
d586093
Merge pull request #2 from hassancs91/main
huseinaa Aug 26, 2024
b131d2c
Merge pull request #6 from huseinaa/main
hassancs91 Aug 26, 2024
922c78c
fixed ollama generation
hassancs91 Aug 26, 2024
03ccd7d
Merge branch 'main' of https://github.com/hassancs91/SimplerLLM
hassancs91 Aug 26, 2024
58312f2
Update setup.py
hassancs91 Aug 26, 2024
08f29ea
finalize first deployment
huseinaa Aug 27, 2024
2274cf0
Merge branch 'main' of https://github.com/huseinaa/SimplerLLM
huseinaa Aug 27, 2024
7679150
Merge pull request #3 from hassancs91/main
huseinaa Aug 27, 2024
2bdd1cb
add Google Analytics
huseinaa Aug 27, 2024
bebdc7b
Merge branch 'main' of https://github.com/huseinaa/SimplerLLM
huseinaa Aug 27, 2024
c304f5e
Merge pull request #7 from huseinaa/main
hassancs91 Aug 27, 2024
de3fc68
Update lwh_llm.py
hassancs91 Aug 28, 2024
86981ca
Updated requirements
hassancs91 Sep 9, 2024
83fcf2c
Add Antropic Claude Caching
huseinaa Sep 18, 2024
927b416
Merge pull request #8 from huseinaa/main
hassancs91 Oct 1, 2024
57a9d1c
added AzureOpenAI call
joel-lzb Nov 22, 2024
449229b
added AzureOpenAI call
joel-lzb Nov 22, 2024
d8e109e
Update readme.md
joel-lzb Nov 22, 2024
4 changes: 2 additions & 2 deletions .gitignore
@@ -8,6 +8,7 @@ venv/
/dist
/codes
/build
/SimplerLLM/workflow
/SimplerLLM.egg-info
.pypirc
generate_images.py
@@ -30,5 +31,4 @@ test.csv
test_data_agent.py
test_sql_agent.py
vb_ui.py
vd_test.py
Documentation
vd_test.py
5 changes: 5 additions & 0 deletions Documentation/docs/AI Agents/Getting Started.md
@@ -0,0 +1,5 @@
---
sidebar_position: 1
---

# Coming Soon
4 changes: 4 additions & 0 deletions Documentation/docs/AI Agents/_category_.json
@@ -0,0 +1,4 @@
{
"label": "AI Agents",
"position": 6
}
99 changes: 99 additions & 0 deletions Documentation/docs/Advanced Tools/Chunking Methods.md
@@ -0,0 +1,99 @@
---
sidebar_position: 4
---

# Chunking Methods

This section provides detailed information on the text chunking capabilities of the SimplerLLM library. These functions allow users to split text into pieces based on sentence, paragraph, size, or semantic similarity.

Each method is designed to accommodate different analytical needs, enhancing text processing tasks in various applications such as data preprocessing, content analysis, and information retrieval.

The data returned by all of these functions is a `Text Chunks` object with the following attributes:
- `chunk_list` (List): A list of `ChunkInfo` objects, each of which includes:
  - `text` (string): The text of the chunk itself.
  - `num_characters` (int): The number of characters in the chunk.
  - `num_words` (int): The number of words in the chunk.
- `num_chunks` (int): The total number of chunks returned.
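
To make the shape of the returned object concrete, here is a minimal sketch built from plain dataclasses. `ChunkInfoSketch`, `TextChunksSketch`, and `build_chunks` are hypothetical stand-ins that mirror the fields listed above; they are not the library's own classes, and the counts are assumed to be integers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkInfoSketch:
    # Hypothetical stand-in mirroring the documented ChunkInfo fields.
    text: str
    num_characters: int
    num_words: int


@dataclass
class TextChunksSketch:
    # Hypothetical stand-in mirroring the documented return object.
    chunk_list: List[ChunkInfoSketch]
    num_chunks: int


def build_chunks(pieces: List[str]) -> TextChunksSketch:
    """Wrap raw strings in the documented structure."""
    infos = [ChunkInfoSketch(p, len(p), len(p.split())) for p in pieces]
    return TextChunksSketch(chunk_list=infos, num_chunks=len(infos))


result = build_chunks(["First sentence.", "Second one here."])
for chunk in result.chunk_list:
    # Each entry carries its own text and size statistics.
    print(chunk.num_words, chunk.num_characters, chunk.text)
```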

## chunk_by_sentences Function

Breaks down the provided text into sentences using punctuation marks as delimiters. It takes one parameter:
- `text` (str): Text you want to chunk into sentences.

It then returns a `Text Chunks` object. Here's a sample usage:

```python
from SimplerLLM.tools.text_chunker import chunk_by_sentences

text = "First sentence. Second sentence? Third sentence!"

sentences = chunk_by_sentences(text)

print(sentences)
```
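
For intuition, punctuation-based splitting can be approximated with a regular expression. This is a sketch only: `split_sentences` is a hypothetical helper, and the exact rule the library applies may differ (for example around abbreviations).

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Split after '.', '?' or '!' when the mark is followed by whitespace.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


sentences = split_sentences("First sentence. Second sentence? Third sentence!")
print(sentences)
```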

## chunk_by_paragraphs Function

Segments the provided text into paragraphs based on newline characters. It takes 1 parameter:
- `text` (str): Text you want to chunk into paragraphs.

It then returns a `Text Chunks` object. Here's a sample usage:

```python
from SimplerLLM.tools.text_chunker import chunk_by_paragraphs

text = "First paragraph, still going.\n\nSecond paragraph starts."

paragraphs = chunk_by_paragraphs(text)

print(paragraphs)
```
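
A rough standalone equivalent, assuming a paragraph boundary is one or more blank lines, could look like the following. `split_paragraphs` is a hypothetical helper; the library may treat single newlines differently.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    # Treat one or more blank lines as a paragraph boundary (an assumption).
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]


paragraphs = split_paragraphs("First paragraph, still going.\n\nSecond paragraph starts.")
print(paragraphs)
```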

## chunk_by_max_chunk_size Function

Splits the input text into chunks that do not exceed a specified size. Additionally, it can preserve the meaning of sentences by ensuring that chunks do not split sentences in the middle. It takes 3 parameters:
- `text` (str): The text you want to chunk.
- `max_chunk_size` (int): The maximum size of each chunk in characters.
- `preserve_sentence_structure` (bool, optional): Whether to keep sentences whole instead of splitting them across chunks. Defaults to `False`.

It returns a `Text Chunks` object. Here's how you can use it:

```python
from SimplerLLM.tools.text_chunker import chunk_by_max_chunk_size

text = "Hello world! This is an example of text chunking. Enjoy using SimplerLLM."

chunks = chunk_by_max_chunk_size(text, 50, True)

print(chunks)
```
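
The difference between the two modes can be sketched as below. `chunk_by_size` and its `keep_sentences` flag are hypothetical names, and the handling of a sentence longer than the limit (kept whole here) is an assumption rather than documented behavior.

```python
import re
from typing import List


def chunk_by_size(text: str, max_size: int, keep_sentences: bool = False) -> List[str]:
    if not keep_sentences:
        # Plain fixed-width slicing; this may cut a sentence in the middle.
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]

    chunks: List[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.?!])\s+", text.strip()):
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_size or not current:
            # The sentence fits, or the chunk is empty (an oversized
            # sentence is kept whole rather than split).
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


text = "Hello world! This is an example of text chunking. Enjoy using SimplerLLM."
raw_chunks = chunk_by_size(text, 50)
sentence_chunks = chunk_by_size(text, 50, keep_sentences=True)
print(raw_chunks)
print(sentence_chunks)
```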

## chunk_by_semantics Function

Uses semantic similarity to divide text into chunks. It takes 3 parameters:
- `text` (str): Text to be segmented based on semantic content.
- `llm_embeddings_instance` [(EmbeddingsLLM)](https://docs.simplerllm.com/Vector%20Storage/Vector%20Embeddings): An instance of a language model used to generate text embeddings for semantic analysis.
- `threshold_percentage` (int, optional): The percentile threshold used to decide where to split the text. Defaults to 90.

It returns a list of `ChunkInfo` objects, each representing a semantically coherent segment of the original text. Note that your OpenAI API key must be in the `.env` file so that the embeddings instance can generate the text embeddings. Enter it in this format:

```
OPENAI_API_KEY="your_openai_api_key"
```

Here's an example:

```python
from SimplerLLM.tools.text_chunker import chunk_by_semantics
from SimplerLLM.language.embeddings import EmbeddingsLLM, EmbeddingsProvider

text = "Discussing AI. Artificial intelligence has many applications. However, Dogs like bones"
embeddings_model = EmbeddingsLLM.create(provider=EmbeddingsProvider.OPENAI,
                                        model_name="text-embedding-3-small")

# threshold_percentage is a parameter of chunk_by_semantics, not of the embeddings model
semantic_chunks = chunk_by_semantics(text, embeddings_model, threshold_percentage=80)

print(semantic_chunks)
```
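
To illustrate what a percentile threshold can mean here, the sketch below shows one common approach: compute the distance between consecutive sentences and split wherever that distance exceeds the chosen percentile of all distances. This is not necessarily how the library implements it. `toy_embed`, `percentile`, and `semantic_chunks_sketch` are hypothetical, and a bag-of-words counter is swapped in for a real embedding model so the example runs without an API key.

```python
import math
import re
from collections import Counter
from typing import List


def toy_embed(sentence: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)


def percentile(values: List[float], pct: float) -> float:
    # Linear interpolation between the two closest ranks.
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def semantic_chunks_sketch(text: str, threshold_percentage: int = 90) -> List[str]:
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    if len(sentences) < 2:
        return sentences
    vectors = [toy_embed(s) for s in sentences]
    # distances[i] is the distance between sentence i and sentence i + 1.
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    cutoff = percentile(distances, threshold_percentage)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > cutoff:
            # Unusually large jump in meaning: close the current chunk.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


sample = "Cats chase mice. Cats chase birds. Stocks fell sharply today."
toy_chunks = semantic_chunks_sketch(sample, threshold_percentage=80)
print(toy_chunks)
```

A lower `threshold_percentage` makes the cutoff easier to exceed and so tends to produce more, smaller chunks.
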
That's how you can benefit from SimplerLLM to make Text Chunking Simpler!
86 changes: 86 additions & 0 deletions Documentation/docs/Advanced Tools/Extract YouTube Data.md
@@ -0,0 +1,86 @@
---
sidebar_position: 3
---

# Extract YouTube Data

The functions in this section extract detailed information from YouTube videos, including metadata and transcripts. You can use these capabilities to build APIs, tools, and applications.

## `get_video_meta(video_url)` Function

This function takes only the `video_url` as input and fetches detailed metadata for the specified YouTube video. It returns a dictionary with the following keys:
- `video_id`: The unique identifier for the video.
- `video_title`: The title of the video.
- `video_description`: A description of the video.
- `video_length`: The duration of the video in seconds.
- `video_views`: The number of times the video has been viewed.
- `video_author`: The creator of the video.
- `video_publish_date`: The publication date of the video.
- `video_thumbnail_url`: The URL of the video's thumbnail.
- `video_rating`: The average user rating of the video.
- `video_keywords`: Keywords associated with the video.

### Example Usage

```python
from SimplerLLM.tools.youtube import get_video_meta

video_meta = get_video_meta("https://www.youtube.com/watch?v=r9PjzmUmk1w")

print(video_meta)
```

The video meta is returned in the following format:
```
{'video_id': 'r9PjzmUmk1w', 'video_title': 'Build SaaS with WordPress With 3 Plugins Only!', 'video_description': None, 'video_length': 252, 'video_views': 25845, 'video_author': 'Hasan Aboul Hasan', 'video_publish_date': datetime.datetime(2024, 2, 15, 0, 0), 'video_thumbnail_url': 'https://i.ytimg.com/vi/r9PjzmUmk1w/hqdefault.jpg?sqp=-oaymwEXCJADEOABSFryq4qpAwkIARUAAIhCGAE=&rs=AOn4CLDLDIEv0NjGrhaAKQ8GL2SpvwDDng', 'video_rating': None, 'video_keywords': []}
```

If you only want the video title, here's how to get it:

```python
from SimplerLLM.tools.youtube import get_video_meta

video_meta = get_video_meta("https://www.youtube.com/watch?v=r9PjzmUmk1w")

print(video_meta.get('video_title'))
```

Use the same method to extract any value you want.
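
Because some fields can be `None` or empty, as in the sample output above, it helps to read them defensively. The sketch below works on a plain dictionary copied from that sample (thumbnail URL omitted), so it runs without network access; `summarize` is a hypothetical helper.

```python
from datetime import datetime

# Sample values copied from the output shown earlier in this section.
video_meta = {
    "video_title": "Build SaaS with WordPress With 3 Plugins Only!",
    "video_description": None,
    "video_length": 252,
    "video_views": 25845,
    "video_publish_date": datetime(2024, 2, 15, 0, 0),
    "video_keywords": [],
}


def summarize(meta: dict) -> str:
    # Convert the length in seconds to minutes:seconds.
    minutes, seconds = divmod(meta.get("video_length") or 0, 60)
    # Fall back to a placeholder when a field is None.
    description = meta.get("video_description") or "(no description)"
    published = meta["video_publish_date"].strftime("%Y-%m-%d")
    return f"{meta['video_title']} | {minutes}:{seconds:02d} | {published} | {description}"


summary = summarize(video_meta)
print(summary)
```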

## `get_youtube_transcript(video_url)` Function

This function also takes only the `video_url` and returns the video's transcript as a single readable string.

### Example Usage

```python
from SimplerLLM.tools.youtube import get_youtube_transcript

video_transcript = get_youtube_transcript("https://www.youtube.com/watch?v=r9PjzmUmk1w")

print(video_transcript)
```

## `get_youtube_transcript_with_timing(video_url)` Function

This function also takes only the `video_url` and retrieves the transcript of a YouTube video, including timing information for each line. It returns a list of dictionaries, one per transcript segment, each containing the following:
- `text`: The transcript text of a specific segment of the video.
- `start`: The start time of the segment in seconds.
- `duration`: The duration of the segment in seconds.

### Example Usage

```python
from SimplerLLM.tools.youtube import get_youtube_transcript_with_timing

video_transcript = get_youtube_transcript_with_timing("https://www.youtube.com/watch?v=r9PjzmUmk1w")

print(video_transcript)
```

Here's the output format of a small section:
```
[{'text': 'hi friends in this video I will show you', 'start': 0.12, 'duration': 6.08}, {'text': 'how to turn any WordPress website into a', 'start': 2.639, 'duration': 7.481}, {'text': 'full SAS business using only three', 'start': 6.2, 'duration': 7.639}, {'text': 'plugins this is exactly what I did on my', 'start': 10.12, 'duration': 6.56}, {'text': 'website you will see here I have a list', 'start': 13.839, 'duration': 5.401}]
```
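
The timing fields make it easy to produce a timestamped transcript. This sketch uses the first three segments from the sample output above as plain data; `to_timestamp` is a hypothetical helper.

```python
segments = [
    {"text": "hi friends in this video I will show you", "start": 0.12, "duration": 6.08},
    {"text": "how to turn any WordPress website into a", "start": 2.639, "duration": 7.481},
    {"text": "full SAS business using only three", "start": 6.2, "duration": 7.639},
]


def to_timestamp(seconds: float) -> str:
    # Drop the fractional part and format as MM:SS.
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


lines = [f"[{to_timestamp(s['start'])}] {s['text']}" for s in segments]
print("\n".join(lines))
```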

That's how you can benefit from SimplerLLM to make extracting YouTube data Simpler!
124 changes: 124 additions & 0 deletions Documentation/docs/Advanced Tools/File Operations.md
@@ -0,0 +1,124 @@
---
sidebar_position: 1
---

# File Operations

SimplerLLM supports creating and loading the content of various file types. This makes it easy to load the content of any file or even content from the internet using a generic function.

The file operations available are categorized into three primary areas:
- **Saving Text to Files**: Writes text data to a file and handles errors along the way.
- **Loading CSV Files**: Reads any CSV file and returns its contents as structured data.
- **Generic File Loading**: Loads the content of various file types, such as plain text, PDF, DOCX, web pages, and even YouTube video data.

Here's how each of them works:

## Saving Text to File

This operation contains a single function, `save_text_to_file`, which takes as input the text you want to save and the name of the file you want to save it in.

If the file already exists in your directory, its content is overwritten; if it does not exist, the file is created and the input text is written to it. Here's an example:

```python
from SimplerLLM.tools.file_functions import save_text_to_file

# The return value is a bool indicating success, not the saved text
success = save_text_to_file("This is the text saved in the file", "file.txt")

print(success)
```

As you can see, it takes 2 parameters:
- `text (str)`: The text content to save.
- `filename (str)` (Optional): The destination filename. Defaults to "output.txt".

It returns a bool (`True`/`False`) indicating whether the file was written successfully.
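
The overwrite-and-report-a-bool behavior described above can be sketched with the standard library alone. `save_text_sketch` is a hypothetical stand-in, not the library's implementation, and catching `OSError` is an assumption about which failures are reported as `False`.

```python
import os
import tempfile


def save_text_sketch(text: str, filename: str = "output.txt") -> bool:
    # Hypothetical stand-in: overwrite the file and report success as a bool.
    try:
        with open(filename, "w", encoding="utf-8") as handle:
            handle.write(text)
        return True
    except OSError:
        return False


# Write into a temporary directory so the example leaves no stray files behind.
target = os.path.join(tempfile.mkdtemp(), "file.txt")
saved = save_text_sketch("This is the text saved in the file", target)
print(saved)
```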

## Loading CSV Files

This operation also contains a single function, `read_csv_file`, which takes the path to the CSV file as input and returns a `CSVDocument` object. This object provides a structured way to access the CSV data through the following attributes, each of which you can access independently:
- `file_size`: The size of the CSV file in bytes.
- `row_count`: Number of rows in the CSV.
- `column_count`: Number of columns in the CSV.
- `total_fields`: Total number of data fields.
- `content`: Nested list representing rows and columns.
- `title`: Title of the document (set to `None` by this function).
- `url_or_path`: CSV file name.

Here's an example of the function in action:

```python
from SimplerLLM.tools.file_loader import read_csv_file

csv_data = read_csv_file("text.csv")

print(csv_data)
```

Printing `csv_data` as is shows the whole `CSVDocument` object with all its attributes. If you only want to access one attribute, for example the content of the file, here's how you do it:

```python
from SimplerLLM.tools.file_loader import read_csv_file

csv_data = read_csv_file("text.csv")

print(csv_data.content)
```

Use the same method for accessing the other attributes.
Here's another example showing how to access the column count:

```python
from SimplerLLM.tools.file_loader import read_csv_file

csv_data = read_csv_file("text.csv")

print(csv_data.column_count)
```
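
To see how the count attributes relate to the nested `content` list, here is a standalone sketch using Python's `csv` module on an in-memory sample. Whether the library counts the header row in `row_count` is not stated here, so this sketch simply counts every row; that choice is an assumption.

```python
import csv
import io

raw = "name,age\nAda,36\nLinus,54\n"

# Nested list: one inner list per row, one string per field.
content = list(csv.reader(io.StringIO(raw)))

row_count = len(content)  # counts the header row too (an assumption)
column_count = len(content[0]) if content else 0
total_fields = sum(len(row) for row in content)

print(content)
print(row_count, column_count, total_fields)
```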

## Generic Loading Of Other File Types

This generic loader supports the following sources:
- Web articles
- YouTube video transcripts
- Traditional formats like TXT, PDF, CSV, and DOCX

The `load_content` function takes the file name or URL as input and returns a `Text Document` object that has the following attributes:
- `file_size`: The size of the file in bytes.
- `word_count`: The number of words in the file.
- `character_count`: The number of characters in the file.
- `content`: String representing the contents of the file.
- `title`: Title of the document (if it has one)
- `url_or_path`: The file name or URL that was loaded.

Here's an example of the function in action:

```python
from SimplerLLM.tools.generic_loader import load_content

file_data = load_content("file_name.csv")

print(file_data)
```

Printing `file_data` as is shows the whole `Text Document` object with all its attributes. If you only want to access one attribute, for example the content of the file, here's how you do it:

```python
from SimplerLLM.tools.generic_loader import load_content

file_data = load_content("file_name.csv")

print(file_data.content)
```

Use the same method for accessing the other attributes.
Here's another example showing how to access the word count:

```python
from SimplerLLM.tools.generic_loader import load_content

file_data = load_content("file_name.csv")

print(file_data.word_count)
```
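
For a plain text file, the size and count attributes can be approximated with the standard library, as sketched below. The `stats` dictionary only mirrors the documented attribute names; splitting on whitespace to count words is an assumption about how `word_count` is computed.

```python
import os
import tempfile

text = "SimplerLLM makes loading files simpler."

# Create a small text file in a temporary directory to load back.
path = os.path.join(tempfile.mkdtemp(), "note.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

with open(path, encoding="utf-8") as handle:
    loaded = handle.read()

stats = {
    "file_size": os.path.getsize(path),  # size in bytes
    "word_count": len(loaded.split()),   # whitespace-separated words
    "character_count": len(loaded),
    "content": loaded,
    "url_or_path": path,
}
print(stats["word_count"], stats["character_count"], stats["file_size"])
```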

That's how you can benefit from SimplerLLM to make interaction with files Simpler!