PDF Chat Assistant 📚

A Streamlit-based application that allows users to upload PDF documents and interact with their content using natural language queries. The app uses OpenAI's GPT-4 model and FAISS vector storage for efficient document retrieval and question answering.

Features

PDF document upload and processing
Natural language querying of PDF content
Conversation memory for context-aware responses
Response caching for improved performance
Cost estimation for embeddings and queries
Clear chat history functionality
Vector store persistence for faster subsequent loads

Prerequisites

Python 3.8 or higher
OpenAI API key

Installation

Clone the repository:

git clone https://github.com/rajat343/multiple_pdf_qa.git
cd multiple_pdf_qa

Create and activate a virtual environment (recommended):

python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate

Install the required packages:

pip install -r requirements.txt

Create a .env file in the project root directory and add your OpenAI API key:

OPENAI_API_KEY=your_api_key_here
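To confirm the key is picked up before launching Streamlit, you can read `.env` with the standard library alone. This is a minimal sketch and a hypothetical stand-in for a loader such as python-dotenv, since the README does not say which loader `app.py` uses. `parse_env_text`, `load_env_file`, and `get_openai_key` are illustrative names and are not part of the project.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, '#' comments and malformed lines."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding double or single quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy .env entries into os.environ without overwriting variables already set."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for key, value in parse_env_text(env_path.read_text(encoding="utf-8")).items():
        os.environ.setdefault(key, value)


def get_openai_key() -> str:
    """Return OPENAI_API_KEY, failing early if it is missing or still the placeholder."""
    load_env_file()
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("Set OPENAI_API_KEY in .env before running the app.")
    return key
```

Calling `get_openai_key()` from the project root raises a readable error instead of letting the first API request fail.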

Usage

Start the Streamlit app:

streamlit run app.py

Open your web browser and navigate to the URL shown in the terminal (typically http://localhost:8501).
Upload one or more PDF documents using the file uploader.
Start asking questions about the content of your PDFs in the chat interface.

Project Structure

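multiple_pdf_qa/
├── app.py              # Main application file
├── requirements.txt    # Python dependencies
├── README.md           # Project documentation
├── .gitignore          # Git ignore rules
├── .env                # Environment variables (you create this; not in the repository)
├── uploaded_pdfs/      # Stored PDFs (created locally; not in the repository)
└── embeddings/         # Stored vector embeddings (created locally; not in the repository)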

Features in Detail

Document Processing

PDFs are processed once and stored locally
Document embeddings are cached for faster subsequent loads
Multiple PDFs can be uploaded and processed together
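One way to implement "process once, reuse the embeddings" is to key the cache on a hash of each PDF's bytes, so a re-uploaded file is recognized even if it was renamed. The sketch below assumes SHA-256 and one JSON file per document under `embeddings/`; the actual app presumably persists a FAISS index instead, and `build` is a hypothetical callback standing in for the real embedding step.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List

CACHE_DIR = Path("embeddings")  # directory name taken from the project structure


def document_hash(pdf_bytes: bytes) -> str:
    """Content hash: identical files map to the same cache entry regardless of filename."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def load_or_build_embeddings(
    pdf_bytes: bytes,
    build: Callable[[bytes], List[List[float]]],
    cache_dir: Path = CACHE_DIR,
) -> List[List[float]]:
    """Return cached vectors for this PDF, calling `build` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{document_hash(pdf_bytes)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    vectors = build(pdf_bytes)  # the expensive, billed step happens only here
    cache_file.write_text(json.dumps(vectors), encoding="utf-8")
    return vectors
```

Because the key depends only on file content, embedding costs are paid once per distinct document rather than once per upload.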

Conversation Management

Maintains conversation history for context-aware responses
Allows clearing of chat history
Caches question-answer pairs for improved performance
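The three behaviours above (history, clearing, answer caching) fit in a small session object. This is a hypothetical sketch, not the class used in `app.py`: `ChatSession` and `answer_fn` are illustrative names, and `answer_fn` stands in for whatever LangChain chain actually produces the answer.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]


class ChatSession:
    """Keeps (question, answer) history and memoizes answers by normalized question."""

    def __init__(self, answer_fn: Callable[[str, History], str]) -> None:
        self.answer_fn = answer_fn  # hypothetical wrapper around the LLM chain
        self.history: History = []
        self.cache: Dict[str, str] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Normalize case and whitespace so trivially different phrasings share an entry.
        return " ".join(question.lower().split())

    def ask(self, question: str) -> str:
        key = self._key(question)
        if key in self.cache:
            answer = self.cache[key]  # cache hit: no API call
        else:
            # Pass a copy of the history so the model can resolve follow-up questions.
            answer = self.answer_fn(question, list(self.history))
            self.cache[key] = answer
        self.history.append((question, answer))
        return answer

    def clear_history(self) -> None:
        """Forget the conversation; cached answers are kept."""
        self.history.clear()
```

Keying the cache on the question text alone is a simplification: a follow-up such as "what about section 2?" would reuse an earlier answer even if the conversation has moved on. Including recent history in the key avoids that at the cost of fewer cache hits.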

Cost Management

Displays estimated costs for embeddings generation
Shows per-query costs for API usage
Reuses cached embeddings and answers to avoid repeat API calls, and sends the model only retrieved passages rather than whole documents
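An up-front embedding cost estimate only needs a token count and the embedding rate listed under Cost Considerations. The sketch below assumes the common rule of thumb of roughly 4 characters per token for English text; an exact count would need a real tokenizer such as tiktoken, and the function names are illustrative.

```python
import math
from typing import Iterable

EMBEDDING_RATE_PER_1K = 0.0001  # USD per 1K tokens, from "Cost Considerations"


def approx_tokens(text: str) -> int:
    """Heuristic token count (~4 characters per token), not a real tokenizer."""
    return math.ceil(len(text) / 4)


def estimate_embedding_cost(texts: Iterable[str]) -> float:
    """Estimated USD cost of embedding all the given text chunks once."""
    total_tokens = sum(approx_tokens(t) for t in texts)
    return total_tokens / 1000 * EMBEDDING_RATE_PER_1K
```

For a hypothetical 300-page PDF at about 3,000 characters per page (900,000 characters, roughly 225,000 tokens), this estimates about $0.0225.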

Technical Implementation

The application uses:

LangChain for document processing and chat chain management
FAISS for efficient vector similarity search
OpenAI's GPT-4 for generating responses
Streamlit for the web interface
Document hashing for efficient storage and retrieval
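The retrieval step that FAISS accelerates can be illustrated with a brute-force version: split the text into overlapping chunks, embed each chunk, and return the chunks most similar to the question. This sketch swaps in a toy bag-of-words "embedding" and cosine similarity so it runs without an API key; the real app uses OpenAI embeddings and a FAISS index, and all names here are illustrative.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows (a simplified text splitter)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: List[str], k: int = 3) -> List[Tuple[float, str]]:
    """Score every chunk against the question and return the k best (a linear scan)."""
    q = toy_embed(question)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

FAISS replaces the linear scan in `top_k` with optimized (and optionally approximate) index structures, which matters once a collection holds many thousands of chunks.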

Limitations

PDF processing may take longer for large documents
API costs can accumulate with heavy usage
Requires stable internet connection for API calls
The question, chat history, and retrieved context must together fit within the GPT-4 model's context window (token limit)
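A common way to respect the context-window limit is to pack the highest-ranked chunks into a fixed token budget while reserving room for the reply. This is a sketch under stated assumptions: the 4-characters-per-token heuristic, the 500-token reserve, and the name `fit_to_budget` are illustrative, not taken from `app.py`.

```python
import math
from typing import List


def approx_tokens(text: str) -> int:
    return math.ceil(len(text) / 4)  # heuristic: ~4 characters per token


def fit_to_budget(ranked_chunks: List[str], budget_tokens: int, reserved_for_answer: int = 500) -> List[str]:
    """Keep chunks in rank order until the next one would exceed the remaining budget."""
    remaining = budget_tokens - reserved_for_answer
    kept: List[str] = []
    for chunk in ranked_chunks:
        cost = approx_tokens(chunk)
        if cost > remaining:
            break  # stop rather than skip, so lower-ranked chunks never displace better ones
        kept.append(chunk)
        remaining -= cost
    return kept
```

The budget should also account for the question and chat history, which share the same window.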

Cost Considerations

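The application uses OpenAI's API, which has associated costs:

Embedding generation: $0.0001 per 1K tokens
Query processing:
- Input: $0.0015 per 1K tokens
- Output: $0.002 per 1K tokens

The query rates above match OpenAI's earlier gpt-3.5-turbo pricing; GPT-4 has been billed at considerably higher per-token rates, and OpenAI changes its pricing over time, so check the current OpenAI pricing page for the model the app uses.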
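Per-query cost is input tokens (question, retrieved chunks, and chat history) times the input rate plus output tokens times the output rate. The sketch below hard-codes the rates listed in this section; `query_cost` is an illustrative name, and the token counts in the example are hypothetical.

```python
INPUT_RATE_PER_1K = 0.0015   # USD per 1K input tokens, as listed above
OUTPUT_RATE_PER_1K = 0.002   # USD per 1K output tokens, as listed above


def query_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one chat completion at the listed per-1K-token rates."""
    return input_tokens / 1000 * INPUT_RATE_PER_1K + output_tokens / 1000 * OUTPUT_RATE_PER_1K
```

For example, a query that sends 3,000 input tokens and receives 500 output tokens costs 3 × $0.0015 + 0.5 × $0.002 = $0.0055, so 1,000 such queries come to about $5.50 at these rates.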

Contributing

Feel free to submit issues, fork the repository, and create pull requests for any improvements.

Acknowledgments

Built with Streamlit
Uses LangChain for document processing
Powered by OpenAI GPT-4
