Welcome to the DocChat project! This repository provides a chatbot interface for processing documents using advanced machine learning techniques. Follow the instructions below to set up the project on your local machine.
- Prerequisites
- Cloning the Project
- Setting Up the Virtual Environment
- Installing Dependencies
- Setting Up Environment Variables
- Running the Application
- Database Setup (Optional)
- Testing and Validation
- Deactivating the Virtual Environment
Before setting up the project, ensure you have the following software and tools installed:
- Python 3.8 or later: Download Python from python.org.
- Virtual Environment Tool: venv or virtualenv is recommended for managing project dependencies in an isolated environment.
- Git: Version control tool for cloning the repository.
- pip: Python package manager, typically bundled with Python.
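The prerequisites above can be checked from Python before continuing. The following is a minimal sketch, not part of the DocChat repository; the function name check_prerequisites is hypothetical, and it assumes git and pip are exposed on your PATH under exactly those names (some systems only provide pip3).

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def check_prerequisites(min_python=MIN_PYTHON, tools=("git", "pip")):
    """Return a dict mapping each prerequisite name to True (ok) or False."""
    # Compare only (major, minor) of the running interpreter.
    results = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If pip is reported as missing, running python3 -m pip --version is another way to confirm that it is available.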
Start by cloning the project repository. Open a terminal window and run:
git clone https://github.com/SushainDevi/DocChat.git
Once the project is cloned, create a virtual environment to manage dependencies:
cd DocChat
python3 -m venv .venv
After creating the virtual environment, activate it using the command for your operating system:
- On Windows:
.\.venv\Scripts\activate
- On macOS/Linux:
source .venv/bin/activate
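To confirm that activation worked, you can ask the interpreter whether it is running inside a virtual environment. This is a small sketch using only the standard library; the helper name in_virtualenv is hypothetical and not part of the project.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv releases set sys.real_prefix; venv and modern
    # virtualenv make sys.prefix differ from sys.base_prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

When the environment is active, the printed prefix should point at the .venv directory inside the DocChat folder.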
With the virtual environment activated, install the required dependencies listed in the requirements.txt file:
pip install -r requirements.txt
This ensures that all necessary Python libraries, including streamlit, PyPDF2, transformers, and others, are installed.
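A quick way to verify the installation is to check that the libraries named above can be located by the interpreter. The sketch below is an illustration only: missing_packages is a hypothetical helper, and the default list covers just the three packages this README mentions, not the full contents of requirements.txt.

```python
from importlib.util import find_spec


def missing_packages(names=("streamlit", "PyPDF2", "transformers")):
    """Return the top-level module names that cannot be found."""
    # find_spec returns None for a top-level module that is not installed;
    # it locates the module without importing it.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Run it with the virtual environment activated; otherwise it reports on your global Python installation instead.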
The project uses environment variables for secure and flexible configurations, such as API keys and access tokens.
- Open the .env file located in the project root directory.
- Replace the placeholders with the actual values (e.g., Hugging Face token, API keys). Example .env file format:
HF_Token=<Your_Hugging_Face_Token>
API_KEY=<Your_API_Key>
- Save the .env file after updating the values.
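Before launching the app, it can help to confirm that no placeholder was left unfilled. The following sketch parses the KEY=VALUE format shown above with the standard library only. It is an assumption-laden illustration: the README does not say how the project itself loads .env, and parse_env and unfilled_placeholders are hypothetical helper names.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled_placeholders(values):
    """Return keys that are empty or still hold a <Your_...> style placeholder."""
    return [
        key
        for key, value in values.items()
        if not value or (value.startswith("<") and value.endswith(">"))
    ]


if __name__ == "__main__":
    env_path = Path(".env")  # project root, as described above
    if env_path.exists():
        pending = unfilled_placeholders(parse_env(env_path.read_text(encoding="utf-8")))
        print("Unfilled keys:", pending if pending else "none")
    else:
        print(".env not found in the current directory")
```

The script prints only key names, never the values, so it is safe to run in a shared terminal.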
The core application is powered by Streamlit, which serves the chatbot interface. To run the chatbot, follow these steps:
- Open the terminal.
- Ensure the virtual environment is activated.
- Run the Streamlit app:
streamlit run st-Qwen1.5–110B-Chat.py
This command launches the Streamlit interface, accessible in your browser (by default at http://localhost:8501/).
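If the browser page does not load, you can check whether anything is accepting connections on the default port. This is a generic TCP check, sketched with the standard library; is_listening is a hypothetical helper, and a successful connection only shows that some process is listening on the port, not that it is Streamlit.

```python
import socket


def is_listening(host="localhost", port=8501, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError when the connection is refused
        # or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "accepting connections" if is_listening() else "not reachable"
    print(f"localhost:8501 is {state}")
```

If you started Streamlit on a different port, pass that port number to is_listening instead.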
The project uses SQLite for storing chatbot history (chatbot_history.db). No additional setup is required, as the database is automatically created and managed by the project scripts. However, if needed, you can inspect or modify the database using SQLite tools.
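For inspection, Python's built-in sqlite3 module is enough. The sketch below lists every table and its row count without assuming any schema, since this README does not document the table names used in chatbot_history.db; table_row_counts is a hypothetical helper, not a project function.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_counts(conn):
    """Return {table_name: row_count} for every table in the database."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    names = [row[0] for row in cursor.fetchall()]
    # Table names come from the database itself, so they are quoted and
    # interpolated; SQL parameters cannot stand in for identifiers.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


if __name__ == "__main__":
    db_path = Path("chatbot_history.db")
    # Check first: sqlite3.connect would create an empty file if none exists.
    if db_path.exists():
        with closing(sqlite3.connect(db_path)) as conn:
            for table, count in table_row_counts(conn).items():
                print(f"{table}: {count} rows")
    else:
        print("chatbot_history.db not found; run the app once to create it")
```

Because the script only reads, it should be safe to run alongside the app, although closing the app first avoids any database locking.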
After the setup, you can test the application by:
- Uploading documents (e.g., PDFs or DOCX files) to check the text extraction and summarization process.
- Interacting with the chatbot for document-related queries and responses.
- Requesting the generation of files (PDFs or DOCX) based on user prompts.
Once you're done working on the project, you can deactivate the virtual environment by running:
deactivate
This ensures that your global Python environment remains unaffected by project-specific dependencies.