This project implements Named Entity Recognition (NER) to extract entities from text. NER is a Natural Language Processing (NLP) task that locates spans of text and classifies them into categories such as people, organizations, locations, and dates.
- Preprocessing of raw text data.
- NER using pretrained models (spaCy or Hugging Face Transformers).
- Support for custom entity types.
- Evaluation metrics for model performance.
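As an illustration of the evaluation feature, here is a minimal sketch of exact-match, entity-level precision, recall, and F1. It is not taken from this repository: the function name `entity_prf` and the `(text, label)` tuple representation are assumptions made for the example, and the project's own metrics may be computed differently.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: each entity is a (text, label) pair.
Entity = Tuple[str, str]


def entity_prf(gold: Set[Entity], predicted: Set[Entity]) -> Dict[str, float]:
    """Exact-match, entity-level precision, recall, and F1."""
    true_positives = len(gold & predicted)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative data only, not real model output.
gold = {
    ("Barack Obama", "PERSON"),
    ("Hawaii", "LOCATION"),
    ("United States", "LOCATION"),
}
predicted = {
    ("Barack Obama", "PERSON"),
    ("Hawaii", "LOCATION"),
    ("44th", "ORDINAL"),
}
scores = entity_prf(gold, predicted)
print(scores)
```

Exact matching is strict: an entity counts as correct only if both its text and its label agree with the reference annotation.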
- Clone the repository:
git clone https://github.com/your-username/ner-paragraph.git
- Navigate to the project directory:
cd ner-paragraph
- Install dependencies:
pip install -r requirements.txt
- Prepare your input text file.
- Run the NER script:
python ner.py --input input.txt --output output.json
- View the extracted entities in the output file.
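To inspect the results programmatically, something like the following sketch may help. It assumes the output file is a JSON object that maps each entity label to a list of strings; the helper names `load_entities` and `format_entities` are hypothetical and not part of `ner.py`.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_entities(path: str) -> Dict[str, List[str]]:
    """Load NER output, assumed to be a JSON object of label -> list of strings."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping labels to entity lists")
    return data


def format_entities(entities: Dict[str, List[str]]) -> str:
    """Render one 'LABEL: entity, entity' line per label, sorted by label."""
    lines = [f"{label}: {', '.join(entities[label])}" for label in sorted(entities)]
    return "\n".join(lines)


# Only print if the output file from the previous step is present.
if Path("output.json").exists():
    print(format_entities(load_entities("output.json")))
```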
Input:
Barack Obama was born in Hawaii and served as the 44th President of the United States.
Output:
{
"PERSON": ["Barack Obama"],
"LOCATION": ["Hawaii", "United States"],
"ORDINAL": ["44th"]
}
- Python
- spaCy / Hugging Face Transformers
- NLTK / Custom Preprocessing
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or suggestions, feel free to reach out:
- Email: tuyenvt455@gmail.com
- GitHub: your-username