Welcome to the repository for our group project in the PMLDL course at Innopolis University. The project is based on a Kaggle competition whose goal is to build an NLP model that classifies Twitter messages sent during emergencies.
Twitter has become a critical platform for real-time communication during emergencies. Many organizations, such as disaster relief agencies and news media, are interested in automatically monitoring and classifying these tweets to respond quickly and effectively.
In this project, we aim to develop a machine learning model that can classify whether a tweet is related to a disaster or not. We are using the Kaggle dataset provided for this competition.
The dataset contains Twitter messages labeled as either "disaster-related" or "non-disaster-related". Our goal is to train a model on this labeled data that predicts the label of new, unseen tweets. The dataset is relatively small, which keeps training and experimentation fast for this kind of binary classification task.
- Dataset source: Kaggle - NLP with Disaster Tweets
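To get a feel for the class balance, the sketch below counts tweets per label. It assumes the Kaggle train.csv layout (columns id, keyword, location, text, target, with target = 1 for disaster-related tweets); label_counts is a hypothetical helper, not code from this repository.

```python
from __future__ import annotations

import csv
from collections import Counter
from pathlib import Path


def label_counts(csv_path: str | Path) -> Counter:
    """Count tweets per label in a Kaggle-style train.csv.

    Hypothetical helper. Assumes a `target` column where
    1 = disaster-related and 0 = not disaster-related.
    """
    counts: Counter = Counter()
    # newline="" lets the csv module handle quoted fields with embedded newlines
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[int(row["target"])] += 1
    return counts
```

For example, once the dataset is in place, label_counts("data/raw/train.csv") returns a Counter mapping each label (0 or 1) to its number of tweets.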
- Clone the repository:
  $ git clone https://github.com/IVproger/PMDL-DisasterTweets.git
  $ cd PMDL-DisasterTweets
- Install the dependencies (requires Python 3.11 and Poetry 1.8.3):
  - Configure Poetry to create the .venv inside the project folder:
    $ poetry config virtualenvs.in-project true
  - Create the .venv with Python 3.11 (make sure it is installed):
    $ poetry env use python3.11
  - Install the dependencies:
    $ poetry install
- Download the dataset from the Kaggle competition and place it in the data/raw folder:
  - Sign in to Kaggle and download the dataset from the competition page.
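Once the archive is downloaded, a small Python helper can unpack it into data/raw and confirm the expected files are present. This is only a sketch: prepare_raw_data is hypothetical, and both the archive file name and the expected file list (train.csv, test.csv) are assumptions about the Kaggle download.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Files we assume the competition archive contains (adjust if the download differs).
EXPECTED_FILES = ("train.csv", "test.csv")


def prepare_raw_data(archive: str | Path, raw_dir: str | Path = "data/raw") -> list[Path]:
    """Extract the downloaded archive into raw_dir and check its contents.

    Hypothetical helper, not part of the repository.
    Returns the sorted list of CSV files found in raw_dir.
    """
    raw = Path(raw_dir)
    raw.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(raw)
    # Fail early if the archive did not contain the files the pipeline needs
    missing = [name for name in EXPECTED_FILES if not (raw / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {raw}: {', '.join(missing)}")
    return sorted(raw.glob("*.csv"))
```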
- Preprocessing:
- Modeling:
- Evaluation:
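The preprocessing, modeling, and evaluation stages listed above are not detailed here, so the following is only an illustrative standard-library baseline, not the project's actual pipeline: a tweet cleaner (lowercasing, removing URLs and @mentions, keeping hashtag words), a multinomial Naive Bayes classifier with Laplace smoothing, and F1 on the disaster class, which is the metric the Kaggle competition uses for scoring. All names (preprocess, NaiveBayes, f1_score) are hypothetical.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(text: str) -> list[str]:
    """Hypothetical cleaner: lowercase, drop URLs and @mentions, keep hashtag words."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # "#flood" -> "flood"
    return TOKEN_RE.findall(text)


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes for labels 0/1 with Laplace smoothing.

    Assumes both labels occur at least once in the training data.
    """

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayes":
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens: list[str], label: int) -> float:
        # log P(label) + sum of log P(token | label), add-one smoothed
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text: str) -> int:
        tokens = preprocess(text)
        return max((0, 1), key=lambda y: self._log_score(tokens, y))


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for the positive (disaster) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Naive Bayes is used here only because it fits in a few lines of standard-library Python; treat it as a reference point for comparison rather than a recommended model.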
- Ivan Golov
- Alexey Shulmin
- Ilnaz Magizov
This project is licensed under the MIT License.