
This project is part of the PMLDL subject at Innopolis University. It focuses on Natural Language Processing (NLP) using Twitter data to classify messages during emergencies. The competition aims to develop a model that can programmatically identify disaster-related tweets to assist agencies such as disaster relief organizations and news outlets.


IVproger/PMDL-DisasterTweets


NLP Disaster Tweets Classification Project

Welcome to the repository for our group project in the PMLDL subject at Innopolis University. This project is based on a Kaggle competition aimed at building an NLP model to classify Twitter messages sent during emergency situations.

Project Overview

Twitter has become a critical platform for real-time communication during emergencies. Many organizations, such as disaster relief agencies and news media, are interested in automatically monitoring and classifying these tweets to respond quickly and effectively.

In this project, we aim to develop a machine learning model that can classify whether a tweet is related to a disaster or not. We are using the Kaggle dataset provided for this competition.

Dataset

The dataset contains Twitter messages, each labeled as either "disaster-related" or "non-disaster-related". Our goal is to train a model on this labeled data and use it to classify new, unseen tweets. The dataset is relatively small, so models can be trained and compared quickly.
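To make the task concrete, here is a minimal sketch of a bag-of-words Naive Bayes baseline written with the standard library only. It is an illustration under stated assumptions, not the model used in this project: the label convention (1 = disaster-related, 0 = non-disaster-related), the function names `tokenize`, `train_naive_bayes`, and `predict`, and the toy tweets are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_naive_bayes(texts, labels):
    """Count tokens per class. Assumed labels: 1 = disaster, 0 = non-disaster."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    if class_counts[0] == 0 or class_counts[1] == 0:
        raise ValueError("training data must contain both classes")
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the higher log-posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, invented for illustration only (not taken from the Kaggle dataset).
toy_texts = [
    "Forest fire near La Ronge",
    "I love this song",
    "Flood warning issued for the county",
    "What a great day at the beach",
]
toy_labels = [1, 0, 1, 0]
model = train_naive_bayes(toy_texts, toy_labels)
```

Calling `predict(model, "flood warning")` on this toy model returns a class label for an unseen tweet. A real experiment would train on the Kaggle training file and hold out part of it for validation.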

Getting Started

  1. Clone the repository:

    git clone https://github.com/IVproger/PMDL-DisasterTweets.git
    cd PMDL-DisasterTweets
  2. Install dependencies (requires Python 3.11 and Poetry 1.8.3):

    Configure Poetry to create the .venv inside the project directory:

    $ poetry config virtualenvs.in-project true

    Create the .venv with Python 3.11 (make sure it is installed):

    $ poetry env use python3.11

    Install the dependencies:

    $ poetry install
  3. Download the dataset from the Kaggle competition and place it in the data/raw folder:

    • Sign in to Kaggle and download the dataset from the competition page.
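Once the files are in data/raw, they can be read with the standard library. The sketch below assumes the training file is named `train.csv` and has `text` and `target` columns (with `target` equal to 1 for disaster tweets). These names are assumptions about the Kaggle download, so check them against the files you actually receive. `load_tweets` is a hypothetical helper, not part of this repository.

```python
import csv
from pathlib import Path


def load_tweets(csv_path):
    """Read tweets and integer labels from a CSV file.

    Assumes columns named "text" and "target"; adjust if your file differs.
    """
    texts, labels = [], []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            texts.append(row["text"])
            labels.append(int(row["target"]))
    return texts, labels


if __name__ == "__main__":
    # Assumed location and file name; only runs if the file is present.
    train_path = Path("data/raw/train.csv")
    if train_path.exists():
        texts, labels = load_tweets(train_path)
        print(f"Loaded {len(texts)} tweets, {sum(labels)} labeled as disaster")
```

Using `csv.DictReader` with `newline=""` keeps tweets that contain commas or line breaks intact, which a plain `split(",")` would not.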

Approach and Methodology

  • Preprocessing:
  • Modeling:
  • Evaluation:
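As an illustration of the evaluation step, the sketch below computes precision, recall, and F1 for binary predictions. The choice of metric is an assumption (these are common choices for a binary classification task), not a confirmed part of this project's methodology. `precision_recall_f1` is a hypothetical helper, and label 1 is assumed to mean disaster-related.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels and predictions, for illustration only.
example_true = [1, 0, 1, 1, 0]
example_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(example_true, example_pred)
```

In this made-up example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.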

Tools

  • Kaggle Notebooks:
  • GitHub repos:
  • Libraries:

Contributors

  • Ivan Golov
  • Alexey Shulmin
  • Ilnaz Magizov

License

This project is licensed under the MIT License.
