Anzywiz/ishowspeed-twitter-sentiment

IShowSpeed Twitter Sentiment Analysis

📌 Project Overview

This project presents an end-to-end Twitter sentiment analysis pipeline built around public reactions to IShowSpeed’s African tour.

The pipeline covers:

  • Large-scale Twitter data collection using an unofficial scraping approach
  • Data cleaning and preprocessing
  • Sentiment analysis using VADER and a pretrained RoBERTa transformer
  • Comparative evaluation of both models
  • Temporal sentiment trend analysis

🎯 Objectives

  • Collect English-language tweets related to the African tour
  • Design a resilient scraping workflow capable of running for extended periods
  • Compare lexicon-based and transformer-based sentiment models
  • Analyze how public sentiment evolved over time
  • Visualize sentiment distributions and trends

🔎 Data Collection & Scraping

Twitter data was collected using an unofficial scraping method executed on a Linux virtual machine, enabling long-running data collection across multiple days.

The following query was used:

("ishowspeed" OR "iShowSpeed")
-is:retweet
-filter:replies
lang:en
since:2026-01-07
until:2026-01-28

Scraping Characteristics

  • English-language tweets only
  • Retweets and replies excluded
  • Date range aligned with the tour timeline
  • Fault-tolerant execution to handle rate limiting and connection interruptions

Scraping was intentionally decoupled from analysis to allow reliable data acquisition over extended periods.
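
The fault-tolerant behaviour described above can be sketched as a retry loop with exponential backoff. This is a minimal illustration rather than the repository's actual scraper: `fetch_page`, `TransientError`, and the delay values are hypothetical and would need to match whatever the real scraping tool raises on rate limits or dropped connections.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for rate limits or dropped connections."""


def with_retries(fetch_page, max_attempts=5, base_delay=1.0, max_delay=300.0,
                 sleep=time.sleep):
    """Call fetch_page() until it succeeds or max_attempts is reached.

    The wait doubles after each failure (capped at max_delay), and a small
    random jitter is added so restarts do not hit the server in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. In a multi-day run, `max_delay` would likely be set high enough to outlast a rate-limit window.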


📦 Dataset

Collected fields include:

  • Tweet text
  • Timestamp
  • Language
  • Engagement metrics (likes, retweets, views, etc.)
  • Basic metadata required for analysis

All personally identifiable information (PII) was removed prior to publication.
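
One way to approach this step is to drop identifying fields and mask handles, links, and e-mail addresses inside the tweet text. The sketch below is an assumption about how such scrubbing could look, not the procedure used for this dataset; the field names (`user_id`, `username`, `display_name`, `text`) are hypothetical.

```python
import re

# Hypothetical identifying fields; the real dataset schema may differ.
PII_FIELDS = {"user_id", "username", "display_name"}

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
HANDLE_RE = re.compile(r"(?<!\w)@\w{1,15}")
URL_RE = re.compile(r"https?://\S+")


def scrub_record(record):
    """Return a copy of a tweet record without identifying fields, with
    e-mails, URLs, and @handles in the text replaced by placeholders."""
    clean = {k: v for k, v in record.items() if k not in PII_FIELDS}
    text = clean.get("text", "")
    # E-mails are masked first so their '@' is not mistaken for a handle.
    text = EMAIL_RE.sub("<email>", text)
    text = URL_RE.sub("<url>", text)
    text = HANDLE_RE.sub("@user", text)
    clean["text"] = text
    return clean
```

Pattern-based masking only catches identifiers that follow a recognisable format, so names written in plain text would still need a separate review.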


🧠 Sentiment Analysis Models

1️⃣ VADER (Lexicon-Based)

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analyzer tuned for social media text.

Advantages

  • Fast and lightweight
  • Interpretable scoring

Limitations

  • Limited contextual understanding
  • Struggles with sarcasm and complex language
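
VADER reports a `compound` score between -1 and 1 for each text. A minimal sketch of turning that score into a three-way label is shown below, using the ±0.05 cut-offs suggested in the VADER documentation; whether this project used the same thresholds is an assumption. The function names are illustrative.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a three-way label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def vader_labels(texts):
    """Label a list of tweets with VADER (needs `pip install vaderSentiment`)."""
    # Imported lazily so label_from_compound works without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()  # build once, reuse for every tweet
    return [
        label_from_compound(analyzer.polarity_scores(text)["compound"])
        for text in texts
    ]
```

Keeping the threshold as a parameter makes it easy to check how sensitive the neutral share is to the cut-off.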

2️⃣ RoBERTa (Transformer-Based)

Sentiment classification was performed using the pretrained model:

cardiffnlp/twitter-roberta-base-sentiment

Built on RoBERTa, this model:

  • Leverages self-attention for contextual understanding
  • Was pretrained on roughly 58 million tweets and fine-tuned for sentiment analysis on the TweetEval benchmark
  • Classifies sentiment as negative, neutral, or positive
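
The model card for this checkpoint suggests replacing user mentions with `@user` and links with `http` before tokenization, and documents the output order as negative, neutral, positive. The sketch below shows that preprocessing and the label mapping in plain Python; `predicted_label` is an illustrative helper, and how this repository wires it into inference is an assumption.

```python
# Label order documented on the model card for
# cardiffnlp/twitter-roberta-base-sentiment.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def preprocess(text):
    """Normalize a tweet as the model card suggests:
    user mentions become '@user' and links become 'http'."""
    tokens = []
    for token in text.split(" "):
        if token.startswith("@") and len(token) > 1:
            token = "@user"
        elif token.startswith("http"):
            token = "http"
        tokens.append(token)
    return " ".join(tokens)


def predicted_label(logits):
    """Pick the label with the highest score from one row of three logits."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best]
```

With the `transformers` library, the logits would typically come from `AutoModelForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment")` applied to the tokenized, preprocessed text.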

Inference was executed using batched GPU processing with checkpointing, allowing the process to resume seamlessly after interruptions.
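
A resumable batch loop of this kind can be sketched as follows. This is an illustration under stated assumptions, not the project's inference script: `predict_batch` stands in for the GPU model call, and the file layout (a JSON checkpoint holding `next_index`, plus a JSON Lines output file) is hypothetical.

```python
import json
import os


def run_with_checkpoints(texts, predict_batch, out_path, ckpt_path, batch_size=32):
    """Label texts batch by batch, saving progress after every batch.

    predict_batch is any callable mapping a list of texts to a list of labels
    (for example a wrapper around a GPU model). If ckpt_path exists, the run
    resumes from the index stored there instead of starting over.
    """
    start = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        labels = predict_batch(batch)
        # Append this batch's results before advancing the checkpoint.
        with open(out_path, "a") as out:
            for offset, label in enumerate(labels):
                out.write(json.dumps({"index": i + offset, "label": label}) + "\n")
        # Write to a temp file and rename it, so a crash cannot leave a
        # half-written checkpoint behind.
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_index": i + len(batch)}, f)
        os.replace(tmp, ckpt_path)
```

If the process dies after results are appended but before the checkpoint is updated, that batch is written twice on resume, so downstream code should deduplicate on `index`.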


📊 Analysis & Visualization

The following analyses were conducted:

  • Sentiment distribution comparison between VADER and RoBERTa
  • Daily sentiment trends over the tour period
  • Cross-model comparison of classification behavior
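
For the cross-model comparison, one simple measure is the share of tweets on which both models agree, together with a count of each (VADER, RoBERTa) label pair. The sketch below assumes both models' labels are available as equal-length lists in the same tweet order; the function name is illustrative.

```python
from collections import Counter


def compare_models(vader_labels, roberta_labels):
    """Return the agreement rate and a (vader, roberta) pair count table."""
    if len(vader_labels) != len(roberta_labels):
        raise ValueError("both label lists must cover the same tweets")
    pairs = Counter(zip(vader_labels, roberta_labels))
    total = sum(pairs.values())
    agree = sum(n for (v, r), n in pairs.items() if v == r)
    rate = agree / total if total else 0.0
    return rate, pairs
```

The pair counts work as a small confusion table: a large `("positive", "neutral")` entry, for instance, would show VADER assigning positive labels to tweets RoBERTa treats as neutral.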

Visual outputs include:

  • Pie charts for sentiment distribution
  • Line charts for temporal sentiment evolution
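
The data behind a daily trend line can be computed as the share of each label per calendar day. The sketch below assumes each row is an ISO 8601 timestamp string paired with a label; the actual column names and timestamp format in the dataset may differ.

```python
from collections import Counter, defaultdict
from datetime import datetime

LABELS = ("negative", "neutral", "positive")


def daily_shares(rows):
    """rows: iterable of (iso_timestamp, label) pairs.

    Returns {date: {label: share}} with dates in ascending order. Shares
    are fractions of that day's tweets.
    """
    counts = defaultdict(Counter)
    for timestamp, label in rows:
        day = datetime.fromisoformat(timestamp).date()
        counts[day][label] += 1

    shares = {}
    for day in sorted(counts):
        total = sum(counts[day].values())
        shares[day] = {label: counts[day][label] / total for label in LABELS}
    return shares
```

Plotting shares rather than raw counts keeps days with very different tweet volumes comparable.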

⚙️ How to Run the Project

1️⃣ Install Dependencies

pip install -r requirements.txt

2️⃣ Run the Scraper

python scraper.py

Output:

data/raw_tweets.csv

⚖️ Ethical Considerations

  • Only publicly available tweets were collected
  • All direct and indirect personal identifiers were removed
  • Analysis focuses on aggregated sentiment patterns
  • This project is intended for educational and analytical purposes only

🚀 Future Work

  • Domain-specific fine-tuning of RoBERTa
  • Topic modeling alongside sentiment
  • Country-level sentiment segmentation
  • Engagement-weighted sentiment analysis

🧾 Key Takeaway

This project demonstrates a robust, end-to-end NLP pipeline combining data engineering, classical NLP, and modern transformer-based modeling to analyze real-world social media discourse.
