This project presents an end-to-end Twitter sentiment analysis pipeline built around public reactions to IShowSpeed’s African tour.
The pipeline covers:
- Large-scale Twitter data collection using an unofficial scraping approach
- Data cleaning and preprocessing
- Sentiment analysis using VADER and a pretrained RoBERTa transformer
- Comparative evaluation of both models
- Temporal sentiment trend analysis
The project objectives were to:
- Collect English-language tweets related to the African tour
- Design a resilient scraping workflow capable of running for extended periods
- Compare lexicon-based and transformer-based sentiment models
- Analyze how public sentiment evolved over time
- Visualize sentiment distributions and trends
Twitter data was collected using an unofficial scraping method executed on a Linux virtual machine, enabling long-running data collection across multiple days.
The following query was used:
("ishowspeed" OR "iShowSpeed")
-is:retweet
-filter:replies
lang:en
since:2026-01-07
until:2026-01-28
Key characteristics of the collection:
- English-language tweets only
- Retweets and replies excluded
- Date range aligned with the tour timeline
- Fault-tolerant execution to handle rate limiting and connection interruptions
Scraping was intentionally decoupled from analysis to allow reliable data acquisition over extended periods.
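The fault-tolerant execution described above can be illustrated with a retry loop that backs off exponentially. This is a minimal sketch, not the project's actual scraper: `fetch_page`, `TransientScrapeError`, and the delay parameters are hypothetical names and values chosen for illustration.

```python
import random
import time


class TransientScrapeError(Exception):
    """Hypothetical error type standing in for rate limits and dropped connections."""


def fetch_with_retry(fetch_page, cursor, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch_page(cursor), retrying transient failures with exponential backoff.

    fetch_page is an assumed callable supplied by the scraper; it is not part
    of any real library.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page(cursor)
        except TransientScrapeError:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the caller.
                raise
            # The delay doubles on each attempt; a little jitter spreads out retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting, which is convenient when a scraper is meant to run unattended for days.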
Collected fields include:
- Tweet text
- Timestamp
- Language
- Engagement metrics (likes, retweets, views, etc.)
- Basic metadata required for analysis
All personally identifiable information (PII) was removed prior to publication.
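One way the PII removal step could look is sketched below. The field names in `PII_FIELDS` and the record layout are assumptions for illustration; the actual columns in the dataset may differ. Masking mentions as `@user` follows the preprocessing convention suggested for the Cardiff NLP Twitter models.

```python
import re

# Matches @handles, but not the "@" inside an email address.
MENTION_RE = re.compile(r"(?<!\w)@\w+")

# Hypothetical identifier columns; adjust to the real schema.
PII_FIELDS = {"user_id", "username", "display_name", "profile_url"}


def anonymize(record):
    """Drop identifier fields and mask @mentions in the tweet text."""
    clean = {key: value for key, value in record.items() if key not in PII_FIELDS}
    clean["text"] = MENTION_RE.sub("@user", clean.get("text", ""))
    return clean
```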
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a rule-based, lexicon-driven sentiment analyzer optimized for social media text.
Advantages
- Fast and lightweight
- Interpretable scoring
Limitations
- Limited contextual understanding
- Struggles with sarcasm and complex language
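To compare VADER with a three-class model, its continuous compound score has to be mapped to a label. The sketch below uses the conventional ±0.05 cutoffs from the VADER documentation; whether this project used the same thresholds is an assumption.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a three-class label.

    The score itself would typically come from the vaderSentiment package:
        SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
    The 0.05 threshold is the conventional default, assumed here.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

Because the mapping is a pair of explicit cutoffs, the resulting labels stay easy to interpret, which matches the interpretability advantage listed above.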
Transformer-based sentiment classification was performed using the pretrained model:
cardiffnlp/twitter-roberta-base-sentiment
Built on RoBERTa, this model:
- Leverages self-attention for contextual understanding
- Was fine-tuned on Twitter data
- Classifies sentiment as negative, neutral, or positive
Inference was executed using batched GPU processing with checkpointing, allowing the process to resume seamlessly after interruptions.
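The checkpoint-and-resume behaviour can be sketched independently of the model. In the example below, `classify_batch` is a hypothetical callable that would wrap the GPU forward pass and return one class id per text; the JSON checkpoint format is also an assumption. The id-to-label mapping follows the model card for `cardiffnlp/twitter-roberta-base-sentiment`.

```python
import json
import os

# Class ids as documented on the model card.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def run_with_checkpoint(texts, classify_batch, checkpoint_path, batch_size=32):
    """Label texts in batches, saving progress after every batch.

    classify_batch is an assumed callable: list[str] -> list[int].
    On restart, already-labelled texts are skipped.
    """
    labels = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            labels = json.load(fh)

    # Resume from the first text that has no saved label yet.
    for start in range(len(labels), len(texts), batch_size):
        batch = texts[start:start + batch_size]
        labels.extend(ID2LABEL[i] for i in classify_batch(batch))

        # Write to a temporary file, then swap it in, so an interruption
        # mid-write cannot corrupt the existing checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(labels, fh)
        os.replace(tmp_path, checkpoint_path)

    return labels
```

Saving after every batch trades a little I/O for the guarantee that at most one batch of work is lost when the process is interrupted.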
The following analyses were conducted:
- Sentiment distribution comparison between VADER and RoBERTa
- Daily sentiment trends over the tour period
- Cross-model comparison of classification behavior
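Two of these analyses reduce to simple aggregations, sketched below with the standard library only. The input shapes (a list of `(date, label)` pairs and two aligned label lists) are assumptions for illustration; the project may well have used a dataframe library instead.

```python
from collections import Counter, defaultdict


def daily_shares(rows):
    """Return {date: {label: share}} from an iterable of (date, label) pairs."""
    counts = defaultdict(Counter)
    for day, label in rows:
        counts[day][label] += 1
    return {
        day: {label: n / sum(counter.values()) for label, n in counter.items()}
        for day, counter in sorted(counts.items())
    }


def agreement_rate(labels_a, labels_b):
    """Fraction of tweets on which two models assign the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must be aligned and equally long")
    if not labels_a:
        return 0.0
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)
```

Working with per-day shares rather than raw counts keeps days with very different tweet volumes comparable on the same chart.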
Visual outputs include:
- Pie charts for sentiment distribution
- Line charts for temporal sentiment evolution
pip install -r requirements.txt
python scraper.py
Output:
data/raw_tweets.csv
The following ethical safeguards apply:
- Only publicly available tweets were collected
- All direct and indirect personal identifiers were removed
- Analysis focuses on aggregated sentiment patterns
- This project is intended for educational and analytical purposes only
Possible future extensions include:
- Domain-specific fine-tuning of RoBERTa
- Topic modeling alongside sentiment
- Country-level sentiment segmentation
- Engagement-weighted sentiment analysis
This project demonstrates a robust, end-to-end NLP pipeline combining data engineering, classical NLP, and modern transformer-based modeling to analyze real-world social media discourse.