IShowSpeed Twitter Sentiment Analysis

📌 Project Overview

This project presents an end-to-end Twitter sentiment analysis pipeline built around public reactions to IShowSpeed’s African tour.

The pipeline covers:

Large-scale Twitter data collection using an unofficial scraping approach
Data cleaning and preprocessing
Sentiment analysis using VADER and a pretrained RoBERTa transformer
Comparative evaluation of both models
Temporal sentiment trend analysis

🎯 Objectives

Collect English-language tweets related to the African tour
Design a resilient scraping workflow capable of running for extended periods
Compare lexicon-based and transformer-based sentiment models
Analyze how public sentiment evolved over time
Visualize sentiment distributions and trends

🔎 Data Collection & Scraping

Tweets were collected with an unofficial scraping method running on a Linux virtual machine, which allowed collection to run continuously across multiple days.

The following query was used:

("ishowspeed" OR "iShowSpeed")
-is:retweet
-filter:replies
lang:en
since:2026-01-07
until:2026-01-28

Scraping Characteristics

English-language tweets only
Retweets and replies excluded
Date range aligned with the tour timeline
Fault-tolerant execution to handle rate limiting and connection interruptions

Scraping was intentionally decoupled from analysis to allow reliable data acquisition over extended periods.
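
The fault-tolerant execution described above can be sketched as a retry loop with exponential backoff. This is only an illustration under stated assumptions, not the code in scraper.py: `with_backoff`, `RateLimited`, and the delay values are hypothetical names and numbers chosen for the example.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised when the site throttles requests."""


def with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=300.0,
                 sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except (RateLimited, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the last error
            # Double the wait on each attempt, cap it, then add up to 10% jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

A scraper would wrap each page request in `with_backoff(lambda: fetch_page(cursor))`, where `fetch_page` stands in for whatever function performs the request. The `sleep` parameter exists so the waiting can be replaced in tests.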

📦 Dataset

Collected fields include:

Tweet text
Timestamp
Language
Engagement metrics (likes, retweets, views, etc.)
Basic metadata required for analysis

All personally identifiable information (PII) was removed prior to publication.
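
As a rough illustration of the PII removal step, the sketch below masks @handles and links in the tweet text and drops identifier columns. The project's actual rules are not documented here, so the field names in `IDENTIFIER_FIELDS` and the `text` key are assumptions; the `@user` and `http` placeholders follow the preprocessing convention shown on the cardiffnlp model card.

```python
import re

MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
# Hypothetical column names; the real dataset schema is not documented here
IDENTIFIER_FIELDS = {"user_id", "username", "display_name"}


def anonymize_text(text):
    """Replace @handles and links with neutral placeholders."""
    text = MENTION.sub("@user", text)
    return URL.sub("http", text)


def anonymize_row(row):
    """Drop identifier fields and mask handles inside the tweet text."""
    clean = {k: v for k, v in row.items() if k not in IDENTIFIER_FIELDS}
    clean["text"] = anonymize_text(row["text"])
    return clean
```

Masking handles this way also keeps the text in the form the RoBERTa model saw during training, so the same function can double as a preprocessing step.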

🧠 Sentiment Analysis Models

1️⃣ VADER (Lexicon-Based)

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analyzer tuned for social media text.

Advantages

Fast and lightweight
Interpretable scoring

Limitations

Limited contextual understanding
Struggles with sarcasm and complex language
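
A minimal sketch of three-way labeling with VADER follows. It assumes the `vaderSentiment` package and the ±0.05 compound-score cutoffs suggested in the VADER documentation; the notebook may use a different package (for example NLTK's copy of VADER) or different thresholds. The function names are illustrative.

```python
from functools import lru_cache


def label_from_compound(compound, threshold=0.05):
    """Map VADER's compound score (-1 to 1) to a three-way label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


@lru_cache(maxsize=1)
def _analyzer():
    # Imported lazily so the thresholding logic works without the package;
    # install with `pip install vaderSentiment`
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def vader_label(text):
    """Score one tweet with VADER and return its sentiment label."""
    scores = _analyzer().polarity_scores(text)
    return label_from_compound(scores["compound"])
```

Calling `vader_label(tweet_text)` on each row yields labels in the same three classes as the RoBERTa model, which makes the two sets of predictions directly comparable.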

2️⃣ RoBERTa (Transformer-Based)

Sentiment classification was performed using the pretrained model:

cardiffnlp/twitter-roberta-base-sentiment

Built on RoBERTa, this model:

Leverages self-attention for contextual understanding
Was pretrained on roughly 58 million tweets and fine-tuned for sentiment analysis on the TweetEval benchmark
Classifies sentiment as negative, neutral, or positive

Inference was executed using batched GPU processing with checkpointing, allowing the process to resume seamlessly after interruptions.
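
The batched, resumable inference described above might look like the sketch below. It is an assumption-laden outline rather than the notebook's code: `run_with_checkpoint`, `make_roberta_classifier`, the JSON checkpoint format, the batch size, and `max_length=128` are all illustrative choices. The label order (0 negative, 1 neutral, 2 positive) is the one given on the model card.

```python
import json
import os


def run_with_checkpoint(texts, classify_batch, checkpoint_path, batch_size=32):
    """Label texts batch by batch, saving progress after every batch."""
    labels = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            labels = json.load(f)  # resume from the last saved batch
    for start in range(len(labels), len(texts), batch_size):
        labels.extend(classify_batch(texts[start:start + batch_size]))
        # Write to a temp file first so a crash cannot corrupt the checkpoint
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(labels, f)
        os.replace(tmp_path, checkpoint_path)
    return labels


def make_roberta_classifier(model_name="cardiffnlp/twitter-roberta-base-sentiment"):
    """Build a batch classifier; needs `pip install torch transformers`."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model = model.to(device).eval()
    id2label = ["negative", "neutral", "positive"]  # order from the model card

    def classify_batch(batch):
        inputs = tokenizer(batch, padding=True, truncation=True,
                           max_length=128, return_tensors="pt").to(device)
        with torch.no_grad():  # inference only, no gradients needed
            logits = model(**inputs).logits
        return [id2label[i] for i in logits.argmax(dim=-1).tolist()]

    return classify_batch
```

A run would then be `run_with_checkpoint(texts, make_roberta_classifier(), "roberta_labels.json")`, with `texts` as a list of strings. If the process dies partway through, running the same call again picks up at the first unlabeled tweet.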

📊 Analysis & Visualization

The following analyses were conducted:

Sentiment distribution comparison between VADER and RoBERTa
Daily sentiment trends over the tour period
Cross-model comparison of classification behavior
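
One simple way to carry out the cross-model comparison is to count how often the two models agree and tabulate every (VADER, RoBERTa) label pair. The sketch below uses only the standard library; `compare_labels` is an illustrative name, and the notebook may compute this differently.

```python
from collections import Counter


def compare_labels(vader_labels, roberta_labels):
    """Return the agreement rate and a count of (vader, roberta) label pairs."""
    if len(vader_labels) != len(roberta_labels):
        raise ValueError("both models must label the same tweets")
    pairs = Counter(zip(vader_labels, roberta_labels))
    agree = sum(n for (v, r), n in pairs.items() if v == r)
    total = len(vader_labels)
    return (agree / total if total else 0.0), pairs
```

The pair counts act as a confusion table between the models. A large `("neutral", "negative")` cell, for instance, would point to tweets whose negativity VADER's lexicon misses.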

Visual outputs include:

Pie charts for sentiment distribution
Line charts for temporal sentiment evolution
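
The data behind a temporal line chart can be produced by grouping labeled tweets by calendar day and computing each label's share. This sketch assumes ISO 8601 timestamps and `(timestamp, label)` pairs as input, neither of which is confirmed by the dataset description; `daily_sentiment_share` is an illustrative name.

```python
from collections import Counter, defaultdict
from datetime import datetime


def daily_sentiment_share(rows):
    """Map each calendar day to the share of tweets per sentiment label."""
    counts = defaultdict(Counter)
    for timestamp, label in rows:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[day][label] += 1
    # Sort by day so the result can be plotted left to right
    return {
        day: {label: n / sum(c.values()) for label, n in c.items()}
        for day, c in sorted(counts.items())
    }
```

Plotting one line per label from this result shows how the mix of sentiment shifted across the tour. Using shares rather than raw counts keeps days with very different tweet volumes comparable.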

⚙️ How to Run the Project

1️⃣ Install Dependencies

pip install -r requirement.txt

2️⃣ Run the Scraper

python scraper.py

Output:

data/raw_tweets.csv

3️⃣ Run the Analysis

The analysis is in sentiment_analysis.ipynb. Open it in Jupyter and run the cells in order.

⚖️ Ethical Considerations

Only publicly available tweets were collected
All direct and indirect personal identifiers were removed
Analysis focuses on aggregated sentiment patterns
This project is intended for educational and analytical purposes only

🚀 Future Work

Domain-specific fine-tuning of RoBERTa
Topic modeling alongside sentiment
Country-level sentiment segmentation
Engagement-weighted sentiment analysis

🧾 Key Takeaway

This project demonstrates a robust, end-to-end NLP pipeline combining data engineering, classical NLP, and modern transformer-based modeling to analyze real-world social media discourse.

