GitHub - Rroopesh55/Sentiment_Analysis_and_LLM: Google Reviews sentiment classifier: interpretable TF-IDF + NB with EDA and an LLM (Gemini) baseline.

Sentiment Analysis of Google Reviews Using Traditional ML and an LLM

A clean, reproducible notebook project for classifying sentiment in Google reviews using both a traditional ML pipeline (TF–IDF + Multinomial Naive Bayes) and an LLM-assisted baseline (Gemini). This README documents the dataset, methodology, and environment setup, and explains how to run the notebook, evaluate the models, and run inference.

Project Overview

Downloads the Google Reviews dataset (reviews.csv) via gdown (or lets you provide your own).
Performs quick EDA on the score distribution.
Maps review score → sentiment (negative, neutral, positive).
Builds a reproducible ML pipeline: text preprocessing → TF–IDF vectorization → MultinomialNB classifier.
Offers interactive inference for ad-hoc sentiment checks.
Includes an optional LLM baseline using Google Gemini for comparison.
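
The score → sentiment step listed above can be sketched as a small function. The thresholds here (1-2 negative, 3 neutral, 4-5 positive) are an assumption for illustration, and `score_to_sentiment` is a hypothetical name; the notebook may use a different cut-off.

```python
def score_to_sentiment(score: int) -> str:
    """Map a 1-5 star score to a coarse sentiment label.

    Assumed thresholds (not taken from the notebook):
    1-2 -> negative, 3 -> neutral, 4-5 -> positive.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


# Apply the mapping to every possible star rating.
labels = [score_to_sentiment(s) for s in (1, 2, 3, 4, 5)]
print(labels)
```

With pandas, the same function could be applied to the score column via `Series.map`.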

TF–IDF + MultinomialNB is a strong baseline for short-text classification: it is fast to train, interpretable, and easy to deploy. The LLM baseline demonstrates zero-shot or in-context-learning (ICL) sentiment classification on the same inputs.

Model Details

Preprocessing: lowercasing → punctuation removal → English stopword removal → lemmatization/stemming
Vectorizer: TfidfVectorizer (consider ngram_range=(1,2), min_df, max_df)
Classifier: MultinomialNB (fast & robust for sparse text)
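
A minimal sketch of how these pieces might be wired together with scikit-learn. The toy texts and labels are invented for illustration, the parameter values are assumptions rather than the notebook's settings, and lemmatization/stemming is left out to keep the example short.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy examples made up for illustration; the real data comes from reviews.csv.
texts = [
    "great app, works perfectly",
    "love it, very useful",
    "terrible update, keeps crashing",
    "awful experience, waste of time",
    "it is okay, nothing special",
    "average app, does the job",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = Pipeline([
    # The vectorizer lowercases text, drops punctuation through its default
    # token pattern, and removes scikit-learn's built-in English stopwords.
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2))),
    ("clf", MultinomialNB()),
])
model.fit(texts, labels)

# Ad-hoc inference on unseen text.
predictions = model.predict(["crashing all the time", "very useful app"])
print(list(predictions))
```

Keeping the vectorizer and classifier in one `Pipeline` means the same preprocessing is applied at training and inference time.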

LLM Baseline (Optional)

Model: gemini-pro via google-generativeai
Prompt: instructs the model to return normalized JSON: {"sentiment":"positive|neutral|negative"}
Note: Pass API key via environment variable and avoid sending PII.
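
The following is a sketch of how such a baseline might be structured, not the notebook's actual code. The prompt wording, the helper names (`build_prompt`, `parse_sentiment`, `classify_with_gemini`), and the `GOOGLE_API_KEY` variable name are all assumptions; the model name is the one listed above and may need updating.

```python
import json
import os

ALLOWED = {"positive", "neutral", "negative"}

# Hypothetical prompt wording; doubled braces survive str.format as literal braces.
PROMPT_TEMPLATE = (
    "Classify the sentiment of the following review. "
    'Reply with JSON only, in the form {{"sentiment": "positive|neutral|negative"}}.\n'
    "Review: {review}"
)


def build_prompt(review: str) -> str:
    return PROMPT_TEMPLATE.format(review=review)


def parse_sentiment(reply: str) -> str:
    """Extract and validate the sentiment label from the model's reply."""
    # Replies may wrap the JSON in extra text, so keep only the outermost {...}.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError(f"no JSON object found in reply: {reply!r}")
    label = str(json.loads(reply[start:end + 1]).get("sentiment", "")).strip().lower()
    if label not in ALLOWED:
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return label


def classify_with_gemini(review: str) -> str:
    # Imported lazily so the helpers above work without the package installed.
    import google.generativeai as genai

    # Assumed environment variable name; the key is never hard-coded.
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(build_prompt(review))
    return parse_sentiment(response.text)
```

Validating the reply against a fixed label set keeps the LLM output directly comparable with the Naive Bayes predictions.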

Extending the Project

Better features: character n-grams, domain stopwords.
Hyperparameter search: GridSearchCV / RandomizedSearchCV.
Robust evaluation: stratified CV, calibration.
Error analysis: inspect FP/FN; word clouds per class.
Modern embeddings: sentence-transformers + linear classifier.
Ship it: FastAPI inference service + basic CI tests.
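
As one way to approach the hyperparameter-search and stratified-CV items above, here is a sketch using `GridSearchCV` with `StratifiedKFold`. The toy data, the parameter grid, and the `f1_macro` scoring choice are assumptions for illustration; real runs would use the full review set and more folds.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data invented for illustration: four reviews per class.
texts = [
    "great app works perfectly", "love this very useful tool",
    "excellent design and fast", "amazing features highly recommend",
    "terrible update keeps crashing", "awful experience waste of time",
    "horrible bugs cannot login", "worst app ever uninstalled",
    "okay app nothing special", "average tool does the job",
    "fine but could improve", "decent app some issues",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])

# Step-name prefixes route each parameter to the right pipeline stage.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [0.1, 1.0],
}

# Stratified folds keep the class balance the same in every split.
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="f1_macro")
search.fit(texts, labels)

print(search.best_params_)
```

Macro-averaged F1 weights the three classes equally, which matters when review scores are skewed toward one sentiment.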

Limitations & Ethics

The score→sentiment mapping is heuristic.
Reviews can include sarcasm, code-switching, or multilingual text.
Handle PII responsibly.

