
Google Reviews sentiment classifier: interpretable TF-IDF + NB with EDA and an LLM (Gemini) baseline.


Rroopesh55/Sentiment_Analysis_and_LLM


Sentiment Analysis of Google Reviews Using Traditional ML and an LLM

A clean, reproducible notebook project for classifying sentiment in Google reviews using both a traditional ML pipeline (TF–IDF + Multinomial Naive Bayes) and an LLM-assisted baseline (Gemini). This README documents the dataset, methodology, environment setup, and how to run, evaluate, and perform inference with the project.

Project Overview

  • Downloads the Google Reviews dataset (reviews.csv) via gdown (or lets you provide your own).
  • Performs quick EDA on score distribution.
  • Maps review score → sentiment (negative, neutral, positive).
  • Builds a reproducible ML pipeline: text preprocessing → TF–IDF vectorization → MultinomialNB classifier.
  • Offers interactive inference for ad-hoc sentiment checks.
  • Includes an optional LLM baseline using Google Gemini for comparison.
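
The score distribution check and the score → sentiment mapping listed above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the thresholds (1–2 negative, 3 neutral, 4–5 positive) and the sample scores are assumptions, since the exact mapping is not stated here.

```python
from collections import Counter


def score_to_sentiment(score: int) -> str:
    """Map a 1-5 star score to a sentiment label (thresholds are assumed)."""
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


# Hypothetical scores standing in for the score column of reviews.csv.
scores = [5, 4, 1, 3, 5, 2, 4]

# Quick EDA: how many reviews fall on each star rating.
score_counts = Counter(scores)

# Derive the sentiment labels and check the resulting class balance.
labels = [score_to_sentiment(s) for s in scores]
label_counts = Counter(labels)

print(sorted(score_counts.items()))
print(dict(label_counts))
```

Inspecting the label counts right after mapping is a cheap way to spot class imbalance before training, which matters because the neutral class is often much smaller than the other two.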

TF–IDF + MultinomialNB is a strong baseline for short-text classification: it is fast to train, interpretable, and easy to deploy. The LLM baseline demonstrates zero-shot, in-context-learning (ICL) style sentiment classification on the same inputs.

Model Details

  • Preprocessing: lowercase → punctuation removal → English stopword removal → lemmatization/stemming
  • Vectorizer: TfidfVectorizer (consider ngram_range=(1,2), min_df, max_df)
  • Classifier: MultinomialNB (fast & robust for sparse text)
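
The components above could be wired together with scikit-learn roughly as shown below. Treat it as a sketch under stated assumptions: the `preprocess` and `build_pipeline` helpers and the toy training data are hypothetical, the lemmatization/stemming step is omitted to keep the example dependency-free, and the notebook's actual parameter choices may differ.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Translation table that deletes ASCII punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase and strip punctuation (lemmatization/stemming omitted here)."""
    return text.lower().translate(_PUNCT_TABLE)


def build_pipeline() -> Pipeline:
    """TF-IDF features followed by a Multinomial Naive Bayes classifier."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(
            preprocessor=preprocess,   # replaces the default lowercasing step
            stop_words="english",      # built-in English stopword list
            ngram_range=(1, 2),        # unigrams and bigrams
        )),
        ("nb", MultinomialNB()),
    ])


# Hypothetical toy data; the real project trains on reviews.csv.
train_texts = [
    "great food and friendly staff",
    "great service loved it",
    "terrible food and rude staff",
    "awful service hated it",
    "average place nothing special",
    "okay experience fine overall",
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

model = build_pipeline()
model.fit(train_texts, train_labels)

# Ad-hoc inference on a single review.
prediction = model.predict(["Great food and great service!"])[0]
print(prediction)
```

Keeping preprocessing inside the vectorizer means the fitted pipeline can be pickled and reused for inference without re-implementing the cleaning steps elsewhere.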

LLM Baseline (Optional)

  • Model: gemini-pro via google-generativeai
  • Prompt: asks the model to return normalized JSON: {"sentiment":"positive|neutral|negative"}
  • Note: Pass API key via environment variable and avoid sending PII.
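
One way the LLM baseline might look is sketched below. The prompt wording, the helper names (`build_prompt`, `parse_sentiment`, `classify_with_gemini`), and the `GEMINI_API_KEY` environment variable name are all assumptions made for illustration; only the model name, the library, and the JSON shape come from the notes above. The API call is isolated in its own function so the prompt and parsing logic can be exercised without network access.

```python
import json
import os

ALLOWED_LABELS = {"positive", "neutral", "negative"}


def build_prompt(review: str) -> str:
    """Build a zero-shot prompt that requests a JSON-only answer (wording is assumed)."""
    return (
        "Classify the sentiment of the following Google review.\n"
        'Respond with JSON only, in the form {"sentiment": "positive|neutral|negative"}.\n\n'
        f"Review: {review}"
    )


def parse_sentiment(raw: str) -> str:
    """Extract and validate the sentiment label from a model reply."""
    text = raw.strip()
    # Replies are sometimes wrapped in Markdown fences; keep only the outermost {...}.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"no JSON object in reply: {raw!r}")
    payload = json.loads(text[start:end + 1])
    label = str(payload.get("sentiment", "")).strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return label


def classify_with_gemini(review: str) -> str:
    """Send one review to Gemini; requires google-generativeai and an API key."""
    import google.generativeai as genai  # imported lazily so the helpers above work without it

    # GEMINI_API_KEY is a hypothetical variable name; use whatever your setup defines.
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(build_prompt(review))
    return parse_sentiment(response.text)


# Parsing can be checked offline with a canned reply.
print(parse_sentiment('{"sentiment": "Positive"}'))
```

Validating the label against a fixed set keeps malformed or off-schema replies from silently entering the comparison with the Naive Bayes predictions.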

Extending the Project

  • Better features: character n-grams, domain stopwords.
  • Hyperparameter search: GridSearchCV / RandomizedSearchCV.
  • Robust evaluation: stratified CV, calibration.
  • Error analysis: inspect FP/FN; word clouds per class.
  • Modern embeddings: sentence-transformers + linear classifier.
  • Ship it: FastAPI inference service + basic CI tests.
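
As a starting point for the hyperparameter search and stratified cross-validation ideas above, a sketch along these lines may help. The parameter grid, the macro-F1 scoring choice, and the toy data are assumptions chosen to keep the example small; a real search would run on the full review set with a wider grid.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])

# Step-name prefixes ("tfidf__", "nb__") route each value to the right component.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "nb__alpha": [0.1, 1.0],
}

# Hypothetical toy data: three reviews per class so each of the 3 folds sees every class.
texts = [
    "great food", "loved the service", "excellent and friendly staff",
    "terrible food", "hated the service", "awful and rude staff",
    "average food", "okay service", "nothing special overall",
]
labels = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="f1_macro",  # treats the three classes equally regardless of size
)
search.fit(texts, labels)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Stratified folds preserve the class proportions in every split, which keeps the minority neutral class from vanishing from a validation fold.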

Limitations & Ethics

  • The score→sentiment mapping is heuristic.
  • Reviews can include sarcasm, code-switching, or multilingual text.
  • Handle PII responsibly.
