Arabic & Multilingual Aspect-Based Sentiment Analysis with MARBERT

A natural language processing (NLP) project for aspect-based sentiment analysis (ABSA) on Arabic and multilingual customer reviews.

The model detects which business aspects are mentioned in a review, then predicts the sentiment for each detected aspect.

Supported aspects include:

Food
Service
Price
Cleanliness
Delivery
Ambiance
App experience
General opinion

The project uses MARBERT, a BERT-style model pretrained largely on Arabic tweets, because it is well suited to Modern Standard Arabic, dialectal Arabic, and informal social-media-style text.

Project Overview

Traditional sentiment analysis usually predicts one overall label for a full review.

This project goes deeper by answering questions such as:

The customer liked the food, but did they complain about the price?
Was the delivery experience negative even though the app experience was positive?

For each review, the model returns a structured JSON output like:

{
  "review_id": "example_001",
  "aspects": ["food", "service", "price"],
  "aspect_sentiments": {
    "food": "positive",
    "service": "negative",
    "price": "negative"
  }
}

Key Features

Arabic and dialectal Arabic support
Mixed Arabic-English review handling
Egyptian Arabic and informal review examples
Aspect detection and sentiment classification
JSON output for dashboards, APIs, or competition submissions
Gradio demo interface for public presentation
Kaggle-friendly training workflow

Model Approach

The task is reframed as a classification problem over (review, aspect) pairs.

For every review, the model checks each possible aspect separately.

Example:

Review:
الخدمة ممتازة والأكل سيء

Aspect:
food

The model predicts one of the following labels:

not_mentioned
positive
negative
neutral

This allows the system to identify both:

Whether an aspect is mentioned
The sentiment toward that aspect
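
As a rough illustration of the pairwise framing above, the sketch below expands one review into one row per aspect. Only the identifiers "food", "service", and "price" appear in this README's JSON examples; the other aspect spellings, the `expand_review` helper, and the row layout are assumptions made for the example, not the notebook's actual code.

```python
from typing import Dict, List

# "food", "service", and "price" come from the README's JSON examples;
# the remaining identifier spellings are assumed for illustration.
ASPECTS: List[str] = [
    "food", "service", "price", "cleanliness",
    "delivery", "ambiance", "app_experience", "general",
]
LABELS: List[str] = ["not_mentioned", "positive", "negative", "neutral"]


def expand_review(review: str, gold: Dict[str, str]) -> List[dict]:
    """Turn one review into one (text, aspect, label) row per aspect."""
    rows = []
    for aspect in ASPECTS:
        # Aspects without a gold sentiment become "not_mentioned" rows.
        label = gold.get(aspect, "not_mentioned")
        if label not in LABELS:
            raise ValueError(f"unknown label {label!r} for aspect {aspect!r}")
        rows.append({"text": review, "aspect": aspect, "label": label})
    return rows


rows = expand_review(
    "الخدمة ممتازة والأكل سيء",
    {"service": "positive", "food": "negative"},
)
for row in rows:
    print(row["aspect"], "->", row["label"])
```

Each row could then be encoded as a sentence pair (review text plus aspect name) for the classifier. How the notebook formats the aspect input is not specified here, so treat that detail as open.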

Repository Structure

arabic-multilingual-absa-marbert/
│
├── README.md
├── requirements.txt
├── multilingual-nlp-marbert.ipynb
├── gradio_absa_demo_app.py
├── demo_examples.md
└── .gitignore

Recommended optional folders:

images/
outputs/

Do not upload large trained model files or full Kaggle datasets directly to GitHub.

Dataset

The project was developed on Kaggle using the DeepX NLP dataset.

Expected Kaggle dataset path:

DATA_DIR = "/kaggle/input/datasets/kareemmaged/deepx-nlp"

Expected files:

train_fixed(Sheet1).csv
validation_fixed(Sheet1).csv
unlabeled_fixed(Sheet1).csv
DeepX_hidden_test (4).xlsx

The training and validation files contain labeled aspect and sentiment columns.

The unlabeled and hidden test files contain only review text and metadata; the trained model generates the predictions for them.

How to Run on Kaggle

1. Open the notebook

Upload or open:

multilingual-nlp-marbert.ipynb

2. Enable Kaggle settings

In the Kaggle notebook settings:

Accelerator: GPU
Internet: On

Internet is needed for downloading the pretrained MARBERT model and for launching a public Gradio demo link.

3. Install dependencies

The notebook includes an installation cell, but you can also install dependencies manually (prefix the command with ! when running it inside a notebook cell):

pip install -r requirements.txt

4. Check dataset paths

Inside the notebook, make sure the paths match your Kaggle dataset location:

DATA_DIR = "/kaggle/input/datasets/kareemmaged/deepx-nlp"

TRAIN_FILE = f"{DATA_DIR}/train_fixed(Sheet1).csv"
TEST_FILE = f"{DATA_DIR}/validation_fixed(Sheet1).csv"

For final prediction, change TEST_FILE to:

TEST_FILE = f"{DATA_DIR}/unlabeled_fixed(Sheet1).csv"

or:

TEST_FILE = f"{DATA_DIR}/DeepX_hidden_test (4).xlsx"
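
Switching TEST_FILE by hand is easy to get wrong, so one option is a small lookup like the sketch below. The file names and DATA_DIR are the ones listed above; the split keys and the `resolve_split`, `reader_name`, and `load_table` helpers are hypothetical names introduced for this example, and the notebook may load files differently.

```python
from pathlib import Path

DATA_DIR = "/kaggle/input/datasets/kareemmaged/deepx-nlp"

# File names as listed in the Dataset section; the split keys are made up here.
SPLIT_FILES = {
    "train": "train_fixed(Sheet1).csv",
    "validation": "validation_fixed(Sheet1).csv",
    "unlabeled": "unlabeled_fixed(Sheet1).csv",
    "hidden_test": "DeepX_hidden_test (4).xlsx",
}


def resolve_split(split: str, data_dir: str = DATA_DIR) -> Path:
    """Return the path for a named split, failing early on unknown names."""
    if split not in SPLIT_FILES:
        raise KeyError(f"unknown split {split!r}; expected one of {sorted(SPLIT_FILES)}")
    return Path(data_dir) / SPLIT_FILES[split]


def reader_name(path: Path) -> str:
    """Pick the pandas reader by extension: Excel for .xlsx/.xls, CSV otherwise."""
    return "read_excel" if path.suffix.lower() in {".xlsx", ".xls"} else "read_csv"


def load_table(path: Path):
    """Load a split with pandas (imported lazily so the helpers above stay stdlib-only)."""
    import pandas as pd  # reading .xlsx also needs an Excel engine such as openpyxl

    if not path.exists():
        raise FileNotFoundError(f"{path} not found; check DATA_DIR")
    return getattr(pd, reader_name(path))(path)


TEST_FILE = resolve_split("hidden_test")
print(TEST_FILE.name, "->", reader_name(TEST_FILE))
```

The extension check matters because the hidden test file is an Excel workbook while the other three files are CSVs.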

5. Run all notebook cells

The notebook will:

Load the dataset
Preprocess review text
Convert reviews into aspect-level training rows
Fine-tune MARBERT
Evaluate on validation data
Tune the aspect detection threshold
Save the best model and prediction artifacts
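
The threshold-tuning step can be pictured as a small grid search on validation data. The sketch below is an assumption about how such tuning might look, not the notebook's implementation: it scores each (review, aspect) pair with a detection score (for example 1 - P(not_mentioned)), then keeps the threshold with the best detection F1. The function names and toy numbers are invented for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], mentioned: Sequence[bool], threshold: float) -> float:
    """F1 of the 'aspect is mentioned' decision when predicting score >= threshold."""
    tp = sum(1 for s, m in zip(scores, mentioned) if s >= threshold and m)
    fp = sum(1 for s, m in zip(scores, mentioned) if s >= threshold and not m)
    fn = sum(1 for s, m in zip(scores, mentioned) if s < threshold and m)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(
    scores: Sequence[float],
    mentioned: Sequence[bool],
    grid: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Return (best_threshold, best_f1) over a grid of candidate thresholds."""
    if grid is None:
        grid = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95
    # max() keeps the first (lowest) threshold among ties.
    best = max(grid, key=lambda t: f1_at_threshold(scores, mentioned, t))
    return best, f1_at_threshold(scores, mentioned, best)


# Toy validation data: one detection score per (review, aspect) pair.
val_scores = [0.92, 0.81, 0.42, 0.33, 0.10, 0.05]
val_mentioned = [True, True, True, False, False, False]

best_t, best_f1 = tune_threshold(val_scores, val_mentioned)
print(f"best threshold={best_t:.2f}, F1={best_f1:.3f}")
```

Tuning on the validation split rather than the training split keeps the chosen threshold from simply fitting the training labels.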

The trained model is saved to:

/kaggle/working/marbert_absa_output/best_model

The metadata/artifacts file is saved to:

/kaggle/working/marbert_absa_output/artifacts.json

Gradio Demo

The repository includes a Gradio interface:

gradio_absa_demo_app.py

The demo allows users to paste a review and see:

Detected aspects
Sentiment for each aspect
Confidence scores
Final JSON prediction
All-aspect confidence table

To run it on Kaggle after training:

python gradio_absa_demo_app.py

The app should launch using:

demo.queue().launch(share=True, debug=True)

share=True creates a temporary public link to the demo that can be used for recording or sharing.
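
To show where that launch call fits, here is a minimal wiring sketch. It is not the contents of gradio_absa_demo_app.py: `predict_review` is a hypothetical placeholder that returns an empty prediction, `build_demo` is an invented helper, and the real app exposes more outputs (confidence scores and the all-aspect table).

```python
from typing import Dict


def predict_review(text: str) -> Dict[str, object]:
    """Placeholder predictor: swap in the fine-tuned model's inference here."""
    # Hypothetical stand-in so the interface can be wired up without a model.
    return {"review_id": "demo", "aspects": ["none"], "aspect_sentiments": {}}


def build_demo():
    """Build a one-textbox Gradio interface around predict_review."""
    import gradio as gr  # imported lazily; requires the gradio package

    return gr.Interface(
        fn=predict_review,
        inputs=gr.Textbox(lines=4, label="Review"),
        outputs=gr.JSON(label="Prediction"),
        title="Arabic & Multilingual ABSA",
    )


# On Kaggle, after training, the launch line from this README would be used as:
# demo = build_demo()
# demo.queue().launch(share=True, debug=True)
```

Keeping the predictor as a plain function makes it possible to test inference separately from the interface.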

Demo Examples

Arabic Review

الخدمة كانت محترمة جدا والمكان رايق، بس الأكل وصل بارد واللحمة ناشفة والسعر مبالغ فيه

Expected behavior:

Service: positive
Ambiance: positive
Food: negative
Price: negative

Mixed Arabic and English

The app is smooth and checkout was fast, but el delivery اتأخر ساعتين والدعم مردش عليا خالص.

Expected behavior:

App experience: positive or neutral
Delivery: negative
Service: negative

Egyptian Franco-Arabic

el akl kan gamed awy bas el service slow w el bill kan aghla mn el expected.

Expected behavior:

Food: positive
Service: negative
Price: negative

Output Format

The model returns predictions in this format:

{
  "review_id": "review_001",
  "aspects": ["food", "service"],
  "aspect_sentiments": {
    "food": "positive",
    "service": "negative"
  }
}

If no aspect is detected, the output becomes:

{
  "review_id": "review_001",
  "aspects": ["none"],
  "aspect_sentiments": {}
}
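
The two output shapes above can be produced by one small function. This is a sketch under the assumption that predictions arrive as an aspect-to-sentiment mapping; `build_prediction` is a hypothetical helper name, not a function from the repository.

```python
import json
from typing import Dict


def build_prediction(review_id: str, aspect_sentiments: Dict[str, str]) -> dict:
    """Assemble the output record, using ["none"] when nothing is detected."""
    aspects = list(aspect_sentiments) or ["none"]
    return {
        "review_id": review_id,
        "aspects": aspects,
        "aspect_sentiments": dict(aspect_sentiments),
    }


detected = build_prediction("review_001", {"food": "positive", "service": "negative"})
empty = build_prediction("review_001", {})

# ensure_ascii=False keeps Arabic text readable if it is ever included.
print(json.dumps(detected, ensure_ascii=False, indent=2))
print(json.dumps(empty, ensure_ascii=False, indent=2))
```

Note that "aspects" holds the placeholder string "none" in the empty case while "aspect_sentiments" stays an empty object, so consumers should check for that placeholder before looking up sentiments.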

Notes and Limitations

The model is strongest on Arabic-script and dialectal Arabic reviews.
Franco-Arabic examples may work, but performance depends on how much similar text exists in the training data.
Very short or sarcastic reviews can be harder to classify.
The detection threshold affects how many aspects are included in the final output.
The trained model files are not included in this repository because they can be large.
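
The note about the detection threshold can be made concrete with a toy decoding step. The rule used here (keep an aspect when 1 - P(not_mentioned) clears the threshold, then take the most probable sentiment) is an assumed decoding scheme for illustration, and the probabilities are made up rather than real model output.

```python
from typing import Dict

SENTIMENTS = ("positive", "negative", "neutral")


def decode_aspects(probs: Dict[str, Dict[str, float]], threshold: float) -> Dict[str, str]:
    """Keep aspects whose mention score clears the threshold, with their top sentiment.

    `probs` maps aspect -> {label: probability} over the four labels.
    """
    kept = {}
    for aspect, dist in probs.items():
        mention_score = 1.0 - dist["not_mentioned"]
        if mention_score >= threshold:
            kept[aspect] = max(SENTIMENTS, key=lambda label: dist[label])
    return kept


# Made-up probabilities for one review, not real model output.
example = {
    "food": {"not_mentioned": 0.05, "positive": 0.05, "negative": 0.85, "neutral": 0.05},
    "service": {"not_mentioned": 0.10, "positive": 0.80, "negative": 0.05, "neutral": 0.05},
    "price": {"not_mentioned": 0.55, "positive": 0.05, "negative": 0.35, "neutral": 0.05},
}

for t in (0.3, 0.5, 0.92):
    print(t, decode_aspects(example, t))
```

Raising the threshold drops low-confidence aspects first, which trades recall for precision in the final JSON.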

Future Improvements

Possible next steps:

Add more labeled Franco-Arabic examples
Add model comparison with AraBERT or XLM-R
Deploy the Gradio demo on Hugging Face Spaces
Add explainability for aspect predictions
Add a dashboard for analyzing review trends by business or category

Author

Created by Kareem Maged as part of a multilingual NLP project focused on Arabic customer review understanding.
