Skip to content

StudentTraineeCenter/seo-assistant

Repository files navigation

SEO Assistant (CMS Integration) 🚀

License Python Azure Functions Terraform

SEO Assistant is a Serverless REST API designed to validate content within CMS systems against modern SEO standards. It provides real-time feedback, detailed validation reports, and a scoring system to help editors optimize their articles before publication.

Note: This repository includes a "Dummy CMS" web interface (/web) solely for demonstration and testing purposes. The core product is the REST API intended for integration into full-featured CMS platforms.


🏗 Architecture

The system is designed for minimal operational costs, zero maintenance, and easy integration.

  • Backend: Serverless REST API built on Azure Functions (Python 3.10).
  • Infrastructure as Code: Fully automated infrastructure provisioning using Terraform.
  • Deployment Options:
    • Cloud: Azure (Function App + Static Web App).
    • Self-Hosted: Docker container support for on-premise usage.
  • Code Quality: Adheres to PEP 8, PEP 484 (Type Hints), and PEP 257 (Docstrings).

✨ Features & Validations

The assistant checks SEO aspects across three categories.

1. Metadata & SERP

Evaluates how the page appears in search results.

  • Title Tag: Checks length (30-60 chars) and keyword presence.
  • Meta Description: Checks length (120-155 chars) and keyword presence.
  • URL Slug: Validates format (lowercase, hyphens only, max 60 chars, no diacritics). Crucial for SERP ranking.

2. Technical Structure

Ensures the content is semantically correct and accessible.

  • Headings: Verifies single <h1> and correct hierarchy (h1 -> h2 -> h3).
  • Structured Elements: Checks for presence of lists (<ul>/<ol>) and bold text (<strong>) for snippets.
  • Alt Tags: checks if all images have alt attributes and aren't excessively long.
  • Filenames: Validates image filenames against generic patterns (e.g., IMG_1234.jpg) to ensure descriptive names.

3. Content Quality & Readability

Analyzes the text itself.

  • Sentence Length: Warns if >25% of sentences are longer than 20 words.
  • Keyword in Intro: Ensures the focus keyword appears in the first 100 words.
  • Paragraph Length: Warns on "Wall of Text" paragraphs (>700 chars).
  • Duplicate Content: Uses SimHash to compare the article against existing content hashes to prevent cannibalization.

🏆 Scoring Algorithm

The system provides a score from 0 to 100.

Calculation Logic:

  1. Raw Score (Sum of points):
    • Metadata: Title (+10), Meta Desc (+10), URL Slug (+10)
    • Structure: Headings (+10), Elements (+5), Alt Tags (+10), Filenames (+10)
    • Content: Keyword Intro (+15), Duplicates (+10), Paragraphs (+5), Sentences (+5)
  2. Penalty Coefficient:
    • If 0 FAILs (critical errors): Coefficient = 1.0
    • If ≥1 FAIL: Coefficient = 0.45

Final Score = Raw Score × Penalty Coefficient

This ensures that an article with even a single critical error (e.g., invalid URL) never scores above 45/100.


🛠 Getting Started

Prerequisites

🐳 Local Development (Docker)

You can run the full stack (API + Web) locally using Docker Compose.

docker-compose up --build
  • API: http://localhost:8080/api/seo-check
  • Web Interface: http://localhost:3000

☁️ Azure Deployment

We use a hybrid approach: Terraform for infrastructure, and CLI tools for code deployment.

1. Provision Infrastructure

cd infra
terraform init
terraform apply

Make a note of the function_app_name and static_web_app_default_hostname from the outputs.

2. Deploy Backend (Azure Functions)

# From the root directory
func azure functionapp publish <function_app_name> --python

3. Deploy Frontend (Static Web App)

You will need the deployment token from the Azure Portal (or via CLI) for the Static Web App created by Terraform.

swa deploy ./web --env production --deployment-token <YOUR_TOKEN>

🔌 API Reference

Endpoint: POST /api/seo-check

Request Body:

{
  "title": "Best Cat Food 2025",
  "meta_description": "Guide to the best cat food...",
  "url_slug": "best-cat-food-2025",
  "keyword": "cat food",
  "content_html": "<h1>Best Cat Food</h1><p>...</p>",
  "existing_article_hashes": ["0x123abc..."]
}

Response:

{
  "score": 95,
  "max_score": 100,
  "final_score": 95.0,
  "issues": [
    {
      "severity": "WARNING",
      "message": "Title is slightly short...",
      "location": "Title Tag"
    }
  ]
}

🤝 Contributing

Contributions are welcome! Please ensure your code strictly adheres to the following standards:

  1. PEP 8 (Style Guide)
  2. PEP 484 (Type Hints)
  3. PEP 257 (Docstring Conventions)

Before submitting a Pull Request, please verify your code:

  • Style & Docstrings: python -m ruff check app
  • Type Checking: python -m mypy app
  • Unit Tests: python -m unittest discover tests

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Contributors