INESCTEC/citilink-demo

CitiLink - Enhancing Municipal Transparency and Citizen Engagement through Searchable Meeting Minutes

License: CC BY-NC-ND 4.0 Python 3.10 Docker Flask React MongoDB TailwindCSS LLM: Gemini 2.0 Flash

Official repository for the ECIR 2026 demo paper: "CitiLink: Enhancing Municipal Transparency and Citizen Engagement through Searchable Meeting Minutes".

This repository contains the code and instructions to run the CitiLink platform demo locally and reproduce the data extraction pipeline.

Try the Live Demo: https://demo.citilink.inesctec.pt/en

Overview

CitiLink demonstrates how Natural Language Processing (NLP) and Information Retrieval (IR) techniques can transform unstructured municipal meeting minutes into accessible, searchable, and transparent public records.

The Problem: Municipal meeting minutes are often lengthy, unstructured documents that are difficult to navigate and search, creating barriers to transparency and civic engagement.

Our Solution: CitiLink uses an LLM-based extraction pipeline (Gemini 2.0 Flash) to automatically extract structured information from PDF meeting minutes, including metadata, discussion subjects, and voting outcomes, and presents it through a searchable web interface.

Demonstration Scope: The platform processes meeting minutes from 6 Portuguese municipalities: Alandroal, Campo Maior, Covilhã, Fundão, Guimarães, and Porto.

Project Status: The platform is under active development with a fully functional demo available online.

Live Demo

Access the platform: https://demo.citilink.inesctec.pt/en

The online demo features 120 processed meeting minutes demonstrating the full capabilities of the system, including:

  • Full-text search across all documents
  • Faceted filtering by municipality, date, topic, and participants
  • Structured visualization of meetings, subjects, and voting outcomes
  • Topic-based exploration and navigation

Architecture

CitiLink Architecture

The CitiLink architecture combines a data extraction pipeline powered by an LLM (Gemini 2.0 Flash), a Flask-based API, a React front-end web application, and a restricted back office for human-in-the-loop validation (available in the online demo).

Each set of meeting minutes is sent to the LLM with task-specific prompts that extract metadata, discussion subjects, and voting outcomes. Extracted entities are cross-referenced against predefined database collections to ensure consistency. All processed data is stored in a MongoDB Atlas instance, which provides the full-text and faceted search capabilities.
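The cross-referencing step can be illustrated with a minimal sketch. It assumes that an extracted name (a municipality or a participant, for example) is compared against a list of canonical names already stored in the database, ignoring case and accents. The function names `normalize` and `match_entity` and the 0.85 similarity cutoff are hypothetical; the actual matching logic in the pipeline may differ.

```python
import difflib
import unicodedata
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def match_entity(extracted: str, canonical: list[str], cutoff: float = 0.85) -> Optional[str]:
    """Return the canonical name closest to `extracted`, or None if nothing is close enough.

    Hypothetical helper: the cutoff value is an assumption, not a project setting.
    """
    # Map each normalized form back to its canonical spelling.
    by_key = {normalize(name): name for name in canonical}
    hits = difflib.get_close_matches(normalize(extracted), list(by_key), n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None
```

With a canonical list such as `["Guimarães", "Porto"]`, an LLM output of `"guimaraes"` would resolve to `"Guimarães"`, while an unknown name returns `None` and could be routed to the back office for human validation.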

The React-based front end allows users to explore minutes by municipality, topic, or participant, while the Flask API provides structured access to the processed data.
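As a rough illustration of how full-text and faceted search could be combined in a single MongoDB query, the sketch below builds a filter document from optional facets. The field names (`municipality`, `topics`, `participants`, `date`) are hypothetical placeholders and not the platform's real schema. It uses MongoDB's standard `$text` operator, which requires a text index; the deployed platform may rely on a different search mechanism.

```python
from datetime import datetime
from typing import Any, Optional


def build_minutes_filter(
    text: Optional[str] = None,
    municipality: Optional[str] = None,
    topic: Optional[str] = None,
    participant: Optional[str] = None,
    date_from: Optional[datetime] = None,
    date_to: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build a MongoDB filter document from optional search facets.

    All field names here are assumptions made for illustration.
    """
    query: dict[str, Any] = {}
    if text:
        # $text only works on collections that have a text index.
        query["$text"] = {"$search": text}
    if municipality:
        query["municipality"] = municipality
    if topic:
        # Equality on an array field matches when the array contains the value.
        query["topics"] = topic
    if participant:
        query["participants"] = participant
    date_range: dict[str, datetime] = {}
    if date_from:
        date_range["$gte"] = date_from
    if date_to:
        date_range["$lte"] = date_to
    if date_range:
        query["date"] = date_range
    return query
```

The resulting dictionary could be passed to a driver call such as pymongo's `collection.find(query)`. Facets that are left unset add no constraint, so an empty call returns a filter that matches every document.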

Technology Stack

  • Languages: Python, JavaScript
  • Frameworks: Flask, React, Tailwind CSS
  • Database: MongoDB Atlas
  • Infrastructure: Docker, Vite, Nginx
  • AI/ML: Google Gemini 2.0 Flash

Dataset

This repository includes 6 meeting minutes (one from each municipality: Alandroal, Campo Maior, Covilhã, Fundão, Guimarães, and Porto) for local experimentation with the processing pipeline.

The complete dataset with 120 meeting minutes, used in the online demo, is available in a separate repository: https://github.com/inesctec/citilink-dataset

Running the Demo Locally

Prerequisites

  • Docker and Docker Compose installed
  • Git for cloning the repository

Quick Start

# Clone the repository
git clone https://github.com/inesctec/citilink-demo.git
cd citilink-demo

# Navigate to platform directory
cd platform

# Start all services
docker-compose up -d

The platform will be available at:

The Docker Compose setup includes a MongoDB database with sample meeting minutes, a Flask backend API, a React frontend application, and an Nginx reverse proxy.

Stopping the Demo

# Stop all services
docker-compose down

# Stop and remove all data
docker-compose down -v

Data Extraction Pipeline

Detailed Documentation: For comprehensive instructions including database management, troubleshooting, and advanced processing options, see data_extraction/README.md.

To process additional meeting minutes or reproduce the extraction pipeline locally:

Setup

# Navigate to data extraction directory
cd data_extraction

# Install dependencies (listed in requirements.txt)
pip install -r requirements.txt

Configuration

Create a .env file with your settings:

# Google AI API Configuration
GOOGLE_API_KEY=your_google_api_key_here
MODEL_NAME=gemini-2.0-flash

# MongoDB Configuration
MONGO_URI=mongodb://localhost:27018
MONGO_DB=citilink_demo
MONGO_COLLECTION=atas

# Processing Settings
MAX_RETRIES=3
CHUNK_SIZE=20000
MAX_DOCUMENT_LENGTH=30000
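The sketch below shows one way these settings might be read and applied. It assumes the `.env` values have already been exported into the process environment. It also assumes that `MAX_DOCUMENT_LENGTH` and `CHUNK_SIZE` are character counts and that documents longer than the former are split into pieces of the latter. Both points are guesses about the pipeline's behaviour, and `Settings`, `load_settings`, and `split_document` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_name: str
    mongo_uri: str
    max_retries: int
    chunk_size: int
    max_document_length: int


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ), using the README values as defaults."""
    env = os.environ if env is None else env
    return Settings(
        model_name=env.get("MODEL_NAME", "gemini-2.0-flash"),
        mongo_uri=env.get("MONGO_URI", "mongodb://localhost:27018"),
        max_retries=int(env.get("MAX_RETRIES", "3")),
        chunk_size=int(env.get("CHUNK_SIZE", "20000")),
        max_document_length=int(env.get("MAX_DOCUMENT_LENGTH", "30000")),
    )


def split_document(text: str, settings: Settings) -> list[str]:
    """ASSUMPTION: only documents above max_document_length are split, in chunk_size steps."""
    if len(text) <= settings.max_document_length:
        return [text]
    step = settings.chunk_size
    return [text[i:i + step] for i in range(0, len(text), step)]
```

Under these assumptions, a 45,000-character document with the default values would be split into three chunks of 20,000, 20,000, and 5,000 characters, while a 25,000-character document would be sent as a single piece.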

Processing Documents

The repository includes one document from each municipality (six in total). To process documents:

# Process a specific municipality
python -m src.main --municipality Porto --years 2023

# Process multiple years
python -m src.main --municipality Porto --years 2021 2022 2023

# Limit number of documents
python -m src.main --municipality Guimarães --years 2023 --limit 5
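For readers who want to script around these commands, the following sketch mirrors the three flags shown above using `argparse`. It is not the project's actual `src.main`; in particular, treating `--municipality` and `--years` as required is an assumption, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags used in the examples above."""
    parser = argparse.ArgumentParser(
        prog="python -m src.main",
        description="Process municipal meeting minutes (illustrative sketch).",
    )
    # Assumed to be required; the real CLI may provide defaults.
    parser.add_argument("--municipality", required=True, help="Municipality name, e.g. Porto")
    # nargs="+" accepts one or more years after the flag.
    parser.add_argument("--years", nargs="+", type=int, required=True, help="One or more years")
    parser.add_argument("--limit", type=int, default=None, help="Maximum number of documents")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.municipality, args.years, args.limit)
```

Using `nargs="+"` is what lets `--years 2021 2022 2023` arrive as a single list of integers.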

Note: Gemini API rate limits may affect processing speed when handling large document batches.
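A common way to cope with rate limits is to retry failed calls with exponential backoff. The sketch below shows the general pattern, with `max_retries` playing the role of the `MAX_RETRIES` setting. It is a generic illustration rather than the pipeline's own retry logic: `call_with_retries` is a hypothetical helper, and it catches every exception for brevity, whereas real code would normally catch only the API client's rate-limit error.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to max_retries times with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            # Out of retries: re-raise the last error to the caller.
            if attempt == max_retries:
                raise
            # Delay doubles on each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

A call to the model could then be wrapped as `call_with_retries(lambda: send_request(chunk))`, where `send_request` stands for whatever function issues the API request.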

Project Structure

citilink-demo/
├── data_extraction/          # Municipal minute document processing
│   ├── src/
│   │   ├── processors/       # Document processors
│   │   ├── models/           # Database schemas
│   │   ├── prompts/          # Gemini Prompts
│   │   └── utils/            # Utilities
│   ├── scripts/              # Management scripts
│   └── data/                 # Input documents
│
└── platform/                 # Web platform
    ├── backend/              # Flask API server
    ├── frontend/             # React application
    ├── nginx/                # Reverse proxy config
    ├── mongodb/              # MongoDB database config
    └── docker-compose.yml    # Docker compose setup

License

This project is licensed under the CC BY-NC-ND 4.0 license. See the LICENSE file for details.

Acknowledgements

Development

The CitiLink platform was developed by the NLP&IR team at INESC TEC (Institute for Systems and Computer Engineering, Technology and Science).

Data Providers

We thank the municipalities of Alandroal, Campo Maior, Covilhã, Fundão, Guimarães, and Porto for providing their meeting minutes publicly and for their collaboration throughout the project.

Team

We acknowledge all team members and contributors who participated in the development, testing, and deployment of the CitiLink platform.

Funding

This work was funded within the scope of the project CitiLink, with reference 2024.07509.IACDC, which is co-funded by Component 5 - Capitalization and Business Innovation, integrated in the Resilience Dimension of the Recovery and Resilience Plan within the scope of the Recovery and Resilience Mechanism (MRR) of the European Union (EU), framed in the Next Generation EU, for the period 2021 - 2026, measure RE-C05-i08.M04 - "To support the launch of a programme of R&D projects geared towards the development and implementation of advanced cybersecurity, artificial intelligence and data science systems in public administration, as well as a scientific training programme", as part of the funding contract signed between the Recovering Portugal Mission Structure (EMRP) and the FCT - Fundação para a Ciência e a Tecnologia, I.P. (Portuguese Foundation for Science and Technology), as intermediary beneficiary.

Contact

For questions, support, or collaboration inquiries:

Email: citilink@inesctec.pt
