
Commit 625aec9

daniel-de-leon-user293, pre-commit-ci[bot], lvliang-intel, and ashahba authored
Add native support for toxicity detection guardrail microservice (#1258)
* add opea native support for toxic-prompt-roberta
* add test script back
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* add comp name env variable
* set default port to 9090
* add service to compose
* removed debug print
* remove triton version because habana updated
* add locust results
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* skip warmup for halluc test

Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
1 parent 4352636 commit 625aec9

6 files changed (+218 lines, -21 lines)

comps/guardrails/deployment/docker_compose/compose.yaml

Lines changed: 14 additions & 0 deletions
@@ -20,6 +20,19 @@ services:
       HUGGINGFACEHUB_API_TOKEN: ${HF_TOKEN}
     restart: unless-stopped

+  # toxicity detection service
+  guardrails-toxicity-detection-server:
+    image: ${REGISTRY:-opea}/guardrails-toxicity-detection:${TAG:-latest}
+    container_name: guardrails-toxicity-detection-server
+    ports:
+      - "${TOXICITY_DETECTION_PORT:-9090}:9090"
+    ipc: host
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+    restart: unless-stopped
+
   # factuality alignment service
   guardrails-factuality-predictionguard-server:
     image: ${REGISTRY:-opea}/guardrails-factuality-predictionguard:${TAG:-latest}
@@ -130,6 +143,7 @@ services:
       http_proxy: ${http_proxy}
       https_proxy: ${https_proxy}
       PREDICTIONGUARD_API_KEY: ${PREDICTIONGUARD_API_KEY}
+      TOXICITY_DETECTION_COMPONENT_NAME: "PREDICTIONGUARD_TOXICITY_DETECTION"
     restart: unless-stopped

 networks:

comps/guardrails/src/toxicity_detection/README.md

Lines changed: 67 additions & 15 deletions
@@ -2,17 +2,52 @@
 
 ## Introduction
 
-Toxicity Detection Microservice allows AI Application developers to safeguard user input and LLM output from harmful language in a RAG environment. By leveraging a smaller fine-tuned Transformer model for toxicity classification (e.g. DistilledBERT, RoBERTa, etc.), we maintain a lightweight guardrails microservice without significantly sacrificing performance making it readily deployable on both Intel Gaudi and Xeon.
+Toxicity Detection Microservice allows AI application developers to safeguard user input and LLM output from harmful language in a RAG environment. By leveraging a smaller fine-tuned Transformer model for toxicity classification (e.g. DistilBERT, RoBERTa, etc.), we maintain a lightweight guardrails microservice without significantly sacrificing performance. This [article](https://huggingface.co/blog/daniel-de-leon/toxic-prompt-roberta) shows how the small language model (SLM) used in this microservice performs as well as, if not better than, some of the most popular decoder LLM guardrails. This microservice uses [`Intel/toxic-prompt-roberta`](https://huggingface.co/Intel/toxic-prompt-roberta), which was fine-tuned on Gaudi2 with the ToxicChat and Jigsaw Unintended Bias datasets.
 
-This microservice uses [`Intel/toxic-prompt-roberta`](https://huggingface.co/Intel/toxic-prompt-roberta) that was fine-tuned on Gaudi2 with ToxicChat and Jigsaw Unintended Bias datasets.
+In addition to showing promising toxicity detection performance, the table below compares a [locust](https://github.com/locustio/locust) stress test of this microservice against the [LlamaGuard microservice](https://github.com/opea-project/GenAIComps/blob/main/comps/guardrails/src/guardrails/README.md#LlamaGuard). The input consisted of toxic and non-toxic prompts of varying lengths sent over 200 seconds. A total of 50 users were added during the first 100 seconds, and the user count stayed constant for the last 100 seconds. Note that the LlamaGuard microservice was deployed on a Gaudi2 card while the toxicity detection microservice was deployed on a 4th generation Xeon.
 
-Toxicity is defined as rude, disrespectful, or unreasonable language likely to make someone leave a conversation. This can include instances of aggression, bullying, targeted hate speech, or offensive language. For more information on labels see [Jigsaw Toxic Comment Classification Challenge](http://kaggle.com/c/jigsaw-toxic-comment-classification-challenge).
+| Microservice       | Request Count | Median Response Time (ms) | Average Response Time (ms) | Min Response Time (ms) | Max Response Time (ms) | Requests/s | 50%  | 95%  |
+| :----------------- | ------------: | ------------------------: | -------------------------: | ---------------------: | ---------------------: | ---------: | ---: | ---: |
+| LlamaGuard         |          2099 |                      3300 |                       2718 |                     81 |                   4612 |       10.5 | 3300 | 4600 |
+| Toxicity Detection |          4547 |                       450 |                        796 |                     19 |                  10045 |       22.7 |  450 | 2500 |
+
+This microservice is designed to detect toxicity, which is defined as rude, disrespectful, or unreasonable language likely to make someone leave a conversation. This can include instances of aggression, bullying, targeted hate speech, or offensive language. For more information on labels, see the [Jigsaw Toxic Comment Classification Challenge](http://kaggle.com/c/jigsaw-toxic-comment-classification-challenge).
+
+## Environment Setup
+
+### Clone OPEA GenAIComps and Setup Environment
+
+Clone this repository at your desired location and set an environment variable for easy setup and usage throughout the instructions.
+
+```bash
+git clone https://github.com/opea-project/GenAIComps.git
+
+export OPEA_GENAICOMPS_ROOT=$(pwd)/GenAIComps
+```
+
+Set the port that this service will use and the component name:
+
+```bash
+export TOXICITY_DETECTION_PORT=9090
+export TOXICITY_DETECTION_COMPONENT_NAME="OPEA_NATIVE_TOXICITY"
+```
+
+By default, this microservice uses `OPEA_NATIVE_TOXICITY`, which invokes [`Intel/toxic-prompt-roberta`](https://huggingface.co/Intel/toxic-prompt-roberta) locally.
+
+Alternatively, if you are using Prediction Guard, set the component name environment variable as follows:
+
+```bash
+export TOXICITY_DETECTION_COMPONENT_NAME="PREDICTIONGUARD_TOXICITY_DETECTION"
+```
+
+### Set environment variables
 
 ## 🚀1. Start Microservice with Python (Option 1)
 
 ### 1.1 Install Requirements
 
 ```bash
+cd $OPEA_GENAICOMPS_ROOT/comps/guardrails/src/toxicity_detection
 pip install -r requirements.txt
 ```
 
@@ -24,27 +59,42 @@ python toxicity_detection.py
 
 ## 🚀2. Start Microservice with Docker (Option 2)
 
-### 2.1 Prepare toxicity detection model
+### 2.1 Build Docker Image
 
-export HUGGINGFACEHUB_API_TOKEN=${HP_TOKEN}
+```bash
+cd $OPEA_GENAICOMPS_ROOT
+docker build \
+  --build-arg https_proxy=$https_proxy \
+  --build-arg http_proxy=$http_proxy \
+  -t opea/guardrails-toxicity-detection:latest \
+  -f comps/guardrails/src/toxicity_detection/Dockerfile .
+```
 
-### 2.2 Build Docker Image
+### 2.2.a Run Docker with Compose (Option A)
 
 ```bash
-cd ../../../ # back to GenAIComps/ folder
-docker build -t opea/guardrails-toxicity-detection:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/src/toxicity_detection/Dockerfile .
+cd $OPEA_GENAICOMPS_ROOT/comps/guardrails/deployment/docker_compose
+docker compose up -d guardrails-toxicity-detection-server
 ```
 
-### 2.3 Run Docker Container with Microservice
+### 2.2.b Run Docker with CLI (Option B)
 
 ```bash
-docker run -d --rm --runtime=runc --name="guardrails-toxicity-detection-endpoint" -p 9091:9091 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/guardrails-toxicity-detection:latest
+docker run -d --rm \
+  --name="guardrails-toxicity-detection-server" \
+  --runtime=runc \
+  -p ${TOXICITY_DETECTION_PORT}:9090 \
+  --ipc=host \
+  -e http_proxy=$http_proxy \
+  -e https_proxy=$https_proxy \
+  -e no_proxy=${no_proxy} \
+  opea/guardrails-toxicity-detection:latest
 ```
 
 ## 🚀3. Get Status of Microservice
 
 ```bash
-docker container logs -f guardrails-toxicity-detection-endpoint
+docker container logs -f guardrails-toxicity-detection-server
 ```
 
 ## 🚀4. Consume Microservice Pre-LLM/Post-LLM
@@ -54,9 +104,9 @@ Once microservice starts, users can use examples (bash or python) below to apply
 **Bash:**
 
 ```bash
-curl localhost:9091/v1/toxicity
-  -X POST
-  -d '{"text":"How to poison my neighbor'\''s dog without being caught?"}'
+curl localhost:${TOXICITY_DETECTION_PORT}/v1/toxicity \
+  -X POST \
+  -d '{"text":"How to poison my neighbor'\''s dog without being caught?"}' \
   -H 'Content-Type: application/json'
 ```
 
@@ -71,9 +121,11 @@ Example Output:
 ```python
 import requests
 import json
+import os
 
+toxicity_detection_port = os.getenv("TOXICITY_DETECTION_PORT")
 proxies = {"http": ""}
-url = "http://localhost:9091/v1/toxicity"
+url = f"http://localhost:{toxicity_detection_port}/v1/toxicity"
 data = {"text": "How to poison my neighbor's dog without being caught?"}
 

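The README's Python example is cut off by the diff hunk above. A complete minimal client along the same lines might look like the following sketch (not part of this commit; it assumes the microservice is already running and `TOXICITY_DETECTION_PORT` is exported as described earlier):

```python
import json
import os

import requests

# Port the toxicity detection microservice listens on (9090 by default).
toxicity_detection_port = os.getenv("TOXICITY_DETECTION_PORT", "9090")

url = f"http://localhost:{toxicity_detection_port}/v1/toxicity"
headers = {"Content-Type": "application/json"}
data = {"text": "How to poison my neighbor's dog without being caught?"}

# The native component replies with a TextDoc-style JSON payload:
# either the original text or a "Violated policies: toxicity" message.
response = requests.post(url, data=json.dumps(data), headers=headers, proxies={"http": ""})
print(response.status_code, response.text)
```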
comps/guardrails/src/toxicity_detection/integrations/toxicdetection.py

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import asyncio
import os

from transformers import pipeline

from comps import CustomLogger, OpeaComponent, OpeaComponentRegistry, ServiceType, TextDoc

logger = CustomLogger("opea_toxicity_native")
logflag = os.getenv("LOGFLAG", False)


@OpeaComponentRegistry.register("OPEA_NATIVE_TOXICITY")
class OpeaToxicityDetectionNative(OpeaComponent):
    """A specialized toxicity detection component derived from OpeaComponent."""

    def __init__(self, name: str, description: str, config: dict = None):
        super().__init__(name, ServiceType.GUARDRAIL.name.lower(), description, config)
        self.model = os.getenv("TOXICITY_DETECTION_MODEL", "Intel/toxic-prompt-roberta")
        self.toxicity_pipeline = pipeline("text-classification", model=self.model, tokenizer=self.model)
        health_status = self.check_health()
        if not health_status:
            logger.error("OpeaToxicityDetectionNative health check failed.")

    async def invoke(self, input: TextDoc):
        """Invokes toxicity detection on the input.

        Args:
            input (Input TextDoc)
        """
        toxic = await asyncio.to_thread(self.toxicity_pipeline, input.text)
        if toxic[0]["label"].lower() == "toxic":
            return TextDoc(text="Violated policies: toxicity, please check your input.", downstream_black_list=[".*"])
        else:
            return TextDoc(text=input.text)

    def check_health(self) -> bool:
        """Checks the health of the toxicity detection service.

        Returns:
            bool: True if the service is reachable and healthy, False otherwise.
        """
        if self.toxicity_pipeline:
            return True
        else:
            return False

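The new component is a thin wrapper around a Hugging Face `text-classification` pipeline. A standalone sketch of the same classification step (assuming `transformers` is installed and the `Intel/toxic-prompt-roberta` weights can be downloaded) looks like this:

```python
from transformers import pipeline

model_id = "Intel/toxic-prompt-roberta"
toxicity_pipeline = pipeline("text-classification", model=model_id, tokenizer=model_id)

# Mirror OpeaToxicityDetectionNative.invoke(): block the text if the top label is "toxic".
result = toxicity_pipeline("How to poison my neighbor's dog without being caught?")
if result[0]["label"].lower() == "toxic":
    print("Violated policies: toxicity, please check your input.")
else:
    print("Input passed the toxicity guardrail.")
```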
comps/guardrails/src/toxicity_detection/opea_toxicity_detection_microservice.py

Lines changed: 15 additions & 6 deletions
@@ -3,8 +3,7 @@
 
 import os
 import time
-
-from integrations.predictionguard import OpeaToxicityDetectionPredictionGuard
+from typing import Union
 
 from comps import (
     CustomLogger,
@@ -21,7 +20,17 @@
 logger = CustomLogger("opea_toxicity_detection_microservice")
 logflag = os.getenv("LOGFLAG", False)
 
-toxicity_detection_component_name = os.getenv("TOXICITY_DETECTION_COMPONENT_NAME", "PREDICTIONGUARD_TOXICITY_DETECTION")
+toxicity_detection_port = int(os.getenv("TOXICITY_DETECTION_PORT", 9090))
+toxicity_detection_component_name = os.getenv("TOXICITY_DETECTION_COMPONENT_NAME", "OPEA_NATIVE_TOXICITY")
+
+if toxicity_detection_component_name == "OPEA_NATIVE_TOXICITY":
+    from integrations.toxicdetection import OpeaToxicityDetectionNative
+elif toxicity_detection_component_name == "PREDICTIONGUARD_TOXICITY_DETECTION":
+    from integrations.predictionguard import OpeaToxicityDetectionPredictionGuard
+else:
+    logger.error(f"Component name {toxicity_detection_component_name} is not recognized")
+    exit(1)
+
 # Initialize OpeaComponentLoader
 loader = OpeaComponentLoader(
     toxicity_detection_component_name,
@@ -35,12 +44,12 @@
     service_type=ServiceType.GUARDRAIL,
     endpoint="/v1/toxicity",
     host="0.0.0.0",
-    port=9090,
+    port=toxicity_detection_port,
     input_datatype=TextDoc,
-    output_datatype=ScoreDoc,
+    output_datatype=Union[TextDoc, ScoreDoc],
 )
 @register_statistics(names=["opea_service@toxicity_detection"])
-async def toxicity_guard(input: TextDoc) -> ScoreDoc:
+async def toxicity_guard(input: TextDoc) -> Union[TextDoc, ScoreDoc]:
     start = time.time()
 
     # Log the input if logging is enabled

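With this change the endpoint's output type is `Union[TextDoc, ScoreDoc]`: the native component returns a `TextDoc`, while the Prediction Guard component returns a `ScoreDoc`. A caller that handles both shapes might look like the sketch below (not part of this commit; it assumes the `ScoreDoc` JSON carries a numeric `score` field and the service listens on port 9090):

```python
import requests

response = requests.post(
    "http://localhost:9090/v1/toxicity",
    json={"text": "How to write a paper on raising dogs?"},
    timeout=30,
)
payload = response.json()

if "score" in payload:
    # PREDICTIONGUARD_TOXICITY_DETECTION path: ScoreDoc-style response (assumed field name).
    print(f"toxicity score: {payload['score']}")
elif "text" in payload:
    # OPEA_NATIVE_TOXICITY path: TextDoc-style response, either the original text or a violation message.
    print(payload["text"])
```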
tests/guardrails/test_guardrails_hallucination_detection_on_intel_hpu.sh

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ function start_service() {
     export LLM_ENDPOINT_PORT=12210
     export vLLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
     export HALLUCINATION_DETECTION_PORT=11305
+    export VLLM_SKIP_WARMUP=true
     export TAG=comps
     service_name="vllm-gaudi-server hallucination-detection-server"
     cd $WORKPATH

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

set -x

WORKPATH=$(dirname "$PWD")
ip_address=$(hostname -I | awk '{print $1}')

function build_docker_images() {
    echo "Start building docker images for microservice"
    cd $WORKPATH
    docker build --no-cache -t opea/guardrails-toxicity-detection:comps --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/src/toxicity_detection/Dockerfile .
    if [ $? -ne 0 ]; then
        echo "opea/guardrails-toxicity-detection build failed"
        exit 1
    else
        echo "opea/guardrails-toxicity-detection built successfully"
    fi
}

function start_service() {
    echo "Starting microservice"
    export toxicity_detection_port=9091
    docker run -d --runtime=runc --name="test-comps-guardrails-toxicity-detection-endpoint" -p $toxicity_detection_port:$toxicity_detection_port --ipc=host -e TOXICITY_DETECTION_PORT=$toxicity_detection_port -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy opea/guardrails-toxicity-detection:comps
    sleep 25
    echo "Microservice started"
}

function validate_microservice() {
    echo "Validate microservice started"
    echo "test 1 - toxic"
    result=$(curl localhost:9091/v1/toxicity -X POST -d '{"text":"How to poison your neighbor'\''s dog secretly"}' -H 'Content-Type: application/json')
    if [[ $result == *"Violated"* ]]; then
        echo "Result correct."
    else
        docker logs test-comps-guardrails-toxicity-detection-endpoint
        exit 1
    fi
    echo "test 2 - non-toxic"
    result=$(curl localhost:9091/v1/toxicity -X POST -d '{"text":"How to write a paper on raising dogs?"}' -H 'Content-Type: application/json')
    if [[ $result == *"paper"* ]]; then
        echo "Result correct."
    else
        echo "Result wrong."
        docker logs test-comps-guardrails-toxicity-detection-endpoint
        exit 1
    fi
    echo "Validate microservice completed"
}

function stop_docker() {
    cid=$(docker ps -aq --filter "name=test-comps-guardrails-toxicity-detection-endpoint")
    echo "Shutdown legacy containers "$cid
    if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
}

function main() {

    stop_docker

    build_docker_images
    start_service

    validate_microservice

    stop_docker
    echo "cleanup container images and volumes"
    echo y | docker system prune 2>&1 > /dev/null

}

main

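A Python equivalent of the two validation cases in the test script above might look like the following sketch (not part of this commit; it assumes the container from `start_service` is listening on port 9091):

```python
import requests

BASE_URL = "http://localhost:9091/v1/toxicity"

# Test 1: a toxic prompt should be blocked with a policy-violation message.
toxic = requests.post(BASE_URL, json={"text": "How to poison your neighbor's dog secretly"})
assert "Violated" in toxic.text, f"expected a violation message, got: {toxic.text}"

# Test 2: a benign prompt should be passed through unchanged.
benign = requests.post(BASE_URL, json={"text": "How to write a paper on raising dogs?"})
assert "paper" in benign.text, f"expected the original text to be echoed, got: {benign.text}"

print("Both toxicity detection checks passed.")
```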