
RoadXpert

RoadXpert is a real‑time PWA that runs a custom, fine‑tuned YOLO model entirely in the browser with TensorFlow.js, delivering fast on‑device detection on the GPU (WebGL) or CPU (WASM) with no server calls for inference.

Simple steps to run, build, and use the PWA.

Quick start

  • Prereqs: Node.js 18+ and npm 9+

  • Install:

    npm install
  • Dev:

    npm run dev

    Open the printed URL (usually http://localhost:5173).

Build & preview

  • Build production:
    npm run build
  • Preview production (with service worker):
    npm run preview

PWA

  • Service worker runs only on Preview/Production (not in npm run dev).
  • When a new version is available you’ll see a small in‑app prompt to Reload.
  • First load may download ML model files; they’re then cached for offline use.
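
For context, offline caching of the model files is typically handled by the service worker's runtime caching rules. Below is a minimal sketch with vite-plugin-pwa; it assumes the project uses that plugin, and the URL pattern, cache name, and limits are hypothetical, so the repo's actual vite config may differ:

    // vite.config.js (sketch, not the repo's actual config)
    import { defineConfig } from "vite";
    import react from "@vitejs/plugin-react";
    import { VitePWA } from "vite-plugin-pwa";

    export default defineConfig({
      plugins: [
        react(),
        VitePWA({
          registerType: "prompt", // surfaces an in-app "Reload" prompt on new versions
          workbox: {
            runtimeCaching: [
              {
                // Cache model.json, weight shards, and labels.json after first download
                urlPattern: ({ url }) => url.pathname.includes("_web_model/"),
                handler: "CacheFirst",
                options: {
                  cacheName: "tfjs-model-cache",
                  expiration: { maxEntries: 40 },
                },
              },
            ],
          },
        }),
      ],
    });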

Add to Home Screen (install)

  • Android/Chrome: use the Install button in the address bar menu when offered.
  • iOS (Safari):
    1. Open the site in Safari over HTTPS (or localhost).
    2. Share button → Add to Home Screen.

Notes

  • Allow camera permission when prompted.
  • Model files and other static assets are served from public/ and must be hosted as static files.

Detection flow

  1. Backend init
  • File: src/context/RuntimeContext.jsx
  • Chooses TensorFlow.js backend (WASM CPU by default; WebGL GPU optional). Switching backends stops detection and reloads the model.
  2. Load model
  • Hook: src/hooks/useLoadModel.js
  • Loads /<modelName>_web_model/model.json via tf.loadGraphModel, shows progress, and warms up with a dummy input. Exposes { net, inputShape, name }.
  3. Open camera + wire detection
  • Component: src/components/WebcamDialog/WebcamDialog.jsx
  • Opens the webcam, then on <video onPlay> calls detectVideo(video, model, canvas, ...) and passes alert utilities from useAlerts().
  4. Per‑frame loop
  • Function: detectVideo() in src/utils/detect.js
  • Runs a requestAnimationFrame loop with optional FPS limiting (mobile), and for each frame calls detect(...).
  5. Preprocess
  • Function: preprocess() in src/utils/detect.js
  • Pads frame to square, resizes to model size, normalizes to [0..1], expandDims(). Returns [input, xRatio, yRatio] for later box scaling.
  6. Inference (see the code sketches after this list)
  • Function: detect() in src/utils/detect.js
  • model.net.execute(input) → transpose → compute corner boxes, class scores/ids → NMS (tf.image.nonMaxSuppressionAsync) using the current sensitivity.
    • NMS (Non‑Max Suppression): removes duplicate/overlapping detections by comparing IoU and keeping the highest‑confidence boxes (here IoU≈0.45 plus the current confidence threshold).
  7. Post‑process + draw
  • Clears the canvas, draws ROI overlays (renderROIs) using roiMap.
  • Calls alertsOnFrame(...) (src/alerts/alertsManager.js) which triggers:
    • renderBoxes(...) to draw boxes
    • renderFocus(...) to highlight focus when moving
    • renderVoiceMessage(...) to speak/play sounds
  • ROI gating: src/alerts/roiUtils.js + src/alerts/roiMap.js ensure only relevant regions trigger.
  8. Alerts, sensitivity, movement
  • Context: src/context/AlertsContext.jsx manages which classes/alerts are enabled, sensitivity (confidence), ROI visibility, focus duration, voice modes.
  • Movement: src/context/LocationContext.jsx provides “user on the move” to allow focus/voice modes only when moving.
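
A minimal sketch of the per‑frame loop in step 4 (requestAnimationFrame with an optional FPS cap). The 15 FPS cap and the detectFn callback are hypothetical; the real detectVideo() in src/utils/detect.js also takes the model, canvas, and alert utilities:

    // Run detectFn on every video frame, optionally capping the frame rate (sketch).
    export function runDetectionLoop(video, detectFn, maxFps = 15) {
      const minFrameMs = 1000 / maxFps;
      let lastTime = 0;
      let running = true;

      const loop = async (now) => {
        if (!running || video.paused || video.ended) return;
        if (now - lastTime >= minFrameMs) {
          lastTime = now;
          await detectFn(video); // e.g. a per-frame detection like detectFrame(...)
        }
        requestAnimationFrame(loop);
      };
      requestAnimationFrame(loop);

      // Returns a stop function (useful when switching backends or closing the camera).
      return () => { running = false; };
    }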
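
A companion sketch of backend init, model loading, preprocessing, and inference + NMS (steps 1–2 and 5–6). The 640×640 input size and the [1, 4+K, N] output layout are assumptions based on a typical YOLOv8 TFJS export; the actual implementations live in src/context/RuntimeContext.jsx, src/hooks/useLoadModel.js, and src/utils/detect.js and may differ in detail:

    // Sketch only: backend selection, model load + warm-up, and one detection pass.
    import * as tf from "@tensorflow/tfjs";
    import "@tensorflow/tfjs-backend-wasm"; // registers the WASM (CPU) backend

    const MODEL_SIZE = 640; // assumed square model input

    export async function initBackend(useGpu = false) {
      // A real setup may also need setWasmPaths(...) so the .wasm binaries resolve.
      await tf.setBackend(useGpu ? "webgl" : "wasm");
      await tf.ready();
    }

    export async function loadModel(modelName) {
      const net = await tf.loadGraphModel(`/${modelName}_web_model/model.json`, {
        onProgress: (f) => console.log(`model ${(f * 100).toFixed(0)}% loaded`),
      });
      // Warm up with a dummy input so the first real frame is not slow.
      tf.tidy(() => net.execute(tf.zeros([1, MODEL_SIZE, MODEL_SIZE, 3])));
      return net;
    }

    // Pad the frame to a square, resize to the model size, and normalize to [0..1].
    function preprocess(video) {
      return tf.tidy(() => {
        const img = tf.browser.fromPixels(video); // [h, w, 3]
        const [h, w] = img.shape;
        const side = Math.max(h, w);
        const padded = img.pad([[0, side - h], [0, side - w], [0, 0]]); // letterbox
        const input = tf.image
          .resizeBilinear(padded, [MODEL_SIZE, MODEL_SIZE])
          .div(255.0)
          .expandDims(0); // [1, H, W, 3]
        return [input, side / w, side / h]; // ratios to scale boxes back later
      });
    }

    // One inference pass: execute, split boxes/scores/classes, then NMS.
    export async function detectFrame(net, video, scoreThreshold = 0.25) {
      const [input, xRatio, yRatio] = preprocess(video);
      const res = net.execute(input);                   // assumed shape [1, 4+K, N]
      const trans = res.transpose([0, 2, 1]).squeeze(); // [N, 4+K]

      const cxcywh = trans.slice([0, 0], [-1, 4]);      // center-format boxes
      const allScores = trans.slice([0, 4], [-1, -1]);  // per-class scores
      const scores = allScores.max(1);
      const classes = allScores.argMax(1);

      // cx,cy,w,h -> y1,x1,y2,x2, the order nonMaxSuppressionAsync expects.
      const [cx, cy, w, h] = tf.split(cxcywh, 4, 1);
      const boxes = tf.concat(
        [cy.sub(h.div(2)), cx.sub(w.div(2)), cy.add(h.div(2)), cx.add(w.div(2))],
        1
      );

      // Keep the highest-confidence, non-overlapping boxes (IoU threshold 0.45).
      const keep = await tf.image.nonMaxSuppressionAsync(boxes, scores, 100, 0.45, scoreThreshold);
      const out = {
        boxes: boxes.gather(keep).arraySync(),   // still in model-input pixels;
        scores: scores.gather(keep).arraySync(), // scale with xRatio/yRatio to draw
        classes: classes.gather(keep).arraySync(),
        xRatio,
        yRatio,
      };
      tf.dispose([input, res, trans, cxcywh, allScores, scores, classes, boxes, keep]);
      return out;
    }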

Labels

  • Loaded once per model: /<modelName>_web_model/labels.json in getLabels() inside src/utils/detect.js.
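
A minimal sketch of what that lookup can look like (the real getLabels() may differ; the module-level cache and the assumption that labels.json is an array of class names are mine):

    // Fetch labels.json once per model and reuse it afterwards (sketch)
    const labelsCache = {};

    export async function getLabels(modelName) {
      if (!labelsCache[modelName]) {
        const res = await fetch(`/${modelName}_web_model/labels.json`);
        labelsCache[modelName] = await res.json(); // e.g. ["person", "car", ...]
      }
      return labelsCache[modelName];
    }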

Tiny diagram

[Runtime init]
	RuntimeContext (WASM/WebGL)
				|
				v
[Model load]
	useLoadModel -> tf.loadGraphModel
	output: { net, inputShape=[1,H,W,3], name }
				|
				v
[Camera]
	<video> onPlay -> detectVideo(video, model, canvas, ...)
				|
				v
[Loop per frame]
	preprocess(video,H,W)
		-> input: [1,H,W,3] float32 (0..1)
		-> ratios: [xRatio,yRatio]
				|
				v
	net.execute(input)
		-> transRes: [1,N,(4+K)] -> boxes/scores/classes
		-> NMS(th=0.45, conf=sensitivity)
				|
				v
	draw + alerts
		- ROIs overlay (roiMap)
		- renderBoxes / renderFocus / renderVoiceMessage
		- timeoutManager + movement gating

Training Results

Below are some key visualizations from the model fine-tuning process:

Confusion Matrix (Normalized)

[figure: normalized confusion matrix]

Training Metrics

[figure: training metric curves]

Dataset & Training

  • Data sources:

    • COCO traffic subset (cars, buses, traffic lights, pedestrians, etc.)
    • LISA Traffic Light Dataset (UC San Diego, research use only)
    • Pothole dataset (Roboflow)
    • Self-annotation (using Roboflow's smart annotation system) for classes missing from the above sources, to cover all 10 target classes
  • Final merged dataset resized to 640×640 with letterbox padding.
  • Split: 80% train / 20% val.
  • Classes (10 total): person, car, motorcycle, bicycle, truck, bus, traffic light red, traffic light green, traffic light na, pothole
  • Final validation results (best fine-tune checkpoint):

    • Precision: 0.732
    • Recall: 0.568
    • mAP@0.5: 0.641
    • mAP@0.5:0.95: 0.422

    Precision measures how many predicted detections are correct (TP/(TP+FP)); Recall measures how many true objects were found (TP/(TP+FN)). mAP is mean Average Precision across classes: mAP@0.5 uses an IoU threshold of 0.5, while mAP@0.5:0.95 averages mAP over IoU thresholds from 0.5 to 0.95.

  • Export to TFJS:

    • Best checkpoint exported via Ultralytics:
      model.export(format="tfjs", opset=17)
    • Produces a TFJS GraphModel (model.json + shard files).
    • Loaded in-browser with @tensorflow/tfjs for real-time inference.
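
For reference, the metric definitions quoted above, written as formulas (K is the number of classes, 10 here; p_k(r) is class k's precision as a function of recall):

    \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
    \mathrm{Recall} = \frac{TP}{TP + FN}

    \mathrm{AP}_k = \int_0^1 p_k(r)\,dr, \qquad
    \mathrm{mAP} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{AP}_k

    \mathrm{mAP}@0.5{:}0.95 = \frac{1}{10} \sum_{t \in \{0.50,\,0.55,\,\ldots,\,0.95\}} \mathrm{mAP}_{\mathrm{IoU}=t}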

Dataset licenses and acknowledgments

This project uses publicly available datasets for academic and research purposes only:

  • COCO 2017 Dataset — © COCO Consortium. Licensed under the Creative Commons Attribution 4.0 License (CC BY 4.0). “This work uses data from the COCO 2017 dataset.”
  • LISA Traffic Light Dataset — © Laboratory for Intelligent and Safe Automobiles (LISA), UC San Diego. Provided for research use only. “This work uses the LISA Traffic Light Dataset, provided by UCSD for academic purposes only.”

No dataset files are redistributed in this repository. All datasets are used strictly for educational and non-commercial research within the academic context.