RoadXpert is a real‑time PWA that runs a custom, fine‑tuned YOLO model entirely in the browser with TensorFlow.js, delivering fast on‑device detection on the GPU (WebGL) or CPU (WASM), with no server calls for inference.
Simple steps to run, build, and use the PWA.
- Prereqs: Node.js 18+ and npm 9+
- Install: `npm install`
- Dev: `npm run dev`, then open the printed URL (usually http://localhost:5173).
- Build production: `npm run build`
- Preview production (with service worker): `npm run preview`
- The service worker runs only in Preview/Production (not in `npm run dev`).
- When a new version is available you’ll see a small in‑app prompt to Reload.
- First load may download ML model files; they’re then cached for offline use.
- Android/Chrome: use the Install button in the address bar menu when offered.
- iOS (Safari):
  - Open the site in Safari over HTTPS (or localhost).
  - Share button → Add to Home Screen.
  - Allow camera permission when prompted.
- Static models/assets are served from `public/` and must be hosted as static files.
- Backend init
  - File: `src/context/RuntimeContext.jsx` - chooses the TensorFlow.js backend (WASM CPU by default; WebGL GPU optional). Switching backends stops detection and reloads the model.
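A minimal sketch of the backend switch using the public TensorFlow.js API (this is not the actual RuntimeContext code; `initBackend` is a hypothetical helper):

```js
import * as tf from "@tensorflow/tfjs";
import "@tensorflow/tfjs-backend-wasm"; // registers the WASM backend

// Hypothetical helper: pick WASM (CPU) or WebGL (GPU), then wait until ready.
// RuntimeContext.jsx additionally stops detection and reloads the model.
async function initBackend(preferWebGL = false) {
  await tf.setBackend(preferWebGL ? "webgl" : "wasm");
  await tf.ready();
  console.log("Active backend:", tf.getBackend());
}
```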
- Load model
  - Hook: `src/hooks/useLoadModel.js` - loads `/<modelName>_web_model/model.json` via `tf.loadGraphModel`, shows progress, and warms up with a dummy input. Exposes `{ net, inputShape, name }`.
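A sketch of the load + warm‑up idea with the standard `tf.loadGraphModel` API; the real hook also tracks progress in React state, and `loadYoloModel` here is illustrative only:

```js
import * as tf from "@tensorflow/tfjs";

// Illustrative only: load the GraphModel, report download progress,
// and run one dummy inference so the first real frame is fast.
async function loadYoloModel(modelName, onProgress) {
  const net = await tf.loadGraphModel(`/${modelName}_web_model/model.json`, {
    onProgress, // receives a fraction between 0 and 1
  });
  const [, H, W] = net.inputs[0].shape; // assumed [1, H, W, 3]
  const inputShape = [1, H, W, 3];
  tf.tidy(() => {
    net.execute(tf.ones(inputShape)); // warm-up pass; tidy disposes the tensors
  });
  return { net, inputShape, name: modelName };
}
```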
- Open camera + wire detection
  - Component: `src/components/WebcamDialog/WebcamDialog.jsx` - opens the webcam, then on `<video onPlay>` calls `detectVideo(video, model, canvas, ...)` and passes alert utilities from `useAlerts()`.
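Roughly how the webcam gets wired to the loop (a sketch, not the actual component; the real dialog also handles permissions, errors, and teardown):

```js
// Sketch: open the rear camera and start detection once the video plays.
async function startCamera(videoEl, model, canvasEl, alerts) {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" }, // prefer the rear camera on phones
    audio: false,
  });
  videoEl.srcObject = stream;
  videoEl.onplay = () => detectVideo(videoEl, model, canvasEl, alerts);
  await videoEl.play();
}
```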
- Per‑frame loop
  - Function: `detectVideo()` in `src/utils/detect.js` - runs a `requestAnimationFrame` loop with optional FPS limiting (mobile), and for each frame calls `detect(...)`.
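The loop amounts to a `requestAnimationFrame` driver with an optional frame‑rate cap; a simplified sketch (parameter names are assumptions, the real `detectVideo` signature may differ):

```js
// Sketch: run detect() on every animation frame, optionally capped at targetFps.
function detectVideoSketch(video, model, canvas, alerts, targetFps = 0) {
  const minInterval = targetFps > 0 ? 1000 / targetFps : 0;
  let last = 0;

  const loop = async (now) => {
    if (video.readyState >= 2 && now - last >= minInterval) {
      last = now;
      await detect(video, model, canvas, alerts); // one full detection pass
    }
    requestAnimationFrame(loop); // keep the loop alive
  };
  requestAnimationFrame(loop);
}
```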
- Preprocess
  - Function: `preprocess()` in `src/utils/detect.js` - pads the frame to a square, resizes to the model size, normalizes to [0..1], and calls `expandDims()`. Returns `[input, xRatio, yRatio]` for later box scaling.
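A sketch of the padding/resize/normalize steps with TensorFlow.js ops (the real `preprocess()` may differ in details):

```js
import * as tf from "@tensorflow/tfjs";

// Sketch: square-pad the frame, resize to the model input, scale to [0..1],
// add a batch dimension, and return the padding ratios for later box scaling.
function preprocessSketch(video, modelWidth, modelHeight) {
  let xRatio, yRatio;
  const input = tf.tidy(() => {
    const frame = tf.browser.fromPixels(video); // [h, w, 3] uint8
    const [h, w] = frame.shape;
    const maxSize = Math.max(h, w);

    // Zero-pad right/bottom so the frame becomes square (letterbox style).
    const padded = frame.pad([[0, maxSize - h], [0, maxSize - w], [0, 0]]);
    xRatio = maxSize / w;
    yRatio = maxSize / h;

    return tf.image
      .resizeBilinear(padded, [modelHeight, modelWidth]) // model input size
      .div(255.0)                                        // normalize to [0..1]
      .expandDims(0);                                    // [1, H, W, 3]
  });
  return [input, xRatio, yRatio];
}
```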
- Inference
  - Function: `detect()` in `src/utils/detect.js` - `model.net.execute(input)` → transpose → compute corner boxes, class scores/ids → NMS (`tf.image.nonMaxSuppressionAsync`) using the current sensitivity.
  - NMS (Non‑Max Suppression): removes duplicate/overlapping detections by comparing IoU and keeping the highest‑confidence boxes (here IoU ≈ 0.45 plus the current confidence threshold).
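Roughly what the decode + NMS stage looks like for a YOLO‑style `[1, 4+K, N]` output; a sketch only, the actual tensor layout and slicing in `detect()` may differ:

```js
import * as tf from "@tensorflow/tfjs";

// Sketch: decode center boxes to corners, take per-class max scores, then NMS.
async function inferenceSketch(model, input, confThreshold) {
  const [boxes, scores, classes] = tf.tidy(() => {
    const res = model.net.execute(input);   // assumed [1, 4+K, N]
    const trans = res.transpose([0, 2, 1]); // -> [1, N, 4+K]

    const w = trans.slice([0, 0, 2], [-1, -1, 1]);
    const h = trans.slice([0, 0, 3], [-1, -1, 1]);
    const x1 = tf.sub(trans.slice([0, 0, 0], [-1, -1, 1]), tf.div(w, 2)); // cx - w/2
    const y1 = tf.sub(trans.slice([0, 0, 1], [-1, -1, 1]), tf.div(h, 2)); // cy - h/2
    const corners = tf
      .concat([y1, x1, tf.add(y1, h), tf.add(x1, w)], 2) // [y1, x1, y2, x2]
      .squeeze();                                         // [N, 4]

    const classScores = trans.slice([0, 0, 4], [-1, -1, -1]).squeeze(); // [N, K]
    return [corners, classScores.max(1), classScores.argMax(1)];
  });

  // NMS: drop overlapping boxes (IoU > 0.45) below the current sensitivity.
  const keep = await tf.image.nonMaxSuppressionAsync(boxes, scores, 500, 0.45, confThreshold);
  const result = {
    boxes: boxes.gather(keep),
    scores: scores.gather(keep),
    classes: classes.gather(keep),
  };
  tf.dispose([boxes, scores, classes, keep]);
  return result;
}
```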
- Post‑process + draw
  - Clears the canvas and draws ROI overlays (`renderROIs`) using `roiMap`.
  - Calls `alertsOnFrame(...)` (`src/alerts/alertsManager.js`), which triggers `renderBoxes(...)` to draw boxes, `renderFocus(...)` to highlight focus when moving, and `renderVoiceMessage(...)` to speak/play sounds; a simplified drawing sketch follows this list.
  - ROI gating: `src/alerts/roiUtils.js` + `src/alerts/roiMap.js` ensure only relevant regions trigger alerts.
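A simplified sketch of the drawing step (the real logic lives in `renderBoxes` and `alertsOnFrame`). It assumes the canvas matches the model input size, so only the padding ratios from `preprocess()` need undoing, and that the tensors were read back as flat typed arrays:

```js
// Sketch: scale NMS output back with the preprocess ratios and draw rectangles.
function drawDetections(canvas, boxesData, scoresData, classesData, labels, xRatio, yRatio) {
  const ctx = canvas.getContext("2d");
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.strokeStyle = "#00e676";
  ctx.fillStyle = "#00e676";
  ctx.lineWidth = 2;
  ctx.font = "14px sans-serif";

  for (let i = 0; i < scoresData.length; i++) {
    let [y1, x1, y2, x2] = boxesData.slice(i * 4, i * 4 + 4);
    // Undo the square padding applied in preprocess().
    x1 *= xRatio; x2 *= xRatio;
    y1 *= yRatio; y2 *= yRatio;
    ctx.strokeRect(x1, y1, x2 - x1, y2 - y1);
    ctx.fillText(`${labels[classesData[i]]} ${(scoresData[i] * 100).toFixed(0)}%`, x1, y1 - 4);
  }
}
```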
- Alerts, sensitivity, movement
  - Context: `src/context/AlertsContext.jsx` manages which classes/alerts are enabled, sensitivity (confidence), ROI visibility, focus duration, and voice modes.
  - Movement: `src/context/LocationContext.jsx` provides a “user on the move” signal so focus/voice modes are allowed only while moving.
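One plausible way to derive the “on the move” flag from the Geolocation API; this is a hypothetical sketch, not the actual LocationContext logic, and the speed threshold is an assumption:

```js
// Hypothetical: report "moving" when the GPS-derived speed exceeds ~1.5 m/s.
function watchMovement(onChange, minSpeedMs = 1.5) {
  return navigator.geolocation.watchPosition(
    (pos) => onChange((pos.coords.speed ?? 0) > minSpeedMs), // speed is m/s, null if unknown
    (err) => console.warn("Geolocation error:", err.message),
    { enableHighAccuracy: true }
  );
}
```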
- Labels
  - Loaded once per model: `/<modelName>_web_model/labels.json` in `getLabels()` inside `src/utils/detect.js`.
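A sketch of the per‑model label loading with a simple in‑memory cache (illustrative, not the exact `getLabels()` implementation):

```js
// Sketch: fetch the class list once per model and reuse it on later frames.
const labelsCache = {};

async function getLabelsSketch(modelName) {
  if (!labelsCache[modelName]) {
    const res = await fetch(`/${modelName}_web_model/labels.json`);
    labelsCache[modelName] = await res.json(); // e.g. ["person", "car", ...]
  }
  return labelsCache[modelName];
}
```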
```text
[Runtime init]
RuntimeContext (WASM/WebGL)
        |
        v
[Model load]
useLoadModel -> tf.loadGraphModel
output: { net, inputShape=[1,H,W,3], name }
        |
        v
[Camera]
<video> onPlay -> detectVideo(video, model, canvas, ...)
        |
        v
[Loop per frame]
preprocess(video,H,W)
  -> input: [1,H,W,3] float32 (0..1)
  -> ratios: [xRatio,yRatio]
        |
        v
net.execute(input)
  -> transRes: [1,N,(4+K)] -> boxes/scores/classes
  -> NMS(th=0.45, conf=sensitivity)
        |
        v
draw + alerts
  - ROIs overlay (roiMap)
  - renderBoxes / renderFocus / renderVoiceMessage
  - timeoutManager + movement gating
```
Key details and results from the model fine‑tuning process:
- Data sources:
  - COCO traffic subset (cars, buses, traffic lights, pedestrians, etc.)
  - LISA Traffic Light Dataset (UC San Diego, research use only)
  - Pothole dataset (Roboflow)
  - Self‑annotation for the remaining missing class (out of 10 total), using Roboflow’s smart annotation system
- Final merged dataset resized to 640×640 with letterbox padding.
- Split: 80% train / 20% val.
- Classes (10 total): person, car, motorcycle, bicycle, truck, bus, traffic light red, traffic light green, traffic light na, pothole
- Final validation results (best fine‑tune checkpoint):
  - Precision: 0.732
  - Recall: 0.568
  - mAP@0.5: 0.641
  - mAP@0.5:0.95: 0.422

Precision measures how many predicted detections are correct (TP/(TP+FP)); recall measures how many true objects were found (TP/(TP+FN)); mAP is mean Average Precision across classes: mAP@0.5 uses IoU=0.5, while mAP@0.5:0.95 averages mAP over IoU thresholds from 0.5 to 0.95.
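The same definitions in formula form (K is the number of classes and AP_k the average precision of class k; mAP@0.5:0.95 averages the per‑class AP over IoU thresholds 0.5, 0.55, ..., 0.95):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{mAP} = \frac{1}{K}\sum_{k=1}^{K} AP_k
```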
- Export to TFJS:
  - Best checkpoint exported via Ultralytics: `model.export(format="tfjs", opset=17)`
  - Produces a TFJS GraphModel (`model.json` + shard files).
  - Loaded in‑browser with `@tensorflow/tfjs` for real‑time inference.
This project uses publicly available datasets for academic and research purposes only:
- COCO 2017 Dataset — © COCO Consortium. Licensed under the Creative Commons Attribution 4.0 License (CC BY 4.0). “This work uses data from the COCO 2017 dataset.”
- LISA Traffic Light Dataset — © Laboratory for Intelligent and Safe Automobiles (LISA), UC San Diego. Provided for research use only. “This work uses the LISA Traffic Light Dataset, provided by UCSD for academic purposes only.”
No dataset files are redistributed in this repository. All datasets are used strictly for educational and non-commercial research within the academic context.

