GGUNGSIL-WONNIT/WONNIT-AI

🏷️ Scene Classification · 🧭 Space Item Detection · 🔍 Change Detection

MobileNetV2 (torchvision) · YOLOv8n · TinyChangeUNet (MobileNetV3 encoder)

A three-stage pipeline for indoor space data: scene classification → space item detection → change detection



📁 Folder structure

A) Scene Classification (ImageFolder)
space_cls/
  train/<class>/*.jpg|png
  val/<class>/*.jpg|png
  test/<class>/*.jpg|png

B) Space Item Detection (YOLO)
space_data/
  images/{train,val,test}/*.jpg|png
  labels/{train,val,test}/*.txt      # YOLO: cls cx cy w h
  space.yaml

C) Change Detection (synthetic before/after/GT)
pairs_out_cd/
  train/{before_images,after_images,labels}
  val/{before_images,after_images,labels}
  test/{before_images,after_images,labels}
meta/pairs_{train,val}.json

A) Scene Classification

Model: MobileNetV2 (torchvision, ImageNet-pretrained → fine-tuned)
Input: 224×224
Target classes (5): creative_studio, dance_studio, music_rehearsal_room, small_theater_gallery, study_room
Dataset: ~50 images per class (crawled first, then manually cleaned), split train:val = 8:2, ImageFolder format
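The 8:2 split could be reproduced along the following lines. This is a minimal sketch, not the repository's script: the function name `split_train_val`, the `files_by_class` mapping, and the use of seed 42 for this step are assumptions made for illustration.

```python
import random


def split_train_val(files_by_class, train_ratio=0.8, seed=42):
    """Split each class's file list into train/val with a fixed seed (hypothetical helper)."""
    rng = random.Random(seed)
    splits = {"train": {}, "val": {}}
    for cls, files in sorted(files_by_class.items()):
        shuffled = sorted(files)  # sort first so the result does not depend on input order
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train_ratio)
        splits["train"][cls] = shuffled[:n_train]
        splits["val"][cls] = shuffled[n_train:]
    return splits


# Illustrative file names only; real data lives under space_cls/<split>/<class>/
demo = {"study_room": [f"img_{i:02d}.jpg" for i in range(50)]}
result = split_train_val(demo)
print(len(result["train"]["study_room"]), len(result["val"]["study_room"]))  # 40 10
```

Splitting per class keeps the 8:2 ratio within every class, which matters when each class has only about 50 images.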

# Training script
python train_mobilenet.py
# Outputs: mobilenetv2.pth, class_names.txt

Test performance (per-class accuracy)

Class                   Correct / Total   Acc.
creative_studio         10 / 10           100.0%
dance_studio             8 / 10            80.0%
music_rehearsal_room     7 / 10            70.0%
small_theater_gallery    8 / 10            80.0%
study_room              10 / 10           100.0%
[Figure: confusion matrix]

Overall Acc: 43/50 = 86.0% · Macro Acc: 86.0%

Main misclassifications: dance_studio → small_theater_gallery (2 cases), music_rehearsal_room → study_room (2 cases), among others.
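As a quick check of the two summary figures, the per-class counts from the table can be recomputed as below. The variable names are illustrative.

```python
# (correct, total) per class, copied from the test table above
counts = {
    "creative_studio": (10, 10),
    "dance_studio": (8, 10),
    "music_rehearsal_room": (7, 10),
    "small_theater_gallery": (8, 10),
    "study_room": (10, 10),
}

correct = sum(c for c, _ in counts.values())
total = sum(t for _, t in counts.values())

overall_acc = correct / total                                      # pooled over all images
macro_acc = sum(c / t for c, t in counts.values()) / len(counts)   # mean of per-class accuracies

print(f"overall={overall_acc:.1%} macro={macro_acc:.1%}")  # overall=86.0% macro=86.0%
```

The two values coincide here because every class has the same number of test images; with unbalanced classes they would generally differ.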


B) Space Item Detection

Model: YOLOv8n (Ultralytics, COCO-pretrained → custom fine-tuning)
Goal: detect space items (13 classes) in indoor photos.

Dataset

  • Classes (13)
    air_conditioner, chair, desk, drum, microphone, mirror, monitor, piano, projector, speaker, spotlight, stage, whiteboard
  • Construction: draft boxes generated from the scene data by an automatic box-labeling pipeline → manually corrected
  • Format: YOLO format (images/, labels/*.txt; each txt line: cls cx cy w h)
Label statistics (histogram)
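For reference, a YOLO label line stores the box center and size normalized to the image dimensions. The sketch below converts one line to pixel corner coordinates; the helper name `yolo_line_to_xyxy` and the sample values are made up for illustration.

```python
def yolo_line_to_xyxy(line, img_w, img_h):
    """Convert 'cls cx cy w h' (normalized to [0, 1]) into (cls, (x1, y1, x2, y2)) in pixels."""
    cls, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    return int(cls), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Class 2 is "desk" in the class list above; the box values are illustrative.
print(yolo_line_to_xyxy("2 0.5 0.5 0.25 0.5", 640, 480))
# (2, (240.0, 120.0, 400.0, 360.0))
```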
[dataset_trainval]
  0 air_conditioner : 1026   7 piano       : 1265
  1 chair           : 2058   8 projector   : 1026
  2 desk            : 4199   9 speaker     : 1900
  3 drum            : 3444  10 spotlight   : 6136
  4 microphone      : 1126  11 stage       : 1012
  5 mirror          : 1185  12 whiteboard  : 1504
  6 monitor         : 1391

[dataset_test]
  0 air_conditioner :   7   7 piano       :  22
  1 chair           :  94   8 projector   :  10
  2 desk            :  69   9 speaker     :  22
  3 drum            :  13  10 spotlight   : 142
  4 microphone      :  15  11 stage       :   8
  5 mirror          :  44  12 whiteboard  :  13
  6 monitor         :  14

Example YOLO data config (space.yaml)

path: ./space_data
train: images/train
val: images/val
test: images/test
names:
  0: air_conditioner
  1: chair
  2: desk
  3: drum
  4: microphone
  5: mirror
  6: monitor
  7: piano
  8: projector
  9: speaker
  10: spotlight
  11: stage
  12: whiteboard

Training / evaluation / inference

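from ultralytics import YOLO
# Start from the COCO-pretrained YOLOv8n checkpoint
model = YOLO("yolov8n.pt")
# Fine-tune on the 13-class space item dataset
model.train(data="space_data/space.yaml", imgsz=640, epochs=80, batch=16, seed=42)
# Evaluate on the validation split
model.val(data="space_data/space.yaml", imgsz=640, split="val")
# Run inference on the test images and save annotated outputs
model.predict(source="space_data/images/test", imgsz=640, conf=0.25, save=True)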

Performance summary

  • 80 epochs completed; best/last weights saved (runs/detect/space_no_leak/weights/best.pt)
  • Val: mAP50=0.981, mAP50-95=0.912
  • Test: mAP50=0.979, mAP50-95=0.910

Split   mAP@0.5   mAP@0.5:0.95   ImgSize   Model
Val     0.981     0.912          640       YOLOv8n
Test    0.979     0.910          640       YOLOv8n

Speed(ref): ~0.3ms preprocess, 5.0ms inference, 3.2ms postprocess / image (T4)

Performance evaluation visualizations

[Figures: item detection evaluation plots (3 images)]

C) Change Detection (before/after)

Model: TinyChangeUNet (MobileNetV3 Small encoder + TinyDecoder)
Input: before(3) + after(3) + diff(1) = 7 channels (diff = mean(|before - after|))
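The 7-channel input can be assembled roughly as follows. This is a NumPy sketch under stated assumptions: images are channels-last float arrays, and the mean in the diff is taken over the RGB axis (implied by diff having one channel, but not spelled out above). The function name `make_change_input` is hypothetical.

```python
import numpy as np


def make_change_input(before, after):
    """Stack before (3) + after (3) + diff (1) into an H x W x 7 array."""
    before = before.astype(np.float32)
    after = after.astype(np.float32)
    # Assumption: diff is the absolute difference averaged over the color channels
    diff = np.abs(before - after).mean(axis=-1, keepdims=True)
    return np.concatenate([before, after, diff], axis=-1)


before = np.zeros((4, 4, 3), dtype=np.float32)
after = before.copy()
after[1:3, 1:3] = 0.6            # simulate a changed 2x2 patch
x = make_change_input(before, after)
print(x.shape)                   # (4, 4, 7)
```

Feeding the explicit diff channel gives the network a direct cue about where the two images differ, in addition to the raw before/after pixels.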

Dataset construction (Before/After + GT mask)

Goals

  1. Automatically apply diverse, realistic changes (occlusion / blur / pixelation / inpainting / relocation) to batch-generate (before, after, mask) samples
  2. Mask convention: 0 = background, 255 = changed region
  3. Uses: change detection, before/after comparison, segmentation training/benchmarking

Generation logic summary

  • Select a region based on a YOLO label box, then apply one of the following:
    black / rect(noise) / blur / pixelate / inpaint / move (region relocation)
  • Vary difficulty with box jitter/margin and partial occlusion
  • Output: before_images/ (original), after_images/ (modified), labels/ (0/255 PNG masks)
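To make the generation step concrete, here is a minimal sketch of the simplest mode only (black fill). It assumes the box has already been converted to pixel coordinates; the function name `apply_black_change` is hypothetical, and the other modes, jitter, and partial occlusion are omitted.

```python
import numpy as np


def apply_black_change(before, box_xyxy):
    """Return (after, mask): the box region is blacked out and marked 255 in the mask."""
    h, w = before.shape[:2]
    x1, y1, x2, y2 = box_xyxy
    # Clip the box to the image bounds
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(w, int(x2)), min(h, int(y2))
    after = before.copy()
    after[y1:y2, x1:x2] = 0
    mask = np.zeros((h, w), dtype=np.uint8)   # 0 = background
    mask[y1:y2, x1:x2] = 255                  # 255 = changed region
    return after, mask


before = np.full((8, 8, 3), 200, dtype=np.uint8)
after, mask = apply_black_change(before, (2, 2, 6, 5))
print(int(mask.sum()) // 255)                 # 12 changed pixels (4 wide x 3 high)
```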

Generated data examples: [Figures: four generated samples]

λͺ¨λΈ ꡬ쑰 μš”μ•½

  • concat([before, after, diff]) β†’ 1Γ—1 conv둜 7ch β†’ 3ch μΆ•μ†Œ
  • Encoder: MobileNetV3 Small (timm, features_only) β†’ 채널 μ •κ·œν™”(24/40/64/96)
  • Decoder(TinyDecoder): ConvTranspose2d μ—…μƒ˜ν”Œ + μŠ€ν‚΅ + DWConvBlock
  • Head: 1Γ—1 conv β†’ logit(1ch) β†’ bilinear μ—…μƒ˜ν”Œ(원해상도)

Training / evaluation (change_detection.ipynb)

  • Defaults: IMG_SIZE=256, BATCH=8, EPOCHS=40, LR=3e-4
  • Loop: AMP (FP16), Cosine+Warmup (2 epochs), EMA (0.99), gradient clipping
  • Loss: BCEWithLogits(pos_weight) + Tversky(α=0.7, β=0.3)
  • Validation threshold sweep: pick the th ∈ [0.02, 0.40] that maximizes F1
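The loss term can be written out in NumPy to show what is being optimized. This is a sketch, not the notebook's code. Two assumptions are made: α weights false positives and β weights false negatives (conventions differ between implementations), and the two terms are summed with equal weight. All function names are illustrative.

```python
import numpy as np


def bce_with_logits(logits, target, pos_weight=1.0):
    """Numerically stable weighted BCE on raw logits."""
    # log(sigmoid(x)) = -softplus(-x) and log(1 - sigmoid(x)) = -softplus(x)
    softplus_neg = np.logaddexp(0.0, -logits)
    softplus_pos = np.logaddexp(0.0, logits)
    loss = pos_weight * target * softplus_neg + (1.0 - target) * softplus_pos
    return float(loss.mean())


def tversky_loss(logits, target, alpha=0.7, beta=0.3, eps=1e-6):
    """1 - Tversky index; alpha on FP and beta on FN is an assumption."""
    prob = 1.0 / (1.0 + np.exp(-logits))
    tp = (prob * target).sum()
    fp = (prob * (1.0 - target)).sum()
    fn = ((1.0 - prob) * target).sum()
    return float(1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps))


def combined_loss(logits, target, pos_weight=1.0):
    # Equal weighting of the two terms is assumed here
    return bce_with_logits(logits, target, pos_weight) + tversky_loss(logits, target)


target = np.array([1.0, 0.0, 1.0, 0.0])
good = np.array([20.0, -20.0, 20.0, -20.0])   # confident and correct
bad = -good                                   # confident and wrong
print(combined_loss(good, target) < combined_loss(bad, target))  # True
```

The pos_weight factor scales only the positive-pixel term, which is a common way to compensate when changed pixels are a small fraction of the image.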

Performance summary (from validation logs)

  • Early stopping (on F1), 39 epochs in total
  • Best F1 (EMA): 0.510 @ th=0.36
  • Last epoch (38): train loss=0.4530, val loss=0.6248, mIoU=0.413, F1=0.513

Test example image: [Figure: change detection result]

Split   Best-th   mIoU    F1      AMP   EMA
Val     0.36      0.413   0.513   ✅    ✅
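The Best-th value comes from sweeping the decision threshold on validation predictions. A minimal sketch of such a sweep is shown below; the number of steps (20) and the function names are assumptions, and the toy arrays stand in for real probability maps and masks.

```python
import numpy as np


def f1_score(pred, target, eps=1e-9):
    """Pixel-level F1 for boolean prediction and target arrays."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    return float(2 * tp / (2 * tp + fp + fn + eps))


def sweep_threshold(prob, target, lo=0.02, hi=0.40, steps=20):
    """Return (best_th, best_f1) over evenly spaced thresholds in [lo, hi]."""
    best_th, best_f1 = lo, -1.0
    for th in np.linspace(lo, hi, steps):
        f1 = f1_score(prob >= th, target)
        if f1 > best_f1:              # keep the lowest threshold on ties
            best_th, best_f1 = float(th), f1
    return best_th, best_f1


prob = np.array([0.01, 0.11, 0.30, 0.39])   # toy sigmoid outputs
target = np.array([0, 0, 1, 1])             # toy ground truth (1 = changed)
best_th, best_f1 = sweep_threshold(prob, target)
print(round(best_th, 2), round(best_f1, 3))
```

Because the threshold is chosen on the validation split, the reported F1 is optimistic for that split; applying the same fixed threshold to the test split gives a fairer estimate.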

🔁 Reproducibility

  • Shared seed: 42 (prevents split leakage; keeps logs/checkpoints fixed)
  • Validation/saving: based on the EMA weights
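For readers unfamiliar with EMA weights, the update rule with decay 0.99 is sketched below on plain floats. In practice the same rule would be applied to each model tensor; the dictionary of scalars and the name `ema_update` are stand-ins for illustration.

```python
def ema_update(ema_params, params, decay=0.99):
    """In-place EMA: ema = decay * ema + (1 - decay) * current."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Toy example: the raw weight jumps to 1.0 and the EMA copy follows it slowly
ema = {"w": 0.0}
for _ in range(100):
    ema_update(ema, {"w": 1.0})
print(round(ema["w"], 3))
```

Validating and saving from the smoothed copy tends to reduce epoch-to-epoch noise in the reported metrics, at the cost of lagging behind the raw weights early in training.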
