Commit c2beec8

Commit message: final one for now

19 files changed: 5726 additions, 0 deletions

1.png

816 KB

README.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# Mask-RCNN

Implement real-time instance segmentation with [Mask_RCNN](https://github.com/matterport/Mask_RCNN).

## Requirements

- Ubuntu 18.04
- Python 3.6
- Tensorflow 1.9
- Keras 2.1.6
- OpenCV 4.0

The code has been tested on the configuration listed above. Other combinations will likely work as well, but make sure you keep the TensorFlow/Keras version pairing shown here. For OpenCV, version 3.4 or newer is sufficient.

This implementation runs considerably faster if your system has a GPU and uses CUDA acceleration. On an MSI laptop with a 1050 Ti (4 GB, 768 CUDA cores), an i5-8220, and 8 GB of RAM, it achieves 4.637 FPS.

To test it on a Tello, make sure the drone is powered on and that your machine is connected to the drone's WiFi network. Once you execute this code, press `TAB` to take off and `BACKSPACE` to land. The other manual navigation commands are listed in the header of the Python script; a minimal sketch of such a control loop follows.
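The key handling in `telloCV-masked-rcnn.py` itself is not reproduced here. As a rough sketch only, a keyboard loop with these two bindings could be written with the `djitellopy` and `pygame` packages (an assumption for illustration, not necessarily what the script uses):

```python
# Hypothetical sketch of a minimal Tello keyboard controller.
# Assumes the djitellopy and pygame packages; the actual script
# (telloCV-masked-rcnn.py) may implement this differently.
import pygame
from djitellopy import Tello

pygame.init()
pygame.display.set_mode((320, 240))  # a window is needed to receive key events

tello = Tello()
tello.connect()

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN:
            if event.key == pygame.K_TAB:          # TAB: take off
                tello.takeoff()
            elif event.key == pygame.K_BACKSPACE:  # BACKSPACE: land
                tello.land()

tello.end()  # lands (if flying) and closes the connection
pygame.quit()
```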
## Getting Started

Install the dependencies not listed above:

```bash
$ sudo -H pip3 install -r requirements.txt
$ sudo -H pip3 install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
```
Using pre-trained weights for MS COCO:

The download of the pre-trained MS COCO weights is handled inside `telloCV-masked-rcnn.py`, so no manual download is needed.
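For reference, the upstream Matterport code base provides a utility for fetching these weights. A minimal sketch (the `COCO_MODEL_PATH` location is illustrative; the actual path is set inside the script) looks like:

```python
# Sketch of fetching the COCO weights with the Matterport Mask_RCNN utilities.
import os
from mrcnn import utils

# Illustrative path; the script defines its own location for the weights file.
COCO_MODEL_PATH = os.path.join(os.getcwd(), "mask_rcnn_coco.h5")

# Download the weights once if they are not already present locally
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
```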
Run the demonstration:

```bash
$ python3 telloCV-masked-rcnn.py
```

<div align="center">
<a href="https://www.youtube.com/watch?v=RMD8G3Na71s"><img src="1.png" alt="Mask RCNN COCO object detection and segmentation" width="550"></a>
</div>

mrcnn/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

Binary files not shown: 139 Bytes, 2.7 KB, 74.8 KB, 26 KB.

mrcnn/config.py

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
"""
Mask R-CNN
Base Configurations class.

Copyright (c) 2017 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla
"""

import math
import numpy as np


# Base Configuration Class
# Don't use this class directly. Instead, sub-class it and override
# the configurations you need to change.

class Config(object):
    """Base configuration class. For custom configurations, create a
    sub-class that inherits from this one and override properties
    that need to be changed.
    """
    # Name the configurations. For example, 'COCO', 'Experiment 3', ...etc.
    # Useful if your code needs to do things differently depending on which
    # experiment is running.
    NAME = None  # Override in sub-classes

    # NUMBER OF GPUs to use. For CPU training, use 1
    GPU_COUNT = 1

    # Number of images to train with on each GPU. A 12GB GPU can typically
    # handle 2 images of 1024x1024px.
    # Adjust based on your GPU memory and image sizes. Use the highest
    # number that your GPU can handle for best performance.
    IMAGES_PER_GPU = 2

    # Number of training steps per epoch
    # This doesn't need to match the size of the training set. TensorBoard
    # updates are saved at the end of each epoch, so setting this to a
    # smaller number means getting more frequent TensorBoard updates.
    # Validation stats are also calculated at each epoch end and they
    # might take a while, so don't set this too small to avoid spending
    # a lot of time on validation stats.
    STEPS_PER_EPOCH = 1000

    # Number of validation steps to run at the end of every training epoch.
    # A bigger number improves accuracy of validation stats, but slows
    # down the training.
    VALIDATION_STEPS = 50

    # Backbone network architecture
    # Supported values are: resnet50, resnet101
    BACKBONE = "resnet101"

    # The strides of each layer of the FPN Pyramid. These values
    # are based on a Resnet101 backbone.
    BACKBONE_STRIDES = [4, 8, 16, 32, 64]

    # Number of classification classes (including background)
    NUM_CLASSES = 1  # Override in sub-classes

    # Length of square anchor side in pixels
    RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)

    # Ratios of anchors at each cell (width/height)
    # A value of 1 represents a square anchor, and 0.5 is a wide anchor
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]

    # Anchor stride
    # If 1 then anchors are created for each cell in the backbone feature map.
    # If 2, then anchors are created for every other cell, and so on.
    RPN_ANCHOR_STRIDE = 1

    # Non-max suppression threshold to filter RPN proposals.
    # You can increase this during training to generate more proposals.
    RPN_NMS_THRESHOLD = 0.7

    # How many anchors per image to use for RPN training
    RPN_TRAIN_ANCHORS_PER_IMAGE = 256

    # ROIs kept after non-maximum suppression (training and inference)
    POST_NMS_ROIS_TRAINING = 2000
    POST_NMS_ROIS_INFERENCE = 1000

    # If enabled, resizes instance masks to a smaller size to reduce
    # memory load. Recommended when using high-resolution images.
    USE_MINI_MASK = True
    MINI_MASK_SHAPE = (56, 56)  # (height, width) of the mini-mask

    # Input image resizing
    # Generally, use the "square" resizing mode for training and inferencing
    # and it should work well in most cases. In this mode, images are scaled
    # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
    # scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
    # padded with zeros to make it a square so multiple images can be put
    # in one batch.
    # Available resizing modes:
    # none:   No resizing or padding. Return the image unchanged.
    # square: Resize and pad with zeros to get a square image
    #         of size [max_dim, max_dim].
    # pad64:  Pads width and height with zeros to make them multiples of 64.
    #         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
    #         up before padding. IMAGE_MAX_DIM is ignored in this mode.
    #         The multiple of 64 is needed to ensure smooth scaling of feature
    #         maps up and down the 6 levels of the FPN pyramid (2**6=64).
    # crop:   Picks random crops from the image. First, scales the image based
    #         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
    #         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
    #         IMAGE_MAX_DIM is not used in this mode.
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024
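    # Worked example (square mode, an assumed 600x800 input): scaling by
    # 800/600 = 1.33 would bring the short side to 800 but the long side to
    # 1066 > 1024, so the scale is capped at 1024/800 = 1.28, giving 768x1024,
    # which is then zero-padded to 1024x1024.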
    # Minimum scaling ratio. Checked after IMAGE_MIN_DIM and can force further
    # up scaling. For example, if set to 2 then images are scaled up to double
    # the width and height, or more, even if IMAGE_MIN_DIM doesn't require it.
    # However, in 'square' mode, it can be overruled by IMAGE_MAX_DIM.
    IMAGE_MIN_SCALE = 0

    # Image mean (RGB)
    MEAN_PIXEL = np.array([123.7, 116.8, 103.9])

    # Number of ROIs per image to feed to classifier/mask heads
    # The Mask RCNN paper uses 512 but often the RPN doesn't generate
    # enough positive proposals to fill this and keep a positive:negative
    # ratio of 1:3. You can increase the number of proposals by adjusting
    # the RPN NMS threshold.
    TRAIN_ROIS_PER_IMAGE = 200

    # Percent of positive ROIs used to train classifier/mask heads
    ROI_POSITIVE_RATIO = 0.33
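    # With TRAIN_ROIS_PER_IMAGE = 200, this keeps roughly 66 positive and
    # 134 negative ROIs per image (the 1:3 ratio mentioned above).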

    # Pooled ROIs
    POOL_SIZE = 7
    MASK_POOL_SIZE = 14

    # Shape of output mask
    # To change this you also need to change the neural network mask branch
    MASK_SHAPE = [28, 28]

    # Maximum number of ground truth instances to use in one image
    MAX_GT_INSTANCES = 100

    # Bounding box refinement standard deviation for RPN and final detections.
    RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
    BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])

    # Max number of final detections
    DETECTION_MAX_INSTANCES = 100

    # Minimum probability value to accept a detected instance
    # ROIs below this threshold are skipped
    DETECTION_MIN_CONFIDENCE = 0.7

    # Non-maximum suppression threshold for detection
    DETECTION_NMS_THRESHOLD = 0.3

    # Learning rate and momentum
    # The Mask RCNN paper uses lr=0.02, but on TensorFlow it causes
    # weights to explode. Likely due to differences in optimizer
    # implementation.
    LEARNING_RATE = 0.001
    LEARNING_MOMENTUM = 0.9

    # Weight decay regularization
    WEIGHT_DECAY = 0.0001

    # Loss weights for more precise optimization.
    # Can be used for R-CNN training setup.
    LOSS_WEIGHTS = {
        "rpn_class_loss": 1.,
        "rpn_bbox_loss": 1.,
        "mrcnn_class_loss": 1.,
        "mrcnn_bbox_loss": 1.,
        "mrcnn_mask_loss": 1.
    }

    # Use RPN ROIs or externally generated ROIs for training
    # Keep this True for most situations. Set to False if you want to train
    # the head branches on ROIs generated by code rather than the ROIs from
    # the RPN. For example, to debug the classifier head without having to
    # train the RPN.
    USE_RPN_ROIS = True

    # Train or freeze batch normalization layers
    #   None: Train BN layers. This is the normal mode
    #   False: Freeze BN layers. Good when using a small batch size
    #   True: (don't use). Sets layers in training mode even when inferencing
    TRAIN_BN = False  # Defaulting to False since batch size is often small

    # Gradient norm clipping
    GRADIENT_CLIP_NORM = 5.0

    def __init__(self):
        """Set values of computed attributes."""
        # Effective batch size
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT

        # Input image size
        if self.IMAGE_RESIZE_MODE == "crop":
            self.IMAGE_SHAPE = np.array([self.IMAGE_MIN_DIM, self.IMAGE_MIN_DIM, 3])
        else:
            self.IMAGE_SHAPE = np.array([self.IMAGE_MAX_DIM, self.IMAGE_MAX_DIM, 3])

        # Image meta data length
        # See compose_image_meta() for details
        self.IMAGE_META_SIZE = 1 + 3 + 3 + 4 + 1 + self.NUM_CLASSES
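        # Breakdown (as composed by compose_image_meta() in the Matterport
        # code): 1 image_id + 3 original_image_shape + 3 image_shape +
        # 4 window (y1, x1, y2, x2) + 1 scale + NUM_CLASSES active_class_ids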

    def display(self):
        """Display Configuration values."""
        print("\nConfigurations:")
        for a in dir(self):
            if not a.startswith("__") and not callable(getattr(self, a)):
                print("{:30} {}".format(a, getattr(self, a)))
        print("\n")
