Segmentation Model-Part VII - Training Instance Segmentation in MMDetection
The seventh part of the Segmentation Tutorial Series, a step-by-step guide to developing Instance Segmentation Models in MMDetection
- 1. Semantic Segmentation vs Instance Segmentation
- 1. Problem Description and Dataset
- 2. Data Preparation
- 3. Training instance segmentation problems by MMDetection
In this post, we will cover how to train an instance segmentation model using the MMDetection library.
1. Semantic Segmentation vs Instance Segmentation
We first introduce three related tasks: semantic image segmentation, object detection, and instance segmentation.
- Semantic image segmentation labels every pixel that belongs to a class, but does not separate the individual objects of that class.
- Object detection does not segment objects; it locates each individual object instance with a bounding box.
- Instance segmentation combines the two: it produces a separate mask for each detected object instance.
Nowadays, the instance segmentation problem is usually tackled with the Mask R-CNN model, introduced by [K.He] et al. For more detail about Mask R-CNN, we refer the reader to the article Everything about Mask R-CNN: A Beginner’s Guide.
Mask R-CNN is a widely used model for instance segmentation. For each detected object it produces three outputs: a mask, a class, and a bounding box.
1. Problem Description and Dataset
We will cover nail instance segmentation: we want a bounding box and a segmentation mask for each nail in the picture. The problem comes from a real application, a nail disease classifier. The first step there is to crop the nails out of a given image; each cropped nail image is then fed into the classification model.
With semantic nail segmentation, we could segment the nails in the images and then use post-processing to obtain the bounding box and segmentation of each nail. That method does not work well when nails overlap, because overlapping nails merge into a single region. We therefore treat the task as an instance segmentation problem.
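The post-processing route described above amounts to connected-component labelling on a binary mask. The following is a minimal pure-Python sketch (the function name `label_components` and the toy masks are hypothetical and not part of the original project). It illustrates why two nails that touch in the semantic mask end up as a single region.

```python
from collections import deque


def label_components(mask):
    """Return one [xmin, ymin, xmax, ymax] box per 4-connected foreground
    region of a binary mask given as a list of rows of 0/1 values."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            xmin = xmax = x
            ymin = ymax = y
            while queue:
                cx, cy = queue.popleft()
                xmin, xmax = min(xmin, cx), max(xmax, cx)
                ymin, ymax = min(ymin, cy), max(ymax, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append([xmin, ymin, xmax, ymax])
    return boxes


# Two nails that are clearly separated in the mask.
separate = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
# The same two nails, now touching along the top row.
touching = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
]

print(label_components(separate))  # two boxes
print(label_components(touching))  # a single merged box
```

In the second mask the two objects can no longer be told apart from the pixels alone, which is the situation an instance segmentation model is meant to handle.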
(Figure: sample images and their corresponding masks.)
Mission: We want to have a bounding box and segmentation of each nail in the picture.
Our data is organized as
├── Images
│   ├── 1
│   │   ├── first_image.png
│   │   ├── second_image.png
│   │   ├── third_image.png
│   ├── 2
│   ├── 3
│   ├── 4
├── Masks
│   ├── 1
│   │   ├── first_image.png
│   │   ├── second_image.png
│   │   ├── third_image.png
│   ├── 2
│   ├── 3
│   ├── 4
We have two folders: Images and Masks. Images is the data folder, and Masks is the label folder, which holds the segmentation of each input image. Each folder has four sub-folders, 1, 2, 3, and 4, corresponding to four distribution patterns of nails.
We download the data from link and put it in data_root, for example
data_root = "./nail-segmentation-dataset"
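Given this layout, each image can be matched to its mask by relative path. The sketch below assumes that a mask has the same sub-folder and file name as its image, as in the tree above; the helper name `pair_images_and_masks` is hypothetical.

```python
from pathlib import Path


def pair_images_and_masks(data_root):
    """Return (image, mask) relative paths for every image that has a mask
    with the same sub-folder and file name."""
    root = Path(data_root)
    pairs = []
    for image_path in sorted((root / "Images").rglob("*.png")):
        relative = image_path.relative_to(root / "Images")
        mask_path = root / "Masks" / relative
        if mask_path.is_file():  # skip images without a label
            pairs.append((f"Images/{relative.as_posix()}", f"Masks/{relative.as_posix()}"))
    return pairs
```

Calling `pair_images_and_masks(data_root)` returns paths relative to `data_root`, which is the form used in the CSV files of the next section.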
2. Data Preparation
So far we only have a semantic segmentation dataset. In this part, we build the instance segmentation dataset and save it in the COCO format.
2.1 Make data frame
For convenience, we save all of the dataset information in CSV files with the following columns:
images,masks,width,height
images | masks | width | height |
---|---|---|---|
images/1/filename1.png | masks/1/filename1.png | 256 | 256 |
images/1/filename1.png | masks/1/filename1.png | 256 | 256 |
images/2/filename1.png | masks/2/filename1.png | 256 | 256 |
images/2/filename1.png | masks/2/filename1.png | 256 | 256 |
The make_csv_file function does this for us. It relies on two functions, png2numpy and make_csv_file_npy, in the data_processing.py file.
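The original make_csv_file is not shown here, so the following is only a rough stand-in that produces a CSV with the same four columns. It uses the standard library only and reads the image size from the PNG header; the names `png_size` and `write_info_csv` are hypothetical, and the real project may compute the sizes differently (for example after converting the PNG files to NumPy arrays).

```python
import csv
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE:
        raise ValueError(f"{path} is not a PNG file")
    # Bytes 16..23 hold width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def write_info_csv(data_root, pairs, csv_path):
    """Write one row per (image, mask) pair with the image width and height."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["images", "masks", "width", "height"])
        for image_rel, mask_rel in pairs:
            width, height = png_size(os.path.join(data_root, image_rel))
            writer.writerow([image_rel, mask_rel, width, height])
```

Here `pairs` is a list of (image, mask) paths relative to data_root, one entry per sample.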
2.2 Get coco annotation
We want to convert our semantic segmentation data into instance segmentation data. One of the best-known formats for organizing instance segmentation data is COCO.
A COCO annotation file has the following format:
{
    "images": [images],
    "annotations": [annotations],
    "categories": [categories]
}
Where:
- “images” (type: List[Dict]) is a list of dictionaries, each with the fields
  - “id”: 100, the id of the image
  - “file_name”: “train/images/1/image_100.png”, the path to the image
  - “width”: 1800,
  - “height”: 1626
- “annotations” is a list of dictionaries, each with the fields
  - “id”: 350, the id of the object (not the image id)
  - “image_id”: 100, the id of the image
  - “category_id”: 1, the id of the category
  - “segmentation”: RLE or [polygon],
  - “area”: float,
  - “bbox”: [x, y, width, height],
  - “iscrowd”: 0 or 1
- “categories” is a list of dictionaries, each with the fields
  - “id”: int = 0, the id of the category
  - “name”: str = “nail”
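To make the format concrete, here is a small hand-written example with one image and one object, together with a hypothetical `check_coco` helper that verifies the cross-references between the three lists. The polygon and numbers are made up for illustration, and the checks are a convenience sketch rather than part of any COCO tooling.

```python
import json

coco = {
    "images": [
        {"id": 100, "file_name": "train/images/1/image_100.png", "width": 1800, "height": 1626}
    ],
    "annotations": [
        {
            "id": 350,
            "image_id": 100,  # must match an entry in "images"
            "category_id": 0,  # must match an entry in "categories"
            # One polygon as a flat list [x1, y1, x2, y2, ...]
            "segmentation": [[10.5, 10.5, 60.5, 10.5, 60.5, 40.5, 10.5, 40.5]],
            "area": 1500.0,
            "bbox": [10, 10, 50, 30],  # [x, y, width, height]
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 0, "name": "nail"}],
}


def check_coco(data):
    """Return a list of human-readable problems (empty if none were found)."""
    image_ids = {image["id"] for image in data["images"]}
    category_ids = {category["id"] for category in data["categories"]}
    problems = []
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        _, _, width, height = ann["bbox"]
        if width <= 0 or height <= 0:
            problems.append(f"annotation {ann['id']}: empty bbox")
        if isinstance(ann["segmentation"], list):
            for polygon in ann["segmentation"]:
                # A polygon needs at least three (x, y) points.
                if len(polygon) < 6 or len(polygon) % 2:
                    problems.append(f"annotation {ann['id']}: malformed polygon")
    return problems


print(check_coco(coco))
print(json.dumps(coco["categories"]))
```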
Using the get_annotations function, we can convert the semantic segmentation data into COCO-format instance segmentation data.
import os

import numpy as np
import pandas as pd
from tqdm import tqdm


def get_annotations(dataframe: pd.DataFrame):
    """Convert a dataframe into the COCO format.

    Args:
        dataframe (pd.DataFrame): the dataframe that stores the information
            of the dataset. Its columns are
            images | masks | width | height
    Returns:
        dict: the COCO-format data of the dataset
    """
    cats = [{"id": 0, "name": "nail"}]
    annotations = []
    images = []
    obj_count = 0
    for idx, row in tqdm(dataframe.iterrows(), total=len(dataframe)):
        filename = row.images
        images.append(
            {
                "id": int(idx),
                "file_name": filename,
                # Cast to plain int so that json.dump can serialize the values.
                "width": int(row.width),
                "height": int(row.height),
            }
        )
        binary_mask = read_mask(os.path.join(str(data_root), row.masks))
        # Each contour of the binary mask is treated as one nail instance.
        contours = find_contours(binary_mask)
        for contour in contours:
            xmin = int(np.min(contour[:, :, 0]))
            xmax = int(np.max(contour[:, :, 0]))
            ymin = int(np.min(contour[:, :, 1]))
            ymax = int(np.max(contour[:, :, 1]))
            box_width, box_height = xmax - xmin, ymax - ymin
            # Skip tiny regions, which are most likely noise in the mask.
            if box_width * box_height < 20:
                continue
            poly = contour.flatten().tolist()
            poly = [x + 0.5 for x in poly]
            annotations.append(
                {
                    "image_id": int(idx),
                    "id": obj_count,
                    "category_id": 0,
                    "bbox": [xmin, ymin, box_width, box_height],
                    # Note: this is the bounding-box area, not the polygon area.
                    "area": box_width * box_height,
                    "segmentation": [poly],
                    "iscrowd": 0,
                }
            )
            obj_count += 1
    return {"categories": cats, "images": images, "annotations": annotations}
Where:
- find_contours is a function that returns the contours of a binary mask.
- The dataframe argument is the data frame obtained from make_csv_file, which holds the dataset information.
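One detail worth noting: get_annotations stores the bounding-box area in the “area” field, whereas in the COCO dataset this field holds the area of the segmentation itself, and COCO evaluation uses it to bucket objects by size. If that distinction matters for your use case, the polygon area can be computed with the shoelace formula. The helpers below are an illustrative sketch with hypothetical names, working on the same flat [x1, y1, x2, y2, ...] polygon lists.

```python
def polygon_area(flat_polygon):
    """Area of a simple polygon given as [x1, y1, x2, y2, ...] (shoelace formula)."""
    xs = flat_polygon[0::2]
    ys = flat_polygon[1::2]
    n = len(xs)
    twice_area = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(twice_area) / 2.0


def polygon_bbox(flat_polygon):
    """COCO-style [x, y, width, height] box enclosing the polygon."""
    xs = flat_polygon[0::2]
    ys = flat_polygon[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


triangle = [0, 0, 4, 0, 0, 3]
print(polygon_area(triangle))  # area of the shape itself
print(polygon_bbox(triangle))  # its enclosing box is twice as large
```

For a rectangle the two areas coincide; for rounder shapes such as nails the bounding-box area overestimates the mask area.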
We then save the annotations as JSON files with the get_json_coco function:
import json
import os

import pandas as pd


def get_json_coco(args) -> None:
    # args is kept for compatibility with the caller; it is not used here.
    train_df = pd.read_csv(f"{data_root}/csv_file/train_info.csv")
    valid_df = pd.read_csv(f"{data_root}/csv_file/valid_info.csv")
    coco_json = os.path.join(data_root, "annotations")
    os.makedirs(coco_json, exist_ok=True)
    train_json = get_annotations(train_df)
    valid_json = get_annotations(valid_df)
    with open(f"{coco_json}/train.json", "w", encoding="utf-8") as f:
        json.dump(train_json, f, ensure_ascii=True, indent=4)
    with open(f"{coco_json}/valid.json", "w", encoding="utf-8") as f:
        json.dump(valid_json, f, ensure_ascii=True, indent=4)
For more details, the source code is available on GitHub.
3. Training instance segmentation problems by MMDetection
3.1 MMDetection
MMDetection is an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules. It is built on top of PyTorch.
MMDetection decomposes the detection framework into separate components, so a customized object detection framework can be built by combining different modules. In this part, we look at how the instance segmentation framework is decomposed and how to modify its components in order to train an instance segmentation model.
To train an instance segmentation or object detection model, we go through three steps:
- Prepare the customized dataset
- Prepare a config
- Train, test, and run inference with models on the customized dataset.
In the second part, we converted our dataset into the COCO format. With the COCO format, we can easily reuse existing configurations.
3.2 Modify the config.
Config is all we need
To run instance segmentation or object detection, all we need to do is define a good config. The config file contains all of the information needed to train a model.
Examples of configurations are given in config. There are many configs that help in building a customized config. For convenience, we download them and put them in the repository of the MMDetection tutorial.
A config can be decomposed into four parts:
- model: defines the model architecture and loss functions
- dataset: defines the data pipeline
- schedules: defines the optimizer and the learning rate schedule
- default_runtime: defines logging and checkpointing.
In configs/_base_ there are examples for each module:
├── configs
│   ├── _base_
│   │   ├── datasets
│   │   ├── models
│   │   ├── schedules
│   │   ├── default_runtime.py
Also, inside configs, there are many sub-folders of configs, one per model architecture.
For example:
configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco.py
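Here:
- mask_rcnn: the model type (Mask R-CNN)
- r50: the backbone of the model (ResNet-50)
- caffe: the pretrained backbone uses Caffe-style weights
- fpn: the neck is a feature pyramid network
- mstrain: multi-scale images are used in the training data pipeline
- poly: in MMDetection's naming, the masks are kept as polygons in the data pipeline rather than converted to bitmaps (it is not a learning rate schedule)
- 1x: the training schedule, 12 max_epochs
- coco: the dataset is in the COCO format.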
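As a memory aid, the naming pattern can be taken apart programmatically. This toy parser is purely illustrative: `describe_config_name` is a hypothetical helper, MMDetection offers nothing of the sort as far as I know, and the sketch only handles names that follow the model_backbone_options_schedule_dataset pattern with an "Nx" schedule.

```python
import re


def describe_config_name(path):
    """Split an MMDetection-style config file name into its naming components."""
    stem = path.rsplit("/", 1)[-1]
    if stem.endswith(".py"):
        stem = stem[:-3]
    tokens = stem.split("_")
    # The first token that looks like a backbone (r50, r101, x101, ...) ends the model name.
    backbone_idx = next(i for i, tok in enumerate(tokens) if re.fullmatch(r"[rx]\d+", tok))
    # The schedule token looks like 1x, 2x, ...; "1x" corresponds to 12 epochs.
    schedule_idx = next(i for i, tok in enumerate(tokens) if re.fullmatch(r"\d+x", tok))
    return {
        "model": "_".join(tokens[:backbone_idx]),
        "backbone": tokens[backbone_idx],
        "options": tokens[backbone_idx + 1 : schedule_idx],
        "epochs": 12 * int(tokens[schedule_idx][:-1]),
        "dataset": "_".join(tokens[schedule_idx + 1 :]),
    }


info = describe_config_name("configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco.py")
print(info)
```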
In this post, we focus on two modules: dataset and model and set the schedules and default_runtime as default.
Modify the model config
For nail segmentation there is a single object class (nail), so we redefine the model as:
# The new config inherits a base config to highlight the necessary modification
_base_ = "mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco.py"
# We also need to change the num_classes in head to match the dataset's annotation
model = dict(roi_head=dict(bbox_head=dict(num_classes=1), mask_head=dict(num_classes=1)))
Here:
- We inherit from the config mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco.py
- We only need to redefine num_classes in the bbox_head and the mask_head.
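The reason such a short override is enough is that the child config is merged into the base config key by key, so only the listed leaves change. The following is a simplified sketch of that idea with a hypothetical `merge_config` function and an abridged base model dict; the real merging in mmcv has additional features (such as deleting inherited keys) that are not modelled here.

```python
import copy


def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`; nested dicts are
    merged key by key, any other value replaces the base value."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Abridged stand-in for the inherited Mask R-CNN model config (80 COCO classes).
base_model = dict(
    type="MaskRCNN",
    roi_head=dict(
        bbox_head=dict(type="Shared2FCBBoxHead", num_classes=80),
        mask_head=dict(type="FCNMaskHead", num_classes=80),
    ),
)
override = dict(roi_head=dict(bbox_head=dict(num_classes=1), mask_head=dict(num_classes=1)))

nail_model = merge_config(base_model, override)
print(nail_model["roi_head"]["bbox_head"])
```

Everything not mentioned in the override, such as the head types, is inherited unchanged.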
Modify the data pipeline config
For the data pipeline:
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type="CocoDataset",
        img_prefix=data_root,
        classes=cfg.classes,
        ann_file=f"{data_root}/annotations/train.json",
        pipeline=cfg.train_pipeline,
    ),
    val=dict(
        type="CocoDataset",
        img_prefix=data_root,
        classes=cfg.classes,
        ann_file=f"{data_root}/annotations/valid.json",
        pipeline=cfg.test_pipeline,
    ),
    test=dict(
        type="CocoDataset",
        img_prefix=data_root,
        classes=cfg.classes,
        ann_file=f"{data_root}/annotations/valid.json",
        pipeline=cfg.test_pipeline,
    ),
)
Here:
- type: “CocoDataset”, because we use the COCO format.
- img_prefix: the path to the image directory.
- ann_file: the path to the JSON annotation file.
- classes: the classes of the dataset. Here classes = [“nail”]
- pipeline: the data processing pipeline, which is defined as
train_pipeline = [
    dict(type="LoadImageFromFile"),
    dict(type="LoadAnnotations", with_bbox=True, with_mask=True),
    dict(
        type="Resize",
        img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)],
        multiscale_mode="value",
        keep_ratio=True,
    ),
    dict(type="RandomFlip", flip_ratio=0.5),
    dict(type="Normalize", **img_norm_cfg),
    dict(type="Pad", size_divisor=32),
    dict(type="DefaultFormatBundle"),
    dict(type="Collect", keys=["img", "gt_bboxes", "gt_labels", "gt_masks"]),
]
test_pipeline = [
    dict(type="LoadImageFromFile"),
    dict(
        type="MultiScaleFlipAug",
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type="Resize", keep_ratio=True),
            dict(type="RandomFlip"),
            dict(type="Normalize", **img_norm_cfg),
            dict(type="Pad", size_divisor=32),
            dict(type="ImageToTensor", keys=["img"]),
            dict(type="Collect", keys=["img"]),
        ],
    ),
]
Note: we want to use multi-scale images in the training pipeline, so we set
img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)]
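To get a feel for what these scales do, here is a small sketch of the size computation as I understand it for keep_ratio=True with multiscale_mode="value": one (long edge, short edge) pair is picked at random per image, and the image is scaled by the largest factor that keeps both edges within that limit. The function names are hypothetical and the arithmetic is a simplified illustration, not the library code.

```python
import random

TRAIN_SCALES = [(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)]


def rescale_size(width, height, scale):
    """New (width, height) after an aspect-preserving resize that fits the long
    edge within max(scale) and the short edge within min(scale)."""
    max_long_edge, max_short_edge = max(scale), min(scale)
    factor = min(max_long_edge / max(width, height), max_short_edge / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


def sample_train_size(width, height, rng=random):
    """Pick one of the training scales at random, as multiscale_mode="value" does."""
    return rescale_size(width, height, rng.choice(TRAIN_SCALES))


print(rescale_size(256, 256, (1333, 800)))    # square image: limited by the short edge
print(rescale_size(1800, 1626, (1333, 800)))  # the 1800 x 1626 example image from above
print(sample_train_size(256, 256))            # varies from call to call
```

For the 256 x 256 images in this dataset, the short-edge limit is always the binding one, so training images end up between 640 and 800 pixels on each side.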
3.3 Training
Once we have the config file (see [nail_config.py]), we can start training the model.
For that we will:
- import the config file
- define the model module
- define the data pipeline
- train the model with an api
Import the Config by using the mmcv library:
import mmcv
# Config objects are loaded from a file with Config.fromfile (mmcv 1.x / MMDetection 2.x).
cfg = mmcv.Config.fromfile("configs/nail_config.py")
Using the build_detector and build_dataset apis of the mmdetection library, we can easily build the model and dataset.
Build the model from the Config by using the build_detector api:
from mmdet.models import build_detector
model = build_detector(cfg.model, train_cfg=cfg.get("train_cfg"), test_cfg=cfg.get("test_cfg"))
model.init_weights()
Build the data pipeline from the Config by using the build_dataset api:
from mmdet.datasets import build_dataset
datasets = [build_dataset(cfg.data.train)]
Train the model with the train_detector api:
from mmdet.apis import train_detector
# train_detector also needs the config; depending on the MMDetection version,
# fields such as cfg.work_dir and cfg.gpu_ids may have to be set beforehand.
train_detector(model, datasets, cfg, distributed=False, validate=True)
After 40 epochs, we can see the model is training well.
For more details, the source code is available on GitHub.