import torch
from torchvision.models.resnet import ResNet, Bottleneck, ResNet101_Weights

def resnet_101():
    resnet = ResNet(block=Bottleneck, layers=[3, 4, 23, 3])
    resnet.load_state_dict(ResNet101_Weights.DEFAULT.get_state_dict(progress=True))
    return resnet

resnet = resnet_101()
Following up on my last post about eager mode quantization in PyTorch, in this post I’ll use PyTorch’s FX graph mode quantization to quantize the same R-CNN. There are significant differences between the two quantization methods; here I’ll touch on those as well as demonstrate how to quantize using FX graph mode.
At the time of writing, FX graph mode quantization is still a prototype feature and not quite as mature as eager mode, which is currently in beta. That said, development effort appears to be concentrated on FX graph mode, and it’s even encouraged over eager mode for first-time users.
Symbolic Tracing
FX graph mode quantization requires the network to be symbolically traceable. Under the hood, PyTorch converts the nn.Module
network to an alternative format referred to as an intermediate representation (IR). This IR needs to be an accurate and consistent representation of the network regardless of the data flowing through it, so any parts of the network with data-dependent control flow aren’t supported. Note that making a network symbolically traceable can require hacky modifications to code that would otherwise be unnecessary. Users have complained about this, and it’s something to keep in mind when considering this method.
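As a concrete illustration, here’s a toy module of my own (not from the post) that fails symbolic tracing because its branch depends on the input values:
import torch
from torch import nn

class DataDependent(nn.Module):
    def forward(self, x):
        if x.sum() > 0:  # control flow depends on the data in x
            return x * 2
        return x

try:
    torch.fx.symbolic_trace(DataDependent())
except torch.fx.proxy.TraceError as err:
    # "symbolically traced variables cannot be used as inputs to control flow"
    print(err)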
Automation
Once your network is symbolically traceable, you’ll be in for a treat compared to eager mode. The biggest advantages of FX graph mode quantization are:
- module fusion occurs automatically, something that could otherwise be tedious or error-prone depending on the complexity and size of your network
- functionals and torch ops also get converted automagically. In this case, that means there’s no need to modify the bottleneck block to use float functional as was done in the previous post
- there’s no requirement to insert quant/dequant stubs in the network, which means you can avoid creating those additional wrapper classes
Significant time and effort was invested in doing the above with eager mode. Assuming that getting your network to a symbolically traceable state isn’t more of a time sink, FX graph mode can be a better choice.
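To give a sense of how compact the workflow is, here’s a minimal post-training sketch on a toy module (my own example; the module and names are hypothetical, not from the post). The conv, batch norm, and functional relu are fused and observed without any manual changes:
import torch
from torch import nn
from torch.ao.quantization import quantize_fx
from torch.ao.quantization.qconfig_mapping import get_default_qconfig_mapping

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)

    def forward(self, x):
        return torch.relu(self.bn(self.conv(x)))  # plain torch op, no FloatFunctional

example_inputs = (torch.randn(1, 3, 32, 32),)
prepared = quantize_fx.prepare_fx(Toy().eval(), get_default_qconfig_mapping("fbgemm"), example_inputs)
prepared(*example_inputs)                     # calibrate with representative data
quantized = quantize_fx.convert_fx(prepared)  # conv+bn+relu fused and quantized automatically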
Verification and Model Preparation
With that out of the way, let’s dive into FX graph mode and QAT. As before, we start by creating the resnet backbone (the code at the top of this post), this time without modifying the bottleneck to use the float functional operator.
At this point the resnet is fully traceable. Tracing it with an example input returns a ScriptModule
which can be used to get a representation of the graph’s forward method.
traced_module = torch.jit.trace(resnet, torch.rand(1, 3, 200, 200))
print(traced_module.code)
def forward(self,
x: Tensor) -> Tensor:
fc = self.fc
avgpool = self.avgpool
layer4 = self.layer4
layer3 = self.layer3
layer2 = self.layer2
layer1 = self.layer1
maxpool = self.maxpool
relu = self.relu
bn1 = self.bn1
conv1 = self.conv1
_0 = (relu).forward((bn1).forward((conv1).forward(x, ), ), )
_1 = (layer1).forward((maxpool).forward(_0, ), )
_2 = (layer3).forward((layer2).forward(_1, ), )
_3 = (avgpool).forward((layer4).forward(_2, ), )
input = torch.flatten(_3, 1)
return (fc).forward(input, )
Just as was done during eager mode preparation, the next step is to use torchvision’s IntermediateLayerGetter
helper to extract intermediate layer outputs from the resnet to feed to the FPN.
from torchvision.models._utils import IntermediateLayerGetter

returned_layers = [1, 2, 3, 4]
return_layers = {f"layer{k}": str(v) for v, k in enumerate(returned_layers)}
resnet_layers = IntermediateLayerGetter(resnet, return_layers=return_layers)
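To see what the layer getter produces, we can run a dummy input through it; the shapes in the comment are what I’d expect for the 200×200 input used throughout (a quick sanity check of my own, not from the original post):
out = resnet_layers(torch.rand(1, 3, 200, 200))
print([(name, tuple(feat.shape)) for name, feat in out.items()])
# [('0', (1, 256, 50, 50)), ('1', (1, 512, 25, 25)),
#  ('2', (1, 1024, 13, 13)), ('3', (1, 2048, 7, 7))]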
As we saw before, the result of the layer getter is a module dict which returns an ordered dict from its forward method. If we attempt to trace it in strict mode, JIT will complain because of the mutable output type:
AttributeError
...
AttributeError: expected a dictionary of (method_name, input) pairs
This can be ignored if we “are sure that the container you are using in your problem is a constant structure and does not get used as control flow (if, for) conditions.” Since we know the output structure won’t change, we can safely set strict=False
. Note that this isn’t necessary for QAT preparation, but it’s helpful to know a priori whether the parts of the model we intend to quantize are indeed traceable.
traced_module = torch.jit.trace(resnet_layers, torch.rand(1, 3, 200, 200), strict=False)
print(traced_module.code)
def forward(self,
x: Tensor) -> Dict[str, Tensor]:
layer4 = self.layer4
layer3 = self.layer3
layer2 = self.layer2
layer1 = self.layer1
maxpool = self.maxpool
relu = self.relu
bn1 = self.bn1
conv1 = self.conv1
_0 = (relu).forward((bn1).forward((conv1).forward(x, ), ), )
_1 = (layer1).forward((maxpool).forward(_0, ), )
_2 = (layer2).forward(_1, )
_3 = (layer3).forward(_2, )
_4 = {"0": _1, "1": _2, "2": _3, "3": (layer4).forward(_3, )}
return _4
Now to create the backbone with FPN, this time without any modifications. After this, the module can be traced. Because the output is a mutable type (an ordered dict) whose structure will not change, strict mode again needs to be set to false. If you don’t, you get a slightly different error (shown below), but the reason is the same.
RuntimeError: Encountering a dict at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
from torchvision.models.detection.backbone_utils import BackboneWithFPN

in_channels_stage2 = resnet.inplanes // 8
in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers]
out_channels = 256
returned_layers = [1, 2, 3, 4]
return_layers = {f"layer{k}": str(v) for v, k in enumerate(returned_layers)}

bb_fpn = BackboneWithFPN(
    backbone=resnet,
    return_layers=return_layers,
    in_channels_list=in_channels_list,
    out_channels=out_channels
)

traced_module = torch.jit.trace(bb_fpn, torch.rand(1, 3, 200, 200), strict=False)
print(traced_module.code)
def forward(self,
x: Tensor) -> Dict[str, Tensor]:
fpn = self.fpn
body = self.body
_0, _1, _2, _3, = (body).forward(x, )
_4, _5, _6, _7, _8, = (fpn).forward(_0, _1, _2, _3, )
_9 = {"0": _4, "1": _5, "2": _6, "3": _7, "pool": _8}
return _9
We’ve verified the backbone with FPN is indeed traceable, so now we can create the R-CNN and prepare the model for QAT. During preparation, FX graph mode will automatically insert observers and fuse modules. The returned model is now a GraphModule and looks something like:
GraphModule(
(activation_post_process_0): HistogramObserver(min_val=inf, max_val=-inf)
(body): Module(
(conv1): ConvBnReLU2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(weight_fake_quant): PerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Module(
(0): Module(
(conv1): ConvBnReLU2d(
64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(weight_fake_quant): PerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
)
(conv2): ConvBnReLU2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(weight_fake_quant): PerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
...
Note that preparation now requires an example input to determine the output types. As before, I’ll freeze the first layer along with batch norm stats.
%%capture
import re
from torchvision.models.detection.faster_rcnn import FasterRCNN
from torch.ao.quantization import quantize_fx
from torch.ao.quantization.qconfig_mapping import get_default_qconfig_mapping

quant_rcnn = FasterRCNN(bb_fpn, num_classes=2)

example_input = torch.randn(1, 3, 200, 200)

quant_rcnn.train()
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
quant_rcnn.backbone = quantize_fx.prepare_qat_fx(quant_rcnn.backbone, qconfig_mapping, example_input)

quant_rcnn = quant_rcnn.apply(torch.nn.intrinsic.qat.freeze_bn_stats)

for name, parameter in quant_rcnn.named_parameters():
    if re.search(r"body.conv1", name) or re.search(r"body.layer1", name):
        parameter.requires_grad = False
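As a sanity check (my own addition, not in the original post), we can confirm that the early layers were frozen and that a weight fake-quant/observer was attached during preparation:
# parameters matched by the regexes above should no longer require grad
frozen = [name for name, p in quant_rcnn.named_parameters() if not p.requires_grad]
print(len(frozen), frozen[:2])  # e.g. ['backbone.body.conv1.weight', ...]

# prepared QAT modules carry a weight fake-quant attribute
print(quant_rcnn.backbone.body.conv1.weight_fake_quant)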
QAT and Model Conversion
As in the previous post, I’ll use the PennFudan dataset from the Torchvision object detection finetuning tutorial.
import os

from torchvision.io import read_image
from torchvision.ops.boxes import masks_to_boxes
from torchvision import tv_tensors
from torchvision.transforms.v2 import functional as F
from torchvision.transforms import v2 as T

class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = read_image(img_path)
        mask = read_image(mask_path)
        # instances are encoded as different colors
        obj_ids = torch.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]
        num_objs = len(obj_ids)

        # split the color-encoded mask into a set
        # of binary masks
        masks = (mask == obj_ids[:, None, None]).to(dtype=torch.uint8)

        # get bounding box coordinates for each mask
        boxes = masks_to_boxes(masks)

        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)

        image_id = idx
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        # Wrap sample and targets into torchvision tv_tensors:
        img = tv_tensors.Image(img)

        target = {}
        target["boxes"] = tv_tensors.BoundingBoxes(boxes, format="XYXY", canvas_size=F.get_size(img))
        target["masks"] = tv_tensors.Mask(masks)
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)

def get_transform(train):
    transforms = []
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    transforms.append(T.ToDtype(torch.float, scale=True))
    transforms.append(T.ToPureTensor())
    return T.Compose(transforms)
%%capture
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/engine.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/utils.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_utils.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_eval.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/transforms.py")
!wget https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip
!unzip PennFudanPed.zip -d ./
import utils
from engine import train_one_epoch, evaluate

# train on the GPU or on the CPU, if a GPU is not available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# our dataset has two classes only - background and person
num_classes = 2
# use our dataset and defined transformations
dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))

# split the dataset in train and test set
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[:-50])
dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=1,
    shuffle=True,
    num_workers=1,
    collate_fn=utils.collate_fn
)
data_loader_test = torch.utils.data.DataLoader(
    dataset_test,
    batch_size=1,
    shuffle=False,
    num_workers=1,
    collate_fn=utils.collate_fn
)

# move model to the right device
quant_rcnn.to(device)

# construct an optimizer
params = [p for p in quant_rcnn.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(
    params,
    lr=0.005,
    momentum=0.9,
    weight_decay=0.0005
)

# and a learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer,
    step_size=3,
    gamma=0.1
)
# let's train it for 10 epochs
= 10
num_epochs
for epoch in range(num_epochs):
# train for one epoch, printing every 10 iterations
=20)
train_one_epoch(quant_rcnn, optimizer, data_loader, device, epoch, print_freq# update the learning rate
lr_scheduler.step()# evaluate on the test dataset
=device) evaluate(quant_rcnn, data_loader_test, device
Epoch: [0] [ 0/120] eta: 0:00:51 lr: 0.000047 loss: 1.4356 (1.4356) loss_classifier: 0.6798 (0.6798) loss_box_reg: 0.0042 (0.0042) loss_objectness: 0.6744 (0.6744) loss_rpn_box_reg: 0.0773 (0.0773) time: 0.4267 data: 0.1246 max mem: 4118
Epoch: [0] [ 20/120] eta: 0:00:34 lr: 0.000886 loss: 0.7461 (0.8032) loss_classifier: 0.1333 (0.2406) loss_box_reg: 0.0412 (0.0626) loss_objectness: 0.5052 (0.4778) loss_rpn_box_reg: 0.0116 (0.0222) time: 0.3406 data: 0.0036 max mem: 4118
Epoch: [0] [ 40/120] eta: 0:00:27 lr: 0.001726 loss: 0.3244 (0.5867) loss_classifier: 0.1122 (0.1902) loss_box_reg: 0.0930 (0.0879) loss_objectness: 0.0811 (0.2863) loss_rpn_box_reg: 0.0191 (0.0224) time: 0.3351 data: 0.0035 max mem: 4118
Epoch: [0] [ 60/120] eta: 0:00:20 lr: 0.002565 loss: 0.3139 (0.5107) loss_classifier: 0.1092 (0.1722) loss_box_reg: 0.1229 (0.1013) loss_objectness: 0.0516 (0.2141) loss_rpn_box_reg: 0.0137 (0.0231) time: 0.3380 data: 0.0034 max mem: 4118
Epoch: [0] [ 80/120] eta: 0:00:13 lr: 0.003405 loss: 0.2165 (0.4596) loss_classifier: 0.0779 (0.1560) loss_box_reg: 0.0868 (0.1079) loss_objectness: 0.0359 (0.1730) loss_rpn_box_reg: 0.0122 (0.0226) time: 0.3359 data: 0.0038 max mem: 4118
Epoch: [0] [100/120] eta: 0:00:06 lr: 0.004244 loss: 0.1732 (0.4241) loss_classifier: 0.0546 (0.1431) loss_box_reg: 0.0924 (0.1116) loss_objectness: 0.0328 (0.1473) loss_rpn_box_reg: 0.0105 (0.0221) time: 0.3306 data: 0.0034 max mem: 4118
Epoch: [0] [119/120] eta: 0:00:00 lr: 0.005000 loss: 0.1507 (0.4001) loss_classifier: 0.0599 (0.1343) loss_box_reg: 0.0839 (0.1141) loss_objectness: 0.0171 (0.1304) loss_rpn_box_reg: 0.0104 (0.0213) time: 0.3301 data: 0.0033 max mem: 4118
Epoch: [0] Total time: 0:00:40 (0.3369 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:18 model_time: 0.2536 (0.2536) evaluator_time: 0.0039 (0.0039) time: 0.3654 data: 0.1060 max mem: 4118
Test: [49/50] eta: 0:00:00 model_time: 0.2215 (0.2262) evaluator_time: 0.0017 (0.0025) time: 0.2302 data: 0.0030 max mem: 4118
Test: Total time: 0:00:11 (0.2365 s / it)
Averaged stats: model_time: 0.2215 (0.2262) evaluator_time: 0.0017 (0.0025)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.251
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.661
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.067
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.024
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.274
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.116
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.387
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.400
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.022
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.438
Epoch: [1] [ 0/120] eta: 0:00:54 lr: 0.005000 loss: 0.1070 (0.1070) loss_classifier: 0.0222 (0.0222) loss_box_reg: 0.0459 (0.0459) loss_objectness: 0.0226 (0.0226) loss_rpn_box_reg: 0.0164 (0.0164) time: 0.4527 data: 0.0874 max mem: 4118
Epoch: [1] [ 20/120] eta: 0:00:34 lr: 0.005000 loss: 0.2121 (0.2348) loss_classifier: 0.0590 (0.0804) loss_box_reg: 0.0915 (0.1130) loss_objectness: 0.0149 (0.0175) loss_rpn_box_reg: 0.0153 (0.0238) time: 0.3356 data: 0.0033 max mem: 4118
Epoch: [1] [ 40/120] eta: 0:00:27 lr: 0.005000 loss: 0.2197 (0.2434) loss_classifier: 0.0628 (0.0799) loss_box_reg: 0.1179 (0.1204) loss_objectness: 0.0132 (0.0157) loss_rpn_box_reg: 0.0287 (0.0274) time: 0.3409 data: 0.0036 max mem: 4118
Epoch: [1] [ 60/120] eta: 0:00:20 lr: 0.005000 loss: 0.2105 (0.2374) loss_classifier: 0.0591 (0.0756) loss_box_reg: 0.1001 (0.1178) loss_objectness: 0.0117 (0.0164) loss_rpn_box_reg: 0.0176 (0.0276) time: 0.3342 data: 0.0034 max mem: 4118
Epoch: [1] [ 80/120] eta: 0:00:13 lr: 0.005000 loss: 0.1133 (0.2263) loss_classifier: 0.0351 (0.0707) loss_box_reg: 0.0642 (0.1158) loss_objectness: 0.0089 (0.0152) loss_rpn_box_reg: 0.0112 (0.0246) time: 0.3325 data: 0.0034 max mem: 4118
Epoch: [1] [100/120] eta: 0:00:06 lr: 0.005000 loss: 0.1559 (0.2291) loss_classifier: 0.0573 (0.0720) loss_box_reg: 0.0860 (0.1200) loss_objectness: 0.0060 (0.0138) loss_rpn_box_reg: 0.0135 (0.0233) time: 0.3367 data: 0.0034 max mem: 4118
Epoch: [1] [119/120] eta: 0:00:00 lr: 0.005000 loss: 0.1403 (0.2247) loss_classifier: 0.0440 (0.0706) loss_box_reg: 0.0860 (0.1195) loss_objectness: 0.0046 (0.0124) loss_rpn_box_reg: 0.0103 (0.0221) time: 0.3353 data: 0.0034 max mem: 4118
Epoch: [1] Total time: 0:00:40 (0.3375 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:19 model_time: 0.2691 (0.2691) evaluator_time: 0.0068 (0.0068) time: 0.3851 data: 0.1073 max mem: 4118
Test: [49/50] eta: 0:00:00 model_time: 0.2217 (0.2271) evaluator_time: 0.0014 (0.0024) time: 0.2323 data: 0.0040 max mem: 4118
Test: Total time: 0:00:11 (0.2382 s / it)
Averaged stats: model_time: 0.2217 (0.2271) evaluator_time: 0.0014 (0.0024)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.452
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.912
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.297
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.006
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.186
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.485
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.223
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.556
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.565
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.250
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.378
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
Epoch: [2] [ 0/120] eta: 0:00:58 lr: 0.005000 loss: 0.3480 (0.3480) loss_classifier: 0.0969 (0.0969) loss_box_reg: 0.2045 (0.2045) loss_objectness: 0.0084 (0.0084) loss_rpn_box_reg: 0.0382 (0.0382) time: 0.4848 data: 0.0990 max mem: 4118
Epoch: [2] [ 20/120] eta: 0:00:34 lr: 0.005000 loss: 0.1341 (0.1999) loss_classifier: 0.0486 (0.0573) loss_box_reg: 0.0765 (0.1212) loss_objectness: 0.0035 (0.0042) loss_rpn_box_reg: 0.0100 (0.0172) time: 0.3420 data: 0.0034 max mem: 4118
Epoch: [2] [ 40/120] eta: 0:00:27 lr: 0.005000 loss: 0.0984 (0.1881) loss_classifier: 0.0389 (0.0531) loss_box_reg: 0.0616 (0.1145) loss_objectness: 0.0021 (0.0049) loss_rpn_box_reg: 0.0110 (0.0156) time: 0.3282 data: 0.0035 max mem: 4118
Epoch: [2] [ 60/120] eta: 0:00:20 lr: 0.005000 loss: 0.1335 (0.1769) loss_classifier: 0.0371 (0.0497) loss_box_reg: 0.0836 (0.1082) loss_objectness: 0.0018 (0.0047) loss_rpn_box_reg: 0.0074 (0.0143) time: 0.3309 data: 0.0036 max mem: 4118
Epoch: [2] [ 80/120] eta: 0:00:13 lr: 0.005000 loss: 0.1453 (0.1743) loss_classifier: 0.0383 (0.0483) loss_box_reg: 0.0852 (0.1070) loss_objectness: 0.0028 (0.0044) loss_rpn_box_reg: 0.0136 (0.0147) time: 0.3397 data: 0.0034 max mem: 4118
Epoch: [2] [100/120] eta: 0:00:06 lr: 0.005000 loss: 0.1703 (0.1798) loss_classifier: 0.0410 (0.0487) loss_box_reg: 0.1154 (0.1093) loss_objectness: 0.0050 (0.0048) loss_rpn_box_reg: 0.0190 (0.0170) time: 0.3397 data: 0.0036 max mem: 4118
Epoch: [2] [119/120] eta: 0:00:00 lr: 0.005000 loss: 0.1078 (0.1789) loss_classifier: 0.0296 (0.0482) loss_box_reg: 0.0621 (0.1084) loss_objectness: 0.0037 (0.0050) loss_rpn_box_reg: 0.0111 (0.0174) time: 0.3413 data: 0.0035 max mem: 4118
Epoch: [2] Total time: 0:00:40 (0.3392 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:20 model_time: 0.2938 (0.2938) evaluator_time: 0.0044 (0.0044) time: 0.4081 data: 0.1080 max mem: 4118
Test: [49/50] eta: 0:00:00 model_time: 0.2222 (0.2301) evaluator_time: 0.0012 (0.0019) time: 0.2304 data: 0.0031 max mem: 4118
Test: Total time: 0:00:11 (0.2398 s / it)
Averaged stats: model_time: 0.2222 (0.2301) evaluator_time: 0.0012 (0.0019)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.935
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.320
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.050
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.225
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.236
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.548
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.563
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.050
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.511
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.576
Epoch: [3] [ 0/120] eta: 0:00:53 lr: 0.000500 loss: 0.1714 (0.1714) loss_classifier: 0.0475 (0.0475) loss_box_reg: 0.1006 (0.1006) loss_objectness: 0.0148 (0.0148) loss_rpn_box_reg: 0.0085 (0.0085) time: 0.4497 data: 0.0858 max mem: 4118
Epoch: [3] [ 20/120] eta: 0:00:33 lr: 0.000500 loss: 0.1223 (0.1611) loss_classifier: 0.0343 (0.0433) loss_box_reg: 0.0736 (0.0934) loss_objectness: 0.0054 (0.0082) loss_rpn_box_reg: 0.0094 (0.0161) time: 0.3345 data: 0.0034 max mem: 4118
Epoch: [3] [ 40/120] eta: 0:00:27 lr: 0.000500 loss: 0.1039 (0.1498) loss_classifier: 0.0250 (0.0406) loss_box_reg: 0.0655 (0.0887) loss_objectness: 0.0042 (0.0069) loss_rpn_box_reg: 0.0079 (0.0136) time: 0.3398 data: 0.0043 max mem: 4118
Epoch: [3] [ 60/120] eta: 0:00:20 lr: 0.000500 loss: 0.1237 (0.1445) loss_classifier: 0.0331 (0.0395) loss_box_reg: 0.0722 (0.0850) loss_objectness: 0.0045 (0.0065) loss_rpn_box_reg: 0.0057 (0.0135) time: 0.3353 data: 0.0036 max mem: 4118
Epoch: [3] [ 80/120] eta: 0:00:13 lr: 0.000500 loss: 0.1128 (0.1440) loss_classifier: 0.0337 (0.0392) loss_box_reg: 0.0734 (0.0857) loss_objectness: 0.0039 (0.0061) loss_rpn_box_reg: 0.0069 (0.0130) time: 0.3368 data: 0.0034 max mem: 4118
Epoch: [3] [100/120] eta: 0:00:06 lr: 0.000500 loss: 0.1441 (0.1482) loss_classifier: 0.0359 (0.0405) loss_box_reg: 0.0901 (0.0893) loss_objectness: 0.0033 (0.0058) loss_rpn_box_reg: 0.0099 (0.0126) time: 0.3267 data: 0.0035 max mem: 4118
Epoch: [3] [119/120] eta: 0:00:00 lr: 0.000500 loss: 0.1051 (0.1469) loss_classifier: 0.0276 (0.0398) loss_box_reg: 0.0678 (0.0895) loss_objectness: 0.0020 (0.0054) loss_rpn_box_reg: 0.0067 (0.0122) time: 0.3282 data: 0.0040 max mem: 4118
Epoch: [3] Total time: 0:00:40 (0.3354 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:18 model_time: 0.2583 (0.2583) evaluator_time: 0.0039 (0.0039) time: 0.3704 data: 0.1063 max mem: 4118
Test: [49/50] eta: 0:00:00 model_time: 0.2236 (0.2284) evaluator_time: 0.0012 (0.0017) time: 0.2295 data: 0.0030 max mem: 4118
Test: Total time: 0:00:11 (0.2384 s / it)
Averaged stats: model_time: 0.2236 (0.2284) evaluator_time: 0.0012 (0.0017)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.538
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.950
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.583
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.327
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.562
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.269
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.604
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.606
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.511
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.625
Epoch: [4] [ 0/120] eta: 0:00:57 lr: 0.000500 loss: 0.0602 (0.0602) loss_classifier: 0.0157 (0.0157) loss_box_reg: 0.0431 (0.0431) loss_objectness: 0.0002 (0.0002) loss_rpn_box_reg: 0.0012 (0.0012) time: 0.4826 data: 0.0982 max mem: 4118
Epoch: [4] [ 20/120] eta: 0:00:34 lr: 0.000500 loss: 0.1051 (0.1215) loss_classifier: 0.0246 (0.0325) loss_box_reg: 0.0664 (0.0777) loss_objectness: 0.0020 (0.0024) loss_rpn_box_reg: 0.0048 (0.0089) time: 0.3355 data: 0.0034 max mem: 4118
Epoch: [4] [ 40/120] eta: 0:00:27 lr: 0.000500 loss: 0.1095 (0.1227) loss_classifier: 0.0317 (0.0330) loss_box_reg: 0.0637 (0.0783) loss_objectness: 0.0026 (0.0028) loss_rpn_box_reg: 0.0056 (0.0085) time: 0.3335 data: 0.0035 max mem: 4118
Epoch: [4] [ 60/120] eta: 0:00:20 lr: 0.000500 loss: 0.1261 (0.1309) loss_classifier: 0.0346 (0.0349) loss_box_reg: 0.0821 (0.0832) loss_objectness: 0.0018 (0.0029) loss_rpn_box_reg: 0.0095 (0.0098) time: 0.3379 data: 0.0034 max mem: 4118
Epoch: [4] [ 80/120] eta: 0:00:13 lr: 0.000500 loss: 0.1560 (0.1396) loss_classifier: 0.0446 (0.0380) loss_box_reg: 0.0914 (0.0881) loss_objectness: 0.0025 (0.0034) loss_rpn_box_reg: 0.0076 (0.0101) time: 0.3355 data: 0.0035 max mem: 4118
Epoch: [4] [100/120] eta: 0:00:06 lr: 0.000500 loss: 0.1295 (0.1440) loss_classifier: 0.0324 (0.0394) loss_box_reg: 0.0877 (0.0905) loss_objectness: 0.0026 (0.0038) loss_rpn_box_reg: 0.0076 (0.0103) time: 0.3268 data: 0.0034 max mem: 4118
Epoch: [4] [119/120] eta: 0:00:00 lr: 0.000500 loss: 0.1028 (0.1440) loss_classifier: 0.0250 (0.0395) loss_box_reg: 0.0653 (0.0907) loss_objectness: 0.0020 (0.0036) loss_rpn_box_reg: 0.0063 (0.0103) time: 0.3404 data: 0.0035 max mem: 4118
Epoch: [4] Total time: 0:00:40 (0.3367 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:18 model_time: 0.2581 (0.2581) evaluator_time: 0.0042 (0.0042) time: 0.3692 data: 0.1050 max mem: 4118
Test: [49/50] eta: 0:00:00 model_time: 0.2333 (0.2339) evaluator_time: 0.0013 (0.0018) time: 0.2418 data: 0.0033 max mem: 4118
Test: Total time: 0:00:12 (0.2438 s / it)
Averaged stats: model_time: 0.2333 (0.2339) evaluator_time: 0.0013 (0.0018)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.583
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.971
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.633
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.025
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.325
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.609
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.286
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.646
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.649
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.100
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.511
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.670
Epoch: [5] [ 0/120] eta: 0:00:56 lr: 0.000500 loss: 0.0435 (0.0435) loss_classifier: 0.0182 (0.0182) loss_box_reg: 0.0232 (0.0232) loss_objectness: 0.0003 (0.0003) loss_rpn_box_reg: 0.0019 (0.0019) time: 0.4734 data: 0.0969 max mem: 4118
Epoch: [5] [ 20/120] eta: 0:00:35 lr: 0.000500 loss: 0.1120 (0.1410) loss_classifier: 0.0310 (0.0395) loss_box_reg: 0.0774 (0.0894) loss_objectness: 0.0016 (0.0022) loss_rpn_box_reg: 0.0084 (0.0098) time: 0.3440 data: 0.0039 max mem: 4118
Epoch: [5] [ 40/120] eta: 0:00:27 lr: 0.000500 loss: 0.1002 (0.1318) loss_classifier: 0.0299 (0.0363) loss_box_reg: 0.0614 (0.0834) loss_objectness: 0.0017 (0.0030) loss_rpn_box_reg: 0.0032 (0.0092) time: 0.3488 data: 0.0035 max mem: 4118
Epoch: [5] [ 60/120] eta: 0:00:20 lr: 0.000500 loss: 0.1339 (0.1381) loss_classifier: 0.0293 (0.0373) loss_box_reg: 0.0939 (0.0883) loss_objectness: 0.0016 (0.0031) loss_rpn_box_reg: 0.0062 (0.0094) time: 0.3475 data: 0.0038 max mem: 4118
Epoch: [5] [ 80/120] eta: 0:00:13 lr: 0.000500 loss: 0.1281 (0.1397) loss_classifier: 0.0369 (0.0377) loss_box_reg: 0.0801 (0.0898) loss_objectness: 0.0013 (0.0029) loss_rpn_box_reg: 0.0071 (0.0093) time: 0.3423 data: 0.0034 max mem: 4118
Epoch: [5] [100/120] eta: 0:00:06 lr: 0.000500 loss: 0.1085 (0.1398) loss_classifier: 0.0381 (0.0377) loss_box_reg: 0.0637 (0.0892) loss_objectness: 0.0011 (0.0029) loss_rpn_box_reg: 0.0064 (0.0100) time: 0.3382 data: 0.0033 max mem: 4118
Epoch: [5] [119/120] eta: 0:00:00 lr: 0.000500 loss: 0.1180 (0.1382) loss_classifier: 0.0275 (0.0378) loss_box_reg: 0.0697 (0.0880) loss_objectness: 0.0012 (0.0027) loss_rpn_box_reg: 0.0055 (0.0096) time: 0.3403 data: 0.0034 max mem: 4118
Epoch: [5] Total time: 0:00:41 (0.3456 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:18 model_time: 0.2561 (0.2561) evaluator_time: 0.0037 (0.0037) time: 0.3699 data: 0.1082 max mem: 4118
Test: [49/50] eta: 0:00:00 model_time: 0.2226 (0.2282) evaluator_time: 0.0011 (0.0016) time: 0.2313 data: 0.0031 max mem: 4118
Test: Total time: 0:00:11 (0.2378 s / it)
Averaged stats: model_time: 0.2226 (0.2282) evaluator_time: 0.0011 (0.0016)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.585
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.955
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.682
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.282
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.615
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.285
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.645
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.651
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.500
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
Epoch: [6] [ 0/120] eta: 0:00:54 lr: 0.000050 loss: 0.0497 (0.0497) loss_classifier: 0.0101 (0.0101) loss_box_reg: 0.0377 (0.0377) loss_objectness: 0.0006 (0.0006) loss_rpn_box_reg: 0.0013 (0.0013) time: 0.4527 data: 0.0854 max mem: 4118
Epoch: [6] [ 20/120] eta: 0:00:35 lr: 0.000050 loss: 0.1043 (0.1187) loss_classifier: 0.0288 (0.0338) loss_box_reg: 0.0714 (0.0760) loss_objectness: 0.0013 (0.0016) loss_rpn_box_reg: 0.0054 (0.0073) time: 0.3507 data: 0.0036 max mem: 4118
Epoch: [6] [ 40/120] eta: 0:00:27 lr: 0.000050 loss: 0.1110 (0.1172) loss_classifier: 0.0277 (0.0320) loss_box_reg: 0.0755 (0.0755) loss_objectness: 0.0018 (0.0021) loss_rpn_box_reg: 0.0057 (0.0076) time: 0.3412 data: 0.0037 max mem: 4118
Epoch: [6] [ 60/120] eta: 0:00:20 lr: 0.000050 loss: 0.1067 (0.1196) loss_classifier: 0.0275 (0.0325) loss_box_reg: 0.0619 (0.0772) loss_objectness: 0.0012 (0.0025) loss_rpn_box_reg: 0.0031 (0.0074) time: 0.3431 data: 0.0041 max mem: 4118
Epoch: [6] [ 80/120] eta: 0:00:13 lr: 0.000050 loss: 0.1232 (0.1236) loss_classifier: 0.0333 (0.0336) loss_box_reg: 0.0801 (0.0791) loss_objectness: 0.0015 (0.0026) loss_rpn_box_reg: 0.0056 (0.0083) time: 0.3325 data: 0.0034 max mem: 4118
Epoch: [6] [100/120] eta: 0:00:06 lr: 0.000050 loss: 0.1273 (0.1281) loss_classifier: 0.0300 (0.0351) loss_box_reg: 0.0806 (0.0819) loss_objectness: 0.0015 (0.0029) loss_rpn_box_reg: 0.0062 (0.0082) time: 0.3556 data: 0.0042 max mem: 4118
Epoch: [6] [119/120] eta: 0:00:00 lr: 0.000050 loss: 0.0951 (0.1297) loss_classifier: 0.0295 (0.0357) loss_box_reg: 0.0680 (0.0830) loss_objectness: 0.0015 (0.0027) loss_rpn_box_reg: 0.0039 (0.0083) time: 0.3447 data: 0.0040 max mem: 4118
Epoch: [6] Total time: 0:00:41 (0.3464 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:18 model_time: 0.2564 (0.2564) evaluator_time: 0.0036 (0.0036) time: 0.3700 data: 0.1081 max mem: 4118
Test: [49/50] eta: 0:00:00 model_time: 0.2383 (0.2413) evaluator_time: 0.0013 (0.0017) time: 0.2628 data: 0.0039 max mem: 4118
Test: Total time: 0:00:12 (0.2515 s / it)
Averaged stats: model_time: 0.2383 (0.2413) evaluator_time: 0.0013 (0.0017)
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.602
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.961
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.693
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.301
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.634
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.288
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.662
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.667
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.478
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.695
Epoch: [7] [ 0/120] eta: 0:00:54 lr: 0.000050 loss: 0.0397 (0.0397) loss_classifier: 0.0133 (0.0133) loss_box_reg: 0.0253 (0.0253) loss_objectness: 0.0010 (0.0010) loss_rpn_box_reg: 0.0001 (0.0001) time: 0.4533 data: 0.0919 max mem: 4118
Epoch: [7] [ 20/120] eta: 0:00:35 lr: 0.000050 loss: 0.1022 (0.1146) loss_classifier: 0.0279 (0.0318) loss_box_reg: 0.0659 (0.0724) loss_objectness: 0.0010 (0.0017) loss_rpn_box_reg: 0.0057 (0.0087) time: 0.3465 data: 0.0035 max mem: 4118
Epoch: [7] [ 40/120] eta: 0:00:27 lr: 0.000050 loss: 0.1042 (0.1220) loss_classifier: 0.0261 (0.0332) loss_box_reg: 0.0658 (0.0781) loss_objectness: 0.0013 (0.0020) loss_rpn_box_reg: 0.0059 (0.0088) time: 0.3374 data: 0.0036 max mem: 4118
Epoch: [7] [ 60/120] eta: 0:00:20 lr: 0.000050 loss: 0.1504 (0.1340) loss_classifier: 0.0459 (0.0371) loss_box_reg: 0.0936 (0.0850) loss_objectness: 0.0019 (0.0022) loss_rpn_box_reg: 0.0067 (0.0097) time: 0.3477 data: 0.0037 max mem: 4118
Epoch: [7] [ 80/120] eta: 0:00:13 lr: 0.000050 loss: 0.0815 (0.1321) loss_classifier: 0.0239 (0.0366) loss_box_reg: 0.0501 (0.0832) loss_objectness: 0.0021 (0.0031) loss_rpn_box_reg: 0.0062 (0.0091) time: 0.3354 data: 0.0037 max mem: 4118
Epoch: [7] [100/120] eta: 0:00:06 lr: 0.000050 loss: 0.0972 (0.1313) loss_classifier: 0.0236 (0.0359) loss_box_reg: 0.0704 (0.0836) loss_objectness: 0.0016 (0.0029) loss_rpn_box_reg: 0.0051 (0.0089) time: 0.3420 data: 0.0037 max mem: 4118
Epoch: [7] [119/120] eta: 0:00:00 lr: 0.000050 loss: 0.0925 (0.1308) loss_classifier: 0.0259 (0.0360) loss_box_reg: 0.0534 (0.0835) loss_objectness: 0.0010 (0.0026) loss_rpn_box_reg: 0.0040 (0.0087) time: 0.3312 data: 0.0035 max mem: 4118
Epoch: [7] Total time: 0:00:41 (0.3420 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:19 model_time: 0.2790 (0.2790) evaluator_time: 0.0040 (0.0040) time: 0.3950 data: 0.1099 max mem: 4118
Test: [49/50] eta: 0:00:00 model_time: 0.2328 (0.2311) evaluator_time: 0.0012 (0.0017) time: 0.2403 data: 0.0032 max mem: 4118
Test: Total time: 0:00:12 (0.2411 s / it)
Averaged stats: model_time: 0.2328 (0.2311) evaluator_time: 0.0012 (0.0017)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.606
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.965
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.691
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.309
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.637
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.290
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.660
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.664
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.478
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.692
Epoch: [8] [ 0/120] eta: 0:01:06 lr: 0.000050 loss: 0.3302 (0.3302) loss_classifier: 0.0762 (0.0762) loss_box_reg: 0.2121 (0.2121) loss_objectness: 0.0069 (0.0069) loss_rpn_box_reg: 0.0350 (0.0350) time: 0.5563 data: 0.1245 max mem: 4118
Epoch: [8] [ 20/120] eta: 0:00:34 lr: 0.000050 loss: 0.0942 (0.1334) loss_classifier: 0.0245 (0.0366) loss_box_reg: 0.0559 (0.0851) loss_objectness: 0.0008 (0.0020) loss_rpn_box_reg: 0.0059 (0.0096) time: 0.3338 data: 0.0036 max mem: 4118
Epoch: [8] [ 40/120] eta: 0:00:27 lr: 0.000050 loss: 0.0989 (0.1257) loss_classifier: 0.0283 (0.0342) loss_box_reg: 0.0635 (0.0807) loss_objectness: 0.0007 (0.0023) loss_rpn_box_reg: 0.0052 (0.0084) time: 0.3320 data: 0.0034 max mem: 4118
Epoch: [8] [ 60/120] eta: 0:00:20 lr: 0.000050 loss: 0.1265 (0.1325) loss_classifier: 0.0325 (0.0360) loss_box_reg: 0.0893 (0.0857) loss_objectness: 0.0014 (0.0027) loss_rpn_box_reg: 0.0044 (0.0082) time: 0.3368 data: 0.0035 max mem: 4118
Epoch: [8] [ 80/120] eta: 0:00:13 lr: 0.000050 loss: 0.0977 (0.1320) loss_classifier: 0.0235 (0.0357) loss_box_reg: 0.0558 (0.0858) loss_objectness: 0.0011 (0.0025) loss_rpn_box_reg: 0.0047 (0.0080) time: 0.3376 data: 0.0035 max mem: 4118
Epoch: [8] [100/120] eta: 0:00:06 lr: 0.000050 loss: 0.1137 (0.1329) loss_classifier: 0.0276 (0.0362) loss_box_reg: 0.0748 (0.0859) loss_objectness: 0.0012 (0.0024) loss_rpn_box_reg: 0.0038 (0.0084) time: 0.3383 data: 0.0036 max mem: 4118
Epoch: [8] [119/120] eta: 0:00:00 lr: 0.000050 loss: 0.1221 (0.1300) loss_classifier: 0.0341 (0.0358) loss_box_reg: 0.0712 (0.0832) loss_objectness: 0.0021 (0.0026) loss_rpn_box_reg: 0.0057 (0.0084) time: 0.3293 data: 0.0034 max mem: 4118
Epoch: [8] Total time: 0:00:40 (0.3374 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:18 model_time: 0.2610 (0.2610) evaluator_time: 0.0036 (0.0036) time: 0.3756 data: 0.1088 max mem: 4118
Test: [49/50] eta: 0:00:00 model_time: 0.2253 (0.2335) evaluator_time: 0.0012 (0.0017) time: 0.2361 data: 0.0032 max mem: 4118
Test: Total time: 0:00:12 (0.2433 s / it)
Averaged stats: model_time: 0.2253 (0.2335) evaluator_time: 0.0012 (0.0017)
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.597
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.958
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.666
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.294
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.629
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.289
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.655
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.660
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.478
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.686
Epoch: [9] [ 0/120] eta: 0:01:02 lr: 0.000005 loss: 0.2162 (0.2162) loss_classifier: 0.0581 (0.0581) loss_box_reg: 0.1423 (0.1423) loss_objectness: 0.0041 (0.0041) loss_rpn_box_reg: 0.0117 (0.0117) time: 0.5208 data: 0.1063 max mem: 4118
Epoch: [9] [ 20/120] eta: 0:00:35 lr: 0.000005 loss: 0.0883 (0.1108) loss_classifier: 0.0271 (0.0284) loss_box_reg: 0.0535 (0.0702) loss_objectness: 0.0010 (0.0027) loss_rpn_box_reg: 0.0041 (0.0094) time: 0.3453 data: 0.0035 max mem: 4118
Epoch: [9] [ 40/120] eta: 0:00:28 lr: 0.000005 loss: 0.1093 (0.1252) loss_classifier: 0.0293 (0.0329) loss_box_reg: 0.0642 (0.0813) loss_objectness: 0.0010 (0.0022) loss_rpn_box_reg: 0.0046 (0.0089) time: 0.3466 data: 0.0036 max mem: 4118
Epoch: [9] [ 60/120] eta: 0:00:20 lr: 0.000005 loss: 0.0878 (0.1183) loss_classifier: 0.0182 (0.0316) loss_box_reg: 0.0387 (0.0765) loss_objectness: 0.0009 (0.0023) loss_rpn_box_reg: 0.0036 (0.0079) time: 0.3363 data: 0.0036 max mem: 4118
Epoch: [9] [ 80/120] eta: 0:00:13 lr: 0.000005 loss: 0.1186 (0.1250) loss_classifier: 0.0355 (0.0338) loss_box_reg: 0.0691 (0.0806) loss_objectness: 0.0009 (0.0022) loss_rpn_box_reg: 0.0057 (0.0085) time: 0.3427 data: 0.0037 max mem: 4118
Epoch: [9] [100/120] eta: 0:00:06 lr: 0.000005 loss: 0.1115 (0.1301) loss_classifier: 0.0311 (0.0347) loss_box_reg: 0.0739 (0.0850) loss_objectness: 0.0017 (0.0021) loss_rpn_box_reg: 0.0062 (0.0084) time: 0.3343 data: 0.0035 max mem: 4118
Epoch: [9] [119/120] eta: 0:00:00 lr: 0.000005 loss: 0.0857 (0.1269) loss_classifier: 0.0257 (0.0340) loss_box_reg: 0.0551 (0.0826) loss_objectness: 0.0013 (0.0021) loss_rpn_box_reg: 0.0066 (0.0081) time: 0.3332 data: 0.0038 max mem: 4118
Epoch: [9] Total time: 0:00:41 (0.3417 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:18 model_time: 0.2536 (0.2536) evaluator_time: 0.0036 (0.0036) time: 0.3713 data: 0.1122 max mem: 4118
Test: [49/50] eta: 0:00:00 model_time: 0.2274 (0.2330) evaluator_time: 0.0012 (0.0017) time: 0.2359 data: 0.0033 max mem: 4118
Test: Total time: 0:00:12 (0.2430 s / it)
Averaged stats: model_time: 0.2274 (0.2330) evaluator_time: 0.0012 (0.0017)
Accumulating evaluation results...
DONE (t=0.02s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.602
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.964
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.679
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.315
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.633
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.288
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.660
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.664
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.478
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.691
# convert to quantized
quant_rcnn.to(torch.device('cpu'))
quant_rcnn.eval()
quant_rcnn.backbone = quantize_fx.convert_fx(quant_rcnn.backbone)
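Before saving, a quick check (my own addition, not from the original post) confirms the converted backbone now carries quantized modules and int8 weights:
# sketch: inspect the converted backbone (attribute paths assume the model built above)
conv1 = quant_rcnn.backbone.body.conv1
print(type(conv1).__name__)    # ConvReLU2d (the quantized fused module)
print(conv1.weight().dtype)    # torch.qint8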
In the previous post, I saved the state dict of the model. This was a bit cumbersome when re-loading, as I had to re-instantiate the model and perform the conversion again before I could load the state dict. So here I’ll save the model as TorchScript instead, which has no class definition dependency and doesn’t require repeating the conversion steps. This is great because it decouples model creation from usage, as long as the input/output signatures are known. Once scripted, the model becomes a recursive script module:
RecursiveScriptModule(
original_name=FasterRCNN
(transform): RecursiveScriptModule(original_name=GeneralizedRCNNTransform)
(backbone): RecursiveScriptModule(
original_name=GraphModule
(body): RecursiveScriptModule(
original_name=Module
(conv1): RecursiveScriptModule(original_name=ConvReLU2d)
(maxpool): RecursiveScriptModule(original_name=MaxPool2d)
(layer1): RecursiveScriptModule(
original_name=Module
(0): RecursiveScriptModule(
original_name=Module
(conv1): RecursiveScriptModule(original_name=ConvReLU2d)
(conv2): RecursiveScriptModule(original_name=ConvReLU2d)
(conv3): RecursiveScriptModule(original_name=Conv2d)
(downsample): RecursiveScriptModule(
original_name=Module
(0): RecursiveScriptModule(original_name=Conv2d)
)
)
...
script_module = torch.jit.script(quant_rcnn)
script_module.save("./quant_rcnn_torchscript.pt")

quant_rcnn_jit = torch.jit.load("./quant_rcnn_torchscript.pt", map_location=torch.device('cpu'))
from time import perf_counter

images, targets = next(iter(data_loader_test))
images = list(img.to(torch.device('cpu')) for img in images)
n = 10

# warmup
for _ in range(n * 3):
    __ = quant_rcnn_jit(images)

start = perf_counter()
for _ in range(n):
    __ = quant_rcnn_jit(images)
print(f"quant jit model avg time: {(perf_counter() - start) / n:.2f}")
code/__torch__/torchvision/models/detection/faster_rcnn.py:103: UserWarning: RCNN always returns a (Losses, Detections) tuple in scripting
quant jit model avg time: 1.44
As expected, FX graph mode quantization yields about the same inference time as eager mode: 1.44 seconds here versus the eager mode model’s average of 1.42 seconds.
Note the UserWarning
above. Scripting changes the return signature to include losses in addition to the normal prediction output. I’m not sure if this applies to other models, but at least there is a warning so we know to modify any post-processing, etc. to handle this change.
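To make that concrete, unpacking the scripted model’s output looks like this (a small sketch of my own, reusing the test images from the benchmark above):
losses, detections = quant_rcnn_jit(images)
print(losses)                  # empty dict when the model is in eval mode
print(detections[0].keys())    # dict_keys(['boxes', 'labels', 'scores'])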
import matplotlib.pyplot as plt
from torchvision.utils import draw_bounding_boxes, draw_segmentation_masks

image = read_image("PennFudanPed/PNGImages/FudanPed00007.png")
eval_transform = get_transform(train=False)

with torch.no_grad():
    x = eval_transform(image)
    # convert RGBA -> RGB and move to device
    x = x[:3, ...].to(torch.device('cpu'))
    predictions = quant_rcnn_jit([x, ])
    pred = predictions[1][0]  # JIT model returns tuple ({losses}, [pred_dicts])

threshold = 0.50
image = (255.0 * (image - image.min()) / (image.max() - image.min())).to(torch.uint8)
image = image[:3, ...]
pred_labels = [f"pedestrian: {score:.3f}" for label, score in zip(pred["labels"], pred["scores"]) if score > threshold]
pred_boxes = pred["boxes"].long()[pred["scores"] > threshold]

output_image = draw_bounding_boxes(image, pred_boxes, pred_labels, colors="red")

plt.figure(figsize=(12, 12))
plt.imshow(output_image.permute(1, 2, 0))
code/__torch__/torchvision/models/detection/faster_rcnn/___torch_mangle_17.py:103: UserWarning: RCNN always returns a (Losses, Detections) tuple in scripting
FX graph mode quantization made for much easier model preparation compared to eager mode: we did not have to modify the network, insert stubs, or even fuse modules. However, because of the symbolic tracing requirement, more complex networks with data-dependent control flow may not be able to use it.