YOLOv5 Network Architecture Explained

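The code used in this tutorial is at https://github.com/oneflow-inc/one-yolov5 . The tutorial applies equally to ultralytics/yolov5, because one-yolov5 only swaps in a different runtime backend; the computation logic and code are unchanged relative to ultralytics/yolov5. Stars are welcome. For details, see the release announcement "one-yolov5 released: a YOLOv5 that trains faster".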
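Introduction

The overall architecture of YOLOv5 is the same for every model size (n, s, m, l, x). The sizes differ only in the depth and width used inside each submodule, which correspond to the depth_multiple and width_multiple parameters in the YAML file.

Note that besides the n, s, m, l and x versions there are also official n6, s6, m6, l6 and x6 versions. The latter target larger input resolutions such as 1280x1280, and their structure differs slightly: the former downsample by at most 32x and use 3 prediction feature layers, while the latter downsample by 64x and use 4 prediction feature layers.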
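To make the difference between the two families concrete, the sketch below counts grid cells and predicted boxes per image. It is an illustration only: the function name prediction_grid is made up for this example, and it assumes three anchors per grid cell on every level, as in the yolov5s.yaml discussed later in this article.

```python
# Illustrative sketch: grid sizes and number of predicted boxes per image.
# Assumption: 3 anchors per grid cell on every prediction level.

def prediction_grid(img_size, strides, anchors_per_cell=3):
    """Return (grid size per level, total number of predicted boxes)."""
    grids = [img_size // s for s in strides]
    total = anchors_per_cell * sum(g * g for g in grids)
    return grids, total

# n/s/m/l/x: 3 prediction levels, downsampling up to 32x
p5_grids, p5_total = prediction_grid(640, [8, 16, 32])
# n6/s6/m6/l6/x6: 4 prediction levels, downsampling up to 64x
p6_grids, p6_total = prediction_grid(1280, [8, 16, 32, 64])

print(p5_grids, p5_total)  # [80, 40, 20] 25200
print(p6_grids, p6_total)  # [160, 80, 40, 20] 102000
```

The extra 64x level gives the larger-resolution models a coarse 20x20 grid again, the same coarsest grid that the 640x640 models get at 32x.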
This chapter uses yolov5s as the example and walks through the source from the configuration file models/yolov5s.yaml (https://github.com/oneflow-inc/one-yolov5/blob/main/models/yolov5s.yaml) to models/yolo.py (https://github.com/oneflow-inc/one-yolov5/blob/main/models/yolo.py).
Contents of yolov5s.yaml:

    nc: 80  # number of classes in the dataset
    depth_multiple: 0.33  # model depth multiple (scales the depth of the network)
    width_multiple: 0.50  # layer channel multiple (scales the width of the network)
    # depth_multiple and width_multiple set the depth (number of layers) and the width
    # (number of channels) of the whole model. How they are applied is explained with
    # the backbone code below.
    anchors:  # anchor sizes applied to each feature map
      # 9 anchors in total. P is the feature level: P3/8 is the level-3 feature map,
      # scaled to 1/8 of the input
      - [10,13, 16,30, 33,23]  # P3/8, i.e. the 3 anchors [10,13], [16,30], [33,23]
      - [30,61, 62,45, 59,119]  # P4/16
      - [116,90, 156,198, 373,326]  # P5/32

    # YOLOv5s v6.0 backbone
    backbone:
      # [from, number, module, args]
      [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
       [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
       [-1, 3, C3, [128]],
       [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
       [-1, 6, C3, [256]],
       [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
       [-1, 9, C3, [512]],
       [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
       [-1, 3, C3, [1024]],
       [-1, 1, SPPF, [1024, 5]],  # 9
      ]

    # YOLOv5s v6.0 head
    head:
      [[-1, 1, Conv, [512, 1, 1]],
       [-1, 1, nn.Upsample, [None, 2, 'nearest']],
       [[-1, 6], 1, Concat, [1]],  # cat backbone P4
       [-1, 3, C3, [512, False]],  # 13

       [-1, 1, Conv, [256, 1, 1]],
       [-1, 1, nn.Upsample, [None, 2, 'nearest']],
       [[-1, 4], 1, Concat, [1]],  # cat backbone P3
       [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

       [-1, 1, Conv, [256, 3, 2]],
       [[-1, 14], 1, Concat, [1]],  # cat head P4
       [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

       [-1, 1, Conv, [512, 3, 2]],
       [[-1, 10], 1, Concat, [1]],  # cat head P5
       [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

       [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
      ]

Understanding anchors

YOLOv5 initializes 9 anchors, used across three feature maps; every grid cell of each feature map predicts with three anchors. The assignment rules are:
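The larger a feature map is, the earlier it sits in the network, the smaller its downsampling rate relative to the input image, and the smaller its receptive field. It is therefore suited to predicting smaller objects and is assigned the smaller anchors.

The smaller a feature map is, the later it sits in the network, the larger its downsampling rate relative to the input image, and the larger its receptive field. It is therefore suited to predicting larger objects and is assigned the larger anchors.

In short, large objects are detected on the small feature map, medium objects on the medium feature map, and small objects on the large feature map.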
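The assignment rule can be checked against the anchor values in yolov5s.yaml. The sketch below is illustrative only (the helper names to_pairs and summarize are made up for this example): it groups each flat anchor list into (width, height) pairs and prints the grid size at a 640x640 input next to the mean anchor area for each stride.

```python
# Anchor values copied from yolov5s.yaml, keyed by the stride of their feature map.
ANCHORS = {
    8:  [10, 13, 16, 30, 33, 23],       # P3/8
    16: [30, 61, 62, 45, 59, 119],      # P4/16
    32: [116, 90, 156, 198, 373, 326],  # P5/32
}

def to_pairs(flat):
    """Turn [w0, h0, w1, h1, ...] into [(w0, h0), (w1, h1), ...]."""
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]

def summarize(anchors, img_size=640):
    """Return (stride, grid size, anchor pairs, mean anchor area) per level."""
    rows = []
    for stride, flat in sorted(anchors.items()):
        pairs = to_pairs(flat)
        mean_area = sum(w * h for w, h in pairs) / len(pairs)
        rows.append((stride, img_size // stride, pairs, mean_area))
    return rows

rows = summarize(ANCHORS)
for stride, grid, pairs, mean_area in rows:
    print(f"stride {stride:>2}: grid {grid}x{grid}, anchors {pairs}, mean area {mean_area:.0f}")
```

As the stride grows, the grid shrinks from 80x80 to 20x20 while the mean anchor area grows, which is the pairing described above.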
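Understanding backbone & head

The four fields of [from, number, module, args] mean the following:

from: the layer that provides the input. -1 means the previous layer; [-1, 6] means both the previous layer and layer 6.
number: how many identical modules to stack. A value of 9 means 9 identical modules.
module: the name of the module. These modules are defined in common.py.
args: the initialization arguments of the class, parsed and passed to the module.

The first module, Conv, is used below as an example to introduce the modules in common.py.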
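Before moving on to Conv, here is a brief aside on the from field. The sketch below lists the from value of each of the 25 layers in yolov5s.yaml and derives two things: the absolute input layers of a given layer, and the set of layers whose outputs must be kept because a later layer refers to them by index. It is a simplified illustration, not the code in yolo.py; the names LAYER_FROM, resolve_inputs and saved_layers are made up here, and it assumes -1 is the only negative index used.

```python
# The "from" field of every layer in yolov5s.yaml, in order (layers 0..24).
LAYER_FROM = [
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  # backbone, layers 0-9
    -1, -1, [-1, 6], -1,                      # head, layers 10-13
    -1, -1, [-1, 4], -1,                      # layers 14-17
    -1, [-1, 14], -1,                         # layers 18-20
    -1, [-1, 10], -1,                         # layers 21-23
    [17, 20, 23],                             # layer 24 (Detect)
]

def resolve_inputs(i, f):
    """Absolute indices of the layers feeding layer i (-1 means layer i - 1)."""
    sources = [f] if isinstance(f, int) else f
    return [i + x if x < 0 else x for x in sources]

def saved_layers(layer_from):
    """Indices referenced by something other than -1; their outputs must be kept."""
    keep = set()
    for f in layer_from:
        sources = [f] if isinstance(f, int) else f
        keep.update(x for x in sources if x != -1)
    return sorted(keep)

print(resolve_inputs(12, LAYER_FROM[12]))  # [11, 6]
print(saved_layers(LAYER_FROM))            # [4, 6, 10, 14, 17, 20, 23]
```

The second result matches the sorted(save) list quoted in the parse_model docstring later in this article.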
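The Conv module is defined as follows:

    class Conv(nn.Module):
        # Standard convolution
        def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
            """
            @param c1: number of input channels
            @param c2: number of output channels
            @param k : convolution kernel size (kernel_size)
            @param s : convolution stride (stride)
            @param p : padding width of the feature map (padding)
            @param g : number of groups; must evenly divide the number of input channels
                       (so that the input channels can be grouped correctly)
            """
            super().__init__()
            # https://oneflow.readthedocs.io/en/master/generated/oneflow.nn.conv2d.html?highlight=conv
            self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
            self.bn = nn.BatchNorm2d(c2)
            self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

        def forward(self, x):
            # convolution -> batch norm -> activation
            return self.act(self.bn(self.conv(x)))

        def forward_fuse(self, x):
            # used after conv and bn have been fused: convolution -> activation
            return self.act(self.conv(x))

For example, with width_multiple set to 0.5 as above, the first entry [64, 6, 2, 2] is parsed into [3, 64*0.5=32, 6, 2, 2], where the leading 3 is the number of input channels (the input image has 3 channels) and 32 is the number of output channels.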
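The channel scaling can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's code: make_divisible below is a local stand-in that rounds up to the next multiple of 8 (the parse_model comments later in this article only say the result is rounded to a multiple of 8), and scaled_channels is a name made up for this example.

```python
import math

def make_divisible(x, divisor=8):
    """Local stand-in (assumption): round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor

def scaled_channels(c2, width_multiple):
    """Output channels after applying the width multiple."""
    return make_divisible(c2 * width_multiple, 8)

# Output channels written in yolov5s.yaml for the five strided Conv layers of the backbone
yaml_channels = [64, 128, 256, 512, 1024]
s_channels = [scaled_channels(c, 0.50) for c in yaml_channels]
print(s_channels)  # [32, 64, 128, 256, 512]

# First backbone layer: args [64, 6, 2, 2] with a 3-channel input image
first_conv_args = [3, scaled_channels(64, 0.50), 6, 2, 2]
print(first_conv_args)  # [3, 32, 6, 2, 2]

# Hypothetical value that is not a multiple of 8, to show the rounding step
print(scaled_channels(100, 0.50))  # 56
```

With width_multiple = 0.50 every channel count in the YAML file is already a multiple of 16, so the rounding step only matters for other multipliers or custom channel counts.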
Adjusting the network size in detail

Line 256 of yolo.py (https://github.com/oneflow-inc/one-yolov5/blob/main/models/yolo.py) reads nc, depth_multiple and the other parameters from the YAML file:

    anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']

The role of width_multiple was covered above in the discussion of args. What does depth_multiple do?

Line 257 of yolo.py (https://github.com/oneflow-inc/one-yolov5/blob/main/models/yolo.py) defines how it is applied:

    n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain

Treat this line as formula (1). Here gd is the value of depth_multiple, and n is the second element of each layer entry in the backbone.

Formula (1) shows that gd changes n, and therefore the size of the network.
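Formula (1) can be applied directly to the number column of the yolov5s backbone. The sketch below wraps the formula in a small function (the name depth_gain is made up for this example) and evaluates it with gd = 0.33 from yolov5s.yaml.

```python
def depth_gain(n, gd):
    """Formula (1): scale the repeat count n by the depth multiple gd."""
    return max(round(n * gd), 1) if n > 1 else n

# "number" column of the ten backbone layers in yolov5s.yaml
backbone_numbers = [1, 1, 3, 1, 6, 1, 9, 1, 3, 1]

scaled = [depth_gain(n, 0.33) for n in backbone_numbers]
print(scaled)  # [1, 1, 1, 1, 2, 1, 3, 1, 1, 1]

# With gd = 1.0 the values from the YAML file are used unchanged
unchanged = [depth_gain(n, 1.0) for n in backbone_numbers]
print(unchanged)  # [1, 1, 3, 1, 6, 1, 9, 1, 3, 1]
```

The C3 repeat counts 3, 6, 9, 3 become 1, 2, 3, 1, which are the values that appear in the arguments column of the appendix table for layers 2, 4, 6 and 8.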
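The number of repeated modules in each stage and the number of convolution kernels (output channels) change accordingly. Compared with yolov5s, yolov5l has several times as many trainable parameters, and its depth and width are much larger. This gives yolov5l noticeably better accuracy than yolov5s, so detection accuracy at inference time is higher, but inference is slower.

YOLOv5 therefore offers a range of choices: if inference speed matters most, pick one of the smaller models such as yolov5s or yolov5m; if accuracy matters more and inference speed is less of a concern, pick one of the two larger models.

See the figure below: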
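[Figure: YOLOv5 model complexity comparison]

Conv module explained

Network structure preview

Below is a simplified diagram of the overall network structure, drawn from yolov5s.yaml (https://github.com/oneflow-inc/one-yolov5/blob/main/models/yolov5s.yaml).

[Figure: overall network structure of yolov5s]

Detailed network structure diagram: https://oneflow-static.oss-cn-beijing.aliyuncs.com/one-yolo/imgs/yolov5s.onnx.png . The model was exported to ONNX format with export.py and the image was exported from https://netron.app/ (model export is covered separately in a later article of this tutorial).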
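The parameters to the right of each module show the shape of the feature map; for example, the first layer (Conv) takes an input image of shape [3, 640, 640]. These shapes can be computed by feeding an image of fixed size through the network and applying the model parameters in yolov5s.yaml (https://github.com/oneflow-inc/one-yolov5/blob/main/models/yolov5s.yaml). They can also be printed from code in models/yolo.py (https://github.com/oneflow-inc/one-yolov5/blob/main/models/yolo.py). See Appendix Table 2.1 for the full data.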
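The shape computation can also be done by hand with the standard convolution output-size formula, floor((size + 2*padding - kernel) / stride) + 1. The sketch below traces a [3, 640, 640] input through the five stride-2 Conv layers of the yolov5s backbone (layers 0, 1, 3, 5 and 7). It is an illustration only: autopad here is a local stand-in that assumes the padding defaults to kernel // 2 when none is given, and the C3 and SPPF layers are skipped because they keep the spatial size unchanged.

```python
def autopad(k, p=None):
    """Local stand-in (assumption): use k // 2 as padding when none is given."""
    return k // 2 if p is None else p

def conv_out(size, k, s, p=None):
    """Spatial output size of a convolution along one dimension."""
    p = autopad(k, p)
    return (size + 2 * p - k) // s + 1

# (output channels after width scaling, kernel, stride, padding)
# for backbone layers 0, 1, 3, 5, 7 of yolov5s
convs = [
    (32, 6, 2, 2),
    (64, 3, 2, None),
    (128, 3, 2, None),
    (256, 3, 2, None),
    (512, 3, 2, None),
]

shape = (3, 640, 640)
trace = [shape]
for c2, k, s, p in convs:
    _, h, w = shape
    shape = (c2, conv_out(h, k, s, p), conv_out(w, k, s, p))
    trace.append(shape)

for t in trace:
    print(t)
# (3, 640, 640)
# (32, 320, 320)
# (64, 160, 160)
# (128, 80, 80)
# (256, 40, 40)
# (512, 20, 20)
```

Each strided Conv halves the height and width, giving the 2x, 4x, 8x, 16x and 32x downsampling levels labelled P1 to P5 in the YAML comments.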
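Understanding yolo.py

File location: https://github.com/oneflow-inc/one-yolov5/blob/main/models/yolo.py

The file has three main parts: the Detect class, the Model class, and the parse_model function.

Run python models/yolo.py --cfg yolov5s.yaml to execute the script and inspect its output.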
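Understanding the parse_model function

    def parse_model(d, ch):  # model_dict, input_channels(3)
        """
        Used by the Model class below.
        Parses the model file (as a dict) and builds the network structure.
        For each layer, this function mainly:
            updates the layer's args and computes c2 (the layer's output channels) =>
            builds the layer from those args =>
            collects layers + save
        @param d: model_dict, the model file as a dict {dict:7}: the 6 entries of
                  yolov5s.yaml (https://github.com/oneflow-inc/one-yolov5/blob/main/models/yolov5s.yaml) + ch
        @param ch: output channels of every layer so far; initially ch=[3], removed later
        @return nn.Sequential(*layers): the layers of the network
        @return sorted(save): sorted indices of all layers referenced by a from value
                  other than -1, [4, 6, 10, 14, 17, 20, 23]
        """
        # [Gap in the source text. Only the start of the header log line survives:
        #    LOGGER.info(f"{'':>3}{'from':>18}{'n':>3}{'params':>10}  {'module':<40}{'arguments': ...
        #  The lines between it and the depth-gain line below are missing, including the
        #  opening of the per-layer loop. Names used below without a visible definition
        #  (i, f, n, m, args, no, gd, gw, layers, save) are defined in the missing part.]

            n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain

            # If module m is one of the module types defined in this project, it can be handled here
            if m in (Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
                     BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x):
                # c1: input channels, c2: output channels
                c1, c2 = ch[f], args[0]
                # If this is not the final output layer, multiply the channels by the width factor;
                # that is, the width factor applies to every layer except the last one
                if c2 != no:  # if not output
                    # make_divisible rounds the scaled channel count to a multiple of 8,
                    # which generally helps parallelism and inference performance
                    c2 = make_divisible(c2 * gw, 8)

                # Store the results in args; these are the final arguments passed to the module
                args = [c1, c2, *args[1:]]
                # Handle per-module differences in arguments; see each class's __init__ for details
                if m in [BottleneckCSP, C3, C3TR, C3Ghost, C3x]:
                    # Insert n, the depth computed above, as the module's own repeat count.
                    # The module repeats its inner blocks n times, so n is reset to 1 here.
                    args.insert(2, n)  # number of repeats
                    n = 1
            elif m is nn.BatchNorm2d:
                args = [ch[f]]
            elif m is Concat:
                c2 = sum(ch[x] for x in f)
            elif m is Detect:
                args.append([ch[x] for x in f])
                if isinstance(args[1], int):  # number of anchors
                    args[1] = [list(range(args[1] * 2))] * len(f)
            elif m is Contract:
                c2 = ch[f] * args[0] ** 2
            elif m is Expand:
                c2 = ch[f] // args[0] ** 2
            else:
                c2 = ch[f]

            # Build the module from its repeat count n, the module class, and its arguments
            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
            # Get the module type name, e.g. models.common.Conv, models.common.C3, models.common.SPPF
            t = str(m)[8:-2].replace('__main__.', '')  # replace() strips '__main__.'; not triggered in this project
            np = sum(x.numel() for x in m_.parameters())  # number params
            m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number params

        # [Gap in the source text. The per-layer log line is cut off after:
        #    LOGGER.info(f'{i:>3}{str(f):>18}{n_:>3}{np:10.0f}  {t:<40}{str(args): ...
        #  The end of parse_model and the beginning of the Model class are missing.
        #  The text resumes inside a Model method, at the tail of a log line:
        #    ... 10s} {'GFLOPs':>10s} {'params':>10s}  module")]

            LOGGER.info(f'{dt[-1]:10.2f} {o:10.2f} {m.np:10.0f}  {m.type}')
            if c:
                LOGGER.info(f"{sum(dt):10.2f} {'-':>10s} {'-':>10s}  Total")

        # initialize biases into Detect(), cf is class frequency
        def _initialize_biases(self, cf=None):
            # https://arxiv.org/abs/1708.02002 section 3.3
            # cf = flow.bincount(flow.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.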
            m = self.model[-1]  # Detect() module
            for mi, s in zip(m.m, m.stride):  # from
                b = mi.bias.view(m.na, -1).detach()  # conv.bias(255) to (3,85)
                b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
                b[:, 5:] += math.log(0.6 / (m.nc - 0.999999)) if cf is None else flow.log(cf / cf.sum())  # cls
                mi.bias = flow.nn.Parameter(b.view(-1), requires_grad=True)

        def _print_biases(self):
            """
            Print the biases of the convolution layers inside the final Detect module
            (any other layers could be chosen instead).
            """
            m = self.model[-1]  # Detect() module
            for mi in m.m:  # from
                b = mi.bias.detach().view(m.na, -1).T  # conv.bias(255) to (3,85)
                LOGGER.info(
                    ('%6g Conv2d.bias:' + '%10.3g' * 6) % (mi.weight.shape[1], *b[:5].mean(1).tolist(), b[5:].mean()))

        def _print_weights(self):
            """
            Print the weights of the Bottleneck layers in the model
            (any other layers could be chosen instead).
            """
            for m in self.model.modules():
                if type(m) is Bottleneck:
                    LOGGER.info('%10.3g' % (m.w.detach().sigmoid() * 2))  # shortcut weights

        # fuse() merges Conv and BN layers to speed up inference
        def fuse(self):  # fuse model Conv2d() + BatchNorm2d() layers
            """
            Used in detect.py and val.py.
            Fuse model Conv2d() + BatchNorm2d() layers.
            Calls fuse_conv_and_bn from oneflow_utils.py and the forward_fuse method of the Conv module in common.py.
            """
            LOGGER.info('Fusing layers... ')
            for m in self.model.modules():
                # If the layer is a Conv (or DWConv) with a bn attribute, fuse conv and bn to speed up inference
                if isinstance(m, (Conv, DWConv)) and hasattr(m, 'bn'):
                    m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv
                    delattr(m, 'bn')  # remove batchnorm
                    m.forward = m.forward_fuse  # update forward (backward is irrelevant: fusion is only used at inference)
            self.info()  # print model information after conv + bn fusion
            return self

        # Print model structure information; called at the end of this class's __init__
        def info(self, verbose=False, img_size=640):  # print model information
            model_info(self, verbose, img_size)

        def _apply(self, fn):
            # Apply to(), cpu(), cuda(), half() to model tensors that are not parameters or registered buffers
            self = super()._apply(fn)
            m = self.model[-1]  # Detect()
            if isinstance(m, Detect):
                m.stride = fn(m.stride)
                m.grid = list(map(fn, m.grid))
                if isinstance(m.anchor_grid, list):
                    m.anchor_grid = list(map(fn, m.anchor_grid))
            return self

Understanding the Detect class

    class Detect(nn.Module):
        """
        The Detect module builds the detection layer. It passes each input feature map through
        a convolution and a decoding formula to produce the desired shape, in preparation for
        the loss computation or NMS post-processing that follows.
        """
        stride = None  # strides computed during build
        onnx_dynamic = False  # ONNX export parameter
        export = False  # export mode

        def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer
            super().__init__()
            # nc: number of classes
            self.nc = nc  # number of classes
            # no: number of outputs per anchor
            self.no = nc + 5  # number of outputs per anchor
            # nl: number of prediction layers, 3 here
            self.nl = len(anchors)  # number of detection layers
            # na: number of anchors per prediction layer, 3 here
            self.na = len(anchors[0]) // 2  # number of anchors
            # grid: grid coordinate system, top-left corner (1,1), bottom-right corner (input.w/stride, input.h/stride)
            self.grid = [flow.zeros(1)] * self.nl  # init grid
            self.anchor_grid = [flow.zeros(1)] * self.nl  # init anchor grid
            # Register as a buffer named anchors
            self.register_buffer('anchors', flow.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)
            # A 1x1 convolution maps each input to self.no * self.na channels, acting like a fully connected layer
            self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv
            self.inplace = inplace  # use inplace ops (e.g. slice assignment)

        def forward(self, x):
            z = []  # inference output
            for i in range(self.nl):
                x[i] = self.m[i](x[i])  # conv
                bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
                x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

                if not self.training:  # inference
                    if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
                        # During the forward pass, relative coordinates are converted to the absolute grid coordinate system
                        self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)

                    y = x[i].sigmoid()
                    if self.inplace:
                        y[..., 0:2] = (y[..., 0:2] * 2 + self.grid[i]) * self.stride[i]  # xy
                        y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                    else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                        xy, wh, conf = y.split((2, 2, self.nc + 1), 4)  # y.tensor_split((2, 4, 5), 4)
                        xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy
                        wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh
                        y = flow.cat((xy, wh, conf), 4)
                    z.append(y.view(bs, -1, self.no))

            return x if self.training else (flow.cat(z, 1),) if self.export else (flow.cat(z, 1), x)

        # Convert relative coordinates to the absolute grid coordinate system
        def _make_grid(self, nx=20, ny=20, i=0):
            d = self.anchors[i].device
            t = self.anchors[i].dtype
            shape = 1, self.na, ny, nx, 2  # grid shape
            y, x = flow.arange(ny, device=d, dtype=t), flow.arange(nx, device=d, dtype=t)
            yv, xv = flow.meshgrid(y, x, indexing="ij")
            grid = flow.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5
            anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)
            return grid, anchor_grid

Appendix Table 2.1: parsing table for yolov5s.yaml (https://github.com/oneflow-inc/one-yolov5/blob/main/models/yolov5s.yaml)
Layer | from | module | arguments | input | output
----- | ---- | ------ | --------- | ----- | ------
0 | -1 | Conv | [3, 32, 6, 2, 2] | [3, 640, 640] | [32, 320, 320]
1 | -1 | Conv | [32, 64, 3, 2] | [32, 320, 320] | [64, 160, 160]
2 | -1 | C3 | [64, 64, 1] | [64, 160, 160] | [64, 160, 160]
3 | -1 | Conv | [64, 128, 3, 2] | [64, 160, 160] | [128, 80, 80]
4 | -1 | C3 | [128, 128, 2] | [128, 80, 80] | [128, 80, 80]
5 | -1 | Conv | [128, 256, 3, 2] | [128, 80, 80] | [256, 40, 40]
6 | -1 | C3 | [256, 256, 3] | [256, 40, 40] | [256, 40, 40]
7 | -1 | Conv | [256, 512, 3, 2] | [256, 40, 40] | [512, 20, 20]
8 | -1 | C3 | [512, 512, 1] | [512, 20, 20] | [512, 20, 20]
9 | -1 | SPPF | [512, 512, 5] | [512, 20, 20] | [512, 20, 20]
10 | -1 | Conv | [512, 256, 1, 1] | [512, 20, 20] | [256, 20, 20]
11 | -1 | Upsample | [None, 2, 'nearest'] | [256, 20, 20] | [256, 40, 40]
12 | [-1, 6] | Concat | [1] | [1, 256, 40, 40],[1, 256, 40, 40] | [512, 40, 40]
13 | -1 | C3 | [512, 256, 1, False] | [512, 40, 40] | [256, 40, 40]
14 | -1 | Conv | [256, 128, 1, 1] | [256, 40, 40] | [128, 40, 40]
15 | -1 | Upsample | [None, 2, 'nearest'] | [128, 40, 40] | [128, 80, 80]
16 | [-1, 4] | Concat | [1] | [1, 128, 80, 80],[1, 128, 80, 80] | [256, 80, 80]
17 | -1 | C3 | [256, 128, 1, False] | [256, 80, 80] | [128, 80, 80]
18 | -1 | Conv | [128, 128, 3, 2] | [128, 80, 80] | [128, 40, 40]
19 | [-1, 14] | Concat | [1] | [1, 128, 40, 40],[1, 128, 40, 40] | [256, 40, 40]
20 | -1 | C3 | [256, 256, 1, False] | [256, 40, 40] | [256, 40, 40]
21 | -1 | Conv | [256, 256, 3, 2] | [256, 40, 40] | [256, 20, 20]
22 | [-1, 10] | Concat | [1] | [1, 256, 20, 20],[1, 256, 20, 20] | [512, 20, 20]
23 | -1 | C3 | [512, 512, 1, False] | [512, 20, 20] | [512, 20, 20]
24 | [17, 20, 23] | Detect | [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]] | [1, 128, 80, 80],[1, 256, 40, 40],[1, 512, 20, 20] | [1, 3, 80, 80, 85],[1, 3, 40, 40, 85],[1, 3, 20, 20, 85]
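The last row of the table is the Detect layer, whose outputs hold 85 values per anchor per grid cell. At inference time the first four of those values are decoded into a box using the formulas in Detect.forward: xy = (sigmoid * 2 - 0.5 + cell index) * stride and wh = (sigmoid * 2) ** 2 * anchor size in pixels. The sketch below reproduces that arithmetic for a single prediction with plain Python floats. It is an illustration, not the tensor code above, and the function name decode_box and the sample cell indices are made up for this example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def decode_box(raw, cx, cy, anchor_wh, stride):
    """Decode one raw prediction (tx, ty, tw, th) into a box in input-image pixels.

    cx, cy:    column and row index of the grid cell
    anchor_wh: anchor (width, height) in pixels
    stride:    downsampling factor of this prediction level
    """
    tx, ty, tw, th = (sigmoid(v) for v in raw)
    x = (tx * 2 - 0.5 + cx) * stride   # centre x, offset range (-0.5, 1.5) cells
    y = (ty * 2 - 0.5 + cy) * stride   # centre y
    w = (tw * 2) ** 2 * anchor_wh[0]   # width, between 0 and 4 times the anchor width
    h = (th * 2) ** 2 * anchor_wh[1]   # height
    return x, y, w, h

# All-zero logits on the P5/32 level, cell (10, 5), with the anchor [116, 90]
box = decode_box((0.0, 0.0, 0.0, 0.0), cx=10, cy=5, anchor_wh=(116, 90), stride=32)
print(box)  # (336.0, 176.0, 116.0, 90.0)
```

With all-zero logits every sigmoid equals 0.5, so the box is centred in its cell and has exactly the anchor's size; the network's outputs then shift and rescale the box around that default.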
