
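2019-02-20 13:45 Wang Jiasheng, Zeng Zeqin, Zou Xiangjun, Chen Mingyou
Transactions of the Chinese Society of Agricultural Engineering, 2019, Issue 23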

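Chen Yan, Wang Jiasheng, Zeng Zeqin, Zou Xiangjun※, Chen Mingyou

(1. College of Engineering, South China Agricultural University, Guangzhou 510642; 2. Key Laboratory of Key Technology on Agricultural Machine and Equipment, Ministry of Education, South China Agricultural University, Guangzhou 510642)

To plan an optimal motion trajectory and improve picking efficiency, a litchi picking robot needs the spatial positions of multiple target litchi clusters. This paper studies a vision pre-positioning method for litchi picking robots under a large field of view. First, litchi images are captured with a binocular camera. Next, the original YOLOv3 network is modified to design the YOLOv3-DenseNet34 litchi cluster detection network. A litchi cluster matching method based on a same-row order-consistency constraint is then proposed. Finally, the spatial coordinates of the litchi clusters are computed by the triangulation principle of binocular stereo vision. The experimental results show that YOLOv3-DenseNet34 improves both the detection accuracy and the detection speed for litchi clusters: the mean average precision (mAP) reaches 0.943 and the average detection speed reaches 22.11 frames/s. At a detection distance of 3 m, the pre-positioning method based on binocular stereo vision has a maximum absolute error of 36.602 mm, a mean absolute error of 23.007 mm, and a mean relative error of 0.836%. This meets the vision pre-positioning requirements of picking robots under a large field of view and can serve as a reference for vision pre-positioning of other fruits and vegetables under a large field of view.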


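0 Introduction

In recent years, with the development of deep learning, and of convolutional neural networks in particular, many researchers have applied convolutional neural networks to classification, segmentation, recognition, and detection[19-32]. For example, reference [20] optimized the network structure on the basis of VGGNet to improve feature extraction for the main organs of tomato plants, and used Selective Search to generate detection regions, achieving detection of the main tomato organs across different varieties and degrees of maturity. References [28-29] each used the YOLO algorithm to recognize and locate picking targets and obtained good results. Deep learning methods are therefore well suited to the pre-positioning of litchi clusters.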

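1 Materials and methods

1.1 Experimental equipment

The experimental equipment consists of hardware and software. The hardware mainly includes: a binocular stereo vision system built from two GigE industrial cameras (Microvision (维视) MV-EM200C, resolution 1600×1200 pixels, frame rate 60 frames/s, lens focal length 16 mm); a Bosch GLM50 laser range finder (effective measuring range 0.05~50 m, accuracy ±1.5 mm); a Microvision high-precision dot calibration board (9×11 dots, center spacing (30±0.01) mm); and a laptop computer (i7-7700HQ processor; 16 GB of 2 400 MHz memory; GTX1060 6 GB graphics card).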


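1.2 Image and data acquisition

Before images are captured, the binocular stereo vision system must be calibrated. According to the triangulation principle, a larger baseline gives higher measurement accuracy but a smaller common field of view between the two cameras. To keep a large common field of view while maintaining high accuracy, a baseline of 110 mm was chosen after repeated adjustment. To ensure accuracy, calibration was carried out within the large field of view range, that is, with the cameras 2.5~3 m from the target fruit. The dot calibration board was used to calibrate the binocular stereo vision system before image acquisition.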
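The baseline trade-off described above can be illustrated with the standard rectified-stereo relation Z = f·B/d and its first-order error term. This is a minimal sketch rather than the authors' implementation: the 4.4 µm pixel pitch and the 0.1-pixel disparity error are assumptions not stated in the text, and the function names are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Depth Z (mm) of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def depth_error(focal_px, baseline_mm, depth_mm, disparity_error_px):
    """First-order depth uncertainty: |dZ| ~= Z^2 / (f * B) * |dd|."""
    return depth_mm ** 2 / (focal_px * baseline_mm) * disparity_error_px


# ASSUMPTION: a 4.4 um pixel pitch; the text gives only the 16 mm focal length.
PIXEL_PITCH_MM = 0.0044
FOCAL_PX = 16.0 / PIXEL_PITCH_MM   # focal length expressed in pixels
BASELINE_MM = 110.0                # baseline chosen in the text

# ASSUMPTION: a 0.1 px disparity error, used only to compare baselines.
for baseline in (55.0, 110.0, 220.0):
    err = depth_error(FOCAL_PX, baseline, 3000.0, 0.1)
    print(f"baseline {baseline:5.0f} mm: depth error at 3 m = {err:.2f} mm")
```

For a fixed disparity error, doubling the baseline halves the depth error at a given distance, which is the accuracy side of the trade-off; the shrinking common field of view is the cost.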

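The test images were taken in June and July 2018 in Zengcheng District and Conghua District, Guangzhou. Litchi images under a large field of view were collected in the field, and the distance to each litchi cluster was measured with the laser range finder for comparison with the results of the proposed algorithm. A total of 250 binocular image pairs were collected. Because such a small sample set is prone to overfitting, the original images and the epipolar-rectified images were augmented by small-range random cropping and scaling, giving a final data set of 4 000 images. Finally, the open-source tool LabelImg was used to annotate the data set for the object detection network.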
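A small-range random crop and scale that also updates the bounding boxes might look like the sketch below. The crop limit (10% per side), the scale range (0.9 to 1.1), and the function name are assumptions chosen for illustration; the text does not give the actual ranges. Nearest-neighbour resizing is used only to keep the example free of image-library dependencies.

```python
import numpy as np


def random_crop_and_scale(image, boxes, rng, max_crop=0.1, scale_range=(0.9, 1.1)):
    """Crop a small random margin, then rescale; boxes are (x1, y1, x2, y2) in pixels."""
    h, w = image.shape[:2]
    # Random margins, each at most max_crop of the image size.
    left = int(rng.integers(0, int(w * max_crop) + 1))
    top = int(rng.integers(0, int(h * max_crop) + 1))
    right = w - int(rng.integers(0, int(w * max_crop) + 1))
    bottom = h - int(rng.integers(0, int(h * max_crop) + 1))
    cropped = image[top:bottom, left:right]
    crop_h, crop_w = cropped.shape[:2]

    # Nearest-neighbour resize by a random factor.
    scale = rng.uniform(*scale_range)
    new_h = max(1, int(round(crop_h * scale)))
    new_w = max(1, int(round(crop_w * scale)))
    sy, sx = new_h / crop_h, new_w / crop_w
    rows = np.minimum((np.arange(new_h) / sy).astype(int), crop_h - 1)
    cols = np.minimum((np.arange(new_w) / sx).astype(int), crop_w - 1)
    resized = cropped[rows][:, cols]

    # Shift boxes into the crop, clip to its extent, drop empty ones, then rescale.
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    shifted = boxes - np.array([left, top, left, top], dtype=float)
    shifted[:, [0, 2]] = np.clip(shifted[:, [0, 2]], 0, crop_w)
    shifted[:, [1, 3]] = np.clip(shifted[:, [1, 3]], 0, crop_h)
    keep = (shifted[:, 2] > shifted[:, 0]) & (shifted[:, 3] > shifted[:, 1])
    return resized, shifted[keep] * np.array([sx, sy, sx, sy])


rng = np.random.default_rng(0)
image = np.zeros((1200, 1600, 3), dtype=np.uint8)   # same size as the camera images
aug_img, aug_boxes = random_crop_and_scale(image, [[700, 500, 760, 580]], rng)
print(aug_img.shape, aug_boxes)
```

For stereo pairs, the same crop and scale would normally need to be applied consistently to both images if the augmented pair is later used for matching; here the augmentation only feeds the detector, so each image can be treated independently.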

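1.3 Litchi cluster pre-positioning method

1.3.1 Litchi cluster object detection

Drawing on the YOLOv3[30] object detection network and the DenseNet[31] classification network, and exploiting the fact that litchi cluster detection involves a single scene (orchards only) and a single target class, the network structure was optimized: densely connected convolutional layers with a depth of 34 layers (hereafter Dense Module) were designed, and the litchi cluster detection network YOLOv3-DenseNet34 was built on the Dense Module.

A basic component layer (DarkNet convolution, batch normalization, leaky ReLU; hereafter DBL) consists of a convolution layer (Conv), a batch normalization layer (BN), and an activation layer (leaky ReLU) (lower left of Fig. 2), where DBL(1×1) denotes a convolution kernel size of 1×1. Several DBL layers form a DBL module (lower right of Fig. 2), and several DBL modules form a Dense Module; the connections between modules are shown in Fig. 2.

Fig. 2 Structure of the Dense Module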
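The DBL layer and the dense connectivity can be sketched in plain NumPy as follows. This is an illustration under several assumptions, not the network from Fig. 2: the connections are taken to be DenseNet-style concatenation, each DBL module is reduced to a single DBL(1×1) layer, batch normalization omits the learned scale and shift, the leaky ReLU slope is set to 0.1, and the channel counts are arbitrary.

```python
import numpy as np


def dbl_1x1(x, weight, eps=1e-5, slope=0.1):
    """DBL with a 1x1 kernel: convolution, then batch normalization, then leaky ReLU.

    x: feature map of shape (N, C_in, H, W); weight: 1x1 kernel of shape (C_out, C_in).
    """
    y = np.einsum("oc,nchw->nohw", weight, x)        # 1x1 convolution
    mean = y.mean(axis=(0, 2, 3), keepdims=True)     # per-channel batch statistics
    var = y.var(axis=(0, 2, 3), keepdims=True)
    y = (y - mean) / np.sqrt(var + eps)              # BN without learned scale/shift
    return np.where(y > 0, y, slope * y)             # leaky ReLU (slope is an assumption)


def dense_module(x, weights):
    """Dense connectivity: each layer takes the concatenation of all earlier outputs."""
    features = [x]
    for w in weights:
        features.append(dbl_1x1(np.concatenate(features, axis=1), w))
    return np.concatenate(features, axis=1)


rng = np.random.default_rng(0)
c_in, growth, n_layers = 16, 8, 3   # ASSUMPTION: illustrative sizes, not from the paper
weights = [rng.standard_normal((growth, c_in + i * growth)) for i in range(n_layers)]
out = dense_module(rng.standard_normal((2, c_in, 13, 13)), weights)
print(out.shape)
```

Because every layer reuses all earlier feature maps, each layer only needs to add a few new channels, which is one reason densely connected backbones can stay small.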

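The anchor box sizes of YOLOv3-DenseNet34 were obtained by K-means clustering of the widths and heights of the litchi targets in all images of the sample set. Based on the scale distribution of the samples, the number of clusters was set to 6. The resulting anchor boxes are (20, 20), (33, 27), (26, 39), (48, 49), (32, 56), (57, 95).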
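Anchor clustering of this kind is commonly done with K-means on (width, height) pairs using 1 − IoU as the distance, as in the YOLO family. The sketch below assumes that distance (the text only says K-means) and runs on synthetic box sizes, since the labeled data set is not available here; `kmeans_anchors` and `iou_wh` are hypothetical names.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """IoU between boxes and centroids given only (w, h), as if they shared a corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeans_anchors(boxes, k=6, iters=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU; returns k anchors sorted by area."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Highest IoU is the same as smallest (1 - IoU) distance.
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]


# SYNTHETIC box sizes for illustration only; not the paper's annotations.
sizes = np.random.default_rng(1).uniform(15, 100, size=(500, 2))
anchors = kmeans_anchors(sizes)
print(np.round(anchors).astype(int))
```

With six anchors and three prediction scales, two anchors are assigned to each scale, typically the smallest pair to the finest feature map.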


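To avoid losing information from the original data, YOLOv3-DenseNet34 downsamples with stride-2 convolutions instead of max pooling. The number of downsampling steps is related to the convolutional receptive field and the anchor box side length as follows: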


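The structure of the YOLOv3-DenseNet34 object detection network designed in this paper is shown in Fig. 3, where DBL(stride=2) is the convolution layer used for downsampling in place of pooling. The network uses a 34-layer convolutional backbone containing 4 Dense Modules to extract multi-scale features and makes predictions on feature maps at 3 different scales, labeled 1, 2, and 3 in Fig. 3, which are downsampled 5, 4, and 3 times respectively. Each scale predicts 2 outputs. Each output contains 6 values: the offsets of the target's position coordinates and scale in different directions, the confidence, and the one-hot target class. The prediction output at every scale therefore has a depth of 2 × 6 = 12.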
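The output depth of 12 and the relative sizes of the three prediction grids follow from simple arithmetic, sketched below. The 416-pixel input size is an assumption used only to make the grid sizes concrete; the text does not state the input resolution.

```python
def head_depth(anchors_per_scale, num_classes):
    """Channels per grid cell: each anchor predicts 4 box offsets, 1 confidence, class scores."""
    return anchors_per_scale * (4 + 1 + num_classes)


def grid_size(input_size, num_downsamples):
    """Side length of the feature map after repeated stride-2 downsampling."""
    return input_size // (2 ** num_downsamples)


ANCHORS = [(20, 20), (33, 27), (26, 39), (48, 49), (32, 56), (57, 95)]  # from the text
NUM_SCALES = 3
INPUT_SIZE = 416  # ASSUMPTION: not stated in the text

depth = head_depth(len(ANCHORS) // NUM_SCALES, num_classes=1)
grids = [grid_size(INPUT_SIZE, n) for n in (5, 4, 3)]
print(depth, grids)
```

With a single class, each anchor needs only 6 channels, compared with 85 per anchor for an 80-class detector, which contributes to the small head of this network.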

1.3.2 Litchi cluster pre-positioning based on binocular stereo vision




Fig. 3 Structure of the YOLOv3-DenseNet34 network






1.3.3 Sub-pixel disparity calculation
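The English abstract describes the sub-pixel disparity as obtained by solving the quadratic curve formed by disparity and similarity. A common way to do this is a three-point parabola fit around the best integer disparity; the sketch below assumes that form, which may differ in detail from the authors' formulation, and the function name is hypothetical.

```python
def subpixel_disparity(d, sim_prev, sim_peak, sim_next):
    """Refine integer disparity d using similarity scores at d-1, d, and d+1.

    Fits a parabola through the three points and returns the abscissa of its vertex.
    """
    denom = sim_prev - 2.0 * sim_peak + sim_next
    if denom == 0:   # collinear scores: the parabola degenerates, keep d
        return float(d)
    return d + 0.5 * (sim_prev - sim_next) / denom


# Similarity sampled from 1 - (x - 10.25)^2 at x = 9, 10, 11.
print(subpixel_disparity(10, -0.5625, 0.9375, 0.4375))
```

The same expression applies whether the score is a similarity (peak is a maximum) or a matching cost (peak is a minimum), because the vertex of the fitted parabola is the extremum in both cases.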



1.3.4 Pre-positioning error calculation



2 Results and analysis


2.1 Performance of the litchi cluster detection network


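Table 1 Network training parameters

Following previous studies[29-31], the Loss value is used to represent the loss, which reflects the correctness and convergence of the network. During training, the Loss values in the first 1 000 iterations are very large and not meaningful, so the curves are recorded from the 1 000th iteration onward, as shown in Fig. 8.

Fig. 8 shows that both networks fit rapidly within the first 2 000 iterations and then stabilize. The Loss of YOLOv3-DenseNet34 decreases more slowly than that of the original network, but both eventually converge, which indicates that the network structure designed in this paper is reliable.

Fig. 8 Loss curves during training of the litchi cluster detection networks

The mean average precision[33-34] (mAP) is used to measure litchi cluster detection accuracy. It reflects the recognition ability of an object detection network well and is currently the most widely used metric in object detection. The frame rate (frames per second, FPS) is used to express the detection speed of the model. The mAP is calculated as follows: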
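For orientation, average precision is usually computed as the area under the precision-recall curve, and mAP is the mean of the per-class values. The sketch below uses the all-point interpolation popularized by PASCAL VOC; whether the paper uses this exact variant is an assumption, and the function names are hypothetical. Detections are assumed to have already been matched to ground truth (for example by an IoU threshold).

```python
import numpy as np


def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve with all-point interpolation."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = np.concatenate(([0.0], cum_tp / num_ground_truth, [1.0]))
    precision = np.concatenate(([0.0], cum_tp / (cum_tp + cum_fp), [0.0]))
    # Replace each precision by the maximum precision at any higher recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum rectangle areas at the points where recall changes.
    idx = np.where(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[idx + 1] - recall[idx]) * precision[idx + 1]))


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return float(np.mean(ap_per_class))


# Three detections sorted by confidence: hit, miss, hit; two ground-truth clusters.
ap = average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_ground_truth=2)
print(ap, mean_average_precision([ap]))
```

With a single target class, as in litchi cluster detection, mAP and AP coincide.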



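Table 2 Performance comparison of litchi cluster detection networks

Table 2 shows that the detection speed of YOLOv3-DenseNet34 is about 0.6 times higher than that of the original YOLOv3 (roughly 60% faster), reaching 22.11 frames/s, while the mAP increases by 5.6% to 0.943 and the model size is only 9.3 MB, just 1/26 that of the original network. Compared with the original YOLOv3 model, the improved litchi cluster detection network YOLOv3-DenseNet34 is therefore better in detection speed, detection accuracy, and model size.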

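2.2 Accuracy of litchi pre-positioning based on binocular stereo vision

Table 3 lists the laser measurements, vision measurements, and measurement errors for litchi cluster pre-positioning. For binocular stereo vision pre-positioning, the maximum absolute error is 36.602 mm, the mean absolute error is 23.007 mm, the standard deviation is 7.434 mm, and the mean relative error is 0.836%, which shows that the method is accurate enough to meet the pre-positioning requirements.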
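The four statistics reported above can be computed from paired laser and vision measurements as in the following sketch. The readings used here are hypothetical placeholders, not the values in Table 3, and the use of the sample standard deviation (ddof=1) is an assumption.

```python
import numpy as np


def positioning_errors(laser_mm, vision_mm):
    """Error statistics of vision depth against the laser reference, in millimetres."""
    laser = np.asarray(laser_mm, dtype=float)
    vision = np.asarray(vision_mm, dtype=float)
    abs_err = np.abs(vision - laser)
    return {
        "max_abs_mm": float(abs_err.max()),
        "mean_abs_mm": float(abs_err.mean()),
        "std_abs_mm": float(abs_err.std(ddof=1)),   # ASSUMPTION: sample standard deviation
        "mean_rel_pct": float((abs_err / laser).mean() * 100.0),
    }


# HYPOTHETICAL readings for illustration only; these are not the values in Table 3.
laser = [2500.0, 2750.0, 3000.0]
vision = [2520.0, 2725.0, 3030.0]
stats = positioning_errors(laser, vision)
print(stats)
```

As a rough consistency check on the reported figures, a mean absolute error of 23.007 mm at a mean relative error of 0.836% corresponds to a typical measured distance of about 2.75 m, which lies within the 2.5~3 m working range stated earlier.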

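Table 3 Vision measurements of litchi pre-positioning and their errors

3 Conclusions

This paper studied a vision pre-positioning method for litchi picking robots under a large field of view. The original YOLOv3 was modified to design the litchi cluster detection network YOLOv3-DenseNet34; a litchi cluster matching method based on a same-row order-consistency constraint was proposed; and the spatial coordinates of the litchi clusters were computed by the triangulation principle of binocular stereo vision. The experimental results show that YOLOv3-DenseNet34 improves both the detection accuracy and the detection speed for litchi clusters: the mAP reaches 0.943 and the average detection speed reaches 22.11 frames/s. At a detection distance of 3 m, the pre-positioning method based on binocular stereo vision has a maximum absolute error of 36.602 mm, a mean absolute error of 23.007 mm, and a mean relative error of 0.836%. The proposed method meets the accuracy and speed requirements of vision pre-positioning for picking under a large field of view and can serve as a reference for vision pre-positioning of other fruits and vegetables under a large field of view.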

Vision pre-positioning method for litchi picking robot under large field of view

Chen Yan, Wang Jiasheng, Zeng Zeqin, Zou Xiangjun※, Chen Mingyou

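(1. College of Engineering, South China Agricultural University, Guangzhou 510642, China; 2. Key Laboratory of Key Technology on Agricultural Machine and Equipment, Ministry of Education, South China Agricultural University, Guangzhou 510642, China)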

The litchi picking robot is an important tool for automating litchi picking. To pick normally, the robot needs the spatial positions of the litchi clusters. To guide the robot to the picking position and improve picking efficiency, this paper proposes a vision pre-positioning method for litchi picking robots under a large field of view. Firstly, using a calibrated binocular stereo vision system composed of two industrial cameras, 250 pairs of litchi cluster images under a large field of view were taken in litchi orchards in Guangzhou. The spatial positions of key litchi clusters were recorded with a laser range finder for comparison with the results of the proposed method. To expand the sample size, the original images and the epipolar-rectified images were randomly cropped and scaled within a small range, giving a final data set of 4 000 images. The data set for the object detection network was then created with LabelImg. Secondly, drawing on the YOLOv3 network and the DenseNet classification network, and exploiting the single-target, single-scene nature of the litchi cluster detection task (orchard environments only), the network structure was optimized: a Dense Module with a depth of 34 layers was designed, and the litchi cluster detection network YOLOv3-DenseNet34 was built on it. Thirdly, because the background is complex under a large field of view, dense stereo matching over the whole image performs poorly, and some litchi clusters do not appear in the common field of view of both images. A matching method based on a same-row order-consistency constraint was therefore proposed, and a method for calculating sub-pixel disparity was designed: the sub-pixel disparity, obtained by solving the quadratic curve formed by disparity and similarity, was used to calculate the spatial positions of the litchi clusters. The performance of the detection network was compared with that of the original YOLOv3 network. YOLOv3-DenseNet34 improved both the detection accuracy and the detection speed for litchi clusters: the mAP (mean average precision) was 0.943, the average detection speed was 22.11 frames/s, and the model size was 9.3 MB, 1/26 that of the original YOLOv3 network. The pre-positioning results were then compared with the laser range finder measurements. At a detection distance of 3 m, the maximum absolute error of pre-positioning was 36.602 mm, the mean absolute error was 23.007 mm, and the mean relative error was 0.836%. The results show that the vision pre-positioning method studied in this paper basically meets the precision and speed requirements of vision pre-positioning under a large field of view, and can provide a reference for vision pre-positioning in the picking of other fruits and vegetables under a large field of view.

Keywords: robots; image processing; object detection; litchi picking; large field of view; convolutional neural network; stereo vision

Chen Yan, Wang Jiasheng, Zeng Zeqin, Zou Xiangjun, Chen Mingyou. Vision pre-positioning method for litchi picking robot under large field of view[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(23): 48-54. (in Chinese with English abstract) doi: 10.11975/j.issn.1002-6819.2019.23.006 http://www.tcsae.org




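Chen Yan, associate professor, mainly engaged in research on agricultural robots, intelligent agricultural equipment, and intelligent design and manufacturing. Email: cy123@scau.edu.cn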






