An infrared target intrusion detection method based on feature fusion and enhancement

2020-06-28
Defence Technology, 2020, Issue 3

Xiaodong Hu, Xinqing Wang, Xin Yang, Dong Wang, Peng Zhang, Yi Xiao

College of Field Engineering, Army Engineering University of PLA, Nanjing, 210007, China

Keywords: Target intrusion detection; Convolutional neural network; Feature fusion; Infrared target

ABSTRACT Infrared target intrusion detection has important applications in military defence and intelligent early warning. Considering the characteristics of intrusion targets and the difficulties of inspecting them, an infrared target intrusion detection algorithm based on feature fusion and enhancement is proposed. The algorithm combines static target pattern analysis with dynamic multi-frame correlation detection to extract infrared target features at different levels. LBP texture analysis effectively identifies known feature patterns that are already contained in the target library. The motion frame difference method detects the moving regions of the image and improves the completeness of target regions affected by camouflage, occlusion and deformation. To integrate the advantages of the two methods, an enhanced convolutional neural network is designed to fuse and enhance the feature images obtained by both methods. The enhancement module of the network strengthens and screens the targets and suppresses the background of infrared images. Experiments evaluated the background suppression and detection performance of the proposed method and the comparison methods. The results show that the SCRG and BSF values of the proposed method are generally better across multiple data sets, and that its detection performance is clearly better than that of the comparison algorithms. Compared with traditional infrared target detection methods, the proposed method detects infrared intrusion targets more accurately and suppresses background noise more effectively.

1. Introduction

Target intrusion detection is an important technical means in military defence [1]. It has been preliminarily applied in key monitoring areas, including prohibited military zones, border posts, airport perimeters and national defence engineering sites. Monitoring enemy personnel, weapons and equipment with satellite imagery, unmanned aerial vehicle (UAV) photography, video surveillance and other equipment can effectively reduce the cost and workload of alert tasks. Compared with optical target detection, infrared target detection offers high accuracy, long operating range and strong anti-interference capability. This is because personnel and equipment emit thermal radiation that is very difficult to conceal. Infrared imaging equipment can therefore reduce the difficulty of the detection algorithm.

At present, infrared target intrusion detection is mostly aimed at ground and air targets [2-4]. Ground target detection is usually disturbed by many factors, such as the background environment, high-temperature areas, weak targets, deformation and occlusion. These factors make infrared intrusion target detection a challenging problem. In addition, military alert missions differ from ordinary target detection missions in two respects. On the one hand, the targets to be guarded against are usually the opposing side's reconnaissance personnel and small ground equipment, which narrows the range of targets the monitoring mission must handle. On the other hand, intrusion targets are usually moving. The detection task can therefore be combined with dynamic (motion-based) target detection, which effectively increases accuracy.

Currently, there are two kinds of target detection approaches. One applies a spatial convolution filter template to detect targets statically in a single frame. The other exploits the continuity and similarity between adjacent frames to detect motion. The former is susceptible to environmental interference, which leads to a high false alarm rate, so it is not suitable for detection against complex backgrounds. The latter adapts well to the environment, but it cannot locate the target area very accurately and may fail to extract the boundary of slowly moving targets. At the same time, the target area extracted for fast-moving targets is too large, and the detection results are prone to "cavities" (holes) and "double shadows" (ghosting).

Benefiting from the rapid development of deep convolutional neural networks (DCNNs) [5,6], recently proposed deep-learning detectors have achieved major breakthroughs in target detection performance. These include R-CNN [7], Fast R-CNN [8], Faster R-CNN [9] and the YOLO series [10-12]. Deep learning can automatically learn useful features directly from abundant training samples. However, an infrared target contains only grayscale information and lacks the distinctive size, texture and color cues that objects typically have in computer vision applications. Therefore, the existing deep-learning object detection methods from computer vision are not directly suitable for infrared target detection. Hand-crafted features, by contrast, are fast to compute, require no network training and express the characteristics of the target intuitively. Acquiring visual features of the target manually and combining them with deep learning therefore helps to extract high-level target features.

In view of the above problems and the specific task background of infrared target intrusion detection, an infrared target intrusion detection method based on feature fusion and enhancement is proposed. It jointly uses the gray-scale texture features of infrared images and the correlation between adjacent frames to extract target features efficiently and suppress background noise. Meanwhile, an enhancement convolutional neural network is designed to fuse the extracted feature images and further strengthen the target characteristics of infrared images. The network also suppresses false targets and background, achieving fast and accurate detection of infrared intrusion targets.

The remainder of this paper is organized as follows. Section 2 briefly discusses related work. Section 3 presents the infrared target intrusion detection framework. Section 4 describes the experiments and discusses the results of the proposed method and other methods. Finally, Section 5 summarizes our research.

2. Related work

Traditional infrared target detection techniques generally include approaches based on single-frame images and approaches based on image sequences. Single-frame approaches detect targets on the basis of basic image features such as edges and gray-level information. The general process is preprocessing followed by threshold detection. Traditional single-frame spatial-domain methods include max-mean filtering [13], morphological top-hat filtering [14], high-pass filtering [15] and wavelet transform [16]. However, these methods usually produce many false alarms and perform poorly when the signal-to-noise ratio of the target is low.

Sequence-based approaches detect targets on the basis of the continuity and similarity of target motion. Because information accumulates over multiple frames, the signal-to-noise ratio can be effectively improved, which highlights the target. The main sequence-based detection approaches include the difference method [17], the optical flow method [18], the 3D matched filter method [19] and the grayscale accumulation method [20]. Their shortcoming is that when the inter-frame motion is fast, the target energy may not be effectively accumulated, and detection performance declines.

In the past ten years, much research has been carried out on intrusion target detection. Most of the proposed techniques detect potential intrusion targets by enhancing target characteristics while suppressing background noise and clutter. Ref. [21] adopts a filter based on a hidden Markov model (HMM) to detect intrusion targets effectively. The HMM filter is the optimal filter for discrete-time processes, but a threshold must be preset before detection. A high threshold reduces the incidence of false alarms, but it also reduces the probability of detecting targets.

Ref. [22] proposes combining track-before-detect with HMM filtering and constructs a new HMM filter bank. That study shows that the HMM filter bank is more flexible and detects targets better than other HMM filters. However, neither the filter bank nor a single HMM filter has a recognition capability, so false alarms are inevitable. Ref. [23] proposes a morphological filtering method combined with a trained classifier. This helps to distinguish real intrusion targets from "suspected targets" that may cause false alarms, thus reducing the false alarm rate. However, it must be emphasized that this type of method depends strongly on the training data set.

In recent years, many scholars have studied infrared target detection. Ref. [24] proposes using local steering kernels (LSK) to encode infrared images, but does not use deep network training. Ref. [25] proposes a new learning framework that transfers knowledge from remote sensing scene classification tasks to multi-class geospatial target detection tasks. However, because of the dense distribution of objects and the complex background structure, this method is not very robust to noise. Over the past few years, with improvements in computing performance and the rapid development of neural networks, many DCNN-based infrared target detection methods have achieved good results.

To address the low recognition rate and slow speed of infrared ship image recognition, Wang et al. [26] proposed a method that combines a marker-based watershed segmentation algorithm with a DCNN. Their experiments show that it identifies infrared ship targets more quickly and accurately. Lin et al. [27] proposed a DCNN-based infrared point target detection approach with two kinds of deep networks, one for regression and one for classification, to detect and classify infrared point targets. The results indicate that the method is suitable for point target detection in infrared oversampling scanning systems. Wu et al. [28] proposed a new deep convolutional network for small target detection in infrared images. The network consists of a fully convolutional network (FCN) and a classification network. The FCN enhances and preliminarily screens infrared small targets, while the classification network classifies the position distribution of small targets. Experiments validate the advantages of this network over traditional small target detection algorithms.

3. Proposed method

3.1. Overview of the proposed method

Image feature extraction methods based on single-frame analysis can extract target areas quickly, but they are generally limited to specific application environments. In the preprocessing phase, the effectiveness of target segmentation depends heavily on prior knowledge of the targets and background. Feature extraction methods based on multi-frame correlation accumulate information over frames, which effectively improves the signal-to-noise ratio and highlights target motion. However, when the inter-frame motion is relatively fast, the target energy may not be accumulated effectively, and detection performance decreases.

In addition, intrusion targets in a military alert zone fall into two kinds. Some are feature targets already contained in the knowledge base; others cannot be recognized after deformation and camouflage. To overcome these problems, this paper combines single-frame pattern analysis with multi-frame correlation motion analysis, and performs feature extraction and target detection through a feature enhancement convolutional network. The overall algorithm flow is shown in Fig. 1.

Fig. 1 contains two parts: feature extraction, followed by target detection with the enhancement neural network. The feature extraction part contains two modules. The upper module extracts image texture features with the LBP model, and the lower module extracts motion features with the three-frame difference method. The two feature maps are normalized, combined and fed to the enhancement network. Through training, the network learns two things: to recognize the infrared texture patterns of known targets, and to sense the positions of moving targets. The final output is a binary image containing the detection results.

3.2. Feature extraction

3.2.1. Image texture feature extraction based on LBP

The LBP (Local Binary Pattern) texture feature, proposed in 1994 [29], is an operator that describes local image features. LBP features are simple to compute and effective. They are also invariant to monotonic gray-scale changes and, in the rotation-invariant form, to rotation. For these reasons they have been widely used in computer vision. In this paper, LBP is adopted to represent texture features [30] because it expresses well the target feature patterns already contained in the sample database. This strongly supports the analysis of static or slow-moving targets.

The original LBP operator is defined in a 3×3 neighborhood window, with the center pixel of the window serving as the threshold. The gray values of the 8 neighboring pixels are compared with it: if a neighboring pixel's value is greater than or equal to the threshold, that position is marked 1; otherwise, it is marked 0. The 8 comparisons thus produce an 8-bit binary number, which is converted into an LBP value that reflects the texture information of the region [31]. For a circular neighborhood (P, R), this process can be expressed as:

$$\mathrm{LBP}_{P,R}=\sum_{i=0}^{P-1} s(p_i-p_c)\,2^{i} \tag{1}$$

The symbols are defined as follows:

- P is the number of sampling pixels on the circle, which determines how finely the texture is described. The larger P is, the more sampling points are used, the more detailed the texture features become, and the higher the computational complexity.
- R is the radius of the circle, which determines the neighborhood size of the operator. The smaller R is, the more localized the texture features are.
- p_c is the gray value of the center pixel.
- p_i is the gray value of the i-th sampling pixel on the circle of radius R.
- s is the sign (step) function:

$$s(x)=\begin{cases}1, & x\ge 0\\ 0, & x<0\end{cases} \tag{2}$$

To extract the most basic structure and a rotation-invariant pattern from LBP, the gray-scale and rotation-invariant LBP model is adopted [32]:

$$\mathrm{LBP}^{riu2}_{P,R}=\begin{cases}\sum_{i=0}^{P-1} s(p_i-p_c), & U(\mathrm{LBP}_{P,R})\le 2\\ P+1, & \text{otherwise}\end{cases} \tag{3}$$

$$U(\mathrm{LBP}_{P,R})=\left|s(p_{P-1}-p_c)-s(p_0-p_c)\right|+\sum_{i=1}^{P-1}\left|s(p_i-p_c)-s(p_{i-1}-p_c)\right| \tag{4}$$

Fig. 1. Proposed infrared target intrusion detection framework.

Here, the superscript "riu2" denotes rotation-invariant "uniform" patterns. These are patterns whose uniformity measure U (the number of 0/1 transitions around the circular binary pattern) is at most 2.
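To make the riu2 mapping concrete, here is a minimal NumPy sketch for P = 8, R = 1. It is an illustration under stated assumptions, not the authors' implementation:

- It uses the square 3×3 neighbourhood of the original operator. A circular (8, 1) neighbourhood would instead interpolate the diagonal samples.
- Image borders are padded by edge replication.
- The function name `lbp_riu2` is hypothetical.

Codes 0-8 count the neighbours that are at least as bright as the centre, for uniform patterns. Code 9 (P + 1) collects all non-uniform patterns.

```python
import numpy as np

# Offsets of the 8 neighbours in a 3x3 window, in circular (clockwise) order
# starting at the top-left, so that index i corresponds to sample p_i.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_riu2(image: np.ndarray) -> np.ndarray:
    """Rotation-invariant uniform LBP codes (P=8, R=1, square neighbourhood)."""
    img = image.astype(np.float64)
    padded = np.pad(img, 1, mode="edge")  # edge replication at the borders
    h, w = img.shape
    # bits[i] holds s(p_i - p_c): 1 where the neighbour is >= the centre pixel.
    bits = np.stack([
        (padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] >= img).astype(np.int32)
        for dy, dx in NEIGHBOUR_OFFSETS
    ])
    # U: number of 0/1 transitions around the circle (bit i vs bit i-1, wrapping).
    transitions = np.abs(bits - np.roll(bits, 1, axis=0)).sum(axis=0)
    ones = bits.sum(axis=0)
    p = len(NEIGHBOUR_OFFSETS)
    # Uniform patterns keep their count of ones; all others map to P + 1.
    return np.where(transitions <= 2, ones, p + 1)
```

For production use, scikit-image's `skimage.feature.local_binary_pattern` with `method='uniform'` computes the same family of rotation-invariant uniform codes on an interpolated circular neighbourhood.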

3.2.2. Motion frame difference

Because military alert areas require long-term monitoring, the position of a target can be sensed by analyzing the motion in its infrared images. The image of a moving target is displaced between adjacent frames, whereas the background stays fixed. The frame difference method [33] subtracts adjacent frames pixel by pixel and takes the absolute value of the gray-level difference.

The traditional two-frame difference method obtains the contour of a moving target by detecting the changed areas between two adjacent frames:

$$d_t(x,y)=\left|f_t(x,y)-f_{t-1}(x,y)\right| \tag{5}$$

In this expression, f_t(x,y) is the gray value of pixel (x,y) at time t, and f_{t-1}(x,y) is its gray value at time t-1. The quantity d_t(x,y) is therefore the pixel difference between the two adjacent frames. With a threshold T, each pixel is binarized according to Eq. (6) to obtain the binary image R'_t:

$$R'_t(x,y)=\begin{cases}255, & d_t(x,y)>T\\ 0, & \text{otherwise}\end{cases} \tag{6}$$

Points with a gray value of 255 are foreground points, and points with a gray value of 0 are background points. Finally, connectivity analysis of the binary image yields the image R_t containing the complete moving target.
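The two-frame difference and its binarization can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. It assumes 8-bit grayscale frames and a strict `> T` comparison. The connectivity analysis that turns the binary image into R_t is omitted, and `two_frame_difference` is a hypothetical name.

```python
import numpy as np


def two_frame_difference(prev: np.ndarray, curr: np.ndarray, threshold: float) -> np.ndarray:
    """Binarized two-frame difference: 255 marks foreground, 0 background."""
    # d_t(x, y) = |f_t(x, y) - f_{t-1}(x, y)|; cast to a signed type first so
    # that subtracting uint8 frames cannot wrap around.
    diff = np.abs(curr.astype(np.int32) - prev.astype(np.int32))
    # Pixels whose change exceeds T are marked as moving (255), the rest as 0.
    return np.where(diff > threshold, 255, 0).astype(np.uint8)
```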

In actual infrared intrusion detection scenes, the target is far from the detection equipment and its imaged area is small, so its motion can be approximated as uniform. The two-frame difference method is insensitive to slowly moving targets and tends to produce holes ("cavities"). As a result, the complete moving target cannot be obtained by subtracting two frames. Therefore, this paper uses the three-frame difference method [34] to extract moving objects. First, three adjacent frames are taken as a group and differenced in pairs. Second, a logical operation is applied to the two difference results. The specific procedure is as follows:

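1) Consider an image sequence of n frames containing infrared intrusion targets, denoted {f_1(x,y), …, f_k(x,y), …, f_n(x,y)}, where f_k(x,y) is the k-th frame. The differences between adjacent frames are calculated as:

$$d_{k-1,k}(x,y)=\left|f_k(x,y)-f_{k-1}(x,y)\right|,\qquad d_{k,k+1}(x,y)=\left|f_{k+1}(x,y)-f_k(x,y)\right|$$

2) An appropriate threshold T is set, and the two difference images are binarized:

$$b_{k-1,k}(x,y)=\begin{cases}1, & d_{k-1,k}(x,y)>T\\ 0, & \text{otherwise}\end{cases},\qquad b_{k,k+1}(x,y)=\begin{cases}1, & d_{k,k+1}(x,y)>T\\ 0, & \text{otherwise}\end{cases}$$

3) For each pixel (x,y), a logical "or" operation is applied to the two binary images obtained in step 2) to compute the final binary image:

$$B_k(x,y)=b_{k-1,k}(x,y)\lor b_{k,k+1}(x,y)$$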
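Steps 1)-3) can be sketched in NumPy as follows. The sketch assumes grayscale frames of equal size and a strict `> T` comparison, and `three_frame_difference` is a hypothetical name. It follows the logical "or" stated in step 3).

```python
import numpy as np


def three_frame_difference(f_prev, f_curr, f_next, threshold):
    """Three-frame difference whose two binary masks are combined with logical OR."""
    frames = [np.asarray(f, dtype=np.int32) for f in (f_prev, f_curr, f_next)]
    # Step 1: absolute differences of the two adjacent frame pairs.
    d_prev_curr = np.abs(frames[1] - frames[0])
    d_curr_next = np.abs(frames[2] - frames[1])
    # Step 2: binarize each difference image with the same threshold T.
    b_prev_curr = d_prev_curr > threshold
    b_curr_next = d_curr_next > threshold
    # Step 3: pixel-wise logical OR of the two binary images.
    return np.logical_or(b_prev_curr, b_curr_next).astype(np.uint8)
```

Many three-frame implementations combine the two binary images with a logical AND instead. AND keeps only pixels that changed in both pairs, which localizes the target in the middle frame. Swapping `np.logical_or` for `np.logical_and` gives that variant.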

3.3. Enhancement network

Through the above steps, the LBP texture feature map and the motion frame difference map are obtained. The designed enhancement network fuses the two feature maps and further enhances the target characteristics, so as to suppress background clutter and improve the detection rate. Its structure is shown in Fig. 2. The main purpose of the enhancement network is to highlight target features, so as to obtain the candidate positions with the highest probability and reduce the false alarm rate. The network consists of two modules:

- The feature fusion module integrates and jointly analyzes the information captured in the two feature maps.
- The feature enhancement module suppresses background clutter and highlights the target area.

In the feature fusion module, the feature images obtained by gray-scale texture analysis and by the frame difference method are fused and input to the enhancement network. Before fusion, each extracted feature image is normalized as follows:

$$F'(i,j)=\frac{F(i,j)-\mu}{\sigma}$$

where F(i,j) is the extracted feature map, F'(i,j) is the normalized feature map, and μ and σ are the mean and standard deviation of the feature map, respectively.
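The normalization and fusion step might look like the sketch below. Two parts of it are assumptions rather than details from the text:

- The text does not say how the two normalized maps are combined, so stacking them as two input channels is an assumption.
- The small `eps` that guards against constant maps is likewise added here.

The function names are hypothetical.

```python
import numpy as np


def normalize(feature_map, eps=1e-8):
    """Zero-mean, unit-variance normalization F' = (F - mu) / sigma."""
    f = np.asarray(feature_map, dtype=np.float64)
    # eps avoids division by zero when a map is constant (assumption).
    return (f - f.mean()) / (f.std() + eps)


def fuse_features(lbp_map, motion_map):
    """Stack the two normalized maps into a (2, H, W) network input (assumed layout)."""
    return np.stack([normalize(lbp_map), normalize(motion_map)], axis=0)
```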

The feature fusion phase introduces the Inception module [35], which increases the network's adaptability to scale without increasing network complexity. Fig. 3(a) shows the basic structure of the Inception module, which stacks 1×1, 3×3 and 5×5 convolution kernels in parallel. It then aggregates the features of each branch, providing multi-scale initial features for subsequent extraction. Following the improved Inception module in [36], two changes are made:

- The pooling branch is removed, to avoid losing a large amount of feature information and making model training difficult.
- The 5×5 convolution kernel is replaced with two stacked 3×3 kernels. These cover the same receptive field with fewer parameters and indirectly increase the depth of the network.

To reflect the importance of convolution kernels of different scales, the outputs of the three convolution branches are given weights of 1/4, 1/2 and 1/4, respectively. To speed up training, batch normalization (BN) is applied after each convolution layer of the Inception module. The improved Inception module is depicted in Fig. 3(b).
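The parameter saving from replacing one 5×5 kernel with two stacked 3×3 kernels can be checked with simple arithmetic. The sketch assumes stride-1, undilated convolutions with the same number of input and output channels C at every layer, and it ignores biases. C = 64 here is purely illustrative. Both designs see a 5×5 receptive field, while the stacked pair needs 18C² weights instead of 25C², i.e. 28% fewer.

```python
def conv_weights(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights of a k x k convolution layer, bias terms ignored."""
    return kernel_size * kernel_size * in_channels * out_channels


def stacked_receptive_field(kernel_sizes) -> int:
    """Receptive field of stride-1, undilated convolutions applied in sequence."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each k x k layer widens the field by k - 1 pixels
    return field


channels = 64  # hypothetical channel width, chosen only for illustration
one_5x5 = conv_weights(5, channels, channels)       # 25 * C * C
two_3x3 = 2 * conv_weights(3, channels, channels)   # 18 * C * C
print(stacked_receptive_field([5]), stacked_receptive_field([3, 3]))  # 5 5
print(one_5x5, two_3x3, two_3x3 / one_5x5)          # 102400 73728 0.72
```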

Fig. 2. Enhancement network structure.

Fig. 3. Inception structure. (a) basic structure; (b) improved structure.

The enhancement module uses three convolutional layers with 3×3 kernels and ReLU activations. The last layer uses deconvolution to reduce the output to a single channel [28]. The design idea of the enhancement module is shown in Fig. 4. The convolutional network extracts target features through convolution and, through multiple convolutional layers, acquires the ability to classify target points. It produces the probability that each point is a target and filters these probability values layer by layer. Finally, it outputs a probability distribution image S, in which stronger target properties correspond to larger probability values.

After the probability distribution image S is obtained, the probability threshold is selected adaptively by an iterative method. First, S is divided into 8×8 equal rectangular cells. Second, the mean probability value in each cell is calculated, giving a 64×1 array. Third, elements smaller than 0.1 are discarded, since those cells are judged to contain no target, yielding an n×1 array C. The adaptive threshold is then selected as follows:

Fig. 4. Design ideas of enhancement module.

1) Calculate the maximum and minimum values of the array, denoted P_max and P_min, respectively, and set the initial threshold T_0 = (P_max + P_min)/2.

2) Divide the array into a high-value group and a low-value group according to the threshold T_k (k = 0, 1, 2, …). Calculate the average values of the two groups, denoted H_1 and H_2, respectively.

3) Calculate the new threshold T_{k+1} = (H_1 + H_2)/2.

4) If T_{k+1} = T_k, T_k is the threshold of the probability image S; otherwise, return to step 2) and continue iterating.

Once the threshold is obtained, probability values above it are set to 1 and those below it are set to 0. The final target image is thus a binary image in which target pixels are labeled 1 and all other pixels are 0.
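The block-mean preprocessing, the iterative threshold of steps 1)-4) and the final binarization can be sketched as follows. This is an illustration under assumptions, not the authors' code:

- The map is cropped to a multiple of the 8×8 grid.
- A small tolerance replaces the exact test T_{k+1} = T_k. The iteration stops on an exact repeat anyway once the grouping no longer changes.
- Values equal to T_k count as low.
- The function names are hypothetical.

```python
import numpy as np


def block_means(prob_map, grid=8):
    """Mean probability of each cell in a grid x grid partition of S (row-major order)."""
    h, w = prob_map.shape
    bh, bw = h // grid, w // grid
    cropped = prob_map[:bh * grid, :bw * grid]  # drop any remainder (assumption)
    return cropped.reshape(grid, bh, grid, bw).mean(axis=(1, 3)).ravel()


def iterative_threshold(values, tol=1e-9, max_iter=100):
    """Steps 1)-4): iterate T_{k+1} = (H1 + H2) / 2 until T stops changing."""
    t = (values.max() + values.min()) / 2.0  # T_0 = (P_max + P_min) / 2
    for _ in range(max_iter):
        high = values[values > t]
        low = values[values <= t]
        if high.size == 0 or low.size == 0:  # all values equal: keep current T
            break
        t_next = (high.mean() + low.mean()) / 2.0
        if abs(t_next - t) <= tol:
            return t_next
        t = t_next
    return t


def binarize_probability_map(prob_map, min_prob=0.1, grid=8):
    """Return (threshold, binary target image) for a probability image S."""
    means = block_means(prob_map, grid)
    candidates = means[means >= min_prob]  # cells below 0.1 are judged target-free
    if candidates.size == 0:
        return None, np.zeros(prob_map.shape, dtype=np.uint8)
    t = iterative_threshold(candidates)
    return t, (prob_map > t).astype(np.uint8)
```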

The convolutional network uses a loss function based on grayscale cross-correlation. In the binary target image, only a few points take the value 1 and the rest are 0. For such images, a grayscale cross-correlation loss yields a larger loss gradient than the mean square error loss [28]. The loss L consists of two parts, L1 and L2:

- L1 calculates the mean deviation between the network output image and the target image, so that the two images are close in mean value.
- L2 calculates their gray cross-correlation coefficient, so that the pixel variations of the two images are consistent.

The loss L is the sum L1 + L2, which makes the output image agree with the target image at as many pixels as possible. The error is calculated as follows:

where G is the network output image and S is the target image. To prevent lg(0), a smoothing coefficient of 0.01 is added to each term. "batch_size" is the number of samples in each training batch, which is set to 32 in the experiments [28].
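The loss formula itself is missing from this text. Purely to illustrate its two ingredients, the sketch below makes the following assumptions:

- L1 = |mean(G) - mean(S)|.
- L2 applies -lg to the gray cross-correlation coefficient, rescaled from [-1, 1] to [0, 1].
- The 0.01 smoothing coefficient is added wherever a zero could occur.

This combination is hypothetical and should not be read as the authors' exact loss.

```python
import numpy as np

SMOOTH = 0.01  # smoothing coefficient quoted in the text


def grayscale_cc_loss(output_batch, target_batch):
    """Hypothetical L = L1 + L2, averaged over a batch of (G, S) image pairs."""
    losses = []
    for g, s in zip(output_batch, target_batch):
        g = np.asarray(g, dtype=np.float64).ravel()
        s = np.asarray(s, dtype=np.float64).ravel()
        # L1: deviation between the mean gray values of G and S.
        l1 = abs(g.mean() - s.mean())
        # Gray cross-correlation coefficient; SMOOTH avoids 0/0 on constant images
        # and keeps rho inside (-1, 1].
        gc, sc = g - g.mean(), s - s.mean()
        rho = (np.dot(gc, sc) + SMOOTH) / (np.sqrt(np.dot(gc, gc) * np.dot(sc, sc)) + SMOOTH)
        # L2 (assumed form): -lg of rho mapped to [0, 1], kept positive by SMOOTH.
        l2 = -np.log10((1.0 + rho) / 2.0 + SMOOTH)
        losses.append(l1 + l2)
    return float(np.mean(losses))
```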

3.4. Training and parameter setting of the network

In this paper, the experiments are implemented on 64-bit Ubuntu 16.04 LTS, running on a DELL Precision R7910 (AWR7910) graphics workstation with an Intel Xeon E5-2603 v2 processor (1.8 GHz, 10 MB cache). An NVIDIA Quadro K620 GPU is used for accelerated computation.

In the experiments, the model is trained by stochastic gradient descent with the following settings:

- initial learning rate of 0.01;
- momentum of 0.9;
- weight decay of 0.0005;
- batch size of 32 images;
- maximum of 60,000 iterations.

The learning rate is 0.01 for the first 30,000 iterations and 0.001 for the last 30,000 iterations.
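The training settings can be summarized as a configuration plus a step learning-rate schedule. The dictionary layout and function name below are illustrative. The values are those stated in the text, and iterations are counted from 0.

```python
# Hyper-parameters as stated in the text; the dictionary layout is illustrative.
TRAIN_CONFIG = {
    "optimizer": "SGD",
    "momentum": 0.9,
    "weight_decay": 5e-4,
    "batch_size": 32,
    "max_iterations": 60_000,
}


def learning_rate(iteration: int, base_lr: float = 0.01,
                  step: int = 30_000, gamma: float = 0.1) -> float:
    """Step schedule: 0.01 for iterations 0-29,999, then 0.001 up to 59,999."""
    return base_lr * (gamma ** (iteration // step))
```

In PyTorch, for example, these settings would correspond to `torch.optim.SGD(params, lr=0.01, momentum=0.9, weight_decay=5e-4)` together with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30000, gamma=0.1)` stepped once per iteration. The text does not state which framework the authors used.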

4. Experimental analysis

4.1. Data set

A representative data set is constructed to train and test the model. It contains 700 samples, each consisting of three consecutive frames, and the ground truth of the target area is drawn manually as the sample label. Intruders in prohibited military zones are usually suspicious persons and vehicles. The targets in the data set therefore include people in different postures and numbers as well as different types of vehicles, as shown in Fig. 1. To ensure diversity, the samples are collected in multiple environments, including woodland, grassland, complex urban backgrounds and monotonous backgrounds.

The images are captured by an unmanned aerial vehicle (UAV) equipped with an infrared lens operating in the 8-14 μm band, and the image size is 256×256. The flight height of the UAV is not fixed, so both small and large targets are covered. To evaluate the performance of different methods objectively, the data set is divided into three classes. Each class is split into training and test sets in a ratio of 4:1. The characteristics of the three classes are presented in Table 1.

Table 1 The characteristics of three classes of data sets.

4.2. Evaluation metric

The evaluation indexes selected in this paper are the signal-to-clutter ratio gain (SCRG) [37], the background suppression factor (BSF) [38], the detection rate P_d and the false alarm rate P_f [39,40]. SCRG and BSF reflect the effects of target enhancement and background suppression, respectively, and are expressed as:

$$\mathrm{SCRG}=\frac{\mathrm{SCR}_{out}}{\mathrm{SCR}_{in}},\qquad \mathrm{BSF}=\frac{\sigma_{in}}{\sigma_{out}}$$

The symbols are defined as follows:

- SCR_out and SCR_in are the signal-to-clutter ratios of the output and input images, respectively. The SCR is commonly defined as |μ_t - μ_b|/σ_b, where μ_t and μ_b are the mean gray values of the target and background, and σ_b is the standard deviation of the background.
- σ_out and σ_in are the standard deviations of the output and input images, respectively.
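A minimal sketch of the two background-suppression metrics, assuming a known binary target mask. Following the wording above, σ is taken over the whole image for BSF, and SCR uses all non-target pixels as background. Many papers instead use a local neighbourhood around the target, which gives different numbers. Function names are hypothetical.

```python
import numpy as np


def scr(image, target_mask, eps=1e-12):
    """Signal-to-clutter ratio |mu_t - mu_b| / sigma_b, background = all non-target pixels."""
    img = np.asarray(image, dtype=np.float64)
    mask = np.asarray(target_mask, dtype=bool)
    background = img[~mask]
    return abs(img[mask].mean() - background.mean()) / (background.std() + eps)


def scrg(img_in, img_out, target_mask):
    """SCRG = SCR_out / SCR_in."""
    return scr(img_out, target_mask) / scr(img_in, target_mask)


def bsf(img_in, img_out, eps=1e-12):
    """BSF = sigma_in / sigma_out, using whole-image standard deviations."""
    sigma_in = np.std(np.asarray(img_in, dtype=np.float64))
    sigma_out = np.std(np.asarray(img_out, dtype=np.float64))
    return sigma_in / (sigma_out + eps)
```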

The receiver operating characteristic (ROC) curve [41] is drawn from the relationship between P_d and P_f, which are defined as:

$$P_d=\frac{N_s}{N_r},\qquad P_f=\frac{N_f}{N_t}$$

The symbols are defined as follows:

- N_s is the number of targets successfully detected.
- N_r is the number of real targets.
- N_f is the number of falsely detected targets.
- N_t is the total number of pixels detected.
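One way to compute P_d and P_f and trace a ROC curve is to sweep a threshold over a detection score map. The sketch assumes pixel-level counting, which is one reading of the definitions above:

- N_s, N_r and N_f are counted in pixels.
- N_t is the total number of pixels in the image.

Object-level counting would need a target-matching rule that the text does not specify. Function names are hypothetical.

```python
import numpy as np


def detection_rates(n_detected, n_real, n_false, n_total):
    """P_d = N_s / N_r and P_f = N_f / N_t."""
    return n_detected / n_real, n_false / n_total


def pixel_roc(score_map, truth_mask, thresholds):
    """Return (P_f, P_d) pairs for each threshold, counting pixels (assumption)."""
    truth = np.asarray(truth_mask, dtype=bool)
    n_real = truth.sum()
    n_total = truth.size
    points = []
    for t in thresholds:
        detected = np.asarray(score_map) >= t
        n_s = np.logical_and(detected, truth).sum()   # correctly detected target pixels
        n_f = np.logical_and(detected, ~truth).sum()  # background pixels flagged as target
        pd, pf = detection_rates(n_s, n_real, n_f, n_total)
        points.append((float(pf), float(pd)))
    return points
```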

4.3. Comparison with other methods

Simulation experiments on real infrared images were conducted to verify the effectiveness and superiority of the proposed method. Several classical infrared target detection methods were selected for comparison:

- Max-Mean filter method (MM) [13]. MM uses max-mean and max-median filtering to detect small targets in infrared images.
- Fusion of two different motion cues method (FTDMC) [42]. FTDMC uses two motion cues, background subtraction and temporal differencing, to detect and recognize moving targets.
- Co-detection method (COD) [43]. COD is an infrared target co-detection model that combines background auto-correlation features with the common characteristics of the target in the spatio-temporal domain.
- Fully convolutional network and region growing method (FCNARG) [44]. FCNARG is an infrared segmentation algorithm for complex backgrounds that combines a fully convolutional neural network with dynamic adaptive region growing.
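For reference, the MM baseline can be illustrated with a minimal NumPy sketch of the max-mean filter in its usual formulation. The local background at each pixel is the largest of the mean values along four lines through it: horizontal, vertical and the two diagonals. The target image is the residual. Including the centre pixel in each mean, edge padding and clipping negative residuals to zero are assumptions of this sketch. The max-median variant is not shown, and the function name is hypothetical.

```python
import numpy as np


def max_mean_filter(image, size=5):
    """Max-mean background estimate subtracted from the image (residual = targets)."""
    img = np.asarray(image, dtype=np.float64)
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    # Horizontal, vertical, main diagonal, anti-diagonal.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    means = []
    for dy, dx in directions:
        acc = np.zeros_like(img)
        for k in range(-r, r + 1):  # `size` samples along the line, centre included
            acc += padded[r + k * dy:r + k * dy + h, r + k * dx:r + k * dx + w]
        means.append(acc / size)
    background = np.max(means, axis=0)  # largest directional mean per pixel
    return np.clip(img - background, 0.0, None)
```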

4.3.1. Comparison of background suppression performance

To further evaluate the target enhancement and background suppression capability of the proposed method, we analyzed the SCRG and BSF of the different methods. Tables 2 and 3 list the mean SCRG and BSF values of each method on the three data sets. Higher values indicate stronger target enhancement and background suppression, respectively.

As shown in Tables 2 and 3, the proposed method has the highest SCRG value in Class 1 and Class 3. FTDMC performs worst in those two classes, although its SCRG value in Class 2 is slightly higher than that of the proposed method. In terms of BSF, COD performs best in Class 1, but the proposed method still outperforms the remaining comparison methods in Class 1. It also achieves the best BSF in both Class 2 and Class 3. In summary, the success of the existing infrared target detection methods is limited to their specific applications. The proposed method does not have the highest SCRG and BSF values in every class. Nevertheless, it generally performs better in improving the target signal-to-clutter ratio and suppressing the background.

4.3.2. Comparison of detection performance

ROC curves of the different methods are drawn to further demonstrate their target detection performance. Fig. 5 shows the ROC curves obtained on the three data sets; each method shows its own characteristics under different conditions. On the Class 3 data set, MM and FTDMC cannot effectively detect target pixels against complex backgrounds. On Class 1 and Class 2, the ROC curves of COD and FCNARG are similar to that of the proposed method. Even so, the proposed method achieves a higher detection rate P_d while keeping a lower false alarm rate P_f. The ROC results therefore indicate that the proposed method is more robust, with a higher detection rate and a lower false alarm rate. Under the experimental conditions and data sets described above, the proposed method processes 6 samples per second, which basically meets the real-time requirements of intrusion detection.

To show the detection effects of the different methods intuitively, groups of infrared images are selected from the three data sets for detection. Fig. 6 is laid out as follows: the first column shows the middle frame of each group; the second to fourth columns show the results of the comparison methods; the fifth column shows the results of the proposed method; and the sixth column shows the ground truth of the target area.

- First row (multiple slowly moving targets): MM detects many noise points. FTDMC and FCNARG fail to detect the complete target area. COD's results also contain some noise points and missed detections. The proposed method detects the intrusion targets accurately.
- Second row (a single rapidly moving target): MM and COD suppress the background relatively poorly, and a double shadow appears in the FTDMC result. FCNARG produces a false alarm. The result of the proposed method is closest to the ground truth.
- Third row: the target was deformed, occluded and camouflaged, and placed where it was partially shaded by trees. The results of MM and FTDMC contain a large amount of residual background interference. COD detects the larger targets well but misses small ones, and FCNARG again shows many false alarms. The proposed method preserves less background while detecting large targets and enhancing small ones.
- Fourth row (pure background images): the proposed method achieves the best background suppression.
In summary, the MM method detects infrared targets through the edge-preserving and noise-filtering properties of its filter, and the FCNARG method performs region growing on features extracted by a fully convolutional neural network; neither method uses the motion information of the target, so their results are prone to false alarms. The FTDMC and COD methods involve no model training, so they cannot reliably detect the complete target area. The detection results show that the proposed method can not only suppress complex background areas but also enhance intrusion targets in practical applications.
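The motion information that the MM and FCNARG methods do not exploit can be illustrated with a generic symmetric three-frame difference. This is a textbook technique shown only as a sketch; it is not claimed to be the paper's own multi-frame correlation step, whose details are given elsewhere. The function names, the threshold of 50, and the synthetic frames are hypothetical.

```python
import numpy as np


def three_frame_difference(prev_frame, curr_frame, next_frame, threshold):
    """Binary motion mask for the middle frame of three consecutive frames."""
    # Widen to a signed type so uint8 subtraction cannot wrap around.
    prev_f = prev_frame.astype(np.int32)
    curr_f = curr_frame.astype(np.int32)
    next_f = next_frame.astype(np.int32)
    changed_before = np.abs(curr_f - prev_f) > threshold
    changed_after = np.abs(next_f - curr_f) > threshold
    # Keep only pixels that changed relative to BOTH neighbours: this removes the
    # "ghost" left where the target was in the previous or next frame.
    return changed_before & changed_after


def make_frame(start_col, width=9, height=3):
    """Synthetic uint8 frame: background 20, a 2-column bright target of value 200."""
    frame = np.full((height, width), 20, dtype=np.uint8)
    frame[:, start_col:start_col + 2] = 200
    return frame


# Target moves right by 3 columns per frame.
motion_mask = three_frame_difference(make_frame(1), make_frame(4), make_frame(7), threshold=50)
print(motion_mask.astype(int))
```

A plain two-frame difference would mark both the old and the new target positions, producing a double shadow; intersecting the two differences keeps only the target's position in the middle frame (columns 4 and 5 here). Static clutter does not change between frames and is therefore removed, which is why motion cues help against false alarms.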

Table 2 Average values of SCRG corresponding to different methods.

Table 3 Average values of BSF corresponding to different methods.
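The SCRG and BSF values used to compare background suppression can be computed as sketched below. The sketch assumes the definitions common in the infrared small-target literature: SCR = |mean of target - mean of background| / standard deviation of background, SCRG = SCR after processing / SCR before processing, and BSF = background standard deviation before processing / after processing. It treats every non-target pixel as background, whereas many works use a local window around the target; the paper's own definitions, given earlier, take precedence. The helper names and the toy images are hypothetical.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero when a background region is perfectly flat


def scr(image, target_mask):
    """Signal-to-clutter ratio: |mean_target - mean_background| / std_background."""
    img = np.asarray(image, dtype=np.float64)
    target = img[target_mask]
    background = img[~target_mask]
    return abs(target.mean() - background.mean()) / (background.std() + EPS)


def scrg(input_image, output_image, target_mask):
    """SCR gain: SCR after processing divided by SCR before processing."""
    return scr(output_image, target_mask) / (scr(input_image, target_mask) + EPS)


def bsf(input_image, output_image, target_mask):
    """Background suppression factor: background std before / after processing."""
    sigma_in = np.asarray(input_image, dtype=np.float64)[~target_mask].std()
    sigma_out = np.asarray(output_image, dtype=np.float64)[~target_mask].std()
    return sigma_in / (sigma_out + EPS)


# Toy example: a 2x2 target (60) on a background alternating between 10 and 30.
inp = np.array([[10, 30, 10, 30],
                [30, 60, 60, 10],
                [10, 60, 60, 30],
                [30, 10, 30, 10]])
mask = inp == 60
# A hypothetical detector output that flattens the clutter and boosts the target.
out = np.where(inp == 10, 0, np.where(inp == 30, 2, 101))
print(scrg(inp, out, mask), bsf(inp, out, mask))  # approximately 25.0 and 10.0
```

In the toy case the input SCR is 40/10 = 4 and the output SCR is 100/1 = 100, giving SCRG = 25, while the background standard deviation falls from 10 to 1, giving BSF = 10. Larger values of both indicate stronger target enhancement and background suppression.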

Fig. 5. ROC curves of different methods on the three data sets; (a) to (c) show the experimental results on data sets Class 1 to Class 3, respectively.

Fig. 6. Detection results of different methods.

5. Conclusion

Aiming at the key problems of infrared target intrusion detection in modern defense systems and infrared detection systems, this paper proposed an infrared target intrusion detection method based on feature fusion and enhancement. On the one hand, multilevel features were extracted by combining single-frame analysis and multi-frame correlation, making full use of the advantages of both single-frame images and image sequences and effectively suppressing background clutter. On the other hand, the extracted feature maps were fused by the constructed enhancement convolutional neural network, whose enhancement module strengthens target characteristics and reduces false alarms. Three typical target detection methods were selected for experimental comparison with the proposed method. In terms of SCRG and BSF values, the comparison methods achieved good results in their respective applications; although the proposed method was not better than them in every class, overall it performed better at improving the SNR of the target and suppressing the background. In terms of detection performance, the comparison methods showed their own advantages under different conditions, whereas the proposed method can both suppress complex background areas and enhance intrusion targets in practical applications. The experimental results indicate that, compared with several classical infrared target detection methods, the proposed method is more robust in improving the SCRG and BSF values of the image and performs significantly better in terms of detection rate and false alarm rate.
Detecting intrusion targets such as personnel and suspicious objects is an important task in the field of protection. In the future, this method can be applied in key military protection and monitoring scenarios, including prohibited military zones, border posts, airport boundaries and national defense engineering, so it has broad prospects for military application.

Funding

This work was supported by the National Natural Science Foundation of China (grant number: 61671470), the National Key Research and Development Program of China (grant number: 2016YFC0802904), and the Postdoctoral Science Foundation Funded Project of China (grant number: 2017M623423).

Declaration of competing interest

We declare that the contents of this manuscript have not been copyrighted or published previously and that we have no financial or non-financial conflicts of interest. I am one author signing on behalf of all co-authors of this manuscript and attest to the above.