Measuring air traffic complexity based on small samples


Xi ZHU, Xianbin CAO*, Kaiquan CAI

School of Electronics and Information Engineering, Beihang University, Beijing 100191, China

Beijing Key Laboratory for Network-Based Cooperative Air Traffic Management, Beijing 100083, China

Air traffic complexity is an objective metric for evaluating the operational condition of an airspace. It has several applications, such as airspace design and traffic flow management. Therefore, identifying a reliable method to accurately measure traffic complexity is important. Considering that many factors correlate with traffic complexity in complicated nonlinear ways, researchers have proposed several complexity evaluation methods based on machine learning models trained with large samples. However, the high cost of sample collection usually results in a limited training set. In this paper, an ensemble learning model is proposed for measuring air traffic complexity within a sector based on small samples. To exploit the classification information within each factor, multiple diverse factor subsets (FSSs) are generated under the guidance of factor noise and independence analysis. Then, a base complexity evaluator is built for each FSS. The final complexity evaluation result is obtained by integrating all results from the base evaluators. Experimental studies using real-world air traffic operation data demonstrate the advantages of our model for small-sample-based traffic complexity evaluation over other state-of-the-art methods.

© 2017 Chinese Society of Aeronautics and Astronautics. Production and hosting by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Air traffic complexity is an objective and critical metric for evaluating the operational status of a given airspace. From the perspective of complexity science, it can be summarized as three types of complexity contained in an air traffic management system (ATMS): the complexity embedded in the global operational pattern, the complexity contained in the relationships between various elements, and the complexity embodied in the uncertainty of the evolutionary trend.1,2 Based on this definition, we can infer that air traffic complexity has a dominant influence on the workload of the air traffic controller (ATCo), because it makes it harder for the ATCo to perceive traffic situations and make correct decisions. In other words, ATCos are more likely to commit operational errors under higher traffic complexity.3 Therefore, air traffic complexity is a critical factor affecting operational safety in the ATMS, which in turn ultimately limits airspace capacity.4

Today's ATMS, composed of numerous airspace sectors with varying air traffic flow, is a large-scale and rapidly evolving complex dynamical system, and thus air traffic complexity constantly changes over time and across sectors. For a specific sector, a mismatch between excessive air traffic complexity and limited traffic management ability frequently occurs, which may lead to airspace congestion and flight delays. To avoid such situations, we should effectively tune the operational status of the sectors through traffic management methods, such as air traffic flow management and dynamic airspace configuration, to balance the traffic complexity and the controllability of each sector. Implementing these techniques requires a reliable measurement of air traffic complexity. Thus, air traffic complexity evaluation has become a popular research topic in the air traffic management (ATM) field.

To measure air traffic complexity, a direct approach is to define a tangible complexity indicator that can be explicitly formulated. Many scholars define the complexity indicator through a traffic attribute that they identify as the predominant representative of traffic complexity, such as the difficulty of potential conflict resolution,5-7 the probability of conflict occurrence,8-10 and the disorder of traffic trajectories.11,12 Note that each indicator of this type depicts air traffic complexity from a particular angle, which is limiting because the ATMS includes so many elements. For example, the former two indicators mentioned above cannot reflect traffic surveillance complexity. Thus, the definition perspectives of these indicators are insufficient for characterizing traffic complexity comprehensively.

There is another complexity measurement approach that takes a more comprehensive view. Considering that air traffic complexity is the result of complicated interactions among a range of traffic attributes (complexity factors), many scholars use machine learning techniques to measure complexity. Gianazza13,14 and Xiao et al.15 have each advanced representative machine learning-based complexity evaluation models that achieved satisfactory performance through full training on a large number of samples. Nevertheless, in the real world, a large sample set can be very difficult to obtain because accurately labeling the complexity value of each sample is expensive (a complexity sample includes a collection of complexity factors and a corresponding complexity degree). The labeling work requires the real-time participation of ATCos during the control task, which is time-consuming and labor-intensive. Therefore, in most cases, only a small number of samples are available for training the complexity evaluation model. In addition, the operational rules of the ATMS change slowly, and the laws governing complexity generation also evolve gradually. Thus, complexity samples and the evaluation model should be updated occasionally. For this reason, constructing a large dataset and retraining the evaluation model would be a considerable burden. Therefore, it is necessary to develop an improved model to accommodate real-world applications with a limited sample set.

In this paper, a novel machine learning method for rating sectors' air traffic complexity levels with a small dataset is presented. In our work, air traffic complexity within a sector is classified into three levels: Low, Normal and High. Specifically, the low complexity level indicates a simple traffic pattern and a waste of control resources, because the workload is much less than what the ATCo can provide. The normal complexity level indicates a balance between the traffic control demand and the ATCo's control ability; at this level, control resources can be used effectively while safety is ensured. The high complexity level means that the traffic is difficult to control and the workload is so high that the ATCo is likely to commit operational errors. In this context, our model can be used as a decision support tool: the traffic complexity level it outputs can help ATCos make tactical control decisions, such as splitting or merging sectors. In consideration of the small sample set, we expect to obtain satisfactory evaluation results by encouraging the mining of the classification information contained in every dimension of each sample. Hence, the first step of our approach is to generate multiple small-size factor subsets (FSSs) by sampling factors from the "factor pool" (FP, the original factor set). Then, a base classifier is trained for every FSS. Next, we integrate the evaluation results of all of the base classifiers to obtain the final result. Within this ensemble learning scheme, each factor in the FP can be included in multiple small FSSs, and thus has many "chances" to be learned by numerous base evaluators. Note that our approach is an improved version of a popular ensemble learning model, the random subspace (RS) method.16 The improvement lies in how the FSSs are generated. In the traditional RS, each FSS is generated by randomly selecting factors, whereas in our approach, the factors' noise and independence analysis is used to guide the generation of more efficient and compact FSSs that include fewer noisy and redundant factors. Our FSS generation strategy therefore further facilitates the expression of the factors' information, and good complexity evaluation results can be expected.

The remainder of this paper is organized as follows: Section 2 reviews representative air traffic complexity measurements proposed by predecessors; Section 3 elaborates the proposed ensemble learning model designed for rating sectors' air traffic complexity levels based on small samples; Section 4 presents the experimental studies and the analysis of the results; Section 5 concludes this paper and suggests future research work.

2. Related work

To date, numerous air traffic complexity evaluation methods have been proposed by many scholars and engineers. As mentioned in Section 1, these methods broadly fall into two categories.

The first category characterizes air traffic complexity by an explicitly formulated indicator that describes complexity from a certain angle. For example, the input-output approach5-7 proposed by Lee et al. defines traffic complexity as "how difficult" a given traffic situation is in terms of the control activity required to resolve it in response to a change in the "reference signal", that is, the presence of a new aircraft entering the airspace. Besides Lee, Prandini et al.8-10 proposed a mid-term air traffic complexity characterization approach based on calculating the occurrence probability of multiple aircraft converging within a specific distance, in which the future flying process of each aircraft is modeled as Brownian motion. Another representative metric, proposed by Delahaye et al., is based on the Lyapunov exponent (LE).11,12 In this approach, the ATMS is modeled by nonlinear differential evolution equations, and the LE is used to measure the rate of exponential convergence or divergence of nearby trajectories and thus quantify the degree of system disorder, which is defined as the complexity indicator. The three complexity indicators exemplified above may be limited in the comprehensiveness of their complexity descriptions because each is proposed from a single perspective. Defining air traffic complexity by an explicit indicator has also meant that new definitions have been constantly created by researchers over decades, yet no widely accepted one has emerged.1

The other category of air traffic complexity evaluation approaches obtains the complexity value by synthesizing numerous complexity factors. Most researchers agree that a range of attributes, parameters and ingredients of the air traffic situation in a given airspace are related to, or reflect, air traffic complexity, and these attributes are termed air traffic complexity factors. To date, various lists of complexity factors have been created.14,15,17-21 In 2016, a list of 28 complexity factors originally collected by Gianazza and Guittet14 was reviewed and utilized in the air traffic complexity evaluation work of Xiao et al.15 These factors have been consistently found to be important in measuring air traffic complexity. Therefore, in this paper, we also use these 28 factors to build our complexity evaluation model. For a thorough review of these factors, readers can refer to Refs. 14,15.

A number of complexity evaluation models have been proposed to build mappings from factors to an integrated complexity value. Among these works, Gianazza's and Xiao's approaches are the most representative. Gianazza13,14 applied an individual backpropagation neural network (BPNN) to evaluating the traffic complexities of the sectors operated by 5 French en-route air traffic control centers (note that the original evaluation object of Gianazza's method is the ATCo's workload; considering that all of the involved factors are objective and that the workload can also be an indicator of air traffic complexity, we can regard Gianazza's method as a complexity evaluation method). First, by processing real flight tracks and sector opening archives, an air traffic complexity sample set was established. Corresponding to a one-minute air traffic scenario within a sector, each sample comprises 28 complexity factors and a complexity level (Low, Normal or High). Then, factor reduction was implemented based on principal component analysis (PCA) and the Bayesian information criterion. From the FP, only 6 principal complexity factors are adopted as the inputs of the BPNN. Finally, after studying a large number of samples, the BPNN is able to accurately classify the complexity level of any fresh unlabeled complexity sample. Thus, Gianazza's method (referred to as BPNN_PCA in the rest of this paper) effectively converted the complexity evaluation problem into a complexity level classification task.
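For illustration, a minimal sketch of a BPNN_PCA-style pipeline is shown below, assuming scikit-learn and synthetic stand-in data; the full factor reduction of Refs. 13,14 (which also involves the Bayesian information criterion) and the exact network settings are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 300 one-minute scenarios x 28 complexity factors,
# labelled 0 = Low, 1 = Normal, 2 = High.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 28))
y = rng.integers(0, 3, size=300)

# Standardize, project onto 6 principal components, then classify with a
# small feed-forward network trained by backpropagation.
bpnn_pca = make_pipeline(
    StandardScaler(),
    PCA(n_components=6),
    MLPClassifier(hidden_layer_sizes=(15,), max_iter=500),
)
bpnn_pca.fit(X, y)
print(bpnn_pca.predict(X[:5]))
```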

In 2016, Xiao et al.15 presented further work on the same problem. In the proposed air traffic complexity evaluation model (ATCEM), 7 critical complexity factors are selected from the FP by a genetic algorithm. Thereafter, an adaptive boosting (AdaBoost) ensemble learning model is built on large samples to evaluate sectors' complexity levels. The results showed that ATCEM outperforms BPNN_PCA, owing to a more efficient and comprehensive combination of significant factors and a more powerful and stable machine learning model.
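The boosting stage of ATCEM can be sketched in a similarly rough way; the genetic-algorithm factor selection is replaced here by a hypothetical hard-coded index list, and scikit-learn's AdaBoostClassifier with its default weak learner stands in for the boosted model of Ref. 15.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 28))          # 28 complexity factors (synthetic stand-in)
y = rng.integers(0, 3, size=300)        # Low / Normal / High labels encoded 0/1/2

selected = [0, 3, 5, 9, 14, 20, 27]     # hypothetical output of the GA selection step
atcem_like = AdaBoostClassifier(n_estimators=50)
atcem_like.fit(X[:, selected], y)       # boost weak learners on the 7 selected factors
print(atcem_like.score(X[:, selected], y))
```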

Note that both Gianazza's and Xiao's methods were trained with at least thousands of samples, whereas in many real-world cases, as previously explained, only a small complexity sample set is available. The factor reduction procedures embedded in both methods are only suitable for learning from large datasets. When the dataset is small, factor reduction would result in a serious lack of training for the evaluation model. Therefore, investigating a complexity evaluation method based on a small complexity dataset is necessary.

3. Air traffic complexity evaluation model based on small-sample learning

3.1. Motivation and basic idea

To identify the complicated correlation between air traffic complexity factors and the complexity level, previous studies employed thousands or even tens of thousands of complexity samples to fully train their machine learning models. However, as explained in Section 1, it is not easy to collect so many complexity samples, and the likelihood of achieving a satisfactory complexity evaluation performance with a small training dataset seems small.

In this paper, our goal is to find an appropriate solution for evaluating complexity with small samples. Fortunately, many factors exist in the complexity dataset that can provide considerable relevant information for evaluating complexity. Therefore, our basic idea is to design an evaluation model that facilitates the acquisition and utilization of the information embedded in these factors. To realize this idea, an individual machine learning model based on all available factors is clearly undesirable, because the information contained in the various factors would be blended and diminished by one another; the practice of employing factor reduction to select critical factors before building a machine learning model (as done by Gianazza13,14 and Xiao et al.15) is also inappropriate, because many informative factors would be cut off in the factor reduction procedure. Thus, we are inclined to utilize the basic scheme of the RS ensemble learning model. This learning scheme first generates many small-size and diverse FSSs. For each FSS, an individual base classifier is created, and the ultimate evaluation result is the combination of all of the base classifiers. Within this learning scheme, each factor in the FP can be included in multiple compact FSSs, and thus has many more "chances" to be learned by numerous base classifiers. Through classifier combination, the advantages of all base classifiers can be integrated, and the shortcomings of each base classifier can be compensated by the others. However, a crucial question remains: the traditional RS model generates every FSS by selecting factors randomly; is random generation appropriate when the handled factors are air traffic complexity factors?
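For reference, the traditional RS scheme described above can be reproduced with scikit-learn's BaggingClassifier, as in the sketch below (the `estimator` keyword assumes a recent scikit-learn release; the subset size of 9 and ensemble size of 10 anticipate the settings used later in Section 4).

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

# Traditional random subspace: every base classifier sees all training samples
# (bootstrap=False) but only a random subset of 9 of the 28 factors drawn
# without replacement (bootstrap_features=False); predictions are combined by
# voting across the 10 base classifiers.
rs_baseline = BaggingClassifier(
    estimator=SVC(kernel="rbf", gamma=1.0),  # 'estimator' assumes scikit-learn >= 1.2
    n_estimators=10,
    max_features=9,
    bootstrap=False,
    bootstrap_features=False,
    random_state=0,
)
# Usage: rs_baseline.fit(X_train, y_train); rs_baseline.predict(X_test)
```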

Note that a number of relatively noisy and redundant factors exist in the FP. When characterizing a factor, "noisy" means that it offers little help to the complexity evaluation task, and "redundant" means that it offers overlapping information for rating complexity. To make full use of the intrinsic information of the factors in each FSS, particularly those that are critical for evaluation, we should suppress the noisy and redundant factors during FSS generation. To achieve this, factor analysis is implemented to support the FSS generation process. Numerous efficient and compact FSSs are thereby obtained, leading to high accuracy of the final evaluation output. Fig. 1 illustrates the general block diagram of our approach. The following sections elaborate the details.

3.2. Guided factor subset generation (GFSS)

As previously stated, to employ every complexity factor in evaluating complexity, we designed a guided factor subset generation strategy to suppress the noisy and redundant factors. This section elaborates the GFSS strategy in detail.

The foremost step of the GFSS is factor analysis. On the one hand, we must measure the "noise" degree of each factor in the FP; on the other hand, to identify redundant factors, the mutual independence between every two factors must be assessed.

To measure the noise degrees of the factors, we utilize the signal-to-noise ratio (SNR) metric, which derives from Ref. 22 and has been modified for three categories. The SNR is calculated by Eq. (1),

where μL(ft) and σL(ft) are, respectively, the mean value and the standard deviation of the factor ft over the samples of the low air traffic complexity class. Similarly, μN(ft), σN(ft), μH(ft) and σH(ft) are the corresponding statistics of ft for the normal and high complexity classes. A larger SNR value indicates a stronger ability to discriminate among the classes.
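As Eq. (1) is not reproduced above, the sketch below computes one plausible three-class generalization of the two-class SNR of Ref. 22, namely the average of the pairwise |μa − μb|/(σa + σb) ratios over the Low/Normal/High groups; the paper's exact modification may differ.

```python
import numpy as np

def snr_three_class(ft_low, ft_normal, ft_high):
    """Average of the pairwise two-class SNRs |mu_a - mu_b| / (sigma_a + sigma_b)
    over the Low/Normal/High groups of one factor (a hedged stand-in for Eq. (1))."""
    groups = [np.asarray(ft_low, float), np.asarray(ft_normal, float), np.asarray(ft_high, float)]
    mu = [g.mean() for g in groups]
    sigma = [g.std(ddof=1) for g in groups]
    pairs = [(0, 1), (1, 2), (0, 2)]
    return float(np.mean([abs(mu[a] - mu[b]) / (sigma[a] + sigma[b] + 1e-12)
                          for a, b in pairs]))
```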

Fig. 1 General block diagram of the proposed complexity evaluation approach.

To assess each factor's independence from every other factor, we borrow the idea of the h2 correlation coefficient23 and define the g2 independence coefficient. The independence calculation is based on factor prediction. For example, suppose we measure factor Y's independence from another factor X, and denote this independence degree by the g2 coefficient of Y with respect to X. Its definition stems from the idea of considering Y as a dependent variable affected by X. If Y is strongly dependent upon X, Y can be well predicted from X; otherwise, Y can only be partially predicted. The predictable part of Y's variance is termed the "explained variance", that is, the part that can be explained by the knowledge of X. The remaining portion of Y's variance is named the "unexplained variance", which is estimated by subtracting the "explained variance" from the total variance of Y. Thus, the larger the proportion of "unexplained variance" in the total variance of Y, the more independent Y is from X. Therefore, the g2 coefficient is defined as the ratio of unexplained variance to total variance.

In practice, we only have a certain number of (X, Y) samples, represented as {(xi, yi)}, 1 ≤ i ≤ N, for calculating the g2 coefficient. The formula is given in Eq. (2),


where f is the regression function obtained from the regression analysis. Here, we use the locally weighted linear regression method,24 which is non-parametric and does not require specifying a single function to fit all (xi, yi). The idea of this regression method is that, for a given x0 ∈ [min1≤i≤N xi, max1≤i≤N xi], a straight line fx0(x) = kx0 x + bx0 is fitted locally to a subset of observations near x0 by weighted least squares. The values of kx0 and bx0 are determined by solving Eq. (3),

where wx0(xi) is the tri-cube weight function defined in Eq. (4).

In Eq. (4), x[τ] is the observation in {xi} that is the ⌈τ·N⌉th closest to x0 (0 < τ ≤ 1). The weight function determines the subset of {(xi, yi)} over which the local regression is performed; if τ is set higher, this observation subset becomes larger, so that more observations near x0 are used in the local regression. The weight function wx0(xi) also gives more weight to observations that are nearer to x0, and less weight to points that are further away.

Using the locally weighted linear regression method, for any x0 ∈ [min1≤i≤N xi, max1≤i≤N xi] we can obtain its regressed value fx0(x0), that is, f(x0). Therefore, we can obtain the complete regression curve f(x) on [min1≤i≤N xi, max1≤i≤N xi]. The closed-form solution for f(x) is provided in Eq. (5), where Wx = diag(wx(x1), wx(x2), ..., wx(xN)). Readers who want further details about this regression method can refer to Ref. 24.
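A minimal sketch of the procedure described above (tri-cube weights, a line fitted locally around each point, and the ratio of unexplained to total variance) is given below; it follows the textual description rather than reproducing Eqs. (2)-(5) exactly, so the notation and edge-case handling are assumptions.

```python
import numpy as np

def tricube_weights(x0, x, tau=0.1):
    """Tri-cube weights centred at x0; the bandwidth is the distance to the
    ceil(tau*N)-th closest observation, as described for Eq. (4)."""
    d = np.abs(x - x0)
    h = np.sort(d)[int(np.ceil(tau * len(x))) - 1]
    u = np.clip(d / (h + 1e-12), 0.0, 1.0)
    return (1.0 - u ** 3) ** 3

def local_linear_fit(x0, x, y, tau=0.1):
    """Weighted least-squares line k*x + b fitted around x0, evaluated at x0."""
    w = tricube_weights(x0, x, tau)
    A = np.column_stack([x, np.ones_like(x)]) * np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A, y * np.sqrt(w), rcond=None)
    k, b = coef
    return k * x0 + b

def g2_independence(x, y, tau=0.1):
    """Ratio of unexplained to total variance of y given x, following the
    textual description of the g2 independence coefficient."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    f = np.array([local_linear_fit(xi, x, y, tau) for xi in x])
    unexplained = np.sum((y - f) ** 2)
    total = np.sum((y - y.mean()) ** 2)
    return float(unexplained / (total + 1e-12))
```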

The g2 independence coefficient is an estimate of the relationship between two factors. Thus, in our problem, it is acceptable to apply a layered mode of the g2 independence coefficient, l(g2) (the g2 independence level), which is defined in Eq. (6). The values of the two thresholds θLN and θNH are discussed in Section 4.2.2.
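A small sketch of one plausible reading of this layered mode follows, using the threshold values reported in Section 4 (θLN = 0.4, θNH = 0.75); the level names are inferred from the threshold subscripts and are therefore assumptions.

```python
def g2_level(g2, theta_ln=0.40, theta_nh=0.75):
    # Map the continuous g2 coefficient onto three independence levels.
    if g2 < theta_ln:
        return "Low"      # low independence (strongly dependent factors)
    elif g2 < theta_nh:
        return "Normal"
    return "High"         # high independence
```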

The g2 independence coefficient utilized in our problem has three advantages. First, its calculation does not require any prior knowledge, such as the general distribution of the data. Second, it can be applied to factors regardless of whether their relationship is linear or not. Third, the g2 coefficient accounts for asymmetries (due to the different capabilities of X-to-Y and Y-to-X prediction). Thus, the g2 coefficient can weigh a factor's prediction ability more reasonably than traditional symmetric metrics.

By applying the SNR and the g2 independence coefficient, we can easily identify the noisy and redundant complexity factors within factor sets of different sizes (an example is provided in Appendix A). Having obtained knowledge of the factors' noise and independence through analysis of the training set, we come to the next step: FSS generation (Table 1).

From the pseudo-code of the GFSS (Table 1), we can see that the key idea of the GFSS is to decrease the probability of selecting noisy and redundant factors into each FSS. An FSS is generated by k loops, each selecting one factor. In each factor selection loop (except for the first), each candidate factor's chance of being selected into the FSS is positively correlated not only with its relevance for classification (measured by its SNR) but also with its independence from the factors previously selected into the FSS in earlier loops (measured by its independence from ftj, the jth factor already selected into the FSS). The factor selection probability can be appropriately adjusted by tuning α and β in the range [0, +∞) (with a special definition for the degenerate case in which both the measured value and the exponent equal 0). Through the GFSS strategy, noisy and redundant factors are properly suppressed in the FSSs, particularly factors that are both noisy and redundant; a hedged sketch of one FSS draw follows Table 1.

Table 1 Pseudo-code description for GFSS.
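Since the pseudo-code of Table 1 is summarized only in prose above, the following is a hedged sketch of one GFSS draw; in particular, the exact way the SNR and independence terms are combined with α and β is an assumption.

```python
import numpy as np

def generate_fss(snr, indep, k, alpha=1.0, beta=1.0, rng=None):
    """Draw one factor subset of size k.  snr[i] is factor i's SNR value and
    indep[i, j] its independence from factor j (e.g. the g2 coefficient or a
    numeric encoding of l(g2)).  A candidate's selection probability is taken
    proportional to snr[i]**alpha times (the product of its independence from
    the factors already chosen)**beta, so that noisy and redundant factors are
    selected less often."""
    rng = np.random.default_rng() if rng is None else rng
    snr = np.asarray(snr, float)
    chosen, remaining = [], list(range(len(snr)))
    for _ in range(k):
        weights = []
        for i in remaining:
            w = snr[i] ** alpha
            if chosen:  # from the second loop on, also weigh independence
                w *= float(np.prod([indep[i, j] for j in chosen])) ** beta
            weights.append(w)
        probs = np.asarray(weights) + 1e-12
        probs /= probs.sum()
        pick = int(rng.choice(remaining, p=probs))
        chosen.append(pick)
        remaining.remove(pick)
    return chosen
```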

3.3. Ensemble learning model for evaluating air traffic complexity

Fig. 1 shows that, in addition to the GFSS strategy elaborated in Section 3.2, the design of the base classifiers and the way they are integrated must also be specified for the proposed ensemble learning model. Every base classifier is trained on a training subset, which is generated by spanning the original training set with the corresponding FSS. Two basic machine learning models are considered as base classifiers: the multi-class support vector machine (MSVM) and the BPNN. Considering the small-sample problem we face, the MSVM is an attractive base classifier because of the support vector machine (SVM)'s excellent performance in learning from small datasets.25 Specifically, the MSVMs we use are 3-class directed acyclic graph MSVMs (DAG-MSVMs),26 which are formed by three 2-class SVM models. Besides the MSVM, the BPNN is also adoptable, because its instability may bring much diversity among the base classifiers and thus lift the overall accuracy of the ensemble. For the final decision-making (base classifier integration), we apply the simplest majority voting rule, whose merits lie in requiring neither a priori knowledge nor complex and intensive computation.27
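A minimal sketch of this ensemble stage follows; scikit-learn's SVC (which handles the 3-class problem by an internal one-vs-one decomposition) stands in for the DAG-MSVM of Ref. 26, and the complexity labels are assumed to be encoded as integers 0/1/2.

```python
import numpy as np
from sklearn.svm import SVC

class SubspaceEnsembleSketch:
    """One SVM base classifier per factor subset, combined by majority voting."""

    def __init__(self, fss_list, gamma=1.0):
        self.fss_list = fss_list                        # list of factor-index lists
        self.base = [SVC(kernel="rbf", gamma=gamma) for _ in fss_list]

    def fit(self, X, y):
        for clf, fss in zip(self.base, self.fss_list):
            clf.fit(X[:, fss], y)                       # train on the spanned training subset
        return self

    def predict(self, X):
        votes = np.stack([clf.predict(X[:, fss])
                          for clf, fss in zip(self.base, self.fss_list)])
        # simple majority vote over base classifiers, one test sample at a time
        return np.array([np.bincount(col).argmax() for col in votes.T])
```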

4. Experiments

To evaluate the performance and analyze the characteristics of the proposed air traffic complexity evaluation method, GFSS_RS, we designed two groups of experimental studies: Group A (elaborated in Section 4.1) compares the performance of GFSS_RS with several other comparable models; Group B (elaborated in Section 4.2) studies the parameters of GFSS_RS.

4.1. Group A experiments: performance comparison between complexity evaluation methods

We compare GFSS_RS with 3 other comparable methods: Gianazza's model BPNN_PCA,13,14 Xiao's model ATCEM,15 and the traditional RS model.16 Two types of RS and GFSS_RS are involved in the experiments. One type, referred to as RSSVM and GFSS_RSSVM, uses the DAG-MSVM as the base classifier; the other type, referred to as RSNN and GFSS_RSNN, uses the BPNN as the base classifier.

All the experiments are performed on real ATMS operating data. The original pool of air traffic complexity samples (denoted as SP) applied in our experiments is derived from the operational processes of 6 sectors regulated by the Southwest ATM Bureau of China (Fig. 2). The operational time of the 6 sectors covered by the SP is 00:00-16:00 GMT on July 28, 2010. In the SP, each sample corresponds to a one-minute air traffic scenario slice of a sector, comprising 28 complexity factors (refer to Refs. 14,15) and a complexity level (Low/Normal/High) obtained from ATCos. There are 5760 samples in total (960 for each sector) in the SP.

To evaluate the models' performance based on the test results, we employ 7 criteria: Acc, AccL, AccN, AccH, AccBC, PPT and PMTT. Acc is the overall classification accuracy on the test samples; AccL/AccN/AccH is the classification accuracy on the test samples belonging to the Low/Normal/High category, respectively (in the rest of this paper, we simply use "Low/Normal/High" to denote the category of low/normal/high air traffic complexity level); AccBC is the average accuracy of the base classifiers, which is computed only for RS and GFSS_RS. In addition to these 5 accuracy criteria, we use the pre-processing time (PPT) and the pure model training time (PMTT) to jointly indicate the computational cost of model training. PPT refers to the duration of the procedure that gains the necessary a priori knowledge (such as the knowledge of significant factors in the FP) before the model is actually trained. PMTT denotes the time spent purely training the model, that is, the whole training duration minus the PPT.
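The accuracy-related criteria are straightforward to compute from the predictions; a small sketch, assuming integer labels 0 = Low, 1 = Normal and 2 = High, is given below.

```python
import numpy as np

def accuracy_criteria(y_true, y_pred):
    """Return the overall Acc and the per-class AccL / AccN / AccH."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = float(np.mean(y_true == y_pred))
    # per-class accuracy over the test samples of each complexity level
    per_class = {name: float(np.mean(y_pred[y_true == c] == c))
                 for c, name in zip((0, 1, 2), ("AccL", "AccN", "AccH"))}
    return acc, per_class
```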

The foremost empirical study we perform is the performance test of the 6 complexity evaluation models: BPNN_PCA, ATCEM, RSSVM, GFSS_RSSVM, RSNN and GFSS_RSNN. For all the experiments in Group A, the parameter settings remain the same for each model: τ in the g2 independence coefficient calculation is set to 0.1; θLN and θNH, used in the transition from g2 to l(g2), are set to 0.4 and 0.75, respectively; the number of base classifiers (T) for ATCEM, RS and GFSS_RS is uniformly set to 10; the size of each FSS (k) for RS and GFSS_RS is identically set to 9; α and β in GFSS_RS are both set to 1 (according to the results of the Group B experiments, these settings generally ensure good performance of GFSS_RSs built on training sets of different sizes); for the SVMs in RSSVM and GFSS_RSSVM, we use the Gaussian kernel with its bandwidth parameter set to 1; for the BPNNs in RSNN and GFSS_RSNN, the hidden layer has 15 units, the maximum number of training iterations is 500, and the maximum mean square error for training set classification is 0.15.

To investigate how the training set size influences the models' performance, we train the 6 models with training sets of different sizes (18, 30, 60, ..., 540) and then test the models on a test set of uniform size 1080. The results are displayed in Fig. 3. Note that each result presented in Section 4 (Figs. 3-9 and Table 2) is the average of 60 independent runs. In each run, both the training and test samples are randomly selected (without replacement) from the SP, and no sample is selected as a training sample and a test sample simultaneously. In Figs. 3 and 4, the bars indicate a 95% confidence interval.

For RS and GFSS_RS, the average accuracy of the base classifiers (AccBC) and the accuracy benefit derived from integrating the base classifiers (illustrated by the difference between Acc and AccBC) are also worthy of attention, because the two elements combine to boost the ensemble accuracy. Fig. 4 displays how the Acc and AccBC of RSSVM, GFSS_RSSVM, RSNN and GFSS_RSNN change with the training set size.

In addition to a model's accuracy, the computational cost of training is another aspect of its performance. Hence, a rough comparison of the models' PPT and PMTT is conducted in this paper (Table 2). The training set size used for this comparison is 270. The hardware configuration of our computer consists of an Intel Core i5-5200U CPU and 8 GB of RAM.

Through the experimental results obtained, we observe the following:

Fig. 2 Six sectors studied in the experiments.

Fig. 3 Models' performance varying with training set size.

Fig. 4 Acc and AccBC of two types of RS and GFSS_RS.

(1) Fig. 3 illustrates that, as the training set size changes (ranging from 18 to 540), the proposed GFSS_RSSVM and GFSS_RSNN always obtain better performance than the other 4 methods, particularly when the training set size is less than 270. Specifically: (A) compared with BPNN_PCA and ATCEM, our model has a distinct accuracy advantage when the training set size is between 60 and 270. This advantage derives from the better small-sample learning ability of our model, which is reflected in the steep performance increase of our model as the training set size grows from 18; (B) GFSS_RSSVM/GFSS_RSNN always obtains a higher Acc than RSSVM/RSNN, illustrating that the GFSS strategy stably enables GFSS_RS to perform better than the traditional RS; (C) as indicated by the relatively narrow confidence intervals, the performance of GFSS_RS is more stable than that of BPNN_PCA and ATCEM. We speculate that the performance instabilities of BPNN_PCA and ATCEM derive from their factor reduction procedures, which produce unstable outputs when the samples are few.

(2) Fig. 3 demonstrates that, compared with BPNN_PCA and ATCEM, RSSVM and GFSS_RSSVM have obvious advantages when classifying "Low" and "High" samples, but poor classification ability on "Normal" samples. RSNN and GFSS_RSNN, by contrast, obtain much more balanced results on AccL, AccN and AccH. Hence, we infer that the base classifier MSVM tends to classify "Normal" samples into the "Low" and "High" categories. Despite the imbalance in the classification accuracies of the three categories, we still consider MSVM more suitable as the base classifier because of GFSS_RSSVM's advantage in identifying the high complexity level, which is useful for avoiding operational risk in real applications. Additionally, compared with GFSS_RSNN, GFSS_RSSVM requires significantly less PMTT for the same training set size (Table 2).

(3) Fig. 4 shows that, compared with RS, GFSS_RS has a higher AccBC. The results support our earlier reasoning: the proposed GFSS strategy can prevent noisy and redundant factors from being selected into the FSSs and prompt the classification information contained in the factors to be mined effectively. Note that the higher average accuracy of the base classifiers and the sufficient accuracy benefit derived from base classifier integration jointly contribute to the higher ensemble accuracy.

Table 2 PPT and PMTT of 6 models when training set size is 270.

(4) Table 2 illustrates that, compared with BPNN_PCA and ATCEM, our approach has less PMTT and less PPT. For our model, the PPT is spent mainly on calculating the independence between every two factors. Note that the results of the pre-processing, if based on a noiseless and comprehensive dataset, can be applied repeatedly until the evolution of the ATMS renders them invalid. Therefore, it might be unnecessary to perform the pre-processing each time before training a new GFSS_RS.

In short, the experimental results show that, for the task of small-dataset-based air traffic complexity evaluation, GFSS_RS has advantages over current representative complexity evaluation models such as BPNN_PCA and ATCEM.

4.2. Group B experiments: research on GFSS_RS's parameters

The proposed GFSS_RS has several important parameters: τ in the g2 independence coefficient calculation; θLN and θNH, used in the transition from g2 to l(g2); the size of the FSS (k); and α and β, which determine the strength of the guidance from the factor noise and independence analysis. They influence the model's performance in a fuzzy manner. To determine how the parameters of GFSS_RS affect its performance, we conducted several parameter studies in the Group B experiments. In these experiments, all of the GFSS_RSs use the MSVM as the base classifier. Except for the parameters under study, all of the settings of GFSS_RS remain the same as in Group A.

4.2.1. Parameter research on τ

τ is the only parameter in the g2 independence coefficient calculation. For example, suppose there are two factors ft1 and ft2 with dataset {(ft1i, ft2i)}, and we intend to measure ft2's independence from ft1. If τ is configured larger, the regression curve f(ft1) becomes smoother, leading to a higher value of the g2 coefficient. Here, we design a group of experiments to explore how τ influences the performance of GFSS_RS. In these experiments, 6 training set sizes are involved; for each training set size, τ ranges from 0.05 to 0.5 in steps of 0.05. The Acc criterion is used to assess the performance of GFSS_RS. The experimental results are displayed in Fig. 5. In Fig. 5, the 6 curves, denoted as Acc_90, Acc_180, ..., Acc_540, correspond to the GFSS_RS models built with 90, 180, ..., 540 training samples. We can observe that, over the significant range of τ, each curve has only slight ups and downs, revealing that the performance of GFSS_RS is robust to the choice of τ. Considering that a high τ raises the computational complexity of the factor independence analysis (for a locally weighted linear regression, a high τ means a larger local sample subset is employed to calculate each single regressed point), we believe it is better to choose a relatively small value for τ.

Fig. 5 GFSS_RS's performance change when increasing τ.

4.2.2. Parameter research on θLN and θNH

θLN and θNH are the threshold parameters in the transition from the g2 independence coefficient to the l(g2) independence level. Here, we design a group of experiments to explore the relationship between the performance of GFSS_RS and the values of θLN and θNH. We let both θLN and θNH range over 0.1, 0.2, ..., 0.9 while keeping θLN < θNH, obtaining 36 combinations of (θLN, θNH) in total. Based on each combination of (θLN, θNH), we build a GFSS_RS and test its performance. The experiments are performed under 6 training set sizes, and the results are displayed in Fig. 6.

Fig. 6 shows that, in each of the 6 subplots, the (θLN, θNH) that best promotes GFSS_RS's performance is (0.8, 0.9), (0.3, 0.7), (0.2, 0.6), (0.2, 0.8), (0.1, 0.5) and (0.5, 0.7), respectively. Therefore, it is difficult to identify a generally best setting of (θLN, θNH) for all 6 training set sizes. Here, we perform a further statistical analysis to find such a setting. We select the nbest choices of (θLN, θNH) that best promote GFSS_RS's Acc in each subplot, and then average all of the selected (θLN, θNH) (6·nbest pairs in total). When nbest = 1, 2, 3, 4, the averages are (0.35, 0.70), (0.38, 0.75), (0.41, 0.76) and (0.43, 0.75), respectively. Therefore, the generally best setting can be roughly identified as (0.40, 0.75). Despite the uncertainty of the best choice of (θLN, θNH) in these subplots, no setting of (θLN, θNH) obviously degrades the performance of GFSS_RS; hence, the performance of GFSS_RS is robust to the choice of (θLN, θNH).

4.2.3. Parameter research on size of FSS

To probe the relationship between the FSS size k and the performance of GFSS_RS, we let k vary from 3 to 17 and observe the corresponding changes in GFSS_RS's performance as evaluated by the 5 accuracy criteria. We experiment on training sets of different sizes. The results are presented in Fig. 7 (in Figs. 7-9, the bars indicate the standard deviations of the results).

Fig. 6 GFSS_RS's performance change as θLN and θNH vary.

In Fig. 7, the results on training sets of various sizes reveal common phenomena. The optimum FSS size, with which GFSS_RS obtains its best performance, is approximately 7-9 for every training set size. The AccBC generally increases with the FSS size. Moreover, the AccL and AccH curves indicate that our algorithm has the potential to be customized: we could tune our model to be more sensitive to low/high traffic complexity situations by decreasing/increasing the size of the FSS.

4.2.4. Parameter research on α and β

Here, we explore the links between the performance of GFSS_RS and the parameters α and β. Essentially, a larger α/β means more strength in inhibiting the noisy/redundant factors from being selected into each FSS. The experiments are divided into 2 stages.

Fig. 7 Performance of GFSS_RS for different training set sizes and FSS sizes.

Fig. 8 GFSS_RS's performance change when simultaneously increasing α and β.

Fig. 9 GFSS_RS's performance change when tuning α and β under "α + β = 2".

In the first stage, we set both α and β to 0, 1, 2, 3, 4 and 5 successively and observe how the performance of GFSS_RS is affected. The guidance strength from the factors' noise and independence knowledge grows exponentially with the increase of α and β. The experimental results are illustrated in Fig. 8.

Fig. 8 shows that AccBC tends to increase as α and β are raised. For each training set size, there is always a "sweet spot" for the combination of α and β (roughly when α and β equal 1-2) at which the Acc is maximized. The sweet spot represents the balance between the average accuracy and the diversity of the base classifiers. Both of these aspects jointly promote the ensemble accuracy, but they are usually negatively correlated with each other.27 For GFSS_RS, weakening the guidance on FSS generation can damage the average accuracy of the base classifiers, while imposing too much guidance can hurt their diversity. Either adjustment can lower the accuracy of GFSS_RS.

In the second-stage experiments, we let α range from 0 to 2 (α = 0, 0.4, 0.8, 1.2, 1.6, 2) while β simultaneously varies from 2 to 0, keeping the relationship α + β = 2. Fig. 9 presents the corresponding performance variation of GFSS_RS.

In Fig. 9, the AccBC curves indicate that the guidance from a factor's noise degree is more critical for building accurate base classifiers than the guidance from the independence between factors. By examining the characteristics of both the Acc and AccBC curves, we infer that, although the guidance from factor independence has little effect on promoting AccBC, it generates much diversity among the base classifiers, which is reflected in the variations of the difference between Acc and AccBC. This interesting phenomenon may derive from the intrinsic characteristics of the original FP. In the FP, most factors are highly relevant to the complexity level and, at the same time, closely correlated with each other. For an FP of this type, generating FSSs under guidance from the independence between factors may create more diversity among the FSSs than sampling factors randomly.

5. Conclusion and future work

In this paper, we propose a new ensemble learning model for evaluating the complexity level of a sector based on small samples. To encourage the mining of the classification information contained in each factor, we generate multiple diverse FSSs consisting of fewer noisy and redundant factors under the guidance of factor analysis. Then, we construct a base classifier for each FSS and obtain the ultimate evaluation result by integrating the results of all base classifiers. The results of the experimental studies based on real-world data illustrate the proposed model's advantages over several other comparable models when the training samples are limited.

We hope that the GFSS_RS model can be applied in real-world ATM environments. Moreover, we believe that our model can be further improved in the following two areas: (1) we can try to further improve our model's performance by identifying a reasonable approach to optimizing parameters such as α and β; (2) because unlabeled traffic complexity samples are easily obtained, we can attempt to build a more accurate complexity evaluation model by making use of unlabeled samples through semi-supervised learning techniques.

Acknowledgements

This study was co-supported by the State Key Program of the National Natural Science Foundation of China (No. 91538204), the National Science Fund for Distinguished Young Scholars (No. 61425014) and the National Key Technologies R&D Program of China (No. 2015BAG15B01).

Appendix A. An example of identifying noisy and redundant factors in the FP by using the SNR and the g2 independence coefficient

In this paper, we use the SNR and the g2 independence coefficient to identify noisy and redundant factors in the FP, respectively. Here, we provide an example. The studied dataset D, generated by randomly selecting samples from the SP, comprises 270 samples of the 28 complexity factors (refer to Refs. 14,15 for an introduction to the factors). We use the SNR to identify the noisy factors. First, the SNR value of every factor is calculated. Then, we select two factors with relatively high SNR values (Nb, Dens) and two factors with relatively low SNR values (hpro_1, avg_vs) as exemplified factors. Fig. A1 displays 4 scatterplots, each of which plots one of these 4 factors against the associated complexity levels. A large dot indicates the presence of multiple overlapping dots. From the scatterplots, we can see that the SNR is an effective indicator for determining whether a factor is informative or noisy. The informative factors (those with high SNR values) clearly have stronger abilities to discriminate among different complexity levels than the noisy factors (those with low SNR values). Note that "noise" is a relative concept, because it is impossible to determine an explicit boundary between "informative" and "noisy".

To identify the redundant factors in the FP, we calculate the g2 independence coefficient between every two factors. The g2 independence coefficient is designed to measure one factor's independence from another. It is asymmetric, and its value ranges over [0, 1]. A larger g2 coefficient indicates greater independence of a factor with respect to another factor. Here, we take the two factors creed_ok and track_disorder as examples. We calculate the g2 independence of creed_ok from each other factor ft in the FP (ft being any factor except creed_ok) and find that only 1 of the 27 values is less than 0.4. Then, we calculate the g2 independence of track_disorder from each other factor ft in the FP (ft being any factor except track_disorder) and find that 9 of the 27 values are less than 0.4. Thus, we can infer that creed_ok is a unique factor in the FP, whereas track_disorder is a redundant factor. Similar to "noise", "redundant" is also a relative concept, because it is impossible to determine an explicit boundary between "unique" and "redundant".

Fig. A2 illustrates 4 scatterplots. Each scatterplot is for track_disorder and another factor. The red line in each subplot is the regression curve of the two factors. From these scatterplots, we can intuitively see that track_disorder has little independence from any of the other 4 factors (Nb, Dens, speed_disorder and Conv), and all of these 4 factors have a low g2 with track_disorder. Therefore, the g2 independence coefficient is an effective metric for the independence between two factors.

Fig. A1 Scatterplots of factors and complexity levels.

Fig. A2 Scatterplots of track_disorder and another factor.

References

1. Brazdilova SL, Casek P, Kubalcik J. Air traffic complexity for a distributed air traffic management system. Proc Inst Mech Eng, Part G: J Aerosp Eng 2011;225(6):665-74.

2. Song ZX, Chen YZ, Li ZL, Zhang DF, Bi H. A review for workload measurement of air traffic controller based on air traffic complexity. In: 25th Chinese control and decision conference; 2013 May 25-27; Guiyang. Piscataway (NJ): IEEE Press; 2013. p. 2107-12.

3. Pfleiderer EM, Manning CA, Goldman SM. Relationship of complexity factor ratings with operational errors. Oklahoma City (OK): Civil Aerospace Medical Institute, FAA; 2007. Report No.: DOT/FAA/AM-07/11.

4. Christien R, Benkouar A, Chaboud T, Loubieres P. Air traffic complexity indicators & ATC sectors classification. 5th USA/Europe air traffic management R&D seminar; 2003. p. 1-8.

5. Lee K, Feron E, Pritchett A. Air traffic complexity: an input-output approach. Proceedings of the 2007 American control conference; 2007 July 11-13; New York. Piscataway (NJ): IEEE Press; 2013. p. 474-9.

6. Lee K, Feron E, Pritchett A. Describing airspace complexity: airspace response to disturbances. J Guid Control Dynam 2012;32(1):210-22.

7. Hong YY, Kim YD, Lee K. Conflict management in air traffic control using complexity map. J Aircraft 2015;52(5):1524-34.

8. Prandini M, Hu JH. A probabilistic approach to air traffic complexity evaluation. Joint 48th IEEE conference on decision and control and 28th Chinese control conference; 2009 Dec 16-18; Shanghai. Piscataway (NJ): IEEE Press; 2009. p. 5207-12.

9. Prandini M, Putta V, Hu JH. A probabilistic measure of air traffic complexity in three-dimensional airspace. Int J Adapt Control Signal Process 2010;24(10):813-29.

10. Prandini M, Piroddi L, Puechmorel S, Brazdilova SL. Toward air traffic complexity assessment in new generation air traffic management systems. IEEE Trans Intell Transp Syst 2011;12(3):809-18.

11. Delahaye D, Puechmorel S. Air traffic complexity based on dynamical systems. 49th IEEE conference on decision and control; 2010 Dec 15-17; Atlanta, Georgia. Piscataway (NJ): IEEE Press; 2010. p. 2069-74.

12. Puechmorel S, Delahaye D. New trends in air traffic complexity. EIWAC 2009: Proceedings of the ENRI international workshop on ATM/CNS; 2009. p. 55-60.

13. Gianazza D. Forecasting workload and airspace configuration with neural networks and tree search methods. Artif Intell 2010;174(7):530-49.

14. Gianazza D, Guittet K. Selection and evaluation of air traffic complexity metrics. DASC 2006: Proceedings of the 25th digital avionics systems conference; 2006 Oct 15-19; Portland, Oregon. Piscataway (NJ): IEEE Press; 2006. p. 1-12.

15. Xiao MM, Zhang J, Cai KQ, Cao XB. ATCEM: a synthetic model for evaluating air traffic complexity. J Adv Transport 2016;50(3):315-25.

16. Ho TK. The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 1998;20(8):832-44.

17. Chatterji GB, Sridhar B. Measures for air traffic controller workload prediction. Proceedings of the first AIAA aircraft technology, integration and operations forum; 2001 Oct 16-18; Los Angeles, California. Reston: AIAA; 2001. p. 1-15.

18. Kopardekar P, Schwartz A, Magyarits S, Rhodes J. Airspace complexity measurement: an air traffic control simulation analysis. Proceedings of the 7th USA/Europe air traffic management R&D seminar; 2007. p. 1-9.

19. Djokic J, Lorenz B, Fricke H. Air traffic control complexity as workload driver. Transport Res Part C: Emerg Technol 2010;18(6):930-6.

20. Sun X, Wandelt S. Network similarity analysis of air navigation route systems. Transport Res Part E: Logist Transport Rev 2014;70(70):416-34.

21. Cook A, Blom HA, Lillo F, Mantegna RN, Micciche S, Rivas D, et al. Applying complexity science to air traffic management. J Air Transp Manage 2015;42:149-58.

22. Golub TR, Slonim DK, Tamayo P. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999;286(5439):531-7.

23. Hassan M, Terrien J, Muszynski C, Alexandersson A, Marque C, Karlsson B. Better pregnancy monitoring using nonlinear correlation analysis of external uterine electromyography. IEEE Trans Biomed Eng 2013;60(4):1160-6.

24. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. 2nd ed. Berlin: Springer; 2009. p. 191-9.

25. Vapnik V. Statistical learning theory. New York: Wiley Publishers; 1998. p. 191-218.

26. Manikandan J, Venkataramani B. Study and evaluation of a multi-class SVM classifier using diminishing learning technique. Neurocomputing 2010;73(10-12):1676-85.

27. Yu HL, Ni J. An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data. IEEE/ACM Trans Comput Biol Bioinform 2014;11(4):657-66.

Received 30 May 2016; revised 20 August 2016; accepted 30 November 2016

Available online 8 June 2017

*Corresponding author at: School of Electronics and Information Engineering, Beihang University, Beijing 100191, China.

E-mail address: xbcao@buaa.edu.cn (X. CAO).

Peer review under responsibility of Editorial Committee of CJA.


http://dx.doi.org/10.1016/j.cja.2017.04.018


KEYWORDS: Air traffic control; Air traffic complexity; Correlation analysis; Ensemble learning; Feature selection
