Intelligent Identification over Power Big Data: Opportunities, Solutions, and Challenges

2023-01-21 08:04

Liang Luo, Xingmei Li, Kaijiang Yang, Mengyang Wei, Jiong Chen, Junqian Yang and Liang Yao

1 Dali Power Supply Bureau of Yunnan Power Grid Co., Ltd., Dali, 671000, China

2 School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China

ABSTRACT The emergence of power dispatching automation systems has greatly improved the efficiency of power industry operations and promoted the rapid development of the power industry. However, as power data flows converge and grow, the dispatching data network and the main station dispatching automation system face substantial pressure. Therefore, methods for online data parsing and rapid problem identification in dispatching automation systems have been widely investigated. In this paper, we present a comprehensive review of the automated dispatching of massive dispatching data from the perspective of intelligent identification, discuss unresolved research issues and outline future directions in this area. In particular, we divide intelligent identification over power big data into three processes: data acquisition and storage; anomaly detection and fault discrimination; and fault tracing for dispatching operations during communication. We then present a detailed survey of solutions to the challenges in intelligent identification over power big data. Finally, we outline opportunities and future directions.

KEYWORDS Data acquisition; data storage; anomaly detection; service fault-tolerant scheduling

1 Introduction

Recently, the rapid development of the social economy and technology has made the requirements for a safe and stable power supply increasingly stringent [1]. The power sector has been constantly improving its ability to meet people's growing needs and has gradually recognized the importance of safety in power production [2,3].

The emergence of power dispatching automation systems has greatly improved the efficiency of power industry operations, solved many problems faced by power systems, and greatly promoted the rapid development of the power industry [4,5]. Supported by social, economic, scientific and technological progress, power dispatching automation has developed on a solid footing, and the dispatching level has gradually improved [6]. Telecontrol technology has developed continuously and computer networks have improved. As a result, power grid dispatching automation systems have undergone a qualitative change, from single-function telecontrol and telemetry devices to today's multifunctional, intelligent, microcomputer-based telecontrol systems. Product adoption and stability, as well as the accuracy and integrity of information, have also greatly improved [7,8]. Currently, substations above 110 kV, most 35 kV substations and various types of power plants are equipped with dispatching automation systems [9,10].

As power grid construction accelerates and grid scale expands, dispatching automation systems of many kinds are continuously being built. Business interactions between local and county dispatching systems are also becoming more frequent, and these trends have given rise to the power dispatching data network [11–13]. The power dispatching data network is a dedicated data network serving power dispatching and production. Its safe, stable and reliable operation is a basic guarantee for the safe production of the entire power grid [14–16]. The network plays an important role in coordinating the joint operation of power system components such as generation, transmission, transformation, distribution and consumption. It also ensures the safe, economic, stable and reliable operation of the power grid. Finally, it underpins the communication needs of power production, power dispatch, reservoir dispatch, fuel dispatch, relay protection, automatic safety devices, remote operation and power grid dispatching automation [17,18].

As a power supply company builds out its power dispatching data network, the main station is connected in succession to all county dispatching services. This further converges and increases the information flow at the main station and places considerable pressure on the dispatching data network and the main station dispatching automation system [19,20]. Operation analysis reports of power dispatching automation show that connecting county dispatching automation services introduces many hidden defects into the information flow. These defects are very difficult to trace and analyze because the data volume is large and the information flow is transient [21,22]. The dispatching automation system sometimes misreports or omits information because of erroneous data from communication channels or from the general control equipment of plants and stations [23–25]. Beyond regular functional defects, numerous abnormal phenomena remain whose specific causes are difficult to determine. These phenomena hinder the elimination of anomalies in power dispatching automation systems and have become a bottleneck for improving their operation level [26,27]. Specifically, three problems stand out:

1) Data analysis is not sufficiently accurate, and hidden problems are difficult to discover. Many hidden problems in the master station and the dispatching automation system are difficult to detect in time with the existing dispatching automation tools. For example, after local and county dispatching and control are integrated, an increasing number of dispatching objects are connected to the master station in succession. The dispatching automation system then sometimes freezes and becomes unresponsive, which is likely related to problems such as intermittent message errors and excessive message frequency.

2) Problem data capture and diagnosis are not timely enough. Several faults occur occasionally: intermittent false online status, unsuccessful remote control operations, and loss of some signals when a large number of signals are uploaded at once. Because these faults are not analyzed promptly and in depth, the problem cannot be identified immediately, which hinders remote regulation and control and increases work costs. For example, if the historical records contain no circuit-breaker position-change signal, the system cannot identify an accident trip that was followed by a successful reclosing action. This situation reduces the efficiency of power grid repair. In addition, outage information cannot be reported immediately.

3) Lack of data support for tasks such as accident analysis. Because there is no black-box mechanism, data related to many problems are not saved immediately. This hinders dispatching automation personnel in training, fault analysis and drills.

Identifying the safety hazards inherent in the dispatching automation system effectively and in real time requires research on online analysis of dispatching automation system data and on rapid problem identification. Such research improves the automation operation and management level of power supply companies. It also lays a solid foundation for the safe, stable, high-quality and economic operation of the power grid.

The remainder of this paper is organized as follows. The acquisition and structured storage methods of the data flow are surveyed in Section 2. Section 3 investigates existing anomaly detection approaches. In Section 4, service fault-tolerant scheduling methods are surveyed. Section 5 concludes this paper.

2 Acquisition and Structured Storage of Data Flow

At present, a single storage function supports optimization analysis and fault tracing only when data are retrieved [28,29]. In addition, the information flow storage function of the current dispatching data network service mainly stores the information flow generated by remote communication between stations and the master station. It cannot store the internal data flow of the master station system [30,31]. As a result, it is difficult to combine data network information flow with key dispatching services to discriminate data flow faults and anomalies in specific business scenarios [32,33].

To acquire big data completely, data flow acquisition equipment depends on high-performance software and hardware with fast processing capacity. Such equipment can quickly and deeply analyze the complete collected data and upload the information obtained after pattern recognition to the data business analysis platform [34,35]. Therefore, solutions for collecting and storing large amounts of data need to be analyzed. The framework of data acquisition and storage is illustrated in Fig. 1.

Figure 1: Framework of data acquisition and storage

2.1 Challenges in Data Acquisition and Storage

Existing works on the acquisition and structured storage of data flow face several open challenges, which are illustrated in Fig. 2.

Figure 2: Challenges in data acquisition and storage

1) Effective and valid data collection. Data collection is the process of obtaining raw data from an external system or network [36,37]. Effective and valid data collection is necessary because inefficient or improper collection negatively affects all subsequent processing.

2) Efficient data transmission. High bandwidth consumption and energy-efficiency constraints make it challenging to transmit large volumes of data to storage facilities [38,39].

3) Reliability and persistency of data storage. Given the tremendous amount of data, it is challenging to store data reliably and persistently at an acceptable cost [40,41].

2.2 Current Solutions

2.2.1 Current Solutions to Data Acquisition

The data acquisition methods are listed in Table 1. Fig. 3 illustrates the main performance of the acquisition methods.

Table 1: Data acquisition methods

Table 1 (continued)
| Target | Reference | Method | Advantage |
|  | [46] | Traffic data collection using a vehicle-mounted monocular camera | Enhances the flexibility and coverage of traffic data collection |
| Optimizing resource utilization | [47] | Data acquisition scheme supporting resource utilization optimization | Addresses the limited wireless communication bandwidth |
|  | [48] | On-demand vehicular sensing framework without infrastructure | Obtains accurate monitoring data and reduces the number of participating vehicles, energy consumption and communication costs |
|  | [49] | Route optimization method | Improves the collection of multimedia data considering dynamic factors |
| Security protection | [50] | Data acquisition system based on blockchain | Uses a UAV as a relay to gather data |
|  | [51] | Security-preserving data collection and sharing scheme based on blockchain | Ensures efficient and real-time data collection with security protection |

Figure 3: Main performance of the data acquisition methods

Hagenauer et al. [42] introduced vehicular micro clouds into vehicular networks as edge servers that support data collection. In [43], a data forwarding algorithm was proposed that improves the data delivery ratio by restricting broadcast messages.

To improve system robustness, Turcanu et al. [44] combined a long-term evolution (LTE) network with a vehicular ad hoc network to optimize the data volume and collection cycle. Because the signaling process is lightweight, repeated collection of large amounts of data is avoided and the impact on system load is minimal.

Ren et al. [45] proposed a data relay mule-based collection scheme, named DRMCS, to achieve efficient data acquisition without a large increase in redundancy. DRMCS accounts for delay and task completion rate during data acquisition. It uses a micro mobile data center selection method based on simulated annealing to improve both the performance and the fault tolerance of data collection.
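The simulated-annealing selection step can be illustrated with a small sketch. This is not the DRMCS algorithm. As a hypothetical stand-in for collection delay, we assume the goal is to pick `k` micro mobile data centers from candidate vehicle positions. The selection should minimize the total distance from sensing points to their nearest selected center. The names `cost` and `anneal_select` and the cooling parameters are illustrative only.

```python
import math
import random


def cost(selected, points, candidates):
    """Total distance from every sensing point to its nearest selected center (a delay proxy)."""
    return sum(min(math.dist(p, candidates[c]) for c in selected) for p in points)


def anneal_select(points, candidates, k, t0=10.0, alpha=0.95, steps=2000, seed=0):
    """Choose k candidate centers with simulated annealing; returns (sorted indices, cost)."""
    rng = random.Random(seed)
    current = rng.sample(range(len(candidates)), k)
    cur_cost = cost(current, points, candidates)
    best, best_cost = list(current), cur_cost
    t = t0
    for _ in range(steps):
        unselected = [c for c in range(len(candidates)) if c not in current]
        if not unselected:
            break  # every candidate is already selected
        # Neighbor move: swap one selected center for an unselected candidate.
        neighbor = list(current)
        neighbor[rng.randrange(k)] = rng.choice(unselected)
        new_cost = cost(neighbor, points, candidates)
        delta = new_cost - cur_cost
        # Always accept non-worsening moves; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cur_cost = neighbor, new_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        t = max(t * alpha, 1e-9)  # geometric cooling schedule
    return sorted(best), best_cost
```

Consider two clusters of sensing points around (0, 0) and (10, 10). With the candidates `[(0.3, 0.3), (10.3, 10.3), (5, 5), (20, 20)]` and `k=2`, the search settles on the two candidates near the clusters. Accepting occasional worse moves early lets the search escape poor initial selections.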

To overcome the limited flexibility and coverage of traffic data collection, a traffic data collection method using a vehicle-mounted monocular camera was proposed [46].

Nie et al. [47] focused on the potential of vehicular sensors for traffic data acquisition. Sensors collect the data, and vehicles transmit them to roadside units while moving along the road. However, wireless communication bandwidth is limited relative to the large volume of update data generated by vehicular network applications. To address this, a data acquisition scheme supporting resource utilization optimization was proposed.

Rahman et al. [48] designed an on-demand, infrastructure-free vehicular sensing framework in which users' phones serve as mobile collecting sensors. With this framework, accurate monitoring data can be obtained while reducing the number of participating vehicles, energy consumption and communication costs. To handle the dynamic factors in multimedia data collection, Li et al. [49] proposed a route optimization method for the Internet of Things. The method applies two priority rules, data priority and vehicle priority, to improve multimedia data collection.

Islam et al. [50] designed a blockchain-based data acquisition system that uses an unmanned aerial vehicle (UAV) as a relay to gather data in Internet of Things environments. In this system, data are encrypted before being transmitted via the UAV. Edge servers then verify the data, which are stored with their integrity preserved.
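The integrity guarantee behind blockchain-based storage can be sketched with a hash chain. This is a simplified illustration under stated assumptions, not the system in [50]. Encryption, the UAV relay and consensus are omitted, and the record fields are hypothetical. Each block stores the hash of its predecessor, so an edge server can detect modification of any stored record that has a successor.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data):
    """Append a record that references the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})
    return chain


def verify_chain(chain):
    """Return True if every block still references the hash of its predecessor."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True
```

Changing a field in any earlier block breaks the next block's `prev_hash` link. A real system would also anchor the latest hash externally (e.g., through consensus), because this chain alone cannot detect tampering with the final block.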

In intelligent transportation systems (ITSs), efficient and real-time data collection is essential. To address data security, Kong et al. [51] proposed a security-preserving data collection and sharing scheme based on blockchain. During data acquisition, the disjunctive normal form cryptosystem and an identity-based signcryption scheme are integrated into the secure variance calculation of the collected data.
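The cryptosystems used in [51] are not reproduced here. To show why a variance can be computed without revealing individual readings, the sketch below swaps in a different and simpler technique, additive secret sharing. Each vehicle splits x and x² into random shares for several non-colluding servers. Only the aggregates Σx and Σx² are reconstructed, and Var = E[x²] − (E[x])². Readings are assumed to be non-negative integers (e.g., fixed-point-scaled sensor values) whose aggregates stay below `MODULUS`. `MODULUS` and the server count are hypothetical choices.

```python
import random

MODULUS = 2**61 - 1  # a Mersenne prime; shares are integers modulo this value


def share(value, n, rng):
    """Split an integer into n random additive shares that sum to value (mod MODULUS)."""
    parts = [rng.randrange(MODULUS) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % MODULUS)
    return parts


def secure_variance(readings, n_servers=3, seed=0):
    """Population variance of non-negative integer readings, computed from aggregated shares."""
    rng = random.Random(seed)
    sum_shares = [0] * n_servers  # server i only ever holds its own shares of sum(x)
    sq_shares = [0] * n_servers   # ... and of sum(x * x)
    for x in readings:
        for i, s in enumerate(share(x, n_servers, rng)):
            sum_shares[i] = (sum_shares[i] + s) % MODULUS
        for i, s in enumerate(share(x * x, n_servers, rng)):
            sq_shares[i] = (sq_shares[i] + s) % MODULUS
    # Only the two aggregates are reconstructed; individual readings stay hidden.
    total = sum(sum_shares) % MODULUS
    total_sq = sum(sq_shares) % MODULUS
    n = len(readings)
    mean = total / n
    return total_sq / n - mean * mean
```

For the readings `[2, 4, 4, 4, 5, 5, 7, 9]`, the reconstructed aggregates are 40 and 232, giving a variance of 4.0. No single server ever sees a reading.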

2.2.2 Current Solutions to Data Storage

The data storage methods are listed in Table 2. Fig. 4 illustrates the main performance of the storage methods.

Table 2: Data storage methods

Table 2 (continued)
| Target | Reference | Method | Advantage |
|  | [55] | Semistructured data storage and processing engine | Extracts semantic values from a tremendous amount of patient data |
|  | [56] | Mining of semistructured data in the data flow based on the DL semistructured tree miner algorithm | Effectively mines and stores data with the time attenuation model |

Figure 4: Main performance of the data storage methods

Power systems generate millions or even billions of status, debug and error records every day. To ensure the security and sustainability of power systems, large amounts of power data must be processed and analyzed quickly enough to support real-time decisions. Traditional solutions typically use relational databases to manage power data. However, when the data volume increases substantially, relational databases cannot process and analyze the data effectively.

Jin et al. [52] proposed a distributed database based on Apache. Deep packet inspection (DPI) is a packet-based inspection technology. It examines network application-layer payloads, such as HTTP and DNS, and determines the validity of packets from those payloads.

Wang et al. [53] proposed a multilevel filtering method for similarity joins over fuzzy strings. Based on this method, they designed an elastic framework that reduces the computation of fuzzy-matching similarity to a weighted maximum matching problem at the element, record and similarity levels.
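The reduction of record similarity to a weighted maximum matching can be sketched as follows. This is an illustration, not the framework in [53], and its multilevel filtering is omitted. We assume, hypothetically, that element similarity is `difflib`'s character ratio with a threshold `delta`. Record similarity is taken to be the matched weight divided by the fuzzy union size. The brute-force matcher suits only short records; `scipy.optimize.linear_sum_assignment` is a standard alternative for larger ones.

```python
from difflib import SequenceMatcher
from itertools import permutations


def element_sim(a, b):
    """Character-level similarity of two tokens in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_overlap(r, s, delta=0.8):
    """Weight of the best one-to-one token matching, counting only pairs with similarity >= delta."""
    if len(r) > len(s):
        r, s = s, r
    best = 0.0
    # Brute force over all injective assignments of the shorter record's tokens.
    for perm in permutations(s, len(r)):
        weight = 0.0
        for a, b in zip(r, perm):
            sim = element_sim(a, b)
            if sim >= delta:
                weight += sim
        best = max(best, weight)
    return best


def fuzzy_jaccard(r, s, delta=0.8):
    """Record-level similarity: matched weight over the fuzzy union size."""
    if not r and not s:
        return 1.0
    overlap = fuzzy_overlap(r, s, delta)
    return overlap / (len(r) + len(s) - overlap)
```

For example, `["substation", "breaker"]` and `["substaton", "breakr"]` share no exact tokens. Their fuzzy similarity is still about 0.88, whereas an exact-token Jaccard score would be 0.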

An improved DPI method based on regular expressions has also been proposed. As attacks grow more complex, exact string matching struggles to capture attack features accurately. Regular expressions are flexible and efficient, so they are widely used for fuzzy feature matching. In the matching process, Sun et al. [54] used character intervals to describe multiple consecutive characters, which improves transmission efficiency.
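A character interval expresses a run of consecutive characters in a single token instead of a long alternation. The sketch below writes one hypothetical injection-style signature in both forms using Python's standard `re` module. It shows that the two forms match the same payloads while the interval form is much shorter. The signature and payloads are illustrative assumptions, not rules from [54].

```python
import re

# The same hypothetical signature written two ways: an explicit alternation of every
# allowed character, and a character interval covering the consecutive range 0-9.
DIGIT_ALTERNATION = "(?:" + "|".join("0123456789") + ")"
SIG_ALTERNATION = re.compile(r"id=" + DIGIT_ALTERNATION + r"{1,6}(?:'|%27)")
SIG_INTERVAL = re.compile(r"id=[0-9]{1,6}(?:'|%27)")


def matches_signature(payload, signature):
    """Return True if the payload contains a match for the compiled signature."""
    return signature.search(payload) is not None
```

Both patterns flag `"GET /item?id=42' OR 1=1"` and ignore `"GET /item?id=42"`. Only the interval form stays compact as the covered character range grows.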

The quantity and complexity of medical data are growing swiftly and driving up medical expenses. To address this problem, Satti et al. [55] designed a semistructured data storage and processing engine. The engine extracts semantic values from large volumes of patient data generated by various sources at different rates and levels of abstraction.

In [56], the authors concentrated on mining semistructured data in the data flow using a deep learning (DL) semistructured tree miner algorithm. With the time attenuation model, the data can be effectively mined and stored.
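A time attenuation model gives older observations less weight when computing pattern support. The sketch below makes two simplifying assumptions that are not taken from [56]. Each semistructured tree is reduced to a set of (parent, child) edges, and a pattern is counted as present when its edges form a subset. A real tree miner would test embedded-subtree containment instead. The decay factor is a hypothetical parameter.

```python
def decayed_support(pattern, stream, now, decay=0.9):
    """Fraction of time-decayed weight carried by trees that contain `pattern`.

    stream: list of (timestamp, set of (parent, child) edges); an observation made
    at time t is weighted by decay ** (now - t), so older trees count for less.
    """
    total = sum(decay ** (now - t) for t, _ in stream)
    hits = sum(decay ** (now - t) for t, edges in stream if pattern <= edges)
    return hits / total if total else 0.0
```

Suppose the stream is `[(0, {("a","b")}), (1, {("a","b"), ("a","c")}), (2, {("a","c")})]`, with `now=2` and `decay=0.5`. Both patterns appear twice, yet the recent pattern `{("a","c")}` gets support 6/7 while `{("a","b")}` gets only 3/7.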

2.3 Opportunities in Data Acquisition and Storage

Although some of the challenges in data acquisition and storage described above have been addressed, opportunities remain, as illustrated in Fig. 5.

Figure 5: Opportunities in data acquisition and storage

1) Security in data transmission. Limited network transmission conditions make data transmission vulnerable to attack [57]. Yuan et al. [58] discussed the security of wireless data transmission. Therefore, jointly optimizing security protection and efficiency during data transmission is one future direction for data acquisition.

2) Privacy preservation and security assurance. Data storage has received widespread attention in recent years, yet its security still faces severe challenges; privacy protection schemes for data acquisition and storage are summarized in Fig. 6. Bazai et al. [59] pointed out that MapReduce carries potential risks of privacy disclosure. Hence, balancing privacy preservation and stability during storage is a promising direction for data storage.

Figure 6: Privacy protection schemes for data acquisition and storage

3 Anomaly Detection of Data Flow

3.1 Challenges

Anomaly detection and fault discrimination have been investigated in many existing works [60,61]. Nevertheless, with the advent of fifth-generation (5G) mobile networks, the amount of data has increased significantly, raising the potential for data anomalies and operating faults [62,63]. As a result, anomaly detection and fault discrimination methods face the challenges illustrated in Fig. 7.

1) Lack of training samples. Sufficient training samples are required to build a high-performance model [64]. However, because data collection is challenging, existing samples are often insufficient. Therefore, training samples with the same characteristics should be generated.

2) Anomaly detection in time series data. Time series data, such as weather data and power data, have strict real-time requirements [65]. Therefore, anomaly detection is needed to ensure the integrity of time series data.

3) Privacy preservation in anomaly detection. Online, real-time anomaly detection methods are usually adopted to detect anomalies more efficiently [66]. However, such methods may also capture users' private information, resulting in privacy leakage. Therefore, privacy preservation is one of the challenges in anomaly detection.
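One common way to generate training samples with the same characteristics as a scarce class is SMOTE-style interpolation. A synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. This is a generic technique, not a method from the works cited above. The function name and the parameters `k` and `seed` are hypothetical.

```python
import random


def interpolate_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points between minority samples and their k nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sqdist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic
```

Every synthetic point is a convex combination of two real anomalies, so it stays within the region spanned by the observed minority class. Oversampling of this kind only approximates real rare faults and can blur class boundaries.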

Figure 7: Challenges in anomaly detection of data flow

3.2 Goals of Anomaly Detection Methods

The reliability and real-time performance of the service scheduling flow directly affect system functionality [67]. Therefore, service latency must meet users' requirements when the service flow is scheduled [68,69]. Research on abnormal feature detection and fault discrimination of data flow has two parts. First, feature analysis of the information flow examines its formation mechanism and feature quantities. Second, monitoring the information flow yields an abnormal feature detection method for the data flow. In the following subsections, we summarize existing anomaly detection methods under two goals: improving the stability of data flow and improving its accuracy. The framework of anomaly detection is shown in Fig. 8, and the main anomaly detection methods are illustrated in Fig. 9.

Figure 8: Framework of anomaly detection

Figure 9: Main anomaly detection methods

3.2.1 Improving the Stability of Data Flow

The anomaly detection methods to improve the stability of data flow are listed in Table 3.

Table 3: Anomaly detection to improve the stability of data flow

Table 3 (continued)
| Target | Reference | Method | Advantage |
|  | [79] | Anomaly detection approach with signal filtering discrimination | Enhances the security of automated vehicle transportation |
|  | [80] | Anomaly discrimination and classification approach | Improves the stability of the automotive industry |

Zhu et al. [70] investigated anomaly detection in time series data. Because time series data are unstable, conventional anomaly detection approaches can detect abnormal time series data only at a shallow level. The authors combined long short-term memory (LSTM) with a generative adversarial network (GAN) and designed a fusion model, named LSTM-GAN, to detect abnormal features in time series data.

Anomaly detection methods in computer networks were surveyed in [71]. First, threats from attackers and crackers were introduced. Second, the deficiencies of traditional signature-based anomaly detection were analyzed. Finally, the authors summarized existing anomaly detection systems and discussed challenges and open problems.

Li et al. [72] explored network intrusion detection from the perspective of feature selection stability. Specifically, they evaluated two feature selection approaches. The first is the variable importance measure of a random forest (RF-VIM), and the second is recursive feature elimination with a support vector machine (SVM-RFE). SVM-RFE could select significant features but was sensitive to the class imbalance ratio, whereas RF-VIM provided stable feature subsets.
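Feature selection stability is commonly quantified by running a selector on several resamples of the data. The average pairwise Jaccard similarity of the selected subsets is then computed. The sketch below implements that generic measure. It is not necessarily the metric used in [72], and the feature names in the usage note are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(subsets):
    """Mean pairwise Jaccard similarity of feature subsets selected on different resamples."""
    if len(subsets) < 2:
        raise ValueError("need at least two feature subsets")
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A selector that returns `{"duration", "src_bytes", "dst_bytes"}` on every resample scores 1.0. Suppose instead a selector returns `{"duration", "src_bytes", "flag"}`, `{"duration", "count", "flag"}` and `{"src_bytes", "count", "service"}`. It scores 0.3, which indicates unstable selection.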

Existing anomaly detection methods performed poorly when training samples were scarce. Therefore, an adaptive online anomaly detection approach for small samples was proposed [73]. This method predicted unknown anomalies by classifying known anomalies.

To reduce investors' financial losses from financial statement fraud, Yao et al. [74] investigated abnormal data detection in financial statements. First, the sources of abnormal data were analyzed. Second, the problem was modeled with DL. Finally, they proposed a random forest method that combines feature selection with DL classification and performs better on large amounts of financial data.

Li et al. [75] proposed a GAN-based anomaly detection approach for cyber-physical systems (CPSs). Potential anomalies were detected by modeling the time sequences of sensors and actuators in the CPS. Moreover, the proposed method effectively identified anomalies caused by various attacks.

The stability of a wind turbine reflects its operating condition. To support condition monitoring and anomaly detection for wind turbines, Zhang et al. [76] presented a probabilistic anomaly discrimination method based on artificial intelligence, which outperformed conventional methods.

Sun et al. [77] analyzed micro-anomaly detection for the primary components of satellites. Because conventional methods have low discrimination accuracy, they constructed an anomaly detection model based on sequence optimization. Anomaly discrimination for telemetry data was achieved by extracting features of satellite telemetry data and segmenting the data into phases.

Anomaly detection is necessary for the Internet of Things (IoT). However, existing methods generally require labeled data to discriminate anomalies. Guo et al. [78] therefore investigated unsupervised anomaly detection for IoT time series data. A gated recurrent unit (GRU) was used to represent the correlations among the data, and Gaussian mixture priors were employed to characterize the data.

Automated vehicle transportation, a novel scenario based on mobile edge computing (MEC), is currently emerging. To enhance its security, Wang et al. [79] presented an anomaly detection approach combined with signal filtering discrimination. Anomalies were detected from vehicle trajectories using an adaptive extended Kalman filter. Furthermore, vehicle states were estimated by analyzing the states of surrounding traffic, which better reflects real conditions.
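The filtering idea can be illustrated with a simplified sketch. [79] uses an adaptive extended Kalman filter. The sketch below substitutes a plain linear Kalman filter with a constant-velocity model and fixed, hypothetical noise levels `q` and `r`. It flags a position measurement when its innovation exceeds `threshold` standard deviations. Flagged measurements are not used to update the state, so an outlier does not corrupt the trajectory estimate.

```python
import numpy as np


def kf_anomalies(measurements, dt=1.0, q=0.01, r=0.25, threshold=3.0):
    """Flag 1-D position measurements whose normalized innovation exceeds `threshold`."""
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity motion model
    H = np.array([[1.0, 0.0]])             # only position is observed
    Q = q * np.eye(2)                      # process noise covariance (assumed)
    R = np.array([[r]])                    # measurement noise covariance (assumed)
    x = np.array([measurements[0], 0.0])   # initial state: first position, zero velocity
    P = np.eye(2)
    flags = [False]
    for z in measurements[1:]:
        # Predict the next state and its uncertainty.
        x = F @ x
        P = F @ P @ F.T + Q
        # Innovation (measurement minus prediction) and its variance.
        y = z - (H @ x)[0]
        S = (H @ P @ H.T + R)[0, 0]
        anomalous = abs(y) / np.sqrt(S) > threshold
        flags.append(bool(anomalous))
        if not anomalous:
            # Standard Kalman update, applied only to trusted measurements.
            K = (P @ H.T) / S
            x = x + K[:, 0] * y
            P = (np.eye(2) - K @ H) @ P
    return flags
```

Consider a vehicle moving at one unit per step, with one position reading corrupted by +8 at step 12. Only that step is flagged, because the subsequent readings agree with the predicted trajectory again.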

Large volumes of data are generated during the production and testing phases in the automotive industry. To evaluate the performance of vehicular systems, potential faults must be identified. A robust anomaly discrimination and classification approach was presented based on an analysis of the connections within vehicular systems [80].

3.2.2 Improving the Accuracy of Data Flow

The anomaly detection methods to improve the accuracy of data flow are listed in Table 4.

Table 4: Anomaly detection to improve the accuracy of data flow

Table 4 (continued)
| Target | Reference | Method | Advantage |
|  | [91] | Generation and identification model with GAN | Generates samples to train the identification model |
|  | [92] | Identification and correction method for drilling data | Corrects abnormal drilling data |
|  | [93] | Extreme gradient boosting framework | Improves classification accuracy |
|  | [94] | Abnormal traffic discrimination model | Generates substation samples |

Anomaly detection in computer networks was investigated in [81]. First, anomalies were classified according to the abnormal data. Second, anomalies were exposed through a specific filter. Finally, a bias scoring mechanism was used to detect anomalies adaptively.

Song et al. [82] presented a framework for detecting abnormal sequences that is applicable to intrusion identification and fault discrimination. First, the framework projected the data into a model-based feature space, extracting discriminative features from the generative model to improve identification capacity. Second, anomalies were detected by classifiers trained on the transformed data.

Anomaly detection is necessary to maintain the stability of power systems. Artificial neural networks (ANNs) can be trained on offline data, which reduces the consumption of online resources. ANNs can therefore be applied to detect faults in power systems. Yadav et al. [83] discussed ANN applications for anomaly detection, identification and classification in power systems and compared the performance of these approaches.

On-orbit anomaly detection is a primary problem in satellite management [84]. To identify anomalies in complex satellite telemetry data, a counting method for telemetry data features was presented. It characterizes the data by the frequency and extent of their changes and detects abnormal data accordingly.

To identify and detect anomalies in chemical plant processes, an unsupervised approach combining graph theory (GT) with generative topographic mapping (GTM) was presented in [85]. Specifically, GTM provided a way to calculate the similarity between two samples, whereas GT separated normal items from abnormal items by clustering.

Weather data analysis can be implemented with IoT and big data frameworks. To extract features and patterns from complicated weather data, a clustering-based weather sensor anomaly detection algorithm was explored [86].

To improve the accuracy of satellite telemetry data, Du et al. [87] proposed an anomaly detection model based on sequence features. First, the telemetry data were decomposed into a stable trend and a stable residual. Second, the anomaly detection model was designed by fusing these features.
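The trend/residual decomposition step can be sketched as follows. This is a generic substitute, not the model in [87]. A centered moving median serves as the trend, and a point is flagged when its residual exceeds `k` times the median absolute residual. The window size and `k` are hypothetical parameters.

```python
from statistics import median


def residual_anomalies(series, window=5, k=3.0):
    """Indices whose residual from a centered moving-median trend exceeds k robust deviations."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(median(series[lo:hi]))
    residual = [x - t for x, t in zip(series, trend)]
    # Median absolute residual as a robust scale estimate (guarded against zero).
    scale = median([abs(r) for r in residual]) or 1e-9
    return [i for i, r in enumerate(residual) if abs(r) / scale > k]
```

The median trend is barely moved by a single spike, so the spike shows up almost entirely in the residual. For a telemetry series oscillating between 10.0 and 10.2 with one reading of 20.0 at index 15, only index 15 is returned.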

The monitoring and acquisition data in wind turbine systems are imbalanced: among the large volume of data, abnormal records are rare. Therefore, abnormal data are difficult to discriminate accurately. Chen et al. [88] used a deep neural network (DNN) to build an intelligent anomaly detection method. The method addresses class imbalance by classifying the source monitoring data.

DL-based anomaly detection methods were investigated in detail in [89]. The methods were classified by their basic assumptions and research methods, that is, by how they distinguish normal from abnormal behavior. The authors then evaluated the performance of these methods and discussed their advantages and deficiencies.

Park et al. [90] investigated fault detection for time sequence data and presented an anomaly detection method that combines an autoencoder with LSTM. The autoencoder was trained on normal offline data and then used to identify and classify anomalies.
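The key idea of training an autoencoder only on normal data can be shown without an LSTM. The sketch below substitutes a linear autoencoder, computed in closed form via PCA with `numpy.linalg.svd`. It scores new samples by reconstruction error, so samples the model cannot reconstruct well are candidate anomalies. This is not the method in [90], and the number of components is a hypothetical choice.

```python
import numpy as np


def fit_linear_autoencoder(normal, n_components):
    """Fit a linear autoencoder (equivalent to PCA) on normal-only training samples."""
    mean = normal.mean(axis=0)
    _, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
    return mean, vt[:n_components]  # rows act as both encoder and decoder weights


def reconstruction_error(x, mean, components):
    """Euclidean reconstruction error for each row of x."""
    z = (x - mean) @ components.T    # encode into the normal subspace
    x_hat = z @ components + mean    # decode back to the input space
    return np.linalg.norm(x - x_hat, axis=1)
```

Suppose the normal training data are three correlated channels lying close to a one-dimensional subspace. A new sample on that subspace then reconstructs with near-zero error, while a sample that breaks the correlation, such as `[1.0, -2.0, 3.0]`, yields a large error. In practice, the alarm threshold would be calibrated on held-out normal data.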

DL, a family of algorithms driven by neural networks, is developing rapidly. DL models that learn feature representations have been applied to fault detection. However, their misclassification rate increases when fault data are limited. To improve the accuracy of anomaly detection, Zhou et al. [91] designed a GAN-based generation and identification model. Global optimization was performed to generate more fault samples and to train the identification model.

To improve the quality of drilling data, Yang et al. [92] investigated anomaly detection and proposed an identification and correction method for drilling data. First, a local detection algorithm was designed to detect all kinds of abnormal data and determine their causes. Second, an effective k-nearest neighbor algorithm was used to correct the abnormal data.
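The k-nearest neighbor correction step can be sketched as follows. This is an illustration under stated assumptions, not the algorithm in [92]. We assume the detection step has already marked one field of one record as abnormal. That value is replaced by the mean of the same field over the `k` records closest in the remaining fields. The field names (`depth`, `rpm`, `rop`) are hypothetical drilling parameters.

```python
import math


def knn_correct(records, bad_index, feature, k=3):
    """Return a copy of records[bad_index] with `feature` replaced by the mean of its
    k nearest neighbours, using Euclidean distance over the record's other fields."""
    target = records[bad_index]
    others = [r for i, r in enumerate(records) if i != bad_index]
    keys = [f for f in target if f != feature]

    def dist(r):
        return math.sqrt(sum((r[f] - target[f]) ** 2 for f in keys))

    nearest = sorted(others, key=dist)[:k]
    corrected = dict(target)
    corrected[feature] = sum(r[feature] for r in nearest) / len(nearest)
    return corrected
```

Suppose a record at depth 101 with 60 rpm reports an implausible `rop` of 999.0. Its three nearest neighbours report 10.0, 10.5 and 9.5, so the corrected value is 10.0. In practice, the fields would be normalized first so that no single unit dominates the distance.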

To improve the classification accuracy of power transformer protection schemes, Raichura et al. [93] designed an extreme gradient boosting framework to distinguish external faults from internal anomalies. Moreover, a convolutional neural network (CNN) was employed to classify the faults. Compared with other machine learning algorithms, such as SVMs, the proposed method achieved higher classification accuracy.

With the development of digitalization, traffic in substation communication networks is increasing, and abnormal traffic discrimination has become key to maintaining network security. Yang et al. [94] constructed an abnormal traffic discrimination model for substation communication networks based on ResNet, solving the problem of insufficient substation samples.

3.3 Opportunities for Anomaly Detection to Improve Detection Performance

Although some of the challenges in anomaly detection described above have been addressed, existing works point to opportunities for improving detection performance, as illustrated in Fig. 10.

Figure 10: Opportunities in anomaly detection of data flow

1) Anomaly detection in small samples [73]. Ideally, anomaly detection and fault discrimination models should be trained on as many kinds of anomalies as possible. However, practical samples rarely reach this ideal, which reduces the ability of anomaly detection methods to predict rare anomalies [95,96]. Therefore, anomaly detection in small samples is a future research direction.

2) Online anomaly detection [80]. Anomalies should be detected promptly after they occur and then eliminated with relevant techniques. Therefore, online anomaly detection approaches are necessary [97,98]. Although online anomaly detection methods have been investigated in certain existing works, their accuracy and efficiency remain too low for practical application scenarios. Therefore, improving the accuracy and efficiency of online anomaly detection is also a future direction.

3) Mixed training samples [82]. Most existing anomaly detection and fault discrimination methods are trained only with normal data. However, samples that contain only normal data are often impossible to obtain in practical scenarios. Therefore, anomaly detection and fault discrimination models should be trained on mixed samples that include both normal and anomalous data [99]. In addition, training samples should combine labeled and unlabeled data to improve model performance [100,101]. Hence, detection with mixed training samples is also a direction for future anomaly detection.

4 Service Fault-Tolerant Scheduling Based on Communication

4.1 Challenges

Many existing works focus on service fault-tolerant scheduling. However, with the development of heterogeneous systems and the growth of data volume, service fault-tolerant scheduling algorithms face several challenges, which are surveyed below and illustrated in Fig. 11.

Figure 11: Challenges in service fault-tolerant scheduling

1) Dynamic scheduling. Dynamic fault-tolerant scheduling of services can reduce the delay and energy consumption caused by resource redistribution and improve resource utilization [102].

2) Criticality levels of run-time faults. The criticality level of a run-time fault indicates its handling priority. To enhance the efficiency of service fault-tolerant scheduling, run-time faults with high criticality levels must be addressed first [103].

3) Service scheduling in heterogeneous systems. Heterogeneous systems offer flexibility for service scheduling but also increase system complexity. Many complex faults occur in heterogeneous systems, which poses new challenges for fault-tolerant scheduling algorithms [104].
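Challenge 2) amounts to a priority rule. A minimal sketch, assuming that criticality is an integer where larger means more critical and that faults of equal criticality are handled in arrival order, is a priority queue built on Python's `heapq`; the `FaultQueue` class and the fault names are hypothetical.

```python
import heapq
import itertools


class FaultQueue:
    """Hypothetical queue that releases run-time faults by criticality level."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def report(self, fault_id, criticality):
        # heapq is a min-heap, so criticality is negated to pop the highest level first
        heapq.heappush(self._heap, (-criticality, next(self._counter), fault_id))

    def next_fault(self):
        """Return the most critical pending fault, or None if nothing is pending."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = FaultQueue()
queue.report("vm-3 timeout", criticality=1)
queue.report("scheduler crash", criticality=3)
queue.report("link loss", criticality=3)
queue.report("disk warning", criticality=2)
print([queue.next_fault() for _ in range(4)])
# ['scheduler crash', 'link loss', 'disk warning', 'vm-3 timeout']
```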

4.2 Goals on Service Fault-Tolerant Scheduling

With the development of big data and 5G, massive amounts of data are generated to meet user requirements [105,106]. Service scheduling, a technique that improves resource utilization by scheduling the execution of services, facilitates data storage and analysis [107,108]. However, faults such as missing data may occur during service scheduling. Therefore, fault tolerance plays a significant role in service scheduling and directly affects the reliability of the network system [109,110]. Fig. 12 shows the framework of service fault-tolerant scheduling. The data center includes multiple hosts, denoted by $H=\{h_1,h_2,\ldots,h_n\}$, and each host provides multiple virtual machines (VMs). Let $v_{i,j}$ denote the $j$-th VM on host $h_i$. As shown in Fig. 12, the users first submit their service requirements in the form of a task queue. Second, the system scheduler schedules the tasks to satisfy these requirements. Because faults such as data loss or scheduler anomalies may occur, a fault-tolerant mechanism is added to the system scheduler to eliminate them. Last, the scheduled tasks are processed in the VMs of the data center.
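The workflow of Fig. 12 can be sketched in a few lines. In the hypothetical function below, hosts and their VMs follow the notation $h_i$ and $v_{i,j}$, the scheduler dispatches the users' task queue in round-robin order, and the fault-tolerant mechanism is modeled as resubmitting a task to the next VM when its VM faults. The round-robin policy, the retry limit and the fixed set of faulty VMs are assumptions made purely for illustration.

```python
from collections import deque


def schedule_with_retry(tasks, hosts, failed_vms, max_retries=2):
    """Dispatch a task queue to VMs and resubmit a task when its VM faults.

    tasks:      task identifiers in arrival order (the users' task queue)
    hosts:      dict mapping each host h_i to the names of its VMs v_{i,j}
    failed_vms: VMs assumed to fault at run time (a stand-in for real faults)
    Returns a dict task -> VM that completed it, or None if all attempts failed.
    """
    vms = [vm for host_vms in hosts.values() for vm in host_vms]
    queue = deque(tasks)
    placement = {}
    cursor = 0  # round-robin position, an assumed and deliberately simple policy
    while queue:
        task = queue.popleft()
        placement[task] = None
        for _ in range(max_retries + 1):
            vm = vms[cursor % len(vms)]
            cursor += 1
            if vm not in failed_vms:
                placement[task] = vm  # execution succeeded on this VM
                break
            # fault-tolerant mechanism: resubmit the task to the next VM
    return placement


hosts = {"h1": ["v1_1", "v1_2"], "h2": ["v2_1"]}
print(schedule_with_retry(["t1", "t2", "t3"], hosts, failed_vms={"v1_2"}))
# {'t1': 'v1_1', 't2': 'v2_1', 't3': 'v1_1'}
```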

Figure 12: Framework of service fault-tolerant scheduling

Next, we summarize existing works on service fault-tolerant scheduling according to four optimization goals, i.e., reducing energy consumption, decreasing service response latency, improving resource utilization and enhancing system reliability, as shown in Fig. 13.

Figure 13: Goals of service fault-tolerant scheduling methods

4.2.1 Reducing Energy Consumption

The service fault-tolerant scheduling methods for decreasing energy consumption are listed in Table 5.

Table 5: Service fault-tolerant scheduling to reduce energy consumption

Executing services in clusters can increase the efficiency of scientific workflows (SWf) on cloud servers. Vinay et al. [111] presented a heuristic algorithm based on the earliest finish time of clusters for fault-tolerant scheduling in cloud computing (CC). When a service in a cluster fails, the algorithm re-executes it in idle time, which reduces resource consumption.
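The re-execution idea can be illustrated with a plain earliest-finish-time (EFT) list scheduler. The sketch below is an assumed simplification rather than the algorithm of [111]: each task goes to the VM on which it would finish earliest, and a task whose first run fails is rerun on the VM that can finish it earliest after the failure is detected. Clusters are treated as single tasks, and backfilling into earlier gaps is not modeled.

```python
def eft_schedule(exec_time, n_vms, failing_tasks=frozenset()):
    """Greedy earliest-finish-time (EFT) placement with re-execution on failure.

    exec_time:     dict task -> list of run times, one entry per VM (assumed known)
    n_vms:         number of VMs available
    failing_tasks: tasks whose first run is assumed to fail and must be rerun
    Returns (schedule, makespan); schedule maps task -> [(vm, start, finish), ...].
    """
    ready = [0.0] * n_vms  # time at which each VM next becomes idle
    schedule = {}

    def place(task, release=0.0):
        # choose the VM on which the task finishes earliest, never starting
        # before `release` (the time a failed attempt is detected)
        vm = min(range(n_vms),
                 key=lambda v: max(ready[v], release) + exec_time[task][v])
        start = max(ready[vm], release)
        finish = start + exec_time[task][vm]
        ready[vm] = finish
        schedule.setdefault(task, []).append((vm, start, finish))
        return finish

    for task in exec_time:  # tasks are taken in insertion order
        finish = place(task)
        if task in failing_tasks:
            place(task, release=finish)  # rerun in the earliest idle slot
    return schedule, max(ready)


runs = {"a": [2, 3], "b": [2, 2], "c": [4, 1]}
schedule, makespan = eft_schedule(runs, n_vms=2, failing_tasks={"b"})
print(schedule["b"])  # [(1, 0.0, 2.0), (0, 2.0, 4.0)]
print(makespan)       # 4.0
```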

Scheduling algorithms in CC focus on high-performance computation and computing costs. However, because existing scheduling strategies are incomplete, the execution efficiency of computing tasks is hard to guarantee. Therefore, to lay a foundation for an efficient fault-tolerant framework, Pandita et al. [112] evaluated the fault-tolerance performance of existing scheduling algorithms in CC.

Nair et al. [113] proposed an energy-aware, efficient heuristic scheme for fault-tolerant scheduling of real-time tasks in heterogeneous systems. The authors designed a standby-sparing method in which the energy-efficient core processes the critical tasks and the high-performance core processes the tasks affected by faults.

Triple modular redundancy (TMR) is used to mask faults in homogeneous systems, but at the cost of high energy consumption. Yu et al. [114] presented a fault-tolerant scheduling strategy for heterogeneous systems by extending TMR. Specifically, services without fault-tolerance requirements were still executed with traditional TMR, whereas the other services were executed with the proposed method, which reduced the energy consumption of the system.
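The masking principle behind TMR can be shown with a majority voter. In the sketch below (the function name is hypothetical), a service runs on three replicas and the voter accepts any result produced by at least two of them, so a single faulty replica is outvoted. The price is three executions per service, which is the energy overhead that [114] sets out to reduce; the sketch does not model the method of [114] itself.

```python
from collections import Counter


def tmr_vote(outputs):
    """Majority voter of triple modular redundancy (TMR).

    outputs: results of the same service executed on three replicas.
    Returns the value produced by at least two replicas, which masks one
    faulty replica; raises if all three replicas disagree.
    """
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one replica is faulty")
    return value


print(tmr_vote([42, 42, 42]))  # 42
print(tmr_vote([42, 17, 42]))  # 42, the faulty replica is outvoted
```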

In mobile cloud computing (MCC), mobile devices are usually resource-limited, and the scheduling strategy must be updated whenever the available resources change. Lee et al. [115] proposed a fault-tolerant scheduling method that handles potential faults with checkpointing and replication mechanisms, reducing energy consumption.
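The checkpointing half of such a scheme can be illustrated with a small simulation; replication is not modeled. The hypothetical `run_with_checkpoints` function below assumes transient faults at known times, a fixed checkpoint interval and overhead, and instantaneous restart from the last checkpoint. None of these parameters come from [115].

```python
def run_with_checkpoints(work, interval, overhead, fault_times):
    """Simulate a task that saves a checkpoint after every `interval` units of work.

    work:        total computation the task needs (time units)
    interval:    useful work between consecutive checkpoints
    overhead:    time taken to write one checkpoint
    fault_times: sorted wall-clock times of transient faults; progress since the
                 last saved checkpoint is lost and execution resumes from that
                 checkpoint immediately (restart cost ignored)
    Returns the wall-clock completion time.
    """
    clock = 0.0
    saved = 0.0  # work preserved by the latest checkpoint
    faults = list(fault_times)
    while saved < work:
        segment = min(interval, work - saved)
        # the final segment needs no checkpoint after it
        cost = segment + (overhead if saved + segment < work else 0.0)
        if faults and faults[0] < clock + cost:
            clock = faults.pop(0)  # fault strikes: roll back to `saved`
            continue
        clock += cost
        saved += segment
    return clock


print(run_with_checkpoints(10, interval=4, overhead=1, fault_times=[6]))   # 13.0
print(run_with_checkpoints(10, interval=10, overhead=1, fault_times=[6]))  # 16.0 (no intermediate checkpoint)
```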

Chinnathambi et al. [116] presented a Byzantine fault detection algorithm and designed a scheduling and checkpoint optimization algorithm to tolerate and eliminate Byzantine faults. In addition, the proposed algorithm exponentially decreased the fault-tolerance overhead and allocated virtual resources effectively.

4.2.2 Decreasing Response Latency of Service

The service fault-tolerant scheduling approaches for reducing the response latency of services are listed in Table 6.

Table 6: Service fault-tolerant scheduling to reduce the response latency of service

With the development of intelligent computing techniques in CC, fault tolerance has become increasingly significant [117]. Therefore, a fault-tolerance-oriented service scheduling scheme was presented, in which a checkpoint strategy migrates services whose execution failed, effectively reducing the service response delay.

Yao et al. [118] presented a fault-tolerant scheduling algorithm based on resubmission and duplication. First, the workflow was divided into subtasks, and the deadline was divided accordingly. Then, a resubmission or duplication strategy was assigned to each subtask. To make the most of idle time, an online method for adjusting the scheduling strategy of unexecuted tasks was also designed.
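How a divided deadline can drive the choice between resubmission and duplication is sketched below for a chain of subtasks. The greedy rule is an assumption made for illustration, not the algorithm of [118]: the workflow's slack (deadline minus fault-free run time) is spent front to back; a subtask whose worst-case rerun still fits in the remaining slack gets resubmission (one copy, rerun on failure), and any other subtask gets duplication (two parallel copies, no extra time but twice the resources). The returned sub-deadlines are worst-case finish times.

```python
def plan_fault_tolerance(exec_times, deadline):
    """Assign each subtask of a chain workflow a sub-deadline and a strategy.

    exec_times: fault-free run time of each subtask, executed one after another
    deadline:   end-to-end deadline of the whole workflow
    Returns a list of (worst_case_sub_deadline, strategy) pairs.
    """
    slack = deadline - sum(exec_times)
    if slack < 0:
        raise ValueError("deadline is shorter than the fault-free run time")
    plan = []
    clock = 0.0
    for t in exec_times:
        if t <= slack:
            # resubmission: one copy, worst case runs twice (2 * t)
            slack -= t
            clock += 2 * t
            plan.append((clock, "resubmission"))
        else:
            # duplication: two parallel copies, worst case still t
            clock += t
            plan.append((clock, "duplication"))
    return plan


print(plan_fault_tolerance([2, 3, 1], deadline=9))
# [(4.0, 'resubmission'), (7.0, 'duplication'), (9.0, 'resubmission')]
```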

To cope with failures of computation tasks in CC, Abd Latiff et al. [119] presented a dynamic clustered scheduling method based on the League Championship Algorithm (LCA). The algorithm monitors the available resources and prevents premature task failures, which decreases task execution latency.

Cao et al. [120] examined how to prolong the lifetime of fault-tolerant, mixed-criticality embedded systems. Since solving the mixed-integer linear programming formulation is time-consuming for large-scale systems, a heuristic algorithm based on the cross-entropy method was presented, balancing task running time and system lifetime.

Applications in the Industrial Internet of Things (IIoT) usually require high reliability and low access latency. Ahrar et al. [121] explored service scheduling in IIoT and proposed a multipath scheduling algorithm that accounts for potential faults on the paths. Experiments in simulated heterogeneous scenarios showed that the algorithm improved the reliability and fault tolerance of IIoT.
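The benefit of spreading a service over several paths can be quantified with a standard reliability calculation, assuming link-disjoint paths whose links fail independently: a path succeeds only if all of its links work, and the service is delivered if at least one path succeeds, so $R = 1 - \prod_{p}\bigl(1 - \prod_{l \in p} r_l\bigr)$. The helper functions below are hypothetical and do not reproduce the algorithm of [121].

```python
import math


def path_reliability(link_reliabilities):
    """A path delivers only if every link on it works (independence assumed)."""
    return math.prod(link_reliabilities)


def multipath_reliability(paths):
    """Probability that at least one of several link-disjoint paths delivers.

    paths: list of paths, each a list of per-link success probabilities.
    """
    all_fail = math.prod(1.0 - path_reliability(p) for p in paths)
    return 1.0 - all_fail


def paths_needed(paths, target):
    """Smallest number of the most reliable paths that jointly reach `target`."""
    ranked = sorted(paths, key=path_reliability, reverse=True)
    for k in range(1, len(ranked) + 1):
        if multipath_reliability(ranked[:k]) >= target:
            return k
    return None  # even all paths together fall short of the target


paths = [[0.9, 0.9], [0.95], [0.8]]
print(round(multipath_reliability(paths[:2]), 4))  # 0.9905
print(paths_needed(paths, target=0.99))             # 2
```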

4.2.3 Improving Resource Utilization

The service fault-tolerant scheduling methods to improve resource utilization are listed in Table 7.

Table 7: Service fault-tolerant scheduling to improve resource utilization

Fault tolerance is widely employed in CC. Soniya et al. [122] investigated fault-tolerant service scheduling in CC. First, a dynamic resource allocation method with a fault-tolerant mechanism was presented to enhance resource utilization. Second, combined with a virtual machine scheduling approach, a dynamic scheduling scheme with a fault-tolerant mechanism was proposed for real-time services in CC.

Zhu et al. [123] examined fault-tolerant mechanisms for real-time workflows. Based on a workflow model that tolerates faults in real time, a service allocation and communication approach was presented, which improves computational resource utilization by making full use of idle resources.

In cloud systems, fault tolerance has become a primary requirement for the execution of computation tasks. Therefore, Ding et al. [124] studied the fault tolerance of task workflows and presented an offline elastic scheduling algorithm with fault tolerance that dynamically regulates resource allocation and increases resource utilization in cloud systems.

Yan et al. [125] presented a dynamic elastic fault-tolerant scheduling method for cloud services that realizes fault tolerance while increasing resource utilization. First, a fault-tolerant task allocation method was designed in which, to cope with uncertainty, two fault-tolerant task scheduling models were used interchangeably. Second, an overlapping mechanism was adopted to improve resource utilization in the cloud.
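If the overlapping mechanism is read as the common primary-backup technique (an assumption, since the details are not given here), its core check can be sketched as follows. Under a single-host-failure assumption, at most one primary fails at a time, so backups whose primaries run on different hosts can reserve overlapping time slots on the same VM without conflict, which frees reserved capacity. The dictionary fields are hypothetical.

```python
def can_overlap(backup_a, backup_b):
    """Check whether two backup copies may share a time slot on one VM.

    Each argument is a dict with the 'primary_host' of the task's primary copy
    and the 'start'/'end' of the slot reserved for its backup. Assuming at most
    one host fails at a time, backups of primaries on different hosts are never
    activated together, so their reserved slots may overlap.
    """
    slots_overlap = (backup_a["start"] < backup_b["end"]
                     and backup_b["start"] < backup_a["end"])
    if not slots_overlap:
        return True  # disjoint slots never conflict
    return backup_a["primary_host"] != backup_b["primary_host"]


b1 = {"primary_host": "h1", "start": 0, "end": 4}
b2 = {"primary_host": "h2", "start": 2, "end": 6}
b3 = {"primary_host": "h1", "start": 3, "end": 5}
print(can_overlap(b1, b2))  # True: primaries on different hosts
print(can_overlap(b1, b3))  # False: one failure of h1 would activate both backups
```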

Marahatta et al. [126] proposed a fault-tolerance-based dynamic task allocation and scheduling scheme to jointly optimize energy efficiency and resource utilization. Specifically, incoming tasks were classified and then allocated to appropriate virtual machines for execution. In addition, a flexible resource provisioning mechanism was developed to optimize energy efficiency.

Chen et al. [127] designed a fault-tolerant framework that handles run-time faults according to their criticality levels. To avoid overallocating computational resources, an overrun handling protocol that ensures fault recovery was also proposed. In addition, an offline scheduling analysis technique was adopted to evaluate the proposed approach.

4.2.4 Enhancing the Reliability of Systems

The service fault-tolerant scheduling methods to improve the reliability of systems are listed in Table 8.

Table 8: Service fault-tolerant scheduling methods to improve the reliability of systems

Table 8 (continued)

| Target | Reference | Method | Advantage |
|---|---|---|---|
| Enhancing reliability of systems | [132] | Heuristic fault-tolerant task scheduling algorithm | Improve system reliability based on the number of tolerable permanent faults |
| | [133] | Adaptive fault-tolerant scheduling algorithm | Select the most appropriate fault-tolerant technique to address the faults |
| | [134] | Service fault-tolerant scheduling method in scientific workflows | Improve the reliability of systems |
| | [135] | Task clustering algorithm with fault tolerance | Jointly consider the execution latency and cost of workflows |

Zhang et al. [128] investigated fault tolerance in a power management system and designed a fault-tolerant scheduling method. Specifically, the online method calculates the running speed of the system and applies a slowdown mechanism according to the actual workload.
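The slowdown idea can be sketched with a simple speed-selection rule under stated assumptions: execution time scales inversely with a normalized processor speed, a lower speed is assumed to save energy (as is typical under dynamic voltage and frequency scaling), and some time (`reserve`) must remain free to re-execute the job at full speed if a fault strikes. The function `pick_speed` and its parameters are hypothetical; the method of [128] is not reproduced.

```python
def pick_speed(workload, deadline, speeds, reserve=0.0):
    """Choose the slowest normalized processor speed that still meets the deadline.

    workload: execution time of the job at full speed (speed 1.0)
    deadline: time available for the job
    speeds:   available normalized speed levels, e.g. [0.4, 0.6, 0.8, 1.0]
    reserve:  time kept free for a full-speed re-execution after a fault
    Returns the chosen speed, or None if no level is fast enough.
    """
    for s in sorted(speeds):  # try the slowest (most energy-saving) level first
        if workload / s + reserve <= deadline:
            return s
    return None


speed_levels = [0.4, 0.6, 0.8, 1.0]
print(pick_speed(4, deadline=12, speeds=speed_levels))             # 0.4
print(pick_speed(4, deadline=10, speeds=speed_levels, reserve=4))  # 0.8
```

Reserving recovery time forces a faster speed (0.8 instead of 0.4 in the example), which shows why slowdown and fault tolerance have to be traded off against each other.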

Grid computing serves computation-intensive, long-running applications. To guarantee quality of service (QoS), these applications need to tolerate potential faults. Based on the ant colony algorithm, Idris et al. [129] presented a fault-tolerant task scheduling algorithm for grid computing that ensures tasks can still be executed normally when faults occur.

Mixed-criticality task scheduling is commonly considered in real-time and embedded systems. Zhou et al. [130] designed a fault-tolerance method for mixed-criticality systems to improve the security of tasks with different criticality levels.

Wang et al. [131] proposed a fault tracing approach for the power grid on a big data platform. First, Spark was used to process the fault data. Second, the faults were analyzed with data mining. Last, the causes of the faults were inferred with a decision tree. In this way, the method makes full use of the monitoring data to infer fault causes.

The complexity of heterogeneous systems increases the likelihood of faults, making efficient fault-tolerant task scheduling strategies increasingly important. Hence, Liu et al. [132] proposed a heuristic fault-tolerant task scheduling algorithm that improves system reliability by dynamically computing the number of permanent faults to be tolerated.

Alarifi et al. [133] presented a fault-tolerant scheduling algorithm that allocates service requests to appropriate devices in the IoT. When service execution fails, the algorithm selects the most appropriate fault-tolerant technique among replication, resubmission and checkpointing, increasing the reliability of the IoT.
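The selection step can be sketched under an assumed cost model. The hypothetical `choose_technique` below estimates the worst-case completion time of a service under one failure for each of the three techniques and returns the least resource-hungry technique that still meets the deadline. The cost formulas and the preference order are illustrative assumptions, not the criteria used in [133].

```python
import math


def choose_technique(exec_time, deadline, ckpt_interval, ckpt_overhead):
    """Pick a fault-tolerance technique for one service request.

    Assumed worst case for a single failure late in the execution:
      resubmission: rerun from scratch         -> 2 * exec_time
      checkpoint:   lose at most one interval  -> exec_time + overheads + interval
      replication:  a second copy finishes     -> exec_time (but doubles resources)
    Techniques are tried from the least to the most resource-hungry.
    """
    n_ckpts = math.ceil(exec_time / ckpt_interval) - 1  # none after the last segment
    candidates = [
        ("resubmission", 2 * exec_time),
        ("checkpoint", exec_time + n_ckpts * ckpt_overhead + ckpt_interval),
        ("replication", exec_time),
    ]
    for name, worst_case in candidates:
        if worst_case <= deadline:
            return name
    return None  # even replication misses the deadline


for d in (25, 15, 12, 9):
    print(d, choose_technique(10, d, ckpt_interval=2, ckpt_overhead=0.5))
# prints: 25 resubmission, 15 checkpoint, 12 replication, 9 None (one per line)
```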

Fault tolerance is a key technique in CC: when a workflow experiences faults, it provides a protection mechanism that keeps the system safe. Talwani et al. [134] compared techniques for optimizing fault tolerance in scientific workflows using real-world workflows.

Task clustering increases the computational granularity of scientific workflows executed on distributed computing resources. Khaldi et al. [135] proposed a fault-tolerant task clustering algorithm that considers the constraints on workflow execution time and execution cost to improve workflow performance.
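Horizontal clustering, a common form of task clustering, groups independent tasks of the same workflow level into one job. The hypothetical sketch below shows the trade-off a fault-tolerant clustering algorithm has to weigh: fewer, larger clusters cut the per-job scheduling overhead, but a failure forces the whole cluster to rerun. The per-job overhead and the rerun-whole-cluster assumption are illustrative; the algorithm of [135] is not reproduced.

```python
def horizontal_clusters(levels, size):
    """Group tasks of the same workflow level into clusters of at most `size`.

    levels: list of workflow levels, each a list of (task_id, runtime) pairs
            whose tasks do not depend on each other.
    Returns a list of clusters, each a list of task ids.
    """
    clusters = []
    for level in levels:
        for i in range(0, len(level), size):
            clusters.append([task for task, _ in level[i:i + size]])
    return clusters


def cluster_tradeoff(levels, size, job_overhead):
    """Return (total scheduling overhead, worst rework if one cluster fails once)."""
    runtime = dict(task for level in levels for task in level)
    clusters = horizontal_clusters(levels, size)
    total_overhead = job_overhead * len(clusters)  # one overhead per submitted job
    worst_rework = max(sum(runtime[t] for t in c) for c in clusters)
    return total_overhead, worst_rework


levels = [[("a", 2), ("b", 3), ("c", 1), ("d", 4)], [("e", 5)]]
print(horizontal_clusters(levels, size=2))             # [['a', 'b'], ['c', 'd'], ['e']]
print(cluster_tradeoff(levels, 2, job_overhead=1.0))   # (3.0, 5)
print(cluster_tradeoff(levels, 4, job_overhead=1.0))   # (2.0, 10)
```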

4.3 Opportunities for Service Fault-Tolerant Scheduling to Enhance Scheduling Efficiency

Although some of the previously mentioned challenges in service fault-tolerant scheduling have been addressed, opportunities to enhance the efficiency of service scheduling remain, as illustrated in Fig. 14.

Figure 14: Opportunities in service fault-tolerant scheduling

1) Elastic resource scheduling. Data centers use abundant resources to process large amounts of data, which consumes power and negatively impacts the natural environment [136]. Therefore, green data processing is becoming increasingly important for environmental protection. The resource requirements of each service should be analyzed in detail, and, with fault tolerance taken into account, resources should be allocated elastically to services in the data center to save energy [137]. Therefore, elastic resource scheduling is a future direction of service fault-tolerant scheduling.

2) Prediction of task errors. Frequent task migration incurs additional energy consumption and degrades system performance [138]. Therefore, potential faults should be predicted from the long-term execution characteristics of tasks, so that resources can be provisioned in advance to address them and reduce energy consumption [139]. Thus, prediction of task errors is also a future research direction of service fault-tolerant scheduling.

3) Security in service fault-tolerant scheduling. Service fault-tolerant scheduling requires monitoring multiple attributes of services, which may include private user information, so security is important [140]. A high-performance service fault-tolerant scheduling method should handle the trade-off between security and fault tolerance [141]. Hence, security in service fault-tolerant scheduling is also a future research direction.

5 Conclusion

In this work, a comprehensive and detailed survey of intelligent identification based on power big data is presented. First, the data acquisition and storage processes are investigated. Second, the anomaly characteristics of massive data and the corresponding fault discrimination techniques are analyzed. Furthermore, the problem of fault tracing for dispatching operations during communication is discussed. This survey aims to promote further progress in intelligent identification based on power big data. Nevertheless, numerous research issues in this area remain open and need further effort, including optimizing the distributed intelligent identification process and balancing security protection against performance.

Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.