Big Data-Driven Public Health Risk Surveillance: Theoretical Framework and Practical Reflections

2022-07-02 06:22 Xu Wendong, Shi Ruyi, Yu Xiaodong
Journal of Information Resources Management, 2022, No. 3

Wang Chao, Xu Wendong, Shi Ruyi, Yu Xiaodong

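(1. School of Public Administration, China University of Mining and Technology, Xuzhou 221116; 2. School of Foreign Languages and Cultures, China University of Mining and Technology, Xuzhou 221116)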

1 Introduction

Since the beginning of the 21st century, against the background of globalization and deepening modernization, epidemics have broken out and spread more rapidly and extensively than ever before, owing to the increasing concentration and mobility of people, the continuous deterioration of the ecological environment, closer trade ties, and unstable social and political situations. Whether in regional responses or in global action, there is an urgent need to explore and adopt new scientific and technological means to achieve accurate risk surveillance and to strengthen the capacity of decision-making departments for risk assessment and emergency response. At the same time, the rise of big data technology is expected to help address the inefficiency of traditional public health risk surveillance, especially problems of access to medical services and equipment when hospitals are overwhelmed. More case studies, computational models, empirical studies, theoretical articles, mixed-method approaches, and advanced methodologies are urgently needed and are of great significance for understanding, explaining, predicting, and managing pandemic crises such as COVID-19[1]. Big data is also expected to fundamentally reverse the passive position of traditional surveillance, making it possible to build a comprehensive big data-driven system for public health risk surveillance, prevention, and control.

Big data surveillance has developed into a research field with great potential and has spawned a series of emerging interdisciplinary fields, such as big data computational epidemiology[2], digital epidemiology[3], and precision public health[4]. These fields use multi-source data and mathematical models to understand and analyze the spatiotemporal spread of epidemics, especially transmission mechanisms in complex networks. Real-time information from web searches, social media, the Internet of Things, and other channels provides abundant data to support them. Big data therefore has broad application prospects in public health, particularly in epidemic surveillance. In practice, however, it has encountered various problems. The coverage and operational efficiency of the global public health risk surveillance system remain far from adequate, and governance effectiveness is consequently still unsatisfactory[5]. Google Flu Trends (GFT), a highly anticipated influenza prediction system launched by Google in 2008, seriously underestimated the H1N1 influenza outbreak the following year and failed to help stop its global spread in time[6]. At the beginning of the 2014 Ebola epidemic, some media outlets claimed that the global disease surveillance system HealthMap could predict the epidemic trend from news and social network data, in particular by using big data technology to track infected cases through mobile data. In practice, however, it did not play the expected role[7]. Before the outbreak of COVID-19, the World Health Organization (WHO) announced that it had established a global surveillance system to monitor and analyze the epidemic in real time, with goals including monitoring the global spread of the epidemic, providing early epidemiological information to support national, regional, and global risk assessment, and promptly detecting new cases in countries previously free of the virus[8]. In the following months, however, the growing epidemic effectively demonstrated the failure of this global surveillance system.

It can be seen that although big data has achieved remarkable practical results in fields such as marketing and earth science, public health governance, as some scholars have pointed out, still relies on traditional surveillance systems while waiting for the results of the big data revolution[9]; in particular, how to employ various big data analysis methods to deal with the global COVID-19 pandemic remains a major current challenge[10]. What, then, is the theoretical connotation of big data-driven public health risk surveillance? What progress has its practice made globally? What important lessons can be drawn from the dilemmas encountered in practice? This paper responds to these questions, aiming to construct a theoretical framework and to promote the emergence and development of a big data-driven public health risk surveillance model.

It should be noted that, to clearly illustrate the current state of research on big data-driven public health risk surveillance, this paper follows a dual-line mode of “theory and practice” and proceeds according to the basic idea of “progress-problem-enlightenment”. Here, “progress” refers to reviewing the research and practical progress of big data-driven public health risk surveillance; “problem” refers to the multiple challenges it faces; and “enlightenment” refers to deriving new solutions from the current focus and direction of its development on the basis of that progress and those problems.

2 Literature Review

2.1 Core concepts

As an emerging research field, big data-driven public health risk surveillance is a product of the interdisciplinary integration of public health science, risk science, and data science. Its connotation rests on three core concepts. The first is “big data-driven”. According to Techopedia[11], big data-driven means that management decisions and processes are determined by data or made according to the results of big data analysis, rather than relying on intuition or personal experience. This reflects that the big data-driven approach aims to overcome the shortcomings of traditional experience-based decision-making, especially in decision-making and governance actions during emergencies. The second is “public health risk”. According to the Emergency Response Law of the People's Republic of China[12], a public health emergency refers to “the sudden occurrence of major infectious diseases, mass unexplained diseases, major food and occupational poisoning, and other events seriously affecting public health that cause or may cause serious damage to public health”. Public health risk can therefore be defined as the possibility that public health events occur and cause damage to public health and socio-economic losses. The third is “surveillance”, which is usually described as the process of providing actionable information. Among these three concepts, “epidemic surveillance” has the longest history, traceable at least to the 19th century. Alex Langmuir, the first chief epidemiologist of the US Communicable Disease Center (now the US Centers for Disease Control and Prevention), proposed the earliest version of its definition: “By systematically collecting, collating and evaluating morbidity and mortality reports and other relevant data, timely and regular information is disseminated to people who need to know, with the distribution and trends of incidence being continuously followed”[13]. This definition emphasizes the continuity of the surveillance process, the support of multi-source data, and regular, timely feedback of surveillance results.

Based on the above concepts and the definition of the Centers for Disease Control and Prevention (CDC), this paper defines big data-driven public health risk surveillance as the systematic, comprehensive, and continuous collection, management, and analysis of data, combined with the timely and accurate dissemination and sharing of public health information with relevant stakeholders, in order to help decision-makers better perceive and judge risk situations. Big data surveillance and traditional surveillance share the goal of systematically collecting, interpreting, and disseminating data to support public health decision-making and action. Big data surveillance, however, places more emphasis on contextualization, diversity, timeliness, comprehensiveness, and accuracy. In supporting the prevention and control of major public health risks, it can assist decision-making and problem-solving in complex circumstances where traditional surveillance is ineffective, such as rapid epidemic tracing, large-scale vaccination screening, identification of high-risk groups, and prediction of epidemic trends.

2.2 Research progress

As the global pandemic continues to spread and digitalization accelerates, research on big data-driven public health risk surveillance has become a hot topic for scholars worldwide. Haafza et al.[14] analyzed the literature on applying big data to COVID-19 surveillance through a systematic literature review (SLR). The results showed that current research mainly addressed epidemic analysis and assessment, risk identification, risk assessment, risk decision-making, and the application of drug big data. Specific applications include identification of early infection cases[15], close contact screening[16], online public opinion surveillance, virus host analysis, rapid visualization of epidemic information, balancing and managing the supply and demand of material resources[16], and pandemic forecast evaluation[17]. These roles and functions help decision-makers perceive situations and make judgments[18].

In terms of specific applications, many scholars have used multi-source data to verify the value of big data in public health analysis and epidemic prediction. For example, Chen et al.[19] proposed a disease dynamics prediction algorithm based on mobile big data and showed that it effectively improved the accuracy of Ebola epidemic prediction. Zhang et al.[20] established a pertussis early warning model using climate data, web queries, and school calendar records, and found that Internet data combined with climate data could predict pertussis epidemics. Rocklöv et al.[21] applied big data (air passenger traffic, Twitter data, and vector capacity estimates for Aedes albopictus) to the 2017 European chikungunya outbreak and found that big data analysis could identify the features of virus spread. Nduwayezu et al.[22] proposed a social network system (SNS) analysis that found a correlation of 0.75 between malaria cases in Nigeria recorded on Twitter and the average rainfall in Nigeria. Hassan et al.[23] used Twitter data to extract behavioral patterns from social networks and monitor influenza outbreaks in the United States, and collected health center data to track actual clinical conditions, confirming that influenza-related traffic on social media is strongly correlated with actual influenza outbreaks. Mavragani et al.[24] studied the correlation between Google Trends data and official CDC health data and found that big data surveillance can accurately measure changes in people's behavior toward the disease. Moreover, big data collected from social networks and other unconventional data streams make it possible to reconstruct the epidemiological story of the early spread of COVID-19[25]. Doganer et al.[26] collected coronavirus-related search terms and their relative search volume (RSV) from Google Trends for 11 countries affected by the COVID-19 outbreak, and showed that Google Trends data can be used to build forecasting models for case numbers in the COVID-19 outbreak. These results further affirm the feasibility and value of big data in epidemiological surveillance.
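Several of the studies above rest on one basic operation: correlating an online signal (tweets, search volume) with official case counts. The following is a minimal pure-Python sketch of that step, assuming two equal-length weekly series; the data are invented for illustration, and the lag scan (checking whether the online signal leads reported cases) is an added assumption rather than a method taken from the cited studies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def lagged_correlations(signal, cases, max_lag):
    """Correlate signal[t] with cases[t + lag] for lag = 0..max_lag.

    A higher correlation at a positive lag suggests the online signal
    leads the reported case counts by that many periods.
    """
    return {
        lag: pearson(signal[: len(signal) - lag], cases[lag:])
        for lag in range(max_lag + 1)
    }


# Hypothetical weekly series, invented for illustration only.
search_volume = [10, 12, 18, 30, 45, 60, 52, 40, 28, 20]
reported_cases = [5, 6, 8, 12, 20, 33, 48, 44, 30, 22]

corrs = lagged_correlations(search_volume, reported_cases, max_lag=2)
best_lag = max(corrs, key=corrs.get)
print({lag: round(r, 2) for lag, r in corrs.items()}, "best lag:", best_lag)
```

A high in-sample correlation is not the same as forecasting skill: the Google Flu Trends experience discussed in the Introduction is a reminder that such relationships can drift once search behavior changes.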

Combining big data with other technologies to strengthen their capabilities in various fields has also become a research focus in recent years. For example, data mining, machine learning, artificial intelligence, geographic information systems (GIS)[27], and statistical modeling can effectively mitigate the slow system response caused by delayed reporting in traditional surveillance[28]. However, existing research focuses on specific risk issues and emphasizes empirical work, while systematic reviews and syntheses are lacking. Especially while COVID-19 remains a threat, the actual effectiveness of big data-driven public health risk surveillance calls for a clear answer. This paper therefore systematically reviews the practical progress of big data applications in public health risk surveillance since the 20th century and constructs a theoretical model of big data-driven public health risk surveillance from the perspective of system interaction, providing a new theoretical framework and empirical basis for academic research on big data applications.

3 Theoretical Framework

Although there are many typical practices of applying big data to public health risk surveillance worldwide, a more effective theoretical framework is still needed for research on using big data to deal with major public health events such as COVID-19. This paper therefore proposes an integrated analytical framework from the perspective of system structure. In this framework, big data-driven public health risk surveillance contains two core parts: big data governance (with big data resources as the core and risk scenarios as the field) and risk surveillance (surveillance activities with risk scenarios as the core and big data as environmental variables and endogenous factors). Big data governance refers to managing large amounts of data in an organizational environment and adopting different analysis tools to use them in organizational decision-making[29]; it emphasizes governance behavior and process and focuses on standards, systems, and technical elements. Big data governance is a governance process aimed at the “availability and utilization” of public health big data, while risk surveillance is a surveillance activity aimed at the “comprehensibility and controllability” of public health risks. Big data governance is the basis and premise of risk surveillance, because big data becomes available for risk surveillance only through governance. In the context of public health risk, the two stand in a means-to-end relationship, forming a dynamic closed-loop system with continuous feedback through data flows. The whole framework involves the following five core links (see Figure 1).

The first is scenario analysis, the logical starting point of risk surveillance. It aims to capture the complex factors in population, economy, climate, policy, and other aspects that may affect how risks change, and it involves expressing the demands of relevant stakeholders[30]. The complexity of public health risks is not only the object of big data-driven risk surveillance but also falls within the scope of big data governance. Big data-driven risk surveillance is a series of actions grounded in specific situations that combine multiple spatial and temporal dimensions. The intersection of these dimensions produces the complexity, dynamics, and uncertainty of public health risk scenarios, whose data are massive, multi-source, heterogeneous, and dynamically updated. Generally speaking, big data-driven public health risk surveillance can be divided into indicator-based epidemic surveillance under normal scenarios and public health emergency surveillance under abnormal scenarios, both of which require scenario analysis and scenario linking. The two modes, however, often converge when a public health emergency occurs.

The second is open sharing, which aims to guarantee data accessibility at the source. Public health big data exists not only in health management but also in related domains such as personal travel trajectories and social networks. Ensuring accessibility at the source therefore requires both the necessary opening of data to the public and cross-department, cross-level, and cross-domain data sharing, so as to meet the basic needs of risk surveillance in data size, real-time acquisition, multi-source heterogeneity, authenticity, and reliability.

The third is standardized management, which aims to solve problems of data availability, including the sufficiency of available data and the breadth of participation in data governance. Standardized management is an important goal of big data governance; its purpose is to increase the value of data and minimize data-related costs and risks[31]. The key to the availability of public health big data is data standardization, which is also the core of big data governance. Data standardization should focus on the core links of the data life cycle, emphasizing standard design for data collection, data quality control, data privacy protection, technology use, and so on. Big data is data that exceeds the processing capacity of conventional database systems, and big data standards bear directly on the quality of big data; they are closely related to the diversity of data types, data sources, and application domains[32].
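To make the standardization step more concrete, the sketch below maps records from two hypothetical source systems onto one common schema, applies simple quality control, and pseudonymizes identifiers. The source names (`hospital`, `clinic`), field names, date formats, and the key are all illustrative assumptions; a real deployment would follow an agreed data standard and manage keys securely.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical source systems and their field names -> common schema.
FIELD_MAPS = {
    "hospital": {"pid": "person_id", "dx_date": "date", "dx": "diagnosis"},
    "clinic": {"patient": "person_id", "visit": "date", "code": "diagnosis"},
}
# Date format each hypothetical source is assumed to use.
DATE_FORMATS = {"hospital": "%Y-%m-%d", "clinic": "%d/%m/%Y"}
REQUIRED = ("person_id", "date", "diagnosis")


def pseudonymize(value, key):
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def standardize(record, source, key=b"replace-with-a-secret-key"):
    """Map one raw record to the common schema, or return None if it fails QC."""
    mapping = FIELD_MAPS[source]
    row = {mapping[k]: v for k, v in record.items() if k in mapping}
    if any(row.get(f) in (None, "") for f in REQUIRED):
        return None  # quality control: incomplete record
    try:
        parsed = datetime.strptime(row["date"], DATE_FORMATS[source])
    except (TypeError, ValueError):
        return None  # quality control: missing, unparseable, or impossible date
    return {
        "person_id": pseudonymize(row["person_id"], key),  # privacy protection
        "date": parsed.date().isoformat(),  # one date format for all sources
        "diagnosis": str(row["diagnosis"]).strip().upper(),  # one code style
        "source": source,  # keep provenance for later auditing
    }


raw = [
    ("hospital", {"pid": 101, "dx_date": "2022-03-01", "dx": "j10"}),
    ("clinic", {"patient": "X7", "visit": "02/03/2022", "code": " J10 "}),
    ("clinic", {"patient": "X8", "visit": "31/02/2022", "code": "J10"}),
]
clean = [r for src, rec in raw if (r := standardize(rec, src)) is not None]
print(clean)
```

The third record is dropped because 31 February does not exist; the other two end up with the same date format, the same diagnosis code style, and no raw identifiers.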

The fourth is situation perception, which aims to answer “what” public health risks are present and “how” they are developing. Big data-driven risk surveillance integrates a variety of technical models to build epidemic surveillance models that process and visualize multi-source data, transforming complex real-time risk situations into decision situations and helping decision-makers perceive the overall risk situation in a more specific, timely, and intuitive way.
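Situation perception typically starts from a few smoothed indicators rather than raw daily counts. As an illustration only, the sketch below computes a trailing 7-day average and a week-over-week change, then maps the change to a coarse dashboard label; the 7-day window and the ±10% thresholds are assumptions, not values proposed in this paper.

```python
def rolling_mean(values, window=7):
    """Trailing moving average; the first window-1 positions are None."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def week_over_week(values):
    """Percent change of the last 7-day total versus the previous 7 days."""
    if len(values) < 14:
        raise ValueError("need at least 14 daily values")
    this_week, last_week = sum(values[-7:]), sum(values[-14:-7])
    return None if last_week == 0 else 100.0 * (this_week - last_week) / last_week


def situation_label(pct_change, rising=10.0, falling=-10.0):
    """Map the trend to a coarse label for a dashboard (thresholds assumed)."""
    if pct_change is None:
        return "insufficient baseline"
    if pct_change >= rising:
        return "rising"
    if pct_change <= falling:
        return "falling"
    return "stable"


# Hypothetical daily case counts for one district over two weeks.
daily = [12, 15, 11, 14, 13, 16, 12, 18, 21, 19, 24, 22, 27, 25]
change = week_over_week(daily)
print(round(rolling_mean(daily)[-1], 1), round(change, 1), situation_label(change))
```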

The fifth is risk prediction. This is not only the essential task of big data-driven risk surveillance but also the premise and basis of early warning decision-making, aiming to solve the problem of “how to deal with risks”. Building on epidemic surveillance models and algorithms, it integrates various dynamic factors with historical and real-time data to analyze and predict the evolution of risks, the development of the epidemic, and possible outcomes, thereby helping decision-makers issue early warnings and set priorities for crisis response actions. Moreover, different data types carry distinctive information and characteristics that suit different forecasting tasks and require different analysis technologies and forecasting models[33].
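The five links described above can be read as one loop in which the output of risk prediction feeds back into scenario analysis. The sketch below is a deliberately toy rendering of that structure, not an implementation of the framework: the data feeds, the scenario-to-source mapping, the growth factor of 1.5, and the alert threshold are all hypothetical values chosen only to make the feedback visible.

```python
from dataclasses import dataclass, field

# Hypothetical data feeds (illustrative only): each yields records with a
# region and a case count; the search-index record lacks a usable count.
SOURCES = {
    "clinical_reports": [{"region": "A", "cases": 12}, {"region": "B", "cases": 7}],
    "search_index": [{"region": "A", "cases": None}],
    "sms_reports": [{"region": "A", "cases": 3}],
}
# Assumed mapping from scenario to the data each scenario calls for.
SCENARIO_SOURCES = {
    "routine": ["clinical_reports"],
    "emergency": ["clinical_reports", "search_index", "sms_reports"],
}


@dataclass
class LoopState:
    scenario: str = "routine"
    records: list = field(default_factory=list)
    situation: dict = field(default_factory=dict)
    forecast: float = 0.0


def scenario_analysis(state):
    """Link 1: decide which data sources the current scenario requires."""
    return SCENARIO_SOURCES[state.scenario]


def open_sharing(source_names):
    """Link 2: pull records from every requested source, tagging provenance."""
    return [dict(rec, source=name) for name in source_names for rec in SOURCES[name]]


def standardized_management(records):
    """Link 3: keep only records that pass a minimal quality check."""
    return [r for r in records if isinstance(r.get("cases"), int) and r["cases"] >= 0]


def situation_perception(records):
    """Link 4: summarise current cases per region."""
    summary = {}
    for r in records:
        summary[r["region"]] = summary.get(r["region"], 0) + r["cases"]
    return summary


def risk_prediction(summary, growth_factor=1.5):
    """Link 5: naive next-period projection under an assumed growth factor."""
    return sum(summary.values()) * growth_factor


def run_cycle(state, alert_threshold=25.0):
    """One pass around the loop; the forecast feeds back into the scenario."""
    state.records = standardized_management(open_sharing(scenario_analysis(state)))
    state.situation = situation_perception(state.records)
    state.forecast = risk_prediction(state.situation)
    state.scenario = "emergency" if state.forecast > alert_threshold else "routine"
    return state


state = LoopState()
for cycle in range(2):
    run_cycle(state)
    print(cycle, state.scenario, state.situation, state.forecast)
```

In the first cycle only routine clinical data are used; the resulting forecast crosses the threshold, so the second cycle runs under the emergency scenario and draws on additional sources. This mirrors, in miniature, the idea that governance (links 2 and 3) serves surveillance (links 4 and 5) and that surveillance results reshape what data governance must supply next.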

4 Practical Progress

At present, big data-driven public health risk surveillance is in the stage of application and exploration. Driven by the demand for prevention and control of major epidemics such as COVID-19, H1N1 influenza A, Ebola, and Middle East Respiratory Syndrome (MERS), and supported by advances in big data technology, practice and exploration of big data-driven risk surveillance around the world have generally been active and positive, with key progress in several important aspects.

4.1 The expansion of data sources and participants

Traditional epidemic surveillance usually relies on data from medical and health systems, such as electronic health records, laboratory test data, patient telephone records, insurance claim records, hospitalization certificates, and other public health databases. However, these data resources suffer from obvious lags and high collection costs, making it difficult to meet the needs of accurate surveillance. In contrast, data flows from the Internet, the Internet of Things, social media, and other channels are more dynamic and real-time, making it possible to sense risks in time, track epidemic situations, and explore crowd behavior in depth. Big data research is first of all association-based. Its data collection often goes beyond existing formal channels, which means collecting data without a specific management purpose[34]. Data from the Internet and social media are often not collected and used through official channels. However, these data resources provide more information about health status and behavior, including individuals' search histories, contact histories, and travel trajectories, which are key elements for understanding epidemic transmission and for predictive modeling[35]. In addition, data from animal and environmental health systems can also be used to identify epidemiological risks.

In recent years, the rise of participatory surveillance and collaborative governance initiatives has not only profoundly changed the structure of risk governance but also promoted the participation of social actors and the establishment of public-private partnerships. This improves the speed, variety, accuracy, and scale of data collection and helps raise the efficiency of public health risk surveillance while reducing social and economic costs. On the one hand, the public can voluntarily report their disease symptoms and living conditions through text messages, microblogs, or other social networks. On the other hand, this information can alert government departments to situations in affected areas such as disease or epidemic spread, road closures, power failures, and material shortages. For example, in response to Ebola, Airtel, Sierra Leone's largest mobile service provider, worked with IBM to allow the local public to send free text messages to the government about the Ebola epidemic. This initiative produced a heat map linking new epidemic information to geographic locations[36]. Social organizations apply big data analysis to epidemic relief by participating in mapping, crowdsourced translation, rapid distribution of materials, and social media communication. Enterprises of various kinds draw on their own expertise and technology platforms to facilitate the collection and analysis of big data. More than 150 companies worldwide participated in the response to the Ebola epidemic, 20% of which contributed expert skills, medical services, or rescue capabilities[37].
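The Airtel and IBM initiative links public text-message reports to locations on a map. A minimal way to turn geotagged reports into such a map is to count reports per latitude-longitude grid cell, as sketched below. The 0.1-degree cell size, the report format, and the coordinates are assumptions for illustration; this is not a description of the system actually deployed in Sierra Leone, and degree-based cells are not equal in area.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.1):
    """Index of the grid cell (cell_deg x cell_deg degrees) containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def cell_center(cell, cell_deg=0.1):
    """Approximate centre coordinates of a grid cell, for plotting."""
    i, j = cell
    return ((i + 0.5) * cell_deg, (j + 0.5) * cell_deg)


def heat_map(reports, cell_deg=0.1):
    """Count (lat, lon, text) reports per grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon, _text in reports)


# Hypothetical geotagged text reports (coordinates invented for illustration).
reports = [
    (8.48, -13.23, "fever and vomiting"),
    (8.46, -13.27, "two neighbours with fever"),
    (7.96, -11.74, "suspected case, need transport"),
]
hm = heat_map(reports)
hottest, count = hm.most_common(1)[0]
print(hottest, cell_center(hottest), count)
```

The per-cell counts can then be rendered with any mapping tool; the hottest cells indicate where follow-up teams might be sent first.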

4.2 Optimization of risk surveillance model and prediction algorithm

Epidemiological models are an effective theoretical method for understanding epidemic transmission mechanisms, predicting transmission trends, and evaluating different intervention strategies. They depend on multiple data sources, such as human interaction behavior, clinical surveillance, and Internet data, as well as on environmental or biological factors that can alter pathogen dynamics[38]. In the past 15 years, by integrating large-scale data sets and simulating population dynamics, mathematical and computational models in epidemiology have become significantly more accurate in simulating the real world[39]. In particular, big data technology has improved the accuracy of risk surveillance models and the efficiency of prediction algorithms.
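One of the simplest trend-prediction tools is to fit exponential growth to early case counts and read off a doubling time. The sketch below does this with ordinary least squares on log counts. It assumes the outbreak is still in an early, roughly exponential phase and ignores reporting delays, interventions, and saturation, so it illustrates the idea of model-based prediction rather than the richer models cited in this section. The case counts are hypothetical.

```python
import math


def fit_exponential_growth(counts):
    """Least-squares fit of ln(count_t) = a + r * t over days t = 0, 1, ...

    Zero counts are skipped because their logarithm is undefined.
    Returns (a, r), where r is the daily exponential growth rate.
    """
    pts = [(t, math.log(c)) for t, c in enumerate(counts) if c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive counts")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    r = sxy / sxx
    return mean_y - r * mean_t, r


def doubling_time(r):
    """Days for counts to double at growth rate r (infinite if not growing)."""
    return math.log(2) / r if r > 0 else math.inf


def project(a, r, start, horizon):
    """Extrapolate the fitted curve for days start .. start + horizon - 1."""
    return [math.exp(a + r * t) for t in range(start, start + horizon)]


# Hypothetical daily case counts in an early outbreak phase.
cases = [20, 26, 35, 44, 60, 77, 101]
a, r = fit_exponential_growth(cases)
print(f"growth rate {r:.3f}/day, doubling time {doubling_time(r):.1f} days")
print([round(x) for x in project(a, r, start=len(cases), horizon=3)])
```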

Big data surveillance integrates indicator-based surveillance and event-based surveillance. Indicator-based surveillance uses structured data collected by routine surveillance systems under normal conditions, such as health records, medical records, and laboratory test records, and can detect possible epidemic risks early according to relevant indicators. Event-based surveillance mainly uses unstructured data collected from formal and informal sources under abnormal conditions or after emergencies. It continuously monitors specific real events after they occur, for example to predict the speed of the epidemic, its mode of transmission, and the size of the affected population, and to identify inappropriate decisions people may make during the epidemic. Big data surveillance models can also include additional variables such as pathogens, crowd behavior, and government interventions (quarantine, contact isolation, mask use, etc.) to improve the accuracy of epidemic prediction.
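The indicator-based half of this description can be illustrated with a simple aberration-detection rule: compare each day's count with the mean and standard deviation of the preceding days and raise an alert when it is unusually high. The sketch below is loosely modeled on baseline-window methods such as CDC's EARS C1; the 7-day baseline, the threshold of 3, and the standard-deviation floor are assumptions for illustration. Event-based surveillance, which works on unstructured reports, is not covered here.

```python
from statistics import mean, stdev


def aberration_alerts(counts, baseline=7, threshold=3.0, min_sd=1.0):
    """Flag days whose count is far above the preceding baseline window.

    For each day t, compare counts[t] with the mean and sample standard
    deviation of the previous `baseline` days; alert when the standardized
    excess exceeds `threshold`. `min_sd` is an assumed floor that avoids
    division by zero and over-reaction when the baseline is nearly flat.
    """
    alerts = []
    for t in range(baseline, len(counts)):
        window = counts[t - baseline:t]
        mu = mean(window)
        sd = max(stdev(window), min_sd)
        z = (counts[t] - mu) / sd
        if z > threshold:
            alerts.append((t, counts[t], round(z, 2)))
    return alerts


# Hypothetical daily counts of an indicator (e.g. fever visits at sentinel sites).
fever_visits = [5, 6, 5, 4, 6, 5, 5, 5, 30]
print(aberration_alerts(fever_visits))
```

Each alert is a `(day index, count, standardized excess)` tuple; here only the jump to 30 on the last day is flagged.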

Several epidemic surveillance models and algorithms that integrate big data technologies have been proposed so far. The first category comprises statistical models, such as spatiotemporal models, regression models, hidden Markov models, and autoregressive time series models. The second comprises state space models based on dynamical systems, such as complex network models, agent-based simulation, stochastic Markov chain models, and the continuous deterministic SIR model. The third is based on machine learning, such as network-based data mining and surveillance networks[40]. Embedding social networks and training models on big data drive the continuous optimization of epidemic surveillance models and algorithms. Closely combining surveillance models with social network analysis helps build a new understanding of diffusion processes through network structure characteristics such as network density, diameter, and clustering coefficient. At the same time, models trained on real-time big data overcome the passivity and inefficiency of traditional surveillance and can play a greater surveillance role even with limited data.
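As an example of the second category, the continuous deterministic SIR model divides a population of size $N$ into susceptible ($S$), infectious ($I$), and recovered ($R$) compartments, with transmission rate $\beta$ and recovery rate $\gamma$. The sketch below integrates it with simple forward Euler steps. The parameter values ($R_0 = \beta/\gamma = 3$) are hypothetical, and a real application would calibrate them to surveillance data and would typically use a more accurate ODE solver.

```python
def simulate_sir(n, i0, beta, gamma, days, steps_per_day=10):
    """Deterministic SIR model integrated with forward Euler steps.

    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    Returns a list of (day, S, I, R) tuples, one per day.
    """
    h = 1.0 / steps_per_day
    s, i, r = float(n - i0), float(i0), 0.0
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * h
            new_recoveries = gamma * i * h
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


# Hypothetical parameters: R0 = beta / gamma = 3, mean infectious period 10 days.
N = 1_000_000
hist = simulate_sir(N, i0=10, beta=0.3, gamma=0.1, days=300)
peak_day, _, peak_infected, _ = max(hist, key=lambda row: row[2])
final_recovered_share = hist[-1][3] / N
print(peak_day, round(peak_infected), round(final_recovered_share, 3))
```

The final recovered share should come out close to the value implied by the classical final-size relation $z = 1 - e^{-R_0 z}$, which for $R_0 = 3$ is about 0.94.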

4.3 Development of risk surveillance system

With the gradual expansion of cooperation and the continuous innovation of technical means, the public health risk surveillance system has evolved along two trends. The first is a shift in surveillance scope from key regional surveillance to global systemic surveillance. The second is a transformation in surveillance means from single technologies to the integration and iteration of complex technologies.

In terms of surveillance scope, early risk surveillance mainly involved collecting and analyzing data from individuals or institutions in big cities. In the second half of the 19th century, global surveillance information was usually disseminated through weekly health reports on major diseases. In the United States, a weekly health report began in Washington, D.C. in 1886, mainly covering disease incidence and mortality in major cities of the United States and some other countries[41]. In 1952, the World Health Organization (WHO) established the Global Influenza Surveillance and Response System (GISRS), which, through effective collaboration and data sharing, greatly promoted the formation of a global public health cooperation network and the expansion of surveillance scope. Entering the 21st century, the global health risk surveillance system has received unprecedented attention and development owing to the continuous emergence of new and re-emerging infectious diseases and the threat of bioterrorism. HealthMap is a publicly available public health information system launched in September 2006 with the support of Google Earth. In 2014, the system picked up news reports of an unusual fever in Guinea on March 14, nine days before the official announcement of the Ebola epidemic. At the same time, surveillance based on Internet search indices has also received serious attention. Google Flu Trends, released by Google in 2008, used keyword tracking to collect large amounts of relevant data to detect influenza outbreaks[42]. During the COVID-19 outbreak in 2020, the visual map of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University showed the real-time global distribution of the epidemic. The system integrates multiple authoritative and reliable data sources and has become an important reference for governments, mainstream media, and public health personnel in many countries.

In terms of surveillance methods, newly emerging technologies have opened more possibilities for big data surveillance systems and enabled the iterative development of global public health risk surveillance. In the second half of the 20th century, an active new area in global public health risk surveillance was the combination of surveillance systems with geographic information systems (GIS) and the Global Positioning System (GPS). This brought revolutionary changes to surveillance: it not only enabled people to compile epidemiological maps more accurately but also helped decision-makers quickly assess risks and formulate prevention and control measures through visualized surveillance results. In the 21st century, the combination of artificial intelligence (AI) and GIS has created geospatial artificial intelligence (GeoAI), whose value in public health is emerging[43]. In 2018, the Getulio Vargas Foundation (FGV), a well-known Brazilian think tank, studied how blockchain technology can provide an open, transparent, mutually trusted, shared, authentic, and traceable surveillance scheme for epidemic prevention and control[44]. In the same year, Kangbai, an infectious disease scholar at the University of Munich, Germany, proposed a blockchain system for Ebola epidemic surveillance[45]. In the future, hybrid surveillance systems that integrate emerging technologies are expected to develop, combining traditional surveillance data with data from Internet searches, social media, and crowdsourcing to track, predict, and prevent global epidemics through multi-source big data.
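The traceability property attributed to blockchain-based surveillance comes largely from hash chaining: each record stores a hash of its predecessor, so any later edit breaks verification. The sketch below shows only that tamper-evidence idea for a log of hypothetical surveillance reports. It is not a blockchain (there is no distribution, consensus, or access control) and it is not the FGV or Kangbai design.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash and a canonical JSON form of the payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain, payload):
    """Append a surveillance record linked to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(chain):
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


# Hypothetical daily reports from two districts.
log = []
append(log, {"date": "2018-08-01", "district": "D1", "suspected": 4})
append(log, {"date": "2018-08-01", "district": "D2", "suspected": 1})
append(log, {"date": "2018-08-02", "district": "D1", "suspected": 6})
print(verify(log))  # True
log[1]["payload"]["suspected"] = 0  # silent edit of an earlier report
print(verify(log))  # False
```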

4.4 Global expansion of public and private cooperation network

Driven by the demand for knowledge, experience, and interaction between public and private organizations, the action guidelines and legal frameworks of disaster risk management provide institutional guarantees for public-private partnerships and have thus fostered close cooperation networks. For example, one of the important initiatives of the Sendai Framework for Disaster Risk Reduction 2015-2030, adopted by 187 UN member states in 2015, is to promote multi-stakeholder collaboration. In addition to formulating disaster reduction policies and plans, governments should create opportunities for the public and private sectors, social organizations, and individuals to participate in risk surveillance and response.

In public health surveillance, WHO contributes to regional and global data sharing and risk surveillance by establishing an international cooperation network that includes the United Nations and its agencies, the International Red Cross, and other non-governmental organizations[46]. Among these, the Global Outbreak Alert and Response Network (GOARN) is a cooperative network established by WHO in 2000 to coordinate global epidemic response. It comprises more than 250 technical institutions worldwide and provides personnel and other resources to affected countries to deal with public health emergencies. Go.Data, an outbreak investigation tool developed by GOARN partners, collects data during epidemics and public health emergencies. It has been promoted across multinational organizations by WHO and other bodies, including the European Centre for Disease Prevention and Control (ECDC), Medecins Sans Frontieres (MSF), the Public Health Agency of Canada (PHAC), and the Centers for Disease Control and Prevention (CDC). In addition, other important surveillance networks also operate with the participation of many partners. For example, HealthMap, a global epidemic surveillance network, is supported by partners including the National Institutes of Health (NIH) and CDC. GISRS partners include national and regional epidemiological and regulatory agencies, academic research institutions, influenza vaccine manufacturers, and other stakeholders. GISRS also promoted the formation of the Global Initiative on Sharing All Influenza Data (GISAID) in 2008, which advocates combining gene sequence data (GSD) with other clinical, virological, and epidemiological data on the basis of trust-based cooperation networks[47]. The Humanitarian Practice Network (HPN), whose members are governments, NGOs, and institutions from more than 130 countries and regions, aims to improve the performance of humanitarian action by promoting individual and institutional learning.

5 Practical Dilemma

Although big data-driven public health risk surveillance shows promising prospects and is making positive and encouraging progress in practice, as a new mode of practice it still faces many constraints. Its dilemmas are summarized in Figure 2.

The first challenge comes from the complexity of public health risks, reflecting the conflict between their high uncertainty and the limits of human risk cognition. The second concerns the effectiveness of public health big data, mainly its quality and standards. The third lies in public health big data governance, that is, how to ensure data security through institutional and technical means. The fourth concerns big data application, which is shaped by a combination of political, economic, and cultural factors. Together, these dilemmas restrict the effectiveness of big data-driven public health risk surveillance in the following five specific aspects.

5.1 The complexity of big data surveillance aggravated by the particularity of public health risks

Public health risk surveillance still lags behind in making full use of big data, partly because of the particular nature of public health risks.

The first problem is the uncertainty of risks. At present, knowledge and research on the global distribution of most infectious diseases remain insufficient: only 2% of disease maps have been fully completed[48]. According to the CDC, novel coronavirus pneumonia is still not fully understood. The second is the mobility and cross-border spread of risks. Unlike earthquakes, floods, and other natural disasters, public health risks spread from person to person and therefore readily cross regional boundaries. On the one hand, human behavior, social contact networks, and epidemics are closely intertwined, which makes transmission dynamics in large populations very complex. On the other hand, regional factors such as national systems, political interests, geography, and culture make collaborative surveillance more difficult and can easily trigger conflicts that extend beyond public health itself.

The third is the systemic relevance of risks. Major public health risks can disrupt production, reshape industrial chains, and even alter the global governance order, causing a series of political, economic, and social problems. On the one hand, therefore, research on public health risk theory needs to advance in pathology, epidemiology, risk science, management, and other related disciplines. On the other hand, this systemic relevance makes public health risk data harder to access, use, and share, which increases the complexity and difficulty of big data surveillance and requires decision makers to strike a rational balance.

5.2 Accurate surveillance restricted by the quality and analysis technology of public health big data

From a technical point of view, big data surveillance can only be fully effective if two preconditions are met: massive data of sufficient value, and prediction models and algorithms that reflect real-world conditions. At present, both face challenges.

The first challenge is the uneven quality of public health big data. In practice, policy makers working with public health big data are likely to encounter obstacles related to data volume, diversity, timeliness, and accuracy. One important reason is that most big data are not generated by reliable instruments designed for scientific analysis[49], which also makes it difficult to integrate and use massive data quickly and effectively. In addition, human factors fill the network with false reports, misinformation, and biased views, while data fragmentation and missing data are also common; all of these threaten the value of big data surveillance. For example, during COVID-19 prevention and control, under-reporting and delayed reporting caused by human factors made it difficult for the epidemic surveillance system of the Korea Centers for Disease Control and Prevention (KCDC) to take immediate action against infectious diseases[50].

The second challenge is the limited accuracy of existing epidemic surveillance models. The main purpose of big data surveillance is to use technical models to transform data into information that can be used directly in decision-making. In reality, however, surveillance models often lack sufficient evidence to support early warning and prevention. On the one hand, because big data analysis has no fixed algorithms or models, algorithms and models must be developed to suit the task requirements of specific risk scenarios[51]. Technical challenges in parameter estimation and validation also hinder real-time surveillance and prediction of epidemic spread with data-driven models[52]. On the other hand, dynamic changes in individual behavior, the mixed implementation of intervention policies, and the degree to which epidemiological characteristics are understood all affect the course of an epidemic, making prediction through technical modeling even harder. In 2009, Google Flu Trends (GFT) largely missed the first wave of the H1N1 influenza outbreak, mainly because its algorithm overfit: its big data analysis relied on correlation rather than causality. Relying on correlation analysis alone can therefore lead to false predictions.
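The overfitting mechanism described above can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not a reconstruction of GFT's method: it generates a synthetic seasonal case series and thousands of random "search-term" series that have no causal link to it, keeps the one with the strongest in-sample correlation, and then checks that correlation on held-out weeks. All series, sizes, and the function name `select_by_correlation` are assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def select_by_correlation(n_weeks=80, n_train=40, n_candidates=2000, seed=0):
    """Pick the candidate predictor that best correlates with the target in
    the training window, then return (in-sample r, out-of-sample r).

    Everything here is synthetic and hypothetical: the target is a seasonal
    case curve, and every candidate "search term" is pure noise.
    """
    rng = random.Random(seed)
    # Hypothetical weekly case counts: a 52-week seasonal wave plus noise.
    target = [100 + 50 * math.sin(2 * math.pi * t / 52) + rng.gauss(0, 5)
              for t in range(n_weeks)]
    train_y, test_y = target[:n_train], target[n_train:]

    best_in, best_series = -2.0, None
    for _ in range(n_candidates):
        # Candidate predictor with no causal relation to the target.
        series = [rng.gauss(0, 1) for _ in range(n_weeks)]
        r = pearson(series[:n_train], train_y)
        if r > best_in:
            best_in, best_series = r, series

    # Evaluate the selected predictor on weeks it never saw during selection.
    best_out = pearson(best_series[n_train:], test_y)
    return best_in, best_out


if __name__ == "__main__":
    in_r, out_r = select_by_correlation()
    print(f"in-sample r = {in_r:.2f}, out-of-sample r = {out_r:.2f}")
```

Because the selected series is noise, its out-of-sample correlation is usually far weaker than its in-sample one: screening many candidate predictors purely on fit inflates apparent accuracy, which is the risk of relying on correlation alone.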

5.3 Insufficient public health big data governance standards and privacy protection concerns

How to balance big data sharing and privacy protection has long been an institutional problem restricting big data governance. This is especially true for the application of big data to public health risk surveillance.

On the one hand, inadequate and inconsistent public health data standards make it necessary to combine multiple surveillance systems to improve prediction. Disease surveillance systems in many countries lack interoperability. Although different systems may provide complementary information on disease activity, differences in data sources, standards, and legal norms mean that the ways of integrating them remain to be improved[53].
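One concrete part of integrating such systems is mapping each feed onto a shared schema before counting. The sketch below assumes two hypothetical systems, "A" and "B", whose field names, date formats, and diagnosis coding differ; every field name is invented for illustration, and the ICD-10 influenza codes J10/J11 are used only as an example of a coded diagnosis.

```python
from collections import Counter
from datetime import datetime

# Hypothetical diagnosis coding used by "system B"; J10/J11 are ICD-10
# influenza codes, used here only as an illustration.
SYSTEM_B_DIAGNOSES = {"J10": "influenza", "J11": "influenza"}


def from_system_a(rec):
    """Map a hypothetical system-A record (ISO dates, free-text diagnosis)."""
    return {
        "onset_date": datetime.strptime(rec["onset"], "%Y-%m-%d").date(),
        "region": rec["region_code"].strip().upper(),
        "diagnosis": rec["dx"].strip().lower(),
    }


def from_system_b(rec):
    """Map a hypothetical system-B record (day/month/year dates, coded diagnosis)."""
    return {
        "onset_date": datetime.strptime(rec["date_of_onset"], "%d/%m/%Y").date(),
        "region": rec["area"].strip().upper(),
        "diagnosis": SYSTEM_B_DIAGNOSES.get(rec["icd"], "unknown"),
    }


def weekly_counts(records_a, records_b):
    """Merge both feeds and count cases per (ISO year, ISO week, region, diagnosis)."""
    unified = [from_system_a(r) for r in records_a] + [from_system_b(r) for r in records_b]
    counts = Counter()
    for rec in unified:
        iso_year, iso_week, _ = rec["onset_date"].isocalendar()
        counts[(iso_year, iso_week, rec["region"], rec["diagnosis"])] += 1
    return counts
```

For instance, a system-A record with onset "2020-01-06" in region " r1 " and a system-B record with onset "07/01/2020", area "R1", and code "J10" both land in the same cell, (2020, 2, "R1", "influenza"). The sketch does not deduplicate cases reported by both systems; real integration would also need record linkage and agreed legal rules for sharing.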

On the other hand, both public health big data security and individual privacy are under threat. In general, personal information for infectious disease control is collected and stored under the protection of public health and privacy legislation. However, epidemic surveillance requires rapid and accurate laboratory diagnosis and continuous tracking of affected individuals, which means that a person’s name, age, address, and relevant medical data are usually reported to the public health department without his or her timely knowledge or consent.

Especially for new diseases against which preventive measures (such as vaccination) may not be effective, gathering information and feedback directly from patients and their close contacts is the more practicable and efficient way to determine the source and harmfulness of the disease[54]. In the Ebola epidemic, for example, the main concerns of mobile healthcare service providers were security and privacy, which hindered the effective uptake of big data[55]. In addition, media scrutiny, political risk aversion, system security, and data ethics also hinder the incorporation of unconfirmed personal data into big data surveillance systems. This raises questions about the institutional arrangements for data sharing and privacy protection in normal and emergency situations, and about the trade-off between them.

5.4 Big data surveillance effectiveness restricted by unbalanced regional economic development

The core capability of big data-driven public health risk surveillance systems is significantly restricted by shortages of medical resources, a lack of specialized staff, and vulnerable infrastructure.

First, differences in regional economic development make it difficult to establish a cross-region big data surveillance system. Although such systems have been running for decades in some developed countries, key surveillance data in many economically less developed countries are still collected mainly by hand, and these countries largely lack the advanced digital infrastructure needed to realize the value of big data[56]. The 2014 West Africa Ebola epidemic originated in remote villages in underdeveloped countries. Because of inadequate health infrastructure, a shortage of medical professionals, and poor information technology penetration, big data-based epidemic surveillance in West Africa fell short of international disease control standards.

Second, shortages of medical emergency supplies restrict emergency response. Even if a big data surveillance system detects an epidemic early, shortages of medical supplies often become an important factor limiting the effect of prevention and control. The Ebola crisis exposed differences in health system vulnerability caused by unbalanced global development. Similarly, in the novel coronavirus pneumonia outbreak, rapid testing capacity was key to efficiently assessing and controlling the epidemic, yet shortages of medical supplies made this goal difficult to achieve in some underdeveloped areas.

5.5 Real-time surveillance difficulties resulting from insufficient risk awareness and delayed response

As the Ebola epidemic shows, the economic and political impact of an epidemic outbreak can be catastrophic, and other insecurities such as political conflict and poverty can exacerbate the situation[57]. Accordingly, big data-driven public health risk surveillance is not only a technical application process but also involves a series of key issues in organization, decision-making, and action.

On the one hand, weak risk awareness and negligence undermine early warning and rapid response. Although more data and information may help reduce the uncertainty of risks, they do not necessarily reduce the ambiguity of decision-making[58]. This is closely related to the risk awareness and data-handling ability of decision makers. In the Ebola epidemic, local, national, and international health officials initially believed that the epidemic, as in the past, would be a static and controllable local disease event. As a result, prevention and control measures were not implemented until the number of Ebola cases in Sierra Leone surged and the death toll doubled in May. At the same time, officials overseeing economic and financial departments within the government obstructed public health prevention and control actions because they were worried about the impact of border closures on Sierra Leone’s economy[59].

On the other hand, the impact on the global public health system of slow collective action caused by political conflicts cannot be ignored, as it directly restricts the effectiveness of big data surveillance. Setting priorities and allocating resources to mitigate the impact of conflicts and disasters is not only a technical challenge but also a serious political one. To some extent, both the Ebola epidemic and the novel coronavirus pneumonia outbreak exposed countries’ lack of political commitment to global public health security, highlighting the influence of political factors on epidemic prevention and control.

6 Practical Reflection

History has shown that epidemics and other risks will not disappear. Adapting to risks and learning to coexist with them have become part of the normal operation of human society. The progress and difficulties of big data-driven public health risk surveillance show that, as a new mode of practice, it is still evolving. Continuous efforts are needed in the following aspects to develop this model, which is expected to play an effective role in modernizing the public health governance system and capacity.

6.1 Expanding risk knowledge and exploring a big data-driven theoretical paradigm of public health risk

Since the theory of the risk society was put forward more than 30 years ago, the meaning of risk has become more complex and profound. The boundaries of individual risks are gradually blurring, and risks increasingly appear as coupled risks with intertwined influences. Furthermore, crises within the framework of the risk society are global, local, and individual at the same time, which turns global risk into a form of “organized irresponsibility”. Therefore, big data-driven risk surveillance needs a broader range of risk knowledge and more advanced theories.

The first concern is to understand the modern meaning of public health risks. More and more risk researchers recognize the importance of social and cultural factors for understanding risk. The novel coronavirus pneumonia is nominally a global public health risk, but it spans multiple risk attributes, including the ecological environment, technology application, the economy and society, and international politics. In particular, its marked cross-border mobility and its political and economic relevance go beyond the traditional perception of public health risks. Deepening the theoretical understanding of the modernity of public health risks is fundamental both for identifying the needs of big data surveillance and for building its models.

The second focus is the relationship between big data surveillance and traditional surveillance. Big data surveillance is not a substitute for traditional surveillance but an upgrade of it. The two should be integrated in terms of research assumptions, data sources, model algorithms, and system support according to the specific public health risk scenario, combining traditional, relatively static indicator-based surveillance with real-time, dynamic event-based surveillance. It should be noted that traditional surveillance is still applicable in many public health risk situations, so the boundaries and conditions of applicability of traditional and big data surveillance also need to be clarified.

6.2 Strengthening data sources and building public health big data systems based on cooperation and mutual trust

In the big data era, the disintegration of traditional social capital has led people to gradually form new network relationships based on data flows. As a new type of social capital, big data needs to be trusted before it can be put into action. This means not only creating credible, high-quality data to make analysis more effective, but also ensuring efficient and unified action in the process of risk governance.

Firstly, a relatively complete public health data system should be established. Guided by the concept of collaborative data governance, government departments should break down data barriers and establish an open, shared public health data platform. It is necessary to broaden the channels for basic information collection, include relevant information on key groups such as “sentinel groups” and wildlife zoologists, and make government data platforms more sensitive in perceiving public health risk situations.

Secondly, it is necessary to emphasize data commitment and expand the paths and channels through which multiple agents participate in big data governance. Some international organizations and foreign governments have strengthened the credibility of data sources and increased the supply of multi-source data through “data sourcing initiatives” and “promissory data”, with satisfactory practical results.

Thirdly, public data literacy and data culture should be cultivated on the basis of mutual trust. When big data technology is used to monitor epidemics, public participation, information transparency, and ethical frameworks need to be built on trust. Cultivating public data literacy and data culture on the basis of mutual trust ensures orderly public participation in data governance and helps control the quality of big data at the source. In the future, blockchain and related technologies are expected to be widely applied in combination with big data technology to jointly address data traceability, data sharing, and privacy protection.
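The traceability idea behind blockchain can be sketched with a simple hash chain, in which each surveillance record stores the hash of the previous one, so that any later edit to an earlier record becomes detectable. This is a minimal sketch under stated assumptions: it shows only tamper evidence, with none of a real blockchain's distribution or consensus, and the payload fields (`region`, `week`, `cases`) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(payload, prev_hash):
    # Canonical JSON (sorted keys) so identical content always hashes identically.
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain, payload):
    """Append a record linked to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_hash,
                  "hash": _digest(payload, prev_hash)})
    return chain


def verify(chain):
    """Return True only if every link and every stored hash is consistent."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["payload"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

Appending two hypothetical weekly reports and calling `verify` succeeds; changing the case count in the first report afterwards makes `verify` fail. Tamper evidence of this kind addresses traceability only; privacy protection would require additional measures such as access control or storing hashes instead of personal data.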

6.3 Promoting multi-agent collaboration and establishing a big data-driven public health risk surveillance mechanism

The rise of governance theory in the 1990s gave birth to the trend toward multi-agent governance. The concepts and models of community disaster reduction, grid governance, and collaborative governance have been widely applied in practice. Big data surveillance likewise needs a multi-agent collaboration mechanism grounded in the concept of governance.

Firstly, a public health big data governance mechanism should be established based on co-construction and co-sharing among multiple agents. Big data governance is a long-term, complex systems project involving core elements such as organization, standards, processes, technology, and systems, as well as key contents such as data standards, data quality, and data security. The rights and responsibilities of multiple agents in data governance need to be clarified in laws and policies, standards, and implementation mechanisms, so as to maximize the public value of big data governance through effective collaboration.

Secondly, a big data-driven collaborative mechanism should be built for risk surveillance. Multi-channel epidemic surveillance and rapid response systems, such as direct online reporting and social media sharing, need to be improved, and the coordination of big data across departments, fields, and industries needs to be strengthened. At the same time, the active participation of diverse social actors should be taken seriously to improve the effectiveness of risk communication. The disease control and epidemic prevention system, the emergency response system, and the material support system should also be integrated to achieve real-time, accurate surveillance and prevention and control of public health risks.

Thirdly, a research mechanism for key public health technologies should be established. By integrating related disciplines, including epidemiology, molecular biology, preventive medicine, risk science, behavioral science, computational social science, and management science, research is expected to address basic scientific issues in epidemiology and major infectious diseases, as well as the development of key technologies for disease prevention and control. The transformation from data to evidence and then to decision-making will be achieved by collecting data more accurately, optimizing different epidemic prediction and early warning models, and addressing the shortcomings of early warning in big data surveillance.
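As a concrete reference point for what an early warning model does, the sketch below implements a simple aberration-detection rule loosely modeled on the C1 statistic of the CDC's Early Aberration Reporting System (EARS): a day is flagged when its count exceeds the mean of the preceding seven days by more than three sample standard deviations. The window length, threshold, and function name `c1_alarms` are assumptions for illustration; operational systems tune these and often use lagged baselines or model-based methods instead.

```python
from statistics import mean, stdev


def c1_alarms(counts, baseline=7, threshold=3.0):
    """Return indices of days whose count exceeds the mean of the preceding
    `baseline` days by more than `threshold` sample standard deviations."""
    alarms = []
    for t in range(baseline, len(counts)):
        window = counts[t - baseline:t]
        mu, sd = mean(window), stdev(window)
        if sd == 0:
            # Perfectly flat baseline: treat any increase as an aberration.
            z = float("inf") if counts[t] > mu else 0.0
        else:
            z = (counts[t] - mu) / sd
        if z > threshold:
            alarms.append(t)
    return alarms
```

Applied to the daily counts [10, 12, 11, 9, 10, 11, 12, 10, 11, 40], only the final day (index 9) is flagged. A rule this simple reacts to any reporting spike, including artifacts such as batch uploads, which is one reason early warning outputs still need evidence-based review before they inform decisions.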

6.4 Stressing the value of evidence and exploring evidence-based public health decision-making mechanism driven by big data

When effective vaccines and treatments are absent, epidemic prevention and control largely depends on comprehensive interventions such as early case surveillance, real-time early warning, and quarantine. In highly complex risk situations, an evidence-based decision-making mechanism built on public health big data is required to set emergency response priorities and to determine how to apply the most appropriate prevention and control measures at the most appropriate time.

The first concern is to widely collect existing practical evidence and form a decision evidence chain covering the evolution of risk. Evidence-based decision-making emphasizes that any public health risk decision should rest on the best practice evidence. This requires data openness, transparency, and real-time sharing; drawing on group decision-making across different subjects and fields; classifying and integrating practical evidence on clinical manifestations, epidemic transmission patterns, and emergency management; and refining the diagnostic basis for new diseases. In this way, decisions on major epidemic risks are no longer based mainly on decision makers’ personal experience, but on evidence-based thinking and the best practice evidence from big data.

Secondly, it is necessary to explore the applicable boundaries of evidence-based big data decision-making. Big data decision-making cannot completely replace traditional risk decision-making. Drawing on the practice of public health risk decision-making, the evidence and evidence chains of the two decision-making methods should be further extracted, integrated, developed, and compared. Evidence-based approaches should then clarify the applicable boundary between big data decision-making and traditional risk decision-making, as well as how to achieve mixed decision-making.

The third focus should be the comparative evaluation of public health policies. Public health policies and risk intervention measures should be compared and evaluated using evidence-based concepts and methods. In addition to horizontal comparative evaluation of the same policy measure (such as quarantine) across different cases (public health events), special attention should be paid to evidence-based comparative evaluation of similar cases. This includes horizontal comparison of the policies and measures adopted in similar cases by different countries, regions, organizations, and individuals, as well as diachronic, longitudinal comparison of policies and measures across similar cases over time. The former supports mutual learning and knowledge exchange, while the latter supports the accumulation of longitudinal knowledge, so that evidence-based knowledge can serve the optimization of public health policies.

6.5 Reshaping the concept of cooperation, integrating and building a global health risk surveillance system

As early as the 1920s, the cholera epidemic sweeping Europe pushed the world into a new era of “infectious disease diplomacy”, as governments realized that the spread of disease would not stop at national borders[60]. Clearly, strengthening global cooperative governance and multilateral response to achieve epidemic data sharing is crucial to protecting the public from major public health risks.

Firstly, the concept of people-oriented global health risk surveillance should be established. According to Human Security Now, issued by the UN Commission on Human Security in 2003, “People-centeredness is important because, irrespective of the threat, what matters is people—not borders, not international relations, not even money and economics”[57]. A people-centered risk communication strategy that emphasizes mutual trust and cooperation should be developed to build consensus on global public health risk surveillance[61].

Secondly, the balanced development of global risk surveillance systems should be ensured. Less developed countries and regions, with inadequate surveillance systems and insufficient medical resources, are generally the first victims of major epidemics, which often endangers the entire public health prevention and control system. The global surveillance cooperation network should be continuously expanded, and humanitarian aid to countries with weak medical resources and digital infrastructure should be strengthened, so as to enhance the capacity of the global public health risk surveillance system.

Thirdly, a global public health risk surveillance system with big data at its core should be established. The globalization of public health risks has bound the international community into a close community of common destiny, which requires countries to explore the concept and cooperation programs of global public health security from a global perspective, so as to avoid isolated governance, stigmatization, discrimination, and other political obstacles. At the same time, while expanding multilateral platforms for public health emergencies, it is also necessary to strengthen active multi-agent governance and supervision. An integrated global public health risk surveillance system built on unified data commitments, data standards, and surveillance principles is expected to jointly advance the practice of global public health risk prevention and control.

7 Conclusion

Modern risks have complex characteristics such as endogeneity, systemic nature, and cascading effects. They not only bring uncertainty and major disasters to social development, but also inject new momentum into the operation of social systems and drive social governance systems and mechanisms to keep pace with the times. The 2003 SARS epidemic left China much valuable experience for fighting new epidemics today, such as the “Xiaotangshan model”, information transparency, and whole-society cooperation. COVID-19 will bring new governance experience to China and the world in building a more resilient public health risk prevention and control system. In the era of big data, this epidemic is also a real-world test of the effect of big data surveillance. As a response to global public health challenges, this study has set out a theoretical framework for, and practical reflections on, big data-driven public health risk surveillance. As a national governance technology, the application of big data to public health risk surveillance is a complex mechanism that is bound up with a series of problems in big data governance and risk governance, involving public health, risk governance, information technology, and other fields.

Building a genuine national public health emergency management system under a framework of international mutual trust and commitment, and thereby promoting the modernization of national governance capacity, could be an important practical topic and research direction. Such a system should be rooted in the national governance context and the framework of public health agreements. Lessons from applying technology to epidemic early warning and response should be reviewed and summarized. Furthermore, major epidemic prevention and control systems and mechanisms involving multiple participants should also be upgraded.
