Mapping forest fires by nonparametric clustering analysis

2018-03-27 12:10 Bulent Tutmez, Mert Ozdogan, Ahmet Boran
Journal of Forestry Research, 2018, Issue 1

Bulent Tutmez · Mert G. Ozdogan · Ahmet Boran

Introduction

Turkish forests are generally located in mountainous regions and are usually natural or have high biodiversity value. Turkey has 9000 plant species, of which 3000 are endemic. Most of these are located in forestlands (OGM 2014). As in other parts of the world, tens of thousands of hectares in Turkey are devastated by forest fires annually. Forests play a major ecological role in Mediterranean ecosystems (Curovic et al. 2011). In particular, cities along the Mediterranean Sea are directly influenced by fires and the subsequent devastation. Because both the number of forest fires and the magnitude of the losses have tended to increase in the last few years, these fires deserve attention for their noteworthy ecological effects. Many factors could become more of an issue, such as tree species, fire scene, temperature, and human factors, and these factors are gaining critical importance. It is projected that, primarily in the Mediterranean, human pressure on the natural environment will have an increasing impact owing to growth in tourism and urbanization (Perona and Brebbia 2010).

Forest fires have disastrous economic, social and environmental effects. They lead to the destruction of forests and wildlife. As discussed by Omi (2005), forest fires sometimes result in tragic and costly outcomes, and in the short term, a burned area may not be visually pleasing. To most people, fires create images of blackened destruction, including dead trees, displaced wildlife, and devastating loss of human life and property.

A forest fire (wildfire) often differs from other fires by its extensive size and resultant damage. In recent years, many countries have given importance to this issue and have taken measures to forecast and control potential problems by using climate and environmental measurements and risk assessment-based studies (Kalabokidis et al. 2015; Yang et al. 2015). In addition to risk and climate-based analyses of fires, studies have focused on geographic information-based modelling and mapping (Cere et al. 2012; Mohammadi et al. 2014; Pourghasemi 2016).

As discussed in Duncan et al. (2009), monitoring of managed fire regimes is critically important to appraise management goals, provide the information necessary for adaptive management, and compare natural fire regimes. Identifying the factors that influence the occurrence of fire, locating fire hazard regions and mapping these areas are crucial to minimizing the frequency of fire and fire damage. The information required includes fire-risk mapping, fire detection and monitoring, and damage assessment. Effective management needs new methods of remote sensing and GIS (Sunar and Özkan 2001; Bourgeau-Chavez et al. 2002; Shao and Duncan 2007). Wireless sensor networks and satellite imagery are emerging technologies that can be employed for managing fire-scar detection and fire regimes (Duncan et al. 2009; Aslan et al. 2012).

At this stage, classification and mapping of forest fires using reliable parameters becomes important for preparing effective decision-support systems. Clustering and classification analyses allow the detection of the space/time pattern distribution of forest fires (Pereira et al. 2015). These analyses are also useful to assist fire managers in identifying risk areas, implementing preventive measures and devising strategies for an efficient distribution of firefighting resources (Tuia et al. 2007; Tonini et al. 2009). Recently, Orozco et al. (2012) suggested a model which detects clusters of forest fires in space and tests their statistical significance using scan statistics and spatiotemporal modelling.

The problem of data clustering has been widely studied in the data mining and machine learning literature (Aggarwal and Reddy 2014). The main motivation for clustering analysis is to group observations into clusters such that similar records are placed together in the same cluster. Although many clustering methods are used for different problems, there are only a limited number of studies in the literature on analyzing categorical (nominal-scale) data.

Therefore, mapping fires according to the aforementioned parameters can provide useful information to understand and control the fire mechanism. Evaluating a region by a new fire-scar detection method can also provide new tools to build a decision-support system. However, this pattern recognition application is limited in that some of the measurements have nominal-scale properties, such as species, level of devastation, and temperature. Thus, a differentiated clustering algorithm is required for appraising this type of categorical data.

In the present study, forest fires recorded in the Antalya (Western Mediterranean) region, a well-known tourist destination along the Mediterranean Sea, were appraised by a nonparametric clustering approach, k-modes clustering. Because the detrimental factors and devastation records derived from the fires were treated as nominal-scale records, a categorical data-based clustering algorithm has been applied. Although some algorithms have been developed for obtaining clusters from categorical data, these procedures have drawbacks (Huang and Ng 1999). For example, most of them are heuristic approaches that do not explicitly optimize an overall measure of fit. In some cases, means are not appropriate measures of central tendency for categorical data (Chaturvedi et al. 1997).

By using observations recorded between 2011 and 2014, the fires have been clustered and mapped using the simple matching dissimilarity measure. K-modes clustering is fast to apply and does not make any distributional assumptions. In this way, the practical importance of categorical data-based clustering and nonparametric analysis in environmental science and forestry has been demonstrated. The results can be used to establish a decision-support system to monitor and review forest fire activities, and to enhance fire management efficiency.

Materials and methods

K-modes clustering algorithm

The k-modes algorithm is reminiscent of k-means clustering, which requires interval- or ratio-scale input data. The algorithm was introduced by Chaturvedi et al. (2001) as a novel nonparametric clustering procedure. The main advantage of the k-modes method is that it does not make any distributional assumptions about the data. In addition, it is a fast and computationally efficient algorithm. In the most general sense, the k-modes algorithm is a nonparametric bilinear model to obtain clusters from categorical data (Huang and Ng 2003). The general structure of the algorithm is expressed below as illustrated in Chaturvedi et al. (2001) and Gan et al. (2007).
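A sketch of the bilinear model referred to as (1), written from the definitions used in this section (C is n×d, S is n×k, so W must be k×d) and from the fitted matrix being the product SW. The residual symbol E is an assumed name, not taken from the source:

```latex
\[
C = S\,W + E \tag{1}
\]
```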

where W is a matrix of generalized centroids, determined as modes of certain measurements. If k is the number of clusters being sought, S denotes an n×k binary indicator matrix and C represents an n×d data matrix, as follows:
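Written out with c_ij denoting the category recorded for observation i on variable j (the entry notation used for the loss function), the data matrix referred to as (2) would presumably take this form:

```latex
\[
C =
\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1d} \\
c_{21} & c_{22} & \cdots & c_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nd}
\end{pmatrix} \tag{2}
\]
```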

It should be noted that the data matrix C given in (1) is known, while both W and S are unknown and must be estimated. Unlike the interval-scale case, the problem addressed in this paper involves categorical data. Therefore, the data matrix described in (2) is considered categorical. Likewise, the matrix of centroids W will have nominal-scale values, i.e., categorical components.

The matrices W and S are estimated iteratively (estimating S given the estimates of W, then revising the estimates of W given the new estimates of S). This process is repeated until the quality of the clustering no longer improves (Chaturvedi et al. 2001). The L0 loss function serves as the main criterion of quality.

Let Ĉ = SW. The parameter estimation problem can be expressed as minimizing an Lp-norm-based loss function as follows (Gan et al. 2007):
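A reconstruction of the Lp-norm loss under the notation of this section, with the sums running over the n observations and d variables. It is offered as the standard form of such a loss rather than a verbatim copy of the cited equation:

```latex
\[
L_p = \sum_{i=1}^{n} \sum_{j=1}^{d} \left| c_{ij} - \hat{c}_{ij} \right|^{p}
\]
```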

where c_ij and ĉ_ij are the (i, j) entries of C and Ĉ, respectively. L0 is the limiting case as p → 0 and simply counts the number of mismatches between the matrices C and Ĉ, i.e.,
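Since L0 counts mismatching entries, it can be written with an indicator δ of disagreement between two categories. The following is a sketch consistent with that description:

```latex
\[
L_0 = \sum_{i=1}^{n} \sum_{j=1}^{d} \delta\!\left( c_{ij}, \hat{c}_{ij} \right)
\]
```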

where δ(·,·) can be expressed according to the simple matching dissimilarity (SMD) measure (Kaufman and Rousseeuw 1990) as follows:
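Under simple matching, two categories either agree or disagree, so the indicator takes the usual two-case form (the argument names a and b are placeholders chosen here):

```latex
\[
\delta(a, b) =
\begin{cases}
0, & a = b \\
1, & a \neq b
\end{cases}
\]
```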

Note that the SMD measure plays a critical role in clustering categorical data. Let x and y be two categorical objects defined by d categorical variables. As given in Huang (1998), the dissimilarity between x and y determined by the simple matching distance is expressed by
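In the notation above, the simple matching distance is the number of variables on which the two objects disagree. The subscript j indexing the variables of x and y is assumed here:

```latex
\[
d(x, y) = \sum_{j=1}^{d} \delta\!\left( x_j, y_j \right)
\]
```

As a hypothetical illustration, the records (human, pine, high) and (human, cedar, high) differ only in the second variable, so d(x, y) = 1.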

Study area and data

This study focused on fires encountered in the Antalya (Western Mediterranean) district between 2011 and 2014. According to reports prepared by the Antalya Regional Directorate (Boran and Yorulmaz 2014), the district covers 2,110,997 hectares; forests cover 55% and the rest of the region consists of open fields. The main species are Calabrian Pine (65%), Cedarwood (16%), Larch (8%), etc.

The data sets provided by the Regional Directorate of Forestry were employed in the applications (OGM-RD 2015). Figure 1 shows the fire locations. The data properties and descriptive statistics for each year are summarized in Table 1.

Fig. 1 Fire locations from 2011 to 2014

Table 1 Descriptive statistics for data sets

Fig. 2 Clustering results for 2011

Using the observations recorded between 2011 and 2014, the fires were analysed for the available critical parameters: manner of fire (coded), tree species (coded), maximum temperature (categorical), and burned area (numerical). Each parameter was treated as equally important. To perform the modelling, the data were rescaled. In the clustering literature, it is often suggested that the data should be appropriately normalized before clustering (Jain and Dubes 1988). All numeric variables were normalized to the range [0, 1].
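The source does not state which normalization formula was used. A common way to map a numeric variable onto [0, 1] is min-max scaling, sketched below with hypothetical burned-area values. The function name and the handling of constant columns are assumptions of this example.

```python
def min_max_scale(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0 (a choice made here).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical burned areas in hectares, not taken from the study data.
burned_area_ha = [0.5, 2.0, 10.0, 150.0]
scaled = min_max_scale(burned_area_ha)
print(scaled)
```

Because min-max scaling is sensitive to extreme values, a single very large fire compresses the remaining records toward 0.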

Results

The main motivation for the implementation is to obtain an optimal structure of generalized centroids which describe the modes of the measurements. In the first step, the initial centroids (modes) were chosen by randomly selecting n distinct records from the data sets. The matrices S (indicator matrix) and W (generalized centroids) were unknown and were estimated iteratively by using the k-modes implementation. The (i, j)th entry of matrix W corresponds to the nominal-scale value or category determining the centroid for the ith cluster and the jth categorical variable.
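The procedure described here (randomly chosen distinct initial modes, then alternating updates of S and W until the L0 loss stops improving) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the toy records, the choice of one initial mode per cluster and the tie-breaking rules are all assumptions of this example.

```python
import random
from collections import Counter


def smd(x, y):
    """Simple matching dissimilarity: count of attributes where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def k_modes(records, k, seed=0, max_iter=100):
    """Cluster tuples of categorical values; returns (labels, modes, L0 loss)."""
    rng = random.Random(seed)
    # Initial modes: k distinct records chosen at random.
    modes = rng.sample(list(dict.fromkeys(records)), k)
    best = None
    for _ in range(max_iter):
        # Update S given W: assign each record to its nearest mode.
        labels = [min(range(k), key=lambda c: smd(r, modes[c])) for r in records]
        # Update W given S: most frequent category per attribute in each cluster.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mode
                modes[c] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
        # L0 loss: total mismatches between records and their assigned modes.
        loss = sum(smd(r, modes[lab]) for r, lab in zip(records, labels))
        if best is not None and loss >= best:
            break  # no further improvement
        best = loss
    return labels, modes, loss


# Hypothetical coded records: (manner of fire, tree species, temperature class).
fires = [
    ("human", "pine", "high"), ("human", "pine", "high"),
    ("human", "pine", "mid"), ("lightning", "cedar", "low"),
    ("lightning", "cedar", "low"), ("lightning", "cedar", "mid"),
]
labels, modes, loss = k_modes(fires, k=2)
print(labels, modes, loss)
```

The result depends on the random initial modes, so different values of `seed` may give different partitions and loss values.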

Fig. 3 Clustering results for 2012

Fig. 4 Clustering results for 2013

To provide some general structure, the number of clusters was restricted to a maximum of 5. Therefore, implementations were carried out for 2, 3, 4 and 5 clusters.

The iterative algorithm was applied until the L0 loss function no longer improved. Finally, the algorithm generated the fuzzy partition matrix to obtain more information to help determine the final clustering and to identify the fire observations. The applications showed that there are no large differences between the final loss (L) values for each implementation. However, as expected, L values varied across the years. Figures 2, 3, 4 and 5 illustrate the classified fires on the maps.
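The source does not say how the fuzzy partition matrix was computed. One possibility, assumed here, is the membership formula of the fuzzy k-modes algorithm (Huang and Ng 1999), in which a record's membership in a cluster decreases with its simple matching distance to that cluster's mode. The function names, the fuzziness exponent `alpha` and the example modes are hypothetical.

```python
def smd(x, y):
    """Simple matching dissimilarity: count of attributes where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def fuzzy_memberships(record, modes, alpha=1.5):
    """Membership of one record in each cluster; the values sum to 1 (alpha > 1)."""
    dists = [smd(record, m) for m in modes]
    if 0 in dists:
        # The record coincides with a mode: full membership there.
        # Splitting evenly among several identical modes is a choice made here.
        hits = [d == 0 for d in dists]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    power = 1.0 / (alpha - 1.0)
    return [1.0 / sum((d_l / d_h) ** power for d_h in dists) for d_l in dists]


# Hypothetical cluster modes: (manner of fire, tree species, temperature class).
modes = [("human", "pine", "high"), ("lightning", "cedar", "low")]
row = fuzzy_memberships(("human", "pine", "low"), modes)
print(row)
```

Stacking one such row per fire record gives an n×k fuzzy partition matrix. Rows with nearly equal memberships flag records that sit between clusters.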

Fig. 5 Clustering results for 2014

Discussion

Because fires in the Western Mediterranean create substantial economic and environmental losses, developing tools such as maps and models has gained critical importance for managing them and decreasing their negative impacts. Although grouping fire cases (with categorical, nominal-scale variables) into meaningful categories to reveal useful information is a difficult task, the k-modes clustering algorithm offers a way to overcome this challenge.

As shown in Table 1, the most destructive year was 2013. In addition to the largest number of occurrences, the maximum burned area was recorded in that year. There is a parallel between 2011 and 2014. Although maximum temperatures are close, no clear relationship between the strength of the fire and temperature was observed.

The spatial positions of the fires and the relationships among the variables were not considered during the clustering implementations. However, the maps show that there are some dependencies between spatial positions and fires. Since the results indicate some trends and convergences, these can be assessed by local management authorities. From a general perspective, species and manner of fire relating to human factors could be regarded as the prominent factors. The patterns can be seen most clearly when the number of clusters is low.

Determining the optimal number of clusters should be kept in mind. However, the algorithm aims to apply a mode-seeking supervised procedure (modes instead of means for clusters) on the basis of the expected number of clusters. In practice, the suitable number of clusters mainly depends on the separation and complexity of the map. The main statistical and technical advantage of this clustering procedure is its nonparametric analysis capability, which means that it does not make any distributional assumptions about the fire records. In addition, the k-modes clustering algorithm did not become computationally intensive even when the number of categories to be clustered became very large, as with temperature and burned area.

From a Geographical Information Systems (GIS) perspective, fires can be monitored and identified over large areas in a timely and cost-effective manner by using satellite sensor imagery in combination with spatial variability. A supervised classification can be performed with some ground reference information. The study also provides an opportunity to map fires over long periods by remote sensing imagery. In addition to the remote sensing and GIS techniques discussed in Duncan et al. (2009), the proposed evaluation tool, the nonparametric k-modes clustering method, is effective for monitoring and mapping managed fire-scar regimes and may be appropriate for other fire-dependent systems worldwide.

Despite all these advantages, it should be noted that one of the main difficulties of the k-modes implementation is obtaining a globally optimal solution. The algorithm can guarantee only a locally optimal solution. As mentioned above, another difficulty with k-modes is determining the optimal number of clusters. There are no valid and statistically reliable indices in the literature for obtaining the optimal number of clusters in the observed data via the k-modes algorithm.

Conclusions

Forest fires play a large role in ecological and environmental losses, as well as in the deterioration of natural resources and the destruction of habitats. The fires recorded in the Antalya region have been monitored and mapped using a nonparametric k-modes clustering approach. The k-modes clustering algorithm can be implemented rapidly and does not make any distributional assumptions concerning the available data. The results show that the algorithm can be used as an alternative tool to detect fire scars.

The use of clustering and modelling procedures to evaluate the fires encountered in the Mediterranean districts can provide some information for understanding the relationships between fires on the ground, critical parameters and spatial positions. Both investigators and regional authorities can develop local decision-support mechanisms based on the trends and variations illustrated by the maps. Analyzing fire activities and taking precautions will create added value in daily life and the environment as well as in tourism and the national economy. From a general perspective, nonparametric clustering analysis has potential for application in global-scale fire monitoring because it is simple and reliable.

Acknowledgements

The authors would like to extend their appreciation to the Regional Directorate of Forestry in Antalya for permission to utilize the data sets. They would like to thank the editor Dr. Guofan Shao and the anonymous reviewers for constructive comments.

References

Aggarwal CC, Reddy CK (2014) Data clustering, algorithms and applications. CRC Press, Boca Raton

Aslan YE, Korpeoglu I, Ulusoy Ö (2012) A framework for use of wireless sensor networks in forest fire detection and monitoring. Comput Environ Urban Syst 36(6):614–625

Boran A, Yorulmaz T (2014) Calculating emissions and economic losses arising from forest fires, a sample district: Antalya. Undergraduate thesis, Akdeniz University, Antalya (in Turkish with English abstract)

Bourgeau-Chavez LL, Kasischke ES, Brunzell S, Mudd JP, Tukman M (2002) Mapping fire scars in global boreal forests using imaging radar data. Int J Remote Sens 23(20):4211–4234

Cere R, Conedera M, Matasci G, Kanevski M, Tonini M, Vega-Orozco C, Volpi M (2012) Wildland-urban interface mapping using multi-temporal landsat imagery: the case of forest fires in Southern Swiss Alps. European Geosciences Union General Assembly, Vienna

Chaturvedi A, Green P, Carroll JD (1997) Empirical findings obtained from evaluating k-modes and overlapping k-centroids clustering. CSNA Meeting, Washington, DC

Chaturvedi A, Green P, Carroll JD (2001) K-modes clustering. J Classif 18:35–55

Curovic M, Medarevic M, Pantic D, Spalevic V (2011) Major types of mixed forests of spruce, fir and beech in Montenegro. Austrian J For Sci 128(2):93–111

Duncan BW, Shao GF, Adrian FW (2009) Delineating a managed fire regime and exploring its relationship to the natural fire regime in East Central Florida, USA: a remote sensing and GIS approach. For Ecol Manage 258:132–145

Gan GJ, Ma CQ, Wu JH (2007) Data clustering: theory, algorithms, and applications. SIAM, Philadelphia

Huang ZX (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283–304

Huang ZX, Ng MK (1999) A fuzzy k-modes algorithm for clustering categorical data. IEEE Trans Fuzzy Syst 7(4):446–452

Huang ZX, Ng MK (2003) A note on k-modes clustering. J Classif 20:257–261

Jain A, Dubes R (1988) Algorithms for clustering data. Prentice Hall, Englewood Cliffs

Kalabokidis K, Palaiologou P, Gerasopoulos E (2015) Effect of climate change projections on forest fire behavior and values-at-risk in Southwestern Greece. Forests 6(6):2214–2240

Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York

Mohammadi F, Bavaghar MP, Shabanian N (2014) Forest fire risk zone modeling using logistic regression and GIS: an Iranian case study. Small-scale For 13(1):117–125

OGM (2014) Progress report. General Directorate of Forestry of Turkey, Ankara

OGM-RD (2015) Forest fires in Antalya region. Unpublished Document of Regional Directorate of Forestry, Antalya

Omi PN (2005) Forest fires: a reference handbook. ABC-CLIO, California

Orozco CV, Tonini M, Conedera M, Kanevski M (2012) Cluster recognition in spatial-temporal sequences: the case of forest fires. Geoinformatica 16:653–673

Pereira MG, Caramelo L, Orozco CV, Costa R, Tonini M (2015) Space-time clustering analysis performance of an aggregated dataset: the case of wildfires in Portugal. Environ Model Softw 72:239–249

Perona G, Brebbia CA (2010) Modelling, monitoring and management of forest fires II. WIT Press, Southampton

Pourghasemi HR (2016) GIS-based forest fire susceptibility mapping in Iran: a comparison between evidential belief function and binary logistic regression models. Scand J For Res 31(1):80–98

Shao GF, Duncan BW (2007) Effects of band combinations and GIS masking on fire scar mapping at local scales in east-central Florida, USA. Can J Remote Sens 33:250–259

Sunar F, Özkan C (2001) Forest fire analysis with remote sensing data. Int J Remote Sens 22(12):2265–2277

Tonini M, Tuia D, Ratle F (2009) Detection of clusters using space-time scan statistics. Wildland Fire 18(7):830–836

Tuia D, Lasaponara R, Telesca L, Kanevski M (2007) Identifying spatial clustering phenomena in forest-fire sequences. Phys A 376(1):596–600

Yang W, Gardelin M, Olsson J (2015) Multi-variable bias correction: application of forest fire risk in present and future climate in Sweden. Nat Hazards Earth Syst Sci 15(9):2037–2057