Spatial data analysis - Prediction models in spatial data analysis for landslide hazard mapping

Overview

[Image: views of a prediction map]

What is a prediction model?
A spatial prediction model identifies or delineates areas likely to be affected by (or suitable for):
- Future landslides
- Habitat of endangered animals
- Vulnerable aquifer areas
- Good industrial areas
Timing is NOT considered here.
Motivation: why and when do we need the models?
- Can a specialist directly identify the areas likely to be affected by future landslides at a reasonable cost?
  - Yes! DO IT. (no prediction model)
  - No! A prediction model can be useful
- Suppose the answer is NO:
  The geomorphologists cannot identify such vulnerable areas directly at a reasonable cost.
- Can the geomorphologists obtain indirect but indicative (causal) factors at a reasonable cost?
  - No! NO HOPE (probably)
  - Yes! A prediction model plays a crucial role
Procedures: what to do?
- Mathematical frameworks
- Estimation of parameters in the frameworks
- Validation of results
Examples: does it really work?
Concluding remarks

Brief history of spatial prediction modeling at the GSC

1972: Agterberg, F.P. et al., Prediction models for locating undiscovered copper deposits.

1977-1980: Chung, C.F., Development of SIMSAG (interactive graphic system); CDC 6400 + Tektronix 4014 terminal; maximum number of pixels: 100 x 100

1990: Agterberg, F.P. and Bonham-Carter G.F. (eds.), GSC paper 89-9, Statistical Applications in the Earth Sciences

1992 - 1997: Spatial Data Analysis Laboratory. Development of a spatial data integration system; SGI, Windows NT/95/98/3.1; maximum number of pixels (in practice): 4000 x 4000

Basic idea: the favorability function

At each pixel, p:

f(T_p | the m causal factors v_k(p), k = 1, ..., m)

T_p: (p will be affected by a future landslide of type D)

The "sureness" that the proposition T_p is true, given the m causal factors (v_k(p), k = 1, ..., m), is being measured.

By "sureness" we mean: probability, certainty, belief, plausibility, possibility ...

Mathematical frameworks

Three mathematical frameworks used for the models are:

Probability theory
- Joint Conditional Probability Function
  The measurement of the "sureness" that the proposition T_p: (p will be affected by a future landslide) is true, given the m causal factors (v_k(p), k = 1, ..., m), is assumed to be the joint conditional probability function
  f(F_p|c₁,c₂,...,c_m) = Prob{F_p|c₁,c₂,...,c_m}
  representing the hazard at p given the pixel values (c₁, c₂,...,c_m) at p, where:
  F_p: the event that p lies in F, i.e. the proposition T_p
  F: the unknown area to be affected by future landslides
  Θ: The set of pixels whose pixel values are (c₁, c₂,...,c_m)
  The prior probability, representing the hazard at p with no pixel information at p, is:
  Prob{F_p} = size of (F ∩ A) / size of A
  where A represents the whole study area.
  Pixel p is hazardous:
  Prob{F_p|c₁,c₂,...,c_m} >> Prob{F_p}
  Pixel p is not hazardous:
  Prob{F_p|c₁,c₂,...,c_m} << Prob{F_p}
- Likelihood ratio function
  The likelihood ratio at p is defined as:
  λ = Prob{c₁,c₂,...,c_m|F_p} / Prob{c₁,c₂,...,c_m|not F_p} = [(1 - Prob{F_p}) / Prob{F_p}] * [Prob{F_p|c₁,c₂,...,c_m} / (1 - Prob{F_p|c₁,c₂,...,c_m})]
- Monotone functions of the likelihood ratio function
  - Weights of evidence function
    model by Peirce (1878) and Good (1950, 1960, and 1976) (also see Spiegelhalter, 1986; Agterberg et al., 1990; Bonham-Carter et al., 1989).
    WoE{F_p|c₁,c₂,...,c_m} = log_e λ
  - Certainty factor function
    model by Shortliffe and Buchanan (1975: MYCIN), Heckerman (1986), Pearl (1988) and Chung and Fabbri (1990)
    CF{F_p|c₁,c₂,...,c_m} = (λ - 1) / (λ + 1)
  - Comparison of the weights of evidence and certainty factor functions at two pixels:
    Let (c₁, c₂,...,c_m) be the m pixel values of p
    Let (d₁, d₂,...,d_m) be the m pixel values of q
    If Prob{F_p|c₁,c₂,...,c_m} <= Prob{F_q|d₁,d₂,...,d_m}, then λ(p) <= λ(q), and therefore WoE(p) <= WoE(q) and CF(p) <= CF(q)
    Relative "hazardness" of the pixels is the same regardless of which measurements are used for the study.
Dempster-Shafer evidential theory
- Belief and plausibility functions
Zadeh's fuzzy set theory
- Membership function
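
As a small illustration of the probability-theory measures above, the following Python sketch computes the likelihood ratio, the weights of evidence and the certainty factor from a prior and a conditional probability, and checks that all three rank pixels in the same order. The prior, the three conditional probabilities and the function names are hypothetical, chosen only for illustration; the certainty factor is assumed to take the form (λ - 1) / (λ + 1), as given for CF₂ in the Bayesian estimation section.

```python
import math


def likelihood_ratio(prior, posterior):
    """lambda = [(1 - prior) / prior] * [posterior / (1 - posterior)]."""
    return (1.0 - prior) / prior * posterior / (1.0 - posterior)


def weights_of_evidence(lam):
    """WoE = natural logarithm of the likelihood ratio."""
    return math.log(lam)


def certainty_factor(lam):
    """CF = (lambda - 1) / (lambda + 1); assumed form, see lead-in."""
    return (lam - 1.0) / (lam + 1.0)


# Hypothetical values: one prior for the study area, three pixels.
prior = 0.05                                     # Prob{F_p}
posteriors = {"p": 0.02, "q": 0.05, "r": 0.20}   # Prob{F_p | c_1, ..., c_m}

scores = {}
for name, post in posteriors.items():
    lam = likelihood_ratio(prior, post)
    scores[name] = (lam, weights_of_evidence(lam), certainty_factor(lam))

# Rank the pixels by each of the three measures in turn.
ranking_by = [sorted(scores, key=lambda n, i=i: scores[n][i]) for i in range(3)]
print(ranking_by)
```

A pixel whose conditional probability equals the prior (pixel "q" here) gets λ = 1, WoE = 0 and CF = 0, which is the neutral point on all three scales.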

Direct estimation

S_p: "p has been affected by a past landslide of a given type"

Prob{S_p|c₁,c₂,...,c_m} = size of (S ∩ Θ) / size of Θ

Prob{S_p}=size of S / size of A
where S represents the areas affected by past landslides

Prob(F_p|c₁,c₂,...,c_m) = Prob(S_p|c₁,c₂,...,c_m)

λ_D = Prob{c₁,c₂,...,c_m|S_p} / Prob{c₁,c₂,...,c_m|not S_p} = [(1 - Prob{S_p}) / Prob{S_p}] * [Prob{S_p|c₁,c₂,...,c_m} / (1 - Prob{S_p|c₁,c₂,...,c_m})]

CF_D{F_p|c₁,c₂,...,c_m} = (λ_D - 1) / (λ_D + 1)

WoE_D{F_p|c₁,c₂,...,c_m} = log_e λ_D

The ordering of pixels is the same under all three estimators. The estimators are simple to compute and do not require any mathematical assumptions. However, they fail badly as predictors of the occurrence of future landslides. They should be computed as benchmarks of the performance of spatial data as causal factors of landslides.
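
The direct estimators amount to counting pixels, as the following Python sketch illustrates. It is only a sketch under stated assumptions: the study area is represented as flat lists with one entry per pixel, the two layers (`slope`, `rock`), their class names and the ten-pixel data set are hypothetical, and classes with no (or only) past landslides are assigned the limiting values of the three measures.

```python
import math
from collections import Counter


def measures(prior, post):
    """Return (lambda_D, WoE_D, CF_D) for one conditional probability."""
    if post <= 0.0:      # no past landslide in this class: limiting values
        return 0.0, -math.inf, -1.0
    if post >= 1.0:      # every pixel of this class was affected
        return math.inf, math.inf, 1.0
    lam = (1.0 - prior) / prior * post / (1.0 - post)
    return lam, math.log(lam), (lam - 1.0) / (lam + 1.0)


def direct_estimates(layers, past):
    """layers: m lists of pixel values; past: True where a past landslide occurred."""
    prior = sum(past) / len(past)            # Prob{S_p} = size of S / size of A
    signatures = list(zip(*layers))          # (c_1, ..., c_m) for every pixel
    theta = Counter(signatures)              # size of Theta for each combination
    s_theta = Counter(sig for sig, hit in zip(signatures, past) if hit)
    table = {}
    for sig, size in theta.items():
        post = s_theta[sig] / size           # size of (S and Theta) / size of Theta
        table[sig] = (post,) + measures(prior, post)
    return prior, table


# Hypothetical ten-pixel study area with two causal-factor layers.
slope = ["steep"] * 4 + ["gentle"] * 6
rock = ["shale"] * 8 + ["granite"] * 2
past = [True, True, False, False, True, False, False, False, False, False]

prior, table = direct_estimates([slope, rock], past)
for sig, row in table.items():
    print(sig, row)
```

Each row of `table` holds the conditional probability, λ_D, WoE_D and CF_D for one combination of pixel values.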

Bayesian estimation

Prob{F_p|c₁,c₂,...,c_m}= Prob{F_p}Prob{c₁,c₂,...,c_m|F_p} / Prob{c₁,c₂,...,c_m}

Conditional independence assumption given F_p

Prob{c₁,c₂,...,c_m|F_p} =Prob{c₁|F_p}Prob{c₂|F_p} ...Prob{c_m|F_p}

Substituting Prob{c_k|F_p} = Prob{F_p|c_k} Prob{c_k} / Prob{F_p} for each factor gives:

Prob{F_p|c₁,c₂,...,c_m} = (Prob{c₁}...Prob{c_m} / Prob{c₁,c₂,...,c_m}) * Prob{F_p} (Prob{F_p|c₁} / Prob{F_p}) ... (Prob{F_p|c_m} / Prob{F_p})

Prob{c₁,c₂,...,c_m}= size of Θ / size of A,

Prob{c_k} = size of A_k(c_k) / size of A, where A_k(c_k) is the set of pixels whose value in layer k is c_k,

Prob{F_p|c_k} = size of (F ∩ A_k(c_k)) / size of A_k(c_k),

Prob{S_p|c_k} = size of (S ∩ A_k(c_k)) / size of A_k(c_k),

Prob{S_p} = size of S / size of A

Bayesian estimate at each pixel p:

Prob{F_p|c₁,c₂,...,c_m} = (Prob{c₁}...Prob{c_m} / Prob{c₁,c₂,...,c_m}) * Prob{S_p} (Prob{S_p|c₁} / Prob{S_p}) ... (Prob{S_p|c_m} / Prob{S_p})

The likelihood ratio becomes:

λ = λ₁ ... λ_m

λ₁ = Prob{c₁|F_p} / Prob{c₁|not F_p}
= Prob{F_p|c₁}(1 - Prob{F_p}) / (Prob{F_p}(1 - Prob{F_p|c₁}))

λ' = λ'₁ ... λ'_m

λ'_k = Prob{c_k|S_p} / Prob{c_k|not S_p}
= Prob{S_p|c_k}(1 - Prob{S_p}) / (Prob{S_p}(1 - Prob{S_p|c_k}))

CF₂{F_p|c₁,c₂,...,c_m} = (λ' - 1) / (λ' + 1)

WoE₂{F_p|c₁,c₂,...,c_m} = log_e λ'

The advantage of this estimator is that it depends only on bivariate conditional probabilities of the occurrences of past landslides given pixel values at each layer separately. However, the price of this advantage is adherence to the conditional independence assumption.
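
A minimal Python sketch of this estimator follows. It computes λ'_k for every class of each layer separately, multiplies the per-layer ratios at each pixel to obtain λ', and derives CF₂ and WoE₂ from it. The flat-list representation of the study area, the layer names, the class names and the ten-pixel data set are all hypothetical, and the handling of classes with no past landslides (limiting values) is an assumption of the sketch rather than something the text specifies.

```python
import math
from collections import Counter


def layer_ratios(values, past):
    """lambda'_k for every class c_k of one layer, from past landslides only."""
    prior = sum(past) / len(past)                       # Prob{S_p}
    size = Counter(values)                              # size of each class
    hits = Counter(v for v, hit in zip(values, past) if hit)
    ratios = {}
    for c, a in size.items():
        post = hits[c] / a                              # Prob{S_p | c_k}
        if post >= 1.0:
            ratios[c] = math.inf
        else:
            ratios[c] = post * (1.0 - prior) / (prior * (1.0 - post))
    return ratios


def bayesian_scores(layers, past):
    """Per-pixel (lambda', WoE_2, CF_2) under conditional independence."""
    per_layer = [layer_ratios(values, past) for values in layers]
    scores = []
    for sig in zip(*layers):
        # Product of the per-layer ratios; a 0 * inf product is not handled here.
        lam = math.prod(r[c] for r, c in zip(per_layer, sig))
        if lam == 0.0:
            scores.append((0.0, -math.inf, -1.0))       # limiting values
        elif lam == math.inf:
            scores.append((math.inf, math.inf, 1.0))
        else:
            scores.append((lam, math.log(lam), (lam - 1.0) / (lam + 1.0)))
    return scores


# Hypothetical ten-pixel study area with two causal-factor layers.
slope = ["steep"] * 4 + ["gentle"] * 6
rock = ["shale"] * 8 + ["granite"] * 2
past = [True, True, False, False, True, False, False, False, False, False]

scores = bayesian_scores([slope, rock], past)
for sig, row in zip(zip(slope, rock), scores):
    print(sig, row)
```

Only the per-layer counts enter the computation, so a combination of pixel values that contains few or no past landslides can still receive a usable score, provided the conditional independence assumption is acceptable.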

Validation methods

Time robustness:
We divide the occurrences into two time periods, "past" and "future". Construct the prediction model based on the "past" occurrences only, and then validate the results with respect to the "future" occurrences.
We select a year such that approximately half of the events occurred during or before it. We pretend that year is the current one and use those events in the model to see how well the remaining events are predicted.
Space robustness:
Divide the study area into several non-overlapping subareas. To construct a prediction map for each subarea, use ONLY the data from outside that subarea.

Figure 1: A mosaic of nine prediction maps based on nine divisions of the study area, Rio Chinchina, Colombia, using the algebraic sum operation of the fuzzy set framework. To construct the prediction map for each section, use the data from the other eight sections. Then mosaic the prediction maps (one from each subarea) and compare the mosaic prediction map against past landslide occurrences.
Random:
Divide the occurrences into two groups randomly: group 1 and group 2. Construct the prediction map based on group 1 ONLY, then validate the results with respect to group 2 occurrences. Repeat the procedure in reverse order.
Combination:
Sometimes using a combination of the random, space robustness and time robustness validation procedures is useful.
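
The three partitioning schemes above can be sketched in Python as follows. This is only an illustration of how the occurrences might be split: the representation of an occurrence as a `(pixel_id, year, subarea)` tuple, the function names and the six sample events are hypothetical, and the sketch covers the splitting step only, not the construction or scoring of the prediction map.

```python
import random


def time_split(events):
    """Pick a year so that about half the events fall in or before it."""
    years = sorted(year for _, year, _ in events)
    cutoff = years[(len(years) - 1) // 2]
    past = [e for e in events if e[1] <= cutoff]      # used to build the model
    future = [e for e in events if e[1] > cutoff]     # used to validate it
    return cutoff, past, future


def space_folds(events):
    """For each subarea: (events outside it for modelling, events inside it for validation)."""
    subareas = sorted({sub for _, _, sub in events})
    return {
        s: ([e for e in events if e[2] != s], [e for e in events if e[2] == s])
        for s in subareas
    }


def random_split(events, seed=0):
    """Two random groups; model on one, validate on the other, then swap."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical occurrences: (pixel_id, year, subarea).
events = [
    (1, 1950, "A"), (2, 1960, "B"), (3, 1970, "C"),
    (4, 1980, "A"), (5, 1990, "B"), (6, 2000, "C"),
]

cutoff, past, future = time_split(events)
folds = space_folds(events)
group1, group2 = random_split(events)
print(cutoff, len(past), len(future))
```

A combined procedure could, for example, apply `time_split` within each fold returned by `space_folds`.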

2006-09-01

http://www.gsc.nrcan.gc.ca/sda/landslide_e.php