Lecture 41 Wednesday, April 25, 2007
What was covered?
- Stationarity and isotropy
- Statistics for spatial data
- The semivariogram for geostatistical data
- Moran's I for lattice data
- Ripley's K for point process data
Non-spatial measures of variation
- The variance, denoted $s^2$ or $s_z^2$ to emphasize the identity of the variable in question, is the average squared deviation about the mean.

$$s_z^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(z_i - \bar{z}\right)^2$$

An equivalent formula is the following.

$$s_z^2 = \frac{1}{2}\cdot\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(z_i - z_j\right)^2$$

In this formula the variance is one half of the average squared difference among all pairs of observations. The double sum counts each pair twice, once as (i, j) and once as (j, i), which is why the divisor is the number of ordered pairs, n(n – 1). Versions of the second formula appear in many spatial statistics.
- The covariance generalizes the variance to the case when there are two variables, x and y, measured for each observation. The covariance indicates whether two variables on average vary together in the same direction or in opposite directions.

$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)$$
- To improve interpretability the covariance is often standardized by dividing it by the product of the standard deviations (the square roots of the variances) of the two variables. The result ranges between –1 and 1 and is called the Pearson correlation coefficient r.

$$r = \frac{s_{xy}}{s_x\, s_y}$$
- With spatial data these formulas can be extended in various ways.
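The equivalence of the two variance formulas is easy to check numerically. The following is a minimal sketch in plain Python, assuming the n – 1 divisor for the variance and covariance; the function names and the simulated data are illustrative only.

```python
import math
import random


def variance(z):
    """Sample variance: average squared deviation about the mean (divisor n - 1)."""
    n = len(z)
    mean = sum(z) / n
    return sum((zi - mean) ** 2 for zi in z) / (n - 1)


def variance_pairwise(z):
    """The same quantity computed from squared differences among all pairs."""
    n = len(z)
    total = sum((zi - zj) ** 2 for zi in z for zj in z)
    # The double sum visits each pair twice; terms with i == j contribute zero.
    return total / (2 * n * (n - 1))


def covariance(x, y):
    """Sample covariance of two variables measured on the same observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Covariance standardized by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))


random.seed(1)
x = [random.gauss(0, 1) for _ in range(50)]
y = [2 * xi + random.gauss(0, 1) for xi in x]  # y varies with x by construction

print(variance(x), variance_pairwise(x))  # the two values agree up to rounding
print(pearson_r(x, y))
```

The pairwise form is the one that carries over to spatial data, where the pairs are grouped by how far apart the observations are.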
The problem of replication with spatial data
- One of the purposes of a spatial analysis is to characterize the nature of the spatial process. This is made difficult because we have no replication of the process. Our spatial sample is a sample of size one.
- If we want to characterize an attribute of a typical individual we get more individuals. The different individuals thus comprise the sample. If on the other hand we want to learn more about a particular individual we can sample that individual more intensely but in reality those are just subsamples. There is no replication for an individual. The individual is unique.
- If we want to characterize a particular region of space where the characterization desired involves describing the internal relationships within that region of space, we have the same problem.
- The standard solution is to substitute replication in the data for replication of the data. An assumption that is made in spatial analysis is that the spatial process under study repeats itself over its domain D. Such a spatial process is said to be stationary. For a stationary process the absolute coordinates at which we observe the process are unimportant. All that matters are the oriented distances (the displacements) between the points. In a stationary process if we translate the entire set of coordinates by a specific amount in a specified direction, the entire process remains the same.
- If in addition the process is invariant to the direction of the displacement in that only the magnitude of the displacement matters, then the process is said to be isotropic.
Types of stationarity
- The proper way to view spatial data is as multivariate data. Suppose we observe a spatial process at locations $s_1, s_2, \ldots, s_k$. The behavior of the spatial process Z(s) at these locations can be completely characterized by its joint probability distribution function.

$$F(z_1, \ldots, z_k) = P\left(Z(s_1) \le z_1, \ldots, Z(s_k) \le z_k\right)$$
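- A very strong sort of stationarity is one in which the joint probability distribution function is invariant under translation. Formally strong stationarity is defined as follows.

$$P\left(Z(s_1) \le z_1, \ldots, Z(s_k) \le z_k\right) = P\left(Z(s_1 + h) \le z_1, \ldots, Z(s_k + h) \le z_k\right)$$

for every displacement vector h, every k, and every set of locations $s_1, \ldots, s_k$.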
- For most applications strong stationarity is too restrictive. A weaker form of stationarity requires only that the first two moments of the joint distribution (the means and covariances) are unchanged by translation. This is called second-order stationarity. Formally two conditions are required for second-order stationarity.
- The mean is constant over the spatial domain D, i.e., $E[Z(s)] = \mu$ for all locations s in D.
- The covariance depends on the separation between points but not on their absolute location, i.e., $\mathrm{Cov}[Z(s), Z(s+h)] = C(h)$ for all locations s. Here C(h) is a function that depends only on the displacement vector h. It is called the covariogram.
- If it turns out that $C(h) = C^*(\lVert h \rVert)$, where $\lVert h \rVert$ is the norm of the vector h and $C^*$ is a function of a scalar distance, so that the covariance depends on the size of the displacement but not its direction from s, the spatial process is said to be both second-order stationary and isotropic.
- A technique often used in time series analysis to remove absolute time references and to obtain stationarity is differencing, i.e., constructing the new variable Z(s) – Z(s+h). This leads to a third definition of stationarity, intrinsic stationarity. Formally a spatial process is intrinsically stationary if it has a constant mean and the variance of the differences of Z at pairs of locations depends only on h, the displacement between the locations. Thus we can unambiguously define what's called the semivariogram, γ(h).

$$\gamma(h) = \frac{1}{2}\,\mathrm{Var}\left[Z(s) - Z(s+h)\right]$$
Because it derives from the weakest form of stationarity and is more generally applicable, the semivariogram is the preferred tool for characterizing geostatistical spatial processes.
- Although the semivariogram is less intuitive than the covariogram, the two are related and so it is easy to connect one to the other. If an intrinsically stationary process has the additional property of second-order stationarity, then the covariogram and semivariogram are related as follows.

$$\gamma(h) = C(0) - C(h)$$
By definition C(0) is just the variance of the spatial process.
- Note: the terms semivariogram and variogram are often used interchangeably in the literature. Technically the semivariogram is as defined above, while the variogram is twice this quantity. The reason the distinction is important is that the semivariogram has the nice relationship to the covariogram shown above. Typically when someone speaks of the variogram (particularly in software documentation) the actual reference is to the semivariogram.
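The relationship between the semivariogram and the covariogram stated above follows in a few steps from the definition of the semivariogram as half the variance of a difference. This is a sketch that assumes second-order stationarity, so that Var[Z(s)] = C(0) at every location s.

$$
\begin{aligned}
\gamma(h) &= \tfrac{1}{2}\,\mathrm{Var}\left[Z(s) - Z(s+h)\right] \\
          &= \tfrac{1}{2}\left\{\mathrm{Var}[Z(s)] + \mathrm{Var}[Z(s+h)] - 2\,\mathrm{Cov}[Z(s), Z(s+h)]\right\} \\
          &= \tfrac{1}{2}\left\{2\,C(0) - 2\,C(h)\right\} \\
          &= C(0) - C(h)
\end{aligned}
$$

The same steps show why the variogram, which omits the factor of one half, equals 2C(0) – 2C(h) and so lacks the simple one-to-one correspondence with the covariogram.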
Graphing the semivariogram
- Fig. 1 shows a theoretical (exponential) semivariogram for an isotropic process (so that the displacement vector h is replaced by the scalar distance h, the length of the displacement). Some of the standard terminology of semivariograms is displayed in the figure.
- sill (the upper asymptote in the figure),
- range (the distance at which the sill is reached or, when the sill is approached only asymptotically, the distance at which the semivariogram reaches 95% of the sill),
- nugget (the magnitude of the discontinuity that occurs at the origin). Theoretically the variogram should pass through the origin but it often does not with real data.
Fig. 1 Typical semivariogram of a stationary spatial process
Fig. 2 Corresponding covariogram for the second-order stationary process of Fig. 1
- Fig. 2 relates the semivariogram to the covariogram (when the spatial process is both intrinsic and second-order stationary). As the figure shows the sill corresponds to the variance of the process, C(0). The covariance decreases to 0 from this point as the separation between points increases. Thus as the semivariogram approaches the sill, the covariogram approaches 0, i.e., C(h) → 0.
- A first step toward constructing a theoretical model of the semivariogram is to calculate the empirical semivariogram.

$$\hat{\gamma}(h) = \frac{1}{2\,|N(h)|}\sum_{(s_i, s_j) \in N(h)}\left[Z(s_i) - Z(s_j)\right]^2$$

Here N(h) is the set of location pairs that are separated by a lag h and |N(h)| is the number of unique pairs in that set. Observe that this formula resembles the alternative variance formula for non-spatial data given above.
- Typically we plot $\hat{\gamma}(h)$ versus the displacement h (or versus the scalar distance h for an isotropic process) to assess how the process decays over space. Sometimes we use the empirical semivariogram as a jumping off point for fitting a formal variogram model to the data.
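The empirical semivariogram can be sketched in a few lines of plain Python. The version below is an illustration rather than a reference implementation: it assumes an isotropic process, groups unique pairs into distance classes of equal width (real data rarely have pairs at exactly the same lag), and uses a hypothetical function name and simulated data. The simulated field is a kernel-weighted average of white noise, chosen only because it produces values that are similar at nearby locations.

```python
import math
import random


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Isotropic empirical semivariogram binned into distance classes.

    Returns a list of (lag_midpoint, gamma_hat, pair_count) for non-empty
    classes. Each unique pair (i < j) is counted once.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            k = int(d // lag_width)  # index of the distance class
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [
        ((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]


# Simulated example: smooth white noise with a Gaussian kernel.
random.seed(42)
nodes = [((gx, gy), random.gauss(0, 1))
         for gx in range(-2, 13) for gy in range(-2, 13)]


def smooth_field(s):
    return sum(e * math.exp(-math.dist(s, g) ** 2 / 2) for g, e in nodes)


coords = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(150)]
values = [smooth_field(s) for s in coords]

for lag, gamma_hat, n_pairs in empirical_semivariogram(coords, values, 0.5, 10):
    print(f"{lag:5.2f}  {gamma_hat:8.4f}  {n_pairs:5d}")
```

Printing the pair count alongside each estimate is a useful habit, because classes containing few pairs give unstable estimates.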
Moran’s I for lattice data
- Although a semivariogram can be estimated for lattice data, doing so typically doesn't make theoretical sense because of the absence of observations at intermediate distances. An alternative measure of spatial association for lattice data is Moran's I.
- Let u be the vector of centered observations with elements $u_i = Z(s_i) - \bar{Z}$ and let $w_{ij}$ be the neighborhood connectivity between sites $s_i$ and $s_j$, such that $w_{ii} = 0$ and $w_{ij} > 0$ if sites $s_i$ and $s_j$ are connected and 0 otherwise. For lattice data arranged in rectangular grids typical connectivity rules borrow from the game of chess. Thus we can have rook neighborhoods and queen neighborhoods, referencing the legal moves of these pieces in chess. In truth any neighborhood structure is possible.
- Moran’s I is calculated as follows.

$$I = \frac{n}{\mathbf{1}'W\mathbf{1}}\cdot\frac{u'Wu}{u'u}$$

where 1 is a column vector of ones and W is the connectivity matrix with entries $w_{ij}$.
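- Moran's I is often expressed in terms of distance classes so that we get a separate value I(d) for each choice of d.

$$I(d) = \frac{n}{W(d)}\cdot\frac{\sum_{i}\sum_{j} w_{ij}(d)\left[Z(s_i) - \bar{Z}\right]\left[Z(s_j) - \bar{Z}\right]}{\sum_{i}\left[Z(s_i) - \bar{Z}\right]^2}$$

Here $w_{ij}(d) = 1$ if the pair $(s_i, s_j)$ belongs to N(d) and 0 otherwise, and $W(d) = \sum_i \sum_j w_{ij}(d)$, where N(d) is the set of location pairs that are separated by a lag d.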
- Typically I(d) is plotted against d producing a plot very reminiscent of the semivariogram in Fig. 1.
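As an illustration of the calculation, here is a minimal sketch of Moran's I for a rectangular grid with binary rook connectivity. The function names are hypothetical, and the weights are stored as neighbor lists rather than as a full matrix W, which gives the same sums.

```python
def rook_neighbors(n_rows, n_cols):
    """Binary rook connectivity for a rectangular grid.

    Cells are numbered row by row; the result maps each cell to the list
    of cells that share an edge with it.
    """
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors[r * n_cols + c] = [
                rr * n_cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbors


def morans_i(values, neighbors):
    """Moran's I = (n / sum of weights) * (u'Wu) / (u'u), u = centered values."""
    n = len(values)
    mean = sum(values) / n
    u = [v - mean for v in values]
    total_weight = sum(len(nb) for nb in neighbors.values())
    cross = sum(u[i] * u[j] for i, nb in neighbors.items() for j in nb)
    return (n / total_weight) * cross / sum(ui * ui for ui in u)


nb = rook_neighbors(4, 4)

# A checkerboard: every rook neighbor has the opposite value.
checkerboard = [(-1) ** (r + c) for r in range(4) for c in range(4)]
print(morans_i(checkerboard, nb))

# A left-to-right gradient: neighbors tend to have similar values.
gradient = [c for r in range(4) for c in range(4)]
print(morans_i(gradient, nb))
```

The checkerboard gives a negative value and the gradient a positive one, matching the interpretation of I as a spatial analogue of a correlation coefficient.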
Mantel test
- A Mantel test is commonly carried out with both lattice and geostatistical data and is a test of association between points using two different measures of distance. Typically one measure is the difference in attribute values, Z(s), between pairs of points and the other is physical distance.
- A Mantel test works directly with distance matrices and tests whether the entries of the two matrices are correlated. The statistic used is typically the standard Pearson correlation coefficient applied to the distance matrices, in which the entries in the lower triangle of each matrix are stacked to form a single vector.
- A significance test is usually obtained by generating a permutation distribution for Mantel's r. This is done by randomly permuting the rows and their corresponding columns in one of the distance matrices. Permuting a distance matrix in this way is equivalent to randomly assigning attribute values to locations in the original map.
- Our textbook has further details and numerous examples of this test (Manly, chapter 9).
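The permutation scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the textbook's code: the function names are hypothetical, the p-value is one-sided (large positive r is treated as extreme), and the observed arrangement is counted as one of the permutations.

```python
import math
import random


def lower_triangle(matrix):
    """Stack the entries below the diagonal into a single vector."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(dist_a, dist_b, n_perm=999, rng=None):
    """Return Mantel's r and a one-sided permutation p-value."""
    rng = rng or random.Random(0)
    n = len(dist_a)
    vec_b = lower_triangle(dist_b)
    r_obs = pearson(lower_triangle(dist_a), vec_b)
    as_extreme = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        # Permute rows and the corresponding columns together.
        permuted = [[dist_a[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if pearson(lower_triangle(permuted), vec_b) >= r_obs:
            as_extreme += 1
    return r_obs, as_extreme / (n_perm + 1)


# Simulated example: an attribute with a west-to-east trend plus noise.
random.seed(3)
coords = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(20)]
z = [s[0] + random.gauss(0, 1) for s in coords]
geo = [[math.dist(a, b) for b in coords] for a in coords]
attr = [[abs(a - b) for b in z] for a in z]

r, p = mantel_test(attr, geo)
print(f"Mantel r = {r:.3f}, one-sided p = {p:.3f}")
```

Permuting rows and columns with the same index vector keeps the matrix a valid distance matrix, which is why the procedure amounts to reassigning attribute values to locations.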
Point Process Data
- With point process data it is the locations themselves that are random. Thus a typical first step is to examine the distances between pairs of points to look for evidence of clustering. One can examine average distance to nearest neighbors, distance to next nearest neighbors, etc., and compare the statistics calculated to the same values obtained from a known spatial process, such as a spatial Poisson process (that is consistent with the hypothesis of complete spatial randomness, CSR).
- A popular and more efficient way of examining clustering at different scales is to calculate a quantity called Ripley's K. The quantity K is defined such that

$$\lambda K(h) = E\left[\text{number of additional events within a distance } h \text{ of an arbitrary event}\right]$$

Here λ is called the intensity of the spatial process and is equal to the mean number of events per unit area, a value that is assumed constant over the region of interest. E is the expectation operator and so the right hand side is the average number of events in the neighborhood of a given event.
- If the region in question has area R, we would expect on average $\lambda R$ events to occur in that region. So if there are $\lambda K(h)$ additional events within a distance h of a single event and a total of $\lambda R$ events overall, we would expect $\lambda R \cdot \lambda K(h) = \lambda^2 R\,K(h)$ to be the number of ordered pairs a distance of at most h apart.
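- Define the indicator function $I_h(d_{ij})$ to be

$$I_h(d_{ij}) = \begin{cases} 1, & d_{ij} \le h \\ 0, & \text{otherwise} \end{cases}$$

where $d_{ij}$ is the distance between the ith and jth observed events. If we sum this function over all events i ≠ j we obtain the number of ordered pairs a distance of at most h apart. Thus we have

$$\lambda^2 R\,K(h) \approx \sum_{i}\sum_{j \ne i} I_h(d_{ij}) \quad\Longrightarrow\quad \hat{K}(h) = \frac{1}{\hat{\lambda}^2 R}\sum_{i}\sum_{j \ne i} I_h(d_{ij})$$

where $\hat{\lambda} = n/R$ is the observed number of events per unit area. The latter expression is the formula typically used to estimate Ripley's K.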
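- There is one complication not addressed by this formula. For points near the edge of R, the indicator function will underestimate the number of neighbors. Thus the estimator needs to be adjusted in some fashion to account for this. To account for edge effects Ripley's K is estimated as follows.

$$\hat{K}(h) = \frac{1}{\hat{\lambda}^2 R}\sum_{i}\sum_{j \ne i} \frac{I_h(d_{ij})}{w_{ij}}$$

where $I_h(d_{ij})$ equals 1 if the distance $d_{ij}$ between events i and j is at most h and 0 otherwise, $\hat{\lambda}$ is the estimated intensity, and $w_{ij}$ is the proportion of the neighborhood of a given point that lies within R.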
- CSR refers to a homogeneous process with no spatial dependence. Thus under CSR we expect to see roughly the same number of events in any two regions of equal area. For a given event the expected number of additional events within a distance h is proportional to the area of a circle of radius h, $\pi h^2$, where the proportionality constant is λ, the intensity.
- Thus under CSR we expect $K(h) = \pi h^2$. This suggests examining the statistic L(h) defined as follows.

$$L(h) = \sqrt{\frac{\hat{K}(h)}{\pi}} - h$$

This should be approximately equal to zero under CSR.
- In a plot of L(h) versus h positive peaks will correspond to clustering and negative troughs to uniformity. Statistical significance is assessed by generating data from a process exhibiting CSR and plotting the extremes for the simulated process, thus generating a probability envelope. Places where L(h) pokes out of the envelope are places where the clustering or uniformity is statistically significant.
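The steps above, from counting ordered pairs to comparing L(h) with a simulation envelope, can be sketched as follows. This is a minimal illustration under simplifying assumptions: it uses the estimator without the edge correction, a rectangular study region, and a single distance h; the function names and the simulated clustered pattern are hypothetical. Because the CSR simulations are computed with the same uncorrected estimator, the edge bias affects the observed value and the envelope alike.

```python
import math
import random


def ripley_k(points, h, area):
    """Estimate K(h) without edge correction.

    Counts ordered pairs (i != j) at most h apart and divides by
    lambda_hat^2 * area, where lambda_hat = n / area.
    """
    n = len(points)
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= h
    )
    return area * pairs / n ** 2


def l_function(points, h, area):
    """L(h) = sqrt(K(h) / pi) - h, which is near zero under CSR."""
    return math.sqrt(ripley_k(points, h, area) / math.pi) - h


def csr_envelope(n, h, width, height, n_sim=99, rng=None):
    """Minimum and maximum of L(h) over patterns of n uniform random points."""
    rng = rng or random.Random(0)
    sims = []
    for _ in range(n_sim):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        sims.append(l_function(pts, h, width * height))
    return min(sims), max(sims)


# Simulated clustered pattern: 10 parents, each with 10 nearby offspring.
rng = random.Random(7)
parents = [(rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)) for _ in range(10)]
clustered = [
    (px + rng.gauss(0, 0.02), py + rng.gauss(0, 0.02))
    for px, py in parents
    for _ in range(10)
]

h = 0.05
clustered_l = l_function(clustered, h, 1.0)
lo, hi = csr_envelope(len(clustered), h, 1.0, 1.0)
print(f"L({h}) = {clustered_l:.3f}, CSR envelope = ({lo:.3f}, {hi:.3f})")
```

In practice the calculation is repeated over a grid of h values so that the observed curve and the envelope can be plotted together.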