a scholarly smorgasbord: spatial stats

Sunday, November 16, 2014

Maximum likelihood and other parameter estimations

First, a basic definition: A parameter is an unknown, fixed value that describes a characteristic of a population. For example, the mean is a parameter that describes the average over a population. The true mean is usually not known, so it is estimated from a sample.

When we fit a model to our data, we get parameters such as the regression coefficients (β's). In spatial stats, we use a (semi)variogram function to estimate the parameters range, sill, and nugget effect. An introduction to the semivariogram and its parameters may be found here.

A few common methods of parameter estimation used in spatial stats are least squares (ordinary least squares, OLS, or more commonly weighted least squares, WLS) and the likelihood based methods (maximum likelihood, ML, or restricted maximum likelihood, REML).

Least squares methods fit a model by minimizing the sum of squared differences between the observed data and the fitted values. Likelihood based methods assume a distribution for the data and find the parameter values that make the observed data most probable under that distribution. When you code a likelihood estimation, you will input starting values for the parameters and an underlying model. For example, for a spatial stats dataset, you would first inspect the semivariogram to get a starting value for the nugget effect and to choose the shape of the semivariogram model (e.g. exponential or linear), and then fit the model via maximum likelihood.

In R, the likfit function in the package geoR fits a model by likelihood based methods (ML or REML). In the same package, variofit fits a semivariogram model by least squares.
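To make the least squares idea concrete, here is a minimal sketch in plain Python of fitting an exponential semivariogram model to binned semivariances. It is not how geoR does it internally; the function names, the weighting by number of pairs per bin, the grid search over the range, and the data are all assumptions made for illustration. The one useful trick is that for a fixed range the exponential model is linear in the nugget and partial sill, so those two come out in closed form.

```python
import math

def exp_variogram(h, nugget, psill, rng):
    """Exponential semivariogram model: nugget + psill * (1 - exp(-h / rng))."""
    return nugget + psill * (1.0 - math.exp(-h / rng))

def fit_exponential_variogram(lags, gammas, npairs, range_grid):
    """Weighted least squares fit to binned semivariances (hypothetical helper).

    Each bin is weighted by its number of pairs. For every candidate range the
    nugget and partial sill are solved from the weighted normal equations, and
    the range with the smallest weighted sum of squared errors is kept.
    Note: nothing here forces the nugget or partial sill to be non-negative.
    """
    best = None
    for rng in range_grid:
        f = [1.0 - math.exp(-h / rng) for h in lags]
        sw = sum(npairs)
        sf = sum(w * fi for w, fi in zip(npairs, f))
        sff = sum(w * fi * fi for w, fi in zip(npairs, f))
        sg = sum(w * g for w, g in zip(npairs, gammas))
        sfg = sum(w * fi * g for w, fi, g in zip(npairs, f, gammas))
        det = sw * sff - sf * sf
        if abs(det) < 1e-12:
            continue  # this range makes the two columns collinear; skip it
        nugget = (sff * sg - sf * sfg) / det
        psill = (sw * sfg - sf * sg) / det
        sse = sum(w * (g - nugget - psill * fi) ** 2
                  for w, fi, g in zip(npairs, f, gammas))
        if best is None or sse < best[0]:
            best = (sse, nugget, psill, rng)
    if best is None:
        raise ValueError("no usable candidate range in range_grid")
    _, nugget, psill, rng = best
    return {"nugget": nugget, "psill": psill, "range": rng, "sill": nugget + psill}

# Synthetic, noise-free semivariances generated from known parameters
lags = [1, 2, 3, 4, 5, 6, 8, 10]
npairs = [120, 210, 260, 300, 310, 280, 220, 150]
gammas = [exp_variogram(h, 0.5, 2.0, 3.0) for h in lags]

fit = fit_exponential_variogram(lags, gammas, npairs, [0.5 * k for k in range(1, 21)])
print(fit)
```

With real data you would replace the synthetic `gammas` with the empirical semivariances from your own bins, and you would probably want a finer grid or a proper optimizer for the range.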

Weighted least squares

... or Why can't we just be ordinary squares?

In fitting your linear model, you may be interested in generating a prediction line that describes the relationship between your predictor(s) and your outcome. If you have constant variance in the errors (homoskedasticity), an ordinary least squares (OLS) approach is used to fit the model to the data and generate a best fit line. A best fit line minimizes the sum of squared differences between the observed data and the predictions made by the model. If your data show constant variance in the errors AND the errors are independent and normally distributed, then OLS is the maximum likelihood estimator.

However, in spatial statistics (the analysis of data with a spatial component that considers spatial dependency) we often use data that violate the assumption of constant error variance (that is, the errors are heteroskedastic). In this case, we use weighted least squares (WLS) to fit the model to the data and generate a best fit line. In WLS, the error assumptions are that errors are normally distributed with mean vector 0 and nonconstant variance-covariance matrix σ²W, where W is a diagonal matrix. See this post from Penn State for a short intro to the nonconstant variance-covariance matrix.
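A quick way to convince yourself that OLS and maximum likelihood agree under normal, constant-variance errors is to compute both. The sketch below uses made-up numbers and hypothetical helper names: it fits a line by the closed-form OLS formulas, then checks that nudging the coefficients away from the OLS values only lowers the Gaussian log-likelihood.

```python
import math

def ols_fit(x, y):
    """Closed-form OLS intercept and slope for a single predictor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

def gaussian_loglik(x, y, intercept, slope, sigma2):
    """Log-likelihood of the line under independent N(0, sigma2) errors."""
    n = len(x)
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sse / (2 * sigma2)

# Made-up data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

b0, b1 = ols_fit(x, y)
# ML estimate of the error variance (divides by n, not n - 2)
sigma2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / len(x)

best = gaussian_loglik(x, y, b0, b1, sigma2)
# Log-likelihood at four nearby coefficient pairs, same variance
nudged = [gaussian_loglik(x, y, b0 + d0, b1 + d1, sigma2)
          for d0 in (-0.1, 0.1) for d1 in (-0.05, 0.05)]
print(best, max(nudged))
```

The log-likelihood depends on the coefficients only through the sum of squared errors, which is why minimizing one is the same as maximizing the other.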
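Here is a small sketch of what WLS does with that diagonal matrix, for a single predictor. The function name and the data are made up for illustration. The assumption is that the i-th diagonal entry of W is proportional to the variance of the i-th error, so each observation gets a weight equal to the reciprocal of its entry: noisy points count for less.

```python
def wls_fit(x, y, w_diag):
    """Fit y = b0 + b1 * x assuming Var(error_i) = sigma^2 * w_diag[i].

    The weight for each point is 1 / w_diag[i]; the formulas are the usual
    OLS ones with weighted means and weighted sums of squares.
    """
    weights = [1.0 / w for w in w_diag]
    sw = sum(weights)
    xbar = sum(wt * xi for wt, xi in zip(weights, x)) / sw
    ybar = sum(wt * yi for wt, yi in zip(weights, y)) / sw
    sxx = sum(wt * (xi - xbar) ** 2 for wt, xi in zip(weights, x))
    sxy = sum(wt * (xi - xbar) * (yi - ybar)
              for wt, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# Made-up data where the scatter grows with x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.3, 5.5, 8.9, 8.6, 13.5]
w_diag = [xi ** 2 for xi in x]  # assumed: error variance proportional to x^2

print(wls_fit(x, y, w_diag))       # weighted fit
print(wls_fit(x, y, [1.0] * 6))    # equal entries in W reduce to OLS
```

When every diagonal entry of W is the same, the weights cancel and you are back to ordinary squares.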

Friday, April 15, 2011

Bulletin board wisdom

Question: Can I determine large scale variability by an indicator plot? When I look at a plot of my data by x coordinates, there seems to be a spatial trend.

Answer: The indicator plot suggests a large scale trend because it looks like your x coordinate drives the trend in your indicator plot, and coordinates are ipso facto large scale factors. In order to verify this, you should account for the x coordinate as a covariate, then do a semivariogram on the residuals (a residual semivariogram, natch) to determine whether this apparent trend was really being driven by that x coordinate or by some as-yet unaccounted-for variable.

You can account for any variable you feel may be driving your spatial trend and lump that variable into your large scale variability (as long as it is not a regional variable, obviously), then run a residual semivariogram to see if there are still spatial trends.
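The detrend-then-check recipe can be sketched in a few lines of plain Python. Everything here is illustrative: the helper names, the grid of locations, and the values are made up, and the trend is removed with a simple OLS line on the x coordinate. The empirical semivariance in each distance bin is half the average squared difference between pairs of points that far apart.

```python
import math

def detrend_on_x(xs, values):
    """Residuals from an OLS line of the values on the x coordinate."""
    n = len(xs)
    xbar = sum(xs) / n
    vbar = sum(values) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (v - vbar) for x, v in zip(xs, values))
    slope = sxy / sxx
    intercept = vbar - slope * xbar
    return [v - (intercept + slope * x) for x, v in zip(xs, values)]

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Return (bin midpoint, semivariance, pair count) for each non-empty bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            k = int(d // bin_width)
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * bin_width, s / (2 * c), c)
            for k, (s, c) in enumerate(zip(sums, counts)) if c > 0]

# Made-up data: a trend in x plus some smaller structure in y
coords = [(float(i), float(j)) for i in range(5) for j in range(5)]
values = [2.0 + 0.8 * x + 0.3 * math.sin(3.0 * y) for x, y in coords]

residuals = detrend_on_x([x for x, _ in coords], values)
raw_sv = empirical_semivariogram(coords, values, 1.0, 4)
resid_sv = empirical_semivariogram(coords, residuals, 1.0, 4)
for (h, g_raw, _), (_, g_res, _) in zip(raw_sv, resid_sv):
    print(h, g_raw, g_res)
```

If the residual semivariogram still climbs with distance after the x coordinate has been accounted for, something other than x is contributing to the trend.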