Sunday, November 16, 2014

Interpreting linear models in R

If you're new to R and stats, check out this awesome post over at the yhat blog. It walks you through everything, from the code to the analysis, in simple, straightforward language with code output.

On residuals:
If our residuals are normally distributed, this indicates that the mean of the difference between our predictions and the actual values is close to 0 (good), that when we miss, we miss both short and long of the actual value, and that the likelihood of a miss gets smaller as its distance from the actual value gets larger.
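A minimal sketch of checking this in R. The post uses its own dataset; here I'm substituting the built-in `mtcars` data just to have something runnable.

```r
# Fit a one-predictor linear model (mtcars is a stand-in dataset,
# not the one used in the yhat post).
fit <- lm(mpg ~ wt, data = mtcars)

res <- residuals(fit)
mean(res)                  # essentially 0 for any OLS fit with an intercept
hist(res)                  # roughly bell-shaped if residuals are normal
qqnorm(res); qqline(res)   # points hugging the line suggest normality
```

The histogram and Q-Q plot are the usual quick visual checks that misses fall on both sides of zero and thin out as they get larger.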

On variable p-values:
Probability the variable is NOT relevant. You want this number to be as small as possible. If the number is really small, R will display it in scientific notation. In our example, 2e-16 means that the odds that parent is meaningless are about 1 in 15000000000000000.

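In `summary()` output, these p-values live in the `Pr(>|t|)` column of the coefficient table. A quick sketch, again on `mtcars` rather than the post's data:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# The coefficient table: estimate, std. error, t value, and p-value per term.
summary(fit)$coefficients

# Pull out a single predictor's p-value; values below machine precision
# are printed by R as "< 2e-16".
coef(summary(fit))["wt", "Pr(>|t|)"]
```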
On R-squared:
Metric for evaluating the goodness of fit of your model. Higher is better with 1 being the best. Corresponds with the amount of variability in what you're predicting that is explained by the model.
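Both flavors of R-squared can be pulled straight off the summary object (a sketch on the stand-in `mtcars` data):

```r
fit <- lm(mpg ~ wt, data = mtcars)

summary(fit)$r.squared       # proportion of variance in mpg explained by wt
summary(fit)$adj.r.squared   # same, penalized for the number of predictors
```

Adjusted R-squared is the one to watch when comparing models with different numbers of predictors, since plain R-squared never goes down when you add a variable.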
On the F-test and resulting F-stat:
This takes the parameters of our model (in our case we only have 1) and compares it to a model that has fewer parameters (sic). In theory the model with more parameters should fit better. If the model with more parameters (your model) doesn't perform better than the model with fewer parameters, the F-test will have a high p-value (high probability that the extra parameters give NO significant boost). If the model with more parameters is better than the model with fewer parameters, you will have a lower p-value.
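The same comparison can be run explicitly with `anova()`, which pits your model against an intercept-only baseline and reproduces the F-statistic printed at the bottom of `summary()`. A sketch on the stand-in `mtcars` data:

```r
fit0 <- lm(mpg ~ 1,  data = mtcars)   # fewer parameters: intercept only
fit1 <- lm(mpg ~ wt, data = mtcars)   # our model, one extra parameter

anova(fit0, fit1)          # F-test: does wt significantly improve the fit?
summary(fit1)$fstatistic   # same F value, with numerator/denominator df
```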
All quoted text is from the post "Fitting & Interpreting Linear Models in R" by yhat, published May 18, 2013 at http://blog.yhathq.com/posts/r-lm-summary.html
