R for Researchers: Regression (OLS) solutions
April 2015
This article contains solutions to exercises for an article in the series R for Researchers. For a list of topics covered by this series, see the Introduction article. If you're new to R, we highly recommend reading the articles in order.
There is often more than one approach to the exercises. Do not be concerned if your approach is different from the solution provided.
These solutions require that the solutions from the prior lesson have already been run in your R session.
Exercise solutions
These exercises use the alfalfa dataset and the work you started on the alfAnalysis script. Open the script and run all the commands in the script to prepare your session for these problems.
Note: in these exercises we use the shade and irrig variables as continuous variables. They could also be treated as factor variables, but since both represent increasing levels, we first try them as continuous (scale) variables.
Set the reference level of the inoc variable to cntrl.
#######################################################
#######################################################
##
## Regression
##
#######################################################
#######################################################
str(alfalfa$inoc)
Factor w/ 5 levels "A","B","C","cntrl",..: 1 2 5 3 4 5 4 2 1 3 ...
alfalfa$inoc <- factor(alfalfa$inoc,levels=c("cntrl","A","B","C","D") )
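Base R's relevel() is another way to change only the reference level: it moves the named level to the front and keeps the remaining levels in their existing order. A minimal sketch on a small hypothetical factor (the values below are made up and are not the alfalfa data):

```r
# Hypothetical inoculum labels; not the alfalfa data
inoc <- factor(c("A", "B", "D", "C", "cntrl", "D", "cntrl"))
levels(inoc)   # default order depends on the locale's sort order

# relevel() makes "cntrl" the first (reference) level
inoc <- relevel(inoc, ref = "cntrl")
levels(inoc)
```

One practical difference: factor() with a misspelled entry in levels silently turns the affected values into NA, while relevel() stops with an error if ref is not an existing level.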
Create a quadratic polynomial term for the shade variable using poly().
shade2 <- poly(alfalfa$shade, degree=2)
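By default poly() returns orthogonal polynomial columns, so the shade21 and shade22 coefficients in the fits below are orthogonal-polynomial components rather than the coefficients of shade and shade squared. A short sketch on hypothetical shade levels 1 to 5 (not the alfalfa values) shows that the columns are orthonormal and orthogonal to the intercept, and that raw = TRUE, which gives plain powers, spans the same space:

```r
# Hypothetical shade values; the alfalfa data are not used here
shade <- rep(1:5, times = 5)

ortho <- poly(shade, degree = 2)               # orthogonal columns (the default)
rawp  <- poly(shade, degree = 2, raw = TRUE)   # columns are shade and shade^2

# Each orthogonal column sums to zero, and their cross-products form the identity
round(colSums(ortho), 10)
round(crossprod(ortho), 10)

# The raw basis reproduces the orthogonal columns exactly (same column space)
all.equal(unname(fitted(lm(ortho[, 2] ~ rawp))), unname(ortho[, 2]))
```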
Regress yield on irrig, inoc, the quadratic shade term, and all of their two-way interactions.
out <- lm(yield~(irrig+inoc+shade2)^2, data=alfalfa)
summary(out)
Call:
lm(formula = yield ~ (irrig + inoc + shade2)^2, data = alfalfa)

Residuals:
         1          2          3          4          5          6          7
-1.403e-02  2.053e-02 -2.149e-01 -1.712e-01  1.621e-01  6.807e-01 -3.241e-01
         8          9         10         11         12         13         14
 2.053e-02  5.610e-02  3.044e-01  1.141e-01 -7.523e-01 -8.415e-02  5.321e-16
        15         16         17         18         19         20         21
-1.847e-01  3.241e-01  5.610e-02 -4.565e-01  2.258e-01  3.224e-01 -8.210e-02
        22         23         24         25
 2.092e-01 -1.621e-01 -3.582e-02 -1.403e-02

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    32.4643     1.3937  23.294 0.000173 ***
irrig          -1.1195     0.4020  -2.785 0.068729 .
inocA           4.3853     1.9397   2.261 0.108848
inocB          -1.1339     2.5535  -0.444 0.687085
inocC           1.5821     1.5433   1.025 0.380738
inocD           4.6810     1.6295   2.873 0.063909 .
shade21         4.0046     5.0646   0.791 0.486852
shade22        -8.5243     7.6699  -1.111 0.347454
irrig:inocA     0.6117     0.5686   1.076 0.360816
irrig:inocB     2.6848     0.8416   3.190 0.049701 *
irrig:inocC     1.7532     0.5001   3.505 0.039332 *
irrig:inocD     0.1157     0.4993   0.232 0.831676
irrig:shade21   2.5552     1.2161   2.101 0.126428
irrig:shade22   3.4764     1.8453   1.884 0.156083
inocA:shade21  -9.3599     5.2525  -1.782 0.172771
inocB:shade21  -1.4753     3.4398  -0.429 0.696927
inocC:shade21   4.1493     3.2650   1.271 0.293373
inocD:shade21  -0.5848     4.5746  -0.128 0.906373
inocA:shade22  -8.8399     4.1364  -2.137 0.122187
inocB:shade22   7.3414     7.2192   1.017 0.384063
inocC:shade22   0.8405     3.5126   0.239 0.826294
inocD:shade22  -3.5093     3.1239  -1.123 0.343060
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8035 on 3 degrees of freedom
Multiple R-squared:  0.9935,  Adjusted R-squared:  0.9478
F-statistic: 21.74 on 21 and 3 DF,  p-value: 0.01347
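Two details of this fit are easy to miss. In an R formula, (...)^2 expands to the main effects plus every two-way interaction, and nothing of higher order. The model also estimates 22 coefficients from 25 observations, which is why the summary reports only 3 residual degrees of freedom. A sketch that checks both, using a hypothetical data frame with the same variable types as alfalfa (toy, toy_shade2, and their values are made up for illustration; only the column count matters):

```r
# (a + b + c)^2 expands to main effects plus all two-way interactions
labs <- attr(terms(y ~ (a + b + c)^2), "term.labels")
labs   # "a" "b" "c" "a:b" "a:c" "b:c"

# Hypothetical 25-row data: numeric irrig, 5-level inoc, quadratic shade term
toy <- expand.grid(inoc = c("cntrl", "A", "B", "C", "D"), shade = 1:5,
                   stringsAsFactors = FALSE)
toy$inoc  <- factor(toy$inoc, levels = c("cntrl", "A", "B", "C", "D"))
toy$irrig <- (as.integer(toy$inoc) + toy$shade) %% 5 + 1
toy_shade2 <- poly(toy$shade, degree = 2)

# Columns: 1 intercept + 1 irrig + 4 inoc + 2 shade + 4 + 2 + 8 interactions
X <- model.matrix(~ (irrig + inoc + toy_shade2)^2, data = toy)
ncol(X)             # 22 coefficients
nrow(X) - ncol(X)   # 3 residual degrees of freedom
```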
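Use the backward selection method to reduce the model. Use the significance of the term as the criterion, as was done in the lesson.
Two approaches are shown in this solution. The first calls step(), which automates backward elimination. Note that step() chooses terms by AIC; test="F" only adds F-test columns to its table. Because no single deletion gives a lower AIC than the full model (-19.94), step() removes nothing and returns the full model. The second approach uses significance, as in the lesson: call drop1() repeatedly and remove the least significant term at each stage.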
step(out, test="F")
Start:  AIC=-19.94
yield ~ (irrig + inoc + shade2)^2

               Df Sum of Sq     RSS      AIC F value  Pr(>F)
<none>                       1.9370 -19.9437
- irrig:shade2  2    4.9536  6.8906   7.7819  3.8361 0.14904
- inoc:shade2   8   17.6767 19.6137  21.9338  3.4222 0.16995
- irrig:inoc    4   15.2447 17.1817  26.6242  5.9028 0.08823 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = yield ~ (irrig + inoc + shade2)^2, data = alfalfa)

Coefficients:
  (Intercept)          irrig          inocA          inocB          inocC
      32.4643        -1.1195         4.3853        -1.1339         1.5821
        inocD        shade21        shade22    irrig:inocA    irrig:inocB
       4.6810         4.0046        -8.5243         0.6117         2.6848
  irrig:inocC    irrig:inocD  irrig:shade21  irrig:shade22  inocA:shade21
       1.7532         0.1157         2.5552         3.4764        -9.3599
inocB:shade21  inocC:shade21  inocD:shade21  inocA:shade22  inocB:shade22
      -1.4753         4.1493        -0.5848        -8.8399         7.3414
inocC:shade22  inocD:shade22
       0.8405        -3.5093
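For lm fits, step() gets AIC from extractAIC(), which computes n log(RSS/n) + 2 edf, where edf is the number of estimated coefficients. A sketch that recomputes the AIC column of the step() table from its RSS values (n = 25; the full model has 22 coefficients, and the deleted terms carry 2, 8, and 4 of them) shows that the <none> row has the smallest AIC. It also reproduces one of the F values:

```r
# RSS values copied from the step() table above
n   <- 25
rss <- c(none = 1.9370, irrig_shade2 = 6.8906,
         inoc_shade2 = 19.6137, irrig_inoc = 17.1817)
# Coefficients left in each model: 22 in the full model, minus those dropped
edf <- 22 - c(0, 2, 8, 4)

aic <- n * log(rss / n) + 2 * edf   # the formula extractAIC() uses for lm fits
round(aic, 2)
names(which.min(aic))               # "none": no deletion lowers AIC

# F for dropping irrig:shade2: (change in RSS / 2 df) / (full RSS / 3 residual df)
f_irrig_shade2 <- ((6.8906 - 1.9370) / 2) / (1.9370 / 3)
round(f_irrig_shade2, 2)
```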
# Remove inoc:shade2, the term with the largest p-value (0.170) in the step() table
out2 <- lm(yield~irrig+inoc+shade2+irrig:inoc+irrig:shade2, data=alfalfa)
drop1(out2, test="F")
Single term deletions

Model:
yield ~ irrig + inoc + shade2 + irrig:inoc + irrig:shade2
             Df Sum of Sq    RSS    AIC F value Pr(>F)
<none>                    19.614 21.934
irrig:inoc    4   16.5458 36.159 29.227  2.3199 0.1216
irrig:shade2  2    1.6273 21.241 19.926  0.4563 0.6451
# Remove irrig:shade2 (p = 0.645)
out3 <- lm(yield~irrig+inoc+shade2+irrig:inoc, data=alfalfa)
drop1(out3, test="F")
Single term deletions

Model:
yield ~ irrig + inoc + shade2 + irrig:inoc
           Df Sum of Sq    RSS    AIC F value    Pr(>F)
<none>                  21.241 19.926
shade2      2    63.341 84.582 50.471 19.3832 0.0001257 ***
irrig:inoc  4    20.399 41.639 28.754  3.1211 0.0526577 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Remove irrig:inoc (p = 0.053, above 0.05)
out4 <- lm(yield~irrig+inoc+shade2, data=alfalfa)
drop1(out4, test="F")
Single term deletions

Model:
yield ~ irrig + inoc + shade2
       Df Sum of Sq     RSS    AIC F value    Pr(>F)
<none>               41.639 28.754
irrig   1    14.797  56.436 34.356   6.041   0.02501 *
inoc    4   155.894 197.534 59.676  15.912 1.380e-05 ***
shade2  2    84.328 125.967 52.429  17.214 8.196e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
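The drop1() steps above (out2 through out4) follow a simple rule: refit without the term that has the largest F-test p-value, and stop once every remaining term is significant. A minimal sketch of that loop as a helper function follows. backward_f() is a hypothetical name, not a base R function, and the 0.05 threshold is an assumption matching the lesson's significance criterion. drop1() only offers terms whose removal respects model hierarchy, so a main effect is not considered while it still appears in an interaction. The example call uses the built-in mtcars data only so that the sketch runs on its own.

```r
# Hypothetical helper: backward elimination on F-test p-values
backward_f <- function(fit, alpha = 0.05) {
  repeat {
    tab <- drop1(fit, test = "F")      # F test for each droppable term
    p <- tab[["Pr(>F)"]][-1]           # row 1 is <none>, which has no test
    names(p) <- rownames(tab)[-1]
    p <- p[!is.na(p)]
    if (length(p) == 0 || max(p) <= alpha) return(fit)
    worst <- names(p)[which.max(p)]    # least significant term
    message("Dropping ", worst, " (p = ", signif(max(p), 3), ")")
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
}

# Example on built-in data so the sketch runs on its own
fit <- lm(mpg ~ (wt + hp + qsec)^2, data = mtcars)
final <- backward_f(fit)
formula(final)
```

In the alfAnalysis session, backward_f(out) should follow the same path as the manual steps, dropping inoc:shade2, irrig:shade2, and irrig:inoc in turn and stopping at the model fitted as out4.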
# Replace the quadratic shade term with a linear shade term
out5 <- lm(yield~irrig+inoc+shade, data=alfalfa)
drop1(out5, test="F")
Single term deletions

Model:
yield ~ irrig + inoc + shade
       Df Sum of Sq     RSS    AIC F value    Pr(>F)
<none>               45.576 29.013
irrig   1    14.797  60.373 34.042  5.8439   0.02646 *
inoc    4   155.894 201.470 58.169 15.3924 1.236e-05 ***
shade   1    80.391 125.967 52.429 31.7501 2.402e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
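Replacing the quadratic shade term with a linear one is a nested-model comparison, and in the session anova(out5, out4) should report the corresponding F test directly. As a sketch, the same statistic can be computed from the residual sums of squares shown in the drop1() tables for out4 and out5: out4 has 8 coefficients (17 residual df) and out5 has 7 (18 residual df), with n = 25.

```r
# RSS values copied from the drop1() tables for out4 and out5
rss_quad <- 41.639; df_quad <- 25 - 8   # out4: yield ~ irrig + inoc + shade2
rss_lin  <- 45.576; df_lin  <- 25 - 7   # out5: yield ~ irrig + inoc + shade

# Nested-model F test for the extra (quadratic) shade coefficient
f_stat <- ((rss_lin - rss_quad) / (df_lin - df_quad)) / (rss_quad / df_quad)
p_val  <- pf(f_stat, df_lin - df_quad, df_quad, lower.tail = FALSE)
round(c(F = f_stat, p = p_val), 3)
```

The resulting p-value is well above 0.05, which is consistent with keeping only the linear shade term.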
Commit your changes to the alfAnalysis script.
There is no code associated with the solution to this problem.
Return to the Regression (OLS) article.
Last Revised: 3/2/2015