The Book of R (Question 20.2). Please answer using R code.
Continue using the survey data frame from the package MASS for the next few exercises.
- (a) The survey data set has a variable named Exer, a factor with k = 3 levels describing the amount of physical exercise time each student gets: none, some, or frequent. Obtain a count of the number of students in each category and produce side-by-side boxplots of student height split by exercise.
- (b) Assuming independence of the observations and normality as usual, fit a linear regression model with height as the response variable and exercise as the explanatory variable (dummy coding). What’s the default reference level of the predictor? Produce a model summary.
- (c) Draw a conclusion based on the fitted model from (b): does it appear that exercise frequency has any impact on mean height? What is the nature of the estimated effect?
- (d) Predict the mean heights of one individual in each of the three exercise categories, accompanied by 95 percent prediction intervals.
- (e) Do you arrive at the same result and interpretation for the height-by-exercise model if you construct an ANOVA table using aov?
- (f) Is there any change to the outcome of (e) if you alter the model so that the reference level of the exercise variable is “none”? Would you expect there to be?
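One possible way to set up the survey exercises above is sketched below. It is an outline under assumptions, not a reference solution: it assumes the MASS package is installed, and the object names fit_exer, new_exer, pred_exer, aov_exer, survey_none, and aov_none are hypothetical names chosen for illustration. The interpretation of the output is left to the reader.

```r
library(MASS)  # provides the survey data frame

# Count of students per exercise level, then side-by-side boxplots of height
table(survey$Exer)
boxplot(Height ~ Exer, data = survey,
        xlab = "Exercise", ylab = "Height (cm)")

# Linear model with dummy coding; lm() drops rows with missing Height
fit_exer <- lm(Height ~ Exer, data = survey)
levels(survey$Exer)[1]   # the first factor level is the default reference
summary(fit_exer)

# One new observation per exercise level, with 95% prediction intervals
new_exer <- data.frame(
  Exer = factor(levels(survey$Exer), levels = levels(survey$Exer))
)
pred_exer <- predict(fit_exer, newdata = new_exer,
                     interval = "prediction", level = 0.95)
rownames(pred_exer) <- levels(survey$Exer)
pred_exer

# One-way ANOVA table for the same model
aov_exer <- aov(Height ~ Exer, data = survey)
summary(aov_exer)

# Same ANOVA after changing the reference level to "None" (on a copy)
survey_none <- survey
survey_none$Exer <- relevel(survey_none$Exer, ref = "None")
aov_none <- aov(Height ~ Exer, data = survey_none)
summary(aov_none)
```

Changing the reference level only changes which group the dummy-coded coefficients are compared against, so the overall F test in the two ANOVA tables can be compared directly with the F-statistic line at the bottom of summary(fit_exer).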
Now, turn back to the ready-to-use mtcars data set. One of the variables in this data frame is qsec, described as the time in seconds it takes to race a quarter mile; another is gear, the number of forward gears (cars in this data set have either 3, 4, or 5 gears).
- (g) Using the vectors straight from the data frame, fit a simple linear regression model with qsec as the response variable and gear as the explanatory variable and interpret the model summary.
- (h) Explicitly convert gear to a factor vector and refit the model. Compare the model summary with that from (g). What do you find?
- (i) Explain, with the aid of a relevant plot in the same style as the right image of Figure 20-6, why you think there is a difference between the two models (g) and (h).
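A minimal sketch of the mtcars exercises above follows. The object names fit_num, fit_fac, and gear_means are hypothetical. Figure 20-6 is not reproduced here, so the plot style is an assumption: the raw data plotted against gear, with the fitted line from the numeric model and the per-gear sample means overlaid.

```r
# gear used as-is (numeric): one intercept and one slope
fit_num <- lm(qsec ~ gear, data = mtcars)
summary(fit_num)

# gear converted to a factor: dummy coding with gear = 3 as the reference
fit_fac <- lm(qsec ~ factor(gear), data = mtcars)
summary(fit_fac)

# Assumed plot style: raw observations, numeric-model line, and group means
plot(mtcars$gear, mtcars$qsec, xaxt = "n",
     xlab = "Number of forward gears", ylab = "Quarter-mile time (s)")
axis(1, at = 3:5)
abline(fit_num, lty = 2)                       # straight line from fit_num
gear_means <- tapply(mtcars$qsec, mtcars$gear, mean)
points(as.numeric(names(gear_means)), gear_means,
       pch = 4, cex = 2, lwd = 2)              # sample mean at each gear level
```

The numeric model forces the mean of qsec to change by the same amount for each additional gear, whereas the factor model estimates a separate mean for each gear level. Comparing the dashed line with the marked group means shows how well a straight line describes the three groups.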