sales | sqft | adv_cost | inventory | distance | district_size | storecount |
231 | 1.47 | 7.62 | 897 | 10.9 | 79.48 | 40 |
232 | 1.53 | 9.57 | 892 | 9.4 | 51.154 | 12 |
156 | 1.68 | 8.37 | 542 | 7.9 | 60.358 | 41 |
157 | 1.355 | 6.73 | 552 | 6.8 | 55.561 | 68 |
10 | 1.33 | 1.66 | 242 | 3.5 | 89.624 | 14 |
10 | 1.33 | 1.17 | 235 | 3.6 | 86.898 | 62 |
519 | 1.89 | 12.96 | 3670 | 18.5 | 108.857 | 56 |
520 | 1.885 | 12.02 | 3657 | 19.1 | 100.685 | 75 |
437 | 1.7 | 12.29 | 3345 | 17.4 | 90.138 | 59 |
487 | 1.86 | 12.5 | 3322 | 16.5 | 111.284 | 22 |
299 | 1.4 | 9.86 | 1784 | 11.5 | 75.606 | 26 |
195 | 1.63 | 7.22 | 1230 | 9.8 | 64.245 | 27 |
20 | 1.24 | 5.23 | 483 | 2.4 | 55.929 | 11 |
68 | 1.51 | 3.93 | 114 | 4.5 | 73.187 | 33 |
428 | 1.78 | 11.04 | 2829 | 16.4 | 101.192 | 51 |
429 | 1.725 | 9.43 | 3410 | 15.7 | 80.694 | 16 |
464 | 1.72 | 12.19 | 2873 | 15.8 | 105.254 | 84 |
15 | 1.2 | 1.17 | 289 | 3.2 | 80.937 | 31 |
65 | 1.47 | 6.56 | 292 | 3.9 | 80.187 | 97 |
66 | 1.51 | 5.55 | 312 | 3.8 | 85.897 | 66 |
98 | 1.24 | 5.79 | 235 | 6.4 | 90.219 | 75 |
338 | 1.65 | 3.34 | 1160 | 12.1 | 121.988 | 84 |
249 | 1.513 | 2.23 | 1184 | 9.7 | 115.277 | 12 |
161 | 1.4 | 6.95 | 399 | 7.9 | 50.188 | 14 |
467 | 1.46 | 13.17 | 2062 | 16.1 | 101.211 | 89 |
398 | 1.84 | 11.68 | 2103 | 15.9 | 95.406 | 49 |
497 | 1.68 | 12.11 | 2743 | 18 | 80.195 | 14 |
528 | 1.94 | 10.98 | 3779 | 18 | 110.025 | 58 |
529 | 1.765 | 11.11 | 3916 | 18.9 | 103.26 | 52 |
99 | 1.31 | 4.35 | 782 | 4.8 | 111.732 | 52 |
100 | 1.525 | 3.79 | 804 | 4.7 | 99.7 | 41 |
1 | 1.45 | 4.68 | 1116 | 3.4 | 85.882 | 50 |
347 | 1.65 | 10.08 | 2223 | 13.4 | 94.181 | 49 |
348 | 1.811 | 7.87 | 2180 | 12.1 | 95.242 | 50 |
341 | 1.64 | 10.34 | 1494 | 14.3 | 70.693 | 28 |
557 | 1.66 | 13.55 | 3522 | 18.5 | 94.329 | 43 |
508 | 1.698 | 11.53 | 3521 | 16.7 | 99.917 | 50 |
In the “HomeSales” dataset, the response variable, sales, depends on six potential predictor variables: sq_ft, adv_cost, inventory, distance, district_size, and storecount. Fit four simple linear regression (SLR) models corresponding to the four predictors sq_ft, adv_cost, inventory, and distance. Then, for each model, create a normal probability plot and a histogram of the residuals, together with the two residual scatterplots: residuals vs. fitted values and residuals vs. observation order.
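As a rough illustration of the fitting step, the sketch below computes a least-squares SLR fit and the quantities behind the four residual plots using only the Python standard library. The function names `fit_slr` and `normal_plot_points` are hypothetical helpers introduced here, the plotting-position formula is one common convention among several, and only the first six rows of the table are used to keep the example short; the actual exercise would use all 37 rows and a plotting tool of your choice.

```python
from statistics import NormalDist, mean

def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, fitted, residuals)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return b0, b1, fitted, residuals

def normal_plot_points(residuals):
    """(theoretical normal quantile, ordered residual) pairs for a normal probability plot."""
    n = len(residuals)
    # Plotting positions (i - 0.375) / (n + 0.25): an assumed, commonly used convention
    quantiles = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(residuals)))

# First six rows of the table above, for illustration only
sales = [231, 232, 156, 157, 10, 10]
distance = [10.9, 9.4, 7.9, 6.8, 3.5, 3.6]

b0, b1, fitted, residuals = fit_slr(distance, sales)

points = normal_plot_points(residuals)                    # normal probability plot
resid_vs_fitted = list(zip(fitted, residuals))            # residuals vs. fitted values
resid_vs_order = list(enumerate(residuals, start=1))      # residuals vs. observation order

print(f"sales = {b0:.2f} + {b1:.2f} * distance")
```

Swapping `distance` for another predictor column gives the other three fits; a histogram of `residuals` completes the set of four plots.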
(a) What do the residual plots for the model with sq_ft as the predictor indicate about the validity of this regression model and the assumptions made about the errors?
(b) What do the residual plots for the model with adv_cost as the predictor indicate about the validity of this regression model and the assumptions made about the errors?
(c) What do the residual plots for the model with inventory as the predictor indicate about the validity of this regression model and the assumptions made about the errors?
(d) What do the residual plots for the model with distance as the predictor indicate about the validity of this regression model and the assumptions made about the errors?
(e) One objective of this analysis is to obtain an appropriate simple linear regression model that can be used to estimate the average sales based on a single predictor. State your “best” choice based on your conclusions in parts (a)–(d).
(f) Complete the table below using the regression results for the four simple linear regression models considered in parts (a)–(d). Based on the table entries, would you change your “best” choice from part (e)?
Model predictor | S | R² | t-stat |
sqft | 110.75 | 66.44% | 8.32 |
adv_cost | | | |
inventory | | | |
distance | | | |
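The three table columns can be computed directly from an SLR fit: S is the residual standard error with n − 2 degrees of freedom, R² is 1 − SSE/SST, and the t-stat is the slope divided by its standard error. The sketch below is one way to do this with the standard library; `slr_summary` is a hypothetical helper name, and the six rows shown are only the first rows of the data table, so the printed values are not the entries for the full dataset.

```python
from math import sqrt
from statistics import mean

def slr_summary(x, y):
    """Return (S, R-squared, slope t-statistic) for the least-squares fit y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total sum of squares
    s = sqrt(sse / (n - 2))          # residual standard error
    r2 = 1 - sse / sst               # coefficient of determination
    t = b1 / (s / sqrt(sxx))         # slope estimate over its standard error
    return s, r2, t

# First six rows of the data table, for illustration only
sales = [231, 232, 156, 157, 10, 10]
adv_cost = [7.62, 9.57, 8.37, 6.73, 1.66, 1.17]

s, r2, t = slr_summary(adv_cost, sales)
print(f"S = {s:.2f}, R2 = {100 * r2:.2f}%, t-stat = {t:.2f}")
```

In simple linear regression the entries are tied together by t² = (n − 2)·R²/(1 − R²). The sqft row is consistent with this for the 37 observations: 35 × 0.6644/0.3356 ≈ 69.3, and 8.32² ≈ 69.2.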
(g) A model including the predictor variable adv_cost is of specific interest. Obtain appropriate residual plots and determine whether adding either district_size or storecount as an additional predictor to the SLR model with predictor adv_cost is likely to improve its fit.
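One common way to approach this last step is to plot the residuals from the adv_cost model against each candidate predictor; a clear trend suggests the candidate explains variation that adv_cost leaves behind. The sketch below computes a numeric companion to those plots, the correlation between the residuals and each omitted variable. The helper names are hypothetical, only the first six data rows are used, and a correlation captures only linear patterns, so it supplements the plots rather than replacing them.

```python
from math import sqrt
from statistics import mean

def slr_residuals(x, y):
    """Residuals from the least-squares fit y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def correlation(u, v):
    """Pearson correlation between two equal-length sequences."""
    u_bar, v_bar = mean(u), mean(v)
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    svv = sum((vi - v_bar) ** 2 for vi in v)
    return suv / sqrt(suu * svv)

# First six rows of the data table, for illustration only
sales = [231, 232, 156, 157, 10, 10]
adv_cost = [7.62, 9.57, 8.37, 6.73, 1.66, 1.17]
district_size = [79.48, 51.154, 60.358, 55.561, 89.624, 86.898]
storecount = [40, 12, 41, 68, 14, 62]

resid = slr_residuals(adv_cost, sales)

# Points for the two plots: residuals vs. each omitted predictor
resid_vs_district = list(zip(district_size, resid))
resid_vs_stores = list(zip(storecount, resid))

r_district = correlation(resid, district_size)
r_stores = correlation(resid, storecount)
print(f"corr(resid, district_size) = {r_district:.3f}")
print(f"corr(resid, storecount)    = {r_stores:.3f}")
```

A correlation near zero, together with a patternless plot, would suggest the candidate adds little to the adv_cost model.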