The dataset HomesForSaleCA contains a random sample of 30 houses for sale in California. Suppose that we are interested in predicting the Size (in thousands of square feet) for such homes.
State  Price  Size  Beds  Baths
CA       500   3.2     5    3.5
CA       995   3.7     4    3.5
CA       609   2.2     4    3
CA      1199   2.8     3    2.5
CA       949   1.4     3    2
CA       415   1.7     3    2.5
CA       895   2.1     3    2
CA       775   1.6     3    3
CA       109   0.6     1    1
CA      5900   4.8     4    4.5
CA       219   1.1     3    2
CA       255   1.2     3    2
CA        86   0.6     1    1
CA        62   1.2     3    2
CA       165   1.9     5    3.5
CA      1695   6.9     5    5.5
CA       499   1.4     3    2
CA        47   1.5     3    2
CA       195   2       3    2.5
CA       775   1       2    2
CA       199   1.4     3    2
CA       480   3       5    3
CA       173   0.9     3    1
CA       189   2.5     2    2
CA       230   1.7     3    2
CA       380   2.1     5    3
CA       110   0.8     2    1
CA       499   1.3     3    2
CA       399   1.4     3    2
CA      2450   5       4    5
1. What is the total variability in the sizes of the 30 homes in this sample? (Hint: Try a regression ANOVA with any of the other variables as a predictor.)
2. What other variable in the HomesForSaleCA dataset explains the greatest amount of the total variability in home sizes? Explain how you decide on the variable.
3. How much of the total variability in home sizes is explained by the "best" variable identified in question 2? Give the answer both as a raw number and as a percentage.
4. Which of the variables in the dataset is the weakest predictor of home sizes? How much of the variability does it explain?
5. Is the weakest predictor identified in question 4 still an effective predictor of home sizes? Include some justification for your answer.
Thank you for your help!
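One way to approach all five questions is to fit a simple linear regression of Size on each of the other numeric variables and compare the ANOVA sums of squares. The following is a minimal sketch in plain Python, assuming the 30 rows are typed in exactly as listed in the table; the names `ROWS`, `column`, `anova_simple`, and `results` are made up for this illustration and do not come from any textbook or package.

```python
# Rows copied from the table in the question: (Price, Size, Beds, Baths).
# The State column is omitted because it is "CA" for every home.
ROWS = [
    (500, 3.2, 5, 3.5), (995, 3.7, 4, 3.5), (609, 2.2, 4, 3),
    (1199, 2.8, 3, 2.5), (949, 1.4, 3, 2), (415, 1.7, 3, 2.5),
    (895, 2.1, 3, 2), (775, 1.6, 3, 3), (109, 0.6, 1, 1),
    (5900, 4.8, 4, 4.5), (219, 1.1, 3, 2), (255, 1.2, 3, 2),
    (86, 0.6, 1, 1), (62, 1.2, 3, 2), (165, 1.9, 5, 3.5),
    (1695, 6.9, 5, 5.5), (499, 1.4, 3, 2), (47, 1.5, 3, 2),
    (195, 2, 3, 2.5), (775, 1, 2, 2), (199, 1.4, 3, 2),
    (480, 3, 5, 3), (173, 0.9, 3, 1), (189, 2.5, 2, 2),
    (230, 1.7, 3, 2), (380, 2.1, 5, 3), (110, 0.8, 2, 1),
    (499, 1.3, 3, 2), (399, 1.4, 3, 2), (2450, 5, 4, 5),
]
COLUMNS = ("Price", "Size", "Beds", "Baths")


def column(name):
    """Return one column of ROWS as a list of numbers."""
    idx = COLUMNS.index(name)
    return [row[idx] for row in ROWS]


def anova_simple(x, y):
    """Least-squares fit of y on x, returning the regression ANOVA pieces."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    ss_total = sum((yi - y_bar) ** 2 for yi in y)               # total variability
    ss_model = sum((fi - y_bar) ** 2 for fi in fitted)          # explained by x
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # left unexplained
    return {
        "ss_total": ss_total,
        "ss_model": ss_model,
        "ss_error": ss_error,
        "r_squared": ss_model / ss_total,
        # F statistic with 1 and n - 2 degrees of freedom
        "f_stat": ss_model / (ss_error / (n - 2)),
    }


size = column("Size")
results = {name: anova_simple(column(name), size)
           for name in ("Price", "Beds", "Baths")}

for name, r in sorted(results.items(), key=lambda kv: -kv[1]["r_squared"]):
    print(f"{name:5s}  SSTotal={r['ss_total']:.3f}  SSModel={r['ss_model']:.3f}  "
          f"R^2={100 * r['r_squared']:.1f}%  F={r['f_stat']:.2f}")
```

SSTotal is the same in every row of the output because it depends only on Size, which is the point of the hint in question 1. The predictor with the largest SSModel (equivalently the largest R^2) answers questions 2 and 3, and the one with the smallest answers question 4. For question 5, the weakest predictor's F statistic can be compared against an F distribution with 1 and 28 degrees of freedom, whose 5% critical value is roughly 4.20.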