2024 Fall
Homework 5
Due by: Nov. 20, 2024, 11:59 PM
Instructions:
1. Print your First and Last name and NetID on your answer sheets
2. Submit all your answers, including Python scripts and the report, to Brightspace by the due date, either as a single JupyterLab notebook (.ipynb) or as a notebook accompanied by a single PDF. No other file formats will be graded. No late submissions will be accepted.
3. Total 3 problems. Total points: 100
1. (30 points)
Predict the per capita crime rate in the Boston.csv data set. Split the data set into 70% for a training set and 30% for a test set. Fit a lasso model, a ridge regression model, and a PCR model. Use cross-validation to determine λ and M (the number of PCs). Present the test error and discuss the results for each approach that you consider.
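As a rough illustration of the workflow (split, cross-validate, evaluate on the test set), here is a minimal sketch using scikit-learn. It runs on synthetic stand-in data rather than Boston.csv, and the λ grid, the 10-fold setting, and all variable names are assumptions made for illustration only. Note that scikit-learn calls λ `alpha`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with the predictors and
# response loaded from Boston.csv.
X, y = make_regression(n_samples=300, n_features=12, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Candidate lambda values (an arbitrary illustrative grid).
alphas = np.logspace(-3, 3, 50)

models = {
    # Lasso and ridge: lambda chosen by 10-fold CV on the training set.
    "lasso": make_pipeline(StandardScaler(),
                           LassoCV(alphas=alphas, cv=10, max_iter=10000)),
    "ridge": make_pipeline(StandardScaler(), RidgeCV(alphas=alphas, cv=10)),
    # PCR: PCA followed by least squares, with M chosen by 10-fold CV.
    "pcr": GridSearchCV(
        make_pipeline(StandardScaler(), PCA(), LinearRegression()),
        param_grid={"pca__n_components": list(range(1, X.shape[1] + 1))},
        cv=10, scoring="neg_mean_squared_error"),
}

test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))

best_lambda_lasso = models["lasso"][-1].alpha_
best_lambda_ridge = models["ridge"][-1].alpha_
best_M = models["pcr"].best_params_["pca__n_components"]
print(test_mse, best_lambda_lasso, best_lambda_ridge, best_M)
```

Standardizing inside the pipeline keeps the scaling parameters estimated on training folds only, which avoids leaking test information into the cross-validation.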
2. (30 points)
Predict the number of applications received using the other variables in the College.csv data set. Split the data set into 60% for a training set and 40% for a test set.
(a) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
(b) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
(c) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
3. (40 points)
Use the following code to generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them.
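The generating code is not reproduced in this copy of the assignment. One common way to build such a data set, offered only as a sketch and not necessarily the code the assignment intends, draws two uniform predictors and labels each point by the sign of a quadratic expression. The seed, the uniform range, and the specific boundary x1² − x2² = 0 are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
n = 500

# Two predictors, each uniform on [-0.5, 0.5).
x1 = rng.uniform(size=n) - 0.5
x2 = rng.uniform(size=n) - 0.5

# Class label depends on a quadratic function of the predictors,
# so the true decision boundary is x1^2 - x2^2 = 0.
y = (x1**2 - x2**2 > 0).astype(int)
```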
(a) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis and X2 on the y-axis.
(b) Fit a logistic regression model to the data using X1, X2, X1^2, X2^2, and X1×X2 as predictors.
Obtain a class prediction for each training observation (using full data set). Plot the observations, colored according to the predicted class labels.
(c) Fit an SVM using a non-linear kernel (polynomial with d>1 or RBF kernel) to the data. Obtain a class prediction for each training observation (using full data set). Plot the observations, colored according to the predicted class labels.
(d) Comment on your results.
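The two classifiers in parts (b) and (c) might be set up along the following lines. This is a sketch under stated assumptions: the data are regenerated with an assumed quadratic boundary (x1² − x2² = 0) so the block is self-contained, and the values of `C`, `gamma`, and the seed are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Assumed data-generating process (not taken from the assignment).
rng = np.random.default_rng(0)
x1 = rng.uniform(size=500) - 0.5
x2 = rng.uniform(size=500) - 0.5
y = (x1**2 - x2**2 > 0).astype(int)

X_lin = np.column_stack([x1, x2])
# Hand-built quadratic features: X1, X2, X1^2, X2^2, X1*X2.
X_quad = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# Logistic regression on the quadratic features. scikit-learn applies
# an L2 penalty by default; a large C keeps that penalty weak.
logit = LogisticRegression(C=1e6, max_iter=10000).fit(X_quad, y)
pred_logit = logit.predict(X_quad)

# SVM with an RBF kernel on the original two predictors.
svm = SVC(kernel="rbf", gamma="scale", C=10.0).fit(X_lin, y)
pred_svm = svm.predict(X_lin)

# Training accuracy on the full data set.
acc_logit = (pred_logit == y).mean()
acc_svm = (pred_svm == y).mean()
print(acc_logit, acc_svm)
```

To produce the requested plots, the predicted labels can be passed as the color argument of a scatter plot, for example `plt.scatter(x1, x2, c=pred_svm)` with matplotlib.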