Assignment 1 Q1
A few more cool things about PCA (30 points)
For parts a) to c) below, please assume the following:
Let $X = (X_1 | \cdots | X_p)$ be an $n \times p$ random matrix such that $\mathrm{Var}((X^t)_i) = \Sigma \ \forall i$, i.e. $\Sigma$ is the covariance
matrix for row $i$ of $X$ (the $i$th column of $X^t$).
Assume that $\Sigma$ is a positive definite matrix with normed eigenvalue decomposition $\Sigma = W \Lambda W^t$.
Question parts:
a. (10 points) Let $Y_i = W (X^t)_i$ be the $p$ vector of scores for the $i$-th row of $X$. Show that the PCA
representation preserves distance between the two vectors $(X^t)_i$ and $(X^t)_j$, i.e. that
$\|(X^t)_i - (X^t)_j\| = \|Y_i - Y_j\|$
where $\|u - v\| = (u - v)^t (u - v)$. Hint: Use the properties of the various pieces of the eigenvalue
decomposition.
b. (10 points) Using the properties of traces of products of matrices and the definition of $\Sigma$ in part a),
show that:
$\mathrm{tr}(\Sigma) = \mathrm{tr}(\Lambda)$
showing that the sum of the eigenvalues is equal to the sum of the marginal variances.
c. (10 points) Assume that we generate a random $p \times 1$ vector $Z$ such that $Z_i \sim \mathrm{Normal}(0, 1)$ and
$\mathrm{Cov}(Z_i, Z_j) = 0 \ \forall i \neq j$. Let
$V = W^t \Lambda^{1/2} Z$
where $\Sigma = W \Lambda W^t$ as described at the beginning of this question.
i. What are the $E(V)$ and $\mathrm{Var}(V)$?
ii. What is the distribution of $V_i$?
Please show your work in deriving the answers, but you may use standard results for the properties of Normal
random variables.
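A numerical sanity check can be useful before writing the proofs for Q1. The R sketch below is only an illustration under stated assumptions: the covariance matrix `Sigma` is an arbitrary simulated positive definite matrix, all object names are made up for this example, and part c) is read as V = W^t Λ^{1/2} Z. It does not replace the derivations the question asks for.

```r
set.seed(1)
p <- 4

# Arbitrary positive definite covariance matrix (simulated, for illustration only)
A <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) + diag(p)

# Normed eigenvalue decomposition: Sigma = W Lambda W^t with orthonormal columns in W
eig <- eigen(Sigma, symmetric = TRUE)
W <- eig$vectors
lambda <- eig$values

# Part a): squared distance, as defined in the question, before and after the rotation
sq_dist <- function(u, v) sum((u - v)^2)
x_i <- rnorm(p)
x_j <- rnorm(p)
y_i <- W %*% x_i
y_j <- W %*% x_j
dist_x <- sq_dist(x_i, x_j)
dist_y <- sq_dist(y_i, y_j)

# Part b): trace of Sigma versus the sum of the eigenvalues
trace_sigma <- sum(diag(Sigma))
trace_lambda <- sum(lambda)

# Part c): simulate many draws of V (assumed form V = W^t Lambda^{1/2} Z)
n_draws <- 20000
Z <- matrix(rnorm(p * n_draws), nrow = p)   # each column is one draw of Z
V <- t(W) %*% diag(sqrt(lambda)) %*% Z
emp_mean <- rowMeans(V)                     # compare with your derived E(V)
emp_var <- cov(t(V))                        # compare with your derived Var(V)

print(c(dist_x = dist_x, dist_y = dist_y))
print(c(trace_sigma = trace_sigma, trace_lambda = trace_lambda))
```

The two printed pairs should agree up to floating-point error, and `emp_mean` and `emp_var` give simulated values against which a derived answer for part c) can be compared.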
Assignment 1 Q2
Analyzing wine data (30 points)
The data for this exercise comes from a paper by Cortez, et al. (2009)
(https://www.sciencedirect.com/science/article/abs/pii/S0167923609001377?via%3Dihub) where the authors
were trying to relate various chemical properties of red and white wine to perceived quality. For this question,
we will analyze only the data for the chemical properties, not the quality. Also, the original paper looked at both red
and white wine; we will use only the data for the red wine.
The data can be read in via:
library(tidyverse)
# Be sure red_wine_data.csv is in your current working directory
wine_data <- read_csv("red_wine_data.csv")
glimpse(wine_data)
Rows: 1,599
Columns: 12
$ `fixed acidity`        <dbl> 7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7…
$ `volatile acidity`     <dbl> 0.700, 0.880, 0.760, 0.280, 0.700, 0.660, 0.600…
$ `citric acid`          <dbl> 0.00, 0.00, 0.04, 0.56, 0.00, 0.00, 0.06, 0.00,…
$ `residual sugar`       <dbl> 1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0, 6.…
$ chlorides              <dbl> 0.076, 0.098, 0.092, 0.075, 0.076, 0.075, 0.069…
$ `free sulfur dioxide`  <dbl> 11, 25, 15, 17, 11, 13, 15, 15, 9, 17, 15, 17, …
$ `total sulfur dioxide` <dbl> 34, 67, 54, 60, 34, 40, 59, 21, 18, 102, 65, 10…
$ density                <dbl> 0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0.9978,…
$ pH                     <dbl> 3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30, 3.39,…
$ sulphates              <dbl> 0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47,…
$ alcohol                <dbl> 9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, 9.5, 1…
$ quality                <dbl> 5, 5, 5, 6, 5, 5, 5, 7, 7, 5, 5, 5, 5, 5, 5, 5,…
The variable names are self-explanatory. We will not use the quality variable, so we can create a
new dataset without it via:
wine_data_chem <- wine_data %>% select(-quality)
head(wine_data_chem)
# A tibble: 6 x 11
  `fixed acidity` `volatile acidity` `citric acid` `residual sugar` chlorides
            <dbl>              <dbl>         <dbl>            <dbl>     <dbl>
1             7.4               0.7           0                 1.9     0.076
2             7.8               0.88          0                 2.6     0.098
3             7.8               0.76          0.04              2.3     0.092
4            11.2               0.28          0.56              1.9     0.075
5             7.4               0.7           0                 1.9     0.076
6             7.4               0.66          0                 1.8     0.075
# … with 6 more variables: free sulfur dioxide <dbl>,
#   total sulfur dioxide <dbl>, density <dbl>, pH <dbl>, sulphates <dbl>,
#   alcohol <dbl>
This is the data you should analyze.
a. (10 points) Using only scatterplots and the sample correlation matrices, summarize what you believe
are the most interesting associations you observe amongst these characteristics. Show both the
plots and summaries you generate to support your conclusions.
b. (20 points) Perform a principal component analysis of this data using your preferred function. As part of
this analysis, please be sure to complete the following tasks:
Report the eigenvalues for all 11 principal components.
For the first two principal components, plot and interpret the components in terms of the original
variables. In particular, explain which variables are most highly correlated with each of these two
components and how these components are different from each other.
Choose the smallest number of principal components that you believe can be used to summarize
the information from the data and justify your choice.
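For part a), one possible starting point is a sample correlation matrix, a scatterplot matrix, and a ranked list of the strongest pairwise correlations. The sketch below uses base R only. The helper `top_correlations()` is a hypothetical name introduced for this example, and the usage section assumes `wine_data_chem` has been created as shown earlier.

```r
# Hypothetical helper: rank variable pairs by absolute sample correlation
top_correlations <- function(dat, n_top = 5) {
  cor_mat <- cor(dat)
  # Each unordered pair appears once in the upper triangle
  idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    correlation = cor_mat[idx]
  )
  out <- out[order(-abs(out$correlation)), ]
  head(out, n_top)
}

# Usage, assuming wine_data_chem exists in the workspace
if (exists("wine_data_chem")) {
  print(round(cor(wine_data_chem), 2))    # sample correlation matrix
  pairs(wine_data_chem, pch = ".")        # scatterplot matrix of all 11 variables
  print(top_correlations(wine_data_chem)) # strongest pairwise associations
}
```

The ranked list only points at candidate pairs; the scatterplots are still needed to judge whether an association is roughly linear or driven by a few extreme wines.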
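For part b), a minimal sketch using `prcomp()` from base R is shown here. Standardizing the variables (`scale. = TRUE`, i.e. PCA on the correlation matrix) is an assumption made for this example because the variables are on very different scales; the handout does not prescribe it. The helper `summarise_pca()` is a hypothetical name, and the usage section assumes `wine_data_chem` exists.

```r
# Hypothetical helper: fit a PCA and collect the quantities asked for in part b)
summarise_pca <- function(dat) {
  # scale. = TRUE is an assumption (correlation-matrix PCA)
  pca <- prcomp(dat, center = TRUE, scale. = TRUE)
  eigenvalues <- pca$sdev^2
  list(
    pca = pca,
    eigenvalues = eigenvalues,
    # cumulative proportion of total variance explained
    cumulative = cumsum(eigenvalues) / sum(eigenvalues),
    # correlations between the original variables and the first two component scores
    var_pc_cor = cor(dat, pca$x[, 1:2])
  )
}

# Usage, assuming wine_data_chem exists in the workspace
if (exists("wine_data_chem")) {
  res <- summarise_pca(wine_data_chem)
  print(round(res$eigenvalues, 3))
  print(round(res$cumulative, 3))
  print(round(res$var_pc_cor, 2))
  screeplot(res$pca, npcs = length(res$eigenvalues), type = "lines")
  biplot(res$pca, choices = 1:2, cex = 0.5)
}
```

With standardized variables the eigenvalues sum to the number of variables, which matches the trace identity in Q1 b). The scree plot and the cumulative proportions are two common aids for choosing how many components to keep, but the justification remains a judgment call.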
