Home Page > > Details

Introduction to Statistical Machine Learning Assignment 3: Implementation of PCA and Kernel PCA

 The University of Adelaide, School of Computer Science

Introduction to Statistical Machine Learning
Semester 2, 2019 Assignment 3: Implementation of PCA and Kernel PCA
DUE: 4 Nov. 2019, Monday 11:55 PM
Submission 
Instructions and submission guidelines: 
• You must sign an assessment declaration coversheet to submit with your assignment. 
• Submit your assignment via the Canvas MyUni. 
Kernel PCA and PCA
You are required to write code for performing PCA and kernel PCA (KPCA) on a given dataset. 
After performing PCA/KPCA, you will use various classifiers to perform classification. You will 
investigate the effect of applying PCA, KPCA and KPCA with different types of kernels. 
Specifically, the tasks you need to complete are:
• Write code for performing PCA 
• Apply PCA to the given dataset and then perform classification with the nearest 
neighbour classifier. Analyse the performance change against different reduced 
dimensions. (suggestion: from 256 to 10)
• Write code for performing KPCA. Use three kernel functions, Linear Kernel 
(equivalent to PCA), Gaussian-RBF kernel and Hellinger kernel (See the lecture slides)
• Apply KPCA to the given dataset and then perform classification with nearest 
neighbour classifier. Analyse the performance change against different reduced 
dimensions. (suggestion: from 256 to 10)
• [Master student only] Append 256 noisy dimensions to the original data, that is, for 
each sample xi, appending a 256-dimensional random feature vector to xi to make it 
a 512-dimensional feature. Then repeat the KPCA experiments by using a linear SVM 
as the classifier. Test and analyse the results. The following lines give an example code 
(Matlab) for appending noise features
o % Let X be the original data
o [N,d] = size(X); o R = randn(N,d); % using Gaussian noise in this experiment
o X_ = [X,R];
Hint:
For Gaussian-RBF kernel, you need to set the bandwith hyper-parameter 𝜎2
. An empirical 
approach is to choose it as 12𝑁2 ∑ ||𝑥𝑖 − 𝑥𝑗||2
𝑖𝑗 . You are encouraged to scale up and down this 
empirical chosen parameter a little bit to find out the best performed hyper-parameter. 
You can choose either Matlab, Python, or C/C++. I would personally suggest Matlab or 
Python. 
The PCA and KPCA part of your code should not rely on any 3rd-party toolbox. Only Matlab's 
built-in API's or Python/ C/C++'s standard libraries are allowed. However, you can use 3
rd
-
party implementation of linear SVM for your experiments.
You are also required to submit a report (<10 pages in PDF format), which should have the 
following sections (report contributes 50% to the mark; code 50%): 
• An algorithmic description of PCA and KPCA. (5%)
• Your understanding of PCA and KPCA (anything that you believe is relevant to this 
algorithm) (5%)
• Some analyses of your implementation. You should plot an error curve against the number 
of reduced dimensions. For KPCA, results from different kernels should be plotted on the 
same figure with different colours. (20% for master students and 40% for undergraduate 
students)
• You should present and analyse the results of adding noisy dimensions before performing 
PCA/KPCA. For example, you can compare the impact of noisy dimensions to different 
kernels in KPCA. (20% for master students only)
In summary, you need to submit (1) the code that implements PCA/KPCA and (2) a report in 
PDF. 
Data
You will use USPS dataset to test your model. It contains 10 classes with 7291 training 
samples and 2007 testing samples. The feature dimensionality is 256. The training data can 
be found in usps.libsvm and testing data can be found in usps.t.libsvm.
Contact Us - Email:99515681@qq.com    WeChat:codinghelp
© 2014 www.asgnhelp.com
Programming Assignment Help!