MATH4007-E1

The University of Nottingham

SCHOOL OF MATHEMATICAL SCIENCES

A LEVEL 4 MODULE, SPRING SEMESTER 2019-2020

COMPUTATIONAL STATISTICS

Suggested time to complete: TWO Hours THIRTY Minutes

Paper set: 19/05/2020 - 10:00

Paper due: 26/05/2020 - 10:00

Answer ALL questions

Your solutions should be written on white paper using dark ink (not pencil), on a tablet, or typeset. Do not write close to the margins. Your solutions should include complete explanations and all intermediate derivations. Your solutions should be based on the material covered in the module and its prerequisites only. Any notation used should be consistent with that in the Lecture Notes.

Guidance on the Alternative Assessment Arrangements can be found on the Faculty of Science Moodle page: https://moodle.nottingham.ac.uk/course/view.php?id=99154#section-2

Submit your answers as a single PDF, with each page in the correct orientation, to the appropriate dropbox on the module’s Moodle page. Use the standard naming convention for your document: [StudentID]_[ModuleCode].pdf. Please check the box indicated on Moodle to confirm that you have read and understood the statement on academic integrity: https://moodle.nottingham.ac.uk/pluginfile.php/6288943/mod_tabbedcontent/tabcontent/8496/FoS%20Statement%20on%20Academic%20Integrity.pdf

A scan of handwritten notes is completely acceptable. Make sure your PDF is easily readable and does not require magnification. Text which is not in focus or is not legible for any other reason will be ignored. If your scan is larger than 20 MB, please see if it can easily be reduced in size (e.g. scan in black and white, or use a lower dpi, but not so low that readability is compromised).

Staff are not permitted to answer assessment or teaching queries during the assessment period. If you spot what you think may be an error on the exam paper, note this in your submission but answer the question as written. Where necessary, minor clarifications or general guidance may be posted on Moodle for all students to access.

Students with approved accommodations are permitted an extension of 3 days.

The standard University of Nottingham penalty of 5% deduction per working day will apply to any late submission.

MATH4007-E1 Turn over


Academic Integrity in Alternative Assessments

The alternative assessment tasks for summer 2020 are to replace exams that would have assessed your individual performance. You will work remotely on your alternative assessment tasks and they will all be undertaken in “open book” conditions. Work submitted for assessment should be entirely your own work. You must not collude with others or employ the services of others to work on your assessment. As with all assessments, you also need to avoid plagiarism. Plagiarism, collusion and false authorship are all examples of academic misconduct. They are defined in the University Academic Misconduct Policy at: https://www.nottingham.ac.uk/academicservices/qualitymanual/assessmentandawards/academic-misconduct.aspx

Plagiarism: representing another person’s work or ideas as your own. You could do this by failing to correctly acknowledge others’ ideas and work as sources of information in an assignment or neglecting to use quotation marks. This also applies to the use of graphical material, calculations etc. in that plagiarism is not limited to text-based sources. There is further guidance about avoiding plagiarism on the University of Nottingham website.

False Authorship: where you are not the author of the work you submit. This may include submitting the work of another student or submitting work that has been produced (in whole or in part) by a third party such as through an essay mill website. As it is the authorship of an assignment that is contested, there is no requirement to prove that the assignment has been purchased for this to be classed as false authorship.

Collusion: cooperation in order to gain an unpermitted advantage. This may occur where you have consciously collaborated on a piece of work, in part or whole, and passed it off as your own individual effort or where you authorise another student to use your work, in part or whole, and to submit it as their own. Note that working with one or more other students to plan your assignment would be classed as collusion, even if you go on to complete your assignment independently after this preparatory work. Allowing someone else to copy your work and submit it as their own is also a form of collusion.

Statement of Academic Integrity

By submitting a piece of work for assessment you are agreeing to the following statements:

1. I confirm that I have read and understood the definitions of plagiarism, false authorship and collusion.

2. I confirm that this assessment is my own work and is not copied from any other person’s work (published or unpublished).

3. I confirm that I have not worked with others to complete this work.

4. I understand that plagiarism, false authorship, and collusion are academic offences and I may be referred to the Academic Misconduct Committee if plagiarism, false authorship or collusion is suspected.


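1. (a) i) The truncated Poisson distribution has probability mass function

$$f(x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!\,(1 - e^{-\lambda})}, \qquad x = 1, 2, 3, \ldots,$$

where $\lambda > 0$ is a parameter. Prior information about $\lambda$ is summarized by

$$\pi(\lambda) \propto e^{-|\lambda - 3|}, \qquad \lambda > 0.$$

Given observed data $x_1 = 2$, $x_2 = 5$, $x_3 = 13$, derive the log posterior distribution, denoted by $\ell(\lambda)$, up to an additive constant.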

ii) It is required to find the maximum of $\ell(\lambda)$. An initial interval thought to contain a maximum is given by $\lambda_1 = 4.36$, $\lambda_3 = 5.81$. Carry out two iterations of the Golden Ratio method, i.e. find the next two intervals containing a maximum of $\ell(\lambda)$.

iii) What is the statistical interpretation of the output of the algorithm?

[15 marks]
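As an illustrative check on parts i) and ii), the sketch below evaluates a log posterior of the form $20\log\lambda - 3\lambda - 3\log(1 - e^{-\lambda}) - |\lambda - 3|$. This is the summed log likelihood of the three observations plus the log prior, with constants dropped. The sketch then runs two golden-section steps. It assumes one common formulation of the Golden Ratio method, in which the two interior points sit a fraction $1/\phi \approx 0.618$ of the interval length in from each end. The notes may place the points differently. The names `log_post` and `golden_ratio_max` are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, approximately 1.618


def log_post(lam, data=(2, 5, 13)):
    """Log posterior of the truncated Poisson model, up to an additive constant."""
    n, s = len(data), sum(data)
    log_lik = s * math.log(lam) - n * lam - n * math.log(1 - math.exp(-lam))
    log_prior = -abs(lam - 3)
    return log_lik + log_prior


def golden_ratio_max(f, lo, hi, n_iter):
    """Return the successive intervals produced by golden-section search for a maximum."""
    intervals = []
    for _ in range(n_iter):
        step = (hi - lo) / PHI
        a, b = hi - step, lo + step  # interior points, lo < a < b < hi
        if f(a) > f(b):
            hi = b  # a maximum lies in [lo, b]
        else:
            lo = a  # a maximum lies in [a, hi]
        intervals.append((lo, hi))
    return intervals


for lo, hi in golden_ratio_max(log_post, 4.36, 5.81, 2):
    print(f"[{lo:.4f}, {hi:.4f}]")  # [4.3600, 5.2561] then [4.7023, 5.2561]
```

For clarity, the sketch recomputes both interior points at every step. With the exact golden ratio, the retained interior point coincides with one of the next pair, so a practical implementation needs only one new evaluation of $\ell$ per iteration.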

(b) The joint density of two random variables $X$ and $Y$ is given by

$$f(x, y) \propto x^2 y^2 \exp(-xy - 5x - 4y), \qquad x, y > 0.$$

i) Derive the Laplace approximation to the marginal density $f(x)$.

ii) Evaluate this for the case $x = 2$.

iii) Treating $Y$ as missing information, give full details of how the EM algorithm can be used to find the mode of the marginal distribution $f(x)$.

iv) Starting from an initial value $x^{(0)} = 0.1$, perform two iterations of the EM algorithm.

[25 marks]
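A numerical sketch of parts i), ii) and iv) follows. It assumes that the marginal of interest is that of $X$, with $Y$ integrated out, which is consistent with Question 2(b). The Laplace step expands $h(y) = 2\log y - (x+4)y$ about its maximiser $\hat y = 2/(x+4)$. The EM step uses the fact, read off the joint density, that $Y \mid X = x$ is Gamma with shape 3 and rate $x + 4$. The E-step therefore only needs $E[Y \mid x] = 3/(x+4)$. All function names are illustrative.

```python
import math


def log_joint(x, y):
    """Unnormalised log f(x, y) = 2 log x + 2 log y - xy - 5x - 4y, for x, y > 0."""
    return 2 * math.log(x) + 2 * math.log(y) - x * y - 5 * x - 4 * y


def laplace_marginal_x(x):
    """Laplace approximation to the unnormalised marginal of X, integrating out y."""
    y_hat = 2.0 / (x + 4.0)   # maximiser of h(y) = 2 log y - (x + 4) y
    h2 = -2.0 / y_hat ** 2    # h''(y_hat) = -(x + 4)^2 / 2
    return math.exp(log_joint(x, y_hat)) * math.sqrt(2 * math.pi / -h2)


def exact_marginal_x(x):
    """Exact unnormalised marginal of X: x^2 e^{-5x} * 2 / (x + 4)^3."""
    return x ** 2 * math.exp(-5 * x) * 2.0 / (x + 4.0) ** 3


def em_mode(x0, n_iter):
    """EM iterates for the mode of the marginal of X, treating Y as missing."""
    xs = [x0]
    for _ in range(n_iter):
        e_y = 3.0 / (xs[-1] + 4.0)    # E-step: E[Y | X = x_t] for Gamma(3, rate x_t + 4)
        xs.append(2.0 / (5.0 + e_y))  # M-step: maximise 2 log x - x E[Y] - 5x
    return xs


print(laplace_marginal_x(2.0), exact_marginal_x(2.0))
print([round(v, 6) for v in em_mode(0.1, 2)])  # [0.1, 0.348936, 0.351505]
```

The ratio of the Laplace approximation to the exact unnormalised marginal is the constant $4\sqrt{\pi}\,e^{-2} \approx 0.96$ for every $x$. The approximation therefore has the correct shape in $x$, and only its normalising constant is off.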


2. (a) It is required to sample from a density $f$, where

() =

(1 − )

, $0 < x < 1$,

and $c > 0$ is a constant.

i) Find the constant $c$.

ii) Hence, explain how to sample from $f$ using inversion.

iii) Produce one sample from $f$, given a sample $u = 0.4$ from a $U(0, 1)$ distribution.

[8 marks]

(b) The joint density of two random variables $X$ and $Y$ is given by

$$f(x, y) \propto x^2 y^2 \exp(-xy - 5x - 4y), \qquad x, y > 0.$$

i) Show that the marginal density of $X$ is proportional to

$$\frac{x^2 e^{-5x}}{(x + 4)^3}.$$

ii) The density of a random variable $Z$ which follows a Gamma distribution with parameters $\alpha$ and $\beta$ is

$$f(z) \propto z^{\alpha - 1} \exp\{-\beta z\}.$$

Show how samples from the marginal distribution of $X$ can be obtained using the rejection algorithm, using samples from a Gamma distribution with $\alpha = 3$, $\beta = 5$.

iii) Assume that samples from the conditional distribution $f_{Y|X}(y \mid x)$ can be obtained for any value of $x$. (You do not have to find this distribution or how to sample from it.) Explain how this and the above result can be used to sample from the joint density of $X$ and $Y$ and to estimate [ ].

[20 marks]
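The sketch below illustrates parts ii) and iii). It relies on two facts that hold for this joint density. First, the ratio of the marginal of $X$ to the Gamma(3, 5) density is proportional to $(x+4)^{-3} \le 4^{-3}$, so a proposal $x$ can be accepted with probability $(4/(x+4))^3$. Second, for the composition step, $Y \mid X = x$ is Gamma with shape 3 and rate $x + 4$; the question itself does not require this. Python's `random.gammavariate(alpha, beta)` takes a scale parameter, so a rate of 5 is passed as `1/5`. The function names, including the generic helper `mc_expectation`, are hypothetical, and `g` stands for whatever quantity is to be estimated.

```python
import random


def sample_x(rng=random):
    """Rejection sampler for f_X(x) proportional to x^2 e^{-5x} / (x + 4)^3, x > 0."""
    while True:
        x = rng.gammavariate(3.0, 1.0 / 5.0)  # proposal: Gamma(shape 3, rate 5)
        u = rng.random()
        if u <= (4.0 / (x + 4.0)) ** 3:       # accept w.p. f_X(x) / (M g(x))
            return x


def sample_xy(rng=random):
    """Composition: draw X from its marginal, then Y | X = x ~ Gamma(3, rate x + 4)."""
    x = sample_x(rng)
    y = rng.gammavariate(3.0, 1.0 / (x + 4.0))
    return x, y


def mc_expectation(g, n, rng=random):
    """Monte Carlo estimate of E[g(X, Y)] from n independent joint draws."""
    return sum(g(*sample_xy(rng)) for _ in range(n)) / n


rng = random.Random(2020)
print(mc_expectation(lambda x, y: x, 10_000, rng))  # illustrative choice g(x, y) = x
```

The bound $M$ is attained as $x \to 0$, so no smaller constant is available for this proposal.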

(c) i) Consider the $k$ Nearest Neighbour estimator of a density $f$. Carefully describe the rationale behind this estimator, and discuss the influence of $k$ on the resulting estimates.

ii) Three data points from an unknown density $f$ are observed, $x_1 = 1$, $x_2 = 2$ and $x_3 = 4$. Calculate the $k$ Nearest Neighbour estimator of $f$, with $k = 2$.

[12 marks]
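A direct implementation of part ii) is sketched below. It assumes the common form $\hat f(t) = k / (2 n\, d_k(t))$, where $d_k(t)$ is the distance from $t$ to its $k$-th nearest observation; some texts use $k - 1$ in the numerator. The helper name `knn_density` is hypothetical.

```python
def knn_density(t, data, k):
    """k-nearest-neighbour density estimate at t: k / (2 n d_k(t))."""
    n = len(data)
    d_k = sorted(abs(t - x) for x in data)[k - 1]  # distance to k-th nearest point
    if d_k == 0:
        return float("inf")  # undefined when at least k observations equal t
    return k / (2 * n * d_k)


data = [1, 2, 4]
for t in [0, 1, 1.5, 2, 3, 4, 6]:
    print(t, knn_density(t, data, k=2))
```

Here $d_2(t)$ is piecewise linear in $t$, so the estimate is piecewise of the form $1/(\text{linear})$. For large $|t|$ it decays only like $1/|t|$, so it does not integrate to one. These heavy tails are a standard criticism of the estimator.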


3. (a) Consider the density

$$\pi(x, y) \propto x^2 y^2 \exp\{-xy - 5x - 4y\}, \qquad x > 0,\ y > 0.$$

i) Find the full conditional distributions $\pi(x \mid y)$ and $\pi(y \mid x)$.

ii) Hence, describe a Gibbs sampler to sample from $\pi$.

iii) Suppose instead that the Metropolis-Hastings algorithm is to be used to sample from $\pi$. Describe fully a random walk Metropolis algorithm which updates both variables simultaneously, using proposals of the form

$$(x', y')^{T} \sim N_2\big((x, y)^{T}, \Sigma\big).$$

As part of your answer, discuss the role of $\Sigma$ on the performance of the sampler.

iv) Suppose $\Sigma = I_2$, the identity matrix, and the chain is currently in the state $x = 2$, $y = 1$. Given 3 independent $U(0, 1)$ random numbers

$$u_1 = 0.2, \quad u_2 = 0.8, \quad u_3 = 0.45,$$

perform one update of the chain described in the previous part.

[28 marks]
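The sketch below illustrates parts ii) to iv). From the joint density, the full conditionals are $X \mid Y = y \sim \text{Gamma}(3, y + 5)$ and $Y \mid X = x \sim \text{Gamma}(3, x + 4)$, in the (shape, rate) parameterisation. For the random walk Metropolis step, the sketch assumes $\Sigma = \sigma^2 I_2$. It converts the first two uniforms into standard normals with the Box-Muller transform and uses the third for the accept/reject decision. The paper does not specify this mapping, so the convention is an assumption. The function names are hypothetical.

```python
import math
import random


def log_target(x, y):
    """Unnormalised log pi(x, y); -inf outside the support x, y > 0."""
    if x <= 0 or y <= 0:
        return -math.inf
    return 2 * math.log(x) + 2 * math.log(y) - x * y - 5 * x - 4 * y


def gibbs(n_iter, x=1.0, y=1.0, rng=random):
    """Gibbs sampler alternating draws from the two Gamma full conditionals."""
    chain = []
    for _ in range(n_iter):
        x = rng.gammavariate(3.0, 1.0 / (y + 5.0))  # X | Y = y ~ Gamma(3, rate y + 5)
        y = rng.gammavariate(3.0, 1.0 / (x + 4.0))  # Y | X = x ~ Gamma(3, rate x + 4)
        chain.append((x, y))
    return chain


def box_muller(u1, u2):
    """Map two U(0, 1) numbers to two independent N(0, 1) numbers."""
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)


def rwm_step(x, y, u1, u2, u3, sigma=1.0):
    """One random walk Metropolis update with proposal N((x, y), sigma^2 I_2)."""
    z1, z2 = box_muller(u1, u2)
    x_new, y_new = x + sigma * z1, y + sigma * z2
    log_alpha = log_target(x_new, y_new) - log_target(x, y)  # symmetric proposal
    if math.log(u3) < log_alpha:  # accept with probability min(1, alpha)
        return x_new, y_new, True
    return x, y, False


print(rwm_step(2.0, 1.0, 0.2, 0.8, 0.45))  # (2.0, 1.0, False) under this convention
```

With the numbers in part iv) and this convention, the proposal is approximately $(2.554, -0.706)$. This lies outside the support because $y < 0$, so the target density there is zero and the move is rejected whatever the value of the third uniform.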

(b) The following data are available, which are believed to be random samples from a population with mean $\mu = 7$:

9.1 5.8 5.1 9.7 5.5 4.3 6.0

i) Explain why a randomisation test might be preferred to a t-test in order to test the hypothesis $H_0 : \mu = 7$.

ii) Describe a suitable randomisation test to test $H_0 : \mu = 7$, stating any assumptions you make.

iii) Use the $U(0, 1)$ random numbers

{0.46, 0.84, 0.02, 0.76, 0.67, 0.53, 0.22}

in order to calculate one replicate of your test statistic for the test in (ii).

[12 marks]
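One possible sign-flip randomisation test is sketched below. It assumes that the observations are independent and, under $H_0$, symmetric about 7, and it uses the mean of the signed deviations $x_i - 7$ as the test statistic. Reversing a deviation's sign when the corresponding uniform is below 0.5 is an assumed convention. The function names are hypothetical.

```python
import random

DATA = [9.1, 5.8, 5.1, 9.7, 5.5, 4.3, 6.0]
MU0 = 7.0


def replicate(data, mu0, uniforms):
    """One randomisation replicate: flip the sign of x_i - mu0 when u_i < 0.5."""
    signed = [(x - mu0) if u >= 0.5 else -(x - mu0) for x, u in zip(data, uniforms)]
    return sum(signed) / len(signed)


def randomisation_p_value(data, mu0, n_rep=10_000, rng=random):
    """Two-sided Monte Carlo p-value for H0: mu = mu0."""
    observed = sum(x - mu0 for x in data) / len(data)
    extreme = 0
    for _ in range(n_rep):
        t = replicate(data, mu0, [rng.random() for _ in data])
        if abs(t) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_rep + 1)  # count the observed statistic itself


us = [0.46, 0.84, 0.02, 0.76, 0.67, 0.53, 0.22]
print(round(replicate(DATA, MU0, us), 4))  # -0.2714, versus observed -0.5
print(randomisation_p_value(DATA, MU0, rng=random.Random(1)))
```

With only seven observations, all $2^7 = 128$ sign patterns could also be enumerated to give an exact p-value instead of a Monte Carlo one.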

MATH4007-E1 END