STAT 440 Midterm 2

STAT 440 - Fall 2021 - Midterm 2

Recall that you may use your notes, books, or even the internet to help answer these questions, but all of the work should be your own and you should not ask anyone for help or about any details related to the class and project during this 96-hour period (this includes face-to-face interactions, emails, internet forums, etc.). You have all of Thursday, Friday, Saturday, and Sunday to work on this midterm. You are to turn it in by midnight on Sunday. You will be graded on the accuracy of your answers, the efficiency of your coding, and the organization/clarity of your write-up. You should have plenty of writing to convey what you are doing as well as comments in your code to explain and make it easier to read. Show all of your work.

You should turn in an RMarkdown file as well as the output through Canvas. If you have any clarifying questions or notice any issues, don’t hesitate to contact me or Haotian. Note that in some of these questions, it is up to you to select certain things; this is done on purpose, so choose wisely and explain your decisions.

1. Consider the following density

$$f(x) = C x^3 e^{-x^4}$$

for x ≥ 0.

a. Find the CDF and determine the normalizing constant C.

b. Use the inverse CDF method to simulate 100,000 draws from f. Plot the histogram of your sample. Create a second plot where you zoom in on the x-axis and plot a kernel density estimate as well as the true density. Comment on the results.

c. Suppose you want to use the normal density with mean 0 and variance $\sigma^2$, call it $g(x \mid \sigma^2)$, to produce samples from f. Find a constant M such that

$$\frac{f(x)}{g(x \mid \sigma^2)} \le M$$

for all x. Note this constant can/should depend on $\sigma^2$. Feel free to either do this analytically or do this numerically for a few different values of $\sigma^2$. Try to find a $\sigma^2$ which produces a small value of M. Provide a few plots to justify your choice (or show the mathematics if you can).

d. Using your choice of $\sigma^2$ from above, produce a sample of size 10,000 from f using the accept/reject method. Produce a histogram and use the sample to estimate the mean of f and produce a standard error of your estimate.

e. Using the same $\sigma^2$ and g, use importance sampling (sample size 10,000) to again estimate the mean of f and produce a standard error for your estimate. Compare with what you saw in (d).

2. The file Szeged_Weather_Summary.csv contains monthly averages for different weather metrics in Szeged, Hungary.

a. The variable WindBearing denotes the direction from which the wind originates. The units are in degrees, with 0 denoting due north, 90 due east, 180 due south, and 270 due west. All of the winds come from either the southeast or the southwest. Create a new variable, Direction, which indicates if the direction is southeast (<=180) or southwest (>180). Construct boxplots of Temperature vs Direction.

b. Use a permutation test to determine if Direction is associated with Temperature (measured in Celsius).

c. Use a bootstrap method to construct a 95% confidence interval for the effect of Direction on Temperature (response variable here is Temperature). Do both a parametric and nonparametric bootstrap. Compare the results.

d. Pick another variable of your choice to associate with Temperature while also including Direction as another predictor. Explain why you think this variable is either important or interesting (to you). Fit a linear regression model with the two predictors and use a bootstrap method to construct a 95% confidence interval for each of the two variables. Interpret your results in the context of the problem.

e. Suppose we wish to compare Temperature and ApparentTemp, as we suspect they may be quite similar. Let $\mu_1$ be the true mean of Temperature and $\mu_2$ be the true mean of ApparentTemp. Use the nonparametric bootstrap to test $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$. Use a 5% significance level. Perform similar two-tailed tests using nonparametric bootstrap for the median and IQR. For each test, be sure to include the test statistic, p-value, and a proper conclusion.
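The coding rule for Direction stated in part (a) can be illustrated on a few made-up bearings. This is only a sketch of the convention: the vector `wind_bearing` and its values are hypothetical and are not taken from Szeged_Weather_Summary.csv.

```r
# Hypothetical bearings in degrees (illustrative values, not from the data file).
wind_bearing <- c(135, 180, 181, 225)

# Southeast if the bearing is at most 180 degrees, southwest otherwise,
# following the thresholds given in part (a).
direction <- factor(ifelse(wind_bearing <= 180, "southeast", "southwest"))

table(direction)
```

Note that a bearing of exactly 180 falls in the southeast group under this rule.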

3. Unnormalized Density. Let Z be a random variable taking values in (0, ∞) with pdf:

$$f_Z(z; a, b) = \frac{1}{C}\, z^{a-1} e^{-(z/b)^a}, \quad a, b > 0.$$

a. Normalizing constant. Find C, for any value of a and b, such that f is a valid density function.

b. Simulation. Simulate 10,000 iid random variables from this distribution with a = 2 and b = 3 and plot a histogram of your 10,000 samples. Overlay a plot of the true density on top of your histogram.

c. Moments. Calculate (or estimate) the mean and variance of Z when a = 2 and b = 3. If you estimate these numerically, provide evidence that your estimates are accurate to at least 2 significant digits.

d. Tail Probability by Monte Carlo. Use your 10,000 samples to estimate the probability P(Z > 8). Report your estimate and the standard error of your estimate. Also, compare your estimate to the true value of this probability.

e. Tail Probability by Importance Sampling. Estimate P(Z > 8) again, this time using importance sampling. For your proposal distribution for importance sampling, use an exponential random variable with λ = 2, shifted to have support on (8, ∞) (i.e. with density $f_X(x) = \lambda e^{-\lambda (x - 8)}$ for x > 8). Again use 10,000 samples. Report your estimate and the standard error of your estimate. Is the IS estimate better than the MC estimate? Hint: it is easy to generate from the proposal distribution using rexp(10000, rate = 2) + 8, and similarly easy to calculate the density using dexp.

f. MLE. The file “Midterm1.csv” contains 200 iid samples from Z with unknown parameters a and b, which are to be estimated. Read in this data and numerically estimate the maximum likelihood estimates of a and b. (You cannot derive the estimators analytically. You must use some optimization method to conduct the estimation numerically. Clearly state your algorithm.)

g. Bootstrap. Suppose we are interested in the mean of Z in (f). Use parametric and nonparametric bootstrap to construct 95% confidence intervals of E(Z). Compare your results.

h. Confidence Intervals. Construct and report asymptotic 95% confidence intervals for $\hat{a}$ and $\hat{b}$. Clearly show your steps. Conduct simulations to verify them.

i. Permutation Test. Using the same data as in (f), suppose I suspect that the first 100 and the last 100 samples do not come from the same distribution. Use a permutation test to assess this claim.

j. Posterior. Using the same data as in (f) and given b = 3, suppose the prior distribution of a follows an exponential distribution with λ = 3. What is the posterior distribution of a? Derive it analytically and plot the posterior density. Provide a 95% credible interval for a.
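The hint in part (e) can be sketched as follows. This only illustrates drawing from, and evaluating the density of, the shifted exponential proposal; it is not an importance sampling estimate. The names `lambda`, `shift`, and `f_X` and the seed are illustrative choices, and note that R's `rexp` and `dexp` call the λ parameter `rate`.

```r
# Sketch only: lambda = 2 and the shift of 8 come from part (e);
# the variable names and the seed are illustrative assumptions.
set.seed(440)
lambda <- 2
shift  <- 8

# Draws from the shifted exponential proposal, supported on (8, Inf).
x <- rexp(10000, rate = lambda) + shift

# Proposal density f_X(x) = lambda * exp(-lambda * (x - 8)) for x > 8.
# dexp() returns 0 for negative arguments, so points below the shift get density 0.
f_X <- function(x) dexp(x - shift, rate = lambda)

range(x)                                          # draws lie above the shift
integrate(f_X, lower = shift, upper = Inf)$value  # should be close to 1
```

Shifting the argument inside `dexp` keeps the density and the sampler consistent with each other, which matters when forming importance weights.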
