STAT 440 Midterm 2

STAT 440 - Fall 2021 - Midterm 2

Recall that you may use your notes, books, or even the internet to help answer these questions, but all of the
work should be your own and you should not ask anyone for help or about any details related to the class
and project during this 96-hour period (this includes face-to-face interactions, emails, internet forums, etc.).
You have all of Thursday, Friday, Saturday, and Sunday to work on this midterm. You are to turn it in by
the organization/clarity of your write up. You should have plenty of writing to convey what you are doing as
You should turn in an RMarkdown file as well as the output through Canvas. If you have any clarifying
questions or notice any issues, don’t hesitate to contact me and Haotian. Note that in some of these questions,
it is up to you to select certain things; this is done on purpose, so choose wisely and explain your decisions.
1. Consider the following density
f(x) = C x^3 e^{-x^4}
for x ≥ 0.
a. Find the CDF and determine the normalizing constant C.
b. Use the inverse CDF method to simulate 100,000 draws from f. Plot the histogram of your sample.
Create a second plot where you zoom in on the x-axis and plot a kernel density estimate as well as
the true density. Comment on the results.
c. Suppose you want to use the normal density with mean 0 and variance σ^2, call it g(x|σ^2), to
produce samples from f. Find a constant M such that
f(x) / g(x|σ^2) ≤ M
for all x. Note this constant can/should depend on σ^2. Feel free to either do this analytically or
do this numerically for a few different values of σ^2. Try to find a σ^2 that produces a small value
of M. Provide a few plots to justify your choice (or show the mathematics if you can).
d. Using your choice of σ^2 from above, produce a sample of size 10,000 from f using the accept/reject
method. Produce a histogram and use the sample to estimate the mean of f and produce a
standard error for your estimate.
e. Using the same σ^2 and g, use importance sampling (sample size 10,000) to again estimate the
mean of f and produce a standard error for your estimate. Compare with what you saw in (d).
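As a reminder of the mechanics only, the sketch below runs the inverse CDF method, accept/reject, and importance sampling on a stand-in target (a half-normal density with an Exp(1) proposal), not on the density in this problem. The names target, proposal, draw_proposal, and the seed are illustrative assumptions, and the bound M is specific to this stand-in pair.

```r
set.seed(440)
n <- 10000

# Stand-in target (NOT the exam density): half-normal on x >= 0
target <- function(x) sqrt(2 / pi) * exp(-x^2 / 2)
# Stand-in proposal: Exp(1), with density exp(-x) on x >= 0
proposal <- function(x) exp(-x)

# Inverse CDF method: Exp(1) has CDF 1 - exp(-x), so F^{-1}(u) = -log(1 - u)
draw_proposal <- function(n) -log(1 - runif(n))

# Bound M on target/proposal; for this pair the ratio is maximized at x = 1
M <- sqrt(2 * exp(1) / pi)

# Accept/reject: keep x when u <= target(x) / (M * proposal(x))
x <- draw_proposal(5 * n)   # oversample; the acceptance rate is 1/M
u <- runif(length(x))
accepted <- x[u <= target(x) / (M * proposal(x))]
ar_sample <- accepted[seq_len(n)]
ar_mean <- mean(ar_sample)
ar_se <- sd(ar_sample) / sqrt(n)

# Importance sampling: weight proposal draws by target/proposal
y <- draw_proposal(n)
w <- target(y) / proposal(y)
is_mean <- mean(y * w)
is_se <- sd(y * w) / sqrt(n)
```

Both ar_mean and is_mean estimate the stand-in target's mean, sqrt(2/pi). Comparing ar_se with is_se is the same kind of comparison asked for between (d) and (e).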
2. The file Szeged_Weather_Summary.csv contains monthly averages for different weather metrics in
Szeged, Hungary.
a. The variable WindBearing denotes the direction from which the wind originates. The units are
degrees, with 0 denoting due north, 90 due east, 180 due south, and 270 due west. All of the winds
come from either the southeast or the southwest. Create a new variable, Direction, which indicates whether
the direction is southeast (<=180) or southwest (>180). Construct boxplots of Temperature vs.
Direction.
b. Use a permutation test to determine if Direction is associated with Temperature (measured in
Celsius).
c. Use a bootstrap method to construct a 95% confidence interval for the effect of Direction on
Temperature (response variable here is Temperature). Do both a parametric and nonparametric
bootstrap. Compare the results.
d. Pick another variable of your choice to associate with Temperature while also including Direction as
another predictor. Explain why you think this variable is either important or interesting (to you).
Fit a linear regression model with the two predictors and use a bootstrap method to construct a
95% confidence intervals for the coefficients of the two predictors. Interpret your results in the context of the problem.
e. Suppose we wish to compare Temperature and ApparentTemp, as we suspect they may be quite
similar. Let µ1 be the true mean of Temperature and µ2 be the true mean of ApparentTemp. Use
the nonparametric bootstrap to test H0 : µ1 = µ2 versus H1 : µ1 ≠ µ2. Use a 5% significance level.
Perform similar two-tailed tests using nonparametric bootstrap for the median and IQR. For each
test, be sure to include the test statistic, p-value, and a proper conclusion.
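For reference, a minimal sketch of a two-group permutation test and a nonparametric bootstrap percentile interval is given below. It uses simulated stand-in data rather than the Szeged file, and the names group, y, diff_means, and the choice of B = 2000 resamples are illustrative assumptions.

```r
set.seed(440)

# Simulated stand-in data (hypothetical; not the Szeged file)
group <- rep(c("A", "B"), each = 30)
y <- rnorm(60, mean = ifelse(group == "A", 10, 12), sd = 3)

# Test statistic: difference in group means (B minus A)
diff_means <- function(y, g) mean(y[g == "B"]) - mean(y[g == "A"])
observed <- diff_means(y, group)

# Permutation test: shuffling the labels gives the null distribution
B <- 2000
perm <- replicate(B, diff_means(y, sample(group)))
p_value <- (1 + sum(abs(perm) >= abs(observed))) / (B + 1)

# Nonparametric bootstrap: resample with replacement within each group
boot <- replicate(B, {
  a <- sample(y[group == "A"], replace = TRUE)
  b <- sample(y[group == "B"], replace = TRUE)
  mean(b) - mean(a)
})
ci <- quantile(boot, c(0.025, 0.975))
```

The percentile interval is only one of several bootstrap intervals. Whichever one you use, state it in your write up.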
3. Unnormalized Density. Let Z be a random variable on the interval (0, ∞) with pdf:
f_Z(z; a, b) = (1/C) z^{a-1} e^{-(z/b)^a}, a, b > 0.
a. Normalizing constant. Find C, for any value of a and b, such that f is a valid density function.
b. Simulation. Simulate 10,000 iid random variables from this distribution with a = 2 and b = 3
and plot a histogram of your 10,000 samples. Overlay a plot of the true density on top of your
histogram.
c. Moments. Calculate (or estimate) the mean and variance of Z when a = 2 and b = 3. If
you estimate these numerically, provide evidence that your estimates are accurate to at least 2
significant digits.
d. Tail Probability by Monte Carlo. Use your 10,000 samples to estimate the probability
P(Z > 8). Report your estimate and the standard error of your estimate. Also, compare your
estimate to the true value of this probability.
e. Tail Probability by Importance Sampling. Estimate P(Z > 8) again, this time using
importance sampling. For your proposal distribution for importance sampling, use an exponential
random variable with λ = 2, shifted to have support on (8, ∞) (i.e. with density f_X(x) = λ e^{-λ(x-8)}
for x > 8). Again use 10,000 samples. Report your estimate and the standard error of your
estimate. Is the IS estimate better than the MC estimate? Hint: it is easy to generate from
the proposal distribution using rexp(10000, rate = 2) + 8, and similarly easy to calculate the
density using dexp.
f. MLE. The file “Midterm1.csv” contains 200 iid samples from Z with unknown parameters a
and b, which are to be estimated. Read in this data and numerically estimate the maximum
likelihood estimates of a and b. (You cannot derive the estimators analytically. You must use
some optimization method to conduct the estimation numerically. Clearly state your algorithm.)
g. Bootstrap. Suppose we are interested in the mean of Z in (f). Use the parametric and nonparametric
bootstrap to construct 95% confidence intervals for E(Z). Compare your results.
h. Confidence Intervals. Construct and report asymptotic 95% confidence intervals for â and b̂.
Clearly show your steps. Conduct simulations to verify the intervals.
i. Permutation Test. Using the same data as in (f), suppose I suspect that the first 100 and the last
100 samples do not come from the same distribution. Use a permutation test to assess this claim.
j. Posterior. Using the same data as in (f) and given b = 3, suppose the prior distribution of a
is an exponential distribution with λ = 3. What is the posterior distribution of a? Derive it
analytically and plot the posterior density. Provide a 95% credible interval for a.
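As an illustration of numerical maximum likelihood with asymptotic (Wald) intervals, the sketch below fits a gamma model to simulated data with optim. The gamma model, the simulated sample, the log-scale parameterization, and the moment-based starting values are all stand-in assumptions chosen for illustration; they are not the model or data in this problem.

```r
set.seed(440)

# Simulated stand-in data (hypothetical; not the exam's CSV file)
x <- rgamma(200, shape = 2, rate = 0.5)

# Negative log-likelihood; parameters live on the log scale so that
# the optimizer can never propose a non-positive shape or rate
nll <- function(theta) {
  -sum(dgamma(x, shape = exp(theta[1]), rate = exp(theta[2]), log = TRUE))
}

# Method-of-moments starting values, then quasi-Newton (BFGS) minimization
start <- log(c(mean(x)^2 / var(x), mean(x) / var(x)))
fit <- optim(start, nll, method = "BFGS", hessian = TRUE)
mle <- exp(fit$par)

# Wald 95% intervals on the log scale, using the inverse of the observed
# information (the Hessian of the negative log-likelihood), mapped back
se_log <- sqrt(diag(solve(fit$hessian)))
ci <- exp(cbind(lower = fit$par - 1.96 * se_log,
                upper = fit$par + 1.96 * se_log))
rownames(ci) <- c("shape", "rate")
```

Building the interval on the log scale keeps both endpoints positive. An interval built directly on the original scale with the delta method is an equally reasonable choice.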