# MATH5945 Assignment 2

MATH5945: Categorical Data Analysis
Term 3, 2022
Assignment 2
Submission deadline: Friday 28 October, 12:00pm
Deliverables: 2 files uploaded to Moodle: (1) PDF file of your worked solutions, and (2)
SAS file for ALL computations. File names should be surname firstname z123456789 ASS2.
Assignment length: There is a 5 page limit and a minimum 12pt font size. Any pages
exceeding this limit, or submissions with smaller font sizes, will not be marked. Handwritten
assignments will not be accepted. The page limit does not include the SAS file of your code.
Your document should begin with the Plagiarism Statement below (copy-and-paste it).
SAS code: All computations must be performed using SAS. Your SAS code must run as
is and I should not need to modify your code in any way to make it work. You may create
a library to import data, but any other code should only use the WORK library (you may
assume data files of the same name are in my WORK library). SAS should be used for
computing only and answers given only within SAS code will not be marked.
Penalties: Failure to adhere to instructions will result in a minimum 5% mark reduction.
Name: Student Number:
I declare that this assessment item is my own work, except where acknowledged,
and has not been submitted for academic credit elsewhere, and acknowledge that
the assessor of this item may, for the purpose of assessing this item:
Reproduce this assessment item and provide a copy to another member of the
University; and/or,
Communicate a copy of this assessment item to a plagiarism checking service
(which may then retain a copy of the assessment item on its database for the
purpose of future plagiarism checking).
I certify that I have read and understood the University Rules in respect of Student
Signed: Date:
1. Consider the observed $2 \times 2 \times 2$ table created from the binary variables X, Y and Z.
In this case, we are interested in assessing the relationship between X and Y, while the
variable Z may interact with or confound this relationship.
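
z = 0:

| | y = 0 | y = 1 |
|---|---|---|
| x = 0 | $a_0$ | $b_0$ |
| x = 1 | $c_0$ | $d_0$ |

z = 1:

| | y = 0 | y = 1 |
|---|---|---|
| x = 0 | $a_1$ | $b_1$ |
| x = 1 | $c_1$ | $d_1$ |
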
The stratified tables have conditional odds ratios $\hat\psi_0 = a_0 d_0/(b_0 c_0)$ and $\hat\psi_1 = a_1 d_1/(b_1 c_1)$.
(a) Using this setup, show that the square root of Woolf's test for interaction statistic
$$X^2_W = \sum_{i=0}^{1} w_i \left( \log\hat\psi_i - \log\hat\psi_W \right)^2$$
can be written as the difference in two independent log odds ratios divided by
its standard error, where $\log\hat\psi_W$ is Woolf's summary log odds ratio and
$w_i = (1/a_i + 1/b_i + 1/c_i + 1/d_i)^{-1}$.
(b) Using the values in the above table, now consider the saturated logistic model for
$\pi_{ij} = P(Y_{ij} = 1)$:
$$\log\frac{\pi_{ij}}{1 - \pi_{ij}} = \beta_0 + \beta_1 x_i + \beta_2 z_j + \beta_3 x_i z_j$$
i. Write out the log-likelihood function for $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)$.
ii. Find the score function $\partial \log L(\beta)/\partial\beta$, the partial derivatives of the
log-likelihood function.
iii. Find the maximum likelihood estimator for $e^{\beta_3}$, the exponentiated parameter
for the interaction term, by solving the system of equations
$$\frac{\partial \log L(\hat\beta)}{\partial \hat\beta} = 0.$$
(c) Using your results from (a), (b) and this element of the inverse of the observed
Fisher’s information matrix for the logistic model
J
,
demonstrate that Woolf’s test for interaction is equivalent to inference for the
interaction term in a saturated logistic regression model for a $2 \times 2 \times 2$ table.
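
As a numerical sanity check on the identity in part (a), the sketch below evaluates Woolf's statistic for two strata and compares it with the squared difference of the stratum log odds ratios divided by its variance. It is an illustration only: it is written in Python rather than the SAS required for submission, the cell counts and function names are invented, and Woolf's summary log odds ratio is assumed to be the $w_i$-weighted mean of the stratum log odds ratios. It does not replace the algebraic derivation.

```python
import math


def log_or_and_weight(a, b, c, d):
    """Return (log odds ratio, Woolf weight) for one 2x2 stratum."""
    log_psi = math.log(a * d / (b * c))
    weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)  # inverse variance of log_psi
    return log_psi, weight


def woolf_interaction(strata):
    """Woolf's statistic: sum_i w_i * (log psi_i - log psi_W)^2."""
    stats = [log_or_and_weight(*cells) for cells in strata]
    total_w = sum(w for _, w in stats)
    # Assumed form of Woolf's summary: weighted mean of stratum log odds ratios.
    log_psi_w = sum(w * lp for lp, w in stats) / total_w
    return sum(w * (lp - log_psi_w) ** 2 for lp, w in stats)


def squared_z(strata):
    """Squared (difference in log odds ratios / standard error of the difference)."""
    (lp0, w0), (lp1, w1) = [log_or_and_weight(*cells) for cells in strata]
    return (lp1 - lp0) ** 2 / (1 / w0 + 1 / w1)


# Hypothetical counts (a_i, b_i, c_i, d_i) for z = 0 and z = 1.
strata = [(20, 10, 15, 25), (30, 12, 18, 14)]
x2_w = woolf_interaction(strata)
z2 = squared_z(strata)
print(x2_w, z2)
```

The two printed values should agree up to floating-point rounding for any positive cell counts, which is what the derivation in part (a) establishes in general.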
2. The SAS datafile injury contains data from motor vehicle passengers injured in a
crash. The dataset contains the variables:

| Name | Values |
|---|---|
| sex | female, male |
| location | rural, urban |
| seatbelt | no, yes |
| injury | no, yes |
| freq | 0, 1, 2, . . . |

We would like to fit a log-linear model to the four-way contingency table created
from these variables. For ease of interpretation, denote these variables by S, L, B,
I, respectively, in model shorthand and use numbers 1, 2 to identify the levels of a
variable. For example, $\tau^{S}_{1}$ represents female sex.
Make these data available in SAS by creating a libname for the file's location on your
computer, then copy the file to your WORK library using the same filename, i.e., injury.
(a) Check the goodness of fit of the following hierarchical models:
(M1) main effects only
(M2) all two-way interaction terms
(M3) all three-way interaction terms, and
(M4) all four-way interaction terms (saturated)
What is the lowest order model that reasonably fits the data? Give reasons.
(b) Based on the model chosen in part (a), perform forward selection using partitioned
$G^2$ statistics to choose a “best model”. Justify your steps.
(c) Answer these questions regarding the model chosen in part (b).
i. Write out the log-linear model and its logit equivalent using Injury (I) as
the response variable in symbolic form (i.e., τ notation).
ii. Using the symbolic log-linear model, what is the odds ratio ψ(I) of injury for
an individual who wore a seatbelt compared to someone who did not?
iii. What is the estimate of ψ(I) and its 95% confidence interval using the
estimated model? Be sure to provide strata-level estimates if the final model
includes interaction term(s).
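
The goodness-of-fit checks in Question 2 rest on the likelihood-ratio statistic $G^2 = 2 \sum n \log(n/\hat\mu)$, where $n$ are observed and $\hat\mu$ fitted cell counts. The sketch below shows how this might be computed for the main-effects-only model (M1), whose fitted values have a closed form as the product of the one-way margins. It is an illustration only: it is written in Python rather than the SAS required for submission, the counts are invented and are not the injury data, and the function names are hypothetical.

```python
import math
from itertools import product


def independence_fit(counts):
    """Fitted counts under the main-effects-only (mutual independence) model.

    counts maps a tuple of 0/1 levels, one per variable, to an observed frequency.
    """
    n = sum(counts.values())
    n_vars = len(next(iter(counts)))
    # One-way margins: margins[k][level] is the total with variable k at that level.
    margins = [{0: 0.0, 1: 0.0} for _ in range(n_vars)]
    for cell, freq in counts.items():
        for k, level in enumerate(cell):
            margins[k][level] += freq
    fitted = {}
    for cell in counts:
        mu = float(n)
        for k, level in enumerate(cell):
            mu *= margins[k][level] / n
        fitted[cell] = mu
    return fitted


def g2(counts, fitted):
    """G^2 = 2 * sum n * log(n / mu), treating 0 * log(0) as 0."""
    return 2.0 * sum(
        n * math.log(n / fitted[cell]) for cell, n in counts.items() if n > 0
    )


# Hypothetical frequencies for the 16 cells of a 2x2x2x2 table (S, L, B, I).
cells = list(product((0, 1), repeat=4))
hypothetical = [12, 7, 9, 14, 20, 5, 11, 8, 6, 13, 10, 9, 15, 4, 7, 16]
counts = dict(zip(cells, hypothetical))

fitted = independence_fit(counts)
g2_m1 = g2(counts, fitted)
df_m1 = len(cells) - 1 - 4  # 16 cells minus intercept and 4 main effects
print(g2_m1, df_m1)
```

Partitioning works on the same quantity: for two nested models, the difference in their $G^2$ values is referred to a chi-square distribution with degrees of freedom equal to the difference in their residual degrees of freedom. Models with interaction terms generally need iterative fitting, which this sketch does not attempt.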