Statistics GR 5205 004 / GU 4205 005
Columbia University
Binary Response Variable
In many regression applications, the response Y has only two possible
qualitative outcomes:
– Financial status of firm: sound status/headed toward insolvency
– Coronary heart disease status: has the disease/does not have the disease
– Labor force participation of married women: in the labor force/not in the labor force
Because Y takes only the values 0 and 1, the mean response E[Y | X] = P(Y = 1 | X) must lie in [0, 1]. This constraint rules out a linear response function, which is unbounded.
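To see why a linear response function fails, the sketch below fits ordinary least squares to a simulated binary response and checks the range of the fitted values. The data, the seed, and the variable names are hypothetical and chosen only for illustration.

```r
# Hypothetical simulated data (not from the lecture): a binary response whose
# probability of success increases with a single predictor x.
set.seed(1)
x <- seq(-4, 4, length.out = 200)
y <- rbinom(length(x), size = 1, prob = plogis(2 * x))

# Linear response function fitted by least squares
lin_fit <- lm(y ~ x)
fitted_range <- range(fitted(lin_fit))

# The fitted "probabilities" fall below 0 and above 1 at the extremes of x
print(fitted_range)
```

A fitted value outside [0, 1] cannot be read as a probability, which is the motivation for the bounded response function used in the rest of these slides.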
Heart Disease Data
Response: “chd” indicates whether the man has coronary heart disease (yes/no)
Predictors: height (in inches) and the number of cigarettes (cigs) smoked per day
> data(wcgs, package="faraway")
> summary(wcgs[,c("chd","height","cigs")])
  chd           height           cigs
 no :2897   Min.   :60.00   Min.   : 0.0
 yes: 257   1st Qu.:68.00   1st Qu.: 0.0
            Median :70.00   Median : 0.0
            Mean   :69.78   Mean   :11.6
            3rd Qu.:72.00   3rd Qu.:20.0
            Max.   :78.00   Max.   :99.0
Heart Disease Data
> plot(height ~ chd, wcgs)
> wcgs$y <- ifelse(wcgs$chd == "no",0,1)
> plot(jitter(y,0.1) ~ jitter(height), wcgs, xlab="Height",
+ ylab="Heart Disease", pch=".")
Figure: Plots of the presence/absence of heart disease according to height.
Heart Disease Data
Predict heart disease and explain the relationship between height,
cigarette usage and heart disease.
For the same height and cigs, both outcomes occur. So we model the
probability of getting heart disease P(Y = 1 | X ) rather than Y itself
Figure: Interleaved histograms of the distribution of heights and cigarette usage for
men with and without heart disease.
Logistic Regression
Logistic regression defines the probability mass function:
$$P(Y = 1 \mid X) = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$
which implies that
$$P(Y = 0 \mid X) = 1 - P(Y = 1 \mid X) = \frac{1}{1 + \exp(\beta^\top X)}$$
where X is a (p + 1)-dimensional vector with X0 ≡ 1, so that β0 is the intercept
Logistic Regression
Figure: P(Y = 1 | X) and P(Y = 0 | X), plotted as functions of the linear predictor $\beta^\top X$.
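The two probabilities can also be tabulated over a grid of values of the linear predictor. This is a minimal sketch; the function names `p1` and `p0` and the grid are illustrative and not part of the lecture.

```r
# Both class probabilities as functions of the linear predictor eta = beta'X.
# The names p1 and p0 are illustrative, not from the lecture.
p1 <- function(eta) exp(eta) / (1 + exp(eta))   # P(Y = 1 | X)
p0 <- function(eta) 1 / (1 + exp(eta))          # P(Y = 0 | X)

eta <- seq(-6, 6, by = 2)
probs <- data.frame(eta = eta, p1 = p1(eta), p0 = p0(eta))
print(round(probs, 4))
```

The two columns sum to 1 in every row and cross at 0.5 where the linear predictor equals 0; P(Y = 1 | X) increases with the linear predictor while P(Y = 0 | X) decreases.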
Logistic Regression
The logit function
$$\operatorname{logit}(x) = \log\left(\frac{x}{1 - x}\right)$$
maps the unit interval (0, 1) to the entire real line (−∞, ∞)
The inverse logit function, or expit function,
$$\operatorname{expit}(x) = \operatorname{logit}^{-1}(x) = \frac{\exp(x)}{1 + \exp(x)}$$
maps the real line to the unit interval
In logistic regression, the inverse logit function is used to map the linear predictor $\beta^\top X$ to the probability that Y = 1:
$$P(Y = 1 \mid X) = \operatorname{logit}^{-1}(\beta^\top X)$$
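As a quick check of these definitions, the sketch below writes out the logit and expit functions and compares them with base R's `qlogis()` and `plogis()`, which compute the same pair for the standard logistic distribution. The probability grid is arbitrary.

```r
# logit and its inverse written out from their definitions
logit <- function(p) log(p / (1 - p))
expit <- function(x) exp(x) / (1 + exp(x))

p <- c(0.01, 0.25, 0.5, 0.75, 0.99)
x <- logit(p)          # maps (0, 1) to the real line
p_back <- expit(x)     # maps the real line back to (0, 1)

# Base R provides the same pair as qlogis() (logit) and plogis() (inverse logit)
same_logit <- isTRUE(all.equal(x, qlogis(p)))
same_expit <- isTRUE(all.equal(p_back, plogis(x)))
print(c(same_logit, same_expit))
```

In practice `plogis()` is the safer choice for large positive arguments, where `exp(x)` in the hand-written version can overflow.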
Logistic Regression
Geometric interpretation: a logistic regression fit based on two predictors can
be represented by an S-shaped surface in 3D space
Logistic Regression
The linear predictor in logistic regression is the conditional log odds:
$$\log\left[\frac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)}\right] = \beta^\top X = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$
Interpretation: holding the other predictors fixed, a one-unit increase in Xj
changes the (conditional) log odds by βj
Equivalently, a one-unit increase in Xj multiplies the conditional odds by exp(βj)
exp(βj) is also called the odds ratio: it is the ratio of the odds in two
scenarios where the values of Xj differ by one unit
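The odds-ratio interpretation can be verified numerically. The sketch below uses made-up coefficients and predictor values (none of them from the lecture) and compares the ratio of odds for a one-unit change in X1 with exp(β1).

```r
# Hypothetical coefficients (intercept, beta_1, beta_2), for illustration only
beta <- c(-1.0, 0.4, -0.7)

# Conditional odds P(Y = 1 | X) / P(Y = 0 | X) at predictor values (x1, x2)
odds <- function(x1, x2) {
  p <- plogis(beta[1] + beta[2] * x1 + beta[3] * x2)
  p / (1 - p)
}

# Increase X1 by one unit while holding X2 fixed
ratio <- odds(3, 1.5) / odds(2, 1.5)
print(c(ratio = ratio, exp_beta1 = exp(beta[2])))
```

The ratio does not depend on the starting value of X1 or on the value at which X2 is held, which is what makes exp(βj) a convenient single-number summary.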
Heart Disease Example
> lmod <- glm(chd ~ height + cigs, family = binomial, wcgs)
> summary(lmod)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0041  -0.4425  -0.3630  -0.3499   2.4357
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.50161    1.84186  -2.444   0.0145 *
height       0.02521    0.02633   0.957   0.3383
cigs         0.02313    0.00404   5.724 1.04e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1781.2 on 3153 degrees of freedom
Residual deviance: 1749.0 on 3151 degrees of freedom
AIC: 1755
Number of Fisher Scoring iterations: 5
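The coefficient table can be read on the odds scale. The sketch below copies the printed estimates and standard errors by hand (so it runs without the faraway package) and computes odds ratios and Wald z statistics; treating a pack a day as 20 cigarettes is an assumption.

```r
# Estimates and standard errors copied from the summary(lmod) output
est <- c(intercept = -4.50161, height = 0.02521, cigs = 0.02313)
se  <- c(intercept =  1.84186, height = 0.02633, cigs = 0.00404)

# Odds ratios: multiplicative change in the odds per one-unit increase
odds_ratio <- exp(est[c("height", "cigs")])

# Odds ratio for a pack a day (assumed to be 20 cigarettes) versus none,
# at fixed height
or_pack <- exp(20 * est[["cigs"]])

# Wald z statistics and two-sided p-values, matching the table up to rounding
z <- est / se
p_value <- 2 * pnorm(-abs(z))

print(round(odds_ratio, 4))
print(round(or_pack, 3))
print(round(z, 3))
```

Under this fit, smoking 20 cigarettes a day multiplies the estimated odds of heart disease by about 1.6 compared with a nonsmoker of the same height, while the height coefficient is not distinguishable from zero.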
Heart Disease Example
> (beta <- coef(lmod))
> plot(jitter(y,0.1) ~ jitter(height), wcgs, xlab="Height",
+ ylab="Heart Disease", pch=".")
> plot(jitter(y,0.1) ~ jitter(cigs), wcgs, xlab="Cigarette Use",
+ ylab="Heart Disease", pch=".")
Heart Disease Example
Figure: Predicted probability of heart disease. Left: solid line represents a
nonsmoker, the dashed line is a pack-a-day smoker. Right: solid line represents a
very short man (60 in.), the dashed line represents a very tall man (78 in.)
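The curves described in the caption can be approximated from the printed coefficients. This is a sketch, not the code behind the figure: the helper `pred_chd` is a hypothetical name, the coefficients are copied by hand from the summary output, and 20 cigarettes per day is an assumed reading of "pack-a-day".

```r
# Coefficients copied from the summary(lmod) output
b <- c(intercept = -4.50161, height = 0.02521, cigs = 0.02313)

# Predicted probability of heart disease from the fitted logistic model
pred_chd <- function(height, cigs) {
  plogis(b[["intercept"]] + b[["height"]] * height + b[["cigs"]] * cigs)
}

# Left panel: probability against height for 0 and 20 cigarettes per day
heights <- c(60, 70, 78)
left <- data.frame(height = heights,
                   nonsmoker = pred_chd(heights, 0),
                   pack_a_day = pred_chd(heights, 20))

# Right panel: probability against cigarette use for heights 60 and 78 in.
cigs <- c(0, 20, 40)
right <- data.frame(cigs = cigs,
                    short = pred_chd(60, cigs),
                    tall = pred_chd(78, cigs))

print(round(left, 3))
print(round(right, 3))
```

With the fitted object available, a call such as `curve(plogis(beta[1] + beta[2]*x + beta[3]*0), add=TRUE)` after the height plot would overlay the nonsmoker curve, and `lty=2` would give the dashed line.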
Latent variable model for logistic regression
It may make sense to view the binary outcome Y as a dichotomization of a
latent continuous outcome Yc:
$$Y = I(Y_c \ge 0)$$
Suppose Yc | X follows a logistic distribution with CDF
$$F(y_c \mid X) = \frac{\exp(y_c - \beta^\top X)}{1 + \exp(y_c - \beta^\top X)}$$
In this case, Y | X follows the logistic regression model:
$$P(Y = 1 \mid X) = P(Y_c \ge 0 \mid X) = 1 - \frac{\exp(0 - \beta^\top X)}{1 + \exp(0 - \beta^\top X)} = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$
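The latent variable view can be checked by simulation. The sketch below generates a latent logistic outcome, dichotomizes it at 0, and fits a logistic regression to the resulting binary response; the sample size, seed, and true coefficients are all assumptions made for illustration.

```r
# Hypothetical simulation: dichotomize a latent logistic outcome and check that
# logistic regression recovers the coefficients. All values are illustrative.
set.seed(42)
n <- 20000
beta_true <- c(-1, 0.5)          # intercept and slope (assumed)
x <- rnorm(n)

# Latent outcome: linear predictor plus standard logistic noise, so that
# Yc | X has CDF exp(yc - eta) / (1 + exp(yc - eta))
y_latent <- beta_true[1] + beta_true[2] * x + rlogis(n)
y <- as.integer(y_latent >= 0)   # Y = I(Yc >= 0)

fit <- glm(y ~ x, family = binomial)
beta_hat <- unname(coef(fit))
print(round(beta_hat, 3))
```

The estimates should land close to the assumed values of -1 and 0.5, even though the model only ever sees the dichotomized outcome.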
Mean and variance relationship for logistic regression
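Since Y | X follows Bernoulli($\operatorname{logit}^{-1}(\beta^\top X)$), the mean is
$$E[Y \mid X] = P(Y = 1 \mid X) = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$
and the variance is
$$\operatorname{Var}[Y \mid X] = P(Y = 1 \mid X) \cdot P(Y = 0 \mid X) = \frac{\exp(\beta^\top X)}{(1 + \exp(\beta^\top X))^2}$$
Since the variance depends on X through the mean, logistic regression models are
heteroscedastic (unequal error variances) whenever a slope coefficient is nonzero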
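The mean and variance relationship can be tabulated directly. This minimal sketch evaluates both forms of the variance over an arbitrary grid of linear predictor values and confirms that they agree.

```r
# Variance of Y | X as a function of the linear predictor eta = beta'X
eta <- seq(-4, 4, by = 1)
p <- plogis(eta)                       # mean E[Y | X]
v_product <- p * (1 - p)               # P(Y = 1 | X) * P(Y = 0 | X)
v_formula <- exp(eta) / (1 + exp(eta))^2

print(round(data.frame(eta, mean = p, variance = v_product), 4))
```

The variance peaks at 0.25 when the mean is 0.5 and shrinks toward 0 as the mean approaches 0 or 1, so observations with extreme predicted probabilities are the least variable.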
Estimation in logistic regression
Assuming independent observations (x1, y1), . . . , (xn, yn), the
log-likelihood for logistic regression is
L(β | Y,X) = log