
Statistics GR 5205 004 / GU 4205 005

Columbia University

Binary Response Variable

In many regression applications, the response Y has only two possible qualitative outcomes:

– Financial status of firm: sound status/headed toward insolvency

– Coronary heart disease status: has the disease/does not have the disease

– In a study of labor force participation of married women: married woman in labor force/married woman not in labor force

The constraint that the mean response E[Y | X] = P(Y = 1 | X) must lie in [0, 1] rules out a linear response function
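A minimal sketch of why a linear response function fails, using simulated data rather than any data set from the course (the names `x`, `y`, and `lpm`, and the slope 1.5, are illustrative assumptions). An ordinary least squares fit to a 0/1 response can return fitted "probabilities" below 0 and above 1:

```r
# Simulated binary data: the true success probability follows an S-shaped curve
set.seed(1)
x <- seq(-4, 4, length.out = 200)
p <- plogis(1.5 * x)                      # true P(Y = 1 | x), always inside (0, 1)
y <- rbinom(length(x), size = 1, prob = p)

# Linear probability model: ordinary least squares on the 0/1 response
lpm <- lm(y ~ x)

# Range of the fitted values; a straight line is not confined to [0, 1]
fitted_range <- range(fitted(lpm))
print(fitted_range)
```

In this setup the fitted line drops below 0 at the left end of the x range and exceeds 1 at the right end, so its values cannot all be read as probabilities.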


Heart Disease Data

Response: “chd”: indicates whether the person has heart disease or not

The men vary in height (in inches) and the number of cigarettes

(cigs) smoked per day

> data(wcgs, package="faraway")

> summary(wcgs[,c("chd","height","cigs")])

  chd           height           cigs
 no :2897   Min.   :60.00   Min.   : 0.0
 yes: 257   1st Qu.:68.00   1st Qu.: 0.0
            Median :70.00   Median : 0.0
            Mean   :69.78   Mean   :11.6
            3rd Qu.:72.00   3rd Qu.:20.0
            Max.   :78.00   Max.   :99.0


Heart Disease Data

> plot(height ~ chd, wcgs)

> wcgs$y <- ifelse(wcgs$chd == "no",0,1)

> plot(jitter(y,0.1) ~ jitter(height), wcgs, xlab="Height",

+ ylab="Heart Disease", pch=".")

Figure: Plots of the presence/absence of heart disease according to height.


Heart Disease Data

Predict heart disease and explain the relationship between height,

cigarette usage and heart disease.

For the same height and cigs, both outcomes occur. So we model the

probability of getting heart disease P(Y = 1 | X ) rather than Y itself

Figure: Interleaved histograms of the distribution of heights and cigarette usage for

men with and without heart disease.


Logistic Regression

Logistic regression defines the probability mass function:

$$P(Y = 1 \mid X) = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$

which implies that

$$P(Y = 0 \mid X) = 1 - P(Y = 1 \mid X) = \frac{1}{1 + \exp(\beta^\top X)}$$

where $X$ is a $(p + 1)$-dimensional vector with $X_0 \equiv 1$, and $\beta_0$ is the intercept


Logistic Regression

Figure: P(Y = 1 | X) and P(Y = 0 | X), plotted as functions of the linear predictor $\beta^\top X$


Logistic Regression

The logit function

$$\operatorname{logit}(x) = \log\left(\frac{x}{1 - x}\right)$$

maps the unit interval $(0, 1)$ to the entire real line $(-\infty, \infty)$

The inverse logit function, or expit function

$$\operatorname{expit}(x) = \operatorname{logit}^{-1}(x) = \frac{\exp(x)}{1 + \exp(x)}$$

maps the real line to the unit interval

In logistic regression, the inverse logit function is used to map the linear predictor $\beta^\top X$ to a probability of Y = 1:

$$P(Y = 1 \mid X) = \operatorname{logit}^{-1}(\beta^\top X)$$
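As a small sketch, the two functions can be written directly in R and checked against each other. The helper names `logit` and `expit` are defined here for illustration; base R already provides the same maps as `qlogis()` and `plogis()`.

```r
# logit maps probabilities in (0, 1) to the real line
logit <- function(p) log(p / (1 - p))

# expit (inverse logit) maps the real line back to (0, 1)
expit <- function(x) exp(x) / (1 + exp(x))

p <- c(0.1, 0.5, 0.9)
x <- logit(p)            # log odds for each probability
round_trip <- expit(x)   # should recover p

# Base R equivalents: qlogis() is the logit, plogis() is the inverse logit
same_as_base <- isTRUE(all.equal(x, qlogis(p))) &&
  isTRUE(all.equal(round_trip, plogis(x)))
```

For large |x| the hand-written `expit` can overflow, so `plogis()` is the safer choice in practice.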


Logistic Regression

Geometric interpretation: a logistic regression fit based on two predictors can be represented by an S-shaped surface in 3D space


Logistic Regression

The linear predictor in logistic regression is the conditional log odds:

$$\log\left[\frac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)}\right] = \beta^\top X = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$

Interpretation: holding the other predictors fixed, a one-unit increase in $X_j$ changes the (conditional) log odds by $\beta_j$

Equivalently: a one-unit increase in $X_j$ multiplies the conditional odds by $\exp(\beta_j)$

$\exp(\beta_j)$ is also called the odds ratio, since it is the ratio of the odds in two scenarios where the values of $X_j$ differ by one unit and all other predictors are equal
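The odds-ratio statement follows in one line from the log odds being linear in the predictors. A short derivation, writing $\mathrm{odds}(X)$ for $P(Y = 1 \mid X) / P(Y = 0 \mid X)$ (this shorthand is introduced here for convenience) and comparing two predictor vectors that differ only in $X_j$:

```latex
\begin{align*}
\mathrm{odds}(X) &= \exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_j X_j + \cdots + \beta_p X_p) \\
\frac{\mathrm{odds}(X_1, \ldots, X_j + 1, \ldots, X_p)}{\mathrm{odds}(X_1, \ldots, X_j, \ldots, X_p)}
  &= \frac{\exp\bigl(\beta_0 + \cdots + \beta_j (X_j + 1) + \cdots + \beta_p X_p\bigr)}
          {\exp\bigl(\beta_0 + \cdots + \beta_j X_j + \cdots + \beta_p X_p\bigr)}
   = \exp(\beta_j)
\end{align*}
```

The ratio does not depend on the values of the other predictors, which is why $\exp(\beta_j)$ can be reported as a single number.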


Heart Disease Example

> lmod <- glm(chd ~ height + cigs, family = binomial, wcgs)

> summary(lmod)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0041  -0.4425  -0.3630  -0.3499   2.4357

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.50161    1.84186  -2.444   0.0145 *
height       0.02521    0.02633   0.957   0.3383
cigs         0.02313    0.00404   5.724 1.04e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1781.2  on 3153  degrees of freedom
Residual deviance: 1749.0  on 3151  degrees of freedom
AIC: 1755

Number of Fisher Scoring iterations: 5
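As a sketch of how to read these estimates, the snippet below copies the three coefficients from the summary output and converts them to odds ratios and predicted probabilities. The variable names, the choice of a 70-inch man (the sample median height), and the comparison of 0 versus 20 cigarettes per day are illustrative choices, not part of the original analysis.

```r
# Estimates copied by hand from the summary(lmod) output
beta <- c(intercept = -4.50161, height = 0.02521, cigs = 0.02313)

# Odds ratio for one extra cigarette per day, and for one pack (20) per day
or_per_cig  <- unname(exp(beta["cigs"]))
or_per_pack <- unname(exp(20 * beta["cigs"]))

# Predicted probabilities for a 70-inch man: nonsmoker vs. pack-a-day smoker
eta_nonsmoker <- unname(beta["intercept"] + beta["height"] * 70)
eta_pack      <- eta_nonsmoker + unname(beta["cigs"]) * 20
p_nonsmoker   <- plogis(eta_nonsmoker)
p_pack        <- plogis(eta_pack)

# The ratio of the two odds reproduces exp(20 * beta_cigs)
odds <- function(p) p / (1 - p)
odds_ratio_check <- odds(p_pack) / odds(p_nonsmoker)
```

Under this fit, a pack a day multiplies the odds of heart disease by roughly 1.59, which moves the predicted probability for a 70-inch man from about 6% to about 9%.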

Heart Disease Example

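> library(faraway)  # provides ilogit(), the inverse logit
> (beta <- coef(lmod))
> # Left panel: probability vs. height for a nonsmoker (solid) and a pack-a-day smoker (dashed)
> plot(jitter(y,0.1) ~ jitter(height), wcgs, xlab="Height",
+      ylab="Heart Disease", pch=".")
> curve(ilogit(beta[1] + beta[2]*x + beta[3]*0), add=TRUE)
> curve(ilogit(beta[1] + beta[2]*x + beta[3]*20), add=TRUE, lty=2)
> # Right panel: probability vs. cigarette use for heights of 60 in. (solid) and 78 in. (dashed)
> plot(jitter(y,0.1) ~ jitter(cigs), wcgs, xlab="Cigarette Use",
+      ylab="Heart Disease", pch=".")
> curve(ilogit(beta[1] + beta[2]*60 + beta[3]*x), add=TRUE)
> curve(ilogit(beta[1] + beta[2]*78 + beta[3]*x), add=TRUE, lty=2)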


Heart Disease Example

Figure: Predicted probability of heart disease. Left: solid line represents a

nonsmoker, the dashed line is a pack-a-day smoker. Right: solid line represents a

very short man (60 in.), the dashed line represents a very tall man (78 in.)


Latent variable model for logistic regression

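It may make sense to view the binary outcome $Y$ as a dichotomization of a latent continuous outcome $Y_c$:

$$Y = I(Y_c \ge 0)$$

Suppose $Y_c \mid X$ follows a logistic distribution with CDF

$$F(y \mid X) = P(Y_c \le y \mid X) = \frac{\exp(y - \beta^\top X)}{1 + \exp(y - \beta^\top X)}$$

In this case, $Y \mid X$ follows the logistic regression model:

$$P(Y = 1 \mid X) = P(Y_c \ge 0 \mid X) = 1 - \frac{\exp(0 - \beta^\top X)}{1 + \exp(0 - \beta^\top X)} = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$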
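The latent-variable view can be checked by simulation. In this sketch the value 0.8 for the linear predictor, the sample size, and all variable names are arbitrary illustrative choices. The latent outcome is drawn as the linear predictor plus standard logistic noise, thresholded at 0, and the observed proportion of ones is compared with the inverse logit of the linear predictor.

```r
set.seed(42)
n   <- 100000
eta <- 0.8                    # hypothetical value of the linear predictor

# Latent outcome: logistic distribution with location eta and scale 1
yc <- eta + rlogis(n)

# Observed binary outcome: indicator that the latent outcome is nonnegative
y <- as.integer(yc >= 0)

empirical   <- mean(y)        # simulated estimate of P(Y = 1 | X)
theoretical <- plogis(eta)    # exp(eta) / (1 + exp(eta))
print(c(empirical = empirical, theoretical = theoretical))
```

The two numbers should agree up to simulation error, which shrinks as `n` grows.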

Mean and variance relationship for logistic regression

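Since $Y \mid X$ follows $\text{Bernoulli}(\operatorname{expit}(\beta^\top X))$, the mean is

$$E[Y \mid X] = P(Y = 1 \mid X) = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$

And the variance is

$$\operatorname{Var}[Y \mid X] = P(Y = 1 \mid X) \cdot P(Y = 0 \mid X) = \frac{\exp(\beta^\top X)}{\left(1 + \exp(\beta^\top X)\right)^2}$$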

Since the variance depends on X through the linear predictor, logistic regression models are heteroscedastic (unequal error variances) whenever any slope coefficient is nonzero
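A brief numerical sketch of the mean-variance relationship, evaluated on an arbitrary grid of linear predictor values (the grid and variable names are illustrative):

```r
# Grid of values for the linear predictor
eta <- c(-4, -2, 0, 2, 4)

# Mean of Y given X: the success probability
p <- plogis(eta)

# Variance of Y given X, computed two equivalent ways
v     <- p * (1 - p)
v_alt <- exp(eta) / (1 + exp(eta))^2

print(round(cbind(eta, mean = p, variance = v), 4))
```

The variance peaks at 0.25 when the linear predictor is 0 (probability 0.5) and shrinks toward 0 as the probability approaches 0 or 1.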


Estimation in logistic regression

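Assuming independent observations $(x_1, y_1), \ldots, (x_n, y_n)$, the log-likelihood for logistic regression is

$$L(\beta \mid Y, X) = \log \prod_{i=1}^{n} P(Y = 1 \mid x_i)^{y_i} \, P(Y = 0 \mid x_i)^{1 - y_i} = \sum_{i=1}^{n} \left[ y_i \, \beta^\top x_i - \log\left(1 + \exp(\beta^\top x_i)\right) \right]$$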
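For independent Bernoulli observations with success probability expit(β'x_i), the log-likelihood is the sum over i of y_i β'x_i − log(1 + exp(β'x_i)). The sketch below codes that sum, maximizes it numerically, and compares the result with `glm()`. The simulated data, the "true" coefficients, and the function names `loglik` and `neg_grad` are illustrative assumptions, and the general-purpose optimizer stands in for the Fisher scoring that `glm()` uses.

```r
set.seed(7)
n <- 500
x <- rnorm(n)
X <- cbind(1, x)                          # design matrix with intercept column
beta_true <- c(-0.5, 1.2)                 # hypothetical coefficients for the simulation
y <- rbinom(n, size = 1, prob = as.vector(plogis(X %*% beta_true)))

# Log-likelihood: sum of y_i * eta_i - log(1 + exp(eta_i))
loglik <- function(beta) {
  eta <- as.vector(X %*% beta)
  sum(y * eta - log1p(exp(eta)))
}

# Gradient of the negative log-likelihood: -X'(y - p)
neg_grad <- function(beta) {
  p <- as.vector(plogis(X %*% beta))
  -as.vector(t(X) %*% (y - p))
}

# Maximize the log-likelihood by minimizing its negative
opt <- optim(c(0, 0), function(b) -loglik(b), gr = neg_grad, method = "BFGS")

# Reference fit from glm()
fit <- glm(y ~ x, family = binomial)
print(rbind(optim = opt$par, glm = unname(coef(fit))))
```

The two sets of estimates should agree closely, and `loglik(coef(fit))` should match `logLik(fit)`.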
