Statistics GR 5205 004 / GU 4205 005

Columbia University

Binary Response Variable

In many regression applications, the response Y has only two possible qualitative outcomes:

– Financial status of a firm: sound status / headed toward insolvency

– Coronary heart disease status: has the disease / does not have the disease

– Labor force participation of married women: in the labor force / not in the labor force

Because Y takes only the values 0 and 1, the mean response E[Y | X] = P(Y = 1 | X) must lie in [0, 1]. This constraint rules out a linear response function, which is unbounded.
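To see why a linear response function is problematic, here is a minimal sketch on a small made-up data set (the vectors `x` and `y` are illustrative and not taken from the course data). An ordinary least-squares fit to a 0/1 response produces fitted "probabilities" below 0 and above 1.

```r
# Toy data (hypothetical): binary response that switches from 0 to 1 as x grows
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# Linear probability model: ordinary least squares on the 0/1 response
lin_fit <- lm(y ~ x)
lin_pred <- unname(fitted(lin_fit))

# Fitted values at the extremes of x leave the interval [0, 1]
range(lin_pred)
```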

Heart Disease Data

Response: “chd”: indicates whether the person has heart disease or not

The men vary in height (in inches) and the number of cigarettes

(cigs) smoked per day

> data(wcgs, package="faraway")
> summary(wcgs[,c("chd","height","cigs")])
  chd           height           cigs
 no :2897   Min.   :60.00   Min.   : 0.0
 yes: 257   1st Qu.:68.00   1st Qu.: 0.0
            Median :70.00   Median : 0.0
            Mean   :69.78   Mean   :11.6
            3rd Qu.:72.00   3rd Qu.:20.0
            Max.   :78.00   Max.   :99.0

Heart Disease Data

> plot(height ~ chd, wcgs)
> wcgs$y <- ifelse(wcgs$chd == "no",0,1)
> plot(jitter(y,0.1) ~ jitter(height), wcgs, xlab="Height",
+      ylab="Heart Disease", pch=".")

Figure: Plots of the presence/absence of heart disease according to height.

Heart Disease Data

Predict heart disease and explain the relationship between height,

cigarette usage and heart disease.

For the same height and cigs, both outcomes occur. So we model the

probability of getting heart disease P(Y = 1 | X ) rather than Y itself

Figure: Interleaved histograms of the distribution of heights and cigarette usage for

men with and without heart disease.

Logistic Regression

Logistic regression defines the probability mass function:

$$P(Y = 1 \mid X) = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$

which implies that

$$P(Y = 0 \mid X) = 1 - P(Y = 1 \mid X) = \frac{1}{1 + \exp(\beta^\top X)}$$

where X is a (p + 1)-dimensional vector with $X_0 \equiv 1$, so that $\beta_0$ is the intercept.

Logistic Regression

Figure: P(Y = 1 | X) and P(Y = 0 | X), plotted as functions of the linear predictor $\beta^\top X$.
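A figure of this kind can be redrawn with base R. The sketch below uses `plogis()` (the logistic CDF from the `stats` package) as the inverse logit; the grid name `eta` for the linear predictor is an assumption of this example.

```r
# Grid of values for the linear predictor (called eta here for illustration)
eta <- seq(-6, 6, length.out = 201)

p1 <- plogis(eta)   # P(Y = 1 | X) = exp(eta) / (1 + exp(eta))
p0 <- 1 - p1        # P(Y = 0 | X) = 1 / (1 + exp(eta))

# Solid line: P(Y = 1 | X); dashed line: P(Y = 0 | X)
plot(eta, p1, type = "l", ylim = c(0, 1),
     xlab = "Linear predictor", ylab = "Probability")
lines(eta, p0, lty = 2)
legend("right", legend = c("P(Y = 1 | X)", "P(Y = 0 | X)"), lty = 1:2)
```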

Logistic Regression

The logit function

$$\operatorname{logit}(x) = \log\left(\frac{x}{1 - x}\right)$$

maps the unit interval (0, 1) to the entire real line (−∞, ∞).

The inverse logit function, or expit function,

$$\operatorname{expit}(x) = \operatorname{logit}^{-1}(x) = \frac{\exp(x)}{1 + \exp(x)}$$

maps the real line to the unit interval.

In logistic regression, the inverse logit function is used to map the linear predictor $\beta^\top X$ to a probability of Y = 1:

$$P(Y = 1 \mid X) = \operatorname{logit}^{-1}(\beta^\top X)$$
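The two maps are easy to write down directly. The following sketch defines `logit()` and `expit()` by hand (these names are local to the example) and checks them against base R's `qlogis()` and `plogis()`, which compute the same functions.

```r
# Hand-written versions of the two link maps (names local to this example)
logit <- function(p) log(p / (1 - p))
expit <- function(x) exp(x) / (1 + exp(x))

p <- c(0.1, 0.5, 0.9)
logit(p)            # log odds; 0.5 maps to 0
expit(logit(p))     # round trip recovers p

# Base R equivalents: qlogis() is the logit, plogis() the inverse logit
all.equal(logit(p), qlogis(p))
all.equal(expit(c(-2, 0, 2)), plogis(c(-2, 0, 2)))
```

The hand-written `expit()` overflows for very large arguments, so `plogis()` is the numerically safer choice in practice.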

Logistic Regression

Geometric interpretation: a logistic regression fit based on two predictors can be represented by an S-shaped surface in 3D space.

Logistic Regression

The linear predictor in logistic regression is the conditional log odds:

$$\log\left[\frac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)}\right] = \beta^\top X = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$

Interpretation: a one-unit increase in $X_j$, holding the other predictors fixed, changes the (conditional) log odds by $\beta_j$.

Equivalently, a one-unit increase in $X_j$ multiplies the conditional odds by $\exp(\beta_j)$.

$\exp(\beta_j)$ is also called the odds ratio, as it is the ratio of the two odds corresponding to two scenarios in which the values of $X_j$ differ by one unit.
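The odds-ratio reading can be checked numerically. The sketch below uses made-up coefficients (`b0` and `b1` are hypothetical, chosen only for illustration) and confirms that raising the predictor by one unit multiplies the odds by `exp(b1)`, whatever the starting value.

```r
# Hypothetical coefficients for a single-predictor logistic model
b0 <- -2
b1 <- 0.5

# Conditional odds P(Y = 1 | x) / P(Y = 0 | x) under the model
odds <- function(x) {
  p <- plogis(b0 + b1 * x)
  p / (1 - p)
}

# Ratio of odds for a one-unit increase, at several starting values of x
x0 <- c(-1, 0, 3)
odds_ratio <- odds(x0 + 1) / odds(x0)

odds_ratio   # the same value at every x0
exp(b1)      # equals the odds ratio
```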

Heart Disease Example

> lmod <- glm(chd ~ height + cigs, family = binomial, wcgs)
> summary(lmod)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0041  -0.4425  -0.3630  -0.3499   2.4357

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.50161    1.84186  -2.444   0.0145 *
height       0.02521    0.02633   0.957   0.3383
cigs         0.02313    0.00404   5.724 1.04e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1781.2  on 3153  degrees of freedom
Residual deviance: 1749.0  on 3151  degrees of freedom
AIC: 1755

Number of Fisher Scoring iterations: 5
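As a worked reading of the output above, the sketch below plugs the printed estimates into the odds-ratio and inverse-logit formulas. The coefficients are copied by hand from the summary table (rounded as printed); the choice of a 70-inch man and of 20 cigarettes per day is only for illustration.

```r
# Estimates copied from the printed summary (rounded as shown there)
b <- c(intercept = -4.50161, height = 0.02521, cigs = 0.02313)

# Odds ratio for one extra cigarette per day, and for one extra pack (20)
or_one_cig <- exp(b[["cigs"]])
or_pack <- exp(20 * b[["cigs"]])

# Predicted probability of heart disease for a 70-inch man,
# as a nonsmoker and as a 20-a-day smoker
p_nonsmoker <- plogis(b[["intercept"]] + b[["height"]] * 70)
p_smoker <- plogis(b[["intercept"]] + b[["height"]] * 70 + b[["cigs"]] * 20)

round(c(or_one_cig = or_one_cig, or_pack = or_pack), 3)
round(c(p_nonsmoker = p_nonsmoker, p_smoker = p_smoker), 3)
```

Under these estimates, each additional cigarette per day multiplies the odds of heart disease by roughly 1.02, and a pack a day by roughly 1.6.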

Heart Disease Example

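library(faraway)  # provides ilogit(), the inverse logit
(beta <- coef(lmod))
# Left panel: probability against height for a nonsmoker (solid) and 20 cigs/day (dashed)
plot(jitter(y,0.1) ~ jitter(height), wcgs, xlab="Height",
     ylab="Heart Disease",pch=".")
curve(ilogit(beta[1] + beta[2]*x + beta[3]*0),add=TRUE)
curve(ilogit(beta[1] + beta[2]*x + beta[3]*20),add=TRUE,lty=2)
# Right panel: probability against cigarette use at 60 in. (solid) and 78 in. (dashed)
plot(jitter(y,0.1) ~ jitter(cigs), wcgs, xlab="Cigarette Use",
     ylab="Heart Disease",pch=".")
curve(ilogit(beta[1] + beta[2]*60 + beta[3]*x),add=TRUE)
curve(ilogit(beta[1] + beta[2]*78 + beta[3]*x),add=TRUE,lty=2)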

Heart Disease Example

Figure: Predicted probability of heart disease. Left: solid line represents a

nonsmoker, the dashed line is a pack-a-day smoker. Right: solid line represents a

very short man (60 in.), the dashed line represents a very tall man (78 in.)

Latent variable model for logistic regression

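It may make sense to view the binary outcome Y as a dichotomization of a latent continuous outcome $Y_c$:

$$Y = I(Y_c \ge 0)$$

Suppose $Y_c \mid X$ follows a logistic distribution with CDF

$$F(Y_c \mid X) = \frac{\exp(Y_c - \beta^\top X)}{1 + \exp(Y_c - \beta^\top X)}$$

In this case, Y | X follows the logistic regression model:

$$P(Y = 1 \mid X) = P(Y_c \ge 0 \mid X) = 1 - \frac{\exp(0 - \beta^\top X)}{1 + \exp(0 - \beta^\top X)} = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$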
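The latent-variable construction can be checked by simulation. In this sketch the value `eta` stands in for the linear predictor at one fixed X (a hypothetical number); `rlogis()` and `plogis()` are the logistic random generator and CDF from base R.

```r
set.seed(1)

eta <- 0.8                    # hypothetical value of the linear predictor
n <- 100000

# Latent outcome: logistic distribution centred at eta
y_latent <- rlogis(n, location = eta, scale = 1)

# Observed binary outcome is the dichotomized latent outcome
y <- as.integer(y_latent >= 0)

# Simulated frequency of Y = 1 versus the logistic regression probability
mean(y)
plogis(eta)

# Exact check: 1 - F(0 | X) equals the inverse logit of eta
all.equal(1 - plogis(0, location = eta), plogis(eta))
```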

Mean and variance relationship for logistic regression

Since Y | X follows Bernoulli($\operatorname{logit}^{-1}(\beta^\top X)$), the mean is

$$E[Y \mid X] = P(Y = 1 \mid X) = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$

and the variance is

$$\operatorname{Var}[Y \mid X] = P(Y = 1 \mid X) \cdot P(Y = 0 \mid X) = \frac{\exp(\beta^\top X)}{(1 + \exp(\beta^\top X))^2}$$

Since the variance depends on X, logistic regression models are always heteroscedastic (unequal error variances).
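A short numerical illustration of the mean-variance relationship, using a handful of arbitrary linear-predictor values (`eta` is a name chosen for this example):

```r
# Arbitrary values of the linear predictor
eta <- c(-4, -2, 0, 2, 4)

p <- plogis(eta)            # E[Y | X]
v <- p * (1 - p)            # Var[Y | X] = P(Y = 1 | X) * P(Y = 0 | X)

# Same variance via the closed form exp(eta) / (1 + exp(eta))^2
v_closed <- exp(eta) / (1 + exp(eta))^2

data.frame(eta = eta, mean = p, variance = v)
```

The variance peaks at 0.25 when the probability is 0.5 and shrinks toward 0 as the probability approaches 0 or 1.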

Estimation in logistic regression

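Assuming independent observations $(x_1, y_1), \ldots, (x_n, y_n)$, the log-likelihood for logistic regression is

$$L(\beta \mid Y, X) = \log \prod_{i=1}^{n} P(Y = y_i \mid X = x_i) = \sum_{i=1}^{n} \left[ y_i \beta^\top x_i - \log\left(1 + \exp(\beta^\top x_i)\right) \right]$$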
