Statistics GR 5205 004 / GU 4205 005
Columbia University
Binary Response Variable
In many regression applications, the response Y has only two possible
qualitative outcomes:
– Financial status of firm: sound status/headed toward insolvency
– Coronary heart disease status: has the disease/does not have the disease
– In a study of labor force participation of married women: married woman
in labor force/married woman not in labor force
Because Y is binary, the mean response E[Y | X] = P(Y = 1 | X) must lie
in [0, 1], which rules out a linear response function
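One way to see the problem with a linear response function is to fit both a straight line and a logistic curve to the same binary data. The sketch below uses simulated data (the seed, sample size, and coefficient are arbitrary choices for illustration, not values from the course) and compares the range of the fitted means.

```r
# Illustrative sketch on simulated data; every setting here is an assumption.
set.seed(1)
n <- 500
x <- seq(-4, 4, length.out = n)
y <- rbinom(n, size = 1, prob = plogis(2 * x))   # binary response

lin_fit   <- lm(y ~ x)                       # linear response function
logit_fit <- glm(y ~ x, family = binomial)   # logistic response function

range(fitted(lin_fit))     # typically extends below 0 and above 1
range(fitted(logit_fit))   # stays inside (0, 1)
```

The linear fit has no mechanism to keep its fitted means inside the unit interval, whereas the logistic fit does so by construction.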
Heart Disease Data
Response: “chd” indicates whether the person has heart disease or not
The men in the study vary in height (in inches) and in the number of
cigarettes (cigs) smoked per day
> data(wcgs, package="faraway")
> summary(wcgs[,c("chd","height","cigs")])
  chd           height           cigs
 no :2897   Min.   :60.00   Min.   : 0.0
 yes: 257   1st Qu.:68.00   1st Qu.: 0.0
            Median :70.00   Median : 0.0
            Mean   :69.78   Mean   :11.6
            3rd Qu.:72.00   3rd Qu.:20.0
            Max.   :78.00   Max.   :99.0
Heart Disease Data
> plot(height ~ chd, wcgs)
> wcgs$y <- ifelse(wcgs$chd == "no",0,1)
> plot(jitter(y,0.1) ~ jitter(height), wcgs, xlab="Height",
+ ylab="Heart Disease", pch=".")
Figure: Plots of the presence/absence of heart disease according to height.
Heart Disease Data
Predict heart disease and explain the relationship between height,
cigarette usage and heart disease.
For the same height and cigs, both outcomes occur. So we model the
probability of getting heart disease P(Y = 1 | X ) rather than Y itself
Figure: Interleaved histograms of the distribution of heights and cigarette usage for
men with and without heart disease.
Logistic Regression
Logistic regression defines the probability mass function:
$$P(Y = 1 \mid X) = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$
which implies that
$$P(Y = 0 \mid X) = 1 - P(Y = 1 \mid X) = \frac{1}{1 + \exp(\beta^\top X)}$$
where X is a (p + 1)-dim. vector with X0 ≡ 1, and β0 is the intercept
Logistic Regression
Figure: P(Y = 1 | X) and P(Y = 0 | X), plotted as functions of β⊤X
Logistic Regression
The logit function
$$\operatorname{logit}(x) = \log\left(\frac{x}{1 - x}\right)$$
maps the unit interval (0, 1) to the entire real line (−∞, ∞)
The inverse logit function, or expit function,
$$\operatorname{expit}(x) = \operatorname{logit}^{-1}(x) = \frac{\exp(x)}{1 + \exp(x)}$$
maps the real line to the unit interval
In logistic regression, the inverse logit function is used to map the linear
predictor β⊤X to a probability of Y = 1:
$$P(Y = 1 \mid X) = \operatorname{logit}^{-1}(\beta^\top X)$$
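The logit and expit functions are short enough to write by hand, which makes it easy to check that they are inverses of each other. The sketch below defines them from their formulas and compares them with the base R functions `qlogis()` and `plogis()`; the probabilities in `p` are arbitrary illustrative values.

```r
# logit and its inverse, written out from their definitions
logit <- function(p) log(p / (1 - p))
expit <- function(x) exp(x) / (1 + exp(x))

p <- c(0.1, 0.5, 0.9)    # arbitrary probabilities for illustration
logit(p)                 # log odds; 0.5 maps to 0
expit(logit(p))          # recovers p

# Base R equivalents: qlogis() is the logit, plogis() the inverse logit
all.equal(logit(p), qlogis(p))
all.equal(expit(c(-2, 0, 2)), plogis(c(-2, 0, 2)))
```

In practice `plogis()` is the safer choice, since the hand-written `expit()` overflows for very large arguments.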
Logistic Regression
Geometric interpretation: a logistic regression fit based on two predictors can
be represented by an S-shaped surface in 3D space
Logistic Regression
The linear predictor in logistic regression is the conditional log odds:
$$\log\left[\frac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)}\right] = \beta^\top X = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$
Interpret logistic regression: a one-unit increase in Xj, holding the other
predictors fixed, results in a change of βj in the (conditional) log odds
Or: a one-unit increase in Xj results in a multiplicative change of exp(βj)
in the conditional odds
exp(βj) is also called the odds ratio, as it is the ratio of the two odds
corresponding to two scenarios where the values of Xj differ by one unit
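The odds-ratio interpretation follows in one step from the log-odds equation. As a short worked derivation, write odds(X) = P(Y = 1 | X) / P(Y = 0 | X) = exp(β⊤X), and let X′ (notation introduced here only for illustration) equal X except that Xj is increased by one unit:

```latex
\frac{\operatorname{odds}(X')}{\operatorname{odds}(X)}
  = \frac{\exp\bigl(\beta_0 + \cdots + \beta_j (X_j + 1) + \cdots + \beta_p X_p\bigr)}
         {\exp\bigl(\beta_0 + \cdots + \beta_j X_j + \cdots + \beta_p X_p\bigr)}
  = \exp(\beta_j)
```

The ratio does not depend on the starting value of Xj or on the other predictors, which is why a single number exp(βj) summarizes the effect on the odds scale.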
Heart Disease Example
> lmod <- glm(chd ~ height + cigs, family = binomial, wcgs)
> summary(lmod)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0041  -0.4425  -0.3630  -0.3499   2.4357

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.50161    1.84186  -2.444   0.0145 *
height       0.02521    0.02633   0.957   0.3383
cigs         0.02313    0.00404   5.724 1.04e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1781.2  on 3153  degrees of freedom
Residual deviance: 1749.0  on 3151  degrees of freedom
AIC: 1755

Number of Fisher Scoring iterations: 5
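To read the fitted coefficients on the odds and probability scales, the estimates can be plugged into the formulas directly. The sketch below hard-codes the three estimates from the `summary(lmod)` output so that it runs without the data. The covariate values (a 68-inch man smoking 0 or 20 cigarettes per day) are chosen here only as an illustration.

```r
# Estimates copied from the summary(lmod) output
b <- c(intercept = -4.50161, height = 0.02521, cigs = 0.02313)

# Odds ratios: one extra cigarette per day, and one extra pack (20) per day
or_one_cig  <- exp(b[["cigs"]])
or_one_pack <- exp(20 * b[["cigs"]])

# Predicted probabilities for a 68-inch man smoking 0 or 20 cigarettes per day
eta   <- b[["intercept"]] + b[["height"]] * 68 + b[["cigs"]] * c(0, 20)
p_hat <- plogis(eta)

round(c(or_one_cig, or_one_pack), 3)
round(p_hat, 3)
```

Under these estimates each additional cigarette per day multiplies the odds of heart disease by roughly 1.02, and a pack a day by roughly 1.59. The predicted probability for the 68-inch man rises from about 0.06 to about 0.09.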
Heart Disease Example
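library(faraway)       # provides ilogit(), the inverse logit
(beta <- coef(lmod))
par(mfrow = c(1, 2))   # two panels side by side, matching the figure
# Left panel: probability of heart disease against height
plot(jitter(y,0.1) ~ jitter(height), wcgs, xlab="Height",
     ylab="Heart Disease", pch=".")
curve(ilogit(beta[1] + beta[2]*x + beta[3]*0), add=TRUE)          # nonsmoker
curve(ilogit(beta[1] + beta[2]*x + beta[3]*20), add=TRUE, lty=2)  # 20 cigarettes a day
# Right panel: probability of heart disease against cigarette use
plot(jitter(y,0.1) ~ jitter(cigs), wcgs, xlab="Cigarette Use",
     ylab="Heart Disease", pch=".")
curve(ilogit(beta[1] + beta[2]*60 + beta[3]*x), add=TRUE)         # height 60 in.
curve(ilogit(beta[1] + beta[2]*78 + beta[3]*x), add=TRUE, lty=2)  # height 78 in.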
Heart Disease Example
Figure: Predicted probability of heart disease. Left: solid line represents a
nonsmoker, the dashed line is a pack-a-day smoker. Right: solid line represents a
very short man (60 in.), the dashed line represents a very tall man (78 in.)
Latent variable model for logistic regression
It may make sense to view the binary outcome Y as being a
dichotomization of a latent continuous outcome Yc :
Y = I(Yc ≥ 0)
Suppose Yc | X follows a logistic distribution with CDF
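$$F(Y_c \mid X) = \frac{\exp(Y_c - \beta^\top X)}{1 + \exp(Y_c - \beta^\top X)}$$
In this case, Y | X follows the logistic regression model:
$$P(Y = 1 \mid X) = P(Y_c \ge 0 \mid X) = 1 - \frac{\exp(0 - \beta^\top X)}{1 + \exp(0 - \beta^\top X)} = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$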
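The latent variable view can be checked by simulation: draw a continuous outcome from a logistic distribution centred at the linear predictor, dichotomize it at zero, and fit a logistic regression to the resulting binary outcome. The sketch below does this with made-up coefficients. The values in `beta`, the seed, and the sample size are illustrative assumptions, not quantities from the heart disease data.

```r
# Simulation sketch; coefficients, seed, and sample size are assumptions.
set.seed(42)
n    <- 20000
beta <- c(-1, 0.5)                  # hypothetical intercept and slope
x    <- rnorm(n)
eta  <- beta[1] + beta[2] * x       # linear predictor

# Latent outcome: logistic distribution with location eta and unit scale
yc <- rlogis(n, location = eta, scale = 1)
y  <- as.integer(yc >= 0)           # dichotomize: Y = I(Yc >= 0)

fit <- glm(y ~ x, family = binomial)
coef(fit)                           # should be close to beta
```

With a sample this large the estimated coefficients should land near the values used to generate the latent outcome.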
Mean and variance relationship for logistic regression
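Since Y | X follows Bernoulli(logit⁻¹(β⊤X)), the mean is
$$E[Y \mid X] = P(Y = 1 \mid X) = \frac{\exp(\beta^\top X)}{1 + \exp(\beta^\top X)}$$
And the variance is
$$\operatorname{Var}[Y \mid X] = P(Y = 1 \mid X) \cdot P(Y = 0 \mid X) = \frac{\exp(\beta^\top X)}{(1 + \exp(\beta^\top X))^2}$$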
Since the variance depends on X, logistic regression models are always
heteroscedastic (unequal error variances)
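A quick numerical check makes the mean and variance relationship concrete. The sketch below evaluates the Bernoulli variance on a grid of linear predictor values (the grid is an arbitrary choice for illustration) and confirms that the two ways of writing it agree.

```r
eta <- seq(-6, 6, by = 0.5)   # grid of linear predictor values
p   <- plogis(eta)            # E[Y | X] = P(Y = 1 | X)
v   <- p * (1 - p)            # Var[Y | X] = P(Y = 1 | X) * P(Y = 0 | X)

# Same variance written directly in terms of the linear predictor
v_direct <- exp(eta) / (1 + exp(eta))^2
all.equal(v, v_direct)

eta[which.max(v)]   # the variance peaks where the linear predictor is 0
max(v)              # 0.25, attained at p = 0.5
```

The variance shrinks toward zero as the predicted probability approaches 0 or 1, so observations with different X generally have different error variances.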
Estimation in logistic regression
Assuming independent observations (x1, y1), . . . , (xn, yn), the
log-likelihood for logistic regression is
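$$L(\beta \mid Y, X) = \log \prod_{i=1}^{n} P(Y = y_i \mid X = x_i) = \sum_{i=1}^{n} \left[ y_i \beta^\top x_i - \log\left(1 + \exp(\beta^\top x_i)\right) \right]$$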
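Under the logistic model, observation i contributes y_i η_i − log(1 + exp(η_i)) to the log-likelihood, where η_i = β⊤x_i. As a minimal sketch of estimation, the code below maximizes this sum numerically with `optim()` and compares the result with `glm()`. The data are simulated, and the seed, sample size, and true coefficients are illustrative assumptions. `glm()` itself uses Fisher scoring rather than BFGS, so this shows the objective being maximized, not the algorithm `glm()` runs.

```r
# Simulated data; seed, sample size, and coefficients are assumptions.
set.seed(7)
n <- 1000
X <- cbind(1, rnorm(n))                         # design matrix with X0 = 1
y <- rbinom(n, 1, plogis(drop(X %*% c(-0.5, 1))))

# Log-likelihood: sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ]
loglik <- function(beta) {
  eta <- drop(X %*% beta)
  sum(y * eta - log1p(exp(eta)))
}

# optim() minimizes, so pass the negative log-likelihood
opt <- optim(c(0, 0), function(b) -loglik(b), method = "BFGS")
fit <- glm(y ~ X[, 2], family = binomial)

rbind(optim = opt$par, glm = coef(fit))         # the two rows should nearly agree
```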
