Sunday, July 8, 2012

Probit vs. logit

Repost from Econometrics by Simulation:

* This is a comparison of how well the logit does relative to the probit when the data is generated from the assumptions underlying the probit.

* First let's generate data that is consistent with the probit assumptions

clear
set seed 101
set obs 1000

* x is the explanatory variable
gen x = rnormal()*(1/2)^(1/2)

* u is the error
gen u = rnormal()*(1/2)^(1/2)

* y is the unobserved latent (structural) variable
gen y = x + u

* To do a probit correctly the underlying latent distribution has to be standard normal. This is a normalization rather than a real restriction, so long as you remember it when generating values.

* This is why x and u are each rnormal()*(1/2)^(1/2): var(y) = var(x+u) = (1/2)*1 + (1/2)*1 = 1
sum y
* The sample standard deviation is pretty close to 1

* y_prob is the probability of observing a 1 given the latent y
gen y_prob = normal(y)

* y_observed holds the actual binary draws
gen y_observed = rbinomial(1,y_prob)

* now let's try estimating probit first
probit y_observed x
  * Save the estimated coefficient to a local macro
  local coef_probit = _b[x]

* let's predict the probabilities
predict y_probit
  label var y_probit "Probit fitted values"

* Now let's estimate the logit
logit y_observed x
* Save the estimated logit coefficient to a local macro
local coef_logit = _b[x]

* predict the probabilities from the logit
predict y_logit
  label var y_logit "Logit fitted values"

* We can see that the probit and logit fitted values are almost identical
two (line y_logit x, sort) (line y_probit x, sort)
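
* A rough numeric check on the plot above: the largest absolute gap between the two sets of fitted probabilities.
* (abs_diff is a name introduced here for illustration; it is not part of the original post.)
gen abs_diff = abs(y_probit - y_logit)
sum abs_diff
di "Largest gap between probit and logit fitted probabilities: " r(max)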


di "It is a somewhat well known property that probits and logits are in practice almost linearly equivalent."
di "The ratio of probit to logit is: `=`coef_probit'/`coef_logit''"
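
* For reference, a commonly quoted rule of thumb is that logit coefficients are roughly 1.6 to 1.8 times the probit ones.
* The upper figure is about the standard deviation of the standard logistic distribution, _pi/sqrt(3).
* The factor 1.6 used below is that rule of thumb, an assumption for illustration, not something estimated here.
di "Standard deviation of the standard logistic: " _pi/sqrt(3)
di "Logit coefficient divided by 1.6: `=`coef_logit'/1.6'"
di "Probit coefficient:               `coef_probit'"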

reg y_probit y_logit
* Check out that R-squared!
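* regress stores the R-squared in e(r2), so it can be shown to more digits than the table prints:
di "R-squared from regressing probit on logit fitted values: " %9.6f e(r2)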

* So what does all of this practically mean?

* Feel free to switch between probit and logit whenever you want.  The choice generally does not materially change fitted probabilities or marginal effects, although the raw coefficients differ in scale.

* Note: for mathematical reasons it is sometimes easier to use one than the other.

* Finally, if you want an effect of x that is comparable across the two models, the best thing to do is to take the average partial effect (APE)

probit y_observed x

di (1/2)^(1/2)

test x==.70710678
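* The probit results get fairly close but we reject the null.
* One likely reason: y_observed is drawn with probability normal(x + u), which layers a second standard normal shock on top of u, so the implied probit coefficient on x is 1/sqrt(1 + 1/2), about 0.816, rather than 0.707.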
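
* A minimal sketch of the APE comparison mentioned above, assuming Stata 11 or later for the margins command.
* Each model is re-estimated quietly because margins works off the most recent estimation results.
quietly probit y_observed x
margins, dydx(x)

quietly logit y_observed x
margins, dydx(x)
* The two average partial effects should be much closer to each other than the raw coefficients are.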
