We will focus on the practical challenges that empirical work has to face. These challenges call for a deliberate choice of estimation strategy (a research design), and we will study the strengths and weaknesses of the leading approaches. This will enable us to identify good research practice, which is at the heart of what some leading empirical labour econometricians have labelled the Credibility Revolution in Empirical Economics.
Course outline:
Empirical methods for labour economics
All methods will be illustrated with real-world data using R, and several papers in applied labour economics will be discussed.
(I) Selection Biases: Problems and Remedies
Standard econometric tools require that our datasets be random samples, and thus representative of the population of interest. However, this requirement is rarely satisfied in practice. Individuals optimally self-select into economic states, so the observed states are non-random. For instance, individuals decide whether to accept jobs, so wages are observed only for this self-selected group, and a study of the wage distribution based on it yields distorted results (selection biases). We study Heckman’s idea of removing this sample selection bias by modelling the selection process explicitly (a Roy model).
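The wage example can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: a latent wage w = mu + sigma*u, participation whenever c + v > 0, and (u, v) standard bivariate normal with correlation rho. The function names and parameter values are hypothetical and are not taken from the papers discussed in the course. Under these assumptions the mean wage among participants exceeds mu by sigma*rho*phi(c)/Phi(c), the inverse Mills ratio term that a Heckman-type correction models explicitly.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_selected_mean(mu=2.0, sigma=0.5, rho=0.6, c=0.0, n=200_000, seed=0):
    """Mean observed wage and participation rate under self-selection.

    Assumed (illustrative) model: wage = mu + sigma*u, observed only if c + v > 0,
    where (u, v) are standard normal with correlation rho.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        u = rho * v + math.sqrt(1.0 - rho**2) * e  # corr(u, v) = rho
        if c + v > 0:  # the individual accepts the job, so the wage is observed
            observed.append(mu + sigma * u)
    return fmean(observed), len(observed) / n


def selection_term(sigma, rho, c):
    """Theoretical bias of the selected mean: sigma * rho * phi(c) / Phi(c)."""
    std = NormalDist()
    return sigma * rho * std.pdf(c) / std.cdf(c)


if __name__ == "__main__":
    mean_obs, share = simulate_selected_mean()
    print("share participating:", round(share, 3))
    print("mean observed wage:", round(mean_obs, 3))
    print("population mean + selection term:", round(2.0 + selection_term(0.5, 0.6, 0.0), 3))
```

With rho = 0 the selection term vanishes and the observed mean is unbiased for mu, which is why the correlation between the unobservables of the wage and participation equations drives the problem.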
Applications: Heckman and Honoré (1990, ECTA), “The Empirical Content of the Roy Model”. Roy models of migration: Borjas (1999, Handbook of Labor Economics), “The Economic Analysis of Immigration”; Chiquiar and Hanson (2005, JPE), “International Migration, Self-Selection, and the Distribution of Wages: Evidence from Mexico and the United States”; Gurgand, M. and D.N. Margolis (2008, JPubE), “Does work pay in France? Monetary incentives, hours constraints, and the guaranteed minimum income”.
(II) Unobserved Heterogeneity: Fixed Effects, Panel Data Estimators, and Difference-in-Differences
Usually, we cannot measure or observe everything that is relevant for the determination of outcomes. Such unobserved heterogeneity poses serious problems for the researcher if it is correlated with the included regressors (the omitted variables problem). Overcoming this problem with an instrumental variables strategy is often not feasible in practice, since credible instruments are very difficult to find. However, if we observe the same individuals over several periods, such panel data can offer a solution. We will develop and put into practice empirical methods for estimation and inference that exploit this panel structure. After reviewing the classic approaches, we proceed to discuss some important papers from the theoretical and applied econometrics literatures. An important setting in which panel data methods have become very popular is the estimation of the causal effects of policies using natural experiments. We will discuss the challenges and limitations of such difference-in-differences (DiD) strategies.
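To see how a panel can help, consider a minimal simulation sketch. The data-generating process is an assumption made purely for illustration: y_it = beta*x_it + a_i + e_it, where the unobserved individual effect a_i also enters x_it, so pooled OLS suffers from omitted variable bias. Subtracting individual means (the within, or fixed effects, transformation) removes a_i. All function names below are hypothetical.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Slope of a bivariate OLS regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_panel(n_units=2000, n_periods=5, beta=1.0, seed=0):
    """Rows (unit, x, y) with an individual effect that is correlated with x."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        a_i = rng.gauss(0.0, 1.0)  # unobserved heterogeneity
        for _ in range(n_periods):
            x = a_i + rng.gauss(0.0, 1.0)  # regressor depends on a_i
            y = beta * x + a_i + rng.gauss(0.0, 1.0)
            rows.append((i, x, y))
    return rows


def within_transform(rows):
    """Demean x and y by unit, which sweeps out the individual effect."""
    by_unit = {}
    for i, x, y in rows:
        by_unit.setdefault(i, []).append((x, y))
    xd, yd = [], []
    for obs in by_unit.values():
        mx = fmean([x for x, _ in obs])
        my = fmean([y for _, y in obs])
        for x, y in obs:
            xd.append(x - mx)
            yd.append(y - my)
    return xd, yd


if __name__ == "__main__":
    rows = simulate_panel()
    pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
    within = ols_slope(*within_transform(rows))
    print("pooled OLS slope:", round(pooled, 3))  # biased away from beta = 1
    print("within (FE) slope:", round(within, 3))  # close to beta = 1
```

In this design the pooled slope converges to beta + Cov(x, a)/Var(x) = 1.5, while the within slope converges to beta = 1. The fix works only because a_i is constant over time; time-varying unobservables are not removed by demeaning.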
Applications: Ruhm, C.J. (1996, JoHE), “Alcohol policies and highway vehicle fatalities”; Card, D., J. Heining, and P. Kline (2013, QJE), “Workplace Heterogeneity and the Rise of West German Wage Inequality”; Duflo, E. (2001, AER), “Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment”.
While DiD is a popular estimation method when the researcher has access to panel data, inference and testing are challenging. We consider several situations in which difficulties arise because errors are correlated within groups or across time.
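A standard way to quantify the problem is the Moulton factor. The expression below is a sketch for the simplest case only, assuming a single regressor, groups of equal size n, an intraclass correlation \rho_e of the errors and an intraclass correlation \rho_x of the regressor; the symbols are introduced here for illustration.

```latex
% Ratio of the true sampling variance to the conventional (i.i.d.) OLS variance,
% assuming equal group sizes n:
\frac{V(\hat\beta)}{V_{\mathrm{conv}}(\hat\beta)} = 1 + (n - 1)\,\rho_x\,\rho_e .
% If the regressor varies only at the group level (e.g. a policy dummy), \rho_x = 1:
\frac{V(\hat\beta)}{V_{\mathrm{conv}}(\hat\beta)} = 1 + (n - 1)\,\rho_e .
```

For example, with n = 100 observations per group and \rho_e = 0.05, the second expression gives 1 + 99 × 0.05 = 5.95, so conventional standard errors would be too small by a factor of about √5.95 ≈ 2.4.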
Further readings: Moulton (1990, RESTAT), “An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units”; Donald and Lang (2007, RESTAT), “Inference with Difference-in-Differences and Other Panel Data”; Bertrand, Duflo, and Mullainathan (2004, QJE), “How Much Should We Trust Differences-in-Differences Estimates?”
What variation in the data do we seek to exploit in order to estimate the objects of interest (typically the coefficients)? Is this variation random, or a manifestation of choices? An econometric model is identified if we can uniquely solve for the model coefficients. To achieve this, we usually have to impose some structure, such as the assumption that the error term in the linear regression is uncorrelated with the regressors (the identification hypothesis). The validity of the chosen empirical strategy therefore depends on the empirical validity of the identification hypothesis. We will examine what constitutes good research design.
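For the linear regression, the link between the identification hypothesis and solving uniquely for the coefficients can be written out as follows. This is the textbook argument, with notation (y_i, x_i, u_i, \beta) chosen here for illustration.

```latex
% Model and identification hypothesis
y_i = x_i'\beta + u_i, \qquad E[x_i u_i] = 0 .
% Premultiply by x_i and take expectations
E[x_i y_i] = E[x_i x_i']\,\beta + E[x_i u_i] = E[x_i x_i']\,\beta .
% If E[x_i x_i'] is invertible, \beta is the unique solution
\beta = \left(E[x_i x_i']\right)^{-1} E[x_i y_i] .
% If instead E[x_i u_i] \neq 0, the same population moments deliver
\left(E[x_i x_i']\right)^{-1} E[x_i y_i] = \beta + \left(E[x_i x_i']\right)^{-1} E[x_i u_i] \neq \beta .
```

The last line shows why the hypothesis cannot be checked from the regression itself: the data pin down E[x_i x_i'] and E[x_i y_i], but not E[x_i u_i], so its plausibility has to come from the research design.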
Application and replication: Card, D. (1993), “Using geographic variation in college proximity to estimate the return to schooling”; Angrist and Krueger (2001, JEconPersp), “Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments”; Angrist and Pischke (2010, JEconPersp), “The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con out of Econometrics”.