A Python Package for High Dimensional Fixed Effects

December 21, 2023

Linear regression is a fundamental statistical technique used to model the relationship between a dependent variable and one or more independent variables. However, when dealing with high dimensional fixed effects and panel data, it becomes challenging to estimate the model accurately and efficiently. That’s where FixedEffectModel, a Python package developed by Kuaishou DA Ecology group, comes in.

Main Features

FixedEffectModel offers the following features for estimating linear models with high dimensional fixed effects:

Linear model estimation: Estimate linear regression models with fixed effects to account for their impact on the dependent variable.
High dimensional fixed effects: Handle panel data, which combines time series and cross-sectional observations, when the fixed effects have a large number of levels.
Instrumental variable model estimation: Estimate instrumental variable models to handle endogeneity issues and uncover causal relationships.
Robust/White standard error: Calculate heteroskedasticity-robust (White) standard errors, which remain valid when the error variance is not constant.
Multi-way cluster standard error: Estimate multi-way cluster standard errors, which allow the errors to be correlated within clusters along more than one dimension (for example, within id and within time).
Difference-in-difference model: Estimate difference-in-difference models to evaluate causal effects by comparing treatment and control groups before and after an intervention.

The FixedEffectModel package is designed to provide accurate and efficient estimation results, making it an essential tool for researchers and analysts working with complex datasets.
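
To make the robust standard error feature concrete, here is a minimal NumPy sketch of the White (HC0) sandwich estimator next to the classical one. It is not the package's code: the helper name ols_with_white_se and the simulated data are assumptions for illustration, and it uses no fixed effects.

```python
import numpy as np

def ols_with_white_se(X, y):
    """Return OLS coefficients with classical and White (HC0) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Classical variance: sigma^2 * (X'X)^-1, valid only under constant error variance.
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * xtx_inv))
    # White HC0 sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1.
    meat = (X * resid[:, None] ** 2).T @ X
    se_white = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_classical, se_white

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
# The error standard deviation grows with x, so the errors are heteroskedastic.
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=n)
X = np.column_stack([np.ones(n), x])

beta, se_classical, se_white = ols_with_white_se(X, y)
print("coefficients:", beta)
print("classical SE:", se_classical)
print("White SE:    ", se_white)
```

The two sets of standard errors share the same coefficients; only the variance formula differs.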

Installation

Getting started with FixedEffectModel is easy. You can install the package directly from PyPI using the following command:

$ pip install FixedEffectModel

Make sure you have Python 3.6 or higher installed, along with the necessary dependencies such as pandas, NumPy, SciPy, and statsmodels.
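
A quick way to confirm that the dependencies are importable is a standard-library check like the sketch below. The helper check_dependencies is hypothetical; the module names are the usual import names, with fixedeffect taken from the import statements used in this article.

```python
from importlib.util import find_spec

# Import names (not PyPI names); "fixedeffect" is the module this article imports from.
REQUIRED = ["numpy", "pandas", "scipy", "statsmodels", "fixedeffect"]

def check_dependencies(names=REQUIRED):
    """Return {module name: True if it can be found, else False}."""
    return {name: find_spec(name) is not None for name in names}

status = check_dependencies()
for name, ok in status.items():
    print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```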

Getting Started

To help you get up and running quickly with FixedEffectModel, let’s walk through a simple case study. We’ll cover the key steps needed to estimate linear models with high dimensional fixed effects.

Loading Modules and Functions

After installing FixedEffectModel and its dependencies, you’ll need to load the required modules and functions. Here’s an example:

import numpy as np
import pandas as pd
from fixedeffect.iv import iv2sls, ivgmm, ivtest
from fixedeffect.fe import fixedeffect, did, getfe
from fixedeffect.utils.panel_dgp import gen_data

The gen_data function is used to simulate panel data for the case study.

Data

Next, let’s generate a simulated dataset with 100 cross-sectional units and 10 time units:

N = 100
T = 10
beta = [-3, 1, 2, 3, 4]
ate = 1
exp_date = 5
df = gen_data(N, T, beta, ate, exp_date)

Here, beta holds the true coefficients, ate is the true average treatment effect, and exp_date is the period in which the experiment starts.
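
For readers who want to see what such a data-generating process can look like, the sketch below simulates a balanced panel with unit effects, time effects, and a treatment that switches on at exp_date. It is not the package's gen_data: the function simulate_panel, its column names, and the rule that even-numbered units are treated are all assumptions made for illustration.

```python
import numpy as np

def simulate_panel(n_units, n_periods, slope, ate, exp_date, seed=0):
    """Simulate a balanced panel with unit effects, time effects, and a treatment."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(n_units), n_periods)   # 0,0,...,1,1,...
    time = np.tile(np.arange(n_periods), n_units)    # 0,1,...,0,1,...
    unit_effect = rng.normal(size=n_units)[ids]
    time_effect = rng.normal(size=n_periods)[time]
    # The regressor is correlated with the unit effect, so pooled OLS would be biased.
    x = 0.5 * unit_effect + rng.normal(size=ids.size)
    treatment = (ids % 2 == 0).astype(int)           # assumption: even-numbered units are treated
    post = (time >= exp_date).astype(int)            # assumption: treatment starts at exp_date
    noise = rng.normal(scale=0.5, size=ids.size)
    y = slope * x + ate * treatment * post + unit_effect + time_effect + noise
    return {"id": ids, "time": time, "x": x,
            "treatment": treatment, "post": post, "y": y}

panel = simulate_panel(n_units=100, n_periods=10, slope=2.0, ate=1.0, exp_date=5)
print(len(panel["y"]), panel["treatment"].mean(), panel["post"].mean())
```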

Model Fit and Summary

Now, let’s explore how FixedEffectModel can estimate different types of models and produce model summaries.

Instrumental Variables Estimation

For instrumental variable regression, FixedEffectModel provides two functions: iv2sls and ivgmm.

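To use iv2sls, define a model formula and call the fit method to obtain the estimation results. The formula names the dependent variable before the first ~, followed by four parts separated by |: the exogenous regressors, the fixed effects, the cluster variables, and (endogenous variables ~ instruments). A 0 leaves a part empty: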

formula = 'y ~ x_1|id+time|0|(x_2~x_3+x_4)'
model_iv2sls = iv2sls(data_df=df, formula=formula)
result = model_iv2sls.fit()
result.summary()
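
To show how the pieces of such a formula line up, here is a small parser sketch. It is purely illustrative and not how the package parses formulas: parse_formula is a hypothetical helper, and reading the third slot as the cluster variables is an assumption about the formula layout.

```python
def _terms(part):
    """Split 'a+b' into ['a', 'b']; a lone '0' means the slot is empty."""
    part = part.strip()
    return [] if part == "0" else [term.strip() for term in part.split("+")]

def parse_formula(formula):
    """Split 'y ~ exog|fixed effects|cluster|(endog~iv)' into named parts."""
    dependent, rhs = formula.split("~", 1)
    parts = [p.strip() for p in rhs.split("|")]
    if len(parts) != 4:
        raise ValueError("expected four parts separated by '|'")
    exog, category, cluster, iv_part = parts
    endog, iv = [], []
    if iv_part != "0":
        endog_str, iv_str = iv_part.strip("()").split("~", 1)
        endog, iv = _terms(endog_str), _terms(iv_str)
    return {
        "dependent": dependent.strip(),
        "exog_x": _terms(exog),
        "category": _terms(category),
        "cluster": _terms(cluster),
        "endog_x": endog,
        "iv": iv,
    }

parsed = parse_formula("y ~ x_1|id+time|0|(x_2~x_3+x_4)")
print(parsed)
```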

Alternatively, you can specify the variables directly:

exog_x = ['x_1']
endog_x = ['x_2']
iv = ['x_3', 'x_4']
y = ['y']

model_iv2sls = iv2sls(data_df=df, dependent=y, exog_x=exog_x, endog_x=endog_x, category=['id', 'time'], iv=iv)
result = model_iv2sls.fit()
result.summary()
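
As a rough picture of what a two-stage least squares fit computes, the NumPy sketch below runs both stages by hand on simulated data with one endogenous regressor and one instrument. It is not the package's implementation: it ignores fixed effects, returns point estimates only, and the function two_stage_least_squares and the simulated variables are assumptions for illustration.

```python
import numpy as np

def two_stage_least_squares(y, exog, endog, instruments):
    """Return 2SLS coefficients ordered as [exog columns, endog columns]."""
    exog_instr = np.column_stack([exog, instruments])  # everything assumed exogenous
    regressors = np.column_stack([exog, endog])
    # Stage 1: project every regressor on the exogenous variables and instruments.
    first_stage = np.linalg.lstsq(exog_instr, regressors, rcond=None)[0]
    fitted = exog_instr @ first_stage
    # Stage 2: regress y on the fitted regressors.
    return np.linalg.lstsq(fitted, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 5000
const = np.ones(n)
z = rng.normal(size=n)                  # instrument
u = rng.normal(size=n)                  # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)    # endogenous regressor (depends on u)
y = 1.0 + 2.0 * x + u + rng.normal(size=n)

beta_ols = np.linalg.lstsq(np.column_stack([const, x]), y, rcond=None)[0]
beta_iv = two_stage_least_squares(y, const, x, z)
print("OLS slope: ", beta_ols[1])
print("2SLS slope:", beta_iv[1])
```

Because the confounder u enters both x and y, the OLS slope is pushed away from the true value of 2, while the 2SLS slope stays close to it. Standard errors need a separate formula and are left out of this sketch.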

FixedEffectModel also provides specification tests for instrumental variable models using the ivtest function:

ivtest(result)
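
One common diagnostic for instrumental variable models is the first-stage F statistic, which asks whether the instruments explain the endogenous regressor at all. The sketch below computes it by hand. It is offered as background and makes no claim about what ivtest reports: first_stage_f and the simulated data are assumptions.

```python
import numpy as np

def first_stage_f(endog, exog, instruments):
    """F statistic for the excluded instruments in the first-stage regression."""
    endog = np.asarray(endog, dtype=float)
    exog = np.asarray(exog, dtype=float).reshape(len(endog), -1)
    instruments = np.asarray(instruments, dtype=float).reshape(len(endog), -1)

    def rss(X):
        # Residual sum of squares from regressing endog on X.
        coef = np.linalg.lstsq(X, endog, rcond=None)[0]
        resid = endog - X @ coef
        return resid @ resid

    full = np.column_stack([exog, instruments])
    n, k = full.shape
    q = instruments.shape[1]
    rss_restricted = rss(exog)   # without the instruments
    rss_full = rss(full)         # with the instruments
    return ((rss_restricted - rss_full) / q) / (rss_full / (n - k))

rng = np.random.default_rng(3)
n = 1000
const = np.ones(n)
z_strong = rng.normal(size=n)
z_weak = rng.normal(size=n)              # unrelated to x by construction
x = 0.8 * z_strong + rng.normal(size=n)

f_strong = first_stage_f(x, const, z_strong)
f_weak = first_stage_f(x, const, z_weak)
print("F with relevant instrument:  ", f_strong)
print("F with irrelevant instrument:", f_weak)
```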

For the ivgmm function, the usage is similar:

formula = 'y ~ x_1|id+time|0|(x_2~x_3+x_4)'
model_ivgmm = ivgmm(data_df=df, formula=formula)
result = model_ivgmm.fit()
result.summary()

Fixed Effect Model

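To estimate a fixed effect model, use the fixedeffect function. In the call below, category names the fixed effects to absorb and cluster names the variables used for clustered standard errors: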

exog_x = ['x_1']
y = ['y']
category = ['id', 'time']
cluster = ['id', 'time']

model_fe = fixedeffect(data_df=df, dependent=y, exog_x=exog_x, category=category, cluster=cluster)
result = model_fe.fit()
result.summary()
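
One standard way to absorb two sets of fixed effects without building dummy variables is to demean the data by alternating projections: subtract group means for id, then for time, and repeat until nothing changes. The sketch below shows the idea on a small simulated panel and checks it against a dummy-variable regression. It illustrates the technique only and is not taken from the package: demean_two_way and the simulated data are assumptions.

```python
import numpy as np

def demean_two_way(v, ids, times, tol=1e-10, max_iter=1000):
    """Remove id and time means by alternating projections.

    ids and times must be integer codes 0..K-1 with every code present.
    """
    v = np.asarray(v, dtype=float).copy()
    id_counts = np.bincount(ids)
    time_counts = np.bincount(times)
    for _ in range(max_iter):
        before = v.copy()
        v -= (np.bincount(ids, weights=v) / id_counts)[ids]
        v -= (np.bincount(times, weights=v) / time_counts)[times]
        if np.max(np.abs(v - before)) < tol:
            break
    return v

rng = np.random.default_rng(1)
n_units, n_periods = 30, 8
ids = np.repeat(np.arange(n_units), n_periods)
times = np.tile(np.arange(n_periods), n_units)
alpha = rng.normal(size=n_units)[ids]       # unit effects
gamma = rng.normal(size=n_periods)[times]   # time effects
x = alpha + rng.normal(size=ids.size)       # regressor correlated with unit effects
y = 1.5 * x + alpha + gamma + rng.normal(scale=0.5, size=ids.size)

# Within estimator: OLS on the demeaned variables.
x_dm = demean_two_way(x, ids, times)
y_dm = demean_two_way(y, ids, times)
beta_within = (x_dm @ y_dm) / (x_dm @ x_dm)

# Cross-check: explicit dummies for every unit and all but one period.
dummies = np.column_stack([np.eye(n_units)[ids], np.eye(n_periods)[times][:, 1:]])
design = np.column_stack([x, dummies])
beta_lsdv = np.linalg.lstsq(design, y, rcond=None)[0][0]
print(beta_within, beta_lsdv)
```

The two estimates agree up to floating-point error, but the demeaning route never builds the wide dummy matrix, which is what makes it attractive when the fixed effects have many levels.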

You can also use the getfe function to obtain the fixed effects:

getfe(result)

Difference in Difference

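FixedEffectModel also supports difference-in-difference (DID) models. The exp_date argument marks the start of the experiment, so the call below reuses the exp_date = 5 that was passed to gen_data when simulating the data:

formula = 'y ~ 0|0|0|0'

model_did = did(data_df=df, formula=formula, treatment=['treatment'], csid=['id'], tsid=['time'], exp_date=exp_date)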
result = model_did.fit()
result.summary()
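
In its simplest two-group, two-period form, the DID estimate is just a difference of four cell means. The sketch below computes it on a tiny hand-made dataset. It is a textbook illustration rather than the package's estimator: did_estimate is a hypothetical helper, and treating periods with time >= exp_date as the post period is an assumption.

```python
from statistics import mean

def did_estimate(rows, exp_date):
    """2x2 difference-in-difference from (treated, time, y) rows."""
    cells = {(group, post): [] for group in (0, 1) for post in (0, 1)}
    for treated, time, y in rows:
        post = int(time >= exp_date)   # assumption: treatment starts at exp_date
        cells[(int(treated), post)].append(y)
    m = {key: mean(values) for key, values in cells.items()}
    # (change in the treated group) minus (change in the control group)
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

rows = [
    # treated, time, y
    (1, 0, 10.0), (1, 1, 10.0), (1, 2, 13.0), (1, 3, 13.0),
    (0, 0, 5.0), (0, 1, 5.0), (0, 2, 6.0), (0, 3, 6.0),
]
effect = did_estimate(rows, exp_date=2)
print(effect)  # (13 - 10) - (6 - 5) = 2.0
```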

Performance and Future Developments

FixedEffectModel is designed to provide efficient and accurate estimation results for linear models with high dimensional fixed effects. It brings instrumental variable regression, robust and cluster standard errors, and difference-in-difference modeling together in a single package.

Looking ahead, the development team at Kuaishou DA Ecology is actively working on adding more features to the package. The upcoming release will include GMM estimation methods and robust standard error calculation based on GMM.

Conclusion

FixedEffectModel is a powerful Python package for accelerating the estimation of linear models with high dimensional fixed effects. Whether you’re working with panel data or exploring causal relationships using instrumental variables, FixedEffectModel offers the tools you need to obtain accurate and efficient results. Try it out for yourself and experience the benefits of faster model estimation and robust inference.

Have you used FixedEffectModel in your research or data analysis? Share your thoughts and experiences with us in the comments below.
