R - panel data FE without a unique time-ID match: how to create a new time variable?
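I am working with a highly disaggregated, unbalanced panel data set on car sales in long format, and I want to run a fixed-effects (FE) regression. The data structure is as follows (the real data contain much more information, but that is irrelevant here):
cars <- data.frame(make = c("Audi","Audi","Opel","Opel"),model = c("a1","a1","a3","Corsa","Corsa"),trim = c("Sport","Business","Sport","Cross","Street","Corss","O1","O2","O2"),tax = c(100,200,100,500,600,50,30,30),sales = c(1000,1500,800,1300,1100,1000,70,20,5000,2000,3000,3000),time = c(1,1,2,3,4,5,5))
I hope you get the idea.
So basically I have panel data in which trim and time are the index variables. I want to study the effect of taxes on sales. For this, I want to run an FE regression on model and time, because I only want to use the variation in sales and taxes within car models and time periods. The FE absorb the remaining variation, which I am not interested in. To do so, I wanted to use
plm(sales ~ tax, data = cars, model = "within", index = c("time", "model"), effect = "twoways")
But this does not work, because the data define a vehicle more narrowly (by the trim variable). As a result, I have several rows for the same model within one period, so there is no unique id-time match. To get around this, I thought about creating a new time variable, because I do not want to aggregate my data from the trim level to the model level. I am short on ideas about what the new time variable would have to satisfy and how to create it. Ideally, this would lead to
plm(sales ~ tax, data = cars, model = "within", index = c("new time variable", "model"), effect = "twoways")
But I would also like to know whether I can solve my problem more easily.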
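Does anyone have a simple idea for getting around this problem, or an idea for creating a new time variable so that the plm command runs (or even a completely different approach or another package)? Also, does it make sense to run an FE regression with time-model FE and trim FE?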
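To make the problem concrete, here is a minimal base-R sketch on a small, made-up trim-level data set. The data frame cars_demo and all its values are hypothetical and are not the asker's data. The sketch shows that (model, time) is not a unique key when several trims of one model are sold in the same period. A combined model-trim identifier, by contrast, is unique.

```r
# Hypothetical trim-level panel: several trims of the same model per period
cars_demo <- data.frame(
  model = c("a1", "a1", "a1", "a1", "Corsa", "Corsa", "Corsa"),
  trim  = c("Sport", "Business", "Sport", "Business", "O1", "O2", "O1"),
  time  = c(1, 1, 2, 2, 1, 1, 2),
  tax   = c(100, 200, 100, 200, 30, 30, 30),
  sales = c(1000, 1500, 900, 1400, 70, 20, 60)
)

# (model, time) is not a unique key: TRUE marks a repeated pair
dup_model_time <- duplicated(cars_demo[, c("model", "time")])
print(sum(dup_model_time))  # prints [1] 3

# Combine model and trim into one individual identifier
cars_demo$modeltrim <- paste(cars_demo$model, cars_demo$trim, sep = "_")

# (modeltrim, time) identifies every row
dup_id_time <- duplicated(cars_demo[, c("modeltrim", "time")])
print(any(dup_id_time))  # prints [1] FALSE
```

The second answer uses the same paste() construction to build the individual index, instead of inventing a new time variable.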
Solution
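Please add a reproducible data set and try to make clearer what you want.
I would like to answer anyway. If your variables vary over time, a fixed-effects model is appropriate. If you have time-invariant variables, you may prefer pooled OLS or random effects (see Econometrics Academy).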
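The point about time variation can be illustrated with simulated data. The data-generating process in this base-R sketch is an arbitrary assumption chosen for illustration, including the true slope of 2. The sketch computes the within estimator by demeaning each variable per individual with ave(). Pooled OLS is biased when the individual effect is correlated with the regressor, and the within estimator is not. The same transformation reduces a time-invariant regressor to zeros, which is why FE cannot estimate its coefficient.

```r
set.seed(1)
n_id <- 200
n_t <- 5
id <- rep(seq_len(n_id), each = n_t)

# Assumed data-generating process: the individual effect alpha is
# correlated with x, and the true slope of x is 2
alpha <- rnorm(n_id)[id]
x <- alpha + rnorm(n_id * n_t)
y <- 2 * x + 3 * alpha + rnorm(n_id * n_t)

# Pooled OLS ignores alpha and is biased here
b_pooled <- unname(coef(lm(y ~ x))["x"])

# Within (FE) estimator: demean y and x by individual, then run OLS
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
b_within <- unname(coef(lm(y_w ~ x_w - 1))["x_w"])

# A time-invariant regressor is turned into zeros by the same transformation
z <- rep(rnorm(n_id), each = n_t)
z_w <- z - ave(z, id)

print(c(pooled = b_pooled, within = b_within))
print(max(abs(z_w)))
```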
The plm package is the right choice, but I think there is an error in how you run the regression. Here is my suggestion:
library(plm)
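# 'data' stands for your data frame; (trim, time) should identify each row uniquely
p.data <- pdata.frame(data, index = c("trim", "time"))
# Pass the variables through the formula instead of using attach() and cbind();
# effect = "twoways" already includes time effects, so factor(time) is not needed
model1 <- plm(sales ~ lt_tax, data = p.data, model = "within", effect = "twoways")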
summary(model1)
Hope this helps.
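On the two-way specification: with effect = "twoways", plm already sweeps out both individual and time effects, so an extra factor(time) term is redundant. The base-R sketch below shows the underlying equivalence on a simulated balanced panel, where the names and numbers are hypothetical. On such a panel, regressing the double-demeaned y on the double-demeaned x gives the same slope as OLS with explicit individual and time dummies. This simple double-demeaning formula does not hold for unbalanced panels.

```r
set.seed(42)
n_id <- 6
n_t <- 4

# Hypothetical balanced panel with individual and time effects
panel <- expand.grid(id = factor(seq_len(n_id)), time = factor(seq_len(n_t)))
panel$x <- rnorm(nrow(panel))
panel$y <- 1.5 * panel$x +
  rnorm(n_id)[as.integer(panel$id)] +
  rnorm(n_t)[as.integer(panel$time)] +
  rnorm(nrow(panel), sd = 0.1)

# LSDV: individual and time dummies entered explicitly
b_lsdv <- unname(coef(lm(y ~ x + id + time, data = panel))["x"])

# Two-way within transformation, valid for a BALANCED panel:
# v_it - mean_i(v) - mean_t(v) + overall mean
two_way_demean <- function(v, id, time) {
  v - ave(v, id) - ave(v, time) + mean(v)
}
x_w <- with(panel, two_way_demean(x, id, time))
y_w <- with(panel, two_way_demean(y, id, time))
b_within <- unname(coef(lm(y_w ~ x_w - 1))["x_w"])

print(c(lsdv = b_lsdv, within = b_within))
```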
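Since you asked how to construct a new individual index from model and trim, here is how to do it. Note, however, that your variable tax does not vary over time within any model-trim combination. You can check this with, for example, pvar, or by looking at the model matrix after the within transformation. Hence, your within model (with model-trim as the individual index) is not estimable.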
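The check mentioned above can also be done in base R by counting the distinct values of each variable within every individual. A count of 1 everywhere means the variable never changes over time within an individual. This corresponds to what pvar lists under "no time variation". The sketch below uses hypothetical data and a hypothetical helper, n_distinct_within(). It illustrates the idea and is not pvar's implementation.

```r
# Hypothetical data: tax is fixed per model-trim, sales vary over time
cars_demo <- data.frame(
  modeltrim = c("a1_Sport", "a1_Sport", "a1_Business", "a1_Business",
                "Corsa_O1", "Corsa_O1"),
  time  = c(1, 2, 1, 2, 1, 2),
  tax   = c(100, 100, 200, 200, 30, 30),
  sales = c(1000, 900, 1500, 1400, 70, 60)
)

# Number of distinct values of v within each individual
n_distinct_within <- function(v, id) {
  tapply(v, id, function(vals) length(unique(vals)))
}

tax_counts <- n_distinct_within(cars_demo$tax, cars_demo$modeltrim)
sales_counts <- n_distinct_within(cars_demo$sales, cars_demo$modeltrim)

# TRUE means the variable never changes over time within any individual,
# so a within (FE) model cannot estimate its coefficient
tax_time_invariant <- all(tax_counts == 1)
sales_time_invariant <- all(sales_counts == 1)
print(c(tax = tax_time_invariant, sales = sales_time_invariant))
```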
cars <- data.frame(make = c("Audi","Audi","Opel","Opel"),model = c("a1","a1","a3","Corsa","Corsa"),trim = c("Sport","Business","Sport","Cross","Street","Corss","O1","O2","O2"),tax = c(100,200,100,500,600,50,30,30),sales = c(1000,1500,800,1300,1100,1000,70,20,5000,2000,3000,3000),time = c(1,1,2,3,4,5,5))
# Note the NA coefficient in the two-way LSDV model
summary(lm(sales ~ tax + factor(time) + factor(model), data = cars))
#>
#> Call:
#> lm(formula = sales ~ tax + factor(time) + factor(model),data = cars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1470.3 -135.1 0.0 131.3 1470.3
#>
#> Coefficients: (1 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 762.884 904.730 0.843 0.4270
#> tax 2.972 5.056 0.588 0.5750
#> factor(time)2 -117.500 569.739 -0.206 0.8425
#> factor(time)3 -158.750 753.695 -0.211 0.8392
#> factor(time)4 2618.219 936.655 2.795 0.0267 *
#> factor(time)5 2118.219 936.655 2.261 0.0582 .
#> factor(model)a3 -2296.476 2100.974 -1.093 0.3106
#> factor(model)Corsa NA NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 805.7 on 7 degrees of freedom
#> Multiple R-squared: 0.8291,Adjusted R-squared: 0.6827
#> F-statistic: 5.662 on 6 and 7 DF,p-value: 0.01921
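An NA coefficient like the one above means that one column of the design matrix is an exact linear combination of the others, so lm cannot estimate it separately. Which term lm reports as NA depends only on the order of the terms in the formula. The small sketch below demonstrates this on made-up data (d, g and x are hypothetical names). Because x is constant within each group g, it is a linear combination of the intercept and the group dummies.

```r
# Made-up data: x is constant within each group g
d <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 3)),
  x = rep(c(10, 25, 40), each = 3),
  y = c(5, 7, 6, 12, 14, 13, 20, 19, 22)
)

# x entered before the group factor: the last group dummy is dropped (NA)
fit_x_first <- lm(y ~ x + g, data = d)
# x entered after the group factor: x itself is dropped (NA)
fit_x_last <- lm(y ~ g + x, data = d)

na_first <- names(which(is.na(coef(fit_x_first))))
na_last <- names(which(is.na(coef(fit_x_last))))
print(na_first)
print(na_last)
```

An NA on a dummy therefore does not tell you by itself which variables are collinear. Calling alias() on the fitted model, or reordering the terms, helps to find out.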
# Make a new individual index from model and trim
cars$modeltrim <- paste(cars$model, cars$trim, sep = "_")
# One-way within model via LSDV (one dummy per model-trim)
form <- sales ~ tax + factor(modeltrim)
summary(lm(form, data = cars))
#>
#> Call:
#> lm(formula = form,data = cars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1000.0 -131.2 12.5 108.3 1000.0
#>
#> Coefficients: (1 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2717.647 518.058 5.246 0.00119 **
#> tax -7.255 3.319 -2.186 0.06509 .
#> factor(modeltrim)a1_Sport -1025.490 463.745 -2.211 0.06267 .
#> factor(modeltrim)a3_Corss 939.804 1396.610 0.673 0.52259
#> factor(modeltrim)a3_Cross 959.804 1396.610 0.687 0.51405
#> factor(modeltrim)a3_Street 1680.294 1637.233 1.026 0.33890
#> factor(modeltrim)Corsa_O1 1645.098 584.414 2.815 0.02596 *
#> factor(modeltrim)Corsa_O2 NA NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 618.1 on 7 degrees of freedom
#> Multiple R-squared: 0.8994,Adjusted R-squared: 0.8132
#> F-statistic: 10.44 on 6 and 7 DF,p-value: 0.003392
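When the regressor does vary within individuals, the one-way LSDV regression and the within (demeaning) regression give the same slope estimate, also on unbalanced panels (Frisch-Waugh-Lovell theorem). The base-R sketch below checks this on a small simulated unbalanced panel, where all names and values are hypothetical. Only the point estimates are compared. Standard errors from the demeaned regression would need a degrees-of-freedom correction for the absorbed individual means.

```r
set.seed(7)
# Hypothetical unbalanced panel: individuals observed 2, 5, 3 and 4 times
n_obs <- c(2, 5, 3, 4)
id <- factor(rep(seq_along(n_obs), times = n_obs))
x <- rnorm(length(id))
y <- 0.8 * x + c(1, -2, 3, 0)[as.integer(id)] + rnorm(length(id), sd = 0.2)

# LSDV: one dummy per individual
b_lsdv <- unname(coef(lm(y ~ x + id))["x"])

# One-way within: subtract each individual's own mean (fine when unbalanced)
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
b_within <- unname(coef(lm(y_w ~ x_w - 1))["x_w"])

print(c(lsdv = b_lsdv, within = b_within))
```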
# One-way within model via plm
library(plm)
plm(sales ~ tax, data = cars, index = c("modeltrim", "time"), effect = "individual")
#> Error in plm.fit(data,model,effect,random.method,random.models,random.dfcor,: empty model
plm(sales ~ tax, data = cars, index = c("modeltrim", "time"), effect = "twoways")
#> Error in plm.fit(data,model,effect,random.method,random.models,random.dfcor,: empty model
# tax does not vary over time within any modeltrim (i.e., within an individual),
# so the within model is not estimable
pvar(cars, index = c("modeltrim", "time"))
#> no time variation: make model trim tax modeltrim
#> no individual variation: make time
#
# Look at the variable tax after the one-way within transformation
pcars <- pdata.frame(cars, index = c("modeltrim", "time"))
mf <- model.frame(pcars, sales ~ tax)
model.matrix(mf, model = "within")[, "tax"]
#> 1 2 3 4 5 6 7 8 9 10 11 12 13 14
#> 0 0 0 0 0 0 0 0 0 0 0 0 0 0
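The all-zero column above can be reproduced in base R on hypothetical data. Subtracting each model-trim's mean from a tax variable that is constant within model-trim leaves only zeros, so after the within transformation there is nothing left to estimate. In that case lm reports NA for such a column, and plm stops with "empty model". The sketch mimics the one-way within transformation for illustration only. It is not plm's internal code.

```r
# Hypothetical data: tax is constant within each model-trim
modeltrim <- c("a1_Sport", "a1_Sport", "a3_Cross", "a3_Cross",
               "Corsa_O1", "Corsa_O1")
tax <- c(100, 100, 500, 500, 30, 30)
sales <- c(1000, 900, 1300, 1200, 70, 60)

# One-way within transformation: subtract each model-trim's mean
tax_w <- tax - ave(tax, modeltrim)
sales_w <- sales - ave(sales, modeltrim)
print(tax_w)

# An all-zero regressor leaves OLS nothing to estimate: lm returns NA
fit_w <- lm(sales_w ~ tax_w - 1)
print(coef(fit_w))
```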