Analysing a data frame that contains a time series using stargazer
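I have a panel data set with 10 observations and 3 variables.
(Number of obs.: 30 = 10 rows (= countries) * 3 years, with 2 columns (= migration parameters) and 1 column for the corresponding year.)
My data frame is made up of 3 yearly data frames, so to speak.
Given that it is a panel data set (so the maximum is N = 10), how can I apply stargazer across the whole time period? In other words, R should start over at every 11th row. I want nice tables for descriptive statistics.
The data set for the first three years: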
df <- structure(list(Population = c(21759420, 8696916, 1946351, 14689726,
8212264, 491723, 18907008, 4345386, 11133861, 657229, 22549547,
8944706, 1979882, 15141099, 8489031, 496963, 19432541, 4404230,
11502786, 673252, 23369131, 9199259, 2014866, 15605217, 8766930,
502384, 19970495, 4448525, 11887202, 689692), Distance..km. = c(7243L,
4290L, 9500L, 3789L, 6452L, 2211L, 4667L, 5036L, 4047L, 9140L,
7243L, 4290L, 9500L, 3789L, 6452L, 2211L, 4667L, 5036L, 4047L,
9140L, 7243L, 4290L, 9500L, 3789L, 6452L, 2211L, 4667L, 5036L,
4047L, 9140L), year = c(2008, 2008, 2008, 2008, 2008, 2008, 2008,
2008, 2008, 2008, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009,
2009, 2009, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
2010)), .Names = c("Population", "Distance..km.", "year"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 50L, 51L, 52L, 53L, 54L,
55L, 56L, 57L, 58L, 59L, 99L, 100L, 101L, 102L, 103L, 104L, 105L,
106L, 107L, 108L), class = "data.frame")
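I still get descriptive statistics with N = 30, but it should be N = 10, because I want descriptive statistics for the whole three-year period with each yearly data frame treated in isolation.
I hope I have stated the problem clearly.
You can use split + lapply from base R: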
library(stargazer)
lapply(split(df, df$year), stargazer, type = "text")
Or by:
by(df, df$year, stargazer, type = 'text')
Result:
===============================================================
Statistic N Mean St. Dev. Min Max
---------------------------------------------------------------
Population 10 9,083,988.000 7,541,970.000 491,723 21,759,420
Distance..km. 10 5,637.500 2,385.941 2,211 9,500
year 10 2,008.000 0.000 2,008 2,008
---------------------------------------------------------------
===============================================================
Statistic N Mean St. Dev. Min Max
---------------------------------------------------------------
Population 10 9,361,404.000 7,798,880.000 496,963 22,549,547
Distance..km. 10 5,637.500 2,385.941 2,211 9,500
year 10 2,009.000 0.000 2,009 2,009
---------------------------------------------------------------
===============================================================
Statistic N Mean St. Dev. Min Max
---------------------------------------------------------------
Population 10 9,645,370.000 8,065,676.000 502,384 23,369,131
Distance..km. 10 5,637.500 2,385.941 2,211 9,500
year 10 2,010.000 0.000 2,010 2,010
---------------------------------------------------------------
df$year: 2008
[1] ""
[2] "==============================================================="
[3] "Statistic N Mean St. Dev. Min Max "
[4] "---------------------------------------------------------------"
[5] "Population 10 9,083,988.000 7,541,970.000 491,723 21,759,420"
[6] "Distance..km. 10 5,637.500 2,385.941 2,211 9,500 "
[7] "year 10 2,008.000 0.000 2,008 2,008 "
[8] "---------------------------------------------------------------"
--------------------------------------------------------------------------
df$year: 2009
[1] ""
[2] "==============================================================="
[3] "Statistic N Mean St. Dev. Min Max "
[4] "---------------------------------------------------------------"
[5] "Population 10 9,361,404.000 7,798,880.000 496,963 22,549,547"
[6] "Distance..km. 10 5,637.500 2,385.941 2,211 9,500 "
[7] "year 10 2,009.000 0.000 2,009 2,009 "
[8] "---------------------------------------------------------------"
--------------------------------------------------------------------------
df$year: 2010
[1] ""
[2] "==============================================================="
[3] "Statistic N Mean St. Dev. Min Max "
[4] "---------------------------------------------------------------"
[5] "Population 10 9,645,370.000 8,065,676.000 502,384 23,369,131"
[6] "Distance..km. 10 5,637.500 2,385.941 2,211 9,500 "
[7] "year 10 2,010.000 0.000 2,010 2,010 "
[8] "---------------------------------------------------------------"
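To see why N drops from 30 to 10, it can help to look at what split() hands to stargazer. The following is a minimal base R sketch, not part of the original answer: it rebuilds the question's data frame from the values shown above and computes the per-year N, mean and standard deviation of Population by hand. The names parts and check are made up for this illustration.

```r
# Rebuild the question's data frame (values copied from the dput() output).
df <- data.frame(
  Population = c(21759420, 8696916, 1946351, 14689726, 8212264, 491723,
                 18907008, 4345386, 11133861, 657229,
                 22549547, 8944706, 1979882, 15141099, 8489031, 496963,
                 19432541, 4404230, 11502786, 673252,
                 23369131, 9199259, 2014866, 15605217, 8766930, 502384,
                 19970495, 4448525, 11887202, 689692),
  Distance..km. = rep(c(7243L, 4290L, 9500L, 3789L, 6452L, 2211L, 4667L,
                        5036L, 4047L, 9140L), times = 3),
  year = rep(c(2008, 2009, 2010), each = 10)
)

# split() returns a named list with one data frame per year.
parts <- split(df, df$year)

# Per-year N, mean and standard deviation of Population, computed by hand.
# sapply() gives one column per year; t() turns that into one row per year.
check <- t(sapply(parts, function(d) {
  c(N = nrow(d), Mean = mean(d$Population), SD = sd(d$Population))
}))
print(round(check))
```

Each element of parts holds the 10 country rows of one year, which is why every table above reports N = 10.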
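The drawback of both methods is that they print the tables twice (once from stargazer's own output and once from the value returned by lapply/by). To work around this, you can use walk from purrr, which calls stargazer only for its side effect: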
library(dplyr)
library(purrr)
df %>%
split(.$year) %>%
walk(~ stargazer(., type = "text"))
Result:
===============================================================
Statistic N Mean St. Dev. Min Max
---------------------------------------------------------------
Population 10 9,083,988.000 7,541,970.000 491,723 21,759,420
Distance..km. 10 5,637.500 2,385.941 2,211 9,500
year 10 2,008.000 0.000 2,008 2,008
---------------------------------------------------------------
===============================================================
Statistic N Mean St. Dev. Min Max
---------------------------------------------------------------
Population 10 9,361,404.000 7,798,880.000 496,963 22,549,547
Distance..km. 10 5,637.500 2,385.941 2,211 9,500
year 10 2,009.000 0.000 2,009 2,009
---------------------------------------------------------------
===============================================================
Statistic N Mean St. Dev. Min Max
---------------------------------------------------------------
Population 10 9,645,370.000 8,065,676.000 502,384 23,369,131
Distance..km. 10 5,637.500 2,385.941 2,211 9,500
year 10 2,010.000 0.000 2,010 2,010
---------------------------------------------------------------
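If you would rather avoid the extra packages, wrapping the lapply call in invisible() should have a similar effect, since it only hides the returned list. The sketch below is an assumption-laden variation and not the original answer's code: it uses a small stand-in data frame (two countries, two years), drops the constant year column from each group, and labels each table through stargazer's title argument.

```r
library(stargazer)

# Small stand-in for the question's data (first two countries, two years).
df <- data.frame(
  Population = c(21759420, 8696916, 22549547, 8944706),
  Distance..km. = c(7243L, 4290L, 7243L, 4290L),
  year = c(2008, 2008, 2009, 2009)
)

# One data frame per year, named "2008" and "2009".
parts <- split(df, df$year)

# invisible() hides the list returned by lapply(), so only the tables that
# stargazer prints itself appear on the console.
invisible(lapply(names(parts), function(y) {
  d <- parts[[y]]
  d <- d[setdiff(names(d), "year")]  # year is constant within a group
  stargazer(d, type = "text", title = paste("Year", y))
}))
```

Dropping year is optional; it only removes the row whose standard deviation is always 0.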
Note:
All of the above methods also work for LaTeX output (type = "latex"). I set type = "text" only for demonstration purposes.
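For LaTeX output you may also want one file per year. The following is a rough sketch under stated assumptions: it relies on stargazer's out argument to write each table to disk, uses a temporary directory, and the file naming scheme (descriptives_<year>.tex) is hypothetical. The data frame is again a small stand-in for the question's data.

```r
library(stargazer)

# Small stand-in for the question's data (first two countries, two years).
df <- data.frame(
  Population = c(21759420, 8696916, 22549547, 8944706),
  Distance..km. = c(7243L, 4290L, 7243L, 4290L),
  year = c(2008, 2008, 2009, 2009)
)

parts <- split(df, df$year)
out_dir <- tempdir()  # replace with your own output folder

# Write one .tex file per year; the file names are made up for this sketch.
files <- vapply(names(parts), function(y) {
  path <- file.path(out_dir, paste0("descriptives_", y, ".tex"))
  stargazer(parts[[y]], type = "latex",
            title = paste("Descriptive statistics,", y), out = path)
  path
}, character(1))
```

stargazer still prints the LaTeX code to the console while writing the files, so you can wrap the call in invisible(capture.output(...)) if that is unwanted.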