Create factor variables for year integers in R
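I have a panel dataset like the one below, though the real dataset has thousands of observations. I want to create 14 factors for the years 1984-1998 (15 years) as a new column "Year_dum". I searched for ways to create dummy variables in R but could not find one that works with year integers. Can anyone help me do this in R?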
+--------+------+------+------+----------+
| Time | year | Firm | Prod | Year_dum |
+--------+------+------+------+----------+
| Jan-84 | 1984 | A | 28.2 | 0 |
| Feb-84 | 1984 | A | 26.6 | 0 |
| Mar-84 | 1984 | A | 30.3 | 0 |
| Apr-85 | 1985 | A | 33.2 | 1 |
| May-85 | 1985 | A | 30.1 | 1 |
| Jun-85 | 1985 | A | 28.3 | 1 |
| Jan-84 | 1984 | B | 28.6 | 0 |
| Feb-84 | 1984 | B | 28.9 | 0 |
| Mar-84 | 1984 | B | 28.1 | 0 |
| Oct-84 | 1984 | C | 28.8 | 0 |
| Nov-85 | 1985 | C | 31.6 | 1 |
| Dec-86 | 1986 | C | 26.9 | 2 |
| Jan-89 | 1989 | C | 28.6 | 5 |
| Feb-98 | 1998 | C | 29.6 | 14 |
+--------+------+------+------+----------+
The sample dataset can be reproduced with the following input.
structure(list(Time = structure(c(6L, 4L, 9L, 2L, 10L, 8L, 6L,
4L, 9L, 12L, 11L, 3L, 7L, 5L, 1L, 1L, 1L), .Label = c("", "Apr-85",
"Dec-86", "Feb-84", "Feb-98", "Jan-84", "Jan-89", "Jun-85", "Mar-84",
"May-85", "Nov-85", "Oct-84"), class = "factor"), year = c(1984L,
1984L, 1984L, 1985L, 1985L, 1985L, 1984L, 1984L, 1984L, 1984L,
1985L, 1986L, 1989L, 1998L, NA, NA, NA), Firm = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L
), .Label = c("", "A", "B", "C"), class = "factor"), Prod = c(28.2,
26.6, 30.3, 33.2, 30.1, 28.3, 28.6, 28.9, 28.1, 28.8, 31.6, 26.9,
28.6, 29.6, NA, NA, NA), Year_dum = c(0L, 0L, 0L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 1L, 2L, 5L, 14L, NA, NA, NA)), .Names = c("Time",
"year", "Firm", "Prod", "Year_dum"), class = "data.frame", row.names = c(NA,
-17L))
For example, you can use the dummies package (install it first with install.packages("dummies")). An example:
library(dummies)
df <- data.frame("val" = 1:5, "year" = c(1984, 1984, 1985, 1985, 1986))
# after creating the dummies, column-bind it to the original dataframe
df <- cbind(df, dummy("year", df, sep = "_"))
> df
val year year_1984 year_1985 year_1986
1 1 1984 1 0 0
2 2 1984 1 0 0
3 3 1985 0 1 0
4 4 1985 0 1 0
5 5 1986 0 0 1
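The same indicator columns can also be built without an extra package. The following is a minimal sketch, assuming base R's model.matrix() and the toy data frame from the example above; the result name df2 and the year_ column naming are chosen here only to mirror that output.

```r
df <- data.frame(val = 1:5, year = c(1984, 1984, 1985, 1985, 1986))

# One indicator column per distinct year; "- 1" removes the intercept so
# that no year is absorbed as the baseline level
dummies <- model.matrix(~ factor(year) - 1, data = df)

# Columns follow the sorted factor levels, so rename them in that order
colnames(dummies) <- paste("year", sort(unique(df$year)), sep = "_")

df2 <- cbind(df, dummies)
df2
```

Note that model.matrix() drops rows with a missing year by default, so on data containing NA years the cbind() step would need those rows filtered out first.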
Here is an example using only base R:
# add one 0/1 column per distinct year to x (sort() drops NA years)
for (y in sort(unique(x$year))) x[[paste("year", y, sep = "_")]] <- as.integer(x$year == y)
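We can try subtracting the first year (na.rm = TRUE is needed because the sample data contains NA years):
df$Year_dum <- df$year - min(df$year, na.rm = TRUE)
df$Year_dum
#[1]  0  0  0  1  1  1  0  0  0  0  1  2  5 14 NA NA NA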
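Or use match, which numbers the distinct years consecutively and so differs from the year offset when years are skipped (1989 becomes 3, not 5):
with(df, match(year, sort(unique(year))) - 1)
#[1]  0  0  0  1  1  1  0  0  0  0  1  2  3  4 NA NA NA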
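Since the question asks for 14 dummies covering 15 years, another option is to store the year as a factor with all 15 levels declared up front; modelling functions such as lm() expand a factor into indicator columns on their own. This is a sketch under the assumption that the column is meant for a regression. The data frame panel, its values, and the column name Year_fac are made up for illustration.

```r
# Hypothetical mini panel; only the year column matters here
panel <- data.frame(year = c(1984L, 1985L, 1986L, 1989L, 1998L, NA),
                    Prod = c(28.2, 33.2, 26.9, 28.6, 29.6, NA))

# Declare every year from 1984 to 1998 as a level, even unobserved ones;
# NA years stay NA
panel$Year_fac <- factor(panel$year, levels = 1984:1998)

# Number of levels: one per year in the declared range
nlevels(panel$Year_fac)

# Default treatment contrasts use the first level (1984) as the baseline,
# so the contrast matrix has one column fewer than there are levels
ncol(contrasts(panel$Year_fac))
```

With 15 declared levels, the treatment contrasts have 14 columns, which matches the "14 factors for 15 years" in the question.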