对变量进行分类并使其成为 R 中的虚拟变量
Categorizing variable and make it a dummy in R
古腾标签社区:)
我目前正在研究一些公司的 ESGscore。
ESGscore 在 0 到 100 之间变化。
我想将 ESGscore 分为 4 个部分:
0 - 25 --> poor --> 4
>25 - 50 --> medium --> 3
>50 - 75 --> good --> 2
75 - 100 --> excellent --> 1
dummy.code
的问题在于它正在重新排列 ESGscore。因此,例如 AIR PRODUCTS & CHEMICALS INC 的 ESGscore 始终是 'excellent',但输出显示我只是中等。
这就是 CODE 的样子:
Datensatz_final_so$ESG.Kategorien <- ifelse(Datensatz_final_so$ESGscore <= 25, "4",
ifelse(Datensatz_final_so$ESGscore > 25 & Datensatz_final_so$ESGscore <= 50, "3",
ifelse(Datensatz_final_so$ESGscore > 50 & Datensatz_final_so$ESGscore <= 75, "2",
ifelse(Datensatz_final_so > 75, "1", 0))))``
# Create ESGscore dummy #
Dummy.ESG <- dummy.code(Datensatz_final_so$ESG.Kategorien)
colnames(Dummy.ESG) <- c("poor", "medium", "good", "excellent")
# Connect data and dummy #
Datensatz_final <- cbind(Datensatz_final, Dummy.ESG)
你知道怎么解决吗?
一种方法是将 colnames
重新排列为
colnames(Dummy.ESG) <- c("good", "excellent", "poor", "medium")
但它正在产生问题,即 R 在分析中选择介质作为参考。
提前致谢! :)
数据示例:
structure(list(Company = c("AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC",
"AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC",
"AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC",
"AIR PRODUCTS & CHEMICALS INC", "HESS CORP", "HESS CORP", "HESS CORP",
"HESS CORP", "HESS CORP", "HESS CORP", "HESS CORP", "APACHE CORP",
"APACHE CORP", "APACHE CORP", "APACHE CORP", "APACHE CORP", "APACHE CORP",
"APACHE CORP", "AVERY DENNISON CORP", "AVERY DENNISON CORP",
"AVERY DENNISON CORP", "AVERY DENNISON CORP", "AVERY DENNISON CORP",
"AVERY DENNISON CORP", "AVERY DENNISON CORP", "BALL CORP", "BALL CORP",
"BALL CORP", "BALL CORP", "BALL CORP", "BALL CORP", "BALL CORP",
"CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP",
"CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", "ECOLAB INC",
"ECOLAB INC", "ECOLAB INC", "ECOLAB INC", "ECOLAB INC", "ECOLAB INC",
"ECOLAB INC", "EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP",
"EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP",
"FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP",
"FMC CORP", "HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO",
"HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO",
"HELMERICH & PAYNE", "HELMERICH & PAYNE", "HELMERICH & PAYNE",
"HELMERICH & PAYNE", "HELMERICH & PAYNE", "HELMERICH & PAYNE",
"HELMERICH & PAYNE"), Year = c(2011, 2012, 2013, 2014, 2015,
2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012,
2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016,
2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013,
2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014,
2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011,
2012, 2013, 2014, 2015, 2016, 2017), gvkey = c(1209, 1209, 1209,
1209, 1209, 1209, 1209, 1380, 1380, 1380, 1380, 1380, 1380, 1380,
1678, 1678, 1678, 1678, 1678, 1678, 1678, 1913, 1913, 1913, 1913,
1913, 1913, 1913, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 2991,
2991, 2991, 2991, 2991, 2991, 2991, 4213, 4213, 4213, 4213, 4213,
4213, 4213, 4503, 4503, 4503, 4503, 4503, 4503, 4503, 4510, 4510,
4510, 4510, 4510, 4510, 4510, 5439, 5439, 5439, 5439, 5439, 5439,
5439, 5581, 5581, 5581, 5581, 5581, 5581, 5581), ESGscore = c(84.2750015258789,
81.9225006103516, 77.4024963378906, 80.1125030517578, 78.6449966430664,
76.3775024414062, 79.2699966430664, 69.4899978637695, 65.8300018310547,
64.4300003051758, 74.3000030517578, 75.7600021362305, 71.4599990844727,
74.6900024414062, 55.8300018310547, 56.0900001525879, 57.5, 60.75,
60.8800010681152, 67.379997253418, 71.9899978637695, 82.9000015258789,
77.3899993896484, 76.9300003051758, 78.7399978637695, 76.2283325195312,
74.2125015258789, 68.3600006103516, 64.4100036621094, 65.6600036621094,
63.75, 67.7300033569336, 67.5699996948242, 74.4300003051758,
68.5699996948242, 86.5100021362305, 84.3099975585938, 82.6600036621094,
82.3399963378906, 88.4100036621094, 90.0800018310547, 92.25,
74.6999969482422, 72.3600006103516, 68.3899993896484, 67.9300003051758,
65.629997253418, 74.9000015258789, 74.8600006103516, 81.6999969482422,
79.370002746582, 79.0899963378906, 75.25, 81.9499969482422, 81.0199966430664,
88.3399963378906, 59.8199996948242, 55.6500015258789, 52.2999992370605,
51.8499984741211, 56.9199981689453, 66.620002746582, 65.3300018310547,
85.9800033569336, 83.9499969482422, 85.1100006103516, 67.4300003051758,
76.4400024414062, 69.9199981689453, 78.4599990844727, 19.0599994659424,
17.5200004577637, 18.1200008392334, 23.5025005340576, 35.5349998474121,
36.7350006103516, 41.1725006103516)), row.names = c(NA, -77L), class = c("tbl_df",
"tbl", "data.frame"))
让我们将您的数据作为 df:
df<- structure(
list(
Company = c(
"AIR PRODUCTS & CHEMICALS INC",
...
...
),
row.names = c(NA,-77L),
class = c("tbl_df",
"tbl", "data.frame")
)
让我们进行分类并构建一个小型数据框,然后是一些 dplyr
ESG <- c("poor", "medium", "good", "excellent")
da <- data.frame(ESGColumn = 1:4,FlatESG = ESG)
df <- df |> dplyr::mutate(ESGColumn = floor(ESGscore/25)+1) |>
dplyr::left_join(da, by="ESGColumn") |>
dplyr::select(-"ESGColumn")
head(df)
# A tibble: 6 × 5
Company Year gvkey ESGscore FlatESG
<chr> <dbl> <dbl> <dbl> <chr>
1 AIR PRODUCTS & CHEMICALS INC 2011 1209 84.3 excellent
2 AIR PRODUCTS & CHEMICALS INC 2012 1209 81.9 excellent
3 AIR PRODUCTS & CHEMICALS INC 2013 1209 77.4 excellent
4 AIR PRODUCTS & CHEMICALS INC 2014 1209 80.1 excellent
5 AIR PRODUCTS & CHEMICALS INC 2015 1209 78.6 excellent
6 AIR PRODUCTS & CHEMICALS INC 2016 1209 76.4 excellent
格热戈日
您的示例数据不包含名为 ESG.Kategorien
的变量,但它包含 ESGscore
。以下应该给你你想要的:
Datensatz_final_so$Dummy <- cut(Datensatz_final_so$ESGscore, breaks=c(0, 25, 50, 75, 100), labels=c("poor", "medium", "good", "excellent"))
table(Datensatz_final_so$Dummy)
#
# poor medium good excellent
# 4 3 38 32
levels(Datensatz_final_so$Dummy)
# [1] "poor" "medium" "good" "excellent"
请注意您的原始分类将 75 列为良好和优秀。
古腾标签社区:)
我目前正在研究一些公司的 ESGscore。 ESGscore 在 0 到 100 之间变化。
我想将 ESGscore 分为 4 个部分:
0 - 25 --> poor --> 4
>25 - 50 --> medium --> 3
>50 - 75 --> good --> 2
75 - 100 --> excellent --> 1
dummy.code
的问题在于它正在重新排列 ESGscore。因此,例如 AIR PRODUCTS & CHEMICALS INC 的 ESGscore 始终是 'excellent',但输出显示我只是中等。
这就是 CODE 的样子:
Datensatz_final_so$ESG.Kategorien <- ifelse(Datensatz_final_so$ESGscore <= 25, "4",
ifelse(Datensatz_final_so$ESGscore > 25 & Datensatz_final_so$ESGscore <= 50, "3",
ifelse(Datensatz_final_so$ESGscore > 50 & Datensatz_final_so$ESGscore <= 75, "2",
ifelse(Datensatz_final_so > 75, "1", 0))))``
# Create ESGscore dummy #
Dummy.ESG <- dummy.code(Datensatz_final_so$ESG.Kategorien)
colnames(Dummy.ESG) <- c("poor", "medium", "good", "excellent")
# Connect data and dummy #
Datensatz_final <- cbind(Datensatz_final, Dummy.ESG)
你知道怎么解决吗?
一种方法是将 colnames
重新排列为
colnames(Dummy.ESG) <- c("good", "excellent", "poor", "medium")
但它正在产生问题,即 R 在分析中选择介质作为参考。
提前致谢! :)
数据示例:
structure(list(Company = c("AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC",
"AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC",
"AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC",
"AIR PRODUCTS & CHEMICALS INC", "HESS CORP", "HESS CORP", "HESS CORP",
"HESS CORP", "HESS CORP", "HESS CORP", "HESS CORP", "APACHE CORP",
"APACHE CORP", "APACHE CORP", "APACHE CORP", "APACHE CORP", "APACHE CORP",
"APACHE CORP", "AVERY DENNISON CORP", "AVERY DENNISON CORP",
"AVERY DENNISON CORP", "AVERY DENNISON CORP", "AVERY DENNISON CORP",
"AVERY DENNISON CORP", "AVERY DENNISON CORP", "BALL CORP", "BALL CORP",
"BALL CORP", "BALL CORP", "BALL CORP", "BALL CORP", "BALL CORP",
"CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP",
"CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", "ECOLAB INC",
"ECOLAB INC", "ECOLAB INC", "ECOLAB INC", "ECOLAB INC", "ECOLAB INC",
"ECOLAB INC", "EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP",
"EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP",
"FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP",
"FMC CORP", "HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO",
"HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO",
"HELMERICH & PAYNE", "HELMERICH & PAYNE", "HELMERICH & PAYNE",
"HELMERICH & PAYNE", "HELMERICH & PAYNE", "HELMERICH & PAYNE",
"HELMERICH & PAYNE"), Year = c(2011, 2012, 2013, 2014, 2015,
2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012,
2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016,
2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013,
2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014,
2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011,
2012, 2013, 2014, 2015, 2016, 2017), gvkey = c(1209, 1209, 1209,
1209, 1209, 1209, 1209, 1380, 1380, 1380, 1380, 1380, 1380, 1380,
1678, 1678, 1678, 1678, 1678, 1678, 1678, 1913, 1913, 1913, 1913,
1913, 1913, 1913, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 2991,
2991, 2991, 2991, 2991, 2991, 2991, 4213, 4213, 4213, 4213, 4213,
4213, 4213, 4503, 4503, 4503, 4503, 4503, 4503, 4503, 4510, 4510,
4510, 4510, 4510, 4510, 4510, 5439, 5439, 5439, 5439, 5439, 5439,
5439, 5581, 5581, 5581, 5581, 5581, 5581, 5581), ESGscore = c(84.2750015258789,
81.9225006103516, 77.4024963378906, 80.1125030517578, 78.6449966430664,
76.3775024414062, 79.2699966430664, 69.4899978637695, 65.8300018310547,
64.4300003051758, 74.3000030517578, 75.7600021362305, 71.4599990844727,
74.6900024414062, 55.8300018310547, 56.0900001525879, 57.5, 60.75,
60.8800010681152, 67.379997253418, 71.9899978637695, 82.9000015258789,
77.3899993896484, 76.9300003051758, 78.7399978637695, 76.2283325195312,
74.2125015258789, 68.3600006103516, 64.4100036621094, 65.6600036621094,
63.75, 67.7300033569336, 67.5699996948242, 74.4300003051758,
68.5699996948242, 86.5100021362305, 84.3099975585938, 82.6600036621094,
82.3399963378906, 88.4100036621094, 90.0800018310547, 92.25,
74.6999969482422, 72.3600006103516, 68.3899993896484, 67.9300003051758,
65.629997253418, 74.9000015258789, 74.8600006103516, 81.6999969482422,
79.370002746582, 79.0899963378906, 75.25, 81.9499969482422, 81.0199966430664,
88.3399963378906, 59.8199996948242, 55.6500015258789, 52.2999992370605,
51.8499984741211, 56.9199981689453, 66.620002746582, 65.3300018310547,
85.9800033569336, 83.9499969482422, 85.1100006103516, 67.4300003051758,
76.4400024414062, 69.9199981689453, 78.4599990844727, 19.0599994659424,
17.5200004577637, 18.1200008392334, 23.5025005340576, 35.5349998474121,
36.7350006103516, 41.1725006103516)), row.names = c(NA, -77L), class = c("tbl_df",
"tbl", "data.frame"))
让我们将您的数据作为 df:
df<- structure(
list(
Company = c(
"AIR PRODUCTS & CHEMICALS INC",
...
...
),
row.names = c(NA,-77L),
class = c("tbl_df",
"tbl", "data.frame")
)
让我们进行分类并构建一个小型数据框,然后是一些 dplyr
ESG <- c("poor", "medium", "good", "excellent")
da <- data.frame(ESGColumn = 1:4,FlatESG = ESG)
df <- df |> dplyr::mutate(ESGColumn = floor(ESGscore/25)+1) |>
dplyr::left_join(da, by="ESGColumn") |>
dplyr::select(-"ESGColumn")
head(df)
# A tibble: 6 × 5
Company Year gvkey ESGscore FlatESG
<chr> <dbl> <dbl> <dbl> <chr>
1 AIR PRODUCTS & CHEMICALS INC 2011 1209 84.3 excellent
2 AIR PRODUCTS & CHEMICALS INC 2012 1209 81.9 excellent
3 AIR PRODUCTS & CHEMICALS INC 2013 1209 77.4 excellent
4 AIR PRODUCTS & CHEMICALS INC 2014 1209 80.1 excellent
5 AIR PRODUCTS & CHEMICALS INC 2015 1209 78.6 excellent
6 AIR PRODUCTS & CHEMICALS INC 2016 1209 76.4 excellent
格热戈日
您的示例数据不包含名为 ESG.Kategorien
的变量,但它包含 ESGscore
。以下应该给你你想要的:
Datensatz_final_so$Dummy <- cut(Datensatz_final_so$ESGscore, breaks=c(0, 25, 50, 75, 100), labels=c("poor", "medium", "good", "excellent"))
table(Datensatz_final_so$Dummy)
#
# poor medium good excellent
# 4 3 38 32
levels(Datensatz_final_so$Dummy)
# [1] "poor" "medium" "good" "excellent"
请注意您的原始分类将 75 列为良好和优秀。