How to associate more than one factor of a variable with the same entry in a data frame?
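This is my first question to the Stack Overflow community. First of all, many thanks for all the answers I have found here over the past 5 years. You have all been very helpful, but this time I cannot find an answer.
So, here is my situation. In a larger data frame, one variable is giving me trouble: Weather. It is made up of factors that describe the weather, such as "Rainy", "Cloudy", "Sunny", etc. My problem is that some entries are defined by more than one factor (e.g. "rainy,foggy"). As a result, R treats each combination of factors as a new, separate level, which is not what I want.
Here is a sample of the data frame: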
df <- read.table(text =
'"Date.Time","Year","Month","Day","Weekday","Hour","Temperature","Rel.humidity","Wind.dir","Wind.dir2","Wind.speed","Atm.pressure","Weather"
2015-04-01 00:00:00,"2015","4","1","Wednesday","00:00",-3.4,44,30,"NW",10,100.83,"Clear"
2015-04-02 23:00:00,"2015","4","2","Thursday","23:00",3.4,94,36,"N",2,99.8,"Rain,Fog"
2015-05-11 12:00:00,"2015","5","11","Monday","12:00",9.5,93,3,"NE",27,101.5,"Mist,Shower,Fog"',
header = TRUE, stringsAsFactors = FALSE, sep = ",")
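My end goal is to be able, for example, to select only the entries tagged Fog, including the ones that have both Rain and Fog.
My idea for a solution was to split the strings and store the results in a list that would go into the Weather variable, but I have not managed to do that yet, and maybe there is a simpler, more elegant way.
Here is my naive attempt: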
for (i in dim(df)[1]){
df[i,] <- as.factor(list(strsplit(dda[i,], ",")))
}
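tl;dr: I want to turn a factor like "A,B,C" into several factors "A", "B", "C" held in the same element (same column, same row of the data frame).
Thanks in advance for your time, and feel free to comment on the formatting of my question.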
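If all you need is the Fog selection, a minimal sketch that skips splitting entirely is to match Fog as a whole comma-separated token with a regular expression. This is a swapped-in alternative using base R `grepl()`, not part of the original question or answer; it assumes the tokens contain no regex metacharacters, and the cut-down `weather_df` and the `has_token()` helper are hypothetical.

```r
# Hypothetical cut-down version of the question's data: only the Weather column.
weather_df <- data.frame(
  Weather = c("Clear", "Rain,Fog", "Mist,Shower,Fog"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where `token` appears as a whole comma-separated
# item, i.e. preceded by start-of-string or a comma and followed by a comma
# or end-of-string.
has_token <- function(x, token) {
  grepl(paste0("(^|,)", token, "(,|$)"), x)
}

# Keep every row whose Weather includes Fog, alone or combined.
fog_rows <- weather_df[has_token(weather_df$Weather, "Fog"), , drop = FALSE]
print(fog_rows)
```

The anchors `(^|,)` and `(,|$)` are what keep a hypothetical "Foggy" label from matching "Fog"; a plain `grepl("Fog", ...)` would match both.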
df <- read.table(text =
'"Date.Time","Year","Month","Day","Weekday","Hour","Temperature","Rel.humidity","Wind.dir","Wind.dir2","Wind.speed","Atm.pressure","Weather"
2015-04-01 00:00:00,"2015","4","1","Wednesday","00:00",-3.4,44,30,"NW",10,100.83,"Clear"
2015-04-02 23:00:00,"2015","4","2","Thursday","23:00",3.4,94,36,"N",2,99.8,"Rain,Fog"
2015-05-11 12:00:00,"2015","5","11","Monday","12:00",9.5,93,3,"NE",27,101.5,"Mist,Shower,Fog"',
header = TRUE, stringsAsFactors = FALSE, sep = ",")
Here is your for loop, fixed:
df[["Weather_split"]] <- as.list(rep(NA, nrow(df)))
for (i in seq_len(nrow(df))) {
df[["Weather_split"]][[i]] <- strsplit(df[["Weather"]][[i]], ",")[[1]]
}
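The loop header is the main fix. `dim(df)[1]` is a single number (the row count), so `for (i in dim(df)[1])` runs its body only once, for the last row, while `seq_len(nrow(df))` yields one index per row; the original body also refers to `dda`, which is not defined in the example. A small illustration, using a hypothetical three-row `toy` data frame in place of `df`:

```r
# Hypothetical three-row data frame standing in for the question's df.
toy <- data.frame(id = 1:3)

# dim(toy)[1] is one number, so this loop runs a single time, with i = 3.
visited_dim <- integer(0)
for (i in dim(toy)[1]) visited_dim <- c(visited_dim, i)

# seq_len(nrow(toy)) is 1, 2, 3: one iteration per row.
visited_seq <- integer(0)
for (i in seq_len(nrow(toy))) visited_seq <- c(visited_seq, i)

print(visited_dim)  # [1] 3
print(visited_seq)  # [1] 1 2 3

# seq_len() also behaves for a zero-row data frame, where 1:nrow() would give c(1, 0).
print(seq_len(nrow(toy[0, , drop = FALSE])))  # integer(0)
```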
The same thing, more simply, without a loop (`strsplit()` is already vectorized over its input):
df[["Weather_split"]] <- strsplit(df[["Weather"]], ",")
str(df$Weather)
# chr [1:3] "Clear" "Rain,Fog" "Mist,Shower,Fog"
str(df$Weather_split)
# List of 3
# $ : chr "Clear"
# $ : chr [1:2] "Rain" "Fog"
# $ : chr [1:3] "Mist" "Shower" "Fog"
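With the list column in place, the selection asked for in the question (every entry that includes Fog, alone or combined) becomes a per-row membership test. A minimal sketch with base R `vapply()`, assuming a cut-down data frame that holds only the Weather column:

```r
# Hypothetical cut-down data frame with only the Weather column.
df <- data.frame(Weather = c("Clear", "Rain,Fog", "Mist,Shower,Fog"),
                 stringsAsFactors = FALSE)
df[["Weather_split"]] <- strsplit(df[["Weather"]], ",")

# One logical per row: does that row's split list contain "Fog"?
has_fog <- vapply(df$Weather_split, function(w) "Fog" %in% w, logical(1))

# Weather strings of the rows that include Fog (here rows 2 and 3).
fog_weather <- df[has_fog, "Weather"]
print(fog_weather)
```

`vapply()` is used rather than `sapply()` so that each result is checked to be exactly one logical value.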
Taking @Stephen Henderson's idea a step further, with one logical column per weather type:
Weather_levels <- unique(unlist(df[["Weather_split"]]))
for (lvl in Weather_levels) {
df[[lvl]] <- unlist(lapply(df$Weather_split, "%in%", x = lvl))
}
df
#             Date.Time Year Month Day   Weekday  Hour Temperature Rel.humidity Wind.dir Wind.dir2 Wind.speed Atm.pressure         Weather     Weather_split Clear  Rain   Fog  Mist Shower
# 1 2015-04-01 00:00:00 2015     4   1 Wednesday 00:00        -3.4           44       30        NW         10       100.83           Clear             Clear  TRUE FALSE FALSE FALSE  FALSE
# 2 2015-04-02 23:00:00 2015     4   2  Thursday 23:00         3.4           94       36         N          2        99.80        Rain,Fog         Rain, Fog FALSE  TRUE  TRUE FALSE  FALSE
# 3 2015-05-11 12:00:00 2015     5  11    Monday 12:00         9.5           93        3        NE         27       101.50 Mist,Shower,Fog Mist, Shower, Fog FALSE FALSE  TRUE  TRUE   TRUE
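Once each weather type has its own logical column, the question's goal is ordinary logical indexing, and combinations are plain Boolean expressions. A sketch under the same assumption of a hypothetical cut-down data frame; it swaps `unlist(lapply(...))` for `vapply()` so that each row is guaranteed to yield exactly one logical:

```r
# Hypothetical cut-down data frame with only the Weather column.
df <- data.frame(Weather = c("Clear", "Rain,Fog", "Mist,Shower,Fog"),
                 stringsAsFactors = FALSE)
df[["Weather_split"]] <- strsplit(df[["Weather"]], ",")

# One logical indicator column per weather type.
Weather_levels <- unique(unlist(df[["Weather_split"]]))
for (lvl in Weather_levels) {
  df[[lvl]] <- vapply(df$Weather_split, function(w) lvl %in% w, logical(1))
}

# Any row with Fog (alone or combined).
fog_rows <- df[df$Fog, c("Weather", "Fog")]
# Rows with both Rain and Fog.
rain_and_fog <- df[df$Rain & df$Fog, "Weather"]
# Rows with Fog but no Rain.
fog_not_rain <- df[df$Fog & !df$Rain, "Weather"]
```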
Edit:
If, as your question says, you really need factors rather than character vectors, that is easy to do:
df$Weather_split <- lapply(df$Weather_split, factor, levels = Weather_levels)
df$Weather_split
# [[1]]
# [1] Clear
# Levels: Clear Rain Fog Mist Shower
#
# [[2]]
# [1] Rain Fog
# Levels: Clear Rain Fog Mist Shower
#
# [[3]]
# [1] Mist Shower Fog
# Levels: Clear Rain Fog Mist Shower
str(df$Weather_split)
# List of 3
# $ : Factor w/ 5 levels "Clear","Rain",..: 1
# $ : Factor w/ 5 levels "Clear","Rain",..: 2 3
# $ : Factor w/ 5 levels "Clear","Rain",..: 4 5 3
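One benefit of giving every element the same `levels` is that the pieces can be pooled and counted without losing any category. A minimal sketch, assuming the same hypothetical cut-down data frame; it relies on `unlist()` combining a list of factors into a single factor whose levels are the union of the elements' levels, so `table()` would also report levels that never occur:

```r
# Hypothetical cut-down data frame with only the Weather column.
df <- data.frame(Weather = c("Clear", "Rain,Fog", "Mist,Shower,Fog"),
                 stringsAsFactors = FALSE)
Weather_split <- strsplit(df$Weather, ",")
Weather_levels <- unique(unlist(Weather_split))

# Each element becomes a factor sharing the same set of levels.
df$Weather_split <- lapply(Weather_split, factor, levels = Weather_levels)

# unlist() returns one factor here, so table() counts every weather type.
weather_counts <- table(unlist(df$Weather_split))
print(weather_counts)
```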