Removing NA columns in multiple files in R
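I have a large dataset, and I used splitting to make the data easier to understand. I ended up with about 250 splits. Each split has a different number of empty columns. I want to remove the empty columns and write the updated files. I could do it by hand, but as I mentioned I have about 250 splits, so I can't do that for all of them.
Here is a reproducible example: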
df <- data.frame(Size= c(800, 850, 1100, 1200, 1000),
Value= c(900, NA, 1300, 1100, NA),
Location= c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
Num1 = c(2, NA, 3, 2, NA),
Num2 = c(2,3,3,1,2),
Rent= c('y', 'y', 'n', 'y', 'n'))
Here is what I have so far.
Splitting:
index <- apply(is.na(df)*1, 1,paste, collapse = "")
s <- split(df, index)
split(df, index)
for (i in 1:length(s))
{write.csv(s[i], file = paste0("Splits/", i, "splits.csv"), row.names=FALSE, na = "")}
Removing the empty columns:
split <- read.csv("Splits/3splits.csv")
updated_split <- split[,colSums(is.na(split))<nrow(split)]
write.csv(updated_split, file = "updated_3split.csv", row.names=FALSE)
split <- read.csv("Splits/2splits.csv")
updated_split <- split[,colSums(is.na(split))<nrow(split)]
write.csv(updated_split, file = "updated_2split.csv", row.names=FALSE)
split <- read.csv("Splits/1splits.csv")
updated_split <- split[,colSums(is.na(split))<nrow(split)]
write.csv(updated_split, file = "updated_1split.csv", row.names=FALSE)
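Is there a way to automate the process above? By automate I mean finding a way to remove the empty columns from these three files without repeating the same three lines over and over (doing that for 250 files isn't really an option).
Edit 1:
Like this?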
for (i in 1:length(s))
{
lapply(s, function(x) x[,colSums(is.na(x))<nrow(x)])
write.csv(s[i], file = paste0("Splits/", i, "splits.csv"), row.names=FALSE, na = "")
}
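One thing to note about the loop above: the result of lapply() is never assigned, so s is written out unchanged, and s[i] is a one-element list rather than a data frame. A minimal sketch of the same idea that keeps the cleaned list is below. The names drop_empty_cols, cleaned and out_dir are illustrative and not from the original post, and the output goes to a temporary folder here only so the example runs anywhere.

```r
# Same example data as in the question.
df <- data.frame(Size = c(800, 850, 1100, 1200, 1000),
                 Value = c(900, NA, 1300, 1100, NA),
                 Location = c(NA, "midcity", "uptown", NA, "Lakeview"),
                 Num1 = c(2, NA, 3, 2, NA),
                 Num2 = c(2, 3, 3, 1, 2),
                 Rent = c("y", "y", "n", "y", "n"))
index <- apply(is.na(df) * 1, 1, paste, collapse = "")
s <- split(df, index)

# Hypothetical helper: keep only columns that have at least one non-NA value.
# drop = FALSE keeps the result a data frame even if one column survives.
drop_empty_cols <- function(x) x[, colSums(is.na(x)) < nrow(x), drop = FALSE]

# Assign the result; lapply() does not modify s in place.
cleaned <- lapply(s, drop_empty_cols)

# Assumed output folder (a temp dir for this sketch; "Splits" in the question).
out_dir <- file.path(tempdir(), "Splits")
dir.create(out_dir, showWarnings = FALSE)

for (i in seq_along(cleaned)) {
  # [[i]] extracts the data frame itself; [i] would give a one-element list.
  write.csv(cleaned[[i]],
            file = file.path(out_dir, paste0(i, "splits.csv")),
            row.names = FALSE, na = "")
}
```

With this, the cleaning happens once for the whole list and the loop only handles the writing.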
Maybe something like this:
df <- data.frame(Size= c(800, 850, 1100, 1200, 1000),
Value= c(900, NA, 1300, 1100, NA),
Location= c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
Num1 = c(2, NA, 3, 2, NA),
Num2 = c(2,3,3,1,2),
Rent= c('y', 'y', 'n', 'y', 'n'))
index <- apply(is.na(df)*1, 1,paste, collapse = "")
s <- split(df, index)
split(df, index)
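for (i in seq_along(s))
{
# s[[i]] is the data frame itself; s[i] is a one-element list, which prefixes the column names
sdf <- s[[i]]
write.csv(sdf, file = paste0("Splits/", i, "splits.csv"), row.names=FALSE, na = "")
# keep only the columns that have at least one non-NA value
updated_split <- sdf[, colSums(is.na(sdf)) < nrow(sdf), drop = FALSE]
write.csv(updated_split, file = paste0("updated", i, "split.csv"), row.names=FALSE)
}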
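If the split files already exist on disk, the same column filter can be applied file by file instead of from the in-memory list. This is only a sketch under a few assumptions: the function names drop_empty_cols and clean_split_files are made up for illustration, the input files are assumed to end in "splits.csv" as in the question, and the output names come out as "updated_" plus the original file name, which differs slightly from the names used in the question.

```r
# Hypothetical helper: keep only columns with at least one non-NA value.
drop_empty_cols <- function(x) x[, colSums(is.na(x)) < nrow(x), drop = FALSE]

# Hypothetical wrapper. Assumed layout: split files live in in_dir,
# cleaned copies are written to out_dir.
clean_split_files <- function(in_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(in_dir, pattern = "splits\\.csv$", full.names = TRUE)
  for (f in files) {
    # Treat empty fields as NA, since the splits were written with na = "".
    d <- read.csv(f, na.strings = c("NA", ""))
    write.csv(drop_empty_cols(d),
              file = file.path(out_dir, paste0("updated_", basename(f))),
              row.names = FALSE)
  }
  invisible(files)
}

# Usage, assuming the folder name from the question and a made-up output folder:
# clean_split_files("Splits", "Updated")
```

list.files() picks up however many splits there are, so nothing needs to change when going from 3 files to 250.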