Looping through data frames to remove NAs, for a beginner
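I'm a beginner, and I have 42 data frames in my environment, each with multiple columns that contain NAs.
Can I loop over each data frame and remove the NAs?
What would the code look like?
Attached are two screenshots: one of the global environment and one of an individual data frame.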
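You would be better off reading all of your data frames into a list to begin with. But if all of your data frames already exist as separate objects, you can do this: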
# Character vector of all objects in the current environment
# (including all the data frames)
dfs = ls()
# Filter to keep only names of the data frames
dfs = dfs[grep("df_.*", dfs)]
# Add names (so that the elements of the list we create below will
# be named with the name of the source data frame)
names(dfs) = dfs
# Return a list where each element is a data frame.
# In each data frame, all rows with at least one NA will be removed.
df.na.remove = lapply(dfs, function(x) na.omit(get(x)))
# Or this
df.na.remove = lapply(dfs, function(x) {
  d = get(x)
  d[complete.cases(d), ]
})
You now have a list containing all of your data frames, with every row that has at least one NA value removed.
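To check what the step above did, it can help to compare row counts before and after. The following is a minimal sketch, not part of the original answer: the data frames `df_a` and `df_b` are made-up toy data, and they are created in a separate environment (`env`) purely so the example does not touch your own workspace. It uses `mget()` as one possible alternative to calling `get()` inside `lapply()`.

```r
# Toy data frames with hypothetical names, stored in their own environment
# so that running this example leaves the global environment alone
env <- new.env()
env$df_a <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
env$df_b <- data.frame(x = c(1, 2, 3), y = c("a", NA, "c"))

# Names of all objects in `env` that start with "df_"
df_names <- ls(envir = env, pattern = "^df_")

# Collect those objects into a named list
df_list <- mget(df_names, envir = env)

# Drop every row that contains at least one NA
df_clean <- lapply(df_list, na.omit)

# Compare row counts before and after the cleaning step
rows_before <- sapply(df_list, nrow)
rows_after <- sapply(df_clean, nrow)
print(rbind(rows_before, rows_after))
```

With your real data you would probably drop the `envir = env` arguments so that `ls()` and `mget()` look in the global environment instead.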
If you want to remove the original data frames from the global environment, you can do this:
rm(list=dfs)
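If you would rather keep separate objects than work with a list, one possible approach is to write the cleaned data frames back over the originals with `list2env()`. This is only a sketch under assumptions: the names `df_a`, `df_b`, and `env` are invented for illustration, and a scratch environment stands in for the global environment.

```r
# Toy data frames (hypothetical names) in a scratch environment
env <- new.env()
env$df_a <- data.frame(x = c(1, NA, 3))
env$df_b <- data.frame(x = c(NA, 2, 3))

# Build a named list of cleaned data frames
cleaned <- lapply(
  mget(ls(envir = env, pattern = "^df_"), envir = env),
  na.omit
)

# Assign each list element to the object of the same name,
# overwriting the original data frames in `env`
invisible(list2env(cleaned, envir = env))

# The originals now hold only complete rows
print(nrow(env$df_a))
print(nrow(env$df_b))
```

To apply this to your own workspace you would pass `envir = .GlobalEnv` to `list2env()`. Note that this overwrites the originals, so keeping everything in a list is usually the safer habit.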
If you want to read all of your data into a list in the first place, here is some code. The example below switches to tidyverse functions.
library(tidyverse)
# Save two data frames, just to have something to work with
write_csv(mtcars[1:5, ], "df_1.csv")
write_csv(mtcars[6:10, ], "df_2.csv")
# Create a character vector with names of our data files
f = list.files(pattern="df_.*") %>% set_names()
# Read each data frame into a single list
d1 = map(f, read_csv)
# Remove NA values
d1 = map(d1, na.omit)
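As another option, you can read in all of the data files, remove any rows with at least one NA value, and stack all of the data frames into a single data frame, all in one operation: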
d = map_df(f, ~ {
  x = read_csv(.x)
  na.omit(x)
}, .id="source")
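For comparison, here is a rough base R sketch of the same stack-with-a-source-column idea, working on data frames already in memory rather than on files. It is an illustration, not the answer's method: the list names `df_1` and `df_2` and the artificially inserted NA are assumptions made just for the example.

```r
# Two in-memory data frames in a named list (names are illustrative)
df_list <- list(df_1 = mtcars[1:5, ], df_2 = mtcars[6:10, ])

# Introduce one NA so there is something to remove
df_list$df_1$mpg[2] <- NA

# For each data frame: drop incomplete rows, then prepend a `source`
# column holding the name of the list element it came from
tagged <- Map(
  function(d, nm) cbind(source = nm, na.omit(d)),
  df_list,
  names(df_list)
)

# Stack everything into a single data frame
stacked <- do.call(rbind, tagged)

# Number of rows contributed by each source
print(table(stacked$source))
```

The `source` column plays the same role as the column created by `.id="source"` above: it records which data frame each row came from.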