Looping through data frames to remove NAs (beginner question)

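I'm a beginner. I have 42 data frames in my environment, and each one has several columns containing NAs.

Can I loop through each data frame and remove the NAs?

What would the code look like?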

I've attached two screenshots: one of the global environment and one of a data frame.

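You would be better off reading all of your data frames into a list to begin with. If your data frames already exist as separate objects, you can do the following: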

# Character vector of all objects in the current environment 
# (including all the data frames)
dfs = ls()

# Keep only the names of the data frames (this assumes their names
# all start with "df_"; adjust the pattern to match yours)
dfs = dfs[grep("^df_", dfs)]

# Add names (so that the elements of the list we create below will
# be named with the name of the source data frame)
names(dfs) = dfs

# Return a list where each element is a data frame.
# In each data frame, all rows with at least one NA will be removed.
df.na.remove = lapply(dfs, function(x) na.omit(get(x)))

# Or this
df.na.remove = lapply(dfs, function(x) {
  d = get(x)
  d[complete.cases(d), ]
})

You now have a list containing all of your data frames, with every row that had at least one NA value removed.
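To see the effect on a small scale, here is a minimal self-contained sketch. The data frames `df_a` and `df_b` and their values are made up for illustration, and it uses `mget()` plus `Filter(is.data.frame, ...)` as one possible way to make sure only data frames end up in the list.

```r
# Two toy data frames standing in for the real ones (hypothetical names and values)
df_a = data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
df_b = data.frame(x = c(4, 5, 6), y = c("d", NA, "f"))

# Collect every object whose name starts with "df_" into a named list
dfs_list = mget(ls(pattern = "^df_"))

# Keep only the elements that really are data frames
dfs_list = Filter(is.data.frame, dfs_list)

# Drop every row that has at least one NA
cleaned = lapply(dfs_list, na.omit)

# Compare row counts before and after
sapply(dfs_list, nrow)  # 3 rows each
sapply(cleaned, nrow)   # df_a keeps 1 row, df_b keeps 2
```

Comparing the row counts like this is a quick check on how much data `na.omit` discards, which matters when many columns contain NAs.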

If you want to remove the original data frames from the global environment, you can do this:

rm(list=dfs)
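If you would rather keep working with separate objects than with a list, one option is to copy the cleaned data frames back over the originals with `list2env()`. This is only a sketch with a made-up data frame `df_a`. Working from the list is usually easier.

```r
# A toy data frame with one incomplete row (hypothetical example data)
df_a = data.frame(x = c(1, NA, 3))

# A named list of cleaned data frames, as produced by the lapply() step
cleaned = list(df_a = na.omit(df_a))

# Copy each list element into the global environment under its own name,
# overwriting the original object of the same name
list2env(cleaned, envir = .GlobalEnv)

nrow(df_a)  # 2: the row with the NA is gone
```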

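If you'd like to read all of your data into a list to begin with, here is some code to do that. The example below switches to tidyverse functions.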

library(tidyverse)

# Save two data frames, just to have something to work with
write_csv(mtcars[1:5, ], "df_1.csv")
write_csv(mtcars[6:10, ], "df_2.csv")

# Create a named character vector of our data files
# (CSV files whose names start with "df_")
f = list.files(pattern="^df_.*\\.csv$") %>% set_names()

# Read each data frame into a single list
d1 = map(f, read_csv)

# Remove NA values
d1 = map(d1, na.omit)

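As another option, you can read in all of the data files, remove any rows with at least one NA value, and stack all of the data frames into a single data frame, all in one operation: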

d = map_df(f, ~ {
  x = read_csv(.x)
  na.omit(x)
  }, .id="source")
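In purrr 1.0.0 and later, `map_df()` is marked as superseded in favour of `map()` followed by `list_rbind()`. Here is a sketch of the same read, clean and stack step in that style. The temporary directory and the small example files are made up so that there is something with NAs to work on.

```r
library(tidyverse)

# Write two small files, each with one NA, to a temporary directory
# (hypothetical example data)
tmp = tempfile("na_demo_")
dir.create(tmp)
write_csv(tibble(x = c(1, NA, 3), y = c(4, 5, 6)), file.path(tmp, "df_1.csv"))
write_csv(tibble(x = c(7, 8), y = c(NA, 10)), file.path(tmp, "df_2.csv"))

# Full paths to the files, named by file name so the names can label the rows
f = list.files(tmp, pattern = "^df_.*\\.csv$", full.names = TRUE) %>%
  set_names(basename)

# Read each file, drop incomplete rows, then stack the results;
# names_to records which file each row came from
d = map(f, ~ na.omit(read_csv(.x, show_col_types = FALSE))) %>%
  list_rbind(names_to = "source")

d
```

The `source` column plays the same role as `.id="source"` in the `map_df()` version.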