How do I output a data frame in the correct format in R?

Question

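I have to write a function that reads a directory full of files and reports the number of completely observed cases in each data file (rows that contain no NA values). The function should return a data frame in which the first column is the name of the file and the second column is the number of complete cases. Below is my first draft; any feedback is welcome!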

complete <- function (directory, id = 1:332){
  nobs = numeric() #currently blank
    # nobs is the number of complete cases in each file
  data = data.frame() #currently blank dataframe
  for (i in id){
    #get the right filepath
    newread = read.csv(paste(directory,"/",formatC(i,width=3,flag="0"),".csv",sep=""))
    my_na <- is.na(newread) #let my_na be the logic vector of true and false na values 
    nobs = sum(!my_na) #sum up all the not na values (1 is not na, 0 is na, due to inversion). 
    #this returns # of true values
    #add on to the existing dataframe
    data = c(data, i, nobs, row.names=i)
  }
  data # return the updated data frame for the specified id range
}

The output of a sample run, complete("specdata", 1), is:

[[1]]
[1] 1

[[2]]
[1] 3161

$row.names
[1] 1

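I am not sure why it is not displayed in the regular data frame format. I am also fairly sure my numbers are wrong. I assumed that on each iteration i, newread would read all of the data in that file before moving on to my_na. Is that the source of the error, or is it something else? Please explain. Thanks!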

Answer 1

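You should consider other ways of adding values to a vector. The function currently overwrites nobs on every pass through the loop instead of collecting one value per file. You asked about id = 1, but it gets worse when you give the function several ids: only the value from the last one survives. Here is why: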

#Simple function that takes ids and adds 2 to them
myFun <- function(id) {

  nobs = c()

  for(i in id) {

    nobs = 2 + i
  }

  return(nobs)
}

myFun(c(2,3,4))
[1] 6

I told it to return each id plus 2, but it only gave me the last one. I should have written it like this:

myFun2 <- function(id) {

  nobs = c()

  for(i in 1:length(id)) {

    nobs[i] <- 2 + id[i]
  }

  return(nobs)
}

myFun2(c(2,3,4))
[1] 4 5 6

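Now it gives the correct output. What is different? The nobs object is no longer overwritten; each result is stored in its own position. Note the subset brackets and the new counter in the for loop header.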
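The same reasoning explains the list-shaped output in the question. What follows is a minimal sketch, not the asker's full function: it assumes only base R and reuses the values 1 and 3161 from the printed output in the question. Calling c() on a data frame treats it as a plain list, so the result loses its data.frame class, and row.names = i simply becomes a named list element.

```r
# c() has no data frame method, so the data frame is treated as a plain list
empty_df <- data.frame()
out <- c(empty_df, 1, 3161, row.names = 1)

print(is.data.frame(out))  # the data.frame class is gone
print(is.list(out))        # what remains is an ordinary list
print(names(out))          # "row.names" is just an element name

# Building the data frame from whole columns keeps the class intact
fixed <- data.frame(id = 1, nobs = 3161)
print(fixed)
```

Collecting the counts in a vector first and calling data.frame() once at the end avoids the problem entirely.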

Also, growing objects inside a loop is not the best way to use R. The language was built to do more with less:

complete <- function(directory, id=1:332) {
  nobs <- sapply(id, function(i) {
    sum(complete.cases(read.csv(list.files(path=directory, full.names=TRUE)[i]) )) } )
  data.frame(id, nobs)
}
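To try the sapply approach without the original data, here is a small self-contained sketch. The directory name, file contents, and the function name complete_sapply are made up for illustration; the real specdata files are assumed to follow the same 001.csv, 002.csv naming. Note that indexing list.files() by i relies on the files sorting in id order.

```r
# Hypothetical demo data: two small CSV files in a temporary directory
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3), b = c(4, 5, NA)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(a = c(1, 2, 3), b = c(4, NA, 6)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# Same idea as above: one count of complete rows per requested id
complete_sapply <- function(directory, id = 1:332) {
  # list.files() returns paths in alphabetical order, so 001.csv comes first
  files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  nobs <- sapply(id, function(i) sum(complete.cases(read.csv(files[i]))))
  data.frame(id, nobs)
}

result <- complete_sapply(demo_dir, 1:2)
print(result)
```

With this demo data the first file has one complete row and the second has two, so the result is a two-row data frame with columns id and nobs.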

If you would rather fix your own code, try the following:

complete <- function (directory, id = 1:332){
  nobs = numeric(length(id)) # preallocated: one slot per requested id
    # nobs is the number of complete cases in each file
  for (i in seq_along(id)) {
    #get the right filepath
    newread = read.csv(paste(directory,"/",formatC( id[i] ,width=3,flag="0"),".csv",sep=""))
    ok <- complete.cases(newread) # TRUE for each row that has no NA in any column
    nobs[i] = sum(ok) # TRUE counts as 1, so this is the number of complete rows
  }
  data.frame(id, nobs) # return one row per id in the specified id range
}
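On the worry in the question that the numbers are also wrong: sum(!is.na(newread)) counts every non-missing cell in the file, not the rows that are fully observed. A minimal sketch on a made-up data frame (the values are hypothetical, not taken from specdata) shows the difference:

```r
# Hypothetical data: 3 rows, 2 columns, with two missing cells
df <- data.frame(x = c(1, NA, 3), y = c(4, 5, NA))

non_na_cells  <- sum(!is.na(df))          # counts individual non-NA cells
complete_rows <- sum(complete.cases(df))  # counts rows with no NA at all

print(non_na_cells)   # 4 of the 6 cells are not NA
print(complete_rows)  # only the first row is complete, so 1
```

For the task as stated, counting completely observed cases, complete.cases() is the function that matches the definition.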

Answer 2

Since I do not know what data you are referring to and no sample was given, this is what I can come up with as an edit to your function:

complete <- function (directory, id = 1:332){
  # define both columns up front so that rows can be appended to them
  data = data.frame(Name = integer(), NotNA = integer())
  for (i in id){
    newread = read.csv(paste(directory,"/",formatC(i,width=3,flag="0"),".csv",sep=""))
    newread = newread[complete.cases(newread),] # keep only rows with no NA
    nobs = nrow(newread)
    data[nrow(data)+1,] = c(i,nobs) # append one row: file id and its count
  }
  return(data)
}
