R: Extracting data from a data frame for analysis

I am trying to extract data from a data frame for analysis.

heightweight <- function(person, health) {
    ## Read in data
    data <- read.csv("heightweight.csv", header = TRUE,
                     colClasses = "character")
    ## Check that the outcomes are valid
    measure = c("height", "weight")
    if(health %in% measure == FALSE){
        stop("Valid inputs are height and weight")
    }
    ## Truncate the data matrix to only what columns are needed
    data <- data[c(1, 5, 7)]
    ## Rename columns
    names(data)[1] <- "Name"
    names(data)[2] <- "Height"
    names(data)[3] <- "Weight"
    ## Convert numeric columns to numeric
    data[, 2] <- as.numeric(data[, 3])
    data[, 3] <- as.numeric(data[, 4])
    ## Convert NAs to 0 after coercion
    data[is.na(data)] <- 0
    ## Check that the name is valid
    name <- data[, 1]
    name <- unique(name)
    if(person %in% name == FALSE){
        stop("Invalid person")
    }
    ## Return person with lowest height or weight
    list <- data[data$name == person & data[health],]
    outcomes <- list[, health]
    minumum <- which.min(outcomes)
    ## Min Rate
    minimum[rowNum, ]$name
}

The problem I am running into is with this line:

list <- data[data$name == person & data[health],]

That is, when I run heightweight("Bob", "weight"), I get the following message:

Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr,  : 
  length of 'dimnames' [2] not equal to array extent

I Googled this message and looked through some of the threads here, but could not work out what the problem is.

Here is a simplified mock-up of your function:

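heightweight <- function(person, health) {
  ## Sample data set: 5 people (a-e), 3 records each
  data.set <- data.frame(names = rep(letters[1:5], each = 3), height = 171:185, weight = seq(95, 81, by = -1))
  ## All records for the given person (the column is called "names")
  d1 <- data.set[data.set$names == person, ]
  ## Record(s) whose health measurement equals the minimum
  d2 <- d1[d1[, health] == min(d1[, health]), ]
  d2[, c('names', health)]
}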

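The first statement builds a sample data set. The second selects all records for the given person. The third keeps the record(s) whose health value equals the minimum, and the last returns the names column alongside that measurement.

heightweight('b','height')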
#   names height
# 4     b    174
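The question's code reached for which.min(), and the mock-up can be sketched that way too. This is only an illustrative variant: the name heightweight_first and passing the data frame as an argument are my assumptions, not part of the code above. which.min() returns the position of the first minimum, so exactly one row comes back even if several records tie.

```r
# Hypothetical variant of the mock-up (name and signature are assumptions):
# takes the data frame as an argument and uses which.min() to pick the
# first record with the lowest value of the chosen measurement.
heightweight_first <- function(data.set, person, health) {
  d1 <- data.set[data.set$names == person, ]        # all records for this person
  d1[which.min(d1[[health]]), c("names", health)]   # first row holding the minimum
}

# Same sample data as in the mock-up
data.set <- data.frame(names  = rep(letters[1:5], each = 3),
                       height = 171:185,
                       weight = seq(95, 81, by = -1))

result <- heightweight_first(data.set, "b", "weight")
print(result)
```

If the person is not in the data, d1 has no rows and the function returns an empty data frame rather than an error.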

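Unless I am missing something, if you only need the lowest weight or height for a given name, the last three lines of your code are more than you need.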

Here is a simple way to get the lowest health measurement for a given person:

min(data[data$name==person, "height"])

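The first part, before the comma, acts as the row index: it selects only the rows that belong to that person. The second part, after the comma, selects only the variable (column) you want. Once that subset is selected, min() finds the smallest value in it.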
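To make the two parts of the index visible, here is a minimal sketch that splits the one-liner into steps. The tiny data frame and the intermediate names (rows, subset_heights, lowest) are made up purely for illustration.

```r
# Small made-up data frame, for illustration only
data <- data.frame(name   = c("ann", "ann", "ben"),
                   height = c(170, 165, 180))
person <- "ann"

# Part 1: logical row index, TRUE where the row belongs to the person
rows <- data$name == person

# Part 2: keep those rows, and only the "height" column (a numeric vector)
subset_heights <- data[rows, "height"]

# Minimum over the selected subset
lowest <- min(subset_heights)
print(lowest)
```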

An example to illustrate the result:

data<-data.frame(name=as.character(c(rep("carlos",2),rep("marta",3),rep("johny",2),"sara")))
set.seed(1)
data$height <- rnorm(8,68,3)
data$weight <- rnorm(8,160,10)

The corresponding data frame:

   name   height   weight
1 carlos 66.12064 165.7578
2 carlos 68.55093 156.9461
3  marta 65.49311 175.1178
4  marta 72.78584 163.8984
5  marta 68.98852 153.7876
6  johny 65.53859 137.8530
7  johny 69.46229 171.2493
8   sara 70.21497 159.5507

Suppose we want the minimum weight for marta:

person <- "marta"
health <- "weight"

The minimum "weight" for "marta" is:

min(data[data$name==person,health])

which gives the desired result:

[1] 153.7876
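If you want to keep the input checks from the question, the one-liner can be wrapped in a small function. This is a rough sketch, not a drop-in replacement: min_measure is a hypothetical name, and it assumes the data frame has lowercase columns name, height and weight as in the example above.

```r
# Hypothetical helper (not from the original posts). Assumes `data` has
# the columns name, height and weight, as in the example data frame.
min_measure <- function(data, person, health) {
  # Same validity checks as in the question's function
  if (!health %in% c("height", "weight")) {
    stop("Valid inputs are height and weight")
  }
  if (!person %in% data$name) {
    stop("Invalid person")
  }
  # Rows for this person, requested column only, then the minimum
  min(data[data$name == person, health])
}

# Rebuild the example data frame
data <- data.frame(name = c(rep("carlos", 2), rep("marta", 3),
                            rep("johny", 2), "sara"))
set.seed(1)
data$height <- rnorm(8, 68, 3)
data$weight <- rnorm(8, 160, 10)

marta_min <- min_measure(data, "marta", "weight")
print(marta_min)
```

Called with a name that is not in the data, or a measure other than height or weight, it stops with the same messages as the function in the question.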