R: Extracting data from a data frame for analysis
I am trying to extract data from a data frame for analysis.
heightweight <- function(person, health) {
## Read in data
data <- read.csv("heightweight.csv", header = TRUE,
colClasses = "character")
## Check that the outcomes are valid
measure = c("height", "weight")
if(health %in% measure == FALSE){
stop("Valid inputs are height and weight")
}
## Truncate the data matrix to only what columns are needed
data <- data[c(1, 5, 7)]
## Rename columns
names(data)[1] <- "Name"
names(data)[2] <- "Height"
names(data)[3] <- "Weight"
## Convert numeric columns to numeric
data[, 2] <- as.numeric(data[, 3])
data[, 3] <- as.numeric(data[, 4])
## Convert NAs to 0 after coercion
data[is.na(data)] <- 0
## Check that the name is valid
name <- data[, 1]
name <- unique(name)
if(person %in% name == FALSE){
stop("Invalid person")
}
## Return person with lowest height or weight
list <- data[data$name == person & data[health],]
outcomes <- list[, health]
minumum <- which.min(outcomes)
## Min Rate
minimum[rowNum, ]$name
}
The problem I am running into is with this line:
list <- data[data$name == person & data[health],]
That is, when I run heightweight("Bob", "weight"), I get the following message:
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr, :
length of 'dimnames' [2] not equal to array extent
I Googled this message and looked through some threads here, but could not pin down the problem.
Here is a simplified mock-up of your function:
heightweight <- function(person,health) {
data.set <- data.frame(names=rep(letters[1:5],each=3),height=171:185,weight=seq(95,81,by=-1))
d1 <- data.set[data.set$names == person,]
d2 <- d1[d1[,health]==min(d1[,health]),]
d2[,c('names',health)]
}
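The first line generates a sample data set. The second line selects all records for the given person. The third line finds the record with the minimum value of health, and the last line returns it.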
heightweight('b','height')
# names height
# 4 b 174
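For what it's worth, two things in the original indexing line look like they would stop it from working, whatever the data file contains. The sketch below uses a made-up three-row data frame (the `df` object and its values are assumptions for illustration, not the asker's file). After the columns are renamed to `Name`, `Height` and `Weight`, `data$name` no longer matches anything because R is case-sensitive (the same goes for passing "weight" when the column is called `Weight`), and `data[health]` is a one-column data frame rather than a logical condition.

```r
# Hypothetical data frame, using the capitalised column names from the question
df <- data.frame(Name   = c("Bob", "Bob", "Ann"),
                 Height = c(180, 178, 165),
                 Weight = c(80, 82, 60))

# `$` is case-sensitive: there is no column called "name", so this is NULL
is.null(df$name)                 # TRUE

# Single-bracket indexing with one name keeps the data frame structure
is.data.frame(df["Weight"])      # TRUE

# A row filter needs a plain logical vector, one element per row
keep <- df$Name == "Bob"
keep                             # TRUE TRUE FALSE

# Filter the rows first, then pick the column as a plain vector
bob_weights <- df[keep, "Weight"]
bob_weights                      # 80 82
```

So the row condition and the column choice probably want to be kept apart: a logical vector before the comma, a column name after it.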
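Unless I am missing something, if all you need is the lowest weight or height for a given name, the last three lines of your code are somewhat redundant.
Here is a simple way to get the lowest health measurement for a given person: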
min(data[data$name==person, "height"])
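The first part selects only the rows of data that belong to that person; it acts as the row index. The second part, after the comma, selects only the variable (column) you want. Once the data you need is selected, you take the minimum over that subset.
An example to illustrate the result: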
data<-data.frame(name=as.character(c(rep("carlos",2),rep("marta",3),rep("johny",2),"sara")))
set.seed(1)
data$height <- rnorm(8,68,3)
data$weight <- rnorm(8,160,10)
The resulting data frame:
name height weight
1 carlos 66.12064 165.7578
2 carlos 68.55093 156.9461
3 marta 65.49311 175.1178
4 marta 72.78584 163.8984
5 marta 68.98852 153.7876
6 johny 65.53859 137.8530
7 johny 69.46229 171.2493
8 sara 70.21497 159.5507
Suppose we want the minimum weight for marta:
person <- "marta"
health <- "weight"
The minimum "weight" for "marta" is:
min(data[data$name==person,health])
which gives the desired result:
[1] 153.7876
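Putting the pieces together, a version of the original function could look roughly like the following. This is a minimal sketch under a few assumptions: the data frame is passed in as an argument instead of being read from heightweight.csv, the columns are already named `name`, `height` and `weight` with numeric measures, and the function name `lowest_measure` is made up for this example. It also skips missing values with `na.rm = TRUE` rather than converting them to 0, since a 0 would otherwise always win the minimum.

```r
# Hypothetical helper: lowest height or weight recorded for one person.
# Assumes `data` has columns name, height, weight (height/weight numeric).
lowest_measure <- function(data, person, health) {
  # Check that the requested measure is valid
  if (!health %in% c("height", "weight")) {
    stop("Valid inputs are height and weight")
  }
  # Check that the name is valid
  if (!person %in% unique(data$name)) {
    stop("Invalid person")
  }
  # Logical row index for the person, column picked by name
  values <- data[data$name == person, health]
  # Ignore missing values instead of recoding them as 0
  min(values, na.rm = TRUE)
}

# Same example data as in the answer above
data <- data.frame(name = c(rep("carlos", 2), rep("marta", 3),
                            rep("johny", 2), "sara"))
set.seed(1)
data$height <- rnorm(8, 68, 3)
data$weight <- rnorm(8, 160, 10)

lowest_measure(data, "marta", "weight")
```

If the whole row is needed rather than just the value, `which.min(values)` gives the position of the first minimum within the person's rows and can be used to index that subset.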