Iterate over specific columns in R
Update: Thanks, jason and Buckminster. I used a variant of your suggestions.
I used the code below and then adjusted my function and data. Thanks again.
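# Convert one column of strings such as "1.21 Mbps" to numeric bits per second.
# Assumes myDF$multiple already exists (e.g. myDF$multiple <- 1).
myFun <- function(x) {
  # Assignments to myDF in here change only a local copy of the data frame.
  myDF$multiple[grep(" Mbps", myDF[, x])] <- 1000000
  myDF[, x] <- gsub(" Mbps", "", myDF[, x])
  myDF$multiple[grep(" Kbps", myDF[, x])] <- 1000
  myDF[, x] <- gsub(" Kbps", "", myDF[, x])
  myDF$multiple[grep(" bps", myDF[, x])] <- 1
  myDF[, x] <- gsub(" bps", "", myDF[, x])
  # The value of this last assignment (the numeric vector) is what gets returned.
  myDF[, x] <- as.numeric(myDF[, x]) * myDF$multiple
}
cols <- c('MaximumIn', 'MaximumOut', 'AverageIn', 'AverageOut')
# Write the converted vectors back into columns 2-5 of the global myDF.
myDF[, 2:5] <- lapply(cols, myFun)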
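For anyone wondering why the result of lapply has to be assigned back into myDF: as far as I understand R's scoping, an assignment to a data frame inside a function only changes a local copy. Here is a minimal sketch of that behaviour. The one-column data frame df and the helper strip_unit are made up for illustration and are not part of my real code.

```r
# Hypothetical one-column data frame, used only to illustrate scoping.
df <- data.frame(a = c("1 Kbps", "2 Kbps"), stringsAsFactors = FALSE)

# Hypothetical helper: strips the unit from one column of df.
strip_unit <- function(col) {
  # This creates and modifies a local copy of df; the global df is untouched.
  # The assignment's value (the cleaned character vector) is the return value.
  df[, col] <- gsub(" Kbps", "", df[, col])
}

strip_unit("a")
unchanged <- df$a   # the global column still carries the " Kbps" suffix

# Assigning the returned vectors back is what actually updates df.
df["a"] <- lapply("a", strip_unit)
```

After the last line, df$a holds the cleaned strings, which is the same pattern as assigning into myDF[ ,2:5] above.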
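Updated with dput(). I want to thank you for your replies, and I realize I could have made it easier to get help. I had to go back, clean up the data, and make it smaller so that I could dput() it.
I would like an optimized way to iterate over the 4 columns I care about, possibly using lapply.
Below are a few rows of my data. It has 7 columns, and I only want to operate on columns 2-5. This snippet has already been processed by other code that I believe is not relevant to my question.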
Host      MaximumIn    MaximumOut  AverageIn   AverageOut  Site Name  Date
device1   30.63 Kbps   0 bps       24.60 Kbps  0 bps       SiteA      3/7/15
device12  1.13 Mbps    24.89 Kbps  21.76 Kbps  461 bps     SiteA      3/8/15
device1   698.44 Kbps  37.71 Kbps  17.49 Kbps  3.37 Kbps   SiteB      3/7/15
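Here is the dput() of the data frame (see the snippet above). I also have a dput() of the .csv file, but I have not yet figured out how to upload it to this question.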
structure(list(Host = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
7L, 8L, 9L, 10L), .Label = c("DeviceS1", "DeviceS2", "DeviceS3",
"DeviceS4", "DeviceS5", "deviceS2a", "deviceS2b", "devices5a",
"devices5b", "devices5c"), class = "factor"), MaximumIn = structure(c(5L,
2L, 3L, 1L, 4L, 6L, 7L, 8L, 11L, 10L, 9L), .Label = c("121.02 Kbps",
"27.11 Kbps", "39.08 Kbps", "62.22 Kbps", "698.44 Kbps", "1.21 Mbps",
"3.52 Mbps", "606.44 Kbps", "16.19 Mbps", "34.04 Mbps", "34.21 Mbps"
), class = "factor"), MaximumOut = structure(c(5L, 1L, 2L, 4L,
3L, 6L, 7L, 8L, 9L, 11L, 10L), .Label = c("0 bps", "10.58 Kbps",
"18.94 Kbps", "33.26 Kbps", "37.71 Kbps", "4.08 Mbps", "405.38 Kbps",
"930.44 Kbps", "15.35 Mbps", "192.88 Kbps", "2.98 Mbps"), class = "factor"),
AverageIn = structure(c(4L, 2L, 1L, 5L, 3L, 8L, 7L, 6L, 10L,
9L, 11L), .Label = c("10.83 Kbps", "24.57 Kbps", "3.87 Kbps",
"30.36 Kbps", "9.76 Kbps", "170.21 Kbps", "210.04 Kbps",
"312.39 Kbps", "20.08 Mbps", "21.60 Mbps", "5.95 Mbps"), class = "factor"),
AverageOut = structure(c(5L, 1L, 4L, 3L, 2L, 8L, 7L, 6L,
11L, 10L, 9L), .Label = c("0 bps", "1.54 Kbps", "2.28 Kbps",
"5.01 Kbps", "5.08 Kbps", "124.78 Kbps", "26.42 Kbps", "599.09 Kbps",
"21.38 Kbps", "576.77 Kbps", "6.16 Mbps"), class = "factor"),
`Site Name` = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("site1", "site2", "site5"), class = "factor"),
Date = structure(c(16475, 16475, 16475, 16475, 16475, 16476,
16476, 16476, 16476, 16476, 16476), class = "Date")), .Names = c("Host",
"MaximumIn", "MaximumOut", "AverageIn", "AverageOut", "Site Name",
"Date"), row.names = c(NA, 11L), class = "data.frame")
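In the code below, I manually edit and rerun it for each column (MaximumIn, MaximumOut, AverageIn, AverageOut). What I have works, but it falls short of the conciseness expected in R (or any language). So is the code below a candidate for a column-based function?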
myDF$multiple <- 1
myDF$multiple[grep(" Mbps",myDF$MaximumOut)] <- 1000000
myDF$MaximumOut <- gsub(" Mbps","",myDF$MaximumOut)
myDF$multiple[grep(" Kbps",myDF$MaximumOut)] <- 1000
myDF$MaximumOut <- gsub(" Kbps","",myDF$MaximumOut)
myDF$multiple[grep(" bps",myDF$MaximumOut)] <- 1
myDF$MaximumOut <- gsub(" bps","",myDF$MaximumOut)
myDF$MaximumOut <- as.numeric(myDF$MaximumOut) * myDF$multiple
This looks straightforward. I haven't tested it because no data was provided, but you should get the gist.
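# Assumes myDF$multiple already exists (e.g. myDF$multiple <- 1).
myFun <- function(col) {
  # Assignments to myDF in here change only a local copy of the data frame.
  myDF$multiple[grep(" Mbps", myDF[, col])] <- 1000000
  myDF[, col] <- gsub(" Mbps", "", myDF[, col])
  myDF$multiple[grep(" Kbps", myDF[, col])] <- 1000
  myDF[, col] <- gsub(" Kbps", "", myDF[, col])
  myDF$multiple[grep(" bps", myDF[, col])] <- 1
  myDF[, col] <- gsub(" bps", "", myDF[, col])
  # The value of this last assignment (the numeric vector) is what gets returned.
  myDF[, col] <- as.numeric(myDF[, col]) * myDF$multiple
}
cols <- c('MaximumOut', 'AverageIn', 'AverageOut', 'MaximumIn')
# Assign the results back by name; calling lapply alone would leave myDF unchanged.
myDF[cols] <- lapply(cols, myFun)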
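An alternative sketch, not taken from the thread: the same conversion can probably be written as a function that takes a column vector rather than a column name, so it does not depend on a global myDF or on a helper multiple column. The name to_bps and the multiplier lookup table are hypothetical, and the small data frame below only reuses three rows of values from the dput() in the question.

```r
# Hypothetical helper: convert strings such as "1.21 Mbps" to numeric bits per second.
to_bps <- function(x) {
  x <- as.character(x)                          # factor columns become character
  multiplier <- c(bps = 1, Kbps = 1e3, Mbps = 1e6)
  value <- as.numeric(sub(" .*$", "", x))       # the part before the space
  unit <- sub("^.* ", "", x)                    # the part after the space
  value * unname(multiplier[unit])              # unknown units give NA
}

# Stand-in data frame built from rows 1, 2 and 6 of the question's dput().
myDF <- data.frame(
  Host       = c("DeviceS1", "DeviceS2", "deviceS2a"),
  MaximumIn  = c("698.44 Kbps", "27.11 Kbps", "1.21 Mbps"),
  MaximumOut = c("37.71 Kbps", "0 bps", "4.08 Mbps"),
  AverageIn  = c("30.36 Kbps", "24.57 Kbps", "312.39 Kbps"),
  AverageOut = c("5.08 Kbps", "0 bps", "599.09 Kbps"),
  stringsAsFactors = TRUE
)

cols <- c("MaximumIn", "MaximumOut", "AverageIn", "AverageOut")
# lapply over the selected columns themselves and assign back by name.
myDF[cols] <- lapply(myDF[cols], to_bps)
```

Passing myDF[cols] to lapply hands each column to the function directly, so the column order in cols no longer has to match positions such as 2:5.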