在 Excel 或 R 中是否可以同时保留数值并将所有值更改为分钟?

Is it possible to keep only numeric values and change all values into minutes at the same time in Excel or R?

我正在尝试使用从 google 表单收集的数据编辑 table。在几列中,有关于每周体力 activity 的数据,以分钟和小时为单位。除了数值之外,单元格中还有表示物理时间分钟或小时的文本 activity。目标是在这些单元格中只有数值而没有文本,并且只有分钟(即需要转换小时)。

Example of a column with respective data

我已经尝试使用以下公式来删除适用于这部分问题的文本:=SUBSTITUTE(SUBSTITUTE(J52;"min";"");"h";" ")

当我尝试嵌套公式并在单元格中只获取分钟时出现问题,因为有分钟(表示为:min)、小时(表示为:h)和一些带有 > 符号的值前面(例如 > 4 小时被先验确定为 4:30,即 270 分钟)。

为了在 R 中潜在地解决这个问题,我提供了会话信息以及一个可重现的示例: R 版本 4.0.4 (2021-02-15) 平台:x86_64-w64-mingw32/x64(64 位) 运行 下:Windows 10 x64(内部版本 19042)

矩阵产品:默认

语言环境: 1 LC_COLLATE=Croatian_Croatia.1250 LC_CTYPE=Croatian_Croatia.1250 LC_MONETARY=Croatian_Croatia.1250 [4] LC_NUMERIC=C LC_TIME=Croatian_Croatia.1250
系统代码页:1252

附加基础包: 1 stats graphics grDevices utils 数据集方法基础

通过名称空间加载(未附加): 1 rstudioapi_0.13 knitr_1.33 magrittr_2.0.1 tidyselect_1.1.1 munsell_0.5.0 colorspace_2.0-0 [7] R6_2.5.0 rlang_0.4.11 fansi_0.4.2 dplyr_1.0.5 tools_4.0.4 grid_4.0.4
[13] gtable_0.3.0 xfun_0.22 tinytex_0.31 utf8_1.2.1 cli_2.5.0 DBI_1.1.1
[19] htmltools_0.5.1.1 ellipsis_0.3.1 digest_0.6.27 yaml_2.2.1 assertthat_0.2.1 tibble_3.1.1
[25] lifecycle_1.0.0 crayon_1.4.1 purrr_0.3.4 ggplot2_3.3.3 vctrs_0.3.7 evaluate_0.14
[31] glue_1.4.2 rmarkdown_2.7 compiler_4.0.4 pillar_1.6.0 generics_0.1.0 scales_1.1.1
[37] pkgconfig_2.0.3

data <- structure(list(id = c("100213", "100998", "100494", "100758", 
"100984", "100461", "100374", "140859", "150950", "140433", "200481", 
"200892", "300487", "300319", "400258", "400075", "200712", "120114", 
"500046", "500649", "600614", "700001", "300812", "700986", "800665", 
"700882", "800470", "400090", "900350", "700283", "600455", "800400", 
"100814", "700027", "600612", "30076", "500671", "200052", "600675", 
"800786", "110092", "800193", "800947", "500749", "800665", "400562", 
"150066", "200013", "700419", "400468", "600584", "ID 600366", 
"300379", "300035", "150346", "140860", "900508", "600751", "110825", 
"300848", "200990", "100237", "500137", "400751", "600347", "700289", 
"700000", "150320", "700319", "300643", "400871", "600963", "400295", 
"120120", "200875", "500209", "700380", "600646", "150415", "150415", 
"600640", "120999", "200693", "600626", "700003", "400493", "700849", 
"200544", "200001", "700801", "200084", "110951", "100371", "120114"
), vpa = c("0 min", "", "15 min", "0 min", "0 min", "45 min", 
"15 min", "0 min", "30 min", "30 min", "0 min", "0 min", "0 min", 
"0 min", "45 min", "3:00 h", "0 min", "", "15 min", "0 min", 
"1:00 h", "0 min", "45 min", "0 min", "0 min", "30 min", "1:00 h", 
"0 min", "1:00 h", "2:15 h", "30 min", "15 min", "0 min", "0 min", 
"0 min", "0 min", "0 min", "45 min", "1:00 h", "0 min", "1:30 h", 
"1:00 h", "0 min", "45 min", "0 min", "0 min", "0 min", "0 min", 
"15 min", "", "1:00 h", "0 min", "1:00 h", "30 min", "30 min", 
"30 min", "0 min", "30 min", "30 min", "45 min", "0 min", ">4 h", 
"15 min", "0 min", "45 min", "30 min", "30 min", "0 min", "1:30 h", 
"15 min", "45 min", "0 min", "15 min", "0 min", "0 min", "0 min", 
"0 min", "30 min", "2:00 h", "2:00 h", "15 min", "", "0 min", 
"0 min", "0 min", "0 min", "0 min", "0 min", "0 min", "0 min", 
"1:30 h", "15 min", "0 min", "4:00 h")), class = "data.frame", row.names = c(NA, 
-94L))

预期的输出需要如下所示:

data <- structure(list(id = c("100213", "100998", "100494", "100758", 
"100984", "100461", "100374", "140859", "150950", "140433", "200481", 
"200892", "300487", "300319", "400258", "400075", "200712", "120114", 
"500046", "500649", "600614", "700001", "300812", "700986", "800665", 
"700882", "800470", "400090", "900350", "700283", "600455", "800400", 
"100814", "700027", "600612", "30076", "500671", "200052", "600675", 
"800786", "110092", "800193", "800947", "500749", "800665", "400562", 
"150066", "200013", "700419", "400468", "600584", "ID 600366", 
"300379", "300035", "150346", "140860", "900508", "600751", "110825", 
"300848", "200990", "100237", "500137", "400751", "600347", "700289", 
"700000", "150320", "700319", "300643", "400871", "600963", "400295", 
"120120", "200875", "500209", "700380", "600646", "150415", "150415", 
"600640", "120999", "200693", "600626", "700003", "400493", "700849", 
"200544", "200001", "700801", "200084", "110951", "100371", "120114"
), vpa_2 = c("0", "", "15", "0", "0", "45", 
"15", "0", "30", "30", "0", "0", "0", 
"0", "45", "180", "0", "", "15", "0", 
"60", "0", "45", "0", "0", "30", "60", 
"0", "60", "135", "30", "15", "0", "0", 
"0", "0", "0", "45", "60", "0", "90", 
"60", "0", "45", "0", "0", "0", "0", 
"15", "", "60", "0", "60", "30", "30", 
"30", "0", "30", "30", "45", "0", "270", 
"15", "0", "45", "30", "30", "0", "90", 
"15", "45", "0", "15", "0", "0", "0", 
"0", "30", "120", "120", "15", "", "0", 
"0", "0", "0", "0", "0", "0", "0", 
"90", "15", "0", "240")), class = "data.frame", row.names = c(NA, -94L))

使用 base R 是可行的。我把解决方案放在第一位:

# make a copy of your input data
test <- data

# remove the letters from the time string
for (i in 1:nrow(test)){
  test$vpa[[i]] <- gsub("min|h"," ",test$vpa[[i]])
}

# if a string is in the hour unit (strings appearing as "xx:yy"), replace its value with the calculated minutes
for (i in 1:nrow(test)){
  if (grepl(":",test$vpa[[i]]) == TRUE){
    test$vpa[[i]] <- as.numeric(unlist(strsplit(test$vpa[[i]],":"))[1])*60 + as.numeric(unlist(strsplit(test$vpa[[i]],":"))[2])
  }
}

# check results
# The result "test" will look like your expected output
print(test)

我在这里没有使用任何包,但您可能需要一些 gsubgreplstr_split 的知识才能理解我所做的。基本上,我的逻辑是玩 strings.