Multiple calculations on a dataframe based on certain reference points

Question

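Hi, I'm fairly new to this, but I'm trying to figure out how to do the following in R.

I'm trying to divide a bunch of datasets by certain baseline values and then take the log() of the result. The data is very large, and I don't know of any way to handle it other than a for loop.

For example, I have data like this: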

Name	Reference	Lap1	Lap2	Lap3	Lap4	Lap5
Craig	attempt1	34	21	33	21	32
Craig	attempt2	29	28	29	30	29
Craig	attempt3	25	25	24	21	26
Craig	attempt4	20	21	21	22	24
Jeff	attempt1	43	41	44	40	41
Jeff	attempt2	38	38	37	36	35
Jeff	attempt3	33	32	31	29	34
Jeff	attempt4	29	27	26	25	27

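I want to divide each value in Craig's attempts by the matching value in Craig's attempt1 and then take the log, so that the first attempt serves as the reference for comparison. I also want to do this for every individual column, and for Jeff as well, so the end result becomes: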

Name	Reference	Lap1	Lap2	Lap3	Lap4	Lap5
Craig	attempt1	log(34/34)	log(21/21)	log(33/33)	log(21/21)	log(32/32)
Craig	attempt2	log(29/34)	log(28/21)	log(29/33)	log(30/21)	log(29/32)
Craig	attempt3	log(25/34)	log(25/21)	log(24/33)	log(21/21)	log(26/32)
Craig	attempt4	log(20/34)	log(21/21)	log(21/33)	log(22/21)	log(24/32)
Jeff	attempt1	43	41	44	40	41
Jeff	attempt2	38	38	37	36	35
Jeff	attempt3	33	32	31	29	34
Jeff	attempt4	29	27	26	25	27

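I would do the same for Jeff, using his attempt1 as the reference for his other attempts. Keep in mind that there will be many more columns, and more people involved than just these two.

What is the best way to do this calculation?

In case it helps, I tried to add some starter code. I'm not good at this, sorry.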

row1 <- c("Name", "Reference", "Lap1", "Lap2", "Lap3", "Lap4", "Lap5")
row2 <- c("Craig", "attempt1", 34, 21, 33, 21, 32)
row3 <- c("Craig", "attempt2", 29, 28, 29, 30, 29)
row4 <- c("Craig", "attempt3", 25, 25, 24, 21, 26)
row5 <- c("Craig", "attempt4", 20, 21, 21, 22, 24)
row6 <- c("Jeff", "attempt1", 43, 41, 44, 40, 41)
row7 <- c("Jeff", "attempt2", 38, 38, 37, 36, 35)
row8 <- c("Jeff", "attempt3", 33, 32, 31, 29, 34)
row9 <- c("Jeff", "attempt4", 29, 27, 26, 25, 27)
df <- t(data.frame(row1, row2, row3, row4, row5, row6, row7, row8, row9))

Answer 1

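Here's one approach, using dplyr::group_by to do the calculation separately for each Name and dplyr::across to apply it to every column whose name starts with "lap" (starts_with ignores case by default, so this matches Lap1 through Lap5). The funny bit at the end, ~log(.x/first(.x)), means that for each column we specify, we apply a formula that takes the values (.x), divides them by the first value in the group (first(.x)), and then takes the log of that ratio.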

library(dplyr)
df %>%
  group_by(Name) %>%
  mutate(across(starts_with("lap"), ~log(.x/first(.x)))) %>%
  ungroup()

Or, if your data isn't already sorted so that attempt1 comes first within each Name, you can swap in this line:

...
mutate(across(starts_with("lap"), ~log(.x/.x[Reference == "attempt1"]))) %>%
...
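
To illustrate the difference between the two variants, here is a minimal sketch on a hypothetical toy data frame (toy, by_first and by_label are names made up for this example) where attempt1 is deliberately not the first row. It assumes each Name has exactly one attempt1 row.

```r
library(dplyr)

# Hypothetical toy data: one runner, with attempt1 deliberately in the second row.
toy <- data.frame(
  Name = c("Craig", "Craig", "Craig"),
  Reference = c("attempt2", "attempt1", "attempt3"),
  Lap1 = c(29, 34, 25)
)

# Baseline taken from whichever row happens to come first in each group.
by_first <- toy %>%
  group_by(Name) %>%
  mutate(across(starts_with("lap"), ~log(.x / first(.x)))) %>%
  ungroup()

# Baseline taken from the row labelled attempt1, regardless of row order.
by_label <- toy %>%
  group_by(Name) %>%
  mutate(across(starts_with("lap"), ~log(.x / .x[Reference == "attempt1"]))) %>%
  ungroup()

by_first$Lap1  # ratios relative to 29, the first row's value
by_label$Lap1  # ratios relative to 34, the attempt1 value
```

With the label-based version, the attempt1 row comes out as 0 wherever it sits in the group.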

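Or, if the columns you want to operate on have other names but you know which column positions they are (or aren't), you can select them by position. Keep in mind that across() does not select grouping columns, so after group_by(Name) it is worth checking that the positions pick the columns you expect: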

mutate(across(-(1:2), ~log(.x/first(.x)))) %>%

Result

# A tibble: 8 × 7
  Name  Reference   Lap1    Lap2   Lap3    Lap4    Lap5
  <chr> <chr>      <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
1 Craig attempt1   0      0       0      0       0     
2 Craig attempt2  -0.159  0.288  -0.129  0.357  -0.0984  # -0.159 = ln(29/34)
3 Craig attempt3  -0.307  0.174  -0.318  0      -0.208 
4 Craig attempt4  -0.531  0      -0.452  0.0465 -0.288 
5 Jeff  attempt1   0      0       0      0       0     
6 Jeff  attempt2  -0.124 -0.0760 -0.173 -0.105  -0.158 
7 Jeff  attempt3  -0.265 -0.248  -0.350 -0.322  -0.187 
8 Jeff  attempt4  -0.394 -0.418  -0.526 -0.470  -0.418
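
As a quick sanity check, one column of the result can be reproduced by hand in base R. This is only a sketch for Craig's Lap1 values (craig_lap1 is a name made up for the example); note that log() in R is the natural logarithm.

```r
# Craig's Lap1 values for attempt1..attempt4, copied from the example data.
craig_lap1 <- c(34, 29, 25, 20)

# Divide every attempt by the first one (the reference), then take the natural log.
check <- round(log(craig_lap1 / craig_lap1[1]), 3)
check
# 0.000 -0.159 -0.307 -0.531
```

These match the Lap1 column for Craig in the tibble above.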

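Your sample data isn't a standard data frame: calling t() on a data frame returns a matrix, and because each row vector mixes names with numbers, every value ends up as text. It will be easier to work with if you specify columns rather than rows.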

df <- data.frame(
  stringsAsFactors = FALSE,
              Name = c("Craig","Craig","Craig",
                       "Craig","Jeff","Jeff","Jeff","Jeff"),
         Reference = c("attempt1","attempt2",
                       "attempt3","attempt4","attempt1","attempt2","attempt3",
                       "attempt4"),
              Lap1 = c(34L, 29L, 25L, 20L, 43L, 38L, 33L, 29L),
              Lap2 = c(21L, 28L, 25L, 21L, 41L, 38L, 32L, 27L),
              Lap3 = c(33L, 29L, 24L, 21L, 44L, 37L, 31L, 26L),
              Lap4 = c(21L, 30L, 21L, 22L, 40L, 36L, 29L, 25L),
              Lap5 = c(32L, 29L, 26L, 24L, 41L, 35L, 34L, 27L)
)
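
To see why the row-wise construction from the question is awkward, here is a cut-down sketch of the same pattern with only two columns (m is a name made up for the example). It shows that the result is a character matrix, so the lap values would need converting before any arithmetic.

```r
# Same pattern as the question, reduced to a header row and one data row.
row1 <- c("Name", "Lap1")
row2 <- c("Craig", 34)          # mixing text and numbers coerces 34 to "34"

m <- t(data.frame(row1, row2))  # t() on a data frame returns a matrix

is.matrix(m)     # TRUE: no longer a data frame
is.character(m)  # TRUE: the lap value is stored as text
```

Building the data frame column by column, as above, keeps each Lap column numeric from the start.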

r

data-wrangling