Writing functions to reference specific columns

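I regularly have to pull different datasets from the same API for different purposes, so I end up writing code for many different pulls. I'd like to create some functions to help with this, but I need some help.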

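I haven't been able to figure out how to set up the function so that I can change the dataset but still pull data from the same columns each time. In this example, I have four timestamp columns that represent different things (the data is made up). I need to convert their time zone to my local time zone. The column names will stay the same across all my datasets, but the name of the dataset will change. There are several places in my code where I need to do this kind of thing, but I haven't been able to work it out, so any advice is much appreciated!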

The second part of this example code is not in my actual code; it is only there to set the data up correctly. The data comes from the API in GMT.

df <- data.frame(col_1 = c(1, 2, 3, 4), 
                 time_1 = c("2021-01-20 23:58:21", "2021-01-20 21:21:00", "2021-01-20 17:14:04", "2021-01-20 01:05:18"),
                 time_2 = c("2021-01-19 23:58:21", "2021-01-19 21:21:00", "2021-01-19 17:14:04", "2021-01-19 01:05:18"),
                 time_3 = c("2021-01-18 23:46:21", "2021-01-18 36:21:00", "2021-01-18 15:14:04", "2021-01-18 01:05:18"),
                 time_4 = c("2021-01-17 23:58:21", "2021-01-17 20:21:00", "2021-01-17 18:14:04", "2021-01-17 02:05:18"))

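# Not part of actual code: parse the strings as GMT date-times.
# Note: "2021-01-18 36:21:00" in time_3 is not a valid time, so as.POSIXlt()
# falls back to a date-only format for that whole column and drops its times.
df$time_1 <- as.POSIXlt(df$time_1, tz = "GMT")
df$time_2 <- as.POSIXlt(df$time_2, tz = "GMT")
df$time_3 <- as.POSIXlt(df$time_3, tz = "GMT")
df$time_4 <- as.POSIXlt(df$time_4, tz = "GMT")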

# What I want it to do
# df$time_1 <- lubridate::with_tz(df$time_1, tz = "America/Los_Angeles")
# df$time_2 <- lubridate::with_tz(df$time_2, tz = "America/Los_Angeles")
# df$time_3 <- lubridate::with_tz(df$time_3, tz = "America/Los_Angeles")
# df$time_4 <- lubridate::with_tz(df$time_4, tz = "America/Los_Angeles")

# Attempted function
timezone_cleanup <- function(my_df){
  my_df$time_1 <- lubridate::with_tz(my_df$time_1, tz = "America/Los_Angeles")
  my_df$time_2 <- lubridate::with_tz(my_df$time_2, tz = "America/Los_Angeles")
  my_df$time_3 <- lubridate::with_tz(my_df$time_3, tz = "America/Los_Angeles")
  my_df$time_4 <- lubridate::with_tz(my_df$time_4, tz = "America/Los_Angeles")
}

# how I'd like to use this function.  Not working now.  Even if I wrap it with data.frame(), it's not what I wanted.
new_df <- timezone_cleanup(df)

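The last line of your function is an assignment, so the function returns (invisibly) only the converted time_4 vector, not the data frame. You need to return my_df at the end of the function to get the modified data frame back. That said, you can use lapply or across to apply the same function to several columns instead of repeating the line for each one.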

library(dplyr)

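# Apply lubridate::with_tz() to every column whose name starts with "time".
# with_tz()'s argument is named tzone; passing extra arguments through
# across()'s ... is deprecated since dplyr 1.1.0, so a lambda is used instead.
timezone_cleanup <- function(my_df){
  my_df %>%
    mutate(across(starts_with("time"),
                  ~ lubridate::with_tz(.x, tzone = "America/Los_Angeles")))
}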

new_df <- timezone_cleanup(df)
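For comparison, here is a base-R sketch of the lapply option, which also fixes the missing return. The function name timezone_cleanup_base and its cols and tzone parameters are hypothetical, added only for illustration; cols defaults to the column names used in this question, and the small example data frame is made up.

```r
library(lubridate)

# Hypothetical helper: convert the named date-time columns of any data frame
# to another time zone. `cols` and `tzone` are illustrative parameters.
timezone_cleanup_base <- function(my_df,
                                  cols = c("time_1", "time_2", "time_3", "time_4"),
                                  tzone = "America/Los_Angeles") {
  # lapply() returns a list of converted columns; assigning it back with
  # `[<-` replaces exactly those columns
  my_df[cols] <- lapply(my_df[cols], with_tz, tzone = tzone)
  # Return the data frame explicitly; otherwise the function would return
  # the value of the last assignment instead
  my_df
}

# Made-up example data, stored in GMT
example_df <- data.frame(
  id     = 1:2,
  time_1 = as.POSIXct(c("2021-01-20 23:58:21", "2021-01-20 21:21:00"), tz = "GMT"),
  time_2 = as.POSIXct(c("2021-01-19 23:58:21", "2021-01-19 21:21:00"), tz = "GMT")
)

new_df <- timezone_cleanup_base(example_df, cols = c("time_1", "time_2"))
attr(new_df$time_1, "tzone")  # "America/Los_Angeles"
```

with_tz() only changes how the instants are displayed; the underlying times are unchanged, which is what you want when the API delivers GMT.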

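By the way, I did get the warning message Unrecognized time zone 'America/Los_Angeles' when running this. That is a standard IANA time zone name, so the warning may come from the time zone database on my system rather than from your value, but it is worth double-checking the tz value you use.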
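A quick way to check whether your R installation recognizes a zone name is to look it up in OlsonNames(), the list of time zone names R can find on the system. Sys.timezone() reports the zone R thinks the machine is in. A minimal sketch:

```r
# TRUE if the zone name is in the time zone database R is using
tz_known <- "America/Los_Angeles" %in% OlsonNames()
tz_known

# The machine's local zone as R sees it (may be NA on some systems)
Sys.timezone()
```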