How to read in timestamps of format %Y-%m-%d %H:%M:%OS3 (and do math with it)?
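I have a .txt file (without any explicit separators between columns) in which every line contains a timestamp of format %Y-%m-%d %H:%M:%OS3 (e.g. "2019-09-26 07:29:22,778") and an event string.
I would like to read in the data and build a table that shows the full timestamp in one column, the event in a second column and, in a third column, the time span in OS3 format (e.g. "1.230" or "1,230" seconds) between the event in line 1 and the event in line 2, then between the event in line 1 and the event in line 3, and so on.
I tried opening the file in Excel with "[" as the separator, saving it as .tsv and reading that in, which is an unsatisfying workaround. Moreover, using difftime inside a dplyr pipeline afterwards does not give results that include milliseconds, even though the global option is set to 3-digit seconds ("options(digits.secs=3)").
What the .txt looks like: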
2019-09-26 17:54:24,406 [218] INFO - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO - [1] - Event Z
...
.
.
What I would like to get:
timestamp                 event     timediff in sec
2019-09-26 17:54:24,406   Event X
2019-09-26 17:54:24,431   Event Y   0.025
2019-09-26 17:54:24,438   Event Z   0.032
...
.
.
Here you go:
df <- data.table::fread(text = "2019-09-26 17:54:24,406 [218] INFO - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO - [1] - Event Z", sep = "[", header = FALSE) # [ seems most convenient to use as sep
colnames(df) <- c("timestamp", "garbage", "event")
df
#> timestamp garbage event
#> 1: 2019-09-26 17:54:24,406 218] INFO - 1] - Event X
#> 2: 2019-09-26 17:54:24,431 207] INFO - 1] - Event Y
#> 3: 2019-09-26 17:54:24,438 218] INFO - 1] - Event Z
library(dplyr)
library(stringr)
df_clean <- df %>%
  select(-garbage) %>%
  mutate(timestamp = str_replace(timestamp, ",", ".")) %>% # comma must be replaced so milliseconds are recognised
  mutate(timestamp = as.POSIXct(timestamp, format = "%Y-%m-%d %H:%M:%OS"),
         event = str_extract(event, "Event.*"),
         start_time = min(timestamp), # adding the first timestamp as new column, could be removed later
         "timediff in sec" = as.numeric(timestamp - start_time, units = "secs")) # this converts difftime to numeric
df_clean
#> timestamp event start_time timediff in sec
#> 1 2019-09-26 17:54:24 Event X 2019-09-26 17:54:24 0.00000000
#> 2 2019-09-26 17:54:24 Event Y 2019-09-26 17:54:24 0.02500010
#> 3 2019-09-26 17:54:24 Event Z 2019-09-26 17:54:24 0.03200006
Created on 2019-10-10 by the reprex package (v0.3.0)
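The printed output above hides the milliseconds and shows differences such as 0.02500010 instead of 0.025. Both are display and floating-point effects, not lost data. The following is a minimal sketch of one way to tidy this up, assuming the timestamps have already had their comma replaced by a dot. The variable names `ts`, `elapsed` and `shown` are made up for illustration, and the half-millisecond offset is a common workaround rather than anything the answer itself uses.

```r
# First two timestamps from the sample log, comma already replaced by a dot
ts <- as.POSIXct(c("2019-09-26 17:54:24.406", "2019-09-26 17:54:24.431"),
                 format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Seconds elapsed since the first row, rounded to whole milliseconds
elapsed <- round(as.numeric(difftime(ts, ts[1], units = "secs")), 3)
elapsed

# Format with three fractional digits. %OS3 truncates rather than rounds,
# so half a millisecond is added first to avoid .406 being shown as .405
shown <- format(ts + 0.0005, "%Y-%m-%d %H:%M:%OS3")
shown
```

Rounding only at the end keeps the full precision in the POSIXct column, so later arithmetic is unaffected.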
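You can use [ as the separator and read the txt file with read.delim. The problem with the 3 digits is caused by the comma being used as the decimal separator instead of a dot. This can be fixed with str_replace (or gsub).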
library(dplyr)
library(stringr)
my_df <- read.delim(text = "
2019-09-26 17:54:24,406 [218] INFO - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO - [1] - Event Z",
sep = "[", header = FALSE, col.names = c("timestamp", "info", "event"))
my_df
# timestamp info event
# 1 2019-09-26 17:54:24,406 218] INFO - 1] - Event X
# 2 2019-09-26 17:54:24,431 207] INFO - 1] - Event Y
# 3 2019-09-26 17:54:24,438 218] INFO - 1] - Event Z
my_df %>%
  # drop the info column
  select(-info) %>%
  mutate(# remove anything not related to the Event
         event = str_remove(event, ".*Event"),
         # replace , with .
         timestamp = str_replace_all(timestamp, ",", "."),
         # transform to a proper timestamp
         timestamp = as.POSIXct(timestamp, format = "%Y-%m-%d %H:%M:%OS"),
         # calculate difftime (as proposed in your previous question [1])
         difftime = difftime(timestamp, timestamp[1], units = "secs"))
# timestamp event difftime
# 1 2019-09-26 17:54:24.405 X 0.00000000 secs
# 2 2019-09-26 17:54:24.430 Y 0.02500010 secs
# 3 2019-09-26 17:54:24.437 Z 0.03200006 secs
[1]
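For completeness, the same idea can be sketched in base R only, without relying on "[" as a separator. This is an illustrative variant, not what either answer proposes. It assumes every line starts with a 23-character timestamp and contains the word "Event", as in the sample. The names `lines`, `log_df` and `since_first` are hypothetical, and in practice `lines` would come from `readLines()` on the actual file.

```r
# Sample lines copied from the question
lines <- c(
  "2019-09-26 17:54:24,406 [218] INFO - [1] - Event X",
  "2019-09-26 17:54:24,431 [207] INFO - [1] - Event Y",
  "2019-09-26 17:54:24,438 [218] INFO - [1] - Event Z"
)

# The timestamp is the first 23 characters; swap the decimal comma for a dot
stamp <- sub(",", ".", substr(lines, 1, 23), fixed = TRUE)
timestamp <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Keep everything from "Event" to the end of the line
event <- sub("^.*(Event.*)$", "\\1", lines)

log_df <- data.frame(
  timestamp = timestamp,
  event = event,
  # seconds since the first event, rounded to milliseconds
  since_first = round(as.numeric(difftime(timestamp, timestamp[1],
                                          units = "secs")), 3),
  stringsAsFactors = FALSE
)
log_df
```

Slicing by position is brittle if the log format ever changes, so a regular expression for the timestamp may be the safer choice for real files.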