Splitting character string in R - Extracting the timestamp
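Thanks in advance for any feedback.
I am trying to clean up some data in R where a timestamp and a text string sit together in the same cell. I am not getting the expected result. I know the regex still needs validation work; for now I am only testing this particular function.
Expected:
"04/05/2018 17:14:35" " -(Additional comments) update"
Actual:
"04/05/2018 17:14:35 -(Additional comments) update"
What I tried: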
string <- "04/05/2018 17:14:35 -(Additional comments) update"
pattern <- "[:digit:][:digit:][:punct:]
[:digit:][:digit:][:punct:]
[:digit:][:digit:][:digit:][:digit:]
[[:space:]]
[:digit:][:digit:]
[:punct:]
[:digit:][:digit:]
[:punct:]
[:digit:][:digit:]"
strsplit(string, pattern)
I also tried this variation, with the same result:
pattern <- "[:digit:][:digit:]\\/
[:digit:][:digit:]\\/
[:digit:][:digit:][:digit:][:digit:]
[[:space:]]
[:digit:][:digit:]
\\:
[:digit:][:digit:]
\\:
[:digit:][:digit:]"
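You can try:
string <- "04/05/2018 17:14:35 -(Additional comments) update"
# Keep only the first capture group (the timestamp)
gsub("(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2}).*","\\1", string)
#[1] "04/05/2018 17:14:35"
#RHS part: keep only the second capture group (everything after the timestamp)
gsub("(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2})(.*)","\\2", string)
#[1] " -(Additional comments) update"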
Regex explanation:
\d{2} - 2 digits
\d{4} - 4 digits
/ - separator
: - separator
() - capture group
.* - followed by anything
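To see how those pieces fit together, here is a minimal sketch that builds the pattern once and reuses it for both halves. It assumes the timestamp always sits at the start of the string, and the variable names (ts_pattern, timestamp, rest) are only illustrative. Note that each backslash has to be doubled inside an R string literal, so \d is typed as "\\d" and a backreference as "\\1".

```r
string <- "04/05/2018 17:14:35 -(Additional comments) update"

# Group 1: dd/dd/dddd dd:dd:dd, group 2: whatever follows.
# Anchoring with ^ and $ is an assumption about the data layout.
ts_pattern <- "^(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2})(.*)$"

timestamp <- sub(ts_pattern, "\\1", string)  # "04/05/2018 17:14:35"
rest      <- sub(ts_pattern, "\\2", string)  # " -(Additional comments) update"
```

sub is enough here because the anchored pattern can match at most once per string.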
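It seems the OP is keen on using strsplit. One option could be:
# Insert a placeholder between the two groups, then split on it
strsplit(gsub("(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2})(.*)",
paste("\\1","####","\\2",sep=""), string), split = "####")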
# [[1]]
# [1] "04/05/2018 17:14:35" " -(Additional comments) update"
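If the "####" placeholder could ever occur in the comment text itself, one alternative is to locate the timestamp with regexpr and cut the string where the match ends. This is only a sketch under the assumption that the timestamp starts the string; the names ts_pattern, m, cut_at and parts are made up for the example. It also writes the POSIX class as [[:digit:]]: the class name has to appear inside a bracket expression, which is what the pattern in the question was missing.

```r
string <- "04/05/2018 17:14:35 -(Additional comments) update"

# POSIX classes are only valid inside a bracket expression: [[:digit:]]
ts_pattern <- "^[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4} [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}"

m <- regexpr(ts_pattern, string)
if (m == -1) stop("no timestamp found at the start of the string")

# First character after the match
cut_at <- as.integer(m) + attr(m, "match.length")

timestamp <- regmatches(string, m)     # "04/05/2018 17:14:35"
rest      <- substring(string, cut_at) # " -(Additional comments) update"
parts     <- c(timestamp, rest)
```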
Try this:
sub('-.*','',string)
[1] "04/05/2018 17:14:35 "
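The result above keeps the space that preceded the hyphen. A small sketch of one way to drop it, assuming the first hyphen in the string always marks the start of the comment (this would break for dates written with hyphens); the variable name timestamp is illustrative:

```r
string <- "04/05/2018 17:14:35 -(Additional comments) update"

# Remove everything from the first hyphen onward, then trim the leftover space
timestamp <- trimws(sub("-.*", "", string))  # "04/05/2018 17:14:35"
```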