R read.table with multiple words in a column
I have a log file of this type to process in R:
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Start entimICE Application Command Line Parameters ******
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Config-File: E:/Program Files (x86)/conf/storages.dsconfig
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Datasource: datasource
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Application: App
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Ignore : false
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Plugin: com.plug
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Logging: E:/Program Files (x86)/conf/log4j.properties
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** End Application Command Line Parameters ******
2015-11-23 11:51:02,129 INFO BaseRuntime - Runtime created in mode: RichClient
I tried to load it into a data frame with read.table, but it puts every word in its own column. I want a data frame with 5 columns:
date time type element text
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Start entimICE Application Command Line Parameters ******
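The problem is that my field separator is a space, which is also the separator between the words of the message, and I don't want those words split into different fields.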
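To see the problem concretely, `utils::count.fields()` reports how many whitespace-separated fields each line has. A minimal sketch on three of the sample lines (picked for illustration) shows that the count depends on the length of the message, so read.table cannot keep the whole message in one column:

```r
# Three lines copied from the log above
lines <- c(
  "2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Start entimICE Application Command Line Parameters ******",
  "2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Ignore : false",
  "2015-11-23 11:51:02,129 INFO BaseRuntime - Runtime created in mode: RichClient"
)

# With the default sep = "", every run of whitespace separates fields,
# so each word of the message becomes its own field.
con <- textConnection(lines)
n <- utils::count.fields(con)
close(con)
n  # 13 9 10
```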
Is this possible with read.table or scan, or should I write my own function?
Thanks,
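One base-R alternative (not the method from the duplicate question) is to read whole lines and split each one with a regular expression: four space-free fields, then the rest of the line as `text`. A minimal sketch with `utils::strcapture()` (available since R 3.4.0), assuming date, time, type and element never contain spaces; as in the desired output above, the `-` separator stays at the start of `text`:

```r
# Sample lines copied from the question; for a real file you would use
# readLines() on your log's path instead.
lines <- c(
  "2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Ignore : false",
  "2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Plugin: com.plug",
  "2015-11-23 11:51:02,129 INFO BaseRuntime - Runtime created in mode: RichClient"
)

# Four whitespace-free fields, then everything else on the line goes into `text`
pattern <- "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)$"

# proto fixes the names and types of the five result columns
proto <- data.frame(date = character(), time = character(), type = character(),
                    element = character(), text = character(),
                    stringsAsFactors = FALSE)

log_df <- utils::strcapture(pattern, lines, proto, perl = TRUE)
log_df
```

Lines that do not match the pattern (a multi-line stack trace, for example) come back as rows of `NA`, which makes them easy to spot and filter.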
@ma33kael Have you tried the solution from the duplicate question? It works as expected:
library(readr)
# widths: date (10), time (13), level (6); NA = last column runs to end of line
a <- read_fwf(I(text), fwf_widths(c(10, 13, 6, NA)))
This gives:
X1 X2 X3 X4
1 2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Start entimICE Application Command Line Parameters ******
2 2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Config-File: E:/Program Files (x86)/conf/storages.dsconfig
3 2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Datasource: datasource
4 2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Application: App
5 2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Ignore : false
6 2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Plugin: com.plug
7 2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Logging: E:/Program Files (x86)/conf/log4j.properties
8 2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** End Application Command Line Parameters ******
9 2015-11-23 11:51:02,129 INFO BaseRuntime - Runtime created in mode: RichClient
Data:
text <- "2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Start entimICE Application Command Line Parameters ******
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Config-File: E:/Program Files (x86)/conf/storages.dsconfig
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Datasource: datasource
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Application: App
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Ignore : false
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Plugin: com.plug
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** Logging: E:/Program Files (x86)/conf/log4j.properties
2015-11-23 11:51:02,082 INFO FrameworkApplication - ****** End Application Command Line Parameters ******
2015-11-23 11:51:02,129 INFO BaseRuntime - Runtime created in mode: RichClient"
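The fixed-width read yields four columns, with element and text still together in X4. Assuming the element name never contains a space, a second step can split X4 into the two remaining columns of the requested 5-column layout. The data frame below is a hand-built stand-in for two rows of `a`, so the sketch runs without readr:

```r
# Hand-built stand-in for two rows of the read_fwf() result (columns X1..X4)
a <- data.frame(
  X1 = c("2015-11-23", "2015-11-23"),
  X2 = c("11:51:02,082", "11:51:02,129"),
  X3 = c("INFO", "INFO"),
  X4 = c("FrameworkApplication - ****** Datasource: datasource",
         "BaseRuntime - Runtime created in mode: RichClient"),
  stringsAsFactors = FALSE
)

# element = first space-free token of X4; text = everything after it
a$element <- sub("^(\\S+).*$", "\\1", a$X4, perl = TRUE)
a$text    <- sub("^\\S+\\s+", "", a$X4, perl = TRUE)
a$X4 <- NULL
names(a)[1:3] <- c("date", "time", "type")
a
```

The widths `c(10, 13, 6, NA)` also assume the level field is always 6 characters wide including its surrounding spaces; if your log mixes level labels of different lengths without padding, the fixed-width split will misalign.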