Read a CSV with timestamps into R: defining colClasses in read.table

Question

I am trying to read a table (a .CSV file, 120K rows x 21 columns) and assign object classes to its columns with the following:

read.table(file = "G1to21jan2015.csv", 
           header = TRUE, 
           colClasses = c (rep("POSICXct", 6), 
                           rep("numeric", 2), 
                           rep("POSICXct", 2),  
                           "numeric", 
                           NULL, 
                           "numeric", 
                           NULL, 
                           rep("character", 2), 
                           rep("numeric", 5))
)

I get the following error:

Error in read.table(file = "G1to21jan2015.csv", header = TRUE, colClasses = c(rep("POSICXct",  : 
  more columns than column names

I have confirmed that the CSV has 21 columns, so (I believe) it matches what I specified.
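A minimal sketch of why read.table() can see a different number of columns than the file really has, assuming the file is comma-separated with a space inside each timestamp (the one-row sample and its column names start, end and count are hypothetical). read.table() defaults to sep = "", i.e. it splits on whitespace rather than commas, so the header collapses into a single field while the data row splits at the spaces inside the timestamps.

```r
# Hypothetical sample: comma-separated, with a space inside each timestamp.
csv_text <- "start,end,count
1/5/2015 15:00:00,1/5/2015 16:00:00,1559"

# Default sep = "" splits on whitespace: the header becomes one field,
# the data row becomes three, and read.table() stops with an error.
msg <- tryCatch(read.table(text = csv_text, header = TRUE),
                error = function(e) conditionMessage(e))
print(msg)

# Declaring the comma separator lines the data row up with the header.
ok <- read.table(text = csv_text, header = TRUE, sep = ",")
str(ok)
```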

If I remove the second argument, header = TRUE, I get a different error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 1 did not have 19 elements

Note: I am using POSICXct to read data in the format 1/5/2015 15:00:00, i.e. m/d/Y H:M; numeric for data such as 1559; NULL for columns that are empty and that I want to skip; and character for text.
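A quick check in R shows one problem with the colClasses vector itself: c() silently drops NULL, so the vector in the question has 19 entries rather than 21. To skip a column, read.table() expects the character string "NULL" as that column's entry. The sketch below spells the class name as POSIXct; the spelling does not change the count. Even with 21 entries, the m/d/Y timestamps still need converting, which is why the answer reads them as character first.

```r
# colClasses as written in the question (class name spelled POSIXct here).
cls_question <- c(rep("POSIXct", 6), rep("numeric", 2), rep("POSIXct", 2),
                  "numeric", NULL, "numeric", NULL,
                  rep("character", 2), rep("numeric", 5))
length(cls_question)  # 19: c() drops the two NULL values

# Quoting "NULL" keeps a placeholder; read.table() skips those columns.
cls_fixed <- c(rep("POSIXct", 6), rep("numeric", 2), rep("POSIXct", 2),
               "numeric", "NULL", "numeric", "NULL",
               rep("character", 2), rep("numeric", 5))
length(cls_fixed)     # 21
```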

Answer 1

For unconventional date-time formats, you can import the columns as character (step 1) and then convert them with strptime (step 2).

Step 1

df <- read.table(file = "data.csv",
                 header = TRUE,
                 sep = ",",
                 dec = ".",
                 colClasses = "character",
                 comment.char = "")

Step 2

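# %Y is the four-digit year; %S keeps the seconds; store the result as POSIXct
df$v1 <- as.POSIXct(strptime(df$v1, "%m/%d/%Y %H:%M:%S"))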

v1 is the name of the column to convert (in this example, a date-time in the unconventional format 12/13/2014 15:16:17).
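The question has eight timestamp columns, so step 2 has to be repeated for each of them. A minimal sketch, assuming hypothetical column names t_open, t_close and value and a UTC time zone, applies the same strptime() format to several columns at once with lapply() and converts a numeric column with as.numeric():

```r
# Hypothetical data frame standing in for the result of step 1:
# every column was imported as character.
df <- data.frame(
  t_open  = c("12/13/2014 15:16:17", "1/5/2015 15:00:00"),
  t_close = c("12/13/2014 16:00:00", "1/5/2015 15:30:00"),
  value   = c("1559", "1620"),
  stringsAsFactors = FALSE
)

# Names of the columns holding m/d/Y H:M:S timestamps (hypothetical).
time_cols <- c("t_open", "t_close")

# Parse each timestamp column with the same format; tz = "UTC" is an
# assumption that avoids daylight-saving surprises.
df[time_cols] <- lapply(df[time_cols], function(x)
  as.POSIXct(strptime(x, "%m/%d/%Y %H:%M:%S", tz = "UTC")))

# Numeric columns imported as character are converted separately.
df$value <- as.numeric(df$value)
str(df)
```

Wrapping strptime() in as.POSIXct() is a design choice: strptime() returns POSIXlt, a list-based class, while POSIXct stores one number per row and fits more naturally in a data frame column.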

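Note: the sep argument is needed because read.table defaults to sep = "" (any whitespace).
read.csv does not need the sep argument, since its default is ",".
Setting comment.char = "" (where possible) can shorten reading time.
Useful information: http://cran.r-project.org/doc/manuals/r-release/R-data.pdf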
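The read.csv note can be sketched as follows, using a small inline sample with hypothetical column names in place of the real file. read.csv() is read.table() with different defaults, among them header = TRUE, sep = "," and comment.char = "", so step 1 only needs colClasses:

```r
# Hypothetical inline sample standing in for the real CSV file.
csv_text <- "start,count,label
1/5/2015 15:00:00,1559,a
12/13/2014 15:16:17,1620,b"

# read.csv() already defaults to header = TRUE, sep = "," and
# comment.char = "", so only colClasses is passed here.
df <- read.csv(text = csv_text, colClasses = "character")
str(df)
```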

csv

r

read.table