Can data.table's fread() skip second empty row and keep first row headers?
I'm trying to read a CSV where the column headers are in row 1, but row 2 is empty and the data starts in row 3. I've tried the various options below, but I always end up with generic V# column names. Any ideas on how to keep the column headers?
fread("https://s3.amazonaws.com/nyc-tlc/trip+data/green_tripdata_2013-08.csv",
header = F)
fread("https://s3.amazonaws.com/nyc-tlc/trip+data/green_tripdata_2013-08.csv",
skip = 0)
fread("https://s3.amazonaws.com/nyc-tlc/trip+data/green_tripdata_2013-08.csv",
skip = 1)
fread("https://s3.amazonaws.com/nyc-tlc/trip+data/green_tripdata_2013-08.csv",
blank.lines.skip = T)
fread("https://s3.amazonaws.com/nyc-tlc/trip+data/green_tripdata_2013-08.csv",
skip = 0, blank.lines.skip = T)
fread("https://s3.amazonaws.com/nyc-tlc/trip+data/green_tripdata_2013-08.csv",
header = F, skip = 0, blank.lines.skip = T)
url = "https://s3.amazonaws.com/nyc-tlc/trip+data/green_tripdata_2013-08.csv"
df = fread(url, header = F)
headers = names(fread(url, nrows=0))
setnames(df, old=1:length(headers), new = headers)
I noticed there are 20 headers but 22 columns returned, so I assigned the 20 names in headers to the first 20 columns.

As r2evans suggested in the comments, you can avoid the double download/read like this:
url = "https://s3.amazonaws.com/nyc-tlc/trip+data/green_tripdata_2013-08.csv"
# download the file once
tfile = tempfile()
curl::curl_download(url, destfile = tfile)
# read to get headers
headers = names(fread(tfile, nrows=0))
# read to get data
df = fread(tfile, header=F)
# set the names based on `headers`
setnames(df, old=1:length(headers), new = headers)
# remove the file
file.remove(tfile)
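The same two-pass pattern can be checked without downloading anything: write a tiny CSV with a header row, a blank second row, and data from row 3, then read the names and the body separately. This is just a sketch with made-up file contents; it also uses skip = 1 so the header line is not re-read as data:

```r
library(data.table)

# toy file with the same shape as the problem: header, blank line, then data
tfile <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "", "1,2,3", "4,5,6"), tfile)

# first pass: read only the header row to capture the real column names
headers <- names(fread(tfile, nrows = 0))

# second pass: skip the header line, drop the blank line, read the data
# (columns come back with generic V# names at this point)
df <- fread(tfile, header = FALSE, blank.lines.skip = TRUE, skip = 1)

# apply the real names to the first length(headers) columns; if the data
# had extra columns beyond the headers, those would keep their V# names
setnames(df, seq_along(headers), headers)

file.remove(tfile)
```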