read_tsv returns a 1-column df, expected many columns
I am trying to read a tsv file into R. Using RStudio's View File utility, my raw file looks like this:
nzid | converted | logins_cnt | shootypes_cnt | galleries_cnt | photos_cnt | favorite_images_cnt | image_downloaded_cnt | gallery_visitors_cnt | storage_used | shared_gallery_cnt | password_set | site_created | site_published | pricelist_created | used_desktop | custom_domain | added_watermark | added_galley | added_logo | added_social_link
--------------------------------------+------------+------------+---------------+---------------+------------+---------------------+----------------------+----------------------+---------------------+--------------------+--------------+--------------+----------------+-------------------+--------------+---------------+-----------------+--------------+------------+-------------------
abc123 | | 0 | 4 | 0 | 31 | 0.000000 | 0.000000 | 4.000000 | 278895839.000000 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0
jhgfdfghj543454 | | 1 | 9 | 0 | 140 | 2.000000 | 1127.000000 | 137.000000 | 1077768195.000000 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
ijhgfdrfgh765456 | | 0 | 4 | 0 | 30 | 0 | 0 | 0 | 278796703.000000 | 0 | 1 |
What I tried:
rawd <- read_tsv('training-data.tsv')
This runs, but:
rawd %>% glimpse
Rows: 10,173
Columns: 1
$ `nzid | converted | logins_cnt | shootypes_cnt | galleries_cnt | photos_cnt | favorite_images_cnt | image_downloaded_cnt | gallery_visitors_cnt | storage_used | shared_gallery_cnt | password_set | site_created | site_published | pricelist_created | used_desktop | custom_domain | added_watermark | added_galley | added_logo | added_social_link` <chr> …
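Everything ends up in a single column.
Looking at the raw tsv file, the fields appear to be separated by pipes rather than tabs. So I tried: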
rawd <- read_tsv('training-data.tsv', delim = '|')
Error in read_tsv("training-data.tsv", delim = "|") :
unused argument (delim = "|")
This is unexpected, because delim is listed as an argument on the help page ?read_tsv.
How can I read this 'tsv' file into R, assuming it really is a tsv file?
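One quick way to check which separator the file really uses is to count candidate delimiters in the header line. The snippet below is a minimal sketch in base R; the shortened sample header and the helper name count_char are made up for illustration, and in practice the line would come from readLines('training-data.tsv', n = 1).

```r
# Hypothetical, shortened header line standing in for
# readLines('training-data.tsv', n = 1).
first_line <- " nzid | converted | logins_cnt | shootypes_cnt"

# Count literal occurrences of a single character in a string.
# gregexpr() returns -1 when there is no match, so only positive
# match positions are counted.
count_char <- function(x, ch) {
  hits <- gregexpr(ch, x, fixed = TRUE)[[1]]
  sum(hits > 0)
}

n_tabs  <- count_char(first_line, "\t")
n_pipes <- count_char(first_line, "|")

# Many pipes and no tabs suggests the file is pipe-delimited
# despite its .tsv extension.
print(c(tabs = n_tabs, pipes = n_pipes))
```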
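Using the data in the Note at the end, read the lines, drop the dashed second line, and parse the rest as pipe-separated text: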
L <- readLines('training-data.tsv')
DF <- read.table(text = L[-2], sep = "|", strip.white = TRUE,
header = TRUE, fill = TRUE)
str(DF)
giving:
'data.frame': 3 obs. of 21 variables:
$ nzid : chr "abc123" "jhgfdfghj543454" "ijhgfdrfgh765456"
$ converted : logi NA NA NA
$ logins_cnt : int 0 1 0
$ shootypes_cnt : int 4 9 4
$ galleries_cnt : int 0 0 0
$ photos_cnt : int 31 140 30
$ favorite_images_cnt : num 0 2 0
$ image_downloaded_cnt: num 0 1127 0
$ gallery_visitors_cnt: num 4 137 0
$ storage_used : num 2.79e+08 1.08e+09 2.79e+08
$ shared_gallery_cnt : int 0 1 0
$ password_set : int 1 1 1
$ site_created : int 0 0 NA
$ site_published : int 0 0 NA
$ pricelist_created : int 0 0 NA
$ used_desktop : int 1 0 NA
$ custom_domain : int 0 0 NA
$ added_watermark : int 0 0 NA
$ added_galley : int 1 1 NA
$ added_logo : int 0 0 NA
$ added_social_link : int 0 0 NA
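If staying within readr is preferred, the same idea can probably be expressed with read_delim, which shares a help page with read_tsv and is the function that actually accepts delim (read_tsv fixes the delimiter to a tab). This is only a sketch: it assumes readr 2.0.0 or later (for I() and show_col_types), and the three-column sample file written to a temporary path is invented for illustration.

```r
library(readr)

# Hypothetical miniature of the file, written to a temporary path.
sample_path <- tempfile(fileext = ".tsv")
writeLines(c(
  " nzid   | converted | logins_cnt",
  "--------+-----------+-----------",
  " abc123 |           | 0",
  " xyz789 |           | 1"
), sample_path)

# Drop the dashed separator line (line 2), as in the read.table approach.
data_lines <- readLines(sample_path)[-2]

# I() marks the string as literal data rather than a file path;
# trim_ws strips the padding around each field.
rawd <- read_delim(
  I(paste(data_lines, collapse = "\n")),
  delim = "|",
  trim_ws = TRUE,
  show_col_types = FALSE
)

# Defensive cleanup in case any padding survives in the column names.
names(rawd) <- trimws(names(rawd))

str(rawd)
```

For the real file, sample_path would be replaced by 'training-data.tsv'.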
Note
Lines <- " nzid | converted | logins_cnt | shootypes_cnt | galleries_cnt | photos_cnt | favorite_images_cnt | image_downloaded_cnt | gallery_visitors_cnt | storage_used | shared_gallery_cnt | password_set | site_created | site_published | pricelist_created | used_desktop | custom_domain | added_watermark | added_galley | added_logo | added_social_link
--------------------------------------+------------+------------+---------------+---------------+------------+---------------------+----------------------+----------------------+---------------------+--------------------+--------------+--------------+----------------+-------------------+--------------+---------------+-----------------+--------------+------------+-------------------
abc123 | | 0 | 4 | 0 | 31 | 0.000000 | 0.000000 | 4.000000 | 278895839.000000 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0
jhgfdfghj543454 | | 1 | 9 | 0 | 140 | 2.000000 | 1127.000000 | 137.000000 | 1077768195.000000 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
ijhgfdrfgh765456 | | 0 | 4 | 0 | 30 | 0 | 0 | 0 | 278796703.000000 | 0 | 1 | "
writeLines(Lines, "training-data.tsv")