Reading a CSV file whose cells contain several numbers

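I have data like this. For the Male and Female columns, I only need the first line of each cell (the highlighted line). How can I read only those numbers and exclude the rest?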

df <- structure(list(
  col1 = c("First", "Frequency\nPercent", "CA", "TX"),
  col2 = c("Sex_3585", "Male", "298026\n5\n9", "45678\n15\n89"),
  col3 = c("", "Female", "57039\n10\n25", "64290\n100\n258")
),
class = "data.frame",
row.names = c(NA,-4L))

                col1          col2            col3
1              First      Sex_3585                
2 Frequency\nPercent          Male          Female
3                 CA  298026\n5\n9   57039\n10\n25
4                 TX 45678\n15\n89 64290\n100\n258

First, I created a simple example of the data, which is the data frame shown above.

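Second, after reading the file with read.csv, one option is to split every cell that contains line breaks (i.e. \n) into separate rows. We can then group by the first column and keep only the first row of each group.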

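library(tidyverse)

df %>% 
  # split each cell on "\n" so every embedded line becomes its own row
  separate_rows(everything(), sep = "\n") %>% 
  # rows that came from the same original row share the same col1 value
  group_by(col1) %>% 
  # keep only the first row of each group, i.e. the first line of each cell
  filter(row_number()==1)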

Output

  col1      col2     col3    
  <chr>     <chr>    <chr>   
1 First     Sex_3585 ""      
2 Frequency Male     "Female"
3 Percent   Male     "Female"
4 CA        298026   "57039" 
5 TX        45678    "64290"
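
If the goal is only the Male and Female counts as numbers, the same idea can be sketched in base R without the tidyverse. This is a minimal sketch, not the only way to do it. It assumes the layout of the example data: two header rows followed by the data rows, with the count always on the first line of each cell. The helper `first_line` and the column names `state`, `male` and `female` are hypothetical names chosen for illustration.

```r
# Same example data as above.
df <- data.frame(
  col1 = c("First", "Frequency\nPercent", "CA", "TX"),
  col2 = c("Sex_3585", "Male", "298026\n5\n9", "45678\n15\n89"),
  col3 = c("", "Female", "57039\n10\n25", "64290\n100\n258"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the text before the first line break in each cell.
first_line <- function(x) {
  pieces <- strsplit(x, "\n", fixed = TRUE)
  # strsplit() returns character(0) for an empty string, so fall back to ""
  vapply(pieces, function(p) if (length(p) > 0) p[1] else "", character(1))
}

# Apply the helper to every column, keeping the data frame shape.
cleaned <- df
cleaned[] <- lapply(df, first_line)

# Assumption: the first two rows are header text, the rest are data rows.
data_rows <- cleaned[-(1:2), ]

counts <- data.frame(
  state  = data_rows$col1,
  male   = as.numeric(data_rows$col2),
  female = as.numeric(data_rows$col3),
  stringsAsFactors = FALSE
)

print(counts)
```

Unlike the grouped approach, this keeps one row per original row and drops the two header rows, so the counts can be used directly in calculations. If the real file has a different number of header rows, the `-(1:2)` index would need to change.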