Dealing with a dataset that has the separator value inside text in parentheses in R

Question

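I have run into a simple problem with a dataset of written text, the kind you see a lot on social media, where people sensibly use commas in their writing. The whole text is in column 1 of the dataset, followed by a date column, and so on. The data is in .xls format, comma-separated, with each cell placed in parentheses. It looks like this: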

"Come and get around, we have ice cream!", "2021-02-02", "lorem ipsum"

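Using the comma as the separator produces an extra column.

I used the normal read table function, and I can't figure out whether I need a regular expression or where I should put it.

Thanks for any tips!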

Edit:

Here is a sample of the dataset and the simple code I ran.

These are the first two rows of the original xls:

"Text","Time of posting","Reach","Comments"
"Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur?","2020-11-15T18:23:32","28360","5689"

The RStudio import tool for xls has no delimiter option, so I used read.table and got the same dataset from a .csv, with the following code:

                 header = TRUE,
                 sep = ',',
                 skip = 5)

This resulted in every single comma generating a new column, when what I actually want is for only the commas outside the parentheses to generate new columns.

Answer 1

If your data is in Excel format

You can use the read_excel() function from the readxl package, which by default treats everything inside the parentheses as a string.

library(readxl)
read_excel("C:/Users/User/Google Drive/Trading/Test.xls") # do not use the sep argument

# A tibble: 4 x 4
  A             B                   C     D       
  <chr>         <chr>               <chr> <chr>   
1 awsdf         (Alternativa, hoje) Tod   XLLLsss 
2 hoj           as                  aqwe  was     
3 hey           hello               world hurry up
4 (trust, code) check               hoj   hun

You can then use the gsub function to remove the commas from the text.
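A minimal sketch of that gsub step, assuming the text has already been read into a character vector. The vector `text` and the result name `no_commas` are made up for illustration and reuse strings that appear in this thread.

```r
# Hypothetical text values containing commas (taken from examples in this thread)
text <- c("(Alternativa, hoje)", "Come and get around, we have ice cream!")

# fixed = TRUE treats "," as a literal string rather than a regular expression
no_commas <- gsub(",", "", text, fixed = TRUE)

print(no_commas)
```

On a data frame, the same call could be applied to a single column, for example the first one, and assigned back to it.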

If your data is in .csv format, you need to use read.csv instead of read.table, and do not specify the sep argument.

read.table("C:/Users/User/Google Drive/Trading/Test.csv")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 2 did not have 2 elements
read.csv("C:/Users/User/Google Drive/Trading/Test.csv")
              A                   B     C        D
1         awsdf (Alternativa, hoje)   Tod  XLLLsss
2           hoj                  as  aqwe      was
3           hey               hello world hurry up
4 (trust, code)               check   hoj      hun
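To check this against the data in the question, here is a small sketch that feeds the two sample rows to read.csv through its `text` argument instead of a file. read.csv defaults to sep = "," and quote = "\"", so commas inside double-quoted fields should stay within one field. The names `sample_lines` and `df` are assumptions made for this example, and the `skip = 5` from the question is left out because the sample has no leading lines to skip.

```r
# The two sample rows from the question, supplied inline instead of from a file
sample_lines <- c(
  '"Text","Time of posting","Reach","Comments"',
  paste0(
    '"Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis ',
    'suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur?",',
    '"2020-11-15T18:23:32","28360","5689"'
  )
)

# Rely on the read.csv defaults (sep = ",", quote = "\"") and pass no sep argument
df <- read.csv(text = sample_lines, stringsAsFactors = FALSE)

print(ncol(df))   # number of columns parsed
print(names(df))  # header names after R makes them syntactically valid
```

If the column count matches the header, the commas inside the quoted text were not treated as separators.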

r

data-wrangling