Parsing by delimiter in sqldf

Question

I have a data frame that looks like this:

                 Col1     Col2
123,bnh12,1242,mdmdmd        8
0923,3mdn42,76,ieieie       10

How can I split Col1 on the comma and get the expected output shown below using sqldf?

                 Col1     Col2    NewCol    NewCol2   
123,bnh12,1242,mdmdmd        8       123       1242
0923,3mdn42,76,ieieie       10      0923         76

I was able to get the first number for NewCol, but I can't figure out NewCol2:

df1 <- sqldf("SELECT *, SUBSTR([Col1], 1, INSTR([Col1],',')-1) [NewCol] FROM df")

Answer 1

library(sqldf)

# 1. First field: everything before the first comma
df <- sqldf("SELECT *, SUBSTR([Col1], 1, INSTR([Col1],',')-1) [NewCol] FROM df")

# 2. Remove the first field (replace() removes every occurrence of it), then the leading comma
df <- sqldf("SELECT *, replace([Col1], [NewCol], '') [Removal of NewCol] from df")
df <- sqldf("select *, substr([Removal of NewCol], 2) as [Removal of NewCol without comma] from df")

# 3. Second field, which is not needed in the final output
df <- sqldf("SELECT *, SUBSTR([Removal of NewCol without comma], 1, INSTR([Removal of NewCol without comma],',')-1) [Middle_UnImportant] FROM df")

# 4. Remove the second field, then the leading comma
df <- sqldf("SELECT *, replace([Removal of NewCol without comma], [Middle_UnImportant], '') [Anything After] from df")
df <- sqldf("select *, substr([Anything After], 2) as [Anything After without comma] from df")

# 5. Third field: everything before the next comma
df <- sqldf("SELECT *, SUBSTR([Anything After without comma], 1, INSTR([Anything After without comma],',')-1) [NewCol2] FROM df")
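
The chained steps above can probably be collapsed into a single query. The following is only a sketch, assuming the data frame from the question is called df and holds Col1 as character; the intermediate names rest1 and rest2 are made up for illustration. It skips past each comma with substr/instr offsets instead of replace(), so a field value that happens to recur later in the string is left alone.

```r
library(sqldf)

# Data from the question, rebuilt here so the sketch runs on its own
df <- data.frame(
  Col1 = c("123,bnh12,1242,mdmdmd", "0923,3mdn42,76,ieieie"),
  Col2 = c(8, 10),
  stringsAsFactors = FALSE
)

res <- sqldf("
  WITH a AS (
    -- rest1 (hypothetical name): everything after the first comma
    SELECT *, substr(Col1, instr(Col1, ',') + 1) AS rest1 FROM df
  ),
  b AS (
    -- rest2 (hypothetical name): everything after the second comma
    SELECT *, substr(rest1, instr(rest1, ',') + 1) AS rest2 FROM a
  )
  SELECT Col1, Col2,
         substr(Col1, 1, instr(Col1, ',') - 1)   AS NewCol,
         substr(rest2, 1, instr(rest2, ',') - 1) AS NewCol2
  FROM b
")

print(res)
```

With the two sample rows, NewCol2 should come out as 1242 and 76.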

Answer 2

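For NewCol1 use the code from the question. For NewCol2 use strFilter to remove every character that is not a comma or a digit. Then trim digits off both ends, then trim commas off both ends. Then left-trim the remaining leading digits, and finally left-trim the comma.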

library(sqldf)

sqldf("select *,
 SUBSTR(Col1, 1, INSTR([Col1], ',') - 1) NewCol1,
 ltrim(ltrim(trim(trim(strFilter(Col1, ',0123456789'), '0123456789'), ','), 
   '0123456789'), ',') NewCol2
 from df")

giving:

                   Col1 Col2 NewCol1 NewCol2
1 123,bnh12,1242,mdmdmd    8     123    1242
2 0923,3mdn42,76,ieieie   10    0923      76
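
To see why the nested trims leave the third field, each stage can be exposed as its own column. This is a sketch under the assumption that df is the data frame from the question; the stage names (kept, no_edge_digits, no_edge_commas, no_second_field) are invented for illustration.

```r
library(sqldf)

# Data from the question, rebuilt here so the sketch runs on its own
df <- data.frame(
  Col1 = c("123,bnh12,1242,mdmdmd", "0923,3mdn42,76,ieieie"),
  Col2 = c(8, 10),
  stringsAsFactors = FALSE
)

steps <- sqldf("
  WITH s1 AS (SELECT Col1,
                     strFilter(Col1, ',0123456789') AS kept            -- '123,12,1242,'
              FROM df),
       s2 AS (SELECT *, trim(kept, '0123456789') AS no_edge_digits     -- ',12,1242,'
              FROM s1),
       s3 AS (SELECT *, trim(no_edge_digits, ',') AS no_edge_commas    -- '12,1242'
              FROM s2),
       s4 AS (SELECT *, ltrim(no_edge_commas, '0123456789') AS no_second_field  -- ',1242'
              FROM s3)
  SELECT *, ltrim(no_second_field, ',') AS NewCol2                     -- '1242'
  FROM s4
")

print(steps)
```

The values in the SQL comments trace the first row only. The second row goes through the same stages and ends at 76.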

H2 database

The above uses the default RSQLite backend, but if we use the RH2 backend instead, more string manipulation functions are available:

library(sqldf)
library(RH2)  # sqldf will notice this is loaded and use it

sqldf("SELECT *, 
       regexp_replace(Col1, ',.*', '') NewCol1,
       regexp_replace(Col1, '^[^,]*,[^,]*,|,[^,]*$', '') NewCol2
       FROM df")
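
The two patterns can be checked without loading RH2 by running them through base R. This sketch assumes that H2's regexp_replace replaces every match, which gsub mirrors; the variable names are illustrative only.

```r
# Sample values from the question
col1 <- c("123,bnh12,1242,mdmdmd", "0923,3mdn42,76,ieieie")

# ',.*' matches from the first comma to the end, leaving the first field
new_col1 <- sub(",.*", "", col1)

# The alternation removes two pieces:
#   '^[^,]*,[^,]*,'  the first two fields with their trailing commas
#   ',[^,]*$'        the last comma and everything after it
new_col2 <- gsub("^[^,]*,[^,]*,|,[^,]*$", "", col1)

print(data.frame(col1, new_col1, new_col2))
```

What remains after both pieces are removed is the third field, so the pattern relies on each value having exactly four comma-separated fields.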
