Splitting a column into multiple columns by separator
I have a data.frame with a column "offence". Each offence consists of an article (Art), a paragraph (Abs), and a subparagraph (Ziff):
df <- data.frame(offence = c("Art. 110 Abs. 3 StGB", "Art. 10 Abs. 1 StGB", "Art. 122 SVG", "Art. 1 Ziff. 2 UWG"))
> df
               offence
1 Art. 110 Abs. 3 StGB
2  Art. 10 Abs. 1 StGB
3         Art. 122 SVG
4   Art. 1 Ziff. 2 UWG
But I need to get it in this form:
  Art Ziff Abs  Law
1 110   NA   3 StGB
2  10   NA   1 StGB
3 122   NA  NA  SVG
4   1    2  NA  UWG
What is the best way to get this result? lapply?
Thanks!
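You can use str_extract from stringr. Each pattern uses a lookbehind to match the digits that follow a given keyword, and the last pattern takes the final word as the law:
library(stringr)
library(dplyr)

# Note the doubled backslashes: R string literals need "\\s", "\\d" and "\\w".
df$offence %>%
  {data.frame(Art  = str_extract(., "(?<=Art[.]\\s)\\d+"),
              Ziff = str_extract(., "(?<=Ziff[.]\\s)\\d+"),
              Abs  = str_extract(., "(?<=Abs[.]\\s)\\d+"),
              Law  = str_extract(., "\\w+$"))}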
Result:
  Art Ziff  Abs  Law
1 110 <NA>    3 StGB
2  10 <NA>    1 StGB
3 122 <NA> <NA>  SVG
4   1    2 <NA>  UWG
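The <NA> entries above show that str_extract returns character columns, whereas the requested output has numeric ones. A minimal sketch of one way to get integers directly is below; extract_num is a hypothetical helper name introduced here for illustration, not part of stringr.

```r
library(stringr)

df <- data.frame(offence = c("Art. 110 Abs. 3 StGB", "Art. 10 Abs. 1 StGB",
                             "Art. 122 SVG", "Art. 1 Ziff. 2 UWG"))

# Hypothetical helper (not a stringr function): extract the digits that
# follow "<key>. " and return them as integers, NA where the key is absent.
extract_num <- function(x, key) {
  as.integer(str_extract(x, paste0("(?<=", key, "[.]\\s)\\d+")))
}

res <- data.frame(
  Art  = extract_num(df$offence, "Art"),
  Ziff = extract_num(df$offence, "Ziff"),
  Abs  = extract_num(df$offence, "Abs"),
  Law  = str_extract(df$offence, "\\w+$")  # last word is the law
)
```

This variant also drops the dplyr dependency, since the pipe is not needed once the extraction is wrapped in a function.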
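Use gsub to convert the strings to dcf form (i.e. keyword: value), then read them with read.dcf. Finally, convert the matrix produced by read.dcf to a data frame and convert any numeric columns to numeric. No packages are used.
s <- gsub("(\\S+)[.] (\\d+)", "\\1: \\2\n", df[[1]]) # convert to keyword: value
s <- sub(" (\\D+)$", "Law: \\1\n\n", s) # handle Law column; blank line ends the record
us <- trimws(unlist(strsplit(s, "\n"))) # split into separate components
DF <- as.data.frame(read.dcf(textConnection(us)), stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, as.is = TRUE) # as.is = TRUE avoids the warning in R >= 4.0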
giving:
  Art Abs  Law Ziff
1 110   3 StGB   NA
2  10   1 StGB   NA
3 122  NA  SVG   NA
4   1  NA  UWG    2
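read.dcf orders the columns by first appearance of each keyword, so Ziff ends up last rather than second as in the question. A small sketch, assuming the same input data, that repeats the steps and then reorders the columns to the requested layout; the wanted vector is the only addition here.

```r
df <- data.frame(offence = c("Art. 110 Abs. 3 StGB", "Art. 10 Abs. 1 StGB",
                             "Art. 122 SVG", "Art. 1 Ziff. 2 UWG"))

# Same steps as in the answer: build "keyword: value" records, one per row,
# each terminated by a blank line, then parse them with read.dcf.
s  <- gsub("(\\S+)[.] (\\d+)", "\\1: \\2\n", df[[1]])
s  <- sub(" (\\D+)$", "Law: \\1\n\n", s)
us <- trimws(unlist(strsplit(s, "\n")))
DF <- as.data.frame(read.dcf(textConnection(us)), stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, as.is = TRUE)

# Reorder to the layout asked for in the question.
wanted <- c("Art", "Ziff", "Abs", "Law")
DF <- DF[wanted]
```

Indexing by name rather than by position keeps the reordering correct even if the keywords first appear in a different order in other data.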