R: Extract first two characters from a column in a dataframe
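I have a dataset with multiple columns, and I want to extract the first two characters from the sr column. These characters will then be stored in a new column.
Basically, I want a new column permit_type that holds the first two characters of sr, i.e. AP, SP, and MP.
How can I do this?
Sample data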
df <- structure(list(date_received = c("11/30/2021 ", "11/30/2021 ",
"11/30/2021 ", "11/30/2021 ", "11/30/2021 ", "11/17/2021 ",
"12/3/2021 ", "12/3/2021 ", "12/13/2021 "), date_approved = c("11/30/2021",
"11/30/2021", "11/30/2021", "11/30/2021", "11/30/2021", "11/17/2021",
"12/3/2021", "12/3/2021", "12/3/2021"), sr = c("AP-21-080", "SP-21-081",
"AP-21-082", "SP-21-083", "MP-21-084", "AP-21-085", "AP-21-086",
"MP-21-087", "SP-21-088"), permit = c("AP1766856 Classroom C",
"AP1766858 Classroom A", "AP1766862 Landscape Area", "AP1766864 Classroom B",
"AO1766867", "06-SE-2420566", "06-E-2425187", "", "06-SM-2424110"
)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"
))
Attempt 1
library(tidyverse)
df$permit_type= df%>% str_split_fixed(df$sr, "-", 2)
# Error
Error in str_split_fixed(., df$sr, "-", 2) :
unused argument (2)
Attempt 2
df$permit_type = df%>% str_extract(sr, "^.{2}")
# Error
Error in str_extract(., sr, "^.{2}") : unused argument ("^.{2}")
Attempt 3
df = df %>% mutate(permit_type = str_extract_all(sr, "\\b[a-z]{2}"))
# Returns permit_type with `character(0)` values
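For the last attempt, the pattern should match uppercase letters ([A-Z]) rather than lowercase letters ([a-z]), because the sr column contains only uppercase letters. Also, str_extract_all is meant for cases where the pattern occurs more than once per string, and it returns a list (simplify = FALSE by default). In this example each value has a single occurrence, so str_extract is the better fit because it returns a vector: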
library(dplyr)
library(stringr)
df %>%
  mutate(permit_type = str_extract(sr, "\\b[A-Z]{2}"))
# A tibble: 9 × 5
  date_received date_approved sr        permit                     permit_type
  <chr>         <chr>         <chr>     <chr>                      <chr>
1 "11/30/2021 " 11/30/2021    AP-21-080 "AP1766856 Classroom C"    AP
2 "11/30/2021 " 11/30/2021    SP-21-081 "AP1766858 Classroom A"    SP
3 "11/30/2021 " 11/30/2021    AP-21-082 "AP1766862 Landscape Area" AP
4 "11/30/2021 " 11/30/2021    SP-21-083 "AP1766864 Classroom B"    SP
5 "11/30/2021 " 11/30/2021    MP-21-084 "AO1766867"                MP
6 "11/17/2021 " 11/17/2021    AP-21-085 "06-SE-2420566"            AP
7 "12/3/2021 "  12/3/2021     AP-21-086 "06-E-2425187"             AP
8 "12/3/2021 "  12/3/2021     MP-21-087 ""                         MP
9 "12/13/2021 " 12/3/2021     SP-21-088 "06-SM-2424110"            SP
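To make the list-versus-vector difference concrete, here is a minimal sketch on a bare character vector. The vector sr_values is a hypothetical stand-in holding three values copied from the sr column; it is not part of the original data frame.

```r
library(stringr)

# Hypothetical stand-in for df$sr: three values copied from the question
sr_values <- c("AP-21-080", "SP-21-081", "MP-21-084")

# str_extract_all() returns a list with one element per input string
all_matches <- str_extract_all(sr_values, "\\b[A-Z]{2}")

# str_extract() returns a character vector holding the first match per string
first_match <- str_extract(sr_values, "\\b[A-Z]{2}")

# The lowercase pattern finds nothing in these values, so every result is NA
no_match <- str_extract(sr_values, "\\b[a-z]{2}")
```

A plain character vector is usually what you want when the result goes into a new column.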
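In the first attempt, the pipe passes the whole data frame as the first argument of str_split_fixed, so df$sr is taken as the pattern, "-" as n, and the final 2 is left over, which is the unused argument the error reports. To apply str_split_fixed to the column itself, we can wrap the call in {} and refer to the column as .$sr: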
df%>%
{str_split_fixed(.$sr, "-", 2)[,1]}
[1] "AP" "SP" "AP" "SP" "MP" "AP" "AP" "MP" "SP"
The second attempt can be fixed the same way:
df%>%
{str_extract(.$sr, "^.{2}")}
[1] "AP" "SP" "AP" "SP" "MP" "AP" "AP" "MP" "SP"
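If the goal is the new column rather than a bare vector, another option is to do the extraction inside mutate(), where sr already refers to the column and no placeholder is needed. The sketch below is one way to do that, not the only one; df_small is a hypothetical three-row stand-in for the question's df, and str_sub() is swapped in for the regex because the prefix has a fixed width.

```r
library(dplyr)
library(stringr)

# Hypothetical three-row stand-in for the question's df
df_small <- tibble(sr = c("AP-21-080", "SP-21-081", "MP-21-084"))

# Inside mutate(), sr refers to the column, so the pipe works as intended
result <- df_small %>%
  mutate(permit_type = str_sub(sr, 1, 2))

# The same assignment without a pipe
df_small$permit_type <- str_sub(df_small$sr, 1, 2)
```

Both forms keep the other columns intact and add permit_type at the end.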
In base R, you can use:
transform(df, permit_type = substr(sr,1,2))
  date_received date_approved        sr                   permit permit_type
1   11/30/2021     11/30/2021 AP-21-080    AP1766856 Classroom C          AP
2   11/30/2021     11/30/2021 SP-21-081    AP1766858 Classroom A          SP
3   11/30/2021     11/30/2021 AP-21-082 AP1766862 Landscape Area          AP
4   11/30/2021     11/30/2021 SP-21-083    AP1766864 Classroom B          SP
5   11/30/2021     11/30/2021 MP-21-084                AO1766867          MP
6   11/17/2021     11/17/2021 AP-21-085            06-SE-2420566          AP
7    12/3/2021      12/3/2021 AP-21-086             06-E-2425187          AP
8    12/3/2021      12/3/2021 MP-21-087                                   MP
9   12/13/2021      12/3/2021 SP-21-088            06-SM-2424110          SP
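transform() returns a modified copy, so the result has to be assigned back to keep the new column. As a sketch of two other base R routes, the code below assigns the column directly and also shows a delimiter-based variant. Here df_small is a hypothetical three-row stand-in for df, and the prefix column is an illustrative name that does not appear in the question.

```r
# Hypothetical three-row stand-in for the question's df
df_small <- data.frame(sr = c("AP-21-080", "SP-21-081", "MP-21-084"))

# Fixed-width approach: keep characters 1 through 2
df_small$permit_type <- substr(df_small$sr, 1, 2)

# Delimiter-based approach: drop everything from the first hyphen onward
df_small$prefix <- sub("-.*$", "", df_small$sr)
```

The two agree on this data. The sub() version would follow the prefix if it ever grew beyond two characters, while substr() always takes exactly two.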