How to remove the first three characters from every row in a column in R
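I have a large data set with one column of text, about 20K rows. I would like to remove the first x characters (for example, 3) from the start of every row in that column. Thanks for your help.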
You can do this with the gsub function and a simple regular expression. Here is the code:
# Fake data frame
df <- data.frame(text_col = c("abcd", "abcde", "abcdef"))
df$text_col <- as.character(df$text_col)
# Replace the first 3 characters with the empty string ""
df$text_col <- gsub("^.{0,3}", "", df$text_col)
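Since the question asks about the first x characters in general, here is a minimal sketch of the same regular-expression approach with the count held in a variable. The names n, pattern and trimmed, and the extra sample values "ab" and NA, are assumptions added for illustration. It uses sub rather than gsub because the ^ anchor can only match once per string.

```r
# Hypothetical sample data, including a short string and a missing value
df <- data.frame(text_col = c("abcd", "abcde", "ab", NA),
                 stringsAsFactors = FALSE)
n <- 3  # number of leading characters to drop (assumed name)

# Build the pattern "^.{0,3}" from n
pattern <- sprintf("^.{0,%d}", n)

# sub() replaces only the first match, which is all the anchored pattern needs
df$trimmed <- sub(pattern, "", df$text_col)
df$trimmed
```

With the {0,n} quantifier, strings shorter than n characters become empty strings, and NA values stay NA.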
Using the tidyverse, we can do this with str_sub (and the sample fruit character vector from stringr) by specifying the start and end positions directly:
library(tidyverse)
tbl <- tibble(some_fruit = fruit)
tbl
#> # A tibble: 80 x 1
#> some_fruit
#> <chr>
#> 1 apple
#> 2 apricot
#> 3 avocado
#> 4 banana
#> 5 bell pepper
#> 6 bilberry
#> 7 blackberry
#> 8 blackcurrant
#> 9 blood orange
#> 10 blueberry
#> # … with 70 more rows
tbl %>%
  mutate(chopped_fruit = str_sub(some_fruit, 4, -1))
#> # A tibble: 80 x 2
#> some_fruit chopped_fruit
#> <chr> <chr>
#> 1 apple le
#> 2 apricot icot
#> 3 avocado cado
#> 4 banana ana
#> 5 bell pepper l pepper
#> 6 bilberry berry
#> 7 blackberry ckberry
#> 8 blackcurrant ckcurrant
#> 9 blood orange od orange
#> 10 blueberry eberry
#> # … with 70 more rows
Created on 2019-02-22 by the reprex package (v0.2.1)
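The str_sub call can also be used on a plain character vector without mutate. The following is a small sketch that assumes only stringr is loaded; the vector x and the variable n are hypothetical names chosen for this example.

```r
library(stringr)

x <- c("apple", "apricot", "fig", NA)  # hypothetical sample vector
n <- 3                                 # number of leading characters to drop

# end defaults to -1 (the last character), so only start is required
chopped <- str_sub(x, start = n + 1)
chopped
```

Strings with n or fewer characters come back as empty strings, and NA is passed through unchanged.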
As usual, there are many ways to do things in R!
You can also try substring (see ?substring):
> lotsofdata <- data.frame(column.1=c("DataPoint1", "DataPoint2", "DataPoint3", "DataPoint4"),
+ column2=c("MoreData1","MoreData2","MoreData3", "MoreData4"),
+ stringsAsFactors=FALSE)
> head(lotsofdata)
column.1 column2
1 DataPoint1 MoreData1
2 DataPoint2 MoreData2
3 DataPoint3 MoreData3
4 DataPoint4 MoreData4
> substring(lotsofdata[,2],4,nchar(lotsofdata[,2]))
[1] "eData1" "eData2" "eData3" "eData4"
Or for column 1, [,1]:
> substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
[1] "aPoint1" "aPoint2" "aPoint3" "aPoint4"
Then just replace the column directly:
x<-substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
lotsofdata$column.1<-x
> head(lotsofdata)
column.1 column2
1 aPoint1 MoreData1
2 aPoint2 MoreData2
3 aPoint3 MoreData3
4 aPoint4 MoreData4
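The substring steps above can be wrapped into a small reusable function. This is only a sketch: drop_first_n is a hypothetical helper name, not part of base R. It relies on the last argument of substring defaulting to a very large value, so nchar() does not have to be passed explicitly.

```r
# drop_first_n is a hypothetical helper, not part of base R
drop_first_n <- function(x, n) {
  # as.character() guards against factor columns;
  # substring()'s 'last' argument defaults to 1000000L
  substring(as.character(x), n + 1)
}

lotsofdata <- data.frame(column.1 = c("DataPoint1", "DataPoint2"),
                         column2 = c("MoreData1", "MoreData2"),
                         stringsAsFactors = FALSE)

# Overwrite the column in place with the shortened strings
lotsofdata$column.1 <- drop_first_n(lotsofdata$column.1, 3)
lotsofdata$column.1
```

Converting with as.character first means the helper should also work when the column was read in as a factor.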