Filtering in R with punctuation characters
I have a column in a dataset that looks like this:
$abc.MSFT
$MSFT
$msft
$abcMSFTxyz
I want the following output:
$MSFT
$msft
My filtering attempts:
dplyr::filter(Tweets, grepl("\bMMM$\b", ignore.case = TRUE, V2))
returns:
$abc.MSFT
$MSFT
$msft
or
dplyr::filter(Tweets,grepl("^$MMM$", ignore.case = TRUE, V2))
returns no rows.
One way to handle it:
x <- c("$abc.MSFT", "$MSFT", "$msft", "$abcMSFTxyz")
Tweets <- data.frame(V2 = x, stringsAsFactors = FALSE)
Tweets
# V2
#1 $abc.MSFT
#2 $MSFT
#3 $msft
#4 $abcMSFTxyz
#your way
dplyr::filter(Tweets, grepl("\bMMM$\b", ignore.case = TRUE, V2))
[1] V2
<0 rows> (or 0-length row.names)
# another way: anchor both ends and escape the literal $ (written \\$ inside an R string)
dplyr::filter(Tweets, grepl("^\\$msft$", ignore.case = TRUE, V2))
V2
1 $MSFT
2 $msft
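To see why the first pattern returns nothing, and why word boundaries alone are not enough, here is a small base-R check on the same values. The pattern `\\bMSFT\\b` is an illustrative assumption (a correctly escaped word-boundary version of the idea), not necessarily what the question actually ran, and `perl = TRUE` is used only to get PCRE's standard `\b` behaviour.

```r
x <- c("$abc.MSFT", "$MSFT", "$msft", "$abcMSFTxyz")

# In an R string literal "\b" is a backspace character (ASCII 8), not the
# regex word boundary, so "\bMMM$\b" cannot match any of these values.
identical("\b", "\x08")                                 # TRUE

# The regex word boundary needs a doubled backslash: "\\b".
# But "." and "$" are non-word characters, so "$abc.MSFT" also has a
# boundary right before "MSFT" and is matched too.
grepl("\\bMSFT\\b", x, ignore.case = TRUE, perl = TRUE)
# returns TRUE TRUE TRUE FALSE

# Anchoring the whole string and escaping the literal dollar sign keeps
# only the exact ticker, in either case.
grepl("^\\$msft$", x, ignore.case = TRUE)
# returns FALSE TRUE TRUE FALSE
```

Only the anchored, escaped pattern drops `$abc.MSFT`, because `^` and `$` force the ticker to be the entire string.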
From the regex help:
...there are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {. These special characters are often called "metacharacters".
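The fix:
If you want to use any of these characters as a literal in a regex, you need to escape them with a backslash. If you want to match 1+1=2, the correct regex is 1\+1=2. Otherwise, the plus sign has a special meaning.
In an R string literal the backslash must itself be escaped, so that regex is written "1\\+1=2", and a literal dollar sign is written "\\$".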
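The escaping rule can be checked directly in the console. This is a minimal base-R sketch: `escape_regex()` is a hypothetical helper written for this example (not a base R function), and `fixed = TRUE` is shown as an alternative for when no regex features are needed at all.

```r
# Unescaped, "+" means "one or more of the previous character", so the
# pattern looks for "11=2" and does not match the literal text.
grepl("1+1=2", "1+1=2")                 # FALSE
# Escaped (backslash doubled inside the R string), "+" is literal.
grepl("1\\+1=2", "1+1=2")               # TRUE
# fixed = TRUE turns regex processing off entirely.
grepl("1+1=2", "1+1=2", fixed = TRUE)   # TRUE

# Hypothetical helper: put a backslash in front of each of the 12
# metacharacters listed above (\ ^ $ . | ? * + ( ) [ {).
escape_regex <- function(s) {
  gsub("([\\\\^$.|?*+()\\[{])", "\\\\\\1", s, perl = TRUE)
}

ticker <- "$MSFT"
pattern <- paste0("^", escape_regex(ticker), "$")   # the regex ^\$MSFT$
x <- c("$abc.MSFT", "$MSFT", "$msft", "$abcMSFTxyz")
x[grepl(pattern, x, ignore.case = TRUE, perl = TRUE)]
# returns "$MSFT" "$msft"
```

Building the pattern from the ticker this way avoids hand-escaping each symbol when the list of tickers comes from data.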
Study regular expressions. They are worth the time to learn in whatever language you plan to use.