Perl Regex for number count R-lang

Question

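I am using R and am new to regular expressions. I need a regex that extracts 'statuses_count' from JSON-like text. The data is organized as a data frame, with one piece of text per row. An example row: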

{'lang': u'en', 'profile_background_tile': False, 'statuses_count': 4414, 'description': u'Progessive,interested in the psychology of politics.

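The result should be: 4414.

I was thinking of using str_extract_all with the perl option, but I don't understand how to get only the number that follows 'statuses_count': (?<=statuses_count.:)(something)

As a newbie, it would be great to understand how to say "grab the number after 'statuses_count.'". Thanks!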

Answer 1

Here I use a Perl-style regex, as per the title of the post.

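 library(stringr)
 # Backslashes must be doubled inside an R string literal. Older stringr
 # versions wrapped the pattern in perl(); current versions handle the
 # lookbehind without it.
 str_extract_all(str1, "(?<=statuses_count': )\\d+")[[1]]
#[1] "4414"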

Regular expression:

(?<=statuses_count': )\d+

Or use stringi (faster on large datasets):

 library(stringi)
 stri_extract_all_regex(str1, "(?<=statuses_count': )\\d+")[[1]]
 #[1] "4414"

Data

str1 <- "{'lang': u'en', 'profile_background_tile': False, 'statuses_count': 4414, 'description': u'Progessive,interested in the psychology of politics."
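Since the question mentions a data frame with one string per row, here is a minimal base R sketch of the same lookbehind applied to a whole column. The data frame `df`, its column name `text`, and the extra rows are made up for illustration; rows without a match are assumed to become NA.

```r
# Hypothetical data frame: one JSON-like string per row in column `text`.
df <- data.frame(
  text = c(
    "{'lang': u'en', 'statuses_count': 4414, 'description': u'abc'}",
    "{'lang': u'fr', 'statuses_count': 27, 'friends_count': 310}",
    "{'lang': u'de', 'description': u'no count here'}"
  ),
  stringsAsFactors = FALSE
)

pattern <- "(?<=statuses_count': )\\d+"

# regexpr() returns -1 for rows without a match; perl = TRUE enables lookbehind.
m <- regexpr(pattern, df$text, perl = TRUE)

# Start with NA everywhere, then fill in only the rows that matched.
df$statuses_count <- NA_integer_
df$statuses_count[m > 0] <- as.integer(regmatches(df$text, m))

df$statuses_count
## [1] 4414   27   NA
```

regmatches() returns only the matched substrings, which is why the result is assigned to the matching rows rather than to the whole column.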

Answer 2

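1) sub. A simple solution that needs no packages.

# x is assumed to hold the input string, e.g. x <- str1 from Answer 1
sub(".*'statuses_count': (\\d+).*", "\\1", x)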
## [1] "4414"

Regular expression:

.*'statuses_count': (\d+).*
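One thing to watch with the sub approach: when the pattern does not match, sub returns the input unchanged instead of signalling a miss. A small sketch of guarding against that with grepl follows; the vector `x` here is a made-up two-element example, not the original data.

```r
pattern <- ".*'statuses_count': (\\d+).*"

# Hypothetical input: the second element has no statuses_count field.
x <- c(
  "{'lang': u'en', 'statuses_count': 4414, 'description': u'abc'}",
  "{'lang': u'de', 'description': u'no count here'}"
)

# sub() leaves non-matching elements untouched, so check with grepl() first
# and return NA where the field is absent.
counts <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
counts
## [1] "4414" NA
```

Wrapping the result in as.numeric() would turn it into a numeric vector while keeping the NA.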

2) gsub. If we know the string contains no other digits (as in the example), it is even easier, because we can simply delete the non-digits:

gsub("\\D", "", x)
## [1] "4414"

Regular expression:

\D
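To make the "no other digits" condition concrete, here is a short sketch of what happens when it does not hold. The string `y` is a made-up example with a second numeric field.

```r
# Hypothetical input with two numeric fields.
y <- "{'statuses_count': 4414, 'friends_count': 310}"

# Deleting every non-digit runs the digits of both fields together.
all_digits <- gsub("\\D", "", y)
all_digits
## [1] "4414310"

# Anchoring on the field name picks out only the wanted number.
only_count <- sub(".*'statuses_count': (\\d+).*", "\\1", y)
only_count
## [1] "4414"
```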

3) strapply or strapplyc. This approach uses a relatively simple regular expression:

library(gsubfn)
strapplyc(x, "'statuses_count': (\\d+)", simplify = TRUE)
## [1] "4414"

Or, if you want numeric output:

strapply(x, "'statuses_count': (\\d+)", as.numeric, simplify = TRUE)
## [1] 4414

Regular expression:

'statuses_count': (\d+)

Note: None of these require the Perl regular expression extensions. Plain regular expressions are enough.
