Add text to an atomic (character) vector in R
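Good afternoon. I am not an expert on atomic vectors, but I would like some ideas on this.
I have the script of the movie "Coco", and I want to pick out the lines that are numbered in the form 1., 2., ... (the whole movie has 130 scenes). I want to convert each of those scene-number lines into a line that reads "Scene 1", "Scene 2", and so on up to "Scene 130", in order.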
library(readr)  # provides read_lines()
url <- "https://www.imsdb.com/scripts/Coco.html"
coco <- read_lines("coco2.txt")  # after cleaning
class(coco)
typeof(coco)
" 48."
[782] " arms full of offerings."
[783] " Once the family clears, Miguel is nowhere to be seen."
[784] " INT. NEARBY CORRIDOR"
[785] " Miguel and Dante hide from the patrolman. But Dante wanders"
[786] " off to inspect a side room."
[787] " INT. DEPARTMENT OF CORRECTIONS"
[788] " Miguel catches up to Dante. He overhears an exchange in a"
[789] " nearby cubicle."
[797] " 49."
[798] " And amigos, they help their amigos."
[799] " worth your while."
[800] " workstation."
[801] " Miguel perks at the mention of de la Cruz."
[809] " Miguel follows him."
[810] " 50." # Its scene number
[811] " INT. HALLWAY"
s <- grep(coco, pattern = "[^Level].[0-9].$", value = TRUE)
My solution is wrong, because the resulting numbers are not sequential:
v <- gsub(s, pattern = "[^Level].[0-9].$", replacement = paste("Scene", sequence(1:130)))
[1] " Scene1"
[2] " Scene1"
[3] " Scene1"
[4] " Scene1"
[5] " Scene1"
[6] " Scene1"
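It is not clear to me what [^Level] is meant to do. However, if the number at the end of the line is the scene number, you can capture the digits with ( ) and reuse them in the replacement text, like this: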
v <- gsub(s, pattern = " ([0-9]{1,3})\\.$", replacement = "Scene \\1")
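As for why every line came out as the same scene: gsub() is not vectorized over replacement. When it is given a vector longer than one, it uses only the first element and issues a warning. The sketch below shows two ways around that on a small made-up sample. The vector coco_sample and the names is_scene, by_number and by_position are hypothetical, and the pattern assumes a scene line holds nothing but whitespace, one to three digits and a final dot, which may not hold for every line of the real file.

```r
# Hypothetical sample standing in for a few lines of the cleaned script
coco_sample <- c(
  "                         48.",
  "     INT. NEARBY CORRIDOR",
  "                         49.",
  "     Miguel follows him.",
  "                         50."
)

# Assumption: a scene line is only whitespace, 1-3 digits and a final dot
scene_pattern <- "^[[:space:]]*([0-9]{1,3})\\.$"
is_scene <- grepl(scene_pattern, coco_sample)

# Option 1: reuse the number already printed in the line (backreference \\1)
by_number <- sub(scene_pattern, "Scene \\1", coco_sample[is_scene])

# Option 2: ignore the printed number and count the matches in order
by_position <- paste("Scene", seq_len(sum(is_scene)))

# Write either result back into the full vector by logical indexing
coco_sample[is_scene] <- by_number
print(coco_sample)
```

Option 1 keeps whatever numbers appear in the text, while option 2 always yields 1, 2, 3, ... in order of appearance, so it is the one to use if the printed numbers have gaps.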