Add text to an atomic (character) vector in R

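Good afternoon. I am not an expert on atomic vectors, but I would like some ideas about the following.

I have the script of the movie "Coco", and I want to pick out the lines that are numbered in the form 1., 2., ... (the whole movie has 130 scenes). I want to convert each of those scene-number lines into a line containing "Scene 1", "Scene 2", and so on up to "Scene 130", in sequence.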

url <- "https://www.imsdb.com/scripts/Coco.html"

library(readr)  # provides read_lines()
coco <- read_lines("coco2.txt")  # after cleaning
class(coco)
typeof(coco)

"                                                                        48."      
 [782] "     arms full of offerings."                                                     
 [783] "      Once the family clears, Miguel is nowhere to be seen."                      
 [784] "      INT. NEARBY CORRIDOR"                                                       
 [785] "     Miguel and Dante hide from the patrolman.     But Dante wanders"             
 [786] "     off to inspect a side room."                                                 
 [787] "      INT. DEPARTMENT OF CORRECTIONS"                                             
 [788] "     Miguel catches up to Dante.      He overhears an exchange in a"              
 [789] "     nearby cubicle."                                                             

 [797] "                                                          49."                    
 [798] "                 And amigos, they help their amigos."                             
 [799] "                 worth your while."                                               
 [800] "     workstation."                                                                
 [801] "      Miguel perks at the mention of de la Cruz."                                 


 [809] "      Miguel follows him."                                                        
 [810] "                                                                     50." # this is the scene number
 [811] "      INT. HALLWAY"      


s <- grep(coco, pattern = "[^Level].[0-9].$", value = TRUE)

My solution is wrong because the result is not numbered sequentially:

v <- gsub(s, pattern = "[^Level].[0-9].$", replacement = paste("Scene", sequence(1:130)))


[1] "                                                                   Scene1"          
  [2] "                                                                   Scene1"          
  [3] "                                                                  Scene1"           
  [4] "                                                                       Scene1"      
  [5] "                                                                    Scene1"         
  [6] "                                                                   Scene1"          
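The repeated label in the output above is consistent with two documented behaviours: sequence(1:130) concatenates 1:1, 1:2, 1:3, ... rather than producing 1, 2, ..., 130, and gsub() uses only the first element of a replacement vector longer than one (it warns about this). A minimal sketch of both effects, where `s_demo` is a hypothetical three-line stand-in for `s` and the pattern is simplified for illustration:

```r
# Hypothetical stand-in for the matched scene-number lines in `s`
s_demo <- c("          48.", "        49.", "            50.")

# sequence(1:3) is c(1, 1, 2, 1, 2, 3), not c(1, 2, 3)
seq_demo <- sequence(1:3)

# gsub() is not vectorised over `replacement`: only its first element
# ("Scene 1") is used, so every line receives the same label.
# The warning is suppressed here only to keep the demo quiet.
bad <- suppressWarnings(
  gsub(" [0-9]{1,3}\\.$", paste("Scene", seq_demo), s_demo)
)
print(bad)
```

Every element of `bad` ends in the same label, which mirrors the symptom shown above.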

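I am not clear what [^Level] is meant to represent. However, if the number at the end of a line represents the scene number, you can capture the digits with ( ) and reuse them in the replacement text, like this:

v <- gsub(s, pattern = " ([0-9]{1,3})\\.$", replacement = "Scene \\1")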
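To make the two possible readings of the question concrete, here is a small sketch on made-up data. `coco_demo` is a hypothetical excerpt standing in for `coco`, and the stricter `scene_line` pattern (a line holding nothing but a number and a period) is my assumption, not something taken from the question. Option 1 reuses the number printed in the script through the capture group; option 2 ignores the printed number and labels the matching lines 1, 2, ... in order of appearance.

```r
# Hypothetical excerpt standing in for `coco`
coco_demo <- c(
  "                              48.",
  "      INT. NEARBY CORRIDOR",
  "     Miguel and Dante hide from the patrolman.",
  "                         49.",
  "      Miguel perks at the mention of de la Cruz."
)

# Option 1: keep the printed number via the capture group (\\1).
# Note the doubled backslashes required inside an R string.
by_capture <- sub(" ([0-9]{1,3})\\.$", "Scene \\1", coco_demo)

# Option 2: number the matching lines sequentially.
# Assumed stricter pattern: optional whitespace, 1-3 digits, a period.
scene_line <- "^[[:space:]]*[0-9]{1,3}\\.[[:space:]]*$"
is_scene <- grepl(scene_line, coco_demo)
by_order <- coco_demo
by_order[is_scene] <- paste("Scene", seq_len(sum(is_scene)))

print(trimws(by_capture[is_scene]))
print(by_order[is_scene])
```

Indexed assignment is used in option 2 because paste() is vectorised while the replacement argument of gsub() is not. Note that the looser pattern in option 1 would also match a dialogue line that happens to end in a space, a short number, and a period, so it may be worth checking the matches before overwriting anything.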