Regex in Python to adjust a txt file
Here is my text:
['A1_(group)', 'album: "Here We Come" (1999)', 'Forever In Love', "\n\r\nLove leads to laughter\r\nLove leads to pain\r\nWith you by my side\r\nI feel good times again\n\r\nNever have I felt these feelings before\r\nYou showed me the world\r\nHow could I ask for more\n\r\nAnd although there's confusion\r\nWe'll find a solution to keep my heart close to you\n\r\nAnd I know, yes I know\r\nIf you hold me, believe me\r\nI'll never, never ever leave\n\r\nAnd I know\r\nThere is nothing that I would not do for you\r\nForever be true\r\nAnd I know\r\nAlthough times can be hard\r\nWe will see it through\r\nI'm forever in love with you\n\r\nShow me affection\r\nIn all different ways\r\nGive you my heart\r\nFor the rest of my days\n\r\nWith you all my troubles are left far behind \r\nLike heaven on earth\r\nWhen I look in your eyes\n\r\nAnd although there's confusion\r\nWe'll find a solution\r\nTo keep my heart close to you\n\r\nAnd I know, yes I know\r\nIf you hold me, believe me\r\nI'll never, never ever leave\n\r\nAnd I know\r\nThere is nothing that I would not do for you\r\nForever be true\r\nAnd I know\r\nAlthough times can be hard\r\nWe will see it through\r\nI'm forever in love with you\n\r\nNo need to cry\r\nI'll be right by your side\r\n(Right by your side)\n\r\nLet's take our time\r\nLove won't run dry\r\nIf you hold me, believe me\r\nI'll never, never ever leave\n\r\nAnd I know\r\nThere is nothing that I would not do for you\r\nForever be true\r\nAnd I know\r\nAlthough times can be hard\r\nWe will see it through\r\nI'm forever in love\r\nAnd I know\r\nThere is nothing that I would not do for you\n\r\nForever be true\r\nAnd I know\n\r\nOh I know\r\nAlthough times can be hard\r\nWe will see it through\r\nI'm forever in love with you\n", 'London, United KingdomOslo, Norway', [], '1998–2002, 2009–present']
['A1_(group)', 'album: "Here We Come" (1999)', 'Be The First To Believe', "\n\n[INTRO-HOOK]\n\n[ALL:] JUST ONE ON ONE!\r\nTHAT'S THE WAY WE DO IT (BABY) [x 2]\n\n[BEN:] BABY, I CAN'T ALWAYS SAY WHAT'S ON MY MIND, YEAH NEW SENSATIONS\n\n[ALL:] GOT ME\n\r\nMARK: BREAKING OUT THE LOVE I FEEL INSIDE\r\nYEAH, I'LL TAKE YOU TO A WONDERLAND\n\n[BRIDGE]\n\n[ALL:] YOU HIT ME RIGHT BETWEEN THE EYES\r\nI SHOULDA LISTEN TO MA MAMMA DONE TOLD ME\r\nYOU SENT ME SOARING TO THE SKIES\r\nAIN'T GONNA LISTEN TO MA MAMMA DONE TOLD ME\n\n[CHORUS]\n\n[ALL:] GIRL, THIS PARADISE IS OURS\r\nTHE PLANET MOON AND STARS\r\nBELIEVE IN ME BABY\n\n[MARK:] (YOU'VE GOT TO BELIEVE)\n\n[ALL:] BELIEVE IN ME BABY\n\n[BEN:] BE THE FIRST TO BELIEVE\n\n[VERSE]\n\n[CHRISTIAN:] BABY, ELEVATE OUR LOVE INTO THE SKIES\r\nYEAH, COOL VIBRATIONS\n\n[ALL:] ROCK ME\n\n[PAUL:] FLY ME UP TO HEAVEN IN YOUR EYES\r\nYEAH, ITS MAGIC WHEN YOU HYPNOTISE\n\n[ALL:] REPEAT BRIDGE\r\nYOU HIT ME RIGHT BETWEEN THE EYES!!..\n\n[CHORUS]\n\n[ALL:] GIRL, THIS PARADISE IS OURS\r\nTHE PLANET MOON AND STARS\r\nBELIEVE IN ME BABY\n\n[PAUL:] (YEAH, YOU GOT TO BELIEVE)\n\n[ALL:] BELIEVE IN ME BABY\n\n[CHRISTIAN:] (BE THE FIRST TO BELIEVE)\n\n[ALL:] BELIEVE IN ME BABY\n\n[MARK:] (YOU'VE GOT TO BELIEVE)\n\n[ALL:] BELIEVE IN ME BABY\n\n[BEN:] (SAID, BE THE FIRST TO BELIEVE)\n\n[MUSICAL BREAK FOR FOUR BARS]\n\n[ALL:] JUST ONE ON ONE!\r\nTHAT'S THE WAY WE DO IT (BABY)\r\nJUST ONE ON ONE!\r\nTHAT'S THE WAY WE DO IT [x2]\n\n[CHORUS]\n\n[ALL:] GIRL, THIS PARADISE IS OURS\r\nTHE PLANET MOON AND STARS\r\nBELIEVE IN ME BABY\r\nJUST ONE ON ONE!.\n[CHORUS & HOOK SUNG TOGETHER OVERLAPPING]\n\n[ALL:] BELIEVE IN ME BABY\n\n[ALL:] JUST ONE ON ONE, OOOOH\r\nJUST ONE ON ONE, OOOOH [x2]\n\n[ALL]: BELIEVE IN ME BABY\n", 'London, United KingdomOslo, Norway', [], '1998–2002, 2009–present']
I want to replace all the commas with }, but there are too many commas in the "lyrics" column, so I tried using regex.
for i in range(0,len(dat)):
    dat[i] = re.sub('(['"])," , '}',dat[i])
    dat[i] = re.sub('(\])," , ']}',dat[i])
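Edit
I am using @Wiktor Stribiżew's solution, but I found a new problem. Sometimes the pattern also matches commas inside the lyrics text, so the line is split more often than needed.
This is one of the lines causing the problem: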
['A1_(group)', 'album: "The A List" (2000)', "Livin' The Dream", "\n\r\nWhere have you been all my life? \r\nWhere have you come from? \r\nIs this your first time too? \r\nIt's like I've known you in some other lifetime. \r\nWe're part of the great plan. \r\nLike two stars the shine.\n\n[Pre-Chorus:]\r\nI stood here watchin', while it only ever happened to friends. \r\nNow I don't have to pretend.\n\n[Chorus:]\r\nI can't believe we're living the dream\r\nWe're diggin that scene. \r\nWe finally made it through the fire. \r\nSomething 'bout you blows me away, like night over day. \r\nKissing the loneliness goodbye yeah.\n\r\nTrue love, true love. \r\nBaby could this be. \r\nTrue love, true love happenin' to me? \n\r\nI've been waiting for you all my life\r\nAnticipating with ever dream ever night. \r\nDestiny's moment we all share in time love is the message. \r\nAnd I know I've got mine¡-\n\n[Pre-Chorus]\n\n[Chorus]\n\r\nTrue love, true love. \r\nBaby could this be. \r\nTrue love, true love happenin' to me? \r\nTrue love, true love. \r\nBaby could this be. \r\nTrue love, true love happenin' to me?\n\n[Chorus]\n\n[Outro:]\r\nTrue love, true love. Baby could this be. \r\nTrue love, true love happenin' to me? \r\nTrue love, true love. Baby could this be. \r\nTrue love, true love happenin' to me?\n", 'London, United KingdomOslo, Norway', [], '1998–2002, 2009–present']
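In this case there are words such as watchin', that end with an apostrophe followed by a comma.
How can I solve this? I am fairly sure it takes more than one step, perhaps removing the commas first? But if I do that, I don't know what pattern to follow to split the string.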
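You can use str.replace(), which replaces every occurrence of a character with the one you want. It returns a new string, so assign the result.
In your case:
dat = dat.replace(',', '}')
Or, if it is a list:
dat = [x.replace(',', '}') for x in dat]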
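As a quick illustration of what this call does, here is a minimal sketch on a made-up, shortened line (the `sample` value is hypothetical and not taken from the file). Note that str.replace() has no notion of fields or quotes, so the comma inside the last field is replaced as well:

```python
# Hypothetical shortened line; the real lines are much longer.
sample = "['A1_(group)', 'Forever In Love', 'And I know, yes I know']"

# str.replace() swaps every comma, including the one inside the lyrics field.
replaced = sample.replace(',', '}')
print(replaced)

# The same call applied to each string in a list.
dat = [sample, "['a', 'b, c']"]
dat = [x.replace(',', '}') for x in dat]
print(dat[1])
```

If only the commas between fields should change, a plain replace is probably too broad for this data.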
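Your string literals are not valid, and the backreference syntax is wrong. In Python, a backreference to the first capture group is written \1. If you do not want to double the backslash, remember to use a raw string literal (r'\1' is the same as '\\1').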
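Be sure to write it like this:
for i in range(len(dat)):
    # \1 puts the captured quote back in front of the }
    dat[i] = re.sub(r'''(['"]),''', r'\1}', dat[i])
    dat[i] = re.sub(r'''(\]),''', r']}', dat[i])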
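Regarding the follow-up about words like watchin', in the lyrics: a pattern that looks for a quote followed by a comma cannot tell an apostrophe inside the text from a closing quote. One possible alternative, sketched below, is to avoid pattern matching and parse each line instead. This is a different technique from the regex above, and it assumes every line is a valid Python list literal, which the sample lines appear to be. The names `line` and `to_braced` are hypothetical, and the sample line is shortened and made up.

```python
import ast
import re

# Hypothetical shortened line with the same shape as the problem row.
line = """['A1_(group)', "Livin' The Dream", "I stood here watchin', while it happened", [], '1998-2002, 2009-present']"""

# A combined form of the two patterns above. It also fires on the
# apostrophe + comma inside the lyrics, so it yields one } too many.
by_regex = re.sub(r'''(['"\]]),''', r'\1}', line)
print(by_regex.count('}'))

def to_braced(text):
    """Join the top-level fields of a list literal with } instead of commas."""
    # Assumption: text is a valid Python list literal.
    fields = ast.literal_eval(text)
    # repr() re-quotes each field; commas inside a field are left untouched.
    return '}'.join(repr(field) for field in fields)

print(to_braced(line))
```

With five fields there should be exactly four separators, which is what the parsed version produces. If some lines are not valid literals, ast.literal_eval raises an exception, so those lines would need separate handling.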