How do I turn a column of lists into strings?
   Speaker ID  Utterances
0  S1          [alright Sue now it's like uh i dropped like C...
1  S2          [this year? this term?, ri- oh but you dropped...
2  S3          [yeah. hi, hi, yeah i already signed [S2: okay...
3  S4          [back in i was like w- what is that?, yeah and...
4  S5          [okay well i'm not here for a drop-add class [...
5  S6          [me, yeah. that's right, i have a question lik...
6  S7          [hello, hi, what was your name?, i thought i o...
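Actually, the end goal is to create a new column in which everything under the 'Utterances' column has had its punctuation removed and has been tokenized. I just need to convert the lists of strings into strings first, right?
P.S. I know the formatting is strange, but I don't know how to fix it, and I haven't found an answer anywhere. If someone could tell me how to include the text I'm working with so that it doesn't look strange, that would be great. Thanks!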
One idea could be:
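import re
from string import punctuation
import pandas as pd
# Sample data: each cell holds a list of strings
df = pd.DataFrame({'Utterances': [["me, yeah. that's right, i have a question lik"], ["hello, hi, what was your name?, i thought i o"]]})
# Join each list into a single space-separated string
df['Utterances'] = df['Utterances'].str.join(' ')
# Build an alternation that matches any single ASCII punctuation character
pattern = r'|'.join([re.escape(e) for e in punctuation])
# regex=True is needed on pandas 2.0+, where str.replace treats the pattern literally by default
df['Utterances'] = df['Utterances'].str.replace(pattern, '', regex=True)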
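Since the stated end goal is a new column with punctuation removed and the text tokenized, here is a minimal sketch of how the steps might be chained. The sample rows, the `Tokens` column name, and the choice of plain whitespace splitting as the "tokenizer" are assumptions for illustration; a dedicated tokenizer could be swapped in if whitespace splitting is too crude for your data.

```python
import re
from string import punctuation

import pandas as pd

# Hypothetical sample data: each cell is a list of utterance strings.
df = pd.DataFrame({
    "Speaker ID": ["S6", "S7"],
    "Utterances": [
        ["me, yeah.", "that's right,", "i have a question"],
        ["hello, hi,", "what was your name?"],
    ],
})

# Character class that matches any single ASCII punctuation character.
pattern = "[" + re.escape(punctuation) + "]"

# "Tokens" is an assumed column name; the original column is left untouched.
df["Tokens"] = (
    df["Utterances"]
    .str.join(" ")                          # list of strings -> one string
    .str.replace(pattern, "", regex=True)   # strip punctuation
    .str.split()                            # split on whitespace -> list of tokens
)

print(df["Tokens"].tolist())
```

Note that stripping every punctuation character also removes apostrophes and hyphens, so "that's" becomes "thats". If that matters, drop those characters from the pattern.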