How to get only the words from a string in Python?

Question

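I am new to pandas and I have run into a string problem. I have a string s = "'hi'+'bikes'-'cars'>=20+'rangers'" and I want only the words from it, not the symbols or the integers. How can I do that?

My input: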

s = "'hi'+'bikes'-'cars'>=20+'rangers'"

Expected output:

s = "'hi','bikes','cars','rangers'"

Answer 1

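Try using a regular expression:

import re

s = "'hi'+'bikes'-'cars'>=20+'rangers'"
# Match runs of ASCII letters only (note the range is A-Z, not A-z)
samp = re.compile('[a-zA-Z]+')
words = samp.findall(s)
print(words)  # ['hi', 'bikes', 'cars', 'rangers']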
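
A short sketch of why the exact character range matters in a pattern like this. The class `[A-z]` covers every ASCII code point from `A` to `z`, which also includes `[`, `\`, `]`, `^`, `_` and the backtick, so it can let some symbols through. The `sample` string below is a made-up example, not part of the question.

```python
import re

s = "'hi'+'bikes'-'cars'>=20+'rangers'"

# [A-Za-z] matches ASCII letters only.
letters_only = re.findall(r"[A-Za-z]+", s)

# [A-z] spans code points 65 ('A') through 122 ('z'), which also
# contains [ \ ] ^ _ and the backtick, so those symbols are matched too.
sample = "snake_case^word"  # hypothetical input for illustration
loose = re.findall(r"[A-z]+", sample)
strict = re.findall(r"[A-Za-z]+", sample)

print(letters_only)  # ['hi', 'bikes', 'cars', 'rangers']
print(loose)         # ['snake_case^word']
print(strict)        # ['snake', 'case', 'word']
```

For the string in the question both classes happen to give the same result, because none of the extra symbols appear in it.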

Answer 2

I am not sure about pandas, but you can also do this with a regex. Here is a solution:

import re


s = "'hi'+'bikes'-'cars'>=20+'rangers'"
words = re.findall("(\'.+?\')", s)
output = ','.join(words)

print(output)
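
A possible variant, offered as a sketch: the pattern above captures anything between two single quotes, so a quoted number such as '20' would also be kept. Assuming the words of interest consist only of ASCII letters, the capture group can be restricted to letters and the quotes added back when joining. The name `bare_words` is just illustrative.

```python
import re

s = "'hi'+'bikes'-'cars'>=20+'rangers'"

# Capture only runs of letters that sit between a pair of single quotes.
# With one capture group, findall returns the group contents (no quotes).
bare_words = re.findall(r"'([A-Za-z]+)'", s)

# Put the quotes back and join with commas to build the requested string.
output = ",".join(f"'{w}'" for w in bare_words)

print(bare_words)  # ['hi', 'bikes', 'cars', 'rangers']
print(output)      # 'hi','bikes','cars','rangers'
```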

Answer 3

For pandas, I would first convert the column in the DataFrame to the string dtype:

df
                                   a  b
0  'hi'+'bikes'-'cars'>=20+'rangers'  1
1      random_string 'with'+random,#  4
2             more,weird/stuff=wrong  6

df["a"] = df["a"].astype("string")

df["a"]
0    'hi'+'bikes'-'cars'>=20+'rangers'
1        random_string 'with'+random,#
2               more,weird/stuff=wrong
Name: a, dtype: string

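Now you can see that the dtype is string, which means you can perform string operations on it, including translate and split (see the pandas string documentation). But first you have to build a translation table from the punctuation and digits constants, which are imported from the string module (see the string docs):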

from string import digits, punctuation

Then build a dictionary that maps every digit and punctuation character to a space:

from itertools import chain
t = {k: " " for k in chain(punctuation, digits)}

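Create the translation table with str.maketrans (in Python 3 this is a built-in static method of str, so no import is needed; in Python 2 it lived in the string module), then apply the translation and the split to the column, using the .str accessor in between: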

t = str.maketrans(t)

df["a"] = df["a"].str.translate(t).str.split()
df
                                a  b
0      [hi, bikes, cars, rangers]  1
1  [random, string, with, random]  4
2     [more, weird, stuff, wrong]  6

As you can see, only the words are left now.
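
The same translation-table idea can be tried on a plain Python string without pandas. This is a minimal sketch; the helper name `words_only` is hypothetical and not part of pandas or the standard library.

```python
from string import digits, punctuation


def words_only(text):
    """Return the words in text, treating digits and punctuation as separators."""
    # Map every punctuation character and digit to a space.
    table = str.maketrans({ch: " " for ch in punctuation + digits})
    # Replace the unwanted characters, then split on whitespace.
    return text.translate(table).split()


print(words_only("'hi'+'bikes'-'cars'>=20+'rangers'"))
# ['hi', 'bikes', 'cars', 'rangers']
print(words_only("random_string 'with'+random,#"))
# ['random', 'string', 'with', 'random']
```

Note that the underscore counts as punctuation here, so random_string is split into two words, which matches the DataFrame output above.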
