Remove all characters and numbers except commas
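I'm trying to remove every character except commas from the strings in a DataFrame column, but my code still removes everything, including the commas.
I know this has been asked before, but I've tried many of the answers and all of them remove the commas too.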
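```python
import re

# The second alternative, [^0-9A-Za-z \t], matches every character that is not
# a digit, an ASCII letter, a space or a tab, so commas are deleted as well.
df[new_text_field_name] = df[new_text_field_name].apply(lambda elem: re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?", "", str(elem)))
```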
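The commas disappear because of the second alternative, `([^0-9A-Za-z \t])`: a negated character class matches every character that is *not* listed, and the comma is not listed. A minimal sketch on the sample text shows the difference once `,` is added to the class (digits are still kept here, since `0-9` is in the class).

```python
import re

sample = "100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene"

# The asker's character class: "," is not listed, so every comma is deleted.
without_comma = re.sub(r"[^0-9A-Za-z \t]", "", sample)
# Same class with "," added: commas are no longer matched, so they survive.
with_comma = re.sub(r"[^0-9A-Za-z, \t]", "", sample)

print(without_comma)
print(with_comma)
```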
Sample text:
'100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene'
Desired output:
'polyester, Paperboard, polypropylene'
One possible solution:
```python
# pip install pandas
import re

import pandas as pd

pd.set_option('display.max_colwidth', 200)

# set test data and create dataframe
data = {"text": ['100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene','Polypropylene plastic', '100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene', 'Bamboo, Clear nitrocellulose lacquer', 'Willow, Stain, Solid wood, Polypropylene plastic, Stainless steel, Steel, Galvanized, Steel, 100% polypropylene', 'Banana fibres, Clear lacquer', 'Polypropylene plastic (min. 20% recycled)']}
df = pd.DataFrame(data)

# Keep only letters, commas, spaces and parentheses (case-insensitive)
re_pattern = re.compile(r"[^a-z, ()]", re.I)

def cleanup(txt):
    # Delete disallowed characters, then collapse the double spaces left behind
    return re_pattern.sub("", txt).replace("  ", " ").strip()

df['text_cleaned'] = df['text'].apply(cleanup)
df  # displays the DataFrame in a notebook
```
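This adds a `text_cleaned` column. Because `(` and `)` are in the allowed set, parenthetical notes are only partly removed: the words `min` and `recycled` remain inside the brackets, so the result is close to, but not exactly, the desired output.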
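The same cleanup can also be written without `apply`, using pandas' vectorized `Series.str.replace` with `regex=True`. This is a sketch of an alternative, not part of the original answer; it spells out `A-Za-z` instead of passing `re.I`, which is equivalent for ASCII text, and uses `\s+` to collapse any run of spaces.

```python
import pandas as pd

df = pd.DataFrame({"text": [
    "100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene",
    "Bamboo, Clear nitrocellulose lacquer",
]})

# Same allowed set as cleanup(): letters, comma, space and parentheses.
df["text_cleaned"] = (
    df["text"]
    .str.replace(r"[^A-Za-z, ()]", "", regex=True)  # drop everything else
    .str.replace(r"\s+", " ", regex=True)           # collapse leftover spaces
    .str.strip()
)
print(df["text_cleaned"].tolist())
```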
The Java methods `Character.isDigit()` and `Character.isLetter()` can be used to test whether a character is a digit or a letter.
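In Python the closest equivalents are `str.isdigit()` and `str.isalpha()`. A minimal sketch of that character-by-character approach, using the hypothetical helper name `keep_letters_and_commas`, keeps letters, commas and spaces and then normalises whitespace. Note that `str.isalpha()` is Unicode-aware, so accented letters are kept as well.

```python
def keep_letters_and_commas(text: str) -> str:
    """Hypothetical helper: keep letters, commas and spaces, then
    collapse repeated whitespace into single spaces."""
    kept = "".join(ch for ch in text if ch.isalpha() or ch in ", ")
    return " ".join(kept.split())


print(keep_letters_and_commas(
    "100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene"
))
```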