Removing characters from string values in a DataFrame column

Question

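I hope you can help me with this. I have a column of numeric values stored as strings. Because the data comes from different countries, some values use different formatting, such as "," and "$". I am trying to convert the Series to numbers, but the values containing "," and "$" are causing problems.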

import pandas as pd

data={"valores":[1,1,3,"4","5.00","1,000","$5,700"]}
df=pd.DataFrame(data)
df

    valores
0   1
1   1
2   3
3   4
4   5.00
5   1,000
6   $5,700

I tried the following:

df["valores"].replace(",","")

But it doesn't change anything, because the "," is only part of a longer string, not the entire cell value.

pd.to_numeric(df["valores"])

But I get the error: ValueError: Unable to parse string "1,000" at position 5.

valores=[i.replace(",","") for i in df["valores"].values]

But I get the error: AttributeError: 'int' object has no attribute 'replace'.

So, finally, I tried this:

valores=[i.replace(",","") for i in df["valores"].values if type(i)==str]
valores
['4', '5.00', '1000', '$5700']

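But it skips the first three values, because they are not strings.

I think I could handle this with a regex, but I just don't understand how to use one.

I hope you can help me, because I have been struggling with this for about 7 hours.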

Answer 1

By default, .replace matches against the whole cell value. Since you want to replace part of a string, you need either .str.replace or replace(..., regex=True):

df['valores'] = df["valores"].replace(",","", regex=True)

Or:

df['valores'] = df["valores"].str.replace(",","")
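The difference between the two kinds of matching can be seen in a minimal sketch. It assumes the question's data, with "$5,700" as the last value. One caveat: on a column that mixes ints and strings, the .str accessor returns NaN for the non-string cells, so the regex=True form (or casting with .astype(str) first) is probably the safer choice here.

```python
import pandas as pd

s = pd.Series([1, 1, 3, "4", "5.00", "1,000", "$5,700"], name="valores")

# Without regex=True, replace only matches cells whose whole value equals ",".
whole_cell = s.replace(",", "")

# With regex=True, the pattern is applied inside each string cell;
# non-string cells (the ints) are left untouched.
substring = s.replace(",", "", regex=True)

# The .str accessor only handles strings: the int cells come back as NaN.
via_str = s.str.replace(",", "", regex=False)

print(whole_cell.tolist())
print(substring.tolist())
print(via_str.tolist())
```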

Answer 2

You can try this:

df['valores'] = df['valores'].replace(to_replace=r'[,$]',value='',regex=True).astype(float)
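For anyone unsure what the pattern does: [,$] is a character class that matches a single "," or "$" (inside brackets, "$" is a literal character, not an end-of-string anchor). A small sketch with Python's re module, using made-up sample strings, shows the effect outside of pandas:

```python
import re

# Character class: match one comma or one dollar sign.
pattern = re.compile(r"[,$]")

samples = ["1,000", "$5,700", "5.00"]

# Remove every match, then parse what is left as a float.
cleaned = [pattern.sub("", text) for text in samples]
numbers = [float(text) for text in cleaned]

print(cleaned)  # ['1000', '5700', '5.00']
print(numbers)  # [1000.0, 5700.0, 5.0]
```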

Answer 3

You should first convert each value to a string, like this:

valores=[str(i).replace(",","") for i in df["valores"].values]
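Note that the list above still contains strings, and it does not remove "$". A possible way to finish the job, sketched under the assumption that the last value in the question's data is "$5,700", is to strip both characters and then hand the list to pd.to_numeric:

```python
import pandas as pd

df = pd.DataFrame({"valores": [1, 1, 3, "4", "5.00", "1,000", "$5,700"]})

# str(i) makes every value a string, so .replace is always available.
cleaned = [str(i).replace(",", "").replace("$", "") for i in df["valores"]]

# Assign the parsed numbers back to the column.
df["valores"] = pd.to_numeric(cleaned)

print(df["valores"].tolist())
```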

Answer 4

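You need to convert the values in the valores column to strings with .astype(str), then remove every $ and , with .str.replace('[,$]', '', regex=True) (pass regex=True so the pattern is treated as a regular expression rather than a literal string), and then you can use pd.to_numeric to convert all the data to numbers:

>>> pd.to_numeric(df["valores"].astype(str).str.replace("[,$]","",regex=True))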
0       1.0
1       1.0
2       3.0
3       4.0
4       5.0
5    1000.0
6    5700.0
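If the same cleanup is needed for several columns, the steps could be wrapped in a small helper. This is only a sketch: the name clean_numeric and its strip_chars parameter are made up for illustration, and errors="coerce" is an added assumption so that values that still cannot be parsed become NaN instead of raising.

```python
import re

import pandas as pd


def clean_numeric(series: pd.Series, strip_chars: str = ",$") -> pd.Series:
    """Strip the given characters and parse what remains as numbers."""
    # Build a character class such as [,\$] from the characters to remove.
    pattern = "[" + re.escape(strip_chars) + "]"
    as_text = series.astype(str).str.replace(pattern, "", regex=True)
    # errors="coerce" turns anything still unparseable into NaN.
    return pd.to_numeric(as_text, errors="coerce")


result = clean_numeric(pd.Series([1, "4", "1,000", "$5,700", "n/a"]))
print(result.tolist())
```

The "n/a" entry is a hypothetical bad value, included only to show the coerce behavior.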
