Convert column of lists to integer

Question

I am trying to convert the labels to integers after encoding, but they have dtype object, so I first convert them to strings:

train_df["labels"] = train_df["labels"].astype(str).astype(int)

I got this error:

invalid literal for int() with base 10: '[0, 1, 0, 0]'

An example row of the dataset is:

text                        labels
[word1,word2,word3,word4]    [1,0,1,0]

Answer 1

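After train_df["labels"].astype(str), each element of the Series is the string form of a whole list, such as '[0, 1, 0, 0]'. A whole list cannot be converted to a single int, which is why astype(int) fails.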

If each element of train_df["labels"] is of type list, you can do this:

train_df["labels"].apply(lambda x: [int(el) for el in x])

If the elements are of type str, you can do this:

train_df["labels"].apply(lambda x: [int(el) for el in x.strip("[]").split(",")])
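If it is unclear which of the two cases applies, a small helper can handle both. This is only a minimal sketch: the function name to_int_list and the sample frame are made up for illustration and do not come from the question.

```python
import pandas as pd

def to_int_list(value):
    """Return a list of ints from a list or from a string like "[1, 0, 1, 0]"."""
    if isinstance(value, str):
        # Drop the surrounding brackets, split on commas, skip empty pieces.
        value = [el for el in value.strip("[]").split(",") if el.strip()]
    return [int(el) for el in value]

# Hypothetical sample data: one row holds a real list, the other a string.
train_df = pd.DataFrame({"labels": [["1", "0", "1", "0"], "[0, 1, 0, 0]"]})
train_df["labels"] = train_df["labels"].apply(to_int_list)
```

The column still has dtype object afterwards, because each cell holds a Python list, but every item inside those lists is now an int.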

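You probably want to train some model, but you cannot do that with a pd.Series of lists. You need to convert it into a DataFrame. Without seeing more than one row of data, I can't say how to do that.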
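One possible way to do that conversion, assuming every row holds a list of the same length, is to expand the lists into one column per position. The sample rows and the label_0, label_1, ... column names below are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: every row holds a list of four integer labels.
train_df = pd.DataFrame({
    "text": [["word1", "word2", "word3", "word4"],
             ["word5", "word6", "word7", "word8"]],
    "labels": [[1, 0, 1, 0], [0, 1, 0, 0]],
})

# Build one integer column per list position, keeping the original row index.
labels_df = pd.DataFrame(train_df["labels"].tolist(), index=train_df.index)
labels_df.columns = [f"label_{i}" for i in labels_df.columns]
```

If the lists differ in length, pandas pads the shorter rows with missing values and the affected columns will no longer have an integer dtype, so this sketch is only appropriate for fixed-length label vectors.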

Answer 2

On the face of it, your problem may be that the numbers stored as strings are floats. If that is the problem, the following should fix it:

train_df["labels"] = train_df["labels"].astype(str).astype(float).astype(int)

(In Python, you cannot convert the string representation of a float directly to int.)
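The difference between the two conversions can be checked in plain Python. The helper name to_int is hypothetical and only used for this illustration.

```python
def to_int(text):
    """Convert a numeric string such as "1" or "1.0" to int by going through float."""
    return int(float(text))

# int() rejects a string that contains a decimal point.
try:
    int("1.0")
    direct_conversion_failed = False
except ValueError:
    direct_conversion_failed = True

# Going through float first succeeds.
converted = to_int("1.0")
```

Note that int() truncates toward zero, so a value such as "1.7" would become 1 rather than 2.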

Based on the error, I suspect your strings actually contain square brackets and commas (this is not crystal clear from the question). If that is the case, you need to tell Python how to handle them. For example, if each value in train_df["labels"] is a string such as "[1,0,1,0]", then you can use the following:

train_df_labels = train_df["labels"].apply(lambda s: [int(label.strip()) for label in s[1:-1].split(",")])

# first, drop the surrounding brackets from each string,
# then split the string at commas and strip the spaces,
# finally, convert the values to int one by one and collect them in a list
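As an alternative to slicing and splitting by hand, the standard library function ast.literal_eval can parse a string that looks like a Python list. This is a swapped-in technique, not the one used in the answer, and the helper name parse_labels is hypothetical.

```python
import ast

def parse_labels(text):
    """Parse a string such as "[1,0,1,0]" into a list of ints."""
    values = ast.literal_eval(text)  # evaluates Python literals only, not arbitrary code
    return [int(v) for v in values]

# Spaces after the commas do not matter to the parser.
parsed = [parse_labels(s) for s in ["[1,0,1,0]", "[0, 1, 0, 0]"]]
```

With a DataFrame, the same helper could be applied to the whole column with train_df["labels"].apply(parse_labels), assuming every cell is a string of this form.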

python

dataframe

pandas

bert-language-model