How to solve the "malformed node or string" error in pandas?
Here I have this dataframe. I'm trying to remove the duplicate elements from each array in column 2 and store the resulting array in column 3.
Column1 Column2 Column3
0 [ABC|QWER|12345, ABC|QWER|12345] [ABC|QWER|12345]
1 [TBC|WERT|567890,TBC|WERT|567890] [TBC|WERT|567890]
2 [ERT|TYIO|9845366, ERT|TYIO|9845366,ERT|TYIO|5] [ERT|TYIO|9845366, ERT|TYIO|5]
3 NaN NaN
4 [SAR|QWPO|34564557,SAR|QWPO|3456455] [SAR|QWPO|34564557,SAR|QWPO|3456455]
5 NaN NaN
6 [SE|WERT|12233412] [SE|WERT|12233412]
7 NaN NaN
I'm using the following code, but it raises a "malformed node or string" error. Please help me fix this.
import ast

def ddpe(a):
    # literal_eval parses a string such as "['a', 'b']" into a Python list
    return list(dict.fromkeys(ast.literal_eval(a)))

df['column3'] = df['column2'].apply(ddpe)
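A minimal sketch that reproduces the error outside pandas, assuming column2 holds strings exactly as printed in the table (unquoted elements) plus float NaN for the missing rows. `ast.literal_eval` only accepts valid Python literals, so an unquoted token such as `ABC|QWER|12345` (which parses as a bitwise-or expression over names) and a float `nan` are both rejected with `ValueError`. The `samples` list is hypothetical illustration data.

```python
import ast

# Hypothetical sample values, assumed to match what column2 holds
samples = ["[ABC|QWER|12345, ABC|QWER|12345]", float("nan")]

errors = []
for value in samples:
    try:
        ast.literal_eval(value)
    except ValueError as exc:
        # Both inputs fail with "malformed node or string ..."
        errors.append(str(exc))

for msg in errors:
    print(msg)

# A string whose elements are quoted is a valid literal and parses fine
print(ast.literal_eval("['ABC|QWER|12345', 'ABC|QWER|12345']"))
```

Quoting the elements (or skipping `literal_eval` entirely) is what makes the difference.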
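I assume the values of 'column2' are strings, since you are trying to use ast.literal_eval. literal_eval only accepts valid Python literals, and a string like "[ABC|QWER|12345, ABC|QWER|12345]" is not one because its elements are not quoted; the NaN rows are floats rather than strings, so they fail too. In that case, parse the strings yourself instead. Try this: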
import pandas as pd
import numpy as np

def ddpe(str_val):
    if pd.isna(str_val):  # return NaN if value is NaN
        return np.nan
    # Remove the square brackets, split on ',' and strip possible
    # whitespace between elements
    vals = [v.strip() for v in str_val.strip('[]').split(',')]
    # remove duplicates keeping the original order
    return list(dict.fromkeys(vals))

df['column3'] = df['column2'].apply(ddpe)
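A usage sketch of the string version on the data from the question, assuming column2 is stored as plain strings with NaN for the empty rows. The DataFrame construction below is a hypothetical reproduction of the table, not the asker's actual loading code.

```python
import numpy as np
import pandas as pd

def ddpe(str_val):
    # Missing rows stay NaN
    if pd.isna(str_val):
        return np.nan
    # Drop the brackets, split on ',' and trim spaces around each element
    vals = [v.strip() for v in str_val.strip('[]').split(',')]
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(vals))

# Hypothetical reproduction of the question's table, values stored as strings
df = pd.DataFrame({
    'column2': [
        "[ABC|QWER|12345, ABC|QWER|12345]",
        "[TBC|WERT|567890,TBC|WERT|567890]",
        "[ERT|TYIO|9845366, ERT|TYIO|9845366,ERT|TYIO|5]",
        np.nan,
        "[SAR|QWPO|34564557,SAR|QWPO|3456455]",
        np.nan,
        "[SE|WERT|12233412]",
        np.nan,
    ]
})

df['column3'] = df['column2'].apply(ddpe)
print(df['column3'].tolist())
```

Stripping each piece matters here because the separators in the sample are inconsistent (some have a space after the comma, some do not).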
If the column values are already lists, you only need:
import numpy as np

def ddpe(lst_val):
    # Return NaN if the value is not a list.
    # Assuming those are the only two options.
    if not isinstance(lst_val, list):
        return np.nan
    return list(dict.fromkeys(lst_val))

df['column3'] = df['column2'].apply(ddpe)
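To decide which version applies, one option is to inspect the type of a non-missing value first. The sketch below assumes hypothetical sample data where column2 already holds real Python lists, then applies the list version.

```python
import numpy as np
import pandas as pd

def ddpe(lst_val):
    # Non-list values (here: NaN for missing rows) come back as NaN
    if not isinstance(lst_val, list):
        return np.nan
    # dict.fromkeys keeps first-seen order while dropping duplicates
    return list(dict.fromkeys(lst_val))

# Hypothetical sample where column2 already holds lists
df = pd.DataFrame({
    'column2': [
        ['ABC|QWER|12345', 'ABC|QWER|12345'],
        np.nan,
        ['ERT|TYIO|9845366', 'ERT|TYIO|9845366', 'ERT|TYIO|5'],
    ]
})

# str means the string version is needed; list means this one
print(type(df['column2'].dropna().iloc[0]))  # <class 'list'>

df['column3'] = df['column2'].apply(ddpe)
print(df['column3'].tolist())
```

If the check prints `<class 'str'>` instead, the string-parsing version is the one to use.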