'set' does not work to get unique values in a column of lists in pandas

Question

I am not sure why 'set' does not return the unique values in the following example:

import pandas as pd

df6 = pd.DataFrame({
    'Name': ['Sara', 'John'],
    'one': ['UK', 'UK'],
    'two': ['IN', 'SA'],
    'three': ['IN', 'IN'],
    'four': ['IN', 'US']
})

df6

which gives:

    Name    one     two    three    four
0   Sara    UK      IN     IN       IN
1   John    UK      SA     IN       US

I concatenated the columns 'one' to 'four' into a list:

df6['Concat'] = df6[['one','two','three','four']].apply(lambda x: [', '.join(x[x.notnull()])], axis = 1)

which gives:

    Name    one     two    three    four    Concat
0   Sara    UK      IN     IN       IN      [UK, IN, IN, IN]
1   John    UK      SA     IN       US      [UK, SA, IN, US]

Now I only want the unique values in the Concat column for each name. I tried the following:

df6.Concat.apply(set)

But the result contains the same values as the original lists!

0    {UK, IN, IN, IN}
1    {UK, SA, IN, US}
Name: Concat, dtype: object

Why does 'set' not work in this case?

I do not need the unique list to be sorted, but for the sake of learning: how would I get the unique values in sorted order?

Answer 1

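Each entry in your Concat column is a list that contains a single string (for example ['UK, IN, IN, IN']), not a list of separate strings. When you apply set() to such a list, you get a set containing that one string, so there is nothing to deduplicate. Apply set() to the original data columns instead: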

df6[['one','two','three','four']].apply(set, axis=1)
#0            {IN, UK}
#1    {SA, IN, UK, US}

The axis=1 argument tells apply() to call set() row-wise, that is, once per row.
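To confirm the diagnosis and to cover the follow-up about sorted unique values, here is a minimal sketch. It rebuilds the question's df6; the names cols, concat, first_len, sorted_unique and ordered_unique are only illustrative, and the dropna() calls are an assumption on my part that some cells might be missing (the sample data has none).

```python
import pandas as pd

df6 = pd.DataFrame({
    'Name': ['Sara', 'John'],
    'one': ['UK', 'UK'],
    'two': ['IN', 'SA'],
    'three': ['IN', 'IN'],
    'four': ['IN', 'US'],
})
cols = ['one', 'two', 'three', 'four']

# The question's Concat column: each cell is a one-element list
# holding the joined string, e.g. ['UK, IN, IN, IN'].
concat = df6[cols].apply(lambda x: [', '.join(x[x.notnull()])], axis=1)
first_len = len(concat.iloc[0])  # 1, so set() has nothing to deduplicate

# Unique values per row, sorted alphabetically.
sorted_unique = df6[cols].apply(
    lambda row: sorted(set(row.dropna())), axis=1
)

# Unique values per row in first-seen order
# (dict keys keep insertion order in Python 3.7+).
ordered_unique = df6[cols].apply(
    lambda row: list(dict.fromkeys(row.dropna())), axis=1
)

print(first_len)
print(sorted_unique)
print(ordered_unique)
```

A set has no defined order, so if the order matters it is safer to convert to a list with sorted() or dict.fromkeys() as above rather than rely on how the set happens to print.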

