How to remove entries when converting dictionary to Pandas?

I have user ratings stored in a dictionary named user_dict, as shown below:

{'U1': [3, 4, 2, 5, 0, 4, 1, 3, 0, 0, 4], 
'U2': [2, 3, 1, 0, 3, 0, 2, 0, 0, 3, 0], 
'U3': [0, 4, 0, 5, 0, 4, 0, 3, 0, 2, 4], 
'U4': [0, 0, 2, 1, 4, 3, 2, 0, 0, 2, 0], 
'U5': [0, 0, 0, 5, 0, 4, 0, 3, 0, 0, 4], 
'U6': [2, 3, 4, 0, 3, 0, 3, 0, 3, 4, 0], 
'U7': [0, 4, 3, 5, 0, 5, 0, 0, 0, 0, 4], 
'U8': [4, 3, 0, 3, 4, 2, 2, 0, 2, 3, 2], 
'U9': [0, 2, 0, 3, 1, 0, 1, 0, 0, 2, 0], 
'U10': [0, 3, 0, 4, 3, 3, 0, 3, 0, 4, 4],  
'U11': [2, 2, 1, 2, 1, 0, 2, 0, 1, 0, 2], 
'U12': [0, 4, 4, 5, 0, 0, 0, 3, 0, 4, 5], 
'U13': [3, 3, 0, 2, 2, 3, 2, 0, 2, 0, 3], 
'U14': [0, 3, 4, 5, 0, 5, 0, 0, 0, 4, 0], 
'U15': [2, 0, 0, 3, 0, 2, 2, 3, 0, 0, 3], 
'U16': [4, 4, 0, 4, 3, 4, 0, 3, 0, 3, 0], 
'U17': [0, 2, 0, 3, 1, 0, 2, 0, 1, 0, 3], 
'U18': [2, 3, 1, 0, 3, 2, 3, 2, 0, 2, 0], 
'U19': [0, 5, 0, 4, 0, 3, 0, 4, 0, 0, 5], 
'U20': [0, 0, 3, 0, 3, 0, 4, 0, 2, 0, 0], 
'U21': [3, 0, 2, 4, 2, 3, 0, 4, 2, 3, 3], 
'U22': [4, 4, 0, 5, 3, 5, 0, 4, 0, 3, 0], 
'U23': [3, 0, 0, 0, 3, 0, 2, 0, 0, 4, 0], 
'U24': [4, 0, 3, 0, 3, 0, 3, 0, 0, 2, 2], 
'U25': [0, 5, 0, 3, 3, 4, 0, 3, 3, 4, 4]}

When I load this dictionary into a Pandas DataFrame, I want the DataFrame to have 3 columns: 'User', 'Agent' and 'Rating', so I run this code:

import pandas as pd

DF = pd.DataFrame()
for key in user_dict.keys():
  # Build one small frame per user, then append it to the running result.
  df = pd.DataFrame(columns=['User', 'Agent', 'Rating'])
  df['Rating'] = pd.Series(user_dict[key])
  df['Agent'] = pd.DataFrame(df.index)
  df['User'] = key

  DF = pd.concat([DF, df], axis=0)

DF = DF.reset_index(drop=True)

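However, I don't want to add any entries with a rating of 0, because a 0 means the user has not rated that 'Agent' yet. How can I make the program either not add, or remove, the entries with a rating of 0?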

You can reshape with DataFrame.unstack on a frame built by the DataFrame constructor, then filter out the 0 values by comparing for not equal, set the index names so they become the new column names, and finally use Series.reset_index:

DF = (pd.DataFrame(user_dict)
        .unstack()
        .loc[lambda x: x != 0]
        .rename_axis(('User','Agent'))
        .reset_index(name='Rating'))
print(DF)
    User  Agent  Rating
0     U1      0       3
1     U1      1       4
2     U1      2       2
3     U1      3       5
4     U1      5       4
..   ...    ...     ...
155  U25      5       4
156  U25      7       3
157  U25      8       3
158  U25      9       4
159  U25     10       4

[160 rows x 3 columns]
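
To make the chain above easier to follow, here is a minimal sketch that runs the same steps one at a time. The three-user `sample` dictionary and the intermediate names (`wide`, `stacked`, `rated`, `tidy`) are made up for illustration and are not part of the question's data:

```python
import pandas as pd

# Hypothetical sample, much smaller than the question's user_dict.
sample = {'U1': [3, 0, 2], 'U2': [0, 0, 5], 'U3': [1, 4, 0]}

# One column per user; the default index 0..2 plays the role of the agent number.
wide = pd.DataFrame(sample)

# On a frame with a flat index, unstack() returns a Series
# indexed by (column label, row label), i.e. (user, agent).
stacked = wide.unstack()

# Keep only the ratings that are not 0.
rated = stacked[stacked != 0]

# Name the two index levels, then turn them into regular columns.
tidy = rated.rename_axis(('User', 'Agent')).reset_index(name='Rating')
print(tidy)
```

Printing `stacked` before the filter is a quick way to check that the (user, agent) pairs line up as expected.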

Another idea is to filter in the last step with DataFrame.query:

DF = (pd.DataFrame(user_dict)
        .unstack()
        .rename_axis(('User','Agent'))
        .reset_index(name='Rating')
        .query('Rating != 0'))
print(DF)
    User  Agent  Rating
0     U1      0       3
1     U1      1       4
2     U1      2       2
3     U1      3       5
5     U1      5       4
..   ...    ...     ...
269  U25      5       4
271  U25      7       3
272  U25      8       3
273  U25      9       4
274  U25     10       4

[160 rows x 3 columns]
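
Note that in this second output the row labels have gaps (4 is missing, and the last label is 274), because query filters after the index has already been created. If consecutive labels are wanted, one option is a final reset_index(drop=True). A small sketch, again using a made-up `sample` dictionary and illustrative variable names rather than the original data:

```python
import pandas as pd

# Hypothetical sample, not the question's user_dict.
sample = {'U1': [3, 0, 2], 'U2': [0, 0, 5]}

long_df = (pd.DataFrame(sample)
             .unstack()
             .rename_axis(('User', 'Agent'))
             .reset_index(name='Rating'))

# query() keeps the row labels assigned before filtering, so gaps appear.
filtered = long_df.query('Rating != 0')
print(filtered.index.tolist())

# Dropping the old labels renumbers the remaining rows from 0.
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())
```

Whether the gaps matter depends on how the frame is used afterwards; the User, Agent and Rating columns are the same either way.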