How to remove entries when converting a dictionary to Pandas?
I have a dictionary of user ratings stored in user_dict, as shown below:
{'U1': [3, 4, 2, 5, 0, 4, 1, 3, 0, 0, 4],
'U2': [2, 3, 1, 0, 3, 0, 2, 0, 0, 3, 0],
'U3': [0, 4, 0, 5, 0, 4, 0, 3, 0, 2, 4],
'U4': [0, 0, 2, 1, 4, 3, 2, 0, 0, 2, 0],
'U5': [0, 0, 0, 5, 0, 4, 0, 3, 0, 0, 4],
'U6': [2, 3, 4, 0, 3, 0, 3, 0, 3, 4, 0],
'U7': [0, 4, 3, 5, 0, 5, 0, 0, 0, 0, 4],
'U8': [4, 3, 0, 3, 4, 2, 2, 0, 2, 3, 2],
'U9': [0, 2, 0, 3, 1, 0, 1, 0, 0, 2, 0],
'U10': [0, 3, 0, 4, 3, 3, 0, 3, 0, 4, 4],
'U11': [2, 2, 1, 2, 1, 0, 2, 0, 1, 0, 2],
'U12': [0, 4, 4, 5, 0, 0, 0, 3, 0, 4, 5],
'U13': [3, 3, 0, 2, 2, 3, 2, 0, 2, 0, 3],
'U14': [0, 3, 4, 5, 0, 5, 0, 0, 0, 4, 0],
'U15': [2, 0, 0, 3, 0, 2, 2, 3, 0, 0, 3],
'U16': [4, 4, 0, 4, 3, 4, 0, 3, 0, 3, 0],
'U17': [0, 2, 0, 3, 1, 0, 2, 0, 1, 0, 3],
'U18': [2, 3, 1, 0, 3, 2, 3, 2, 0, 2, 0],
'U19': [0, 5, 0, 4, 0, 3, 0, 4, 0, 0, 5],
'U20': [0, 0, 3, 0, 3, 0, 4, 0, 2, 0, 0],
'U21': [3, 0, 2, 4, 2, 3, 0, 4, 2, 3, 3],
'U22': [4, 4, 0, 5, 3, 5, 0, 4, 0, 3, 0],
'U23': [3, 0, 0, 0, 3, 0, 2, 0, 0, 4, 0],
'U24': [4, 0, 3, 0, 3, 0, 3, 0, 0, 2, 2],
'U25': [0, 5, 0, 3, 3, 4, 0, 3, 3, 4, 4]}
When I load this dictionary into a Pandas DataFrame, I want the DataFrame to have 3 columns: 'User', 'Agent', 'Rating', so I run this code:
import pandas as pd

DF = pd.DataFrame()
for key in user_dict.keys():
    # One small frame per user: the list position becomes the agent id
    df = pd.DataFrame(columns=['User', 'Agent', 'Rating'])
    df['Rating'] = pd.Series(user_dict[key])
    df['Agent'] = pd.DataFrame(df.index)
    df['User'] = key
    DF = pd.concat([DF, df], axis = 0)
DF = DF.reset_index(drop=True)
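However, I don't want to add any entries with a rating of 0, because that means the user has not rated that 'Agent' yet. How can I make the program either not add, or remove, the entries with a rating of 0?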
You can reshape with DataFrame.unstack after passing the dictionary to the DataFrame constructor, then filter out 0 by comparing for not equal, set the index names (they become the new column names), and finally use Series.reset_index:
DF = (pd.DataFrame(user_dict)
        .unstack()
        .loc[lambda x: x != 0]
        .rename_axis(('User','Agent'))
        .reset_index(name='Rating'))
print(DF)
    User  Agent  Rating
0     U1      0       3
1     U1      1       4
2     U1      2       2
3     U1      3       5
4     U1      5       4
..   ...    ...     ...
155  U25      5       4
156  U25      7       3
157  U25      8       3
158  U25      9       4
159  U25     10       4

[160 rows x 3 columns]
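To see what each step of that chain contributes, here is a minimal sketch on a tiny dictionary. The name `small_dict` and its values are made up for illustration and are not part of the question's data; the intermediate variable names are also just for readability.

```python
import pandas as pd

# Hypothetical toy data (not from the question): two users, three agents.
small_dict = {'U1': [3, 0, 2], 'U2': [0, 4, 0]}

# Columns are the users, the row index is the position in each list (the agent).
wide = pd.DataFrame(small_dict)

# unstack() turns the frame into a Series indexed by (user, agent) pairs.
stacked = wide.unstack()

# Keep only the entries whose rating is not 0 (0 means "not rated").
nonzero = stacked.loc[lambda s: s != 0]

# Name the two index levels, then move them into regular columns.
result = (nonzero
          .rename_axis(['User', 'Agent'])
          .reset_index(name='Rating'))
print(result)
```

Because the zeros are dropped before `reset_index`, the resulting rows are numbered consecutively from 0.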
Another idea is to filter in the last step with DataFrame.query:
DF = (pd.DataFrame(user_dict)
        .unstack()
        .rename_axis(('User','Agent'))
        .reset_index(name='Rating')
        .query('Rating != 0'))
print(DF)
    User  Agent  Rating
0     U1      0       3
1     U1      1       4
2     U1      2       2
3     U1      3       5
5     U1      5       4
..   ...    ...     ...
269  U25      5       4
271  U25      7       3
272  U25      8       3
273  U25      9       4
274  U25     10       4

[160 rows x 3 columns]
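Note that with query the filtering happens after the index was reset, so the surviving rows keep their original labels (hence the gaps such as 3, 5 and 269, 271 in the output). If a consecutive index is wanted, one option is a final `reset_index(drop=True)`. The sketch below shows this on a made-up `small_dict`, which is an assumption for illustration and not the question's data.

```python
import pandas as pd

# Hypothetical toy data (not from the question): two users, three agents.
small_dict = {'U1': [3, 0, 2], 'U2': [0, 4, 0]}

long_df = (pd.DataFrame(small_dict)
           .unstack()
           .rename_axis(['User', 'Agent'])
           .reset_index(name='Rating'))

# query() filters rows but leaves their labels untouched, so gaps remain.
filtered = long_df.query('Rating != 0')
print(filtered.index.tolist())

# Renumber the remaining rows from 0 and discard the old labels.
renumbered = filtered.reset_index(drop=True)
print(renumbered)
```

After the extra step the frame should match what the `.loc`-based version produces, including the index.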