How can I build a relation between columns in pandas?

Question

I have a dataframe from the LASTFM dataset with the columns user_id | friend_id, like this:

uid | fid
346 | 23
355 | 48

I want to add the relationship between users as a third column (a kind of adjacency vector), for example:

uid1 | uid2 | friends
23   | 48   | 0
23   | 56   | 0
23   | ..   | 0
23   | 346  | 1
48   | 23   | 0
48   | 56   | 0
48   | ..   | 0
48   | 346  | 0
48   | 355  | 1
23   | ..   | 0
23   | 346  | 1
346  | 23   | 1

I have tried merge, join, and lambda, but nothing has worked so far. Any help would be appreciated.

Answer 1

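The strategy here has two steps: first build the cross product of all UIDs, then attach the friend indicator.

To build the UID cross product, start by taking the union of the pairs in the original dataset and their inverses. This gives an intermediate dataset, friends, which is used later to mark which UIDs are friends: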

pairs = df.rename(columns={'uid': 'uid1', 'fid': 'uid2'})
friends = pd.concat([pairs, pairs.rename(columns={'uid1': 'uid2', 'uid2':'uid1'})])
uids = friends.uid1.drop_duplicates().to_frame(name='uid')

   uid
0  346
1  355
0   23
1   48
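
To run this step on its own, here is a minimal sketch. The sample `df` is an assumption built from the two rows shown in the question. The `ignore_index=True` and `drop_duplicates()` calls are additions that are not in the snippet above: they guard against input that already lists a friendship in both directions, which would otherwise yield duplicate rows in the later merge. With `ignore_index=True` the index runs 0 to 3 instead of repeating 0, 1 as in the printed output.

```python
import pandas as pd

# Assumed sample data: the two rows shown in the question.
df = pd.DataFrame({'uid': [346, 355], 'fid': [23, 48]})

# Rename to the column names used in the final table.
pairs = df.rename(columns={'uid': 'uid1', 'fid': 'uid2'})
# Swap the two columns to get each pair in the opposite direction.
reversed_pairs = pairs.rename(columns={'uid1': 'uid2', 'uid2': 'uid1'})

# Union of pairs and their inverses; concat aligns on column names.
friends = pd.concat([pairs, reversed_pairs], ignore_index=True).drop_duplicates()

# Every UID that appears on either side of a friendship.
uids = friends['uid1'].drop_duplicates().to_frame(name='uid')
print(uids)
```

Because `friends` holds each pair in both directions, its `uid1` column alone already contains every UID from the data.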

Then, attach a dummy merge key so that a merge produces the cross product:

uids['dummy_key'] = 1
uids = uids.merge(uids, on='dummy_key', suffixes=('1', '2'))[['uid1', 'uid2']]

    uid1  uid2
0    346   346
1    346   355
2    346    23
3    346    48
4    355   346
5    355   355
...
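
As an alternative to the dummy key, newer pandas versions (1.2 and later) accept `how='cross'` in `merge`. The sketch below assumes such a version and hard-codes the four UIDs from the example. The self-pair filter is an optional extra, not part of the answer, included because the desired output in the question does not list a user paired with itself.

```python
import pandas as pd

# Assumed input: the four UIDs from the example above.
uids = pd.DataFrame({'uid': [346, 355, 23, 48]})

# Cross join: every row paired with every row, no dummy key required.
# The overlapping column name 'uid' receives the suffixes, giving uid1 and uid2.
all_pairs = uids.merge(uids, how='cross', suffixes=('1', '2'))

# Optional: drop rows that pair a user with itself.
no_self = all_pairs[all_pairs['uid1'] != all_pairs['uid2']].reset_index(drop=True)

print(all_pairs.head())
print(len(all_pairs), len(no_self))
```

The cross product grows quadratically with the number of users, so on the full dataset this table can become very large.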

Now merge in the friends dataset with an indicator column to start building the adjacency table:

adj = uids.merge(friends, on=['uid1', 'uid2'], how='left', indicator=True)

    uid1  uid2     _merge
0    346   346  left_only
1    346   355  left_only
2    346    23       both
3    346    48  left_only
4    355   346  left_only
5    355   355  left_only
...

Finally, encode the _merge indicator into the friends column:

adj['friends'] = adj.apply(lambda row: 1 if row['_merge'] == 'both' else 0, axis=1)
adj = adj[['uid1', 'uid2', 'friends']]

    uid1  uid2  friends
0    346   346        0
1    346   355        0
2    346    23        1
3    346    48        0
4    355   346        0
5    355   355        0
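
The row-wise `apply` works, but it calls a Python function once per row and can be slow on a large cross product. A vectorized comparison gives the same 0/1 column. The sketch below is a variation on the last step, not the code above, and uses small hand-made `grid` and `friends` frames (assumed names and data) so it runs on its own.

```python
import pandas as pd

# Assumed toy inputs: a few candidate pairs and the known friendships.
grid = pd.DataFrame({'uid1': [346, 346, 23], 'uid2': [355, 23, 346]})
friends = pd.DataFrame({'uid1': [346, 23], 'uid2': [23, 346]})

# Left merge keeps every candidate pair; _merge marks whether it was found in friends.
adj = grid.merge(friends, on=['uid1', 'uid2'], how='left', indicator=True)

# Vectorized encoding: True where the pair exists in both frames, cast to 0/1.
adj['friends'] = (adj['_merge'] == 'both').astype(int)
adj = adj[['uid1', 'uid2', 'friends']]
print(adj)
```

The comparison runs over the whole column at once, which is usually much faster than `apply(..., axis=1)`.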

python

last.fm

pandas

python-3.7

knowledge-graph