Converting a dictionary to binary in Python

Question

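I have a dictionary whose keys are customer IDs and whose values are the IDs of the movies each customer has watched. Even if a customer has watched the same movie several times, I want it to count only once. I need to convert this dictionary into binary data: one row per customer ID and one column per movie ID, with 1 if the customer has watched the movie and 0 otherwise.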

d = {'121212121': [111, 222, 333, 333, 444, 444], '212121212': [222, 555, 555, 666], '212123322': [555, 666, 666, 666, 777]}

Expected output:

customer ID 111 222 333 444 555 666 777
121212121   1   1   1   1   0   0   0
212121212   0   1   0   0   1   1   0
121323231   0   0   0   0   1   1   1

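I tried using CountVectorizer().

Code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
movies = cv.fit_transform(cust['movies_list'])
# vocabulary_ maps each term to its column index; order the names by that index
cols = sorted(cv.vocabulary_, key=cv.vocabulary_.get)
movies_ = pd.DataFrame(movies.toarray(), columns=cols, index=cust['customer_id'])
movies_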

Output:

customer ID 111 222 333 444 555 666 777
212121212   1   1   2   2   0   0   0
121212121   0   1   0   0   2   1   0
121323231   0   0   0   0   1   3   1

The customer IDs match, but I get the number of times each customer watched each movie instead of 0 or 1.

Answer 1

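It looks like you can simply cap the positive values at 1 with clip(upper=1). In pandas versions before 1.0 the same operation was also available as clip_upper(1), which has since been removed.

movies_.clip(upper=1)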

           111  222  333  444  555  666  777
121212121    1    1    1    1    0    0    0
212121212    0    1    0    0    1    1    0
212123322    0    0    0    0    1    1    1
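
To make the clipping step concrete, here is a minimal sketch. The `counts` frame is a hypothetical stand-in for `movies_` (its values are made up to resemble the count table in the question, not taken from real data), and it assumes a pandas version where `DataFrame.clip` accepts `upper`.

```python
import pandas as pd

# Hypothetical stand-in for `movies_`: view counts per customer and movie.
counts = pd.DataFrame(
    [[1, 1, 2, 2, 0, 0, 0],
     [0, 1, 0, 0, 2, 1, 0],
     [0, 0, 0, 0, 1, 3, 1]],
    index=['121212121', '212121212', '212123322'],
    columns=['111', '222', '333', '444', '555', '666', '777'],
)

# Cap every count at 1: any positive count becomes 1, zeros stay 0.
binary = counts.clip(upper=1)
print(binary)
```

Since counts are never negative, `(counts > 0).astype(int)` should give the same table and may read more directly as "watched or not".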

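Here is an alternative solution that starts from d. You can use pd.get_dummies, sum the indicators per customer, and then cap the result with clip(upper=1).

import pandas as pd
df = pd.concat([
          pd.Series(v, name=k).astype(str) for k, v in d.items()  # `d` is your dict
     ], 
     axis=1
)
# One indicator column per movie ID, summed per customer, then capped at 1.
# On older pandas this was written as .sum(level=1).clip_upper(1)
pd.get_dummies(df.stack()).groupby(level=1).sum().clip(upper=1)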

           111  222  333  444  555  666  777
121212121    1    1    1    1    0    0    0
212121212    0    1    0    0    1    1    0
212123322    0    0    0    0    1    1    1
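
For comparison, the same table can be sketched without pandas by turning each customer's movie list into a set. This assumes `d` maps each customer ID to a list of movie IDs, as in the question; the names `movie_ids` and `rows` are illustrative only.

```python
d = {
    '121212121': [111, 222, 333, 333, 444, 444],
    '212121212': [222, 555, 555, 666],
    '212123322': [555, 666, 666, 666, 777],
}

# All distinct movie IDs, sorted, become the columns.
movie_ids = sorted({m for movies in d.values() for m in movies})

# A set per customer drops repeat viewings; membership gives 1 or 0.
rows = {}
for customer, movies in d.items():
    watched = set(movies)
    rows[customer] = [int(m in watched) for m in movie_ids]

# Print a simple text table: header row, then one row per customer.
print('customer ID', *movie_ids)
for customer, flags in rows.items():
    print(customer, *flags, sep='   ')
```

If a DataFrame is needed afterwards, `pd.DataFrame.from_dict(rows, orient='index', columns=movie_ids)` should produce the same layout as the output above.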
