How do I merge data with CountVectorizer features

This is my dataset:

        body                                            customer_id   name
14828   Thank you to apply to us.                       5458          Sender A
23117   Congratulation your application is accepted.    5136          Sender B
23125   Your OTP will expire in 10 minutes.             5136          Sender A

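This is my code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# a is the DataFrame shown above
b = a['body']
vect = CountVectorizer()
vect.fit(b)
X_vect = vect.transform(b)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
pd.DataFrame(X_vect.toarray(), columns=vect.get_feature_names_out())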

The output is:

    10  application apply ... your  
0   0   0           1         0
1   0   1           0         1
2   1   0           0         1 

What I need is:

        body                                            customer_id   name        10  application apply ... your
14828   Thank you to apply to us.                       5458          Sender A    0   0           1         0
23117   Congratulation your application is accepted.    5136          Sender B    0   1           0         1
23125   Your OTP will expire in 10 minutes.             5136          Sender A    1   0           0         1

How can I do this? I still want to use CountVectorizer so that I can modify the features later.

You can pass index=a.index to the DataFrame constructor and then join the result to the original df; join performs a left join by default:

# index=a.index gives the feature rows the same labels as a, so join aligns them
b = pd.DataFrame(X_vect.toarray(), columns=vect.get_feature_names_out(), index=a.index)
a = a.join(b)

Or use merge, which needs more arguments because it performs an inner join by default:

a = a.merge(b, left_index=True, right_index=True, how='left')