Python: How to pass DataFrame columns as parameters to a function?
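I have a DataFrame df that contains two columns of text embeddings, embedding_1 and embedding_2. I want to create a third column in df named distances, which should contain the cosine_similarity between embedding_1 and embedding_2 for each row.
But when I try to implement this with the following code, I get a ValueError.
How can I fix it?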
DataFrame df
embedding_1 | embedding_2
[[-0.28876397, -0.6367827, ...]] | [[-0.49163356, -0.4877703,...]]
[[-0.28876397, -0.6367827, ...]] | [[-0.06686627, -0.75147504...]]
[[-0.28876397, -0.6367827, ...]] | [[-0.42776933, -0.88310856,...]]
[[-0.28876397, -0.6367827, ...]] | [[-0.6520882, -1.049325,...]]
[[-0.28876397, -0.6367827, ...]] | [[-1.4216679, -0.8930428,...]]
Code to calculate cosine similarity
df['distances'] = cosine_similarity(df['embedding_1'], df['embedding_2'])
Error
ValueError: setting an array element with a sequence.
Desired DataFrame
embedding_1 | embedding_2 | distances
[[-0.28876397, -0.6367827, ...]] | [[-0.49163356, -0.4877703,...]] | 0.427
[[-0.28876397, -0.6367827, ...]] | [[-0.06686627, -0.75147504...]] | 0.673
[[-0.28876397, -0.6367827, ...]] | [[-0.42776933, -0.88310856,...]] | 0.882
[[-0.28876397, -0.6367827, ...]] | [[-0.6520882, -1.049325,...]] | 0.665
[[-0.28876397, -0.6367827, ...]] | [[-1.4216679, -0.8930428,...]] | 0.312
You can use apply() to call cosine_similarity() on each row:
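from sklearn.metrics.pairwise import cosine_similarity  # assumption: scikit-learn's implementation

def cal_cosine_similarity(row):
    # Each cell is a 1 x n embedding, so the result is a 1 x 1 array; [0][0] extracts the scalar
    return cosine_similarity(row['embedding_1'], row['embedding_2'])[0][0]

df['distances'] = df.apply(cal_cosine_similarity, axis=1)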
Or as a one-liner:
df['distances'] = df.apply(lambda row: cosine_similarity(row['embedding_1'], row['embedding_2'])[0][0], axis=1)
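To check the row-wise approach without depending on where cosine_similarity comes from, here is a minimal self-contained sketch using only NumPy and pandas. The toy embeddings, the helper name row_cosine, and the assumption that each cell holds a 1 x n nested list are illustrative and not taken from the question's data.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): each cell is a 1 x n nested list, like the question's embeddings.
df = pd.DataFrame({
    "embedding_1": [[[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 1.0]]],
    "embedding_2": [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 0.0]]],
})

def row_cosine(row):
    # Flatten each 1 x n cell to a 1-D vector before taking the dot product.
    a = np.asarray(row["embedding_1"], dtype=float).ravel()
    b = np.asarray(row["embedding_2"], dtype=float).ravel()
    # Cosine similarity: dot product divided by the product of the vector norms.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# axis=1 passes one row at a time, so the function sees a single pair of embeddings.
df["distances"] = df.apply(row_cosine, axis=1)
print(df["distances"].round(3).tolist())  # [1.0, 0.0, 0.707]
```

Because apply(axis=1) hands the function one row at a time, each call returns a single float and the new column ends up with one scalar per row, which is the shape the desired DataFrame shows.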