How to add sentence embeddings derived from an existing column into a new column?
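I have a data frame, nw_data, with four columns: ['Qn_id', 'Qn_context', 'Qns', 'Answers']. This is what it looks like:
Qn_id | Qn_context            | Qns           | Answers
01    | In 1962, Uk gave...   | what year.... | the year 1962 was.....
02    | Major kanuti raised.. | Who raised... | Kanuti akorimo rasied.
I want to add a fifth column to this dataset that contains the sentence embeddings of the ['Answers'] column.
I am using sentence_transformers to generate the sentence embeddings.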
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
I tried the following approach:
#Created a var for the column
sent = nw_data['Answers']
and
#Passed the variable sent into the model and created the embeddings
embeddings = model.encode(sent)
then
#Tried passing the embeddings into a new column named Embeddings
nw_data['Embeddings'] = embeddings
I get an error:
KeyError: 'Embeddings'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
KeyError: 'Embeddings'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/blocks.py in check_ndim(values, placement, ndim)
1978 if len(placement) != len(values):
1979 raise ValueError(
-> 1980 f"Wrong number of items passed {len(values)}, "
1981 f"placement implies {len(placement)}"
1982 )
ValueError: Wrong number of items passed 384, placement implies 1
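How can I create these embeddings and add them as a new column in the same data frame, nw_data?
Is this feasible? I was advised to try the .apply() method or a lambda function, but I'm not sure how or when to use them.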
If I understand correctly, you want to insert a list (the embedding) into a cell.
Try using at:
>>> import pandas as pd
>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer('all-MiniLM-L6-v2')  # same model as in the question
>>> sentences = 'Absence of sanity'
>>> embedding = model.encode(sentences)
>>> df = pd.DataFrame({'foo': [1, 2], 'Embedding': None})
>>> df.at[0, 'Embedding'] = embedding.tolist()  # store the whole vector in one cell
>>> df.dtypes
foo           int64
Embedding    object
dtype: object
>>> df.head()
   foo                                          Embedding
0    1  [0.2954030930995941, 0.29181134700775146, 2.16...
1    2                                               None
If you have multiple sentences, pass the list directly:
>>> import pandas as pd
>>> sentences = ['Absence of sanity', 'its a new day', 'make the best of it']
>>> embeddings = model.encode(sentences)
>>> df = pd.DataFrame({'foo': [1, 2, 3], 'Embedding': None})
>>> df['Embedding'] = embeddings.tolist()
>>> print(df.head())
   foo                                          Embedding
0    1  [0.29540303349494934, 0.29181137681007385, 2.1...
1    2  [0.0362740121781826, -0.8035800457000732, 2.44...
2    3  [-0.4539063572883606, -0.4333038330078125, 2.2...
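Applied to the data frame from the question, the same idea might look like the sketch below. The error message ("Wrong number of items passed 384, placement implies 1") suggests that the encoded result is a 2-D array with 384 values per sentence, which pandas cannot place into a single column as-is; converting it to a list of lists gives one list per row. To keep the sketch runnable without downloading a model, fake_encode is a hypothetical stand-in for model.encode that only mimics the output shape, and the two rows of nw_data are made up from the sample in the question.

```python
import numpy as np
import pandas as pd


def fake_encode(sentences, dim=384):
    """Hypothetical stand-in for model.encode (an assumption, not the real API):
    returns a float32 array of shape (len(sentences), dim)."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(sentences), dim)).astype(np.float32)


# Small made-up frame mirroring the columns described in the question.
nw_data = pd.DataFrame({
    "Qn_id": ["01", "02"],
    "Answers": ["the year 1962 was.....", "Kanuti akorimo rasied."],
})

# With the real model this line would be: model.encode(nw_data["Answers"].tolist())
embeddings = fake_encode(nw_data["Answers"].tolist())

# embeddings is 2-D (one row per sentence). tolist() turns it into a list of
# Python lists, so each row of the new column receives exactly one list.
nw_data["Embeddings"] = embeddings.tolist()

print(embeddings.shape)
print(nw_data.dtypes)
```

Encoding the whole column in one call is usually preferable to calling the model once per row through .apply(), since the encoder can then batch the sentences.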