How to solve ValueError when testing truth value of DataFrame contents? Python
I have a DataFrame that looks like this:
done sentence 3_tags
0 0 ['What', 'were', 'the', '...] ['WP', 'VBD', 'DT']
1 0 ['What', 'was', 'the', '...] ['WP', 'VBD', 'DT']
2 0 ['Why', 'did', 'John', '...] ['WP', 'VBD', 'NN']
...
For each row, I want to check whether the list in the '3_tags' column is in the list temp1, like this:
import pandas as pd

a = pd.read_csv('sentences.csv')
temp1 = [ ['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT'] ]
q = a['3_tags']
q in temp1
For the first sentence (row 0), the value of '3_tags' is ['WP', 'VBD', 'DT'], which is in temp1, so I expect the result of the above to be:
True
However, I get this error:
ValueError: Arrays were different lengths: 1 vs 3
I suspect the problem is the data type of q:
print(type(q))
<class 'pandas.core.series.Series'>
Is the problem that q is a Series while temp1 contains lists? What should I do to get the logical result True?
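The failure can be reproduced without the CSV. A minimal sketch, assuming a small hand-built Series that mirrors the question's data (the helper membership_fails is hypothetical, written only for this demonstration): q in temp1 makes Python compare each list in temp1 with the whole Series using ==, pandas evaluates that comparison element-wise and returns a Series (or raises if the lengths differ), and Python cannot turn that Series into a single True/False.

```python
import pandas as pd

# Hypothetical stand-in for a['3_tags']: a Series whose cells are lists.
q = pd.Series([['WP', 'VBD', 'DT'], ['WP', 'VBD', 'DT'], ['WP', 'VBD', 'NN']],
              name='3_tags')
temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]


def membership_fails(series, candidates):
    """Return the ValueError message raised by `series in candidates`, or None."""
    try:
        series in candidates
    except ValueError as exc:  # length mismatch or ambiguous truth value
        return str(exc)
    return None


print(membership_fails(q, temp1))
```

The exact message varies with the pandas version and with the length of q (a length mismatch gives an error like the one in the question; equal lengths give "The truth value of a Series is ambiguous"), but either way it is a ValueError rather than a per-row answer.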
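q in temp1 does not test row by row: Python compares the whole Series with each list in temp1, so there is no single True/False to return. You want those lists to be tuples instead (lists are unhashable, tuples are hashable).
Then use pd.Series.isin: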
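# Star-unpacking collects the converted tuples back into a list
*temp1, = map(tuple, temp1)
# Convert each cell of '3_tags' from a list to a tuple so it can be hashed
q = a['3_tags'].apply(tuple)
q.isin(temp1)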
0 True
1 True
2 False
Name: 3_tags, dtype: bool
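Why the tuple conversion matters, plus an alternative that does not use isin: a Python list cannot be hashed, so it cannot go into a set or a hash-based lookup, while a tuple of strings can. The sketch below assumes hand-built data mirroring Setup 1; is_hashable and lookup are illustrative names, not part of the original answer. The alternative builds a set of tuples once and tests each row with plain Python membership via Series.apply, which gives the same per-row result as isin.

```python
import pandas as pd

# Hypothetical data mirroring Setup 1: each cell of '3_tags' is a list.
q = pd.Series([['WP', 'VBD', 'DT'], ['WP', 'VBD', 'DT'], ['WP', 'VBD', 'NN']],
              name='3_tags')
temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]


def is_hashable(obj):
    """True if obj can be used as a set member or dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable(temp1[0]))         # False: lists are mutable, hence unhashable
print(is_hashable(tuple(temp1[0])))  # True: a tuple of strings is hashable

# Alternative to isin: build the lookup set once, then test each row in plain Python.
lookup = set(map(tuple, temp1))
matches = q.apply(lambda tags: tuple(tags) in lookup)
print(matches.tolist())  # [True, True, False]
```

Both approaches rely on the same idea (hash the tags as tuples); isin keeps the loop inside pandas, while apply is easier to adapt if the matching rule gets more complicated.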
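However, the '3_tags' column appears to consist of strings that look like lists (which is what lists become after a round trip through a CSV file). In that case, we want to parse them with ast.literal_eval: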
from ast import literal_eval
*temp1, = map(tuple, temp1)
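# literal_eval turns the string "['WP', 'VBD', 'DT']" back into a real list, then tuple() makes it hashable
q = a['3_tags'].apply(lambda x: tuple(literal_eval(x)))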
q.isin(temp1)
0 True
1 True
2 False
Name: 3_tags, dtype: bool
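If the strings come from read_csv, they can also be parsed once at load time with the converters parameter, and the boolean Series can be reduced to the single True the question asks for. A minimal sketch, assuming hypothetical CSV text standing in for sentences.csv (only the done and 3_tags columns are included):

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV content: the list-valued column was written out as its string form.
csv_text = "\n".join([
    "done,3_tags",
    "0,\"['WP', 'VBD', 'DT']\"",
    "0,\"['WP', 'VBD', 'DT']\"",
    "0,\"['WP', 'VBD', 'NN']\"",
])

# converters applies the function to each raw cell of '3_tags' while parsing,
# so the column arrives as tuples instead of strings.
a = pd.read_csv(io.StringIO(csv_text),
                converters={'3_tags': lambda s: tuple(literal_eval(s))})

temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]
mask = a['3_tags'].isin([tuple(t) for t in temp1])

# mask is a boolean Series; select one element or reduce it to get a plain bool.
first_row_matches = bool(mask.iloc[0])
any_row_matches = bool(mask.any())
print(mask.tolist(), first_row_matches, any_row_matches)  # [True, True, False] True True
```

mask.iloc[0] answers "is row 0 in temp1?", while mask.any() and mask.all() answer "does any / every row match?". Which one is right depends on what "True" is supposed to mean for the whole column.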
Setup 1 (the '3_tags' cells are actual lists)
import pandas as pd
a = pd.DataFrame({
'done': [0, 0, 0],
'sentence': list(map(str.split, ('What were the', 'What was the', 'Why did John'))),
'3_tags': list(map(str.split, ('WP VBD DT', 'WP VBD DT', 'WP VBD NN')))
}, columns='done sentence 3_tags'.split())
temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]
Setup 2 (the '3_tags' cells are strings that look like lists)
a = pd.DataFrame({
'done': [0, 0, 0],
'sentence': list(map(str.split, ('What were the', 'What was the', 'Why did John'))),
'3_tags': list(map(str, map(str.split, ('WP VBD DT', 'WP VBD DT', 'WP VBD NN'))))
}, columns='done sentence 3_tags'.split())
temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]
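Setup 2 stores each tag list as its str() representation. A quick sketch checking that literal_eval inverts str() for such lists and that, unlike eval, it refuses anything that is not a plain Python literal (safe_parse is a hypothetical helper, not part of the answer):

```python
from ast import literal_eval

tags = ['WP', 'VBD', 'NN']
as_text = str(tags)             # "['WP', 'VBD', 'NN']", the form used in Setup 2
parsed = literal_eval(as_text)  # back to a real list


def safe_parse(text):
    """Parse a list-looking string; return None if it is not a plain literal."""
    try:
        return literal_eval(text)
    except (ValueError, SyntaxError):
        return None


print(parsed == tags)                           # True
print(safe_parse("__import__('os').getcwd()"))  # None: function calls are rejected
```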