What does `features['contains(%s)' % word.lower()] = True` mean in NLTK?
I have been reading the NLTK documentation recently, and I don't understand the code below.
def dialogue_act_features(post):
    features = {}
    for word in nltk.word_tokenize(post):
        features['contains(%s)' % word.lower()] = True
    return features
This is a feature extractor for NaiveBayesClassifier, but what does
features['contains(%s)' % word.lower()] = True
mean?
I think this line is a way of building a dictionary, but I don't understand how it works.
Thanks
Say word = 'ABCxyz'.
word.lower() converts it to lowercase, so it returns 'abcxyz'.
'contains(%s)' % word.lower() formats the string, replacing %s with the value of word.lower(), and returns 'contains(abcxyz)'.
features['contains(%s)' % word.lower()] = True creates a key-value pair in the features dictionary, with key 'contains(abcxyz)' and value True.
Therefore,
features = {}
features['contains(%s)' % word.lower()] = True
produces
features = {'contains(abcxyz)': True}
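To watch the same steps run end to end without installing NLTK, here is a minimal sketch. It assumes plain str.split() as a stand-in for nltk.word_tokenize (the real tokenizer also splits off punctuation, so its token lists can differ), and the function name contains_features is hypothetical.

```python
def contains_features(post):
    """Build a {'contains(<word>)': True} dict for every word in post.

    Assumption: str.split() stands in for nltk.word_tokenize here.
    """
    features = {}
    for word in post.split():
        key = 'contains(%s)' % word.lower()  # e.g. 'contains(abcxyz)'
        features[key] = True                 # create or overwrite the entry
    return features


features = contains_features('ABCxyz abcXYZ hello')
print(features)
```

Because ABCxyz and abcXYZ both lowercase to abcxyz, they share one key, so the dictionary ends up with two entries rather than three.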
In this code:
>>> import nltk
>>> def word_features(sentence):
...     features = {}
...     for word in nltk.word_tokenize(sentence):
...         features['contains(%s)' % word.lower()] = True
...     return features
...
>>> sent = 'This a foobar word extractor function'
>>> word_features(sent)
{'contains(a)': True, 'contains(word)': True, 'contains(this)': True, 'contains(function)': True, 'contains(extractor)': True, 'contains(foobar)': True}
This line populates the features dictionary:
features['contains(%s)' % word.lower()] = True
Here is a simple example of a dictionary in Python (for details see https://docs.python.org/2/tutorial/datastructures.html#dictionaries):
>>> adict = {}
>>> adict['key'] = 'value'
>>> adict['key']
'value'
>>> adict['apple'] = 'red'
>>> adict['apple']
'red'
>>> adict
{'apple': 'red', 'key': 'value'}
And word.lower() lowercases the string, for example:
>>> s = 'Apple'
>>> s.lower()
'apple'
>>> s = 'APPLE'
>>> s.lower()
'apple'
>>> s = 'AppLe'
>>> s.lower()
'apple'
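When you evaluate 'contains(%s)' % word, it builds a string made of contains(, followed by whatever replaces the %s placeholder, followed by ). The % operator takes the value to substitute from outside the string, for example: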
>>> a = 'apple'
>>> o = 'orange'
>>> '%s' % a
'apple'
>>> '%s and' % a
'apple and'
>>> '%s and %s' % (a,o)
'apple and orange'
The % operator is similar to the str.format() function, for example:
>>> a = 'apple'
>>> o = 'orange'
>>> '%s and %s' % (a,o)
'apple and orange'
>>> '{} and {}'.format(a,o)
'apple and orange'
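On Python 3.6 and later, an f-string is a third way to build the same key. The sketch below is only an illustration (the variable names are made up for this example) that the three spellings produce an identical string, so any of them would yield the same dictionary key.

```python
word = 'Apple'

key_percent = 'contains(%s)' % word.lower()       # %-formatting, as in the NLTK example
key_format = 'contains({})'.format(word.lower())  # str.format()
key_fstring = f'contains({word.lower()})'         # f-string (Python 3.6+)

print(key_percent, key_format, key_fstring)
print(key_percent == key_format == key_fstring)
```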
So when the code evaluates 'contains(%s)' % word, it is actually producing a string like this:
>>> 'contains(%s)' % a
'contains(apple)'
When you put that string into a dictionary as a key, the dictionary looks like this:
>>> adict = {}
>>> key1 = 'contains(%s)' % a
>>> value1 = True
>>> adict[key1] = value1
>>> adict
{'contains(apple)': True}
>>> key2 = 'contains(%s)' % o
>>> value2 = False
>>> adict[key2] = value2
>>> adict
{'contains(orange)': False, 'contains(apple)': True}
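Since the extractor in the question only ever stores True, a word that never appeared simply has no key at all. Here is a small sketch of building such a dictionary in one expression and then querying it. The word list is invented for illustration, and the dict comprehension is an alternative spelling of the loop, not what the NLTK example itself uses.

```python
words = ['Apple', 'orange', 'APPLE']

# Same result as the loop: one 'contains(...)' key per distinct lowercased word.
features = {'contains(%s)' % w.lower(): True for w in words}

print(features)
print('contains(apple)' in features)            # key is present
print(features.get('contains(banana)', False))  # absent key, fall back to False
```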
For more details, see:
- Python string formatting: % vs. .format
- http://www.tutorialspoint.com/python/python_strings.htm
- https://docs.python.org/2/library/string.html