Is sys.intern() used for every look-up, or only when a string is created the first time? (Python Follow-up)
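This is a follow-up to my previous question about string interning in Python, but I think it is unrelated enough to stand as a separate question.
In short, when using sys.intern, do I need to pass the string in question to the function on most or every use, or do I only need to intern the string once and keep track of its reference?
To clarify, here is a pseudocode use case in which I do what I think is correct:
(see the comments)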
import sys

# stores all words in sequence,
# we want duplicate words too,
# but those should refer to the same string
# (the reason we want interning)
word_sequence = []
# simple word count dictionary
word_dictionary = {}
for line in text:
    for word in line:  # using magic unspecified parsing/tokenizing logic
        # returns a canonical "reference"
        word_i = sys.intern(word)
        word_sequence.append(word_i)
        try:
            # do not need to intern again for
            # specific use as dictionary key,
            # or is something undesirable done
            # by the dictionary that would require
            # another call here?
            word_dictionary[word_i] += 1
        except KeyError:
            word_dictionary[word_i] = 1

# ...somewhere else in a function far away...
# Let's say that we want to use the word sequence list to
# access the dictionary (even the duplicates):
for word in word_sequence:
    # Do NOT need to re-sys.intern() word
    # because it is the same string object
    # interned previously?
    count = word_dictionary[word]
    print(count)
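What if I want to access the words in a different dictionary? Do I need to use sys.intern() again when inserting a key:value pair, even if the key has already been interned?
Could I get some clarification? Thank you in advance.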
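You must use sys.intern() each time you have a new string object, because otherwise you cannot guarantee that you have the same object for the value it represents.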
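As a rough illustration of that point, the sketch below builds two equal strings at runtime and interns both. The variable names are made up for this example, and whether the two un-interned strings are distinct objects is a CPython implementation detail rather than a language guarantee.

```python
import sys

# Build two equal strings at runtime so they are created as separate objects
# (an implementation detail of CPython, assumed here for illustration).
a = "".join(["inter", "ning"])
b = "".join(["inter", "ning"])

equal_values = (a == b)      # the values are equal
same_before = (a is b)       # usually False in CPython: two distinct objects

# Each new string object has to go through sys.intern() to get the canonical one.
a_i = sys.intern(a)
b_i = sys.intern(b)
same_after = (a_i is b_i)    # both names now refer to the same object

print(equal_values, same_before, same_after)
```

Only the values returned by sys.intern() are guaranteed to be the shared object; a and b themselves are left unchanged.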
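However, your word_sequence list contains references to interned string objects. You do not have to use sys.intern() on those again. At no point is a copy of the string created here (that would be unnecessary and wasteful).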
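A minimal sketch of the question's word count, assuming a plain str.split() as a stand-in for the unspecified tokenizer. The function count_words, the sample lines, and the second dictionary first_seen are hypothetical names added for illustration. Each freshly parsed word is interned once, and the stored references are then reused as keys in another dictionary without a second call.

```python
import sys

def count_words(lines):
    """Return (word_sequence, word_counts), interning each parsed word once."""
    word_sequence = []
    word_counts = {}
    for line in lines:
        for raw in line.split():        # stand-in tokenizer (assumption)
            word = sys.intern(raw)      # raw is a new object, so intern it here
            word_sequence.append(word)
            word_counts[word] = word_counts.get(word, 0) + 1
    return word_sequence, word_counts

lines = ["the cat saw the dog", "the dog ran"]
word_sequence, word_counts = count_words(lines)

# Reuse the stored references as keys in a second dictionary.
# No further sys.intern() call is made: these are already the interned objects.
first_seen = {}
for position, word in enumerate(word_sequence):
    first_seen.setdefault(word, position)

# Every occurrence of "the" in the list is the very same object.
the_objects = [w for w in word_sequence if w == "the"]
all_same = all(w is the_objects[0] for w in the_objects)
```

The list entries and the keys of both dictionaries all point at one object per distinct word, so nothing is copied when a reference is passed around.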
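All sys.intern() does is map a string value to one specific object with that value. As long as you keep a reference to the return value, you are guaranteed to still have access to that specific object.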
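The value-to-object mapping can be sketched as follows, with hypothetical names throughout. A dictionary lookup compares keys by hash and equality, so an equal but un-interned string still finds the entry. Passing that new string to sys.intern() hands back the canonical object, as long as a reference to it is still held.

```python
import sys

# Keep a reference to the interned object; this is the canonical string.
canonical = sys.intern("".join(["tok", "en"]))
counts = {canonical: 1}

# A new string object with an equal value, e.g. parsed somewhere else.
fresh = "".join(["tok", "en"])

found = counts[fresh]             # lookup works by value, interned or not
resolved = sys.intern(fresh)      # maps the value back to the canonical object
is_canonical = resolved is canonical

print(found, is_canonical)
```

Interning the new string matters here only when the goal is to store or compare the shared object itself, for example before appending it to a long-lived list.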