Python: Automatically introduce slight word-typos into phrases?
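Does anyone know how to automatically introduce common typos into the words of a phrase?
I found this, but I think it is a bit too generic, because it just replaces every nth letter with a random character.
I would like to introduce "common" typos.
Any idea how to do that?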
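For the sake of explanation, let's assume you have a string variable message that you want to introduce typos into. My strategy for introducing typos into message that are both random and common is to replace random letters in message with other letters that are nearby on the keyboard (i.e. replace a with s, or d with f). Here's how: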
import random  # used to pick typo positions and replacement letters

message = "The quick brown fox jumped over the big red dog."
# convert the message to a list of characters
message = list(message)
typo_prob = 0.1  # fraction (0.0 to 1.0) of characters to become typos
# the number of characters that will be typos
n_chars_to_flip = round(len(message) * typo_prob)
# is a letter capitalized?
capitalization = [False] * len(message)
# make all characters lowercase & record uppercase
for i in range(len(message)):
    capitalization[i] = message[i].isupper()
    message[i] = message[i].lower()
# positions of the characters that will be flipped
# (random.sample picks distinct positions, so none is flipped twice)
pos_to_flip = random.sample(range(len(message)), n_chars_to_flip)
# dictionary... for each letter, the list of letters
# nearby on the keyboard
nearbykeys = {
    'a': ['q','w','s','x','z'],
    'b': ['v','g','h','n'],
    'c': ['x','d','f','v'],
    'd': ['s','e','r','f','c','x'],
    'e': ['w','s','d','r'],
    'f': ['d','r','t','g','v','c'],
    'g': ['f','t','y','h','b','v'],
    'h': ['g','y','u','j','n','b'],
    'i': ['u','j','k','o'],
    'j': ['h','u','i','k','n','m'],
    'k': ['j','i','o','l','m'],
    'l': ['k','o','p'],
    'm': ['n','j','k','l'],
    'n': ['b','h','j','m'],
    'o': ['i','k','l','p'],
    'p': ['o','l'],
    'q': ['w','a','s'],
    'r': ['e','d','f','t'],
    's': ['w','e','d','x','z','a'],
    't': ['r','f','g','y'],
    'u': ['y','h','j','i'],
    'v': ['c','f','g','b'],
    'w': ['q','a','s','e'],
    'x': ['z','s','d','c'],
    'y': ['t','g','h','u'],
    'z': ['a','s','x'],
    ' ': ['c','v','b','n','m']
}
# insert typos
for pos in pos_to_flip:
    # characters without an entry (digits, punctuation) are skipped,
    # and the loop carries on with the remaining positions
    try:
        typo_arrays = nearbykeys[message[pos]]
    except KeyError:
        continue
    message[pos] = random.choice(typo_arrays)
# reinsert capitalization
for i in range(len(message)):
    if capitalization[i]:
        message[i] = message[i].upper()
# recombine the message into a string
message = ''.join(message)
# show the message in the console
print(message)
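A hand-typed neighbor table is easy to get wrong, so one option is to derive it from the keyboard rows instead. Below is a minimal sketch of the same nearby-key idea wrapped in a function. It assumes a US QWERTY layout, and the names ROWS, build_neighbors and add_typos are made up for this illustration. Unlike the script above, it only ever changes letters (spaces and punctuation are left alone), and it accepts an optional random.Random instance so a run can be reproduced.

```python
import random

# Letter rows of a US QWERTY keyboard, top to bottom (an assumption;
# adjust for other layouts).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows=ROWS):
    """Map each letter to a sorted list of physically adjacent letters."""
    neighbors = {}
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            adjacent = set()
            # Same row: the keys immediately left and right.
            for dc in (-1, 1):
                if 0 <= c + dc < len(row):
                    adjacent.add(row[c + dc])
            # Rows are staggered: a key sits under columns c and c + 1
            # of the row above, and over columns c - 1 and c of the row below.
            for dr, offsets in ((-1, (0, 1)), (1, (-1, 0))):
                if 0 <= r + dr < len(rows):
                    other = rows[r + dr]
                    for dc in offsets:
                        if 0 <= c + dc < len(other):
                            adjacent.add(other[c + dc])
            neighbors[key] = sorted(adjacent)
    return neighbors


def add_typos(text, typo_prob=0.1, rng=None):
    """Replace roughly typo_prob of the characters with a nearby key."""
    rng = rng or random.Random()
    neighbors = build_neighbors()
    chars = list(text)
    # Only letters that have an entry in the neighbor map can be changed.
    eligible = [i for i, ch in enumerate(chars) if ch.lower() in neighbors]
    n_typos = min(len(eligible), round(len(chars) * typo_prob))
    for i in rng.sample(eligible, n_typos):
        replacement = rng.choice(neighbors[chars[i].lower()])
        # Keep the capitalization of the original character.
        chars[i] = replacement.upper() if chars[i].isupper() else replacement
    return "".join(chars)


print(add_typos("The quick brown fox jumped over the big red dog."))
```

Passing rng=random.Random(some_seed) gives the same typos on every run, which helps when the output feeds a test. Because the map is generated from the rows, no key can end up listed as its own neighbor.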