Scrambling of letters within a text file
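I am preparing test data in which different letters must each appear a set number of times, say அ 20 times, ம 30 times, த 40 times, and so on (they are Tamil letters, UTF-8 encoded).
This can be done with a print statement:
{print ( ' ம் ' * 30 ) + ( ' த ' * 40 ) + }
However, I need to scramble them so that they do not appear in any particular order. I have around 230+ letters, each of which I will print 20, 30, or 40 times. I then need to scramble them and write them to an output file.
Any help with this would be appreciated.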
Just use random.choice:
import random
size = 1000
values = [' ம் ', ' த ', ' த ']
print "".join(random.choice(values) for i in xrange(size))
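One thing to keep in mind with this approach: random.choice draws every item independently, so the number of times each letter appears varies from run to run and will not match fixed targets such as 20, 30, and 40. The following Python 3 sketch tallies the draws with collections.Counter to make that visible; the three letters, the size of 90, and the fixed seed are assumptions chosen only for illustration.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed only so the sketch is repeatable
values = ['அ', 'ம', 'த']
size = 90

# Each draw is independent, so the per-letter counts are not controlled.
drawn = [random.choice(values) for _ in range(size)]
counts = Counter(drawn)

print(counts)                # the tallies change with the seed
print(sum(counts.values()))  # always equals size
```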
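I would suggest breaking this problem into three parts: assemble your list of letters, shuffle the list, and then write it to a file. Note that the first line in the following code should be at the top of your Python file, so that you can use UTF-8 characters in the source code itself.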
# -*- coding: utf-8 -*-
import codecs  # To write UTF-8 characters to a file
import random

# Assemble data list
letters = [u'அ', u'ம', u'த']
data = []  # This list will hold the shuffled data
for current_letter in letters:
    # Choose how many times to repeat the current letter.
    times_repeated = random.choice([20, 30, 40])
    data.extend([current_letter] * times_repeated)

# Now, shuffle the 'data' list
random.shuffle(data)

# Now write the shuffled list to a file as one continuous string
data_string = "".join(data)
with codecs.open("data.txt", "w", "utf-8") as f:
    f.write(data_string)
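For readers on Python 3, where str is already Unicode and the built-in open accepts an encoding argument, the same three steps might look roughly like this. The function name build_shuffled and the temporary output path are illustrative choices, not part of the answer above.

```python
import os
import random
import tempfile

def build_shuffled(letters, choices=(20, 30, 40)):
    """Repeat each letter a randomly chosen number of times, then shuffle."""
    data = []
    for letter in letters:
        data.extend([letter] * random.choice(choices))
    random.shuffle(data)  # shuffles the list in place
    return data

data = build_shuffled(['அ', 'ம', 'த'])

# Write the result as one continuous UTF-8 string.
# The path is an assumption for this sketch; use any location you like.
out_path = os.path.join(tempfile.gettempdir(), "data.txt")
with open(out_path, "w", encoding="utf-8") as f:
    f.write("".join(data))
```

No coding declaration or u'' prefix is needed here, since Python 3 source files default to UTF-8.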
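Note that if you know how many times you want each letter to appear, you can put that information into a dictionary instead of choosing randomly from [20, 30, 40]: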
# The key is the letter to repeat, the value is the number of times to repeat it
letters = {u'அ': 20,
           u'ம': 30,
           u'த': 20}
for letter in letters:
    times_repeated = letters[letter]
    # ... rest of the code would look the same ...
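As a variation on the dictionary idea, collections.Counter can expand a letter-to-count mapping directly, which also makes it easy to check that shuffling leaves the counts intact. This is a minimal Python 3 sketch that reuses the counts from the dictionary above; it is one possible way to write it, not the answer's own code.

```python
import random
from collections import Counter

# Letter -> number of times it should appear.
letters = {'அ': 20, 'ம': 30, 'த': 20}

# Counter.elements() yields each key repeated as many times as its count.
data = list(Counter(letters).elements())
random.shuffle(data)

# Shuffling changes the order only; the per-letter counts stay exact.
print(Counter(data) == Counter(letters))  # True
print(len(data))                          # 70
```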
You can solve this in several ways. Some of the most efficient use the random module.
random.shuffle
>>> from random import shuffle
>>> my_string = list('This is a test string.')
>>> shuffle(my_string)
>>> scrambled = ''.join(my_string)
>>> print(scrambled)
.sTtha te s rtisns gii
For this, you have to create a list from the characters of the string, because strings are immutable.
A new object has to be created if a different value has to be stored.
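To see why the list step matters, here is a small Python 3 sketch: random.shuffle swaps items in place, so handing it a str raises a TypeError, while a list copy of the characters works. The variable names are illustrative.

```python
import random

text = 'This is a test string.'

try:
    random.shuffle(text)  # shuffle swaps items in place, which str forbids
except TypeError as exc:
    print("cannot shuffle a str:", exc)

chars = list(text)        # a mutable copy of the characters
random.shuffle(chars)
scrambled = ''.join(chars)

print(sorted(scrambled) == sorted(text))  # True: same characters, new order
```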
random.sample
>>> from random import sample
>>> my_string = 'This is a test string.'
>>> scrambled = sample(my_string, len(my_string))
>>> scrambled = ''.join(scrambled)
>>> print(scrambled)
gr.s i tisstheit Tn sa
You do not have to create a list for this, because, from the random.sample documentation:
Returns a new list containing elements from the population while leaving the original population unchanged.
The sorted built-in with random.random
>>> from random import random
>>> my_string = 'This is a test string.'
>>> scrambled = sorted(my_string, key=lambda i: random())
>>> scrambled = ''.join(scrambled)
>>> print(scrambled)
ngi rts ithsT.staie s
You do not need a list here either. From the sorted documentation:
Return a new sorted list from the items in iterable.
Because a string is treated as an iterable in Python (see below), sorted can be used on it.
An iterable is defined as
An object capable of returning its members one at a time.
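A short Python 3 sketch of what that definition means in practice: iter() and next() pull characters out of a string one at a time, and sorted accepts any iterable, including a generator. The sample strings are arbitrary.

```python
text = 'abc'

it = iter(text)   # a string hands out its characters one at a time
first = next(it)
second = next(it)
print(first, second)  # a b

# sorted() accepts any iterable (here a generator) and returns a new list.
result = sorted(ch for ch in 'bca')
print(result)         # ['a', 'b', 'c']
```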
Thanks to my friend @AswinMurugesh, who helped me with the code.
The following code did the trick.
import codecs
import tamil
from random import shuffle

# Read the input file as Unicode text
with codecs.open("/home/sibi/Desktop/scramble.txt", encoding="utf-8") as inp_file:
    inp_text = inp_file.read().rstrip()

# Split the text into Tamil letters and shuffle them in place
tamil_letters = tamil.utf8.get_letters(inp_text)
shuffle(tamil_letters)

# Join the letters and encode to UTF-8 bytes before printing and writing
tamil_letters = "".join(tamil_letters).encode("utf-8")
print tamil_letters
with open('outputscrambled.txt', 'w') as out_file:
    out_file.write(tamil_letters)
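A letter-aware split matters here because a Python string iterates by code point, and many Tamil letters are a base consonant followed by a combining vowel sign or pulli. Shuffling raw code points can separate a mark from its consonant. The Python 3 sketch below is a rough stand-in that groups each base character with the combining marks after it using the standard unicodedata module. split_letters is a hypothetical helper, not the tamil library's implementation, and it makes no attempt to handle conjuncts or other special cases.

```python
import random
import unicodedata

def split_letters(text):
    """Group each base character with the combining marks that follow it."""
    letters = []
    for ch in text:
        # Categories starting with 'M' are combining marks (vowel signs, pulli).
        if letters and unicodedata.category(ch).startswith('M'):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters

word = '\u0ba4\u0bae\u0bbf\u0bb4\u0bcd'  # the word தமிழ், written as escapes
print(len(word))                          # 5 code points

letters = split_letters(word)
print(len(letters))                       # 3 letters: த, மி, ழ்

random.shuffle(letters)                   # marks stay attached to their base
print(''.join(letters))
```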