Python text processing
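I have a data file containing a lot of data. I am only interested in the two codes per user that need to be updated. I have the new codes in a separate file. I just want to compare the two files and put the new codes into the existing file.
Old file: (txt2)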
.
..
..
alpha Donec vulputate lorem tortor, nec fermentum nibh bibendum vel.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Praesent dictum luctus massa, non euismod lacus.
${alpha_john}: 'Lorem ipsum dolor sit amet, consectetur'
${beta_john}: 'iuhertgh jndsfbjpijwrg'
${alpha_mac}: 'acerat a lorem eget, ultricies'
${beta_mac}: 'elit nibh, eu condimentum orci viverra q'
${alpha_joe}: 'gravida lorem, ut congue diam.'
${beta_joe}: 'orttitor in condimentum nec, venenatis eu urna'
${alpha_mark}: ''
${beta_mark}: ''
${alpha_ross}: 'suscipit vitae felis non suscipit.'
${beta_ross}: 'non vulputate convallis, ligula diam sagittis urna, in venenatis'
${alpha_don}: 'Pellentesque feugiat diam est, at rhoncus orci porttitor'
${beta_don}: 'Sed elementum elit nibh'
${alpha_harry}: 'Proin tempor lacus arcu.'
${beta_harry}: 'posuere sollicitudin mi, et vulputate nisl fringilla non'
Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.
Aliquam euismod ultrices lorem, sit amet imperdiet est tincidunt vel.
Phasellus dictum justo sit amet ligula varius aliquet auctor et metus.
..
..
.
Codes file: (txt1)
${alpha_john}: 'XXXXXHHHHHHHXXXXXX'
${beta_john}: 'XFFFFFFFFFGGGGGGGGDDDDDD'
${alpha_mac}: 'DDDDDDKKKKKKKKK'
${beta_mac}: 'KKKKKKKKKKKYYYYYYYYYYYYD'
${alpha_joe}: 'TTTTTVVVVVVVVVVVKK'
${beta_joe}: 'OOOOOOOSSSSSSSSSSPPPPPP'
${alpha_ross}: 'SSSSSHHHHHHHHTTTTTTTT'
${beta_ross}: 'PPPPPWWWWWHHHHHHHHHH'
${alpha_harry}: 'IIIIIIEEEEEEETTTTTTTTTT'
${beta_harry}: 'YYYYYYYYEEEEEEEEEEMMMMMMMMMM'
My code:
#!/usr/bin/env python
import os, sys, re, time
import argparse
import logging
import time
cat /dev/null > /home/user/scripts/temp/txt3
file1=open("/home/user/scripts/temp/txt1",'r+')
file2=open("/home/user/scripts/temp/txt2", 'r+')
file3=open("/home/user/scripts/temp/txt3", 'r+')
for line1 in file1:
    keyword=line1[line1.find("{")+1:line1.find("}")]
    for line2 in file2:
        if keyword in line2:
            file3.write(line1)
        else:
            file3.write(line2)
file1.close()
file2.close()
file3.close()
Output:
alpha Donec vulputate lorem tortor, nec fermentum nibh bibendum vel.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Praesent dictum luctus massa, non euismod lacus.
${alpha_john}: 'XXXXXHHHHHHHXXXXXX'
${beta_john}: 'iuhertgh jndsfbjpijwrg'
${alpha_mac}: 'acerat a lorem eget, ultricies'
${beta_mac}: 'elit nibh, eu condimentum orci viverra q'
${alpha_joe}: 'gravida lorem, ut congue diam.'
${beta_joe}: 'orttitor in condimentum nec, venenatis eu urna'
${alpha_mark}: ''
${beta_mark}: ''
${alpha_ross}: 'suscipit vitae felis non suscipit.'
${beta_ross}: 'non vulputate convallis, ligula diam sagittis urna, in venenatis'
${alpha_don}: 'Pellentesque feugiat diam est, at rhoncus orci porttitor'
${beta_don}: 'Sed elementum elit nibh'
${alpha_harry}: 'Proin tempor lacus arcu.'
${beta_harry}: 'posuere sollicitudin mi, et vulputate nisl fringilla non'
Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.
Aliquam euismod ultrices lorem, sit amet imperdiet est tincidunt vel.
Phasellus dictum justo sit amet ligula varius aliquet auctor et metus.
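This code writes only one line from txt1, "${alpha_john}: 'XXXXXHHHHHHHXXXXXX'", into the new file; all the other lines are the same as in the old file (txt2).
How do I overwrite all the lines listed in (txt1)?
Let me know if you need any other information.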
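You are iterating over file2 len(file1) times, which is certainly not what you want to do. On top of that, a file object is exhausted after one full pass, so the inner loop only does any work for the first line of file1; that is why only ${alpha_john} was replaced. What you want is to build a replacement dictionary from file1, like this: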
import re

# regex to find usernames; "$", "{" and "}" are escaped so they match literally.
# You can use str.find to extract the usernames like you did if you're
# not comfortable with regular expressions.
user_regex = re.compile(r'^\$\{([a-zA-Z0-9_]+)\}: ')

# rename files to something better
codes_file = "/home/user/scripts/temp/txt1"
old_file = "/home/user/scripts/temp/txt2"
new_file = "/home/user/scripts/temp/txt3"

codes = {}
with open(codes_file) as f:  # use with to safely open files
    for line in f:
        match = user_regex.search(line)
        if match:
            codes[match.group(1)] = line

# now we have the codes in ram for easy lookup
with open(old_file) as old, open(new_file, 'w') as new:
    for line in old:
        match = user_regex.search(line)
        if match and match.group(1) in codes:
            # known key: write the replacement line from txt1
            new.write(codes[match.group(1)])
        else:
            # anything else is copied through unchanged
            new.write(line)
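If you want to try the dictionary approach without touching the real files, here is a minimal sketch that runs on in-memory data. The names KEY_RE, load_codes and merge are only illustrative (they are not from your script), and the sample lines are a shortened copy of the data in the question; io.StringIO stands in for the opened txt1 and txt2.

```python
import io
import re

# Matches lines such as "${alpha_john}: '...'" and captures the key name.
KEY_RE = re.compile(r"^\$\{([A-Za-z0-9_]+)\}: ")


def load_codes(lines):
    """Build {key: full replacement line} from the codes file (txt1)."""
    codes = {}
    for line in lines:
        match = KEY_RE.match(line)
        if match:
            codes[match.group(1)] = line
    return codes


def merge(old_lines, codes):
    """Yield each old line, swapped for its replacement when the key is known."""
    for line in old_lines:
        match = KEY_RE.match(line)
        if match and match.group(1) in codes:
            yield codes[match.group(1)]
        else:
            yield line


# In-memory stand-ins for txt1 and txt2 (shortened sample data).
txt1 = io.StringIO(
    "${alpha_john}: 'XXXXXHHHHHHHXXXXXX'\n"
    "${beta_john}: 'XFFFFFFFFFGGGGGGGGDDDDDD'\n"
)
txt2 = io.StringIO(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n"
    "${alpha_john}: 'Lorem ipsum dolor sit amet, consectetur'\n"
    "${beta_john}: 'iuhertgh jndsfbjpijwrg'\n"
    "${alpha_mark}: ''\n"
)

# Each input is read exactly once: txt1 to build the dict, txt2 to merge.
result = list(merge(txt2, load_codes(txt1)))
print("".join(result), end="")
```

Both john lines come out with the new codes, while the plain text line and ${alpha_mark} (which has no entry in txt1) pass through untouched. Each file is read only once, so the exhausted-iterator problem from the nested loops cannot occur.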