Replace indexed strings in a list and write out to a text file
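I'm trying to replace certain words in a text file (basically the second word on each line) and then write the result to a new file or overwrite the existing one.
I thought I was making progress, but when I tried to write to a new file I got an error saying I can't write a list to a text file. I can't simply swap one word for another, because I have an 'else' clause that covers any word that doesn't match the specific words I need to replace.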
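The error described here is what Python raises when a list is passed to a file's write() method, which accepts a single string. A minimal sketch of two ways around it, using io.StringIO as a stand-in for a real file; the lines list is hypothetical sample data:

```python
import io

# Hypothetical already-converted lines, without trailing newlines.
lines = ["id BIGINT,", "rate DOUBLE,", "process_ts STRING"]

buf = io.StringIO()  # stands in for a file opened with open(..., 'w')
try:
    buf.write(lines)  # write() accepts one string, not a list
except TypeError as exc:
    print("TypeError:", exc)

# Option 1: join the list into a single string and write it once.
buf.write("\n".join(lines) + "\n")

# Option 2: writelines() takes any iterable of strings, but it does NOT
# add line breaks, so each item must carry its own "\n".
buf2 = io.StringIO()
buf2.writelines(line + "\n" for line in lines)
```

Both options produce the same text; writelines() is convenient when the lines are already being generated one at a time.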
Here is a sample of the text I want to modify; it is stored in a .txt file:
id int,
organization_id int,
billing_month date,
fee_type varchar(100),
rate float,
price float,
uom varchar(25),
amount float,
currency_code_id float,
process_ts timestamptz NOT NULL DEFAULT (now())::timestamptz(6)
I want to change:
'int' --> 'BIGINT'
'numeric' --> 'DOUBLE'
'float' --> 'DOUBLE'
ELSE (any other data type) --> 'STRING'
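This mapping fits naturally in a dict, and dict.get() with a default value covers the ELSE case without a separate if/else chain. A minimal sketch; TYPE_MAP and convert_type are hypothetical names:

```python
# Mapping from the question; anything not listed falls through to the ELSE case.
TYPE_MAP = {"int": "BIGINT", "numeric": "DOUBLE", "float": "DOUBLE"}


def convert_type(dtype):
    """Return the target type for one source type, defaulting to 'STRING'."""
    # The second argument to get() is returned when dtype is not a key.
    return TYPE_MAP.get(dtype, "STRING")


print(convert_type("float"))        # DOUBLE
print(convert_type("timestamptz"))  # STRING (the ELSE case)
```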
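Note that in the original data some types carry extra characters, e.g. "varchar(100)"; I want to replace those with "STRING" and drop the '(100)' part.
Then overwrite the file or create a new text file. If the replacements are correct, the output for the sample above should be: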
id BIGINT,
organization_id BIGINT,
billing_month STRING,
fee_type STRING,
rate DOUBLE,
price DOUBLE,
uom STRING,
amount DOUBLE,
currency_code_id DOUBLE,
process_ts STRING
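For "varchar(100)" the suffix does not change the result, since varchar falls through to STRING anyway. It does matter for a parameterized type that is in the mapping, such as a hypothetical "numeric(10,2)" (not in the sample), which would not match the key 'numeric' as-is. A minimal sketch that normalizes the raw second word before the lookup; base_type is a hypothetical helper:

```python
TYPE_MAP = {"int": "BIGINT", "numeric": "DOUBLE", "float": "DOUBLE"}


def base_type(raw):
    """Strip a trailing comma and any '(...)' length/precision suffix."""
    # 'varchar(100),' -> 'varchar', 'numeric(10,2),' -> 'numeric'
    return raw.rstrip(",").split("(", 1)[0]


for raw in ["varchar(100),", "numeric(10,2),", "float,"]:
    print(raw, "->", TYPE_MAP.get(base_type(raw), "STRING"))
```

This prints STRING, DOUBLE and DOUBLE respectively.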
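I'm not sure whether I should build lists, modify them, and then write those lists to the text file, or use a dictionary, or some other approach I haven't thought of. I'm a beginner, so apologies if this isn't very clear.
Contents of txt.txt: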
id int,
organization_id int,
billing_month date,
fee_type varchar(100),
rate float,
price float,
uom varchar(25),
amount float,
currency_code_id float,
process_ts timestamptz NOT NULL DEFAULT (now())::timestamptz(6)
Code:
with open('txt.txt', 'r') as f:
    text = f.read().splitlines()
mapping = {'int': 'BIGINT',
           'numeric': 'DOUBLE',
           'float': 'DOUBLE'}
replaced_text = []
for line in text:
    # temporarily remove comma
    line = line.rstrip(',')
    split_line = line.split()
    # skip blank lines, which would otherwise raise IndexError below
    if not split_line:
        continue
    other_text, dtype = split_line[0], split_line[1:]
    new_dtype = mapping.get(' '.join(dtype), 'STRING')
    new_line = '{} {},\n'.format(other_text, new_dtype)
    replaced_text.append(new_line)
with open('txt_replaced.txt', 'w') as f:
    f.writelines(replaced_text)
Contents of txt_replaced.txt:
id BIGINT,
organization_id BIGINT,
billing_month STRING,
fee_type STRING,
rate DOUBLE,
price DOUBLE,
uom STRING,
amount DOUBLE,
currency_code_id DOUBLE,
process_ts STRING,
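The question also allows overwriting the existing file instead of creating a new one. A minimal sketch, assuming you want txt.txt replaced in place: it writes to a temporary file in the same directory and then swaps it in with os.replace(), so an error midway does not leave the original half-written. convert_line and convert_in_place are hypothetical names, and unlike the code above this version keeps a comma only where the original line had one:

```python
import os
import tempfile

# Mapping from the question; anything else becomes STRING.
MAPPING = {"int": "BIGINT", "numeric": "DOUBLE", "float": "DOUBLE"}


def convert_line(line):
    """Rewrite 'name type ...' as 'name NEWTYPE', keeping a trailing comma."""
    words = line.split()
    if len(words) < 2:
        return line  # leave blank or one-word lines unchanged
    new_type = MAPPING.get(words[1].rstrip(","), "STRING")
    comma = "," if line.rstrip().endswith(",") else ""
    return "{} {}{}\n".format(words[0], new_type, comma)


def convert_in_place(path):
    """Overwrite the file at `path` with its converted lines."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first, so an error
    # halfway through never leaves the original file half-written.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        # Wrap fd first so it is closed even if opening `path` fails.
        with os.fdopen(fd, "w") as dst, open(path, "r") as src:
            for line in src:
                dst.write(convert_line(line))
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.remove(tmp_path)
        raise
```

Usage: convert_in_place('txt.txt'). One side effect to be aware of: mkstemp() creates the temporary file readable and writable only by its owner, so the replaced file may not keep the original file's permissions.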
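You can loop over each line and use a dictionary to replace the value in the second position. This works for lines of any length, as long as the text to replace is the second word.
#vals to replace
replace_vals = {'int': 'BIGINT', 'numeric': 'DOUBLE', 'float': 'DOUBLE'}
#file we write to
with open('out.txt', 'w') as outfile:
    #file we read from
    with open('in.txt', 'r') as infile:
        #check each line
        for line in infile:
            #split line into words
            words = line.split()
            #skip blank lines
            if not words:
                continue
            #strip the trailing comma so 'int,' matches the key 'int'
            dtype = words[1].rstrip(',')
            #get the first word and then replace the second word, defaulting to STRING
            w = words[0] + " " + replace_vals.get(dtype, 'STRING')
            #keep the comma if the original line ended with one
            if line.rstrip().endswith(','):
                w += ","
            #add a final newline
            w += "\n"
            #print to file
            outfile.write(w)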
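Why the comma matters for the lookup: str.split() only splits on whitespace, so the comma stays attached to the type. For the line "id int," the raw second word is "int,", which is not a key in the dictionary and would silently fall through to STRING. A small demonstration:

```python
replace_vals = {'int': 'BIGINT', 'numeric': 'DOUBLE', 'float': 'DOUBLE'}

words = "id int,".split()
print(words)                                             # ['id', 'int,']
print(replace_vals.get(words[1], 'STRING'))              # STRING (wrong)
print(replace_vals.get(words[1].rstrip(','), 'STRING'))  # BIGINT
```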
Maybe something like this can help you:
import os
import re
path = os.path.dirname(os.path.abspath(__file__))
# column name, one space, then the data type (which may include "(100)" etc.)
regExpr = re.compile(r"(\w+) ([\w():]+)")
with open(os.path.join(path, "filename.txt"), "r") as myFile:
    with open(os.path.join(path, "newFile.txt"), "w") as f:
        for line in myFile:
            match = regExpr.match(line)
            if match:
                name, result = match.group(1), match.group(2)
                if "int" in result:
                    new_type = "BIGINT"
                elif result in ["numeric", "float"]:
                    new_type = "DOUBLE"
                else:
                    new_type = "STRING"
                # rebuild the line rather than calling line.replace(), which would
                # also rewrite matches inside the column name or later on the line
                comma = "," if line.rstrip().endswith(",") else ""
                f.write("{} {}{}\n".format(name, new_type, comma))
            else:
                print("couldn't find something in line:\n", line)
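A pitfall with calling str.replace() on the whole line is that it rewrites every occurrence of the matched type and leaves the rest of the line in place. The last line of the sample shows both problems at once; rebuilding the line from the column name avoids them:

```python
line = "process_ts timestamptz NOT NULL DEFAULT (now())::timestamptz(6)\n"
result = "timestamptz"  # the type a regex would capture for this line

# str.replace() hits both occurrences and keeps the NOT NULL ... tail
print(line.replace(result, "STRING"), end="")
# -> process_ts STRING NOT NULL DEFAULT (now())::STRING(6)

# Rebuilding from the column name gives the output the question asks for
name = line.split()[0]
print(name + " STRING")
# -> process_ts STRING
```

The same issue would turn a hypothetical column such as "point_count int," into "poBIGINT_count BIGINT," because "int" also appears inside the column name.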