Opening text files in different folders and writing each one to a CSV cell
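I am trying to read the text files (*.txt) stored in several different folders and write each file's text, together with its file name, into a single cell of a CSV file.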
import os
folders = os.listdir("/Users/hilo/Documents/digitization/ReleasedDataset_mp3")
folders
import glob, csv
Here I get the list of folder names, which looks like this:
['Becton Dickinson_20170803',
'CIGNA Corp._20170202',
'The Bank of New York Mellon Corp._20170720',
'JPMorgan Chase & Co._20170714']
Here I tried to apply a loop that opens each *.txt file, extracts all of its text, and then writes that text into one cell of the CSV file with the key (*):
for i in folders:
    files=glob.glob("/Users/hilo/Documents/digitization/ReleasedDataset_mp3/i/*.txt")
    with open('writeData.csv', mode='w') as new_file:
        writer = csv.writer(new_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for filename in files:
            # Take all sentences from a given file
            file = open(filename, 'rt')
            text = file.read()
            file.close()
            for text in text:
                writer.writerow((filename, text))
This keeps producing an empty CSV. Does anyone have a suggestion for fixing the problem in the code?
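You missed the string interpolation on line 2 of the loop (the glob.glob call). It should be:
files=glob.glob(f"/Users/hilo/Documents/digitization/ReleasedDataset_mp3/{i}/*.txt")
With the f prefix, {i} is replaced by the value of i on each iteration of the loop instead of being interpreted as the literal character i.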
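A minimal sketch of the difference, run against a throwaway directory rather than the asker's real paths. The helper name `find_txt_files`, the folder `Subfolder1` and the file `a.txt` are made up for illustration. Without interpolation, glob looks for a directory that is literally named `i`, finds none, and returns an empty list:

```python
import glob
import os
import tempfile


def find_txt_files(root, folders, interpolate=True):
    """Collect *.txt paths under each folder of root (hypothetical helper).

    With interpolate=False the pattern contains a literal "i",
    which reproduces the behaviour of the code in the question.
    """
    found = []
    for i in folders:
        if interpolate:
            pattern = f"{root}/{i}/*.txt"  # {i} becomes the folder name
        else:
            pattern = f"{root}/i/*.txt"    # "i" stays a literal character
        found.extend(glob.glob(pattern))
    return found


# Made-up layout for the demo: <tmp>/Subfolder1/a.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Subfolder1"))
with open(os.path.join(root, "Subfolder1", "a.txt"), "w") as f:
    f.write("hello")

print(find_txt_files(root, ["Subfolder1"], interpolate=False))  # []
print(find_txt_files(root, ["Subfolder1"]))  # one path ending in a.txt
```

Two other details of the question's loop are probably worth checking as well: opening writeData.csv with mode='w' inside the loop truncates the file on every iteration, and `for text in text` iterates over the string one character at a time, so each character would get its own row.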
Based on the additional information you provided in the comments, I think this will work:
import csv
import glob
import os
from pprint import pprint

#root_folder = "/Users/hilo/Documents/digitization/ReleasedDataset_mp3"
root_folder = "/Stack Overflow/_test_files_root"
#folders = ['Becton Dickinson_20170803',
#           'CIGNA Corp._20170202',
#           'The Bank of New York Mellon Corp._20170720',
#           'JPMorgan Chase & Co._20170714']
folders = ['Subfolder1', 'Subfolder3']

# Collect the paths of all *.txt files in the selected subfolders.
filepaths = []
for subfolder in folders:
    filepaths.extend(glob.glob(os.path.join(root_folder, subfolder, "*.txt")))

if os.name == 'nt':  # Improve readability on Windows (optional)
    filepaths[:] = [filepath.replace('\\', '/') for filepath in filepaths]

pprint(filepaths, width=128)  # Show files to be processed (optional)

# Process the files. The CSV is opened once, outside the loop.
with open('writeData.csv', mode='w', newline='') as new_file:
    writer = csv.writer(new_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for filename in filepaths:
        # Take all sentences from a given file.
        with open(filename, 'rt') as file:
            text = file.read()
        # Write them into CSV along with filename.
        writer.writerow((filename, text))

print('-FINI-')
This is what the resulting file looks like in Excel:
(I used text from various online news articles for testing.)
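To check without Excel that each file's text really lands in a single cell, the CSV can be read back with csv.reader. This is only a sketch: `write_rows` and `read_rows` are hypothetical helper names, the sample path and text are made up, and UTF-8 is an assumed encoding.

```python
import csv
import os
import tempfile


def write_rows(csv_path, rows):
    """Write (filename, text) pairs, one pair per CSV row."""
    with open(csv_path, mode='w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f, delimiter=',', quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        writer.writerows(rows)


def read_rows(csv_path):
    """Read the CSV back as a list of tuples."""
    with open(csv_path, newline='', encoding='utf-8') as f:
        return [tuple(row) for row in csv.reader(f)]


# Made-up sample: text containing a comma, a line break and quotes.
rows = [("Subfolder1/a.txt",
         'First line, with a comma.\nSecond line with "quotes".')]

csv_path = os.path.join(tempfile.mkdtemp(), "writeData.csv")
write_rows(csv_path, rows)
back = read_rows(csv_path)

print(len(back), len(back[0]))  # 1 2
print(back == rows)             # True
```

With QUOTE_MINIMAL the writer quotes a field only when it contains the delimiter, the quote character or a line break, so a multi-line article still comes back as one field. Passing newline='' when opening the file, as the answer's code does, is what the csv module documentation recommends so that embedded line breaks are handled correctly.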