Pulling specific strings and integers from a CSV file and writing them to a .txt file using Python
NEW Update 11:30am CST: below is my complete code.
The desired result would be a txt file in this format:
Logical ID: (&) 192.168.xx.xxx (if it has both)
192.168.xx.xxx
Logical ID:
192.168.xx.xxx
192.168.xx.xxx
Logical ID:
Logical ID:
192.168.xx.xxx
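**Latest code -> If a Logical ID exists, I want to print it; if it does not, I want to print the IP address (to a new sheet).
As the code shows, I specify one model.csv to write to one model.txt, and I have to change it by hand for each model. If there is a solution for that, it would be great too.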
import csv
import re
import sys

# Raw strings keep the backslashes in the Windows paths from being read as escapes.
sys.stdout = open(r"C:\Users\ADMIN-SURV\Desktop\data_pull\2.0C-H4A-DC2 .txt", 'w')
with open(r'C:\Users\ADMIN-SURV\Desktop\data_pull\2.0C-H4A-DC2_filter.csv') as fid:
    inputfile = csv.reader(fid)
    for row in inputfile:
        if len(row) >= 4:
            if row[0] == 'File name':
                # skip the header row
                continue
            m = re.match(r".*(.* [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})", row[3])
            if m:
                print(m.group(1))
            else:
                print(row[3])
sys.stdout.close()
*** error parsing line: model not found H4SL-D1(2305854) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.50.250 00:18:85:***
*** error parsing line: model not found H4SL-D1(2878617) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.50.194 00:18:85:***
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:1026
*** error parsing line: model not found Unsupported SOUTH LV Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.206.250 00:18:85:***
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:3027
*** error parsing line: model not found ELEVATOR GROUND FL Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.80.203 00:18:85:***
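Of course, that is only a small portion of what is returned.
I would like to add that each CSV file is specific to one model, so all I really need to do is pull the Logical IDs from the first column and add them to a list under the defined model; if no Logical ID is attached, return the given IP address instead.
I searched for all instances of an item and saved the results as a CSV file. I am using Python to try to pull specific information out of it. I would love to add photos, but I am not allowed to.
This is the error I am receiving: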
Traceback (most recent call last):
  File "C:\Users\ADMIN-SURV\PycharmProjects\pdf_scraping\test_file.,py", line 7, in <module>
    print(column[3])
IndexError: list index out of range
This is the only code I have written:
import csv
inputfile = csv.reader(open(r'C:\Users\ADMIN-SURV\Desktop\data_pull\Untitled.csv','r'))
for column in inputfile:
    print(column[3])
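When I remove the [3] from the last line and just leave
    print(column)
it prints my entire CSV file to the console. All I want is specific information from each row, which I should be able to get by pulling from a specific column.
The CSV file data looks like this: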
"Search Results"
"Summary"
"Saved on","12/8/2021 1:57:21 PM"
"Searched for","Avigilon (ONVIF) 1.3C-H4SL-D1"
"In document","C:\Users\ADMIN-SURV\Desktop\data_pull\IslandView.pdf"
"Number of document(s) found","1"
"Number of instance(s) found","551"
"File name","Title","Page","Search Instance"
"IslandView.pdf","","5","Detection: Unsupported 2058 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2058 192.168.202.206 "
"IslandView.pdf","","9","BAR POS 01 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.70.214 00:18:85:"
"IslandView.pdf","","9","H4SL-D1(1866954) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:481 192.168.11.203 "
"IslandView.pdf","","9","H4SL-D1(1825930) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:20 192.168.16.203 "
"IslandView.pdf","","9","Detection: Unsupported 2048 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.202.200 00:18:85:"
"IslandView.pdf","","9","H4SL-D1(1866877) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:92 192.168.15.205 "
"IslandView.pdf","","9","Detection: Unsupported 2074 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2074 192.168.203.241 "
"IslandView.pdf","","9","Detection: Unsupported 2174 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2174 192.168.201.232 "
"IslandView.pdf","","9","Detection: Unsupported 2161 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2161 192.168.205.231 "
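There are 500+ rows. You can see the column headers:
"File name","Title","Page","Search Instance"
I only need the model and Logical ID information from the first column. I want to isolate it and then build an organized list showing which Logical ID goes with which model.
Just in case, here is an example of the model and Logical ID from column 1: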
Avigilon (ONVIF) 1.3C-H4SL-D1
Logical ID: 875
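The end goal is to create a sheet that lists each model (this is the search result for one model), with all Logical IDs associated with that model listed beneath it.
Please let me know if I can clarify anything or provide further information.
Thank you!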
The top of the CSV file contains rows with fewer than 4 columns. To avoid the IndexError, try testing the row length first:
# "inputfile" is a CSV reader instance
for row in inputfile:
    if len(row) >= 4:
        print(row[3])
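As an alternative to the length test, you could skip everything up to the header row and only then start reading data. The following is a minimal sketch, not the only way to do it: the `SAMPLE` text and the `search_instances` helper are made up for illustration, and it assumes the export always contains a header row with a "Search Instance" column, as in the data shown in the question.

```python
import csv
import io

# Hypothetical sample that mirrors the export layout shown in the question.
SAMPLE = '''"Search Results"
"Summary"
"Saved on","12/8/2021 1:57:21 PM"
"File name","Title","Page","Search Instance"
"IslandView.pdf","","5","Detection: Unsupported 2058 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2058 192.168.202.206 "
"IslandView.pdf","","9","BAR POS 01 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.70.214 00:18:85:"
'''

def search_instances(lines):
    """Yield the 'Search Instance' field of every data row after the header."""
    column = None
    for row in csv.reader(lines):
        if column is None:
            # Still in the summary preamble: keep looking for the header row.
            if "Search Instance" in row:
                column = row.index("Search Instance")
            continue
        if len(row) > column:
            yield row[column].strip()

instances = list(search_instances(io.StringIO(SAMPLE)))
for text in instances:
    print(text)
```

With a real file you would pass the open file object (opened with `newline=""`) instead of the `io.StringIO` wrapper.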
Here is a variant that uses a regular expression to pull out the model name, stopping at the IP address:
import csv
import re

with open('example_data.csv') as fid:
    inputfile = csv.reader(fid)
    for row in inputfile:
        if len(row) >= 4:
            if row[0] == 'File name':
                # skip the header row
                continue
            # capture from "Avigilon" through the IP address
            m = re.match(r'.*(Avigilon.* [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})', row[3])
            if m:
                print(m.group(1))
            else:
                print(f'*** error parsing line: model not found {row[3]}***')
With the data above, this prints:
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2058 192.168.202.206
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.70.214
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:481 192.168.11.203
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:20 192.168.16.203
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.202.200
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:92 192.168.15.205
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2074 192.168.203.241
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2174 192.168.201.232
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2161 192.168.205.231
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.50.246
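For the updated requirement (print the Logical ID when there is one, otherwise the IP address), two small patterns may be easier to reason about than one large one. This is only a sketch under assumptions: the `id_or_ip` helper and both pattern names are hypothetical, and it assumes the ID always appears as "Logical ID:" followed by digits and that the first dotted quad in the text is the camera's IP address.

```python
import re

# Assumed formats: "Logical ID:" followed by digits, and a dotted-quad IPv4 address.
LOGICAL_ID = re.compile(r"Logical ID:\s*(\d+)")
IP_ADDRESS = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def id_or_ip(text):
    """Return 'Logical ID: N' if present, else the IP address, else None."""
    m = LOGICAL_ID.search(text)
    if m:
        return f"Logical ID: {m.group(1)}"
    m = IP_ADDRESS.search(text)
    if m:
        return m.group(0)
    return None

# Two "Search Instance" values taken from the sample data in the question.
rows = [
    "H4SL-D1(1866954) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:481 192.168.11.203 ",
    "BAR POS 01 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.70.214 00:18:85:",
]
results = [id_or_ip(r) for r in rows]
for line in results:
    print(line)
```

Because the patterns are searched anywhere in the text, rows that end with a MAC address fragment after the IP are handled the same way as rows that end with the IP.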
To write the output to a text file, try something like this:
import csv
import re

with open('logfile.txt', 'w') as fout:
    with open('example_data.csv') as fid:
        inputfile = csv.reader(fid)
        for row in inputfile:
            if len(row) >= 4:
                if row[0] == 'File name':
                    # skip the header row
                    continue
                m = re.match(r'.*(Avigilon.* [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})', row[3])
                if m:
                    # optional: delete this print line
                    print(m.group(1))
                    fout.write(f'{m.group(1)}\n')
                else:
                    print(f'*** error parsing line: model not found {row[3]}***')
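To avoid editing the file names by hand for every model, one option is to loop over all the CSV files in a folder and derive each output name from the input name. The sketch below is an assumption-heavy illustration rather than a drop-in solution: `convert` and `convert_folder` are hypothetical helpers, it assumes every export is named like `<model>_filter.csv` (as in the question's `2.0C-H4A-DC2_filter.csv`), and the demo runs in a temporary folder with made-up sample content.

```python
import csv
import re
import tempfile
from pathlib import Path

# Assumed formats, as in the question's sample data.
LOGICAL_ID = re.compile(r"Logical ID:\s*(\d+)")
IP_ADDRESS = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def convert(csv_path, txt_path):
    """Write one line per data row: the Logical ID if present, else the IP."""
    with open(csv_path, newline="") as fid, open(txt_path, "w") as fout:
        for row in csv.reader(fid):
            if len(row) < 4 or row[0] == "File name":
                continue  # summary preamble or header row
            m = LOGICAL_ID.search(row[3])
            if m:
                fout.write(f"Logical ID: {m.group(1)}\n")
                continue
            m = IP_ADDRESS.search(row[3])
            if m:
                fout.write(f"{m.group(0)}\n")

def convert_folder(folder):
    """Convert every '*_filter.csv' in folder to a matching '.txt' file."""
    written = []
    for csv_path in sorted(Path(folder).glob("*_filter.csv")):
        # e.g. "2.0C-H4A-DC2_filter.csv" -> "2.0C-H4A-DC2.txt"
        stem = csv_path.name[: -len("_filter.csv")]
        txt_path = csv_path.with_name(stem + ".txt")
        convert(csv_path, txt_path)
        written.append(txt_path)
    return written

# Demo in a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "1.3C-H4SL-D1_filter.csv"
    sample.write_text(
        '"File name","Title","Page","Search Instance"\n'
        '"IslandView.pdf","","9","H4SL-D1(1825930) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:20 192.168.16.203 "\n'
        '"IslandView.pdf","","9","BAR POS 01 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.70.214 00:18:85:"\n'
    )
    outputs = convert_folder(tmp)
    output_name = outputs[0].name
    output_text = outputs[0].read_text()
    print(output_name)
    print(output_text)
```

In real use you would call `convert_folder` on the folder that holds the exports instead of the temporary directory. Writing through an explicit file object also avoids reassigning `sys.stdout`.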