Using Pandas and xlrd together. Ignoring absence/presence of column headers
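I am hoping you can help me. I am sure this is something small for anyone who knows how to do it.
In my workshop, neither I nor my colleagues can make 'find and replace all' changes through the front-end of our database. The boss simply denies us that level of access. If we need to change dozens or hundreds of records, it all has to be done by copy-and-paste or similar means. Madness.
I am trying to work around this using Python 2, in particular libraries such as Pandas, pyautogui and xlrd.
I have read several Stack Overflow threads and have so far managed to write some code that reads a given XL file well. In production, this will be a file exported from a data set found in the database GUI front-end, and it will be just one column of 'Article Numbers' for the items in the computer workshop. This file will always have an Excel column header ("ANR" in my case). All record numbers are five digits long.
We can also use an infrared scanner to scan items into the 'Workflow' app on an iPad, which automatically generates an XL file from the list of scanned items.
The XL file from the scanner looks much the same. The difference is that there is no column header. All XL files have their data 'anchored' at cell A1 of "Sheet1", and again only a single column is used. No unnecessary complications here!
Anyway, here is the script. When it is fully working, system arguments will be supplied to it. For now, let's say we need to change records so that their 'RAM' value goes from its current value to "2 GB".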
import xlrd
import string
import re
import pandas as pd

field = "RAM"
value = "2 GB"
myFile = "/Users/me/folder/testArticles.xlsx"
df = pd.read_excel(myFile)
myRegex = "^[0-9]{5}$"

# data collection and putting into lists.
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
formatted = []
deDuped = []

# removing any possible XL headers, setting all values to strings
# that look like five-digit ints, apply a regex to be sure.
for i in data:
    # each row is a one-item list, e.g. [12345.0] or [u'ANR']
    cellValue = str(i)
    # strip the list wrapping: brackets, quotes and the u prefix
    cellValue = cellValue.translate(None, '\'[u]\'')
    # remove the trailing ".0" left over from the float
    # Searching for the header will cause a database front-end problem.
    cellValue = cellValue[:-2]
    cellValue = cellValue.translate(None, string.letters)
    # making sure only valid article numbers get through
    # blank rows etc can take a hike
    if len(cellValue) != 0:
        if re.match(myRegex, cellValue):
            formatted.append(cellValue)

# weeding out any possible dupes.
for i in formatted:
    if i not in deDuped:
        deDuped.append(i)

# main code block
for i in deDuped:
    # lots going on here involving pyautogui
    # making sure of no error running searches, checking for warnings,
    # moving/tabbing around DB front-end etc
    # if all goes to plan
    # removing that record number from the excel file and saving the change
    # so that if we run the script again for the same XL file
    # we don't needlessly update an already OK record again.
    df = df[~df['ANR'].astype(str).str.startswith(i)]
    df.to_excel(myFile, index=False)
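The cleaning and de-duplication steps in the script can also be done without converting each row to a string and trimming characters. The following is only a sketch, written for Python 3 rather than the Python 2 used in the post, and it assumes rows shaped like the `data` list built from xlrd (one cell per row, numeric cells arriving as floats). The function name `clean_article_numbers` is made up for illustration.

```python
import re

# Five-digit article numbers, the same pattern as myRegex in the post
ARTICLE_RE = re.compile(r"^[0-9]{5}$")


def clean_article_numbers(rows):
    """Return unique five-digit article numbers in first-seen order.

    `rows` is a list of rows, each a list of cell values, like the
    `data` list in the script (hypothetical helper, not from the post).
    """
    seen = set()
    result = []
    for row in rows:
        if not row:
            continue
        cell = row[0]
        # Numeric cells arrive as floats such as 12345.0; drop the fraction
        if isinstance(cell, float) and cell.is_integer():
            cell = int(cell)
        text = str(cell).strip()
        # A header such as "ANR", blank cells and short values fail the regex
        if ARTICLE_RE.match(text) and text not in seen:
            seen.add(text)
            result.append(text)
    return result


rows = [["ANR"], [12345.0], [""], [12345.0], ["54321"], [123.0]]
print(clean_article_numbers(rows))
```

Because the header row simply fails the five-digit test, this step behaves the same whether or not the file has a header.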
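What I really want to know is how I can run the script so that it "doesn't care" whether a column header is present. The line that depends on the header is this one: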
df = df[~df['ANR'].astype(str).str.startswith(i)]
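If a column header ("ANR" in my case) is essential for this particular pandas approach, is there a straightforward way to insert a column header into an XL file that lacks one in the first place, i.e. the XL files that come from the infrared scanner and the 'Workflow' app on the iPad?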
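One possible way to make the pandas side indifferent to the header is to read the sheet with no header at all, for example `pd.read_excel(myFile, header=None)`, and then name the column yourself and drop anything that is not a five-digit number. The sketch below is an assumption about how that could look, in Python 3; `load_article_numbers` is a hypothetical helper, and the two small frames stand in for the two kinds of XL file.

```python
import pandas as pd


def load_article_numbers(frame, column="ANR"):
    """Normalise a single-column frame that was read with header=None.

    Hypothetical helper: names the first column, keeps only five-digit
    values and drops duplicates.
    """
    out = frame.iloc[:, [0]].copy()
    out.columns = [column]
    # Cells may come back as 12345, 12345.0 or text; compare them as strings
    out[column] = (
        out[column].astype(str).str.strip().str.replace(r"\.0$", "", regex=True)
    )
    # A header row such as "ANR" and blank cells fail the five-digit test
    mask = out[column].str.match(r"^[0-9]{5}$")
    return out[mask].drop_duplicates().reset_index(drop=True)


# In the real script the frame would come from pd.read_excel(myFile, header=None)
with_header = pd.DataFrame([["ANR"], [12345], [54321]])
without_header = pd.DataFrame([[12345], [54321]])
print(load_article_numbers(with_header)["ANR"].tolist())
print(load_article_numbers(without_header)["ANR"].tolist())
```

With the column always named "ANR" after loading, the `df['ANR']` filter works for both kinds of file, and `to_excel(myFile, index=False)` writes the header back out on the first save.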
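I have tried implementing some code, as Patrick suggested, to check whether cell "A1" holds a header. Partial success. If the header is missing I can put "ANR" in cell A1, but I lose whatever was there in the first place.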
import xlrd
import xlwt
import openpyxl
from openpyxl import Workbook, load_workbook
from xlutils.copy import copy

# data collection (myFile is the path defined in the script above)
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]

cell_a1 = sheet.cell_value(rowx=0, colx=0)
if cell_a1 == "ANR":
    print "has header"
else:
    wb = openpyxl.load_workbook(filename=myFile)
    ws = wb['Sheet1']
    # this overwrites whatever was in A1, so the first article number is lost
    ws['A1'] = "ANR"
    wb.save(myFile)
# re-open XL file again etc etc.
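The first value is lost because assigning to `ws['A1']` replaces the cell's content. One way around that, assuming a reasonably recent openpyxl, is to insert an empty row at the top with `insert_rows` before writing the header. This is a sketch in Python 3, not a tested fix for the script; `ensure_header` is a hypothetical helper, and the in-memory workbook stands in for a scanner export.

```python
from openpyxl import Workbook


def ensure_header(ws, header="ANR"):
    """Insert `header` above the data unless A1 already holds it.

    Returns True if a header row was added (hypothetical helper).
    """
    if ws["A1"].value == header:
        return False
    ws.insert_rows(1)   # shift every existing row down by one
    ws["A1"] = header   # A1 is now empty, so nothing is overwritten
    return True


# Stand-in for a scanner export. A real file would be opened with
# openpyxl.load_workbook(path) and written back with wb.save(path).
wb = Workbook()
ws = wb.active
ws.append([12345])
ws.append([54321])

added = ensure_header(ws)
values = [row[0] for row in ws.iter_rows(values_only=True)]
print(added, values)
```

Calling `ensure_header` a second time on the same sheet does nothing, so it is safe to run on files that already have the header.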
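I found this new code block in the thread "writing to existing workbook using xlwt". In that case the contributor actually used openpyxl.
Still a bit messy, but it seems to work. I added an 'if/else' clause to check the value of cell A1 and act accordingly. Most of the code was found online and uses openpyxl.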
import pyperclip
import xlrd
import pyautogui
import string
import re
import os
import pandas as pd
import xlwt
from openpyxl import Workbook, load_workbook
from xlutils.copy import copy

field = "RAM"
value = "2 GB"
myFile = "/Users/me/testSerials.xlsx"
df = pd.read_excel(myFile)
myRegex = "^[0-9]{5}$"

# data collection
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
cell_a1 = sheet.cell_value(rowx=0, colx=0)
if cell_a1 == "ANR":
    print "has header"
else:
    headers = ['ANR']
    # use the path variable, not the literal string 'myFile'
    workbook_name = myFile
    wb = Workbook()
    page = wb.active
    # page.title = 'companies'
    page.append(headers)  # write the headers to the first line
    workbook = xlrd.open_workbook(workbook_name)
    sheet = workbook.sheet_by_index(0)
    data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
    # copy the existing rows in below the new header, then save
    for records in data:
        page.append(records)
    wb.save(filename=workbook_name)

# then load the data all over again, this time with inserted header
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
formatted = []
deDuped = []

# removing any possible XL headers, setting all values to strings
# that look like five-digit ints, apply a regex to be sure.
for i in data:
    cellValue = str(i)
    cellValue = cellValue.translate(None, '\'[u]\'')
    # remove the trailing ".0" left over from the float
    cellValue = cellValue[:-2]
    # cellValue = cellValue.translate(None, ".0")
    cellValue = cellValue.translate(None, string.letters)
    # making sure only valid ANRs get through
    if len(cellValue) != 0:
        if re.match(myRegex, cellValue):
            formatted.append(cellValue)

# ------------------------------------------
# weeding out any possible dupes.
for i in formatted:
    if i not in deDuped:
        deDuped.append(i)

# ref -
df = pd.read_excel(myFile)
print df

for i in deDuped:
    # pyautogui code is run here...
    # if all goes to plan update the XL file
    df = df[~df['ANR'].astype(str).str.startswith(i)]
    df.to_excel(myFile, index=False)