Copy only Employee ID to Excel sheet
I have unformatted data in a Notepad file, as shown below.
#Civil
GROUP CIVIL RPatel66 LKohli12 m12 PSen72 m72
GROUP CIVIL SKumar22 ASekar32 m32 BSiva90
#Mechanical
GROUP MECHANICAL OKhan78 m78 MShah81 JKumar11
GROUP MECHANICAL VHiremath12 TVasu43 m43 NReddy21
#Electrical
GROUP ELECTRICAL LPathan88 SPatil56 m56 AParth33
GROUP ELECTRICAL HAnil45 m45 Khari67 m67 Skumar49
When I run the code below:
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
f = open('C:\Users\Kiran\Desktop\Input.txt', 'r+')
data = f.readlines()
spaces = ""
for i in range(len(data)):
    row = data[i].split(" ")
    ws.append(row)
wb.save("Output1.xlsx")
import openpyxl
book= openpyxl.load_workbook('Output1.xlsx')
sheet = book['Sheet']
sheet.delete_cols(1,2) #deletes Column 1 and 2
book.save("Output1.xlsx")
With the code above I get an error and do not get the output I need.
I need the output in the Excel sheet as shown below, e.g. RPatel66, LKohli12, and so on. It should not contain m12, m72.
RPatel66
LKohli12
PSen72
SKumar22
ASekar32
BSiva90
OKhan78
MShah81
JKumar11
VHiremath12
TVasu43
NReddy21
LPathan88
SPatil56
AParth33
HAnil45
Khari67
Skumar49
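Please refer to the code below to get the output you need. It assumes the data is in a text file, opened below as <your-file-name.txt>.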
import re
import pandas as pd
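with open("<your-file-name.txt>", 'r') as f:
    content = f.readlines()
# Skip the '#Civil'-style header lines
content = [x for x in content if not x.startswith('#')]
# Remove the GROUP keyword and the department names
temp_content_1 = list(map(lambda x: x.replace('GROUP','').replace('MECHANICAL','').replace('CIVIL','').replace('ELECTRICAL','').strip(), content))
# Remove the ' m<digits>' tokens (raw string avoids an invalid-escape warning)
temp_content_2 = list(map(lambda x: re.sub(r' m\d+', '', x), temp_content_1))
# Flatten the remaining tokens into one list of IDs
final = ' '.join(temp_content_2).split()
df = pd.DataFrame({"Employee": final})
df.to_excel("<your-output-file-name.xlsx>", index=False)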
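To see what the filtering step is doing, here is a minimal sketch that works token by token instead of on whole lines. The name extract_ids and the sample lines are only illustrative, and it assumes that the unwanted tokens are always a lowercase m followed by digits and that every real ID is letters followed by digits.

```python
import re

# Assumed token shapes (not stated in the question, inferred from the sample data)
MARKER = re.compile(r"m\d+")            # e.g. m12, m72
ID_LIKE = re.compile(r"[A-Za-z]+\d+")   # e.g. RPatel66

def extract_ids(lines):
    """Hypothetical helper: return ID-like tokens, skipping headers and m<digits>."""
    ids = []
    for line in lines:
        if line.startswith("#"):        # '#Civil' style header line
            continue
        for token in line.split():
            if MARKER.fullmatch(token):
                continue                # drop m12, m72, ...
            if ID_LIKE.fullmatch(token):
                ids.append(token)       # GROUP / CIVIL have no digits, so they fail here
    return ids

sample = [
    "#Civil",
    "GROUP CIVIL RPatel66 LKohli12 m12 PSen72 m72",
]
print(extract_ids(sample))  # ['RPatel66', 'LKohli12', 'PSen72']
```

Because it never names CIVIL, MECHANICAL or ELECTRICAL, this variant should keep working if another department is added, as long as the token shapes stay the same.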
Alternative solution
import re
import pandas as pd
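with open("<your-file-name.txt>", 'r') as f:
    content = f.readlines()
# Remove the ' m<digits>' tokens (raw string avoids an invalid-escape warning)
temp_content_1 = list(map(lambda x: re.sub(r' m\d+', '', x), content))
temp_content_2 = ' '.join(temp_content_1)
# Keep only tokens that end in digits; '#Civil', 'GROUP' etc. have none
final = re.findall(r'\w+\d+', temp_content_2)
df = pd.DataFrame({"Employee": final})
df.to_excel("<your-output-file-name.xlsx>", index=False)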
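Since the question already uses openpyxl, the same result can probably be had without pandas. The sketch below is one way to do it, not the answer's method: ids_from_text and write_ids are made-up names, and the "Employee" header row is an assumption. The point it illustrates is that appending a one-element list per ID puts every ID in column A, so no columns need to be deleted afterwards.

```python
import re

def ids_from_text(text):
    """Hypothetical helper: strip standalone m<digits> tokens, then collect IDs."""
    cleaned = re.sub(r"\bm\d+\b", "", text)      # \b keeps 'SKumar22' intact
    return re.findall(r"[A-Za-z]+\d+", cleaned)  # letters followed by digits

def write_ids(ids, out_path="Output1.xlsx"):
    """Write one ID per row into column A of a new workbook."""
    # Imported here so the parsing part above runs even without openpyxl installed.
    from openpyxl import Workbook
    wb = Workbook()
    ws = wb.active
    ws.append(["Employee"])   # assumed header row
    for emp_id in ids:
        ws.append([emp_id])   # a one-element list fills only column A
    wb.save(out_path)

demo = "#Mechanical\nGROUP MECHANICAL OKhan78 m78 MShah81 JKumar11\n"
print(ids_from_text(demo))  # ['OKhan78', 'MShah81', 'JKumar11']
```

With a real file this would be called as write_ids(ids_from_text(f.read())) inside a with open(...) block. If the full Windows path from the question is used, writing it as a raw string, r'C:\Users\Kiran\Desktop\Input.txt', avoids Python reading \U as the start of a Unicode escape, which may well be the error the question ran into.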