Merge different text files into one Excel file, dropping the first column of all but the first
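I need to merge several text files (.sav files) into one Excel file (output.xls), but from the second input file onward I want to exclude the first column of each file.
What I want to get is the following (with placeholder data, so the rows and columns are easy to see):
File 1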
A1 A2 A3 A4
B1 B2 B3 B4
C1 C2 C3 C4
and so on
File 2
X1 X2 X3 X4
Y1 Y2 Y3 Y4
Z1 Z2 Z3 Z4
and so on
Output file
A1 A2 A3 A4 X2 X3 X4
B1 B2 B3 B4 Y2 Y3 Y4
C1 C2 C3 C4 Z2 Z3 Z4
and so on
Here is my code.
import glob
filenames = glob.glob("*.sav")
filenames.sort()
with open('output.txt', 'w') as writer:
    readers = [open(filename) for filename in filenames]
    for lines in zip(*readers):
        print(' '.join([line.strip() for line in lines]), file=writer)
import pandas as pd
df = pd.read_table('output.txt')
df.to_excel('output.csv', 'DATI', index=False, header=False)
import os
os.remove('output.txt')
However, this keeps all the columns. How can I omit the ones I don't need?
It should be as simple as the following:
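from pathlib import Path
import pandas as pd

# Sort so that the "first" file is deterministic; glob order is not guaranteed.
fnames = sorted(Path("your-path").glob("*.sav"))
# header=None keeps the first line of each file as data rather than column names.
# sep="\t" assumes tab-separated files; use sep=r"\s+" for whitespace-separated ones.
first_df, *dfs = [pd.read_csv(f, sep="\t", header=None) for f in fnames]
dfs = [df.iloc[:, 1:] for df in dfs]  # Drop the first column of every file after the first
df = pd.concat([first_df, *dfs], axis=1)
df.to_excel("output.xlsx", index=False, header=False)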
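If you would rather stay close to the zip-based loop from the question, a minimal sketch using only the standard library could look like the one below. The function name merge_rows and its sep parameter are made up for this illustration, and it assumes the fields are separated by whitespace unless you pass another separator.

```python
from contextlib import ExitStack


def merge_rows(paths, sep=None):
    """Merge files line by line: keep every field of the first file,
    and drop the first field of each later file.

    sep=None splits on any run of whitespace (an assumption about the
    .sav layout); pass sep="\t" for tab-separated files.
    """
    merged = []
    with ExitStack() as stack:
        # ExitStack closes every opened file when the block ends.
        handles = [stack.enter_context(open(p)) for p in paths]
        # zip stops at the shortest file, like the loop in the question.
        for lines in zip(*handles):
            row = []
            for i, line in enumerate(lines):
                fields = line.rstrip("\r\n").split(sep)
                # The first file contributes all fields, later files skip field 0.
                row.extend(fields if i == 0 else fields[1:])
            merged.append(row)
    return merged
```

The result is a list of rows, so something like pd.DataFrame(merge_rows(sorted(filenames))).to_excel("output.xlsx", index=False, header=False) should write it out without the intermediate output.txt.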