Select a column from multiple files and write it to a new file with the file names as column names
My dataset is in the following format:
Theta DeltaD DeltaS Lambda Rho LogLik
1 0.0137060718 0.0378903969 0.4939959667 0.3795642767 0.57232859 -963.7743175455
2 0.0137060718 0.0378903969 0.4951036519 0.3795642767 0.57232859 -963.745770314
3 0.0136703063 0.038522257 0.4807565701 0.3551424944 0.5639182313 -964.5802838333
4 0.0136703063 0.0382752067 0.4597773216 0.3551424944 0.5621381788 -963.0634821126
5 0.0136703063 0.0377739624 0.4597773216 0.3486538546 0.5552092482 -963.315982188
6 0.0136119461 0.0359108581 0.4597773216 0.3486538546 0.5552092482 -963.5321138251
7 0.0136119461 0.0374395068 0.4597773216 0.3582883699 0.5862608093 -963.3432259866
8 0.0136119461 0.0374395068 0.4597773216 0.3582883699 0.5862608093 -963.3432259866
9 0.0136119461 0.0383243243 0.4597773216 0.3582883699 0.5862608093 -963.288725532
10 0.0136119461 0.0383243243 0.467850463 0.3582883699 0.5862608093 -963.058588502
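I want to select the DeltaS column from each file and save the output as a CSV (or any delimited format), with the file names as the column names.
I came up with the following code: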
import glob
import numpy
import pandas as pd
import csv

outfile = open("final_DeltaS", 'w')

list_of_files = []
for name in glob.glob('*iter.csv'):
    list_of_files.append(name)

def fileinput(files):
    for f in files:
        df = pd.read_csv(f)
        print(f, df["DeltaS"])

fileinput(list_of_files)
But I am stuck on how to get the data out of this loop and into a file.
Expected output:
File_1 File_2
0.0378903969 0.4939959667
0.0378903969 0.4951036519
0.038522257 0.4807565701
0.0382752067 0.4597773216
0.0377739624 0.4597773216
0.0359108581 0.4597773216
0.0374395068 0.4597773216
0.0374395068 0.4597773216
0.0383243243 0.4597773216
0.0383243243 0.467850463
IIUC, the following should work:
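df_col_list = []
def fileinput(files):
    for f in files:
        # read only the DeltaS column, then name it after the file
        df = pd.read_csv(f, usecols=['DeltaS'])
        df.rename(columns={'DeltaS': f}, inplace=True)
        df_col_list.append(df)

# populate the list before concatenating; list_of_files is the glob result from the question
fileinput(list_of_files)
# put the per-file columns side by side; your_output_path is a placeholder
concat = pd.concat(df_col_list, axis=1)
concat.to_csv(your_output_path)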
You may need to trim the file names down to what you actually want as column names, but that is straightforward.
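One way to do that trimming is sketched below. It assumes the files are comma-separated and each has a DeltaS header; collect_column is a hypothetical helper name, and Path.stem is just one possible choice for shortening the file name (it drops the directory and the extension).

```python
from pathlib import Path

import pandas as pd


def collect_column(paths, column="DeltaS"):
    """Return a DataFrame holding `column` from every file, one column per file."""
    series_by_name = {}
    for path in paths:
        path = Path(path)
        # usecols skips the columns we do not need
        df = pd.read_csv(path, usecols=[column])
        # "some/dir/run1_iter.csv" becomes the column name "run1_iter"
        series_by_name[path.stem] = df[column]
    # concatenating a dict along axis=1 uses its keys as the column names
    return pd.concat(series_by_name, axis=1)
```

With the glob pattern from the question, collect_column(sorted(glob.glob('*iter.csv'))).to_csv('final_DeltaS.csv', index=False) should write one column per file. If the files are whitespace-delimited rather than comma-separated, read_csv would need a matching sep argument.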