合并两个数据框
Merge two data frames
我尝试通过将第二个 df 的第一行添加到第一个 df 的第一行来合并两个数据框。我也尝试将它们连接起来,但都失败了。
数据格式为
1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--
2,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---
4,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---
6,0.000,,,,,,,
预期的输出格式应该是
1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--,0.000,,,,,,,
2,3,N0129,Position,62.2,0.376,62.238,0.136,***---**,76.1,-36.000,0.300,-36.057,,,,
3,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---,0.000,,,,,,,
我已经将上面的数据帧拆分为两个帧。第一个只包含奇数索引,第二个包含偶数索引。
我现在的问题是,通过将第二个 df 的第一行添加到第一个 df 的第一行,merge/concatenate 两个帧。我已经尝试了 merging/concatenating 的一些方法,但都失败了。所有的打印功能都不是必需的,我只是用它们在控制台中快速浏览一下。
我觉得最舒服的代码是:
os.chdir(output)
csv_files = os.listdir('.')
for csv_file in (csv_files):
if csv_file.endswith(".asc.csv"):
df = pd.read_csv(csv_file)
keep_col = ['Messpunkt', 'Zeichnungspunkt', 'Eigenschaft', 'Position', 'Sollmass','Toleranz','Abweichung','Lage']
new_df = df[keep_col]
new_df = new_df[~new_df['Messpunkt'].isin(['**Teil'])]
new_df = new_df[~new_df['Messpunkt'].isin(['**KS-Oben'])]
new_df = new_df[~new_df['Messpunkt'].isin(['**KS-Unten'])]
new_df = new_df[~new_df['Messpunkt'].isin(['**N'])]
print(new_df)
new_df.to_csv(output+csv_file)
df1 = new_df[new_df.index % 2 ==1]
df2 = new_df[new_df.index % 2 ==0]
df1.reset_index()
df2.reset_index()
print (df1)
print (df2)
merge_df = pd.concat([df1,df2], axis=1)
print (merge_df)
merge_df.to_csv(output+csv_file)
非常感谢您的帮助。
使用此代码,输出为:
1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--,,,,,,,,
2,,,,,,,,,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---,,,,,,,,
4,,,,,,,,,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---,,,,,,,,
6,,,,,,,,,0.000,,,,,,,
当我使用 reset_index()
在两个 DataFrame 中使用相同的索引时,我得到了预期的结果。
可能还需要drop=True
跳过索引作为新列
pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
最简单的工作示例。
我使用io
只是为了模拟内存中的文件。
text = '''1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--
2,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---
4,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---
6,0.000,,,,,,,'''
import pandas as pd
import io
pd.options.display.max_columns = 20 # to display all columns
df = pd.read_csv(io.StringIO(text), header=None, index_col=0)
#print(df)
df1 = df[df.index % 2 == 1] # .reset_index(drop=True)
df2 = df[df.index % 2 == 0] # .reset_index(drop=True)
#print(df1)
#print(df2)
merge_df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print(merge_df)
结果:
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
0 3.0 N0128 Durchm. 5.0 0.100 5.076 0.076 -----****-- 0.0 NaN NaN NaN NaN NaN NaN NaN
1 3.0 N0129 Position 62.2 0.376 62.238 0.136 ***--- 76.1 -36.000 0.300 -36.057 NaN NaN NaN NaN
2 2.0 N0130 Durchm. 5.0 0.100 5.067 0.067 -----***--- 0.0 NaN NaN NaN NaN NaN NaN NaN
编辑:
可能需要
merge_df.index = merge_df.index + 1
修正索引。
我尝试通过将第二个 df 的第一行添加到第一个 df 的第一行来合并两个数据框。我也尝试将它们连接起来,但都失败了。 数据格式为
1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--
2,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---
4,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---
6,0.000,,,,,,,
预期的输出格式应该是
1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--,0.000,,,,,,,
2,3,N0129,Position,62.2,0.376,62.238,0.136,***---**,76.1,-36.000,0.300,-36.057,,,,
3,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---,0.000,,,,,,,
我已经将上面的数据帧拆分为两个帧。第一个只包含奇数索引,第二个包含偶数索引。 我现在的问题是,通过将第二个 df 的第一行添加到第一个 df 的第一行,merge/concatenate 两个帧。我已经尝试了 merging/concatenating 的一些方法,但都失败了。所有的打印功能都不是必需的,我只是用它们在控制台中快速浏览一下。 我觉得最舒服的代码是:
os.chdir(output)
csv_files = os.listdir('.')
for csv_file in (csv_files):
if csv_file.endswith(".asc.csv"):
df = pd.read_csv(csv_file)
keep_col = ['Messpunkt', 'Zeichnungspunkt', 'Eigenschaft', 'Position', 'Sollmass','Toleranz','Abweichung','Lage']
new_df = df[keep_col]
new_df = new_df[~new_df['Messpunkt'].isin(['**Teil'])]
new_df = new_df[~new_df['Messpunkt'].isin(['**KS-Oben'])]
new_df = new_df[~new_df['Messpunkt'].isin(['**KS-Unten'])]
new_df = new_df[~new_df['Messpunkt'].isin(['**N'])]
print(new_df)
new_df.to_csv(output+csv_file)
df1 = new_df[new_df.index % 2 ==1]
df2 = new_df[new_df.index % 2 ==0]
df1.reset_index()
df2.reset_index()
print (df1)
print (df2)
merge_df = pd.concat([df1,df2], axis=1)
print (merge_df)
merge_df.to_csv(output+csv_file)
非常感谢您的帮助。
使用此代码,输出为:
1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--,,,,,,,,
2,,,,,,,,,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---,,,,,,,,
4,,,,,,,,,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---,,,,,,,,
6,,,,,,,,,0.000,,,,,,,
当我使用 reset_index()
在两个 DataFrame 中使用相同的索引时,我得到了预期的结果。
可能还需要drop=True
跳过索引作为新列
pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
最简单的工作示例。
我使用io
只是为了模拟内存中的文件。
text = '''1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--
2,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---
4,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---
6,0.000,,,,,,,'''
import pandas as pd
import io
pd.options.display.max_columns = 20 # to display all columns
df = pd.read_csv(io.StringIO(text), header=None, index_col=0)
#print(df)
df1 = df[df.index % 2 == 1] # .reset_index(drop=True)
df2 = df[df.index % 2 == 0] # .reset_index(drop=True)
#print(df1)
#print(df2)
merge_df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print(merge_df)
结果:
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
0 3.0 N0128 Durchm. 5.0 0.100 5.076 0.076 -----****-- 0.0 NaN NaN NaN NaN NaN NaN NaN
1 3.0 N0129 Position 62.2 0.376 62.238 0.136 ***--- 76.1 -36.000 0.300 -36.057 NaN NaN NaN NaN
2 2.0 N0130 Durchm. 5.0 0.100 5.067 0.067 -----***--- 0.0 NaN NaN NaN NaN NaN NaN NaN
编辑:
可能需要
merge_df.index = merge_df.index + 1
修正索引。