How do I combine dataframes of different sizes?
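I'm trying to combine a list of projects into one master dataframe, but I can't figure out how to merge them together. The frames I generate are different sizes, but most of the column names are the same, apart from one or two.
So basically, I'm listing project stages like this (some projects only have 2 or 3 stages, while others have 8 or 9).
Example: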
Stage 1 SUCCESS
stage 2 SUCCESS
stage 3 SUCCESS
stage 4 DELAYED
stage 5 PENDING
And I generate dataframes like the following in a Python loop...
df
project_name Stage 1 Stage 2
0 project 1 SUCCESS DELAYED
df
project_name Stage 1 Stage 2 Stage 3 Stage 4 Stage 5
0 project-2 NaN NaN NaN NaN NaN
df
project_name Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 Stage 6 Stage 7 Stage 8
0 project-3 NaN NaN STARTED ABANDONED NaN NaN NaN NaN
However, I can't figure out how to produce a master dataframe that contains all the other frames...
import numpy as np
import pandas as pd

# items passed in from other function...
project_data = [('Stage 1','SUCCESS'),('Stage 2','DELAYED')]
project_name = 'project-x'
project_headers = ['Stage 1','Stage 2','Stage 3','Stage 4','Stage 5','Stage 6']
project_displayname = ''
# Create the pandas DataFrame
try:
    df
except NameError:
    print("Well, 'df' WASN'T defined after all!")
    df = pd.DataFrame(columns=project_headers, index=['0'])
else:
    df = df.reindex(list(range(0, 1))).reset_index(drop=True)
df['project_name'] = project_name
df.loc[df.project_name == project_name, "project"] = project_displayname
combined_frame = pd.DataFrame(columns=['project_name'])  # empty frame with one column for merge
for details in project_data:
    (item, item_status) = details
    if item not in df:
        df[item] = np.nan
    df.loc[df.project_name == project_name, item] = item_status
print('')
print('')
print(df)
print('')
# Which gives us a generated dataframe.... like so...
#project_name Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 Stage 6 Stage 7 Stage 8
#project-3 NaN NaN STARTED ABANDONED NaN NaN NaN NaN
#final_frame = combined_frame.merge(df, how='left')
try:
    final_frame = pd.merge(df, combined_frame, how='outer', left_index=True, right_on=combined_frame.iloc[:, -1])
except IndexError:
    # reindex_axis was removed from pandas; reindex(columns=...) is the equivalent call
    final_frame = df.reindex(columns=df.columns.union(combined_frame.columns))
print(final_frame)
When I run the code I get the error: Empty DataFrame
Or, I get...
Columns: [project, project_name, Stage 1, Stage 2, Stage 3, Stage 4, Stage 5, Stage 6, Stage 7, Stage 8, Stage 9]
Index: []
Or I get...
Columns: [project, project_name, Stage 1, Stage 2, Stage 3, Stage 4, Stage 5, Stage 6, Stage 7, Stage 8, Stage 9, project_x, project_name_x, Stage 1_x, Stage 2_x, Stage 3_x, Stage 4_x]
Index: []
Can anyone point out the mistake in my approach? Clearly I'm missing something.
I'm trying to get output like this:
project_name Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 Stage 6 Stage 7 Stage 8
0 project-1 STARTED NaN NaN NaN NaN NaN NaN NaN
1 project-2 STARTED STARTED STARTED DELAYED NaN NaN NaN NaN
2 project-3 NaN NaN STARTED ABANDONED NaN NaN NaN NaN
3 project-4 NaN NaN STARTED ABANDONED NaN STARTED NaN NaN
4 project-5 CANCELED NaN NaN NaN NaN NaN NaN NaN
5 project-6 DELAYED DELAYED STARTED ABANDONED NaN NaN STARTED NaN
Thanks in advance,
E
You can easily build the individual frames from the input data:
import pandas as pd

# items passed in from other function...
project_data = [('Stage 1','SUCCESS'),('Stage 2','DELAYED')]
project_name = 'project-x'
project_headers = ['Stage 1','Stage 2','Stage 3','Stage 4','Stage 5','Stage 6']
project_displayname = ''
# One row built from the (stage, status) pairs; stages missing from the data become NaN
df = pd.DataFrame([dict(project_data)],
                  columns=['project', 'project_name'] + project_headers)
df.loc[:, ['project', 'project_name']] = [[project_name, project_displayname]]
This gives df:
project project_name Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 Stage 6
0 project-x SUCCESS DELAYED NaN NaN NaN NaN
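You can then use pd.concat to concatenate all the individual dataframes. The only restriction is that you have to know all the column names in advance (or, here, the maximum number of stages...)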
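For illustration, here is a minimal sketch of the pd.concat step. The `projects` list and the `all_columns` name are hypothetical; the sample values are borrowed from the desired output in the question. pd.concat aligns frames on column names and fills the gaps with NaN, and the final reindex is only there to force a fixed column order up to an assumed maximum of 8 stages.

```python
import pandas as pd

# Hypothetical input: one (project name, [(stage, status), ...]) entry per project.
projects = [
    ("project-1", [("Stage 1", "STARTED")]),
    ("project-2", [("Stage 1", "STARTED"), ("Stage 2", "STARTED"),
                   ("Stage 3", "STARTED"), ("Stage 4", "DELAYED")]),
    ("project-3", [("Stage 3", "STARTED"), ("Stage 4", "ABANDONED")]),
]

frames = []
for name, stages in projects:
    # One single-row frame per project; each frame may have different columns.
    row = {"project_name": name, **dict(stages)}
    frames.append(pd.DataFrame([row]))

# concat takes the union of the columns and fills missing cells with NaN.
master = pd.concat(frames, ignore_index=True, sort=False)

# Assumption: at most 8 stages. Reindexing adds the unused stage columns
# (all NaN) and fixes the column order.
all_columns = ["project_name"] + [f"Stage {i}" for i in range(1, 9)]
master = master.reindex(columns=all_columns)

print(master)
```

Collecting the frames in a list and calling pd.concat once at the end is usually simpler than merging into an accumulator frame inside the loop.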