Dynamic histogram subplots with line to mark target
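I have tried a few of the similar solutions that have been posted, but without luck.
I am trying to get histograms of Cost for every Step No in a manufacturing process. Each part has a different number of steps, so I would like one plot/image per part that holds a set of histograms, one per step.
My real data contains many parts, so ideally this would loop over the parts and save the plots.
In addition, we have a target cost for each step that I would like to overlay on the histograms. The targets are held in a separate dataframe. I got stuck on the loop over the subplots, so I have not attempted this part yet.
Below is the closest thing I could find to what the histogram for each step should look like:
Here is my code so far: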
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel('Dist_Example.xlsx')
df1 = df[~df['Cost Type'].isin(['Material'])]
number_of_subplots = len(df1['Step No'].unique())
steps = df1['Step No'].unique()

fig, axs = plt.subplots(1, number_of_subplots, sharey = True, tight_layout=True)
for step in steps:
    df2 = df1[df1['Step No'].isin([step])]
    axs[step].hist(df2['Cost'])
plt.show()
Thanks in advance for your help!
Here is the Target Cost that I would like to show as a vertical line on the histograms:
PartNo StepNo TargetCost
ABC 10 12
ABC 20 20
ABC 30 13
Below is some sample historical data that should be binned in the histograms:
PartNo SerialNo StepNo CostType Cost
ABC 123 10 Labor 11
ABC 123 10 Material 16
ABC 456 10 Labor 21
ABC 456 10 Material 26
ABC 789 10 Labor 21
ABC 789 10 Material 16
ABC 1011 10 Labor 11
ABC 1011 10 Material 6
ABC 1112 10 Labor 1
ABC 1112 10 Material -4
ABC 123 20 Labor 11
ABC 123 20 Material 19
ABC 456 20 Labor 24
ABC 456 20 Material 29
ABC 789 20 Labor 24
ABC 789 20 Material 19
ABC 1011 20 Labor 14
ABC 1011 20 Material 9
ABC 1112 20 Labor 4
ABC 1112 20 Material -1
ABC 123 30 Labor 11
ABC 123 30 Material 13
ABC 456 30 Labor 18
ABC 456 30 Material 23
ABC 789 30 Labor 18
ABC 789 30 Material 13
ABC 1011 30 Labor 8
ABC 1011 30 Material 3
ABC 1112 30 Labor -2
ABC 1112 30 Material -7
Second sample dataset:
PartNo SerialNo StepNo CostType Cost
DEF Aplha 10 Labor 2
DEF Zed 10 Labor 3
DEF Kelly 10 Labor 4
DEF Aplha 20 Labor 3
DEF Zed 20 Labor 2
DEF Kelly 20 Labor 5
DEF Aplha 30 Labor 6
DEF Zed 30 Labor 7
DEF Kelly 30 Labor 5
DEF Aplha 40 Labor 3
DEF Zed 40 Labor 4
DEF Kelly 40 Labor 2
DEF Aplha 50 Labor 8
DEF Zed 50 Labor 9
DEF Kelly 50 Labor 7
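You won't find a histogram function that solves this directly for your dataset. You need to aggregate your data in a way that fits your needs, and then represent your findings with a bar chart.
I found your objective and data a bit confusing, but I think I have worked out what you are after, based on these assumptions:
- You want to aggregate the costs for each StepNo
- Cost type is irrelevant
- A total Target cost has to be calculated, since you are aggregating all costs within each StepNo.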
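The aggregation described by these three assumptions could look roughly like the sketch below. It is only an illustration: the frame names `history`, `targets` and `summary` are made up here, and the rows are a small subset of the sample data from the question (part ABC, serials 123 and 456, steps 10 and 20).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to get a plot window
import matplotlib.pyplot as plt
import pandas as pd

# Subset of the question's sample data (hypothetical frame names).
history = pd.DataFrame({
    "PartNo": ["ABC"] * 8,
    "SerialNo": [123, 123, 456, 456, 123, 123, 456, 456],
    "StepNo": [10, 10, 10, 10, 20, 20, 20, 20],
    "CostType": ["Labor", "Material"] * 4,
    "Cost": [11, 16, 21, 26, 11, 19, 24, 29],
})
targets = pd.DataFrame({
    "PartNo": ["ABC", "ABC"],
    "StepNo": [10, 20],
    "TargetCost": [12, 20],
})

# Total cost and number of cost rows per step, ignoring CostType.
per_step = (
    history.groupby(["PartNo", "StepNo"])["Cost"]
    .agg(["sum", "count"])
    .rename(columns={"sum": "Cost", "count": "Count"})
    .reset_index()
)

# Scale the per-row target by the row count so it is comparable to the summed cost.
summary = per_step.merge(targets, on=["PartNo", "StepNo"], how="left")
summary["TargetTotal"] = summary["TargetCost"] * summary["Count"]
print(summary)

# Bar chart: summed cost next to the scaled target for each step.
fig, ax = plt.subplots()
summary.plot.bar(x="StepNo", y=["Cost", "TargetTotal"], ax=ax, rot=0)
ax.set_ylabel("Cost")
```

Note that this gives one bar per step rather than a distribution, which is why it turned out not to be what was asked for.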
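Plot:
Edit
This was not what the OP wanted. After some back and forth, we found a solution that seems to work.
(from the question) I am trying to get histograms for Cost for all the Step No
(from a comment) I actually want to have a histogram for the sum of the cost per serial no in each step.
Since a histogram must have count or frequency on the y-axis, you have to aggregate the data in some meaningful way. Below you will see the counts, over a chosen number of bins, of the total cost per SerialNo in each step.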
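To make the counting concrete, here is a small sketch that uses the step 10 rows of part ABC from the question. It sums the cost per SerialNo and then counts those totals into 4 bins with `np.histogram`, which by default uses the same equal-width binning as `ax.hist`. The names `step10` and `totals` are only for illustration.

```python
import numpy as np
import pandas as pd

# Step 10 rows for part ABC, copied from the sample data in the question.
step10 = pd.DataFrame({
    "SerialNo": [123, 123, 456, 456, 789, 789, 1011, 1011, 1112, 1112],
    "CostType": ["Labor", "Material"] * 5,
    "Cost": [11, 16, 21, 26, 21, 16, 11, 6, 1, -4],
})

# One value per serial number: labor and material added together.
totals = step10.groupby("SerialNo")["Cost"].sum()

# Counts per bin and the bin edges, for 4 equal-width bins over the data range.
counts, edges = np.histogram(totals, bins=4)
print(totals.to_dict())
print(edges)   # bin edges: -3, 9.5, 22, 34.5, 47
print(counts)  # heights of the four bars: 1, 1, 1, 2
```

The five serial numbers give five totals, so the bar heights always add up to five for this step, whatever number of bins is chosen.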
Result:
Code:
import pandas as pd
import matplotlib.pyplot as plt

# Load data in two steps (copy each table, then read it from the clipboard):
# df1 = pd.read_clipboard(sep=r'\s+')
# PartNo SerialNo StepNo CostType Cost
# ABC 123 10 Labor 11
# ABC 123 10 Material 16
# ABC 456 10 Labor 21
# ABC 456 10 Material 26
# ...
# df2 = pd.read_clipboard(sep=r'\s+')
# PartNo StepNo TargetCost
# ABC 10 12
# ABC 20 20
# ABC 30 13

# Cost type is irrelevant
df11 = df1.drop(['CostType'], axis=1)

# Aggregate by PartNo, StepNo and SerialNo: total cost and number of cost rows
df12 = df11.groupby(['PartNo', 'StepNo', 'SerialNo']).agg(['sum', 'count']).reset_index()
df12.columns = ['PartNo', 'StepNo', 'SerialNo', 'Cost', 'Count']
df3 = pd.merge(df2, df12, how='left', on=['PartNo', 'StepNo'])

# Calculate total target cost
df3['TargetTotal'] = df3['TargetCost'] * df3['Count']

def multiHist(x_data, x_label, bins, target):
    # Histogram setup
    fig, ax = plt.subplots()
    ax.hist(x_data, bins=bins, color='blue', alpha=0.5, histtype='stepfilled',
            label='Cost per SerialNo')

    # Vertical line at the target (passed in instead of read from a global)
    ax.axvline(target, color='red', linewidth=2, label='Target')

    # Annotation
    ax.annotate('Target: {:0.2f}'.format(target), xy=(target, 1), xytext=(-15, 15),
                xycoords=('data', 'axes fraction'), textcoords='offset points',
                horizontalalignment='left', verticalalignment='center',
                arrowprops=dict(arrowstyle='-|>', fc='white', shrinkA=0, shrinkB=0,
                                connectionstyle='angle,angleA=0,angleB=90,rad=10'))

    # Labels
    ax.set_xlabel(x_label, color='grey')
    ax.legend(loc='upper left')
    plt.show()

# Identify and plot data for each StepNo
for step in df3['StepNo'].unique():
    dfs = df3[df3['StepNo'] == step]

    # Data to plot
    cost = dfs['Cost']
    target = dfs['TargetTotal'].iloc[0]
    labels = 'Part: ' + dfs['PartNo'].iloc[0] + ', Step: ' + str(step)

    # Plot
    multiHist(x_data=cost, x_label=labels, bins=4, target=target)
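The question also asked for all steps of a part on one figure, with a loop over the parts. A possible sketch of that layout follows. It is not part of the solution worked out in the comments: the function name `plot_parts`, its `out_pattern` parameter and the small `data` frame are my own assumptions. The frame holds per-serial totals worked out from the sample tables and is shaped like `df3`. Part DEF has no target in the question, so its `TargetTotal` is left as NaN and no line is drawn for it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to get plot windows
import matplotlib.pyplot as plt
import pandas as pd

nan = float("nan")
rows = [
    # PartNo, StepNo, total cost per SerialNo, TargetTotal (NaN = no target given)
    ("ABC", 10, [27, 47, 37, 17, -3], 24),
    ("ABC", 20, [30, 53, 43, 23, 3], 40),
    ("DEF", 10, [2, 3, 4], nan),
    ("DEF", 20, [3, 2, 5], nan),
    ("DEF", 30, [6, 7, 5], nan),
]
data = pd.DataFrame(
    [(part, step, cost, target) for part, step, costs, target in rows for cost in costs],
    columns=["PartNo", "StepNo", "Cost", "TargetTotal"],
)


def plot_parts(frame, bins=4, out_pattern=None):
    """Build one figure per part with one histogram per step; return them by part."""
    figures = {}
    for part, part_rows in frame.groupby("PartNo"):
        step_groups = list(part_rows.groupby("StepNo"))
        # squeeze=False keeps axs two-dimensional even for a part with one step.
        fig, axs = plt.subplots(1, len(step_groups), sharey=True, squeeze=False,
                                figsize=(4 * len(step_groups), 3))
        # Pair each axes with a step instead of indexing axs by the step number.
        for ax, (step, step_rows) in zip(axs[0], step_groups):
            ax.hist(step_rows["Cost"], bins=bins, color="blue", alpha=0.5)
            target = step_rows["TargetTotal"].iloc[0]
            if pd.notna(target):
                ax.axvline(target, color="red", linewidth=2)
            ax.set_title(f"Step {step}")
        fig.suptitle(f"Part {part}")
        fig.tight_layout()
        if out_pattern is not None:
            fig.savefig(out_pattern.format(part=part))
        figures[part] = fig
    return figures


figures = plot_parts(data)
```

Calling `plot_parts(data, out_pattern="hist_{part}.png")` would additionally write one image per part. Iterating with `zip` over the axes and the steps avoids the `axs[step]` lookup from the question, which fails because the step numbers (10, 20, 30) are not valid positions in the axes array.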