How to create multiple subplots from a wide dataframe with a function

I have a dataframe df with 4 unique UIDs: 1001, 1002, 1003, 1004.

I want to write a user-defined function in Python that does the following:

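  1. Growth curves: for each unique UID, plot Turbidity against Time. The Turbidity values are the values in the Time1, Time2, Time3, Time4, Time5 columns. For example, UID = 1003 will have 4 curves on its graph.

  2. Add a legend to each graph, e.g. M+L, F+L, M+R, F+R (from the Gen and Type columns).

  3. Add a title to each graph, e.g. UID:1003 + Site:FRX.

  4. Export the graphs as a pdf, jpeg, or tiff file, with 4 graphs per page.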

# The dataset 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
df= {
    'Gen':['M','M','M','M','F','F','F','F','M','M','M','M','F','F','F','F'],
    'Site':['FRX','FRX','FRX','FRX','FRX','FRX','FRX','FRX','FRX','FRX','FRX','FRX','FRX','FRX','FRX','FRX'],
    'Type':['L','L','L','L','L','L','L','L','R','R','R','R','R','R','R','R'],
     'UID':[1001,1002,1003,1004,1001,1002,1003,1004,1001,1002,1003,1004,1001,1002,1003,1004],
    'Time1':[100.78,112.34,108.52,139.19,149.02,177.77,79.18,89.10,106.78,102.34,128.52,119.19,129.02,147.77,169.18,170.11],
    'Time2':[150.78,162.34,188.53,197.69,208.07,217.76,229.48,139.51,146.87,182.54,189.57,199.97,229.28,244.73,269.91,249.19],
     'Time3':[250.78,262.34,288.53,297.69,308.07,317.7,329.81,339.15,346.87,382.54,369.59,399.97,329.28,347.73,369.91,349.12],
     'Time4':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
     'Time5':[270.84,282.14,298.53,306.69,318.73,327.47,369.63,389.59,398.75,432.18,449.78,473.55,494.85,509.39,515.52,539.23]
}
df = pd.DataFrame(df,columns = ['Gen','Site','Type','UID','Time1','Time2','Time3','Time4','Time5'])
df

My attempt

# See below for my thoughts/attempt- I am open to other python libraries and approaches

def graph2pdf(inputdata):
  #1. convert from wide to long
    inputdata = pd.melt(df,id_vars = ['Gen','Type','UID'],var_name = 'Time',value_name = 'Turbidity')
  #
    cmaps = ['Reds', 'Blues', 'Greens', 'Greys','Yellows']
    label_patches = []
    for i, cmap in enumerate(cmaps):
           # I want a growth curve not a distribution curve
           sns.kdeplot(x = Time, y = Turbidity,data = data, cmap=cmaps[i]+'_d')
           label_patch = mpatches.Patch(color=sns.color_palette(cmaps[i])[2],label=label)
           label_patches.append(label_patch)
    #2. add legend
    plt.legend(handles=label_patches, loc='upper left')
    #3. add title- 'UID number+ SiteName: FRX' to each of the graphs
    plt.title('UID:1003+FRX')
    plt.show()
    #4. export as pdf file i.e 4 graphs per page
    with PdfPages('turbidityvstime_pdf.pdf') as pdf:
         plt.figure(figsize=(2,2)) # 4 graphs per page, I am anticipating more pages in the future
    
         pdf.savefig()  # saves the current figure into a pdf page
         plt.close()

# testing the user-defined function   
graph2pdf(df)

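I would like the graphs to look like the figure below (with turbidity instead of density on the y-axis, and time on the x-axis). If possible, a white or clear background is preferred.

Thanks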

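  • Line plots are generally not appropriate for discrete data, because the slope of a line can imply a trend that does not exist.
    • This data is discrete because the measurements were taken at discrete points in time, not as a continuous time series.
    • Discrete data is best visualized with bar plots.
  • Use a seaborn figure-level method, such as sns.catplot or sns.relplot, to create a figure with four subplots.
  • Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2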
import pandas as pd
import seaborn as sns

def graph2pdf(df):
    # melt the dataframe; any column not a var or value, should be in id_vars
    data = df.melt(id_vars=df.columns[:4], var_name='Time', value_name='Turbidity')
    
    # combine Gen and Type to create label, which can be used for hue
    data['label'] = data.Gen + '-' + data.Type
    
    # plot a catplot for bars; seaborn titles each facet with its UID by default
    p1 = sns.catplot(data=data, kind='bar', x='Time', y='Turbidity', hue='label', col='UID', col_wrap=2, height=3.25)
    p1.fig.subplots_adjust(top=0.9) # leave room above the facets for the figure title
    p1.fig.suptitle('Site: FRX') # every row in this dataset comes from site FRX
    p1.savefig("barplots.png")

    # plot a relplot for lines
    p2 = sns.relplot(data=data, kind='line', x='Time', y='Turbidity', hue='label', col='UID', col_wrap=2, height=3.25, marker='o')
    p2.fig.subplots_adjust(top=0.9)
    p2.fig.suptitle('Site: FRX')
    p2.savefig("lineplots.png")
    

graph2pdf(df)
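
The function above saves PNG files, so the per-graph titles and the PDF export with 4 graphs per page from the question are not covered. A minimal sketch of one way to handle them follows. It uses plain matplotlib subplots with PdfPages rather than the seaborn figure-level functions. The function name growth_curves_to_pdf, the default file name, the 2-column layout, and the figure size are illustrative assumptions, not part of the original answer.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def growth_curves_to_pdf(df, pdf_path="turbidity_vs_time.pdf", per_page=4):
    """Hypothetical helper: one line plot per UID, `per_page` plots per PDF page.

    Returns the number of pages written.
    """
    # wide to long: every column that is not an id column becomes a Time value
    id_cols = ["Gen", "Site", "Type", "UID"]
    data = df.melt(id_vars=id_cols, var_name="Time", value_name="Turbidity")
    # legend labels such as M+L or F+R, built from Gen and Type
    data["label"] = data["Gen"] + "+" + data["Type"]

    groups = list(data.groupby("UID", sort=True))
    ncols = 2  # assumed layout: 2 columns, as many rows as per_page requires
    nrows = math.ceil(per_page / ncols)
    n_pages = 0

    with PdfPages(pdf_path) as pdf:
        for start in range(0, len(groups), per_page):
            page = groups[start:start + per_page]
            fig, axes = plt.subplots(nrows, ncols, figsize=(10, 8),
                                     squeeze=False, facecolor="white")
            flat_axes = axes.ravel()

            for ax, (uid, sub) in zip(flat_axes, page):
                # one curve per Gen+Type combination
                for label, curve in sub.groupby("label", sort=False):
                    ax.plot(curve["Time"], curve["Turbidity"],
                            marker="o", label=label)
                site = "/".join(sorted(sub["Site"].unique()))
                ax.set_title(f"UID:{uid} + Site:{site}")
                ax.set_xlabel("Time")
                ax.set_ylabel("Turbidity")
                ax.legend(loc="upper left")

            # hide unused axes on a partially filled last page
            for ax in flat_axes[len(page):]:
                ax.set_visible(False)

            fig.tight_layout()
            pdf.savefig(fig)  # each saved figure becomes one PDF page
            plt.close(fig)
            n_pages += 1

    return n_pages
```

Calling growth_curves_to_pdf(df) on the dataset from the question should write a single page with four graphs. With more UIDs, the loop starts a new page after every four. For jpeg or tiff output, calling fig.savefig with the matching file extension on each page figure is one option, though the available formats depend on the matplotlib installation.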