How can I normalize my DataFrame so that my line plots start from the same point?

I have a DataFrame like the one below (named net_asset), covering 2015 to today:

    a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r
Date                                                                        
2015-04-30  162.20100   38.69620    98.88842    11.75094    8.92177 1.07767 112.81237   110.08090   NaN 4.20428 221.5440    NaN 1.63142 155.30297   8.19891 13.94684    7.40493 27.85345
2015-05-29  164.04053   39.19910    101.54701   11.97325    8.94295 1.12211 114.48715   113.24696   NaN 4.30719 215.7512    NaN 1.65257 154.85456   8.33938 14.29280    7.47724 27.32846
2015-06-30  163.17050   39.00262    101.77694   11.93908    8.96241 1.13880 114.23190   112.75483   10.0000 4.22515 207.5485    NaN 1.67049 158.25418   8.57353 14.13962    7.61546 26.99618
2015-07-31  160.73069   38.49814    102.63752   11.95354    8.93894 1.14438 111.00177   110.01403   10.1106 4.19375 205.0794    NaN 1.65833 161.83255   8.67075 14.25327    7.67866 27.31167

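To make the series easier to compare once plotted, I want every column to start from the same point, here 100. (The 2015 starting values should all be 100.)

I tried the code below, but it does not give me what I had in mind, with 2015 at 100.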

net_asset.apply(lambda x: (x - x.min()) / (x.max() - x.min()))

The code above returns the following.

net_asset.head()

Date                                                                        
2015-04-30  29.481157   20.728226   12.566996   14.006493   24.887183   85.363231   11.168351   20.119944   NaN 26.292755   38.674209   NaN 19.586481   9.290352    5.570366    9.204228    4.566915    100.000000
2015-05-29  31.475018   22.683843   15.138121   16.334712   25.302741   95.113764   12.794772   25.172351   NaN 31.434296   34.177011   NaN 21.440216   9.022051    7.029734    11.419483   5.223939    95.558550
2015-06-30  30.531995   21.919795   15.360487   15.976855   25.684553   98.775698   12.546892   24.387008   26.207877   27.335452   27.808905   NaN 23.010851   11.056174   9.462360    10.438639   6.479836    92.747440
2015-07-31  27.887493   19.958033   16.192755   16.128292   25.224064   100.000000  9.410033    20.013232   27.427053   25.766660   25.892037   NaN 21.945063   13.197250   10.472396   11.166364   7.054085    95.416506

net_asset.tail()

2020-11-30  67.200005   72.608636   76.959357   85.856731   88.155809   57.219650   94.367147   84.263184   84.411962   49.771676   78.669830   91.698367   91.659509   95.793550   97.312319   100.000000  98.638703   12.572080
2020-12-31  79.321960   80.759312   87.806721   94.821595   96.394572   69.535073   99.215011   97.320232   87.610922   62.294533   89.893726   100.000000  100.000000  100.000000  100.000000  99.515149   100.000000  20.818697
2021-01-29  82.292270   80.581521   87.481611   92.795622   97.256100   70.575071   99.335197   93.571979   89.231346   58.588387   91.402937   92.293295   96.259225   96.302455   93.245683   95.127478   94.362002   20.405762
2021-02-26  91.587476   90.773715   91.445362   94.800335   98.102520   81.569651   95.674504   91.847156   97.434880   70.743028   97.713593   85.960528   89.612951   93.915749   88.721404   87.146839   88.763620   21.716141
2021-03-31  100.000000  100.000000  100.000000  100.000000  100.000000  91.807271   100.000000  97.903339   100.000000  81.996363   100.000000  94.200479   87.929251   89.484993   86.827664   86.035818   87.447754   19.689448

Is there any way to do this? Thanks.

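To normalize each column separately, compute the statistics along axis=0 (this is the default for DataFrame.mean, std, min, and max).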

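Z-score normalization

"The formula for calculating a z-score is z = (x-μ)/σ, where x is the raw score, μ is the population mean, and σ is the population standard deviation. As the formula shows, the z-score is simply the raw score minus the population mean, divided by the population standard deviation."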

# mean of each column
mean = df.mean(axis=0)
# standard deviation of each column (pandas defaults to ddof=1, the sample std)
std = df.std(axis=0)
# z-score normalization, applied column by column
normalization = (df - mean) / std

Or in one line:

normalization = (df - df.mean()) / df.std()
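
One detail worth checking: the quoted formula uses the population standard deviation, while pandas' std() defaults to ddof=1 (the sample standard deviation). A minimal sketch of the difference follows. The toy values in df are made up for illustration and are not the question's data.

```python
import pandas as pd

# Toy data standing in for the real frame (illustrative values only)
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})

# Default: sample standard deviation (ddof=1)
z_sample = (df - df.mean()) / df.std()

# ddof=0 matches the population standard deviation in the quoted formula
z_population = (df - df.mean()) / df.std(ddof=0)

print(z_sample)
print(z_population)
```

For long series the two versions differ very little, so either is usually fine for plotting.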

Min-max normalization

normalization = (df-df.min()) / (df.max()-df.min())

If you want the values scaled to the range 0 to 100, just multiply by 100:

normalization = ( (df-df.min()) / (df.max()-df.min()) * 100 )
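
Note that min-max scaling maps each column's minimum to 0 and its maximum to 100, so the first row only equals 100 if it happens to be the column maximum. If the goal is what the question describes, every line starting at 100, one possible approach is to divide each column by its own first non-missing value. The sketch below assumes that is the intent. The small net_asset frame is a made-up stand-in for the real data, with column i starting late as in the question.

```python
import numpy as np
import pandas as pd

# Toy stand-in for net_asset; column "i" has leading NaNs like in the question
net_asset = pd.DataFrame(
    {"a": [162.0, 164.0, 163.0, 160.0], "i": [np.nan, np.nan, 10.0, 10.1]},
    index=pd.to_datetime(["2015-04-30", "2015-05-29", "2015-06-30", "2015-07-31"]),
)

# First non-missing value of each column: back-fill, then take the first row
base = net_asset.bfill().iloc[0]

# Divide every column by its own base value and scale so each series starts at 100
rebased = net_asset / base * 100

print(rebased)
```

Columns that begin with NaN start at 100 on their first available date rather than on 2015-04-30. Calling rebased.plot() should then draw all lines from a common level.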