How can I normalize my dataframe so that my line plots start from the same point?
I have a dataframe like the one below (named net_asset), covering 2015 to today.
a b c d e f g h i j k l m n o p q r
Date
2015-04-30 162.20100 38.69620 98.88842 11.75094 8.92177 1.07767 112.81237 110.08090 NaN 4.20428 221.5440 NaN 1.63142 155.30297 8.19891 13.94684 7.40493 27.85345
2015-05-29 164.04053 39.19910 101.54701 11.97325 8.94295 1.12211 114.48715 113.24696 NaN 4.30719 215.7512 NaN 1.65257 154.85456 8.33938 14.29280 7.47724 27.32846
2015-06-30 163.17050 39.00262 101.77694 11.93908 8.96241 1.13880 114.23190 112.75483 10.0000 4.22515 207.5485 NaN 1.67049 158.25418 8.57353 14.13962 7.61546 26.99618
2015-07-31 160.73069 38.49814 102.63752 11.95354 8.93894 1.14438 111.00177 110.01403 10.1106 4.19375 205.0794 NaN 1.65833 161.83255 8.67075 14.25327 7.67866 27.31167
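To make the data easier to compare once plotted, I want every column to start from the same point, here 100. (The first 2015 value of every column should be 100.)
I tried the code below, but it does not give me what I have in mind, with 2015 at 100.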
net_asset.apply(lambda x: (x - x.min()) / (x.max() - x.min()))
The code above returns the following.
net_asset.head()
Date
2015-04-30 29.481157 20.728226 12.566996 14.006493 24.887183 85.363231 11.168351 20.119944 NaN 26.292755 38.674209 NaN 19.586481 9.290352 5.570366 9.204228 4.566915 100.000000
2015-05-29 31.475018 22.683843 15.138121 16.334712 25.302741 95.113764 12.794772 25.172351 NaN 31.434296 34.177011 NaN 21.440216 9.022051 7.029734 11.419483 5.223939 95.558550
2015-06-30 30.531995 21.919795 15.360487 15.976855 25.684553 98.775698 12.546892 24.387008 26.207877 27.335452 27.808905 NaN 23.010851 11.056174 9.462360 10.438639 6.479836 92.747440
2015-07-31 27.887493 19.958033 16.192755 16.128292 25.224064 100.000000 9.410033 20.013232 27.427053 25.766660 25.892037 NaN 21.945063 13.197250 10.472396 11.166364 7.054085 95.416506
net_asset.tail()
2020-11-30 67.200005 72.608636 76.959357 85.856731 88.155809 57.219650 94.367147 84.263184 84.411962 49.771676 78.669830 91.698367 91.659509 95.793550 97.312319 100.000000 98.638703 12.572080
2020-12-31 79.321960 80.759312 87.806721 94.821595 96.394572 69.535073 99.215011 97.320232 87.610922 62.294533 89.893726 100.000000 100.000000 100.000000 100.000000 99.515149 100.000000 20.818697
2021-01-29 82.292270 80.581521 87.481611 92.795622 97.256100 70.575071 99.335197 93.571979 89.231346 58.588387 91.402937 92.293295 96.259225 96.302455 93.245683 95.127478 94.362002 20.405762
2021-02-26 91.587476 90.773715 91.445362 94.800335 98.102520 81.569651 95.674504 91.847156 97.434880 70.743028 97.713593 85.960528 89.612951 93.915749 88.721404 87.146839 88.763620 21.716141
2021-03-31 100.000000 100.000000 100.000000 100.000000 100.000000 91.807271 100.000000 97.903339 100.000000 81.996363 100.000000 94.200479 87.929251 89.484993 86.827664 86.035818 87.447754 19.689448
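Is there a way to do this?
Thanks
- Some columns start with NaN but get values later.
- In Excel I do this by dividing each row by the first row and then multiplying by one hundred: =(A2/$A$2)*100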
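If you want to apply the normalization to each column, you have to use axis=0.
Z-score normalization
"The formula for calculating a z-score is z = (x-μ)/σ, where x is the raw score, μ is the population mean, and σ is the population standard deviation. As the formula shows, the z-score is simply the raw score minus the population mean, divided by the population standard deviation."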
# mean of each column
mean = df.mean(axis=0)
# standard deviation of each column
std = df.std(axis=0)
# z-score: subtract the column mean, divide by the column standard deviation
normalization = (df - mean) / std
Or in one line:
normalization = (df - df.mean()) / df.std()
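One detail worth checking against the quoted formula: pandas `std()` defaults to `ddof=1`, the sample standard deviation, whereas the quote refers to the population standard deviation, which corresponds to `ddof=0`. A minimal sketch of the difference, using a small made-up frame (the column name `x` and its values are illustrative assumptions, not data from the question):

```python
import pandas as pd

# Illustrative data only; not taken from the question.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

# pandas default: sample standard deviation (ddof=1)
z_sample = (df - df.mean()) / df.std()

# population standard deviation, as in the quoted formula (ddof=0)
z_pop = (df - df.mean()) / df.std(ddof=0)

print(z_sample)
print(z_pop)
```

For long series the two versions differ only slightly, but each result has unit standard deviation only when measured with the same `ddof` that was used to compute it.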
Min-max normalization
normalization = (df-df.min()) / (df.max()-df.min())
Min-max normalization gives values between 0 and 1. If you want them on a 0 to 100 scale, just multiply by 100:
normalization = ( (df-df.min()) / (df.max()-df.min()) * 100 )
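Note that min-max scaling maps each column's minimum to 0 and its maximum to 100, so it does not by itself make every series start at 100. The Excel approach described in the question (divide by the first value, then multiply by one hundred) can be sketched in pandas as follows. This assumes the base for each column should be its first non-NaN value, which handles the columns that start with NaN; `rebase_to_100` is a hypothetical helper name, and the small frame reuses a few values from columns a and i of the question.

```python
import numpy as np
import pandas as pd

def rebase_to_100(df: pd.DataFrame) -> pd.DataFrame:
    """Scale each column so that its first non-NaN value becomes 100."""
    # bfill() pulls each column's first valid value up to the first row,
    # so iloc[0] holds the per-column base even if the column starts with NaN.
    base = df.bfill().iloc[0]
    # Divide every column by its own base, then scale to 100.
    return df.div(base, axis="columns").mul(100)

# A few values from columns a and i of the question's data.
net_asset = pd.DataFrame(
    {
        "a": [162.20100, 164.04053, 163.17050, 160.73069],
        "i": [np.nan, np.nan, 10.0000, 10.1106],
    },
    index=pd.to_datetime(["2015-04-30", "2015-05-29", "2015-06-30", "2015-07-31"]),
)

rebased = rebase_to_100(net_asset)
print(rebased)
```

Column a starts at 100 on 2015-04-30, while column i stays NaN until its first value on 2015-06-30 and starts at 100 there. Calling `rebased.plot()` should then draw every line from the same level.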