Pandas: take a portion of a DataFrame and normalize its values
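I have a data frame with two columns, as shown below.
I want to select a portion of it by date and normalize "Weight" using the min-max method.
Here is my attempt: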
import pandas as pd
data = {'Date': ["2000-02-01", "2000-03-01", "2000-04-03", "2000-05-01", "2000-06-01", "2000-07-03", "2000-08-01", "2000-09-01", "2000-10-02", "2000-11-01"],
'Weight' : [478, 26, 144, 9, 453, 24, 383, 314, 291, 286]}
df = pd.DataFrame(data)
df_1 = df.loc[df['Date'] >= "2000-04-01"]
df_1 = (df_1 - df_1.min()) / (df_1.max() - df_1.min())
print(df_1)
# The ideal output has two columns: one with the dates on or after "2000-04-01", one with the corresponding normalized "Weight" values.
This gives the error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
How can I achieve this? Thanks.
First convert the Date values to datetimes, then apply the calculation to the Weight column only and overwrite Weight with the result:
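# Parse the date strings so the comparison below is a real date comparison.
df['Date'] = pd.to_datetime(df['Date'])
df_1 = df.loc[df['Date'] >= "2000-04-01"]
# Min-max scale Weight only; the Date column is left untouched.
a = (df_1['Weight'] - df_1['Weight'].min()) / (df_1['Weight'].max() - df_1['Weight'].min())
# assign returns a new DataFrame with Weight replaced by the scaled values.
print(df_1.assign(Weight=a))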
        Date    Weight
2 2000-04-03  0.304054
3 2000-05-01  0.000000
4 2000-06-01  1.000000
5 2000-07-03  0.033784
6 2000-08-01  0.842342
7 2000-09-01  0.686937
8 2000-10-02  0.635135
9 2000-11-01  0.623874
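If the same selection and scaling is needed for several cut-off dates, the steps above could be wrapped in a small helper. This is only a sketch: the function name `normalize_after` and its parameters are hypothetical (not from the original answer), and it assumes the value column is numeric. It raises an error when all selected values are equal, because the min-max range would then be zero.

```python
import pandas as pd


def normalize_after(df, start, date_col="Date", value_col="Weight"):
    """Hypothetical helper: keep rows dated on or after `start`,
    then min-max scale `value_col` to the range [0, 1]."""
    out = df.copy()
    # Parse the dates so the filter compares dates, not strings.
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.loc[out[date_col] >= pd.Timestamp(start)]
    col = out[value_col]
    span = col.max() - col.min()
    if span == 0:
        # Assumption: treat a constant column as an error rather than
        # silently dividing by zero.
        raise ValueError("all selected values are equal; min-max scaling is undefined")
    # Return a new frame with only the value column replaced.
    return out.assign(**{value_col: (col - col.min()) / span})


data = {
    "Date": ["2000-02-01", "2000-03-01", "2000-04-03", "2000-05-01", "2000-06-01",
             "2000-07-03", "2000-08-01", "2000-09-01", "2000-10-02", "2000-11-01"],
    "Weight": [478, 26, 144, 9, 453, 24, 383, 314, 291, 286],
}
df = pd.DataFrame(data)
result = normalize_after(df, "2000-04-01")
print(result)
```

The helper works on a copy, so the caller's `df` keeps its original string dates and unscaled weights.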
The data type of the Date column is string, so you have to convert it to datetime. For that you can use this approach:
df['Date']=pd.to_datetime(df['Date'])
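One way to confirm that the conversion took effect is to inspect the column's dtype before and after. The snippet below is a minimal sketch on a shortened, made-up two-row frame; the exact name printed for the string dtype may differ between pandas versions, so no output is shown.

```python
import pandas as pd

# Shortened example frame (only two rows, for illustration).
df = pd.DataFrame({"Date": ["2000-02-01", "2000-03-01"], "Weight": [478, 26]})

before = df["Date"].dtype          # string-like dtype before conversion
df["Date"] = pd.to_datetime(df["Date"])
after = df["Date"].dtype           # datetime64 dtype after conversion

print(before, after)

# True once the column holds real datetimes.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["Date"])
print(is_datetime)
```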