How to normalize the date column of a pandas DataFrame (ValueError: could not convert string to float: '17-Aug-20 00:00:00')

Question

I have read a CSV file with a time column into a pandas DataFrame; the values look like '17-Aug-20 00:00:00'.

So I tried to normalize the data (the df variable) with the following code:

import pandas as pd
from sklearn import preprocessing

import numpy as np
from sklearn.preprocessing import MinMaxScaler
import time


minmax = MinMaxScaler().fit(df.iloc[:].values.reshape((-1,1)))
df_log = MinMaxScaler().fit_transform(df.iloc[:].astype('float32'))

df.head()

or

df = pd.DataFrame(df.astype('float64'), columns=['Time'])

# specify your desired range (-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(df.values)
print(scaled)

But running either of the two code blocks above gives me this error:

~/anaconda3/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

ValueError: could not convert string to float: '17-Aug-20 00:00:00'

So my question is: how can I normalize the date column of a pandas DataFrame?

Thanks.

Answer 1

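The error occurs because the column contains strings such as '17-Aug-20 00:00:00', which cannot be cast to float. Convert the column to datetime first, then to a numeric Unix timestamp, and scale that. You can try this: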

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(
    {
        "time": [
            "17-Aug-20 00:00:00",
            "17-Aug-20 00:01:00",
            "17-Aug-20 00:02:00",
            "17-Aug-20 00:03:00",
            "17-Aug-20 00:04:00",
        ],
    }
)

# Convert to datetime type
df["time"] = pd.to_datetime(df["time"])

# Convert to Unix timestamp seconds
df["time"] = (df["time"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

# Scale values
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(df["time"].values.reshape(-1, 1))

print(scaled)
# Outputs
[[-1. ] 
 [-0.5] 
 [ 0. ] 
 [ 0.5] 
 [ 1. ]]
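
If the scaled values later need to be mapped back to timestamps, the same fitted scaler can be reused. The following is a minimal sketch under a few assumptions that are not in the answer above: the date strings all follow the pattern `%d-%b-%y %H:%M:%S` (passed explicitly so that parsing does not depend on format inference), and the variable names `parsed`, `seconds`, and `restored` are only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(
    {
        "time": [
            "17-Aug-20 00:00:00",
            "17-Aug-20 00:01:00",
            "17-Aug-20 00:02:00",
        ],
    }
)

# Parse with an explicit format (assumed here: day-abbreviated month-2-digit year)
parsed = pd.to_datetime(df["time"], format="%d-%b-%y %H:%M:%S")

# Convert to whole seconds since the Unix epoch, as in the answer above
seconds = (parsed - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

# Scale to the range (-1, 1); the scaler expects a 2-D array
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(seconds.to_numpy().reshape(-1, 1))

# Undo the scaling with the same fitted scaler, then rebuild timestamps.
# Rounding guards against small floating-point error before the integer cast.
restored_seconds = scaler.inverse_transform(scaled).ravel()
restored = pd.to_datetime(restored_seconds.round().astype("int64"), unit="s")

print(scaled.ravel())
print(restored)
```

Keeping the fitted `scaler` object around is what makes the round trip possible; fitting a new scaler on different data would give a different mapping.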
