How can one extract date features from a date in pandasql?
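I need to use pandasql to extract date features (day, week, month, year) from a date column in a pandas dataframe. I can't seem to find which version of SQL pandasql is using, so I'm not sure how to accomplish this. Has anyone else tried something similar?
Here is what I have so far: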
#import the needed libraries
import numpy as np
import pandas as pd
import pandasql as psql
#establish dataset
doc = 'room_data.csv'
df = pd.read_csv(doc)
df.head()
df2 = psql.sqldf('''
SELECT
Timestamp
, EXTRACT (DAY FROM "Timestamp") AS Day --DOES NOT WORK IN THIS VERSION OF SQL
, Temperature
, Humidity
FROM df
''')
df2.head()
Sample dataframe:
Here you go:
df['year'] = pd.DatetimeIndex(df['date']).year
df['month'] = pd.DatetimeIndex(df['date']).month
df['day'] = pd.DatetimeIndex(df['date']).day
As far as I know, SQLite does not support the EXTRACT() function.
You can try strftime('%d', Timestamp) instead:
psql.sqldf('''SELECT
Timestamp
, strftime('%d', Timestamp) AS Day
, Temperature
, Humidity
FROM df
''')
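On the question of which SQL dialect is in play: pandasql runs its queries on SQLite by default, so the behaviour can be checked without pandasql at all. The following is a minimal sketch using only Python's built-in sqlite3 module; the one-row table `df` is a hypothetical stand-in for the real dataframe, not the asker's data.

```python
import sqlite3

# In-memory SQLite database with a hypothetical one-row stand-in for df.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (Timestamp TEXT, Temperature INTEGER)")
conn.execute("INSERT INTO df VALUES ('2020-01-01 04:00:00.000000', 83)")

# Ask SQLite which version is running.
sqlite_version = conn.execute("SELECT sqlite_version()").fetchone()[0]
print("SQLite version:", sqlite_version)

# EXTRACT is not part of SQLite's dialect, so this raises a syntax error.
try:
    conn.execute('SELECT EXTRACT(DAY FROM "Timestamp") FROM df')
    extract_supported = True
except sqlite3.OperationalError as exc:
    extract_supported = False
    print("EXTRACT failed:", exc)

# strftime is SQLite's way to pull a date part out of a timestamp string.
day = conn.execute("SELECT strftime('%d', Timestamp) FROM df").fetchone()[0]
print(day)  # → 01
```

The same `SELECT sqlite_version()` query should also work when passed to sqldf, assuming pandasql is left on its default SQLite backend.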
Consider the following example, which demonstrates the query above.
Sample dataframe:
np.random.seed(123)
dates = pd.date_range('01-01-2020','01-05-2020',freq='H')
temp = np.random.randint(0,100,97)
humidity = np.random.randint(20,100,97)
df = pd.DataFrame({"Timestamp":dates,"Temperature":temp,"Humidity":humidity})
print(df.head())
Timestamp Temperature Humidity
0 2020-01-01 00:00:00 66 29
1 2020-01-01 01:00:00 92 43
2 2020-01-01 02:00:00 98 34
3 2020-01-01 03:00:00 17 58
4 2020-01-01 04:00:00 83 39
Working query:
import pandasql as ps
query = '''SELECT
Timestamp
, strftime('%d', Timestamp) AS Day
, Temperature
, Humidity
FROM df'''
print(ps.sqldf(query).head())
Timestamp Day Temperature Humidity
0 2020-01-01 00:00:00.000000 01 66 29
1 2020-01-01 01:00:00.000000 01 92 43
2 2020-01-01 02:00:00.000000 01 98 34
3 2020-01-01 03:00:00.000000 01 17 58
4 2020-01-01 04:00:00.000000 01 83 39
You can find more date extraction functions here; some commonly used ones are shown below:
import pandasql as ps
query = '''SELECT
Timestamp
, strftime('%d', Timestamp) AS Day
, strftime('%m', Timestamp) AS Month
, strftime('%Y', Timestamp) AS Year
, strftime('%H', Timestamp) AS Hour
, Temperature
, Humidity
FROM df'''
print(ps.sqldf(query).head())
Timestamp Day Month Year Hour Temperature Humidity
0 2020-01-01 00:00:00.000000 01 01 2020 00 66 29
1 2020-01-01 01:00:00.000000 01 01 2020 01 92 34
2 2020-01-01 02:00:00.000000 01 01 2020 02 98 90
3 2020-01-01 03:00:00.000000 01 01 2020 03 17 32
4 2020-01-01 04:00:00.000000 01 01 2020 04 83 74
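The question also asks for the week, which the queries above do not cover. Below is a minimal sketch, again using the built-in sqlite3 module rather than pandasql so that it runs on its own. The inline timestamp and the column aliases are hypothetical. Note that strftime returns text, so CAST is used where an integer is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 2020-01-08 is a Wednesday in the second calendar week of January 2020.
row = conn.execute(
    """
    SELECT
        strftime('%W', ts) AS WeekOfYear   -- 00-53, week 1 starts on the first Monday
      , strftime('%w', ts) AS DayOfWeek    -- 0-6, where 0 is Sunday
      , strftime('%j', ts) AS DayOfYear    -- 001-366
      , CAST(strftime('%m', ts) AS INTEGER) AS MonthInt
    FROM (SELECT '2020-01-08 04:00:00' AS ts)
    """
).fetchone()

print(row)  # → ('01', '3', '008', 1)
```

In a pandasql query the same expressions would take the Timestamp column in place of `ts`. Keep in mind that `%W` counts weeks from the first Monday of the year, which is not the same as ISO week numbering.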