Selecting rows from a pandas DataFrame by a time that was converted to datetime format from a POSIX timestamp (Python)
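I am using the code below to pull data from Google Finance. The timestamps are in POSIX form, so I convert them to datetime. When I try to filter the result on a time criterion (14:35:00), it returns an empty table. I suspect it has something to do with the POSIX/datetime conversion, but I don't know how to fix it.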
def get_intraday_data(symbol, interval_seconds=301, num_days=10):
    # Specify URL string based on function inputs.
    url_string = 'http://www.google.com/finance/getprices?q={0}'.format(symbol.upper())
    url_string += "&i={0}&p={1}d&f=d,o,h,l,c,v".format(interval_seconds,num_days)
    # Request the text, and split by each line
    r = requests.get(url_string).text.split()
    # Split each line by a comma, starting at the 8th line
    r = [line.split(',') for line in r[7:]]
    # Save data in Pandas DataFrame
    df = pd.DataFrame(r, columns=['Datetime','Close','High','Low','Open','Volume'])
    # Convert UNIX to Datetime format
    df['Datetime'] = df['Datetime'].apply(lambda x: datetime.datetime.fromtimestamp(int(x[1:])))
    # Separate Date and Time
    df['Time'],df['Date']= df['Datetime'].apply(lambda x:x.time()), df['Datetime'].apply(lambda x:x.date())
    # Convert 'Close','High','Low','Open', deleting 'Volume'
    # (earlier per-column version, kept commented out)
    # df['Close'] = df['Close'].astype('float64')
    # df['High'] = df['High'].astype('float64')
    # df['Low'] = df['Low'].astype('float64')
    # df['Open'] = df['Open'].astype('float64')
    del df['Volume']
    del df['Datetime']
    df[['Close','High','Low','Open']] = df[['Close','High','Low','Open']].astype('float64')
    # Calculating %Change and Range
    df['%pct'] = (df['Close'] - df['Open'])/df['Open']
    df['Range'] = df['High'] - df['Low']
    # Sort Columns
    return df
I have stored the result of this function as NAS:
NAS = get_intraday_data('IXIC', interval_seconds=301, num_days= 100)
The filter condition is:
NAS[NAS['Time'] == '14:35:00']
I would appreciate any help.
You can use this:
NAS.query('Datetime.dt.hour==14 and Datetime.dt.minute==35 and Datetime.dt.second==0')
Edit:
Apply dt to the datetime Series rather than to the time Series:
raw_data = {'Datetime': ['2015-05-01T14:35:00', '2016-07-04T02:26:00', '2013-02-01T04:41:00']}
df = pd.DataFrame(raw_data, columns = ['Datetime'])
df["Datetime"] = pd.to_datetime(df["Datetime"])
df['Time'],df['Date']= df['Datetime'].apply(lambda x:x.time()), df['Datetime'].apply(lambda x:x.date())
df = df.set_index(df["Datetime"])
df['hour']=df['Datetime'].dt.hour
df['minute']=df['Datetime'].dt.minute
df['second']=df['Datetime'].dt.second
df.query('Datetime.dt.hour==14 and Datetime.dt.minute==35 and Datetime.dt.second==0')
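The same filter can also be written without `query`. The following is only a sketch on made-up sample data (the column values and the `Close` numbers are hypothetical), and it assumes the `Datetime` column is kept rather than deleted as in the question's function. It shows a plain boolean mask built from the `.dt` accessor, and `DataFrame.at_time`, which selects rows by time of day once the datetimes are the index.

```python
import pandas as pd

# Hypothetical sample data; in the question this would be the downloaded prices.
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2015-05-01T14:35:00",
        "2016-07-04T02:26:00",
        "2013-02-01T04:41:00",
    ]),
    "Close": [1.0, 2.0, 3.0],
})

# Option 1: boolean mask built from the .dt accessor on the datetime column.
mask = (
    (df["Datetime"].dt.hour == 14)
    & (df["Datetime"].dt.minute == 35)
    & (df["Datetime"].dt.second == 0)
)
by_mask = df[mask]

# Option 2: make the datetimes the index and select by time of day.
by_at_time = df.set_index("Datetime").at_time("14:35:00")

print(by_mask)
print(by_at_time)
```

Both variants depend on a real datetime column or index, so they only apply if `del df['Datetime']` is dropped from the function.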
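I see that you are converting the timestamp to datetime incorrectly: you are referencing datetime twice. That fails if the class was imported with `from datetime import datetime`.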
Replace
df['Datetime'] = df['Datetime'].apply(lambda x: datetime.datetime.fromtimestamp(int(x[1:])))
with
df['Datetime'] = df['Datetime'].apply(lambda x: datetime.fromtimestamp(int(x[1:])))
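One point worth keeping in mind about this conversion: `datetime.fromtimestamp` returns local time for the machine running the script, so the wall-clock time attached to a given POSIX value depends on where the code runs. A possible alternative, shown here only as a sketch, is to convert the whole column with `pd.to_datetime(..., unit="s", utc=True)` and then pick a timezone explicitly. The sample strings and the choice of `America/New_York` are assumptions for illustration, not something stated in the question.

```python
import datetime as dt
import pandas as pd

# Hypothetical raw values in the same shape the question strips with x[1:]:
# one leading character followed by POSIX seconds.
raw = pd.Series(["a1469802900", "a1469803200"])

# Drop the leading character and parse the rest as integer seconds.
seconds = raw.str[1:].astype("int64")

# Vectorized conversion to timezone-aware UTC timestamps.
utc = pd.to_datetime(seconds, unit="s", utc=True)

# Convert to an explicit timezone (assumed here) before extracting times.
eastern = utc.dt.tz_convert("America/New_York")
times = eastern.dt.time

print(utc.iloc[0])   # 2016-07-29 14:35:00+00:00
print(times.iloc[0]) # 10:35:00
```

With the timezone pinned down, a filter such as `times == dt.time(10, 35)` gives the same rows on any machine.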
In the second part of your question:
NAS = get_intraday_data('IXIC', interval_seconds=301, num_days= 100)
NAS[NAS['Time'] == '14:35:00']
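you are comparing instances of datetime.time with a string, which is incorrect: the two never compare equal, so the filter returns no rows. Try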
NAS[NAS['Time'] == datetime.strptime('14:35:00', '%H:%M:%S').time()]
It should work as expected.
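To see why the string comparison returns nothing, here is a small self-contained sketch on made-up data (the timestamps and `Close` values are hypothetical stand-ins for the downloaded prices). It builds a `Time` column of `datetime.time` objects the same way the question does, then filters once with a string and once with a `datetime.time` value.

```python
import datetime as dt
import pandas as pd

# Hypothetical sample timestamps standing in for the converted POSIX values.
stamps = pd.to_datetime([
    "2016-07-29 14:30:00",
    "2016-07-29 14:35:00",
    "2016-08-01 14:35:00",
])
df = pd.DataFrame({"Close": [5160.0, 5162.448, 5181.768]})
df["Time"] = [ts.time() for ts in stamps]  # datetime.time objects
df["Date"] = [ts.date() for ts in stamps]  # datetime.date objects

# A datetime.time never equals a str, so this mask is all False.
by_string = df[df["Time"] == "14:35:00"]

# Comparing against a datetime.time value matches the intended rows.
by_time = df[df["Time"] == dt.time(14, 35)]

print(len(by_string))  # 0
print(len(by_time))    # 2
```

`dt.time(14, 35)` builds the same value as `datetime.strptime('14:35:00', '%H:%M:%S').time()` without parsing a string.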
Update:
Running the script with the suggested changes displays the data as:
         Close      High       Low      Open      Time        Date      %pct
60    5162.448  5165.124  5162.448  5165.057  14:35:00  2016-07-29 -0.000505
138   5181.768  5183.184  5181.193  5181.404  14:35:00  2016-08-01  0.000070
216   5130.514  5131.933  5130.434  5131.893  14:35:00  2016-08-02 -0.000269
294   5146.608  5146.608  5143.827  5144.788  14:35:00  2016-08-03  0.000354
372   5163.854  5164.154  5162.997  5164.021  14:35:00  2016-08-04 -0.000032
450   5221.624  5221.911  5220.658  5220.789  14:35:00  2016-08-05  0.000160
528   5204.111  5204.240  5202.476  5202.865  14:35:00  2016-08-08  0.000239
...
3648  5282.999  5283.017  5279.008  5279.340  14:35:00  2016-10-04  0.000693
3726  5324.450  5325.375  5323.628  5324.129  14:35:00  2016-10-05  0.000060
3804  5310.945  5311.454  5310.194  5310.558  14:35:00  2016-10-06  0.000073
3882  5295.064  5295.080  5292.184  5292.327  14:35:00  2016-10-07  0.000517