Python: extract date and hour values from dict objects in an array
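I want to extract date and hour values (e.g. 2013-10-04 11, 2013-10-04 04, 2013-10-04 03, and so on). When I use keys() I get 2013-10-04 but not the hour, and when I use items() I get all of the data. If you know a way to extract the date and hour values, please let me know. Also, the data are at 30-minute intervals.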
u'Station_paris_2013-10-04': {'2013-10-04 11:00:00': array([ number, number, number, ...,
number, number, number]), '2013-10-04 04:00:00': array([ number, number, number, ...,
number, number, number]), '2013-10-04 03:00:00': array([ number, number, number, ...,
number, number, number]), '2013-10-04 14:30:00': array([ number, number, number, ...,
number, number, number]), '2013-10-04 20:00:00': array([ number, ....]...)
Sorry for the confusion..
This is the code for export_allcorr2, which uses allcorr[ccfid] as its data.
def export_allcorr2(session, ccfid, data):
    output_folder = get_config(session, 'output_folder')
    station1, station2, components, filterid, date = ccfid.split('_')
    path = os.path.join(output_folder, "%02i" % int(filterid),
                        station1, station2, components)
    if not os.path.isdir(path):
        os.makedirs(path)
    df = pd.DataFrame().from_dict(data).T
    df.columns = get_t_axis(session)
    df.to_hdf(os.path.join(path, date+'.h5'), 'data')
    del df
    return

if params.keep_all:
    for ccfid in allcorr.keys():
        export_allcorr2(db, ccfid, allcorr[ccfid])
This is part of the file (allcorr) for one of the station pairs I am working with:
'2013-10-27 10:30:00': array([ 583.55720165, 424.74395062, 244.40351166, ..., 244.40364883,
424.74411523, 583.55747599]), '2013-10-27 16:30:00': array([ 199.66430727, 18.39147977, -157.45584362, ..., -157.45602195,
18.39139403, 199.66432099]), '2013-10-27 16:00:00': array([ -97.27305213, -365.27786008, -621.36060357, ..., -621.36076818,
-365.27802469, -97.27297668]), '2013-10-27 21:30:00': array([-436.08005487, -389.74776406, -327.61319616, ..., -327.61300412,
-389.74773663, -436.07994513]), '2013-10-27 11:00:00': array([-649.70282579, -597.36164609, -523.04197531, ..., -523.04170096,
-597.36131687, -649.70266118]), '2013-10-27 20:30:00': array([ 347.37681756, 218.49106996, 88.03422497, ..., 88.03427298,
218.49113855, 347.37687243]), '2013-10-27 12:30:00': array([ 34.91324417, -93.73432099, 171.31466392, ..., 171.31396433,
-93.73384088, 34.91361454]), '2013-10-27 13:30:00': array([-289.4951989 , -404.48175583, -501.02052126, ..., -501.02046639,
-404.48170096, -289.49500686]), '2013-10-27 07:30:00': array([-108.69506859, -44.65974623, 7.96771948, ..., 7.96728738,
-44.65979424, -108.69509602]), '2013-10-27 09:30:00': array([-630.18035665, -614.95835391, -597.89119342, ..., -597.89113855,
-614.95807956, -630.18024691]), '2013-10-27 17:00:00': array([-276.81805213, -267.21061728, -246.72584362, ..., -246.72556927,
-267.21053498, -276.81794239]),
ccfid shows u'05.SS08_05.SS09_ZZ_01_2013-11-06', i.e. 'Net.Sta_Net.Sta_component_ccfid_date'. But what I want is the date with the hour.
This is allcorr[ccfid]:
{'2013-11-07 07:30:00': array([ 2.01912938e-08, -5.87221879e-08, 7.99213765e-08, ...,
9.93437383e-08, 4.46988525e-08, -4.40811423e-08]), '2013-11-07 14:30:00': array([ -7.76317889e-09, 1.72162791e-09, 1.76833389e-08, ...,
-4.17227052e-08, -8.08114523e-09, 7.22184605e-09]), '2013-11-07 00:00:00': array([ -1.67720752e-08, -4.86950919e-08, -3.92029027e-08, ...,
-4.25311992e-08, -1.43883637e-08, -1.86576377e-08]), '2013-11-07 16:00:00': array([ -1.54196405e-08, -6.50798506e-08, -3.71392759e-08, ...,
-3.63095301e-08, 4.17709433e-10, -1.11803857e-07]), '2013-11-07 15:30:00': array([ -4.30306800e-08, -8.02815645e-08, 1.83716952e-08, ...,
-3.71510132e-08, -5.32969688e-08, 5.72185107e-08])
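In the export_allcorr2 code, what I want is to change the data file format from Y-M-D to Y-M-D-H or Y-M-D H. That means extracting the H (hour) value and gathering the data for the same station pair, date, and hour into one file.
I want array data from the same hour organized in one place, so 16:00:00 and 16:30:00 would end up together.
Originally, the keep_all code writes daily files named 'date.h5' (e.g. 2013-10-14.h5), and '2013-10-14.h5' contains 2013-10-14 00:00:00, 2013-10-14 00:30:00, 2013-10-14 01:00:00, and so on.
Keeping the same format, I want one file per hour (e.g. 2013-10-14_01.h5, 2013-10-14_02.h5), where the file '2013-10-14_01.h5' contains 2013-10-14_01:00 and 2013-10-14_01:30.
So what I need to know is which key (related to allcorr in the keep_all code) to use to extract the date and hour, so that the daily format can be replaced with an hourly one.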
Edited..
if params.keep_all:
    for ccfid in allcorr.keys():
        final_data={}
        for data_key in allcorr[ccfid]:
            print('NET,STA,NET,STA,COMP,FILTERID,DATE', ccfid)
            temp_date=dt.datetime.fromisoformat(data_key)
            hh=temp_date.strftime('%H')
            dh=temp_date.strftime('%Y-%m-%d %H')
            dm=temp_date.strftime('%Y-%m-%d %H:%M')
            data=allcorr[ccfid][data_key]
            print('DATE AND HOUR', dh)
            print('DATA_KEY', data_key)
            print('ONLY HOUR',hh)
            print('DATA RELATED TO THE DATE AND HOUR', allcorr[ccfid][data_key])
            container = final_data.get(dh, False)
            if not container:
                container = []
                final_data[dh] = container
            container.extend(allcorr[ccfid][data_key])
            print('THE LENGTH OF DATA', len(container))
            export_allcorr2(db, ccfid, hh, container)
When I leave the 'container' part out of the code, the 30-minute data overwrites the 00-minute data. So I ran the code with the container part. The result..
NET,STA,NET,STA,COMP,FILTERID,DATE 05.SS01_05.SS01_ZZ_01_2013-10-08
DATE AND HOUR 2013-10-08 00
DATA_KEY 2013-10-08 00:00:00
ONLY HOUR 00
DATA RELATED TO THE DATE AND HOUR [ 9268.65717062 8616.97848119 7872.42382341 ..., 7872.42115785
8616.97759267 9268.6562821 ]
THE LENGTH OF DATA 4801
NET,STA,NET,STA,COMP,FILTERID,DATE 05.SS01_05.SS01_ZZ_01_2013-10-08
DATE AND HOUR 2013-10-08 00
DATA_KEY 2013-10-08 00:30:00
ONLY HOUR 00
DATA RELATED TO THE DATE AND HOUR [ 375.50442871 -1328.53555463 -3054.59036513 ..., -3054.58703318
-1328.53966403 375.50603915]
THE LENGTH OF DATA 9602
The error..
ValueError: Length mismatch: Expected axis has 9602 elements, new values have 4801 elements
You can use datetime to extract the day, hour, and so on:
from datetime import datetime

inp = {'2013-10-04 11:00:00':["rand_stuff"],
       '2013-10-04 04:00:00':["rand_stuff"]}
for ts in inp.keys():
    dt = datetime.strptime(ts,"%Y-%m-%d %H:%M:%S")
    print(f"Date: {dt} Hour: {dt.hour} Day: {dt.day}")
Date: 2013-10-04 11:00:00 Hour: 11 Day: 4
Date: 2013-10-04 04:00:00 Hour: 4 Day: 4
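To get the combined "date hour" label the question asks for (e.g. 2013-10-04 11), the parsed datetime can be formatted back with strftime. The following is a minimal sketch; hour_label is a hypothetical helper name, not something from the original code.

```python
from datetime import datetime

def hour_label(ts):
    """Return 'YYYY-MM-DD HH' for a 'YYYY-MM-DD HH:MM:SS' key (hypothetical helper)."""
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H")

keys = ['2013-10-04 11:00:00', '2013-10-04 04:00:00', '2013-10-04 14:30:00']
# A set drops repeated labels when two 30-minute keys share an hour.
labels = sorted({hour_label(k) for k in keys})
print(labels)
```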
Here is my solution:
import datetime as dt
import random as rnd

def fake_data():
    temp = (rnd.random() * rnd.randint(10, 100)) * (-1 * rnd.choice([-1, 1]))
    return round(temp, 6)

_max = 10
# create some test data
sample = {
    '2013-10-27 10:30:00' : [fake_data() for x in range(_max)],
    '2013-10-27 16:00:00' : [fake_data() for x in range(_max)],
    '2013-10-27 16:30:00' : [fake_data() for x in range(_max)],
    '2013-10-27 11:00:00' : [fake_data() for x in range(_max)],
    '2013-10-27 20:30:00' : [fake_data() for x in range(_max)],
}

def printer(data):
    for key, value in data.items():
        print(key)
        print(value)
        print("-------------------")

def main():
    final_data = {}
    for data_key in sample:
        # create a datetime from the key
        temp_date = dt.datetime.fromisoformat(data_key)
        # extract only the time part
        temp_time = temp_date.time()
        # look up this hour in our records
        container = final_data.get(temp_time.hour, False)
        # if it is not in the records yet, start a new list for it
        if not container:
            container = []
            # the key can be a string as well if you like
            final_data[temp_time.hour] = container
        # add the data to this hour's record
        container.extend(sample[data_key])
    printer(final_data)

main()
I added brief comments throughout. If anything is confusing or you have further comments, I will edit or reply. Good luck!
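Regarding the Length mismatch error in the edited question: extend() concatenates the 00:00 and 00:30 traces into one 9602-sample list, while the time axis has 4801 values. One possible way around this, sketched below under assumptions, is to keep each 30-minute trace as its own entry and group the entries by a date-and-hour label. group_by_hour and the '%Y-%m-%d_%H' label are hypothetical choices for illustration; passing each hourly dict to the export function (and using the label as the file name) would still need to be adapted in the original code.

```python
from collections import defaultdict
from datetime import datetime

def group_by_hour(day_data):
    """Split {'YYYY-MM-DD HH:MM:SS': trace} into {'YYYY-MM-DD_HH': {timestamp: trace}}."""
    hourly = defaultdict(dict)
    for ts, trace in day_data.items():
        parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        label = parsed.strftime("%Y-%m-%d_%H")
        # Store the trace under its own timestamp instead of extending a
        # shared list, so every trace keeps its original length.
        hourly[label][ts] = trace
    return dict(hourly)

# Tiny stand-in traces; the real ones have 4801 samples each.
sample = {
    '2013-10-27 16:00:00': [1.0, 2.0, 3.0],
    '2013-10-27 16:30:00': [4.0, 5.0, 6.0],
    '2013-10-27 11:00:00': [7.0, 8.0, 9.0],
}
hourly = group_by_hour(sample)
for label in sorted(hourly):
    print(label, sorted(hourly[label]))
```

Each hourly value has the same shape as the original daily dict (timestamp keys, one trace per key), so a DataFrame built from it would have one row per 30-minute trace rather than a single double-length row.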