Python: extract date and hour values from dict objects in an array

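I want to extract the date and hour values (for example 2013-10-04 11, 2013-10-04 04, 2013-10-04 03, and so on). With keys() I only get 2013-10-04 without the hour, and with items() I get all of the data. If you know a way to extract the date and hour values, please let me know. Note that the data is at 30-minute intervals.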

u'Station_paris_2013-10-04': {'2013-10-04 11:00:00': array([ number,   number,   number, ...,
        number, number, number]), '2013-10-04 04:00:00': array([ number,   number, number, ...,
         number, number, number]), '2013-10-04 03:00:00': array([ number, number, number, ...,
        number, number, number]), '2013-10-04 14:30:00': array([ number, number, number, ...,
        number, number, number]), '2013-10-04 20:00:00': array([ number, ....]...)

Sorry for the confusion.

Here is the code for export_allcorr2, which takes allcorr[ccfid] as its data.

import os
import pandas as pd

# get_config and get_t_axis come from the surrounding project (not shown here)
def export_allcorr2(session, ccfid, data):
    output_folder = get_config(session, 'output_folder')
    station1, station2, components, filterid, date = ccfid.split('_')

    path = os.path.join(output_folder, "%02i" % int(filterid),
                        station1, station2, components)
    if not os.path.isdir(path):
        os.makedirs(path)
    # one row per timestamp key after the transpose
    df = pd.DataFrame.from_dict(data).T
    df.columns = get_t_axis(session)
    df.to_hdf(os.path.join(path, date + '.h5'), key='data')



if params.keep_all:
   for ccfid in allcorr.keys():
       export_allcorr2(db, ccfid, allcorr[ccfid])

This is part of the file (allcorr) for one station pair that I am working with:

'2013-10-27 10:30:00': array([ 583.55720165,  424.74395062,  244.40351166, ...,  244.40364883,
        424.74411523,  583.55747599]), '2013-10-27 16:30:00': array([ 199.66430727,   18.39147977, -157.45584362, ..., -157.45602195,
         18.39139403,  199.66432099]), '2013-10-27 16:00:00': array([ -97.27305213, -365.27786008, -621.36060357, ..., -621.36076818,
       -365.27802469,  -97.27297668]), '2013-10-27 21:30:00': array([-436.08005487, -389.74776406, -327.61319616, ..., -327.61300412,
       -389.74773663, -436.07994513]), '2013-10-27 11:00:00': array([-649.70282579, -597.36164609, -523.04197531, ..., -523.04170096,
       -597.36131687, -649.70266118]), '2013-10-27 20:30:00': array([ 347.37681756,  218.49106996,   88.03422497, ...,   88.03427298,
        218.49113855,  347.37687243]), '2013-10-27 12:30:00': array([  34.91324417,  -93.73432099,  171.31466392, ...,  171.31396433,
        -93.73384088,   34.91361454]), '2013-10-27 13:30:00': array([-289.4951989 , -404.48175583, -501.02052126, ..., -501.02046639,
       -404.48170096, -289.49500686]), '2013-10-27 07:30:00': array([-108.69506859,  -44.65974623,    7.96771948, ...,    7.96728738,
        -44.65979424, -108.69509602]), '2013-10-27 09:30:00': array([-630.18035665, -614.95835391, -597.89119342, ..., -597.89113855,
       -614.95807956, -630.18024691]), '2013-10-27 17:00:00': array([-276.81805213, -267.21061728, -246.72584362, ..., -246.72556927,
       -267.21053498, -276.81794239]),

ccfid looks like u'05.SS08_05.SS09_ZZ_01_2013-11-06', i.e. 'Net.Sta_Net.Sta_component_ccfid_date'. But what I want is the date with the hour.

This is allcorr[ccfid]:

 {'2013-11-07 07:30:00': array([  2.01912938e-08,  -5.87221879e-08,   7.99213765e-08, ...,
     9.93437383e-08,   4.46988525e-08,  -4.40811423e-08]), '2013-11-07 14:30:00': array([ -7.76317889e-09,   1.72162791e-09,   1.76833389e-08, ...,
    -4.17227052e-08,  -8.08114523e-09,   7.22184605e-09]), '2013-11-07 00:00:00': array([ -1.67720752e-08,  -4.86950919e-08,  -3.92029027e-08, ...,
    -4.25311992e-08,  -1.43883637e-08,  -1.86576377e-08]), '2013-11-07 16:00:00': array([ -1.54196405e-08,  -6.50798506e-08,  -3.71392759e-08, ...,
    -3.63095301e-08,   4.17709433e-10,  -1.11803857e-07]), '2013-11-07 15:30:00': array([ -4.30306800e-08,  -8.02815645e-08,   1.83716952e-08, ...,
    -3.71510132e-08,  -5.32969688e-08,   5.72185107e-08])

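In the export_allcorr2 code, what I want is to change the data file naming from Y-M-D to Y-M-D-H or 'Y-M-D H'. That means extracting the H (hour) value and writing the data for the same station pair, date, and hour into one file.

I want the array data from the same hour organised in one place, so 16:00:00 and 16:30:00 would end up together.

Originally, the keep_all code writes each daily file as 'date.h5' (for example 2013-10-14.h5). Inside '2013-10-14.h5' there are entries for 2013-10-14 00:00:00, 2013-10-14 00:30:00, 2013-10-14 01:00:00, and so on.

In the same way, I want to write each hour to its own file (for example 2013-10-14_01.h5, 2013-10-14_02.h5), so that '2013-10-14_01.h5' contains 2013-10-14_01:00 and 2013-10-14_01:30.

So what I need to know is which key (from allcorr in the keep_all code) to use to extract the date and hour, so that I can replace the daily format with an hourly one.

Edited: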

if params.keep_all:
    for ccfid in allcorr.keys():
        final_data={}
        for data_key in allcorr[ccfid]:
            print('NET,STA,NET,STA,COMP,FILTERID,DATE', ccfid)
            temp_date=dt.datetime.fromisoformat(data_key)
            hh=temp_date.strftime('%H')
            dh=temp_date.strftime('%Y-%m-%d %H')
            dm=temp_date.strftime('%Y-%m-%d %H:%M')
            data=allcorr[ccfid][data_key]
            print('DATE AND HOUR', dh)
            print('DATA_KEY', data_key)
            print('ONLY HOUR',hh)
            print('DATA RELATED TO THE DATE AND HOUR', allcorr[ccfid][data_key])
            container = final_data.get(dh, False)
            if not container:
                container = []
                final_data[dh] = container
            container.extend(allcorr[ccfid][data_key])
            print('THE LENGTH OF DATA', len(container))
            export_allcorr2(db, ccfid, hh, container)

When I leave out the 'container' part, the 30-minute data overwrites the 00-minute data. So I ran the code with the container part. The result:

NET,STA,NET,STA,COMP,FILTERID,DATE 05.SS01_05.SS01_ZZ_01_2013-10-08
DATE AND HOUR 2013-10-08 00
DATA_KEY 2013-10-08 00:00:00
ONLY HOUR 00
DATA RELATED TO THE DATE AND HOUR [ 9268.65717062  8616.97848119  7872.42382341 ...,  7872.42115785
  8616.97759267  9268.6562821 ]
THE LENGTH OF DATA 4801
NET,STA,NET,STA,COMP,FILTERID,DATE 05.SS01_05.SS01_ZZ_01_2013-10-08
DATE AND HOUR 2013-10-08 00
DATA_KEY 2013-10-08 00:30:00
ONLY HOUR 00
DATA RELATED TO THE DATE AND HOUR [  375.50442871 -1328.53555463 -3054.59036513 ..., -3054.58703318
 -1328.53966403   375.50603915]
THE LENGTH OF DATA 9602

The error:

ValueError: Length mismatch: Expected axis has 9602 elements, new values have 4801 elements

You can use datetime to extract the day, hour, and so on:

inp = {'2013-10-04 11:00:00':["rand_stuff"],
      '2013-10-04 04:00:00':["rand_stuff"]}
from datetime import datetime
for ts in inp.keys():
    dt = datetime.strptime(ts,"%Y-%m-%d %H:%M:%S")
    print(f"Date: {dt} Hour: {dt.hour} Day: {dt.day}") 


Date: 2013-10-04 11:00:00 Hour: 11 Day: 4
Date: 2013-10-04 04:00:00 Hour: 4 Day: 4
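
If you need the date together with the hour, as in the question, the same parsing step can feed a grouping step. The following is only a minimal sketch: group_keys_by_hour is a hypothetical helper name, and the key '2013-10-04 14:00:00' is made up here so that one hour has two entries.

```python
from collections import defaultdict
from datetime import datetime


def group_keys_by_hour(keys):
    """Map 'YYYY-MM-DD HH' to the sorted list of original timestamp keys."""
    groups = defaultdict(list)
    for ts in keys:
        parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # Drop minutes and seconds so 14:00 and 14:30 share one group key
        groups[parsed.strftime("%Y-%m-%d %H")].append(ts)
    return {hour: sorted(stamps) for hour, stamps in sorted(groups.items())}


# Keys taken from the question, plus one invented key (14:00:00) for illustration
example_keys = [
    "2013-10-04 11:00:00",
    "2013-10-04 04:00:00",
    "2013-10-04 03:00:00",
    "2013-10-04 14:30:00",
    "2013-10-04 14:00:00",
]

hour_groups = group_keys_by_hour(example_keys)
for hour, stamps in hour_groups.items():
    print(hour, stamps)
```

Because the group key includes the date, the same hour on two different days stays in two separate groups.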

Here is my solution:

import datetime as dt
import random as rnd

def fake_data():
    temp = (rnd.random() * rnd.randint(10, 100)) * (-1 * rnd.choice([-1, 1]))

    return round(temp, 6)

_max = 10

# creating some test data 
sample = {
    '2013-10-27 10:30:00' : [fake_data() for x in range(_max)],
    '2013-10-27 16:00:00' : [fake_data() for x in range(_max)],
    '2013-10-27 16:30:00' : [fake_data() for x in range(_max)],
    '2013-10-27 11:00:00' : [fake_data() for x in range(_max)],
    '2013-10-27 20:30:00' : [fake_data() for x in range(_max)],
    
    }


def printer(data):

    for key, value in data.items():
        print(key)
        print(value)
        print("-------------------")


def main():

    final_data = {}


    for data_key in sample:

        # create date from key 
        temp_date = dt.datetime.fromisoformat(data_key)

        # extract only time data
        temp_time = temp_date.time()

        # look up the list already recorded for this hour, if any
        container = final_data.get(temp_time.hour)

        # if this hour is not in the records yet, start a new list for it
        if container is None:

            container = []

            # key can be string as well if you like
            final_data[temp_time.hour] = container

        # add the data to this hour's record
        container.extend(sample[data_key])

    printer(final_data)

main()

I have added brief comments throughout. If anything is confusing, or you have any other remarks, I will edit or answer. Good luck!
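
Regarding the Length mismatch error in the edited question: judging from the printed lengths, extend() joins the two 4801-sample traces of one hour into a single flat list of 9602 values, while the column labels assigned in export_allcorr2 apparently have only 4801 entries. One possible way around that, sketched here under assumptions, is to keep each 30-minute trace as its own entry and group the entries by date and hour. split_by_hour and the tiny day_data dict are hypothetical stand-ins for illustration; the real allcorr[ccfid] holds NumPy arrays.

```python
from collections import defaultdict
from datetime import datetime


def split_by_hour(day_data):
    """Regroup {timestamp: trace} into {'YYYY-MM-DD_HH': {timestamp: trace}}."""
    hourly = defaultdict(dict)
    for ts, trace in day_data.items():
        parsed = datetime.fromisoformat(ts)
        # Each trace stays a separate entry, so nothing gets concatenated
        hourly[parsed.strftime("%Y-%m-%d_%H")][ts] = trace
    return dict(hourly)


# Hypothetical stand-in for allcorr[ccfid]: short lists instead of 4801-sample arrays
day_data = {
    "2013-10-08 00:00:00": [1.0, 2.0, 3.0],
    "2013-10-08 00:30:00": [4.0, 5.0, 6.0],
    "2013-10-08 01:00:00": [7.0, 8.0, 9.0],
}

hourly = split_by_hour(day_data)
for hour_key, rows in hourly.items():
    # hour_key + ".h5" mirrors the file naming asked for in the question
    print(hour_key + ".h5", sorted(rows), [len(v) for v in rows.values()])
```

Each inner dict has the same layout as the original allcorr[ccfid] (timestamp keys, equal-length traces), so it could presumably be handed to the existing DataFrame export once per hour key, after the loop over data_key has finished, with the hour key used as the file name.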