How can I optimize code that uses xarray for better performance?

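I'm trying to extract climate data from several .nc files, but the process takes an extremely long time. I suspect this is because I'm extracting data for every day of June, July, and August for the next 79 years. I'm a novice programmer, though, and I realize I may have overlooked something (efficiency-wise) that could give at least slightly better performance.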

Here is the snippet:

def calculateTemp(coords, year, model):

    """
    takes in all coordinates of a line between two grid stations and the year

    converts the year into dates

    averages the temperature over each day of June, July and August for each
    coordinate, then averages over all coordinates to find the average temp
    for that line for those months
    """
    print(year)

    # coords represents a list of different sets of coordinates between two grids

    temp3 = 0       # sum of all temps of all coordinates
    for i in range(0, len(coords)):
        temp2 = 0
        counter = 0

        # extract 15 years of data for each coordinate set
        # and average over those 15 years
        for p in range(0, 15):

            temp1 = 0       # sum of all temps for one coordinate over all days of June, July, August
            if year+ p < 100:
                # this loop represents the months of jun, jul, aug
                for j in range(6, 9):
                    # 30 days of each month
                    for k in range(1, 31):
                        if k < 10:

                            # this if-else makes a string of date
                            date = '20'+str(year+p)+'-0'+str(j)+'-0'+str(k)
                        else:
                            date = '20'+str(year+p)+'-0'+str(j)+'-'+str(k)

                        # there are 3 variants of the climate model:
                        # for years up to 2040, for 2041-2070,
                        # and for 2071-2099,
                        # hence this if-else block

                        if year+p < 41:   
                            temp1 += model[0]['tasmax'].sel(
                                lon=coords[i][1], lat=coords[i][0], time=date, method='nearest').data[0]
                        elif year+p >= 41 and year+p <71:
                            temp1 += model[1]['tasmax'].sel(
                                lon=coords[i][1], lat=coords[i][0], time=date, method='nearest').data[0]
                        else:
                            temp1 += model[2]['tasmax'].sel(
                                lon=coords[i][1], lat=coords[i][0], time=date, method='nearest').data[0]
                counter += 1
                avg = temp1/(len(range(0,30))*len(range(6,9)))
                temp2 += avg
        temp3 += temp2/counter
    Tamb = temp3/len(coords)

    return Tamb

Is there a way to improve the performance of this code and optimize it?

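I have just replaced the two innermost loops, k in range(1,31) and j in range(6,9), with a dict comprehension that generates all the dates and the corresponding values from your model. The values are then simply averaged for each value of p and for each coord in coords.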

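Give this a try. Using dictionaries should make the processing faster. Also check that the averages are computed exactly the way you compute them in your function.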

def build_date(year, p, j, k):
    # zero-pad year, month and day so the result has the form '20YY-MM-DD'
    return f"20{year + p:02d}-{j:02d}-{k:02d}"



def calculateTemp(coords, year, model):

    func2 = lambda x,date:model[x]['tasmax'].sel(lon=coords[i][1], 
                                                 lat=coords[i][0], 
                                                 time=date, 
                                                 method='nearest').data[0]

    print(year)

    out = {}
    for i in range(len(coords)):
        inner = {}
        for p in range(0,15):

            if year + p < 100:
                dates = {build_date(year,p,j,k):func2(0,build_date(year,p,j,k)) if year+p<41 \
                         else func2(1,build_date(year,p,j,k)) if (year+p >= 41 and year+p <71) \
                         else func2(2,build_date(year,p,j,k))
                         for j in range(6,9) \
                         for k in range(1,31) }

                inner[p] = sum([v for k,v in dates.items()])/len(dates)

        out[i] = inner

    coord_averages = {k : sum(v.values())/len(v) for k,v in out.items() }
    Tamb = sum([v for k,v in coord_averages.items()])/len(coord_averages)
    return Tamb
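
Both versions above still make one `.sel` call per day and per coordinate, so much of the run time is probably spent in those individual lookups. As a minimal sketch of a different technique (vectorized selection, which is not what the code above does), xarray can look up all coordinates and all summer days in one go. The function name `mean_jja_tasmax`, the dimension name `points`, the use of a full four-digit `start_year`, and the assumption that the three model files have been combined into a single dataset with `time`, `lat` and `lon` coordinates are all assumptions made for illustration, not something taken from the question. Note also that this averages over every day of June to August, including the 31st, so the result can differ slightly from the loops, which stop at day 30.

```python
import xarray as xr


def mean_jja_tasmax(ds, coords, start_year, n_years=15):
    """Mean of 'tasmax' over all June-August days and all coordinates.

    ds         : xarray.Dataset with 'tasmax' on (time, lat, lon)  (assumed layout)
    coords     : list of (lat, lon) pairs
    start_year : first year as a full year, e.g. 2030 (assumption, not 30)
    """
    # Pointwise nearest-neighbour lookup: both indexers share the
    # hypothetical dimension "points", so one grid cell per pair is returned.
    lats = xr.DataArray([c[0] for c in coords], dims="points")
    lons = xr.DataArray([c[1] for c in coords], dims="points")
    da = ds["tasmax"].sel(lat=lats, lon=lons, method="nearest")

    # Boolean mask along time: June, July, August within the requested years.
    years = da.time.dt.year
    in_season = da.time.dt.month.isin([6, 7, 8])
    in_years = (years >= start_year) & (years < start_year + n_years)
    da = da.where(in_season & in_years, drop=True)

    # Every year and coordinate contributes the same number of days, so a
    # single overall mean equals the mean of the per-year, per-point means.
    return float(da.mean())
```

Usage would look something like `Tamb = mean_jja_tasmax(ds, coords, 2030)`, where `ds` might come from `xr.open_mfdataset` if the files can be concatenated along `time`. Whether this matches your numbers exactly depends on the calendar used in the files, so it is worth comparing against the loop version on a small sample first.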