Doing a loop calculation

toto =

C1Date      C1Type    C2Date      C2Type    ...  C10Date  C10Type  PolDate
dd-mm-yyyy  Proposer  NaT         NaN       ...  NaT      NaN      dd-mm-yyyy
dd-mm-yyyy  Proposer  NaT         NaN       ...  NaT      NaN      dd-mm-yyyy
dd-mm-yyyy  Other     dd-mm-yyyy  Proposer  ...  NaT      NaN      dd-mm-yyyy
dd-mm-yyyy  Proposer  NaT         NaN       ...  NaT      NaN      dd-mm-yyyy
dd-mm-yyyy  Other     dd-mm-yyyy  Other     ...  NaT      NaN      dd-mm-yyyy

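where C stands for Claim, i.e. each row holds up to 10 claims.

I need to determine whether any of the claims are from the Proposer and, for those claims, whether they occurred within 3 years of PolDate (PolDate is always later than any CDate).

I was able to do the following, but I can't get the date subtraction to work inside the loop: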
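If the dictionary of per-claim frames is not itself needed, the stated goal (does the row have any Proposer claim within 3 years of PolDate?) can be computed as a single per-row flag by looping over the claim slots and OR-ing boolean masks. This is a minimal sketch, assuming the dates are dd-mm-yyyy strings and treating 3 years as 1095 days (the question's own approximation). The four-row, two-slot frame is hypothetical sample data, and CLMLAST3 is the column name the question uses later.

```python
import pandas as pd

# Hypothetical sample data in the question's wide layout (only 2 of the 10 claim slots).
toto = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "C1Date": ["01-02-2015", "15-06-2019", "10-10-2018", "05-05-2020"],
    "C1Type": ["Proposer", "Proposer", "Other", "Other"],
    "C2Date": [None, None, "01-01-2020", "01-03-2021"],
    "C2Type": [None, None, "Proposer", "Other"],
    "PolDate": ["01-01-2021", "01-01-2021", "01-01-2021", "01-01-2022"],
})

N_CLAIMS = 2  # would be 10 for the real data

pol = pd.to_datetime(toto["PolDate"], format="%d-%m-%Y")
recent = pd.Series(False, index=toto.index)

for i in range(1, N_CLAIMS + 1):
    cdate = pd.to_datetime(toto[f"C{i}Date"], format="%d-%m-%Y")  # None -> NaT
    days = (pol - cdate).dt.days                                   # NaT -> NaN
    # A slot counts only if it is a Proposer claim at most 1095 days before PolDate.
    hit = (toto[f"C{i}Type"] == "Proposer") & (days <= 1095)
    recent = recent | hit

# 1 if any Proposer claim falls within 3 years of PolDate, else 0.
toto["CLMLAST3"] = recent.astype(int)
print(toto[["CustomerID", "CLMLAST3"]])
```

Empty slots produce NaN day counts, and NaN <= 1095 evaluates to False, so they never set the flag. Because the new column is assigned on toto itself rather than on a filtered slice, no SettingWithCopyWarning is raised.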

CLM = {}

for i in range(1 , 11):
    

    CLM[i] = toto.loc[toto[f'C{i}Type'] == 'Proposer']
    
    #can't get this date subtraction to work within the loop. But can do the subtraction outside of the loop.

    CLM[i]['diff'] = (CLM[i]['PolDate'].sub(CLM[i][f'C{i}Date'], 
    axis=0)).dt.days
   
    use_cols = ['CustomerID', f'C{i}Type', f'C{i}Date', 'PolDate']
    CLM[i] = CLM[i][use_cols]
    
    print("Claim:" + f'{i}' +" "+ str(CLM[i].shape))

Error:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

Also, I can't get the 3-year comparison to work:

if (CLM[1]['diff'] > 1095): 
    #1095 = (365 * 3):
    CLM[1]['CLMLAST3'] = 0
else:
    CLM[1]['CLMLAST3'] = 1

Error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In short, try this; it worked for me. (I don't know pandas very well, so efficiency is your call; I only removed the errors from the posted code.)

import pandas as pd

CLM = {}

for i in range(1, 11):

    CLM[i] = toto.loc[toto[f'C{i}Type'] == 'Proposer']

    # Parse both dd-mm-yyyy string columns as datetimes, then subtract them.
    CLM[i].loc[:, 'diff'] = (pd.to_datetime(CLM[i]['PolDate'], format='%d-%m-%Y')
                             - pd.to_datetime(CLM[i][f'C{i}Date'], format='%d-%m-%Y')).dt.days

    use_cols = ['CustomerID', f'C{i}Type', f'C{i}Date', 'PolDate', 'diff']
    CLM[i] = CLM[i][use_cols]

    print("Claim:" + f'{i}' + " " + str(CLM[i].shape))
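The SettingWithCopyWarning persists in a loop like this because CLM[i] is a filtered slice of toto, so pandas cannot tell whether adding a column to it was meant to modify toto. One common way around it is to call .copy() right after the filter, making each CLM[i] an independent frame. Below is a sketch of that variant, which also keeps 'diff' in the selected columns; the three-row, two-claim frame is a hypothetical stand-in for toto, and the real data would loop over range(1, 11).

```python
import pandas as pd

# Hypothetical two-claim stand-in for `toto`.
toto = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "C1Date": ["15-06-2019", "10-10-2018", "01-02-2015"],
    "C1Type": ["Proposer", "Other", "Proposer"],
    "C2Date": [None, "01-01-2020", None],
    "C2Type": [None, "Proposer", None],
    "PolDate": ["01-01-2021", "01-01-2021", "01-01-2021"],
})

CLM = {}
for i in range(1, 3):  # range(1, 11) for the real 10 claim slots
    # .copy() detaches the filtered rows from `toto`, so adding a column
    # below is not flagged as writing to a slice of another DataFrame.
    CLM[i] = toto.loc[toto[f"C{i}Type"] == "Proposer"].copy()

    pol = pd.to_datetime(CLM[i]["PolDate"], format="%d-%m-%Y")
    cdate = pd.to_datetime(CLM[i][f"C{i}Date"], format="%d-%m-%Y")
    CLM[i]["diff"] = (pol - cdate).dt.days  # whole days between claim and policy

    CLM[i] = CLM[i][["CustomerID", f"C{i}Type", f"C{i}Date", "PolDate", "diff"]]
    print(f"Claim:{i} {CLM[i].shape}")
```

With .copy() in place, plain CLM[i]["diff"] = ... is enough; the .loc[:, 'diff'] form is no longer needed to avoid the warning.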

Notes:

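  1. The warning "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead" still appears with this code. CLM[i]['diff'] is not the same as CLM[i].loc[:, 'diff'], but switching to .loc alone does not silence the warning here, because CLM[i] is itself a filtered slice of toto. See here: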

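  2. CLM[i]['PolDate'] is a "list" (a Series) of strings, and you cannot subtract one string from another, but you can subtract one pandas datetime object from another. So convert both columns to datetime objects first, then subtract.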
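A small demonstration of note 2 on hypothetical data: subtracting the raw string columns raises TypeError, while parsing them with pd.to_datetime and format='%d-%m-%Y' (matching the dd-mm-yyyy layout in the question) gives datetime columns whose difference has a .dt.days accessor. A missing claim date becomes NaT, so its day count comes out as NaN.

```python
import pandas as pd

# Hypothetical dd-mm-yyyy strings; None stands in for an empty claim slot.
pol_str = pd.Series(["01-01-2021", "01-01-2021", "01-01-2021"])
claim_str = pd.Series(["15-06-2019", "01-01-2020", None])

# Subtracting the strings directly fails: str has no "-" operator.
try:
    pol_str - claim_str
    subtraction_failed = False
except TypeError:
    subtraction_failed = True

# Parse first, then subtract; the result is a timedelta Series.
pol = pd.to_datetime(pol_str, format="%d-%m-%Y")
claim = pd.to_datetime(claim_str, format="%d-%m-%Y")  # None -> NaT
days = (pol - claim).dt.days                            # NaT -> NaN

print(subtraction_failed, days.tolist())
```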

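The same applies to your extra problem of comparing a Series with a single value; see this. In short, you most likely want "if (CLM[1]['diff'] > 1095).all():", which compares each value in the Series with 1095 and then checks whether every comparison is True, instead of comparing the whole Series with one value. Writing "CLM[1]['diff'].all() > 1095" would not work: .all() first collapses the Series into a single boolean, and a boolean is never greater than 1095.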
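A sketch of the options on a hypothetical diff Series: an if on the Series raises the ambiguity error, .all() and .any() reduce the element-wise comparison to a single answer, and assigning the comparison itself gives a per-row flag. The 0/1 meaning of CLMLAST3 used here (1 = within 3 years) is inferred from the question's if/else and may not be what was intended.

```python
import pandas as pd

# Hypothetical day differences for one claim slot (NaN = no claim date).
diff = pd.Series([566, 2161, 366, float("nan")])

# An `if` on a Series is ambiguous, so pandas raises ValueError.
try:
    if diff > 1095:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

mask = diff > 1095          # element-wise: one boolean per row
all_old = mask.all()        # True only if every claim is older than 1095 days
any_old = mask.any()        # True if at least one claim is older

# Reducing first and comparing afterwards: .all() gives one bool, never > 1095.
wrong = diff.all() > 1095

# Per-row flag: 1 if the claim is within 1095 days of PolDate, else 0 (NaN -> 0).
clmlast3 = (diff <= 1095).astype(int)
print(ambiguous, all_old, any_old, wrong, clmlast3.tolist())
```

For the original use case, the per-row form would be assigned as a column, e.g. CLM[1]['CLMLAST3'] = (CLM[1]['diff'] <= 1095).astype(int), ideally on a frame created with .copy() so the slice warning does not reappear.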