Calculate age in years between two dates where one column has a single date and the other column has a list of dates in Python

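I have two columns: one holds a single date, and the other holds a list of dates, which may be empty. I want to calculate the age difference in years between the date in the first column and every date in the second column.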

 column1             column2                     result

11-01-2014        [1975-12-16, 1980-07-24]      [39,34]
20-11-2014        [1985-08-05, 1983-03-16]      [29,31]
26-12-2016        [1966-05-22, 1958-04-13]      [50,58]
20-05-2016        [1981-04-21, 1983-12-25]      [35,33]
01-01-2016        [1993-10-29, 1966-06-27]      [23,50]

I have column1 and column2 as input, and I want the output to look like result.
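For reference, the input above can be rebuilt roughly like this. It is only a sketch: the question does not say how the dates are stored, so it assumes day-first strings in column1 and lists of ISO-formatted strings in column2.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: dates are plain strings, not yet datetimes.
df = pd.DataFrame({
    "column1": ["11-01-2014", "20-11-2014", "26-12-2016",
                "20-05-2016", "01-01-2016"],
    "column2": [
        ["1975-12-16", "1980-07-24"],
        ["1985-08-05", "1983-03-16"],
        ["1966-05-22", "1958-04-13"],
        ["1981-04-21", "1983-12-25"],
        ["1993-10-29", "1966-06-27"],
    ],
})
```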

Use DataFrame.explode to turn the lists in column2 into one row per date, so the years can be subtracted with Series.dt.year, and finally aggregate the rows back into lists:

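import pandas as pd

# Parse column1 as day-first dates (11-01-2014 is 11 January 2014)
df['column1'] = pd.to_datetime(df['column1'], dayfirst=True)
# One row per list element; the original index value is repeated
df1 = df.explode('column2')
df1['column2'] = pd.to_datetime(df1['column2'])

# Difference of the calendar years only
df1['result'] = df1['column1'].dt.year.sub(df1['column2'].dt.year)

# Collect the exploded rows back into lists per original row
df = df1.groupby([df1.index, 'column1']).agg(list).reset_index(level=1)
print(df)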
     column1                                     column2    result
0 2014-01-11  [1975-12-16 00:00:00, 1980-07-24 00:00:00]  [39, 34]
1 2014-11-20  [1985-08-05 00:00:00, 1983-03-16 00:00:00]  [29, 31]
2 2016-12-26  [1966-05-22 00:00:00, 1958-04-13 00:00:00]  [50, 58]
3 2016-05-20  [1981-04-21 00:00:00, 1983-12-25 00:00:00]  [35, 33]
4 2016-01-01  [1993-10-29 00:00:00, 1966-06-27 00:00:00]  [23, 50]
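The question mentions that some lists can be empty. With explode, an empty list becomes a single NaN row, so the aggregated list would contain a NaN rather than being empty. One possible way around that, sketched on a two-row sample of my own (the second row is made up to show the empty case), is to drop the missing values before aggregating and then put empty lists back:

```python
import pandas as pd

# Hypothetical sample: the second row has an empty list
df = pd.DataFrame({
    "column1": ["11-01-2014", "20-11-2014"],
    "column2": [["1975-12-16", "1980-07-24"], []],
})
df["column1"] = pd.to_datetime(df["column1"], dayfirst=True)

df1 = df.explode("column2")            # empty list -> one NaN row
df1["column2"] = pd.to_datetime(df1["column2"])
diff = df1["column1"].dt.year.sub(df1["column2"].dt.year)

# Drop the NaN placeholders, rebuild the lists per original row,
# then reindex so rows that lost every value get an empty list.
result = diff.dropna().astype(int).groupby(level=0).agg(list)
df["result"] = result.reindex(df.index).apply(
    lambda v: v if isinstance(v, list) else []
)
print(df["result"].tolist())
```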

Or use a lambda function that converts each list to datetimes:

df['column1'] = pd.to_datetime(df['column1'], dayfirst=True)

# For each row, convert the list to datetimes and subtract the years
f = lambda x: [x['column1'].year - y.year for y in pd.to_datetime(x['column2'])]
df['result'] = df.apply(f, axis=1)

print(df)
     column1                   column2    result
0 2014-01-11  [1975-12-16, 1980-07-24]  [39, 34]
1 2014-11-20  [1985-08-05, 1983-03-16]  [29, 31]
2 2016-12-26  [1966-05-22, 1958-04-13]  [50, 58]
3 2016-05-20  [1981-04-21, 1983-12-25]  [35, 33]
4 2016-01-01  [1993-10-29, 1966-06-27]  [23, 50]
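Note that both solutions subtract calendar years only, which matches the expected result in the question. If the age in completed years is wanted instead (someone born 1975-12-16 has not yet turned 39 on 2014-01-11), a variation along these lines might work. The helper name completed_years and the age column are my own, not from the question:

```python
import pandas as pd

def completed_years(ref, born):
    """Whole years elapsed from `born` to `ref` (hypothetical helper)."""
    # Subtract one when `ref` falls before the birthday in its year
    before_birthday = (ref.month, ref.day) < (born.month, born.day)
    return ref.year - born.year - before_birthday

# Two rows taken from the question's sample data
df = pd.DataFrame({
    "column1": ["11-01-2014", "20-05-2016"],
    "column2": [["1975-12-16", "1980-07-24"],
                ["1981-04-21", "1983-12-25"]],
})
df["column1"] = pd.to_datetime(df["column1"], dayfirst=True)

df["age"] = df.apply(
    lambda row: [completed_years(row["column1"], d)
                 for d in pd.to_datetime(row["column2"])],
    axis=1,
)
print(df["age"].tolist())
```

For these two rows it gives [38, 33] and [35, 32], against [39, 34] and [35, 33] from the plain year subtraction.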