Calculate the age in years between two dates, where one column has a single date and the other column has a list of dates, in Python
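I have two columns: one holds a single date, and the other holds a list of dates, which may also be an empty list. I want to calculate the age difference between the date in the first column and every date in the second column.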
column1 column2 result
11-01-2014 [1975-12-16, 1980-07-24] [39,34]
20-11-2014 [1985-08-05, 1983-03-16] [29,31]
26-12-2016 [1966-05-22, 1958-04-13] [50,58]
20-05-2016 [1981-04-21, 1983-12-25] [35,33]
01-01-2016 [1993-10-29, 1966-06-27] [23,50]
I have column1 and column2 as input, and I want the output in the form of result.
Use DataFrame.explode to turn each list into one row per date, so the years can be subtracted with Series.dt.year, and finally aggregate the rows back into lists:
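import pandas as pd

# column1 is day-first (DD-MM-YYYY)
df['column1'] = pd.to_datetime(df['column1'], dayfirst=True)
# one row per date in column2; the original index is repeated
df1 = df.explode('column2')
df1['column2'] = pd.to_datetime(df1['column2'])
# difference of calendar years
df1['result'] = df1['column1'].dt.year.sub(df1['column2'].dt.year)
# collect the rows of each original index back into lists
df = df1.groupby([df1.index, 'column1']).agg(list).reset_index(level=1)
print(df)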
column1 column2 result
0 2014-01-11 [1975-12-16 00:00:00, 1980-07-24 00:00:00] [39, 34]
1 2014-11-20 [1985-08-05 00:00:00, 1983-03-16 00:00:00] [29, 31]
2 2016-12-26 [1966-05-22 00:00:00, 1958-04-13 00:00:00] [50, 58]
3 2016-05-20 [1981-04-21 00:00:00, 1983-12-25 00:00:00] [35, 33]
4 2016-01-01 [1993-10-29 00:00:00, 1966-06-27 00:00:00] [23, 50]
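For anyone who wants to run the explode approach end to end, here is a minimal sketch. It assumes column2 holds lists of ISO-formatted date strings (the question does not say how the dates are stored), and the names `long` and `result` are just illustrative. Grouping on the index level alone keeps only the year differences instead of rebuilding every column.

```python
import pandas as pd

# Sample data from the question; storing the dates as strings is an assumption.
df = pd.DataFrame({
    "column1": ["11-01-2014", "20-11-2014", "26-12-2016", "20-05-2016", "01-01-2016"],
    "column2": [
        ["1975-12-16", "1980-07-24"],
        ["1985-08-05", "1983-03-16"],
        ["1966-05-22", "1958-04-13"],
        ["1981-04-21", "1983-12-25"],
        ["1993-10-29", "1966-06-27"],
    ],
})

# column1 is day-first, so state the format explicitly.
df["column1"] = pd.to_datetime(df["column1"], format="%d-%m-%Y")

# One row per date in column2; the original index is repeated.
long = df.explode("column2")
long["column2"] = pd.to_datetime(long["column2"])
long["result"] = long["column1"].dt.year - long["column2"].dt.year

# Gather the differences back into one list per original row.
result = long.groupby(level=0)["result"].agg(list)
print(result.tolist())
```

Note that this subtracts calendar years only, which is what the expected result column in the question shows; it does not check whether the birthday has already passed.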
Or use a lambda function that converts each list to datetimes:
df['column1'] = pd.to_datetime(df['column1'], dayfirst=True)
# for each row, subtract the year of every date in column2 from the year of column1
f = lambda x: [x['column1'].year - y.year for y in pd.to_datetime(x['column2'])]
df['result'] = df.apply(f, axis=1)
print(df)
column1 column2 result
0 2014-01-11 [1975-12-16, 1980-07-24] [39, 34]
1 2014-11-20 [1985-08-05, 1983-03-16] [29, 31]
2 2016-12-26 [1966-05-22, 1958-04-13] [50, 58]
3 2016-05-20 [1981-04-21, 1983-12-25] [35, 33]
4 2016-01-01 [1993-10-29, 1966-06-27] [23, 50]
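The question also mentions empty lists. A row-wise list comprehension should handle them without extra work, since an empty list simply yields an empty result. The sketch below shows that, plus an optional birthday-aware age. The helper name `year_diffs` and its `exact` flag are hypothetical, and the dates are again assumed to be ISO strings.

```python
import pandas as pd

def year_diffs(ref, dates, exact=False):
    """Year difference between ref and each date in dates (hypothetical helper).

    With exact=True, subtract one year when ref falls before the
    month/day of the other date, i.e. the birthday has not passed yet.
    """
    out = []
    for d in pd.to_datetime(list(dates)):
        diff = ref.year - d.year
        if exact and (ref.month, ref.day) < (d.month, d.day):
            diff -= 1
        out.append(diff)
    return out

df = pd.DataFrame({
    "column1": pd.to_datetime(["11-01-2014", "01-01-2016"], format="%d-%m-%Y"),
    "column2": [["1975-12-16", "1980-07-24"], []],  # second row is an empty list
})

pairs = list(zip(df["column1"], df["column2"]))
df["result"] = [year_diffs(ref, dates) for ref, dates in pairs]
df["age"] = [year_diffs(ref, dates, exact=True) for ref, dates in pairs]
print(df[["result", "age"]])
```

Here result stays a plain difference of calendar years, matching the expected output in the question, while age is only an extra for the case where completed years are wanted.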