Convert a column of datetime and strings to period in pandas

I would like to know whether a column with mixed types (datetime and strings) can be converted to a PeriodIndex (for example, monthly).

I have the following DataFrame:

booking_date          ...      credit           debit
None                  ...      10185.00     -10185.00
2017-01-01 00:00:00   ...       1796.00          0.00
2018-07-01 00:00:00   ...       7423.20        -11.54
2017-04-01 00:00:00   ...       1704.00          0.00
2017-12-01 00:00:00   ...       1938.60      -1938.60
2018-12-01 00:00:00   ...       1403.47       -102.01
2018-01-01 00:00:00   ...       2028.00        -76.38
2019-01-01 00:00:00   ...        800.00       -256.98
Total                 ...      10185.00     -10185.00

I am trying to apply PeriodIndex to booking_date:

df['booking_date'] = pd.PeriodIndex(df['booking_date'].values, freq='M')

However, I get the following error:

pandas._libs.tslibs.parsing.DateParseError: Unknown datetime string format, unable to parse: TOTAL

Is there a way around this?

Thanks!

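In this case you probably want to filter out the Total row (and possibly the None row as well, depending on what it represents). The total can presumably (I obviously don't know the exact data) be obtained by summing all the credit/debit values, and you can recompute it at any time, so you lose no information by filtering it out. To keep the data clean you probably don't want it in there anyway. To get the sum, use df["credit"].sum()

Filter out the total on booking_date like this: df = df[df["booking_date"] != "Total"]

More on filtering: https://cmdlinetips.com/2018/02/how-to-subset-pandas-dataframe-based-on-values-of-a-column/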
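As a minimal sketch of the filtering approach described above: the DataFrame below is a cut-down stand-in for the one in the question (a few of its rows, with the elided columns left out), and the names is_total, clean and credit_total are only illustrative. It also drops the None row, which may or may not be what you want.

```python
import pandas as pd

# Stand-in for the question's DataFrame (assumption: only these columns and rows).
df = pd.DataFrame({
    "booking_date": [None, pd.Timestamp("2017-01-01"), pd.Timestamp("2018-07-01"), "Total"],
    "credit": [10185.00, 1796.00, 7423.20, 10185.00],
    "debit": [-10185.00, 0.00, -11.54, -10185.00],
})

# Keep only rows whose booking_date is a real date: drop "Total" and missing values.
is_total = df["booking_date"] == "Total"
clean = df[~is_total & df["booking_date"].notna()].copy()

# Every remaining value is a Timestamp, so the conversion has nothing to choke on.
clean["booking_date"] = pd.to_datetime(clean["booking_date"]).dt.to_period("M")

# The total can be recomputed whenever it is needed.
credit_total = clean["credit"].sum()

print(clean)
print(credit_total)
```

After this, booking_date has a proper period dtype rather than object, so period-based operations work on the whole column.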

If you need real Period values, they cannot be mixed with strings, so anything that cannot be parsed becomes NaT:

df['booking_date'] = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('M')
print(df)
  booking_date  ...    credit     debit
0          NaT  ...  10185.00 -10185.00
1      2017-01  ...   1796.00      0.00
2      2018-07  ...   7423.20    -11.54
3      2017-04  ...   1704.00      0.00
4      2017-12  ...   1938.60  -1938.60
5      2018-12  ...   1403.47   -102.01
6      2018-01  ...   2028.00    -76.38
7      2019-01  ...    800.00   -256.98
8          NaT  ...  10185.00 -10185.00
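To make the coercion step easier to try in isolation, here is a small self-contained sketch. The Series s is a hypothetical miniature of the booking_date column, not the real data.

```python
import pandas as pd

# Hypothetical miniature of booking_date: a missing value, two dates, and a label.
s = pd.Series(
    [None, pd.Timestamp("2017-01-01"), pd.Timestamp("2018-07-01"), "Total"],
    name="booking_date",
)

# errors="coerce" turns anything unparseable into NaT instead of raising.
periods = pd.to_datetime(s, errors="coerce").dt.to_period("M")

print(periods.dtype)         # period[M]
print(periods.isna().sum())  # 2 (the None and the "Total" entries)
```

The NaT rows can then be located with periods.isna() if they need separate handling.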

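It is possible, though, if you accept an object column that mixes Period values with the original None and string: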

orig = df['booking_date']  # keep the original mixed values

df['booking_date'] = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('M')

# Restore the original value wherever parsing produced NaT
df.loc[df['booking_date'].isna(), 'booking_date'] = orig
print(df)
  booking_date  ...    credit     debit
0         None  ...  10185.00 -10185.00
1      2017-01  ...   1796.00      0.00
2      2018-07  ...   7423.20    -11.54
3      2017-04  ...   1704.00      0.00
4      2017-12  ...   1938.60  -1938.60
5      2018-12  ...   1403.47   -102.01
6      2018-01  ...   2028.00    -76.38
7      2019-01  ...    800.00   -256.98
8        Total  ...  10185.00 -10185.00

print(df['booking_date'].apply(type))
0                             <class 'NoneType'>
1    <class 'pandas._libs.tslibs.period.Period'>
2    <class 'pandas._libs.tslibs.period.Period'>
3    <class 'pandas._libs.tslibs.period.Period'>
4    <class 'pandas._libs.tslibs.period.Period'>
5    <class 'pandas._libs.tslibs.period.Period'>
6    <class 'pandas._libs.tslibs.period.Period'>
7    <class 'pandas._libs.tslibs.period.Period'>
8                                  <class 'str'>
Name: booking_date, dtype: object

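Alternatively, starting again from the original column, the same result can be obtained with numpy.where:

import numpy as np

new = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('M')

df['booking_date'] = np.where(new.isna(), df['booking_date'], new)
print(df)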
  booking_date  ...    credit     debit
0         None  ...  10185.00 -10185.00
1      2017-01  ...   1796.00      0.00
2      2018-07  ...   7423.20    -11.54
3      2017-04  ...   1704.00      0.00
4      2017-12  ...   1938.60  -1938.60
5      2018-12  ...   1403.47   -102.01
6      2018-01  ...   2028.00    -76.38
7      2019-01  ...    800.00   -256.98
8        Total  ...  10185.00 -10185.00

print(df['booking_date'].apply(type))
0                             <class 'NoneType'>
1    <class 'pandas._libs.tslibs.period.Period'>
2    <class 'pandas._libs.tslibs.period.Period'>
3    <class 'pandas._libs.tslibs.period.Period'>
4    <class 'pandas._libs.tslibs.period.Period'>
5    <class 'pandas._libs.tslibs.period.Period'>
6    <class 'pandas._libs.tslibs.period.Period'>
7    <class 'pandas._libs.tslibs.period.Period'>
8                                  <class 'str'>
Name: booking_date, dtype: object
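One consequence of keeping the mixed values is that the column is object dtype, so period logic has to be applied element by element. The following is a rough sketch on hypothetical data (s, mixed, is_period and years are illustrative names, not part of the original code):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the column: a missing value, one date, and a label.
s = pd.Series([None, pd.Timestamp("2017-01-01"), "Total"], name="booking_date")

new = pd.to_datetime(s, errors="coerce").dt.to_period("M")

# Take the original value where parsing failed, the Period otherwise.
mixed = pd.Series(np.where(new.isna(), s, new), index=s.index, name=s.name)

# Select the genuine Period entries before using Period attributes.
is_period = mixed.map(lambda v: isinstance(v, pd.Period))
years = mixed[is_period].map(lambda p: p.year)

print(mixed.dtype)     # object
print(years.tolist())  # [2017]
```

If most of the downstream work is period arithmetic or grouping, the pure period column with NaT is usually the easier one to work with.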