Convert a column of datetimes and strings to periods in pandas
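I would like to know whether a column with mixed types (datetimes and strings) can be converted to a PeriodIndex, for example with monthly frequency.
I have the following DataFrame: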
booking_date ... credit debit
None ... 10185.00 -10185.00
2017-01-01 00:00:00 ... 1796.00 0.00
2018-07-01 00:00:00 ... 7423.20 -11.54
2017-04-01 00:00:00 ... 1704.00 0.00
2017-12-01 00:00:00 ... 1938.60 -1938.60
2018-12-01 00:00:00 ... 1403.47 -102.01
2018-01-01 00:00:00 ... 2028.00 -76.38
2019-01-01 00:00:00 ... 800.00 -256.98
Total ... 10185.00 -10185.00
I am trying to apply PeriodIndex to booking_date:
df['booking_date'] = pd.PeriodIndex(df['booking_date'].values, freq='M')
However, I get the following error:
pandas._libs.tslibs.parsing.DateParseError: Unknown datetime string format, unable to parse: TOTAL
Is there a way around this?
Thanks!
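In this case you probably want to filter out the Total row (and possibly the None row too, depending on what it represents).
The total can presumably be recomputed by summing all credit/debit values (I obviously don't know the exact data), and you can do that again at any time, so filtering out the Total row loses no information. You probably don't want it in there anyway, to keep the table's dimensions clean. To get the sum, use df["credit"].sum()
Filter out the Total row on booking_date like this: df = df[df["booking_date"] != "Total"]
More on filtering: https://cmdlinetips.com/2018/02/how-to-subset-pandas-dataframe-based-on-values-of-a-column/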
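The filtering idea can be sketched end to end as follows. This is only an illustration under assumptions: the small frame below is a hypothetical stand-in for the question's data (a few rows, same column names), and instead of comparing against the literal "Total" it keeps every row whose booking_date is an actual timestamp, which also drops the None row.

```python
import pandas as pd

# Hypothetical sample mirroring part of the question's frame.
df = pd.DataFrame({
    "booking_date": [None, pd.Timestamp("2017-01-01"), pd.Timestamp("2018-07-01"), "Total"],
    "credit": [10185.00, 1796.00, 7423.20, 10185.00],
    "debit": [-10185.00, 0.00, -11.54, -10185.00],
})

# Keep only the rows whose booking_date is a real timestamp
# (this removes both the None row and the "Total" row).
is_date = df["booking_date"].apply(lambda v: isinstance(v, pd.Timestamp))
dated = df[is_date].copy()

# With the non-date values gone, the monthly conversion goes through.
dated["booking_date"] = pd.to_datetime(dated["booking_date"]).dt.to_period("M")

# The total can be recomputed whenever it is needed.
total_credit = dated["credit"].sum()
print(dated)
print(total_credit)
```

The column now has a proper period dtype, and the summary row is no longer needed because the total is one call away.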
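If you need Periods, they simply cannot be mixed with strings. With errors='coerce', values that cannot be parsed become NaT:
df['booking_date'] = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('M')
print(df)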
booking_date ... credit debit
0 NaT ... 10185.00 -10185.00
1 2017-01 ... 1796.00 0.00
2 2018-07 ... 7423.20 -11.54
3 2017-04 ... 1704.00 0.00
4 2017-12 ... 1938.60 -1938.60
5 2018-12 ... 1403.47 -102.01
6 2018-01 ... 2028.00 -76.38
7 2019-01 ... 800.00 -256.98
8 NaT ... 10185.00 -10185.00
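Because errors='coerce' silently turns anything unparseable into NaT, it can help to look at which original values were lost before overwriting the column. A minimal sketch, assuming a small hypothetical series in place of the real booking_date column:

```python
import pandas as pd

# Hypothetical stand-in for the booking_date column.
raw = pd.Series([None, pd.Timestamp("2017-01-01"), "Total"], name="booking_date")

# Unparseable values become NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors="coerce")

# Values that were present in the input but could not be parsed.
failed = raw[parsed.isna() & raw.notna()]
print(failed)

# Monthly periods; NaT stays NaT.
periods = parsed.dt.to_period("M")
print(periods)
```

Here only "Total" shows up in failed; the None entry was missing to begin with, so it is not counted as a parsing failure.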
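But it is possible to keep the original values wherever the conversion fails:
orig = df['booking_date']
df['booking_date'] = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('M')
# put the original values back where parsing produced NaT;
# this turns the column into object dtype, and recent pandas versions may warn about it
df.loc[df['booking_date'].isna(), 'booking_date'] = orig
print(df)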
booking_date ... credit debit
0 None ... 10185.00 -10185.00
1 2017-01 ... 1796.00 0.00
2 2018-07 ... 7423.20 -11.54
3 2017-04 ... 1704.00 0.00
4 2017-12 ... 1938.60 -1938.60
5 2018-12 ... 1403.47 -102.01
6 2018-01 ... 2028.00 -76.38
7 2019-01 ... 800.00 -256.98
8 Total ... 10185.00 -10185.00
print(df['booking_date'].apply(type))
0 <class 'NoneType'>
1 <class 'pandas._libs.tslibs.period.Period'>
2 <class 'pandas._libs.tslibs.period.Period'>
3 <class 'pandas._libs.tslibs.period.Period'>
4 <class 'pandas._libs.tslibs.period.Period'>
5 <class 'pandas._libs.tslibs.period.Period'>
6 <class 'pandas._libs.tslibs.period.Period'>
7 <class 'pandas._libs.tslibs.period.Period'>
8 <class 'str'>
Name: booking_date, dtype: object
# alternative with numpy.where, starting again from the original mixed column
import numpy as np
new = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('M')
df['booking_date'] = np.where(new.isna(), df['booking_date'], new)
print(df)
booking_date ... credit debit
0 None ... 10185.00 -10185.00
1 2017-01 ... 1796.00 0.00
2 2018-07 ... 7423.20 -11.54
3 2017-04 ... 1704.00 0.00
4 2017-12 ... 1938.60 -1938.60
5 2018-12 ... 1403.47 -102.01
6 2018-01 ... 2028.00 -76.38
7 2019-01 ... 800.00 -256.98
8 Total ... 10185.00 -10185.00
print(df['booking_date'].apply(type))
0 <class 'NoneType'>
1 <class 'pandas._libs.tslibs.period.Period'>
2 <class 'pandas._libs.tslibs.period.Period'>
3 <class 'pandas._libs.tslibs.period.Period'>
4 <class 'pandas._libs.tslibs.period.Period'>
5 <class 'pandas._libs.tslibs.period.Period'>
6 <class 'pandas._libs.tslibs.period.Period'>
7 <class 'pandas._libs.tslibs.period.Period'>
8 <class 'str'>
Name: booking_date, dtype: object
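As the dtype: object line shows, a column that mixes Period, None and str values is no longer a period column, so period-aware operations do not apply to it directly. One possible way around that, sketched here with hypothetical data and a hypothetical extra column named booking_month, is to leave booking_date untouched and store the periods separately:

```python
import pandas as pd

# Hypothetical sample; the values are made up for illustration.
df = pd.DataFrame({
    "booking_date": [None, pd.Timestamp("2017-01-01"), pd.Timestamp("2017-01-15"), "Total"],
    "credit": [10185.00, 1796.00, 204.00, 10185.00],
})

# booking_month is a hypothetical helper column; booking_date keeps its mixed values.
df["booking_month"] = pd.to_datetime(df["booking_date"], errors="coerce").dt.to_period("M")

# The helper column has a real period dtype, so grouping by month works
# once the rows without a date are dropped.
monthly = df.dropna(subset=["booking_month"]).groupby("booking_month")["credit"].sum()
print(df.dtypes)
print(monthly)
```

This keeps the "Total" and None labels visible for display while the calculations run on the typed column.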