Convert a string column to period in pandas preserving the string

I would like to know whether it is possible to convert a string column to a PeriodIndex (for example, by year) while preserving the string suffix.

I have the following DataFrame:

company            date                          ...       revenue         taxes
Facebook           2017-01-01 00:00:00 Total     ...       1796.00          0.00
Facebook           2018-07-01 00:00:00 Total     ...       7423.20        -11.54
Facebook Total     -                             ...       1704.00          0.00
Google             2017-12-01 00:00:00 Total     ...       1938.60      -1938.60
Google             2018-12-01 00:00:00 Total     ...       1403.47       -102.01
Google             2018-01-01 00:00:00 Total     ...       2028.00        -76.38
Google Total       -                             ...        800.00       -256.98

I am trying to apply PeriodIndex to the date column:

df['date'] = pd.PeriodIndex(df['date'].values, freq='Y')

However, nothing happens, because pandas cannot convert strings that carry the extra text. I cannot remove the word Total from my DataFrame.

This is the result I expect to achieve:

company            date                          ...       revenue         taxes
Facebook           2017 Total                    ...       1796.00          0.00
Facebook           2018 Total                    ...       7423.20        -11.54
Facebook Total     -                             ...       1704.00          0.00
Google             2017 Total                    ...       1938.60      -1938.60
Google             2018 Total                    ...       1403.47       -102.01
Google             2018 Total                    ...       2028.00        -76.38
Google Total       -                             ...        800.00       -256.98

Is there any way to solve this?

Thanks!

Suppose we have a dummy DataFrame similar to yours:

import re

import pandas as pd

dictionary = {'company': ['Facebook', 'Facebook', 'Facebook_Total', 'Google', 'Google_Total'],
              'date': ['2019-09-14 09:00:08.279000+09:00 Total',
                       '2020-09-14 09:00:08.279000+09:00 Total',
                       '-',
                       '2021-09-14 09:00:08.279000+09:00 Total',
                       '-'],
              'revenue': [10, 20, 30, 40, 50]}
df = pd.DataFrame(dictionary)

I used the re (regex) module to remove the trailing Total from the date column, as follows:

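substring = ' Total'
for i in range(len(df)):
    # Strip the suffix only from rows that contain it (case-insensitive search).
    # df.loc avoids chained assignment, which may silently fail to update df.
    if re.search(substring, df.loc[i, 'date'], flags=re.IGNORECASE):
        df.loc[i, 'date'] = df.loc[i, 'date'].replace(substring, '')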

Then I used pd.PeriodIndex as follows:

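# Convert every real date to a yearly Period; leave the '-' placeholders alone.
for i in range(len(df)):
    if df.loc[i, 'date'] != '-':
        df.loc[i, 'date'] = pd.PeriodIndex(pd.Series(df.loc[i, 'date']), freq='Y')[0]

# Turn each Period back into a string and re-attach the ' Total' suffix.
for i in range(len(df)):
    if df.loc[i, 'date'] != '-':
        df.loc[i, 'date'] = str(df.loc[i, 'date']) + ' Total'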

The code above returns:

Out[1]: 
          company        date  revenue
0        Facebook  2019 Total       10
1        Facebook  2020 Total       20
2  Facebook_Total           -       30
3          Google  2021 Total       40
4    Google_Total           -       50
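
If the row-by-row loops become slow on a larger frame, the same idea can probably be expressed without explicit loops. The following is only a sketch under assumptions, not a drop-in replacement: it assumes the suffix is always the literal word Total separated by whitespace, that placeholder rows hold something unparseable such as '-', and it uses the date format from the question rather than the timezone-aware strings in the dummy frame. The column names `stamp` and `suffix` are made up for illustration.

```python
import pandas as pd

# Sample data modeled on the question (values shortened for illustration).
df = pd.DataFrame({
    'company': ['Facebook', 'Facebook', 'Facebook Total', 'Google', 'Google Total'],
    'date': ['2017-01-01 00:00:00 Total', '2018-07-01 00:00:00 Total', '-',
             '2017-12-01 00:00:00 Total', '-'],
    'revenue': [1796.00, 7423.20, 1704.00, 1938.60, 800.00],
})

# Split each value into a timestamp part and an optional " Total" suffix.
parts = df['date'].str.extract(r'^(?P<stamp>.*?)(?P<suffix>\s+Total)?$')

# Parse the timestamp part; unparseable values such as '-' become NaT.
stamps = pd.to_datetime(parts['stamp'], errors='coerce')

# Convert to yearly periods and render them as strings such as '2017'.
years = stamps.dt.to_period('Y').astype(str)

# Rewrite only the rows that held a real date, re-attaching the suffix.
mask = stamps.notna()
df.loc[mask, 'date'] = years[mask] + parts.loc[mask, 'suffix'].fillna('')

print(df)
```

Note that the column still holds strings afterwards; a true period dtype cannot carry the Total suffix, so the period is used only as an intermediate step to obtain the year.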