Convert a string column to period in pandas preserving the string

Question

I'd like to know whether it is possible to convert a string column to a PeriodIndex (for example, by year) while preserving the string suffix.

I have the following DataFrame:

company            date                          ...       revenue         taxes
Facebook           2017-01-01 00:00:00 Total     ...       1796.00          0.00
Facebook           2018-07-01 00:00:00 Total     ...       7423.20        -11.54
Facebook Total     -                             ...       1704.00          0.00
Google             2017-12-01 00:00:00 Total     ...       1938.60      -1938.60
Google             2018-12-01 00:00:00 Total     ...       1403.47       -102.01
Google             2018-01-01 00:00:00 Total     ...       2028.00        -76.38
Google Total       -                             ...        800.00       -256.98

I am trying to apply PeriodIndex to the date column:

df['date'] = pd.PeriodIndex(df['date'].values, freq='Y')

However, nothing happens, because pandas cannot convert these strings. I can't remove the word Total from my DataFrame.

This is the result I expect to get:

company            date                          ...       revenue         taxes
Facebook           2017 Total                    ...       1796.00          0.00
Facebook           2018 Total                    ...       7423.20        -11.54
Facebook Total     -                             ...       1704.00          0.00
Google             2017 Total                    ...       1938.60      -1938.60
Google             2018 Total                    ...       1403.47       -102.01
Google             2018 Total                    ...       2028.00        -76.38
Google Total       -                             ...        800.00       -256.98

Is there any way to solve this?

Thanks!

Answer 1

Assume a dummy DataFrame similar to yours:

import re
import pandas as pd

dictionary = {'company': ['Facebook', 'Facebook', 'Facebook_Total', 'Google', 'Google_Total'],
              'date': ['2019-09-14 09:00:08.279000+09:00 Total',
                       '2020-09-14 09:00:08.279000+09:00 Total',
                       '-',
                       '2021-09-14 09:00:08.279000+09:00 Total',
                       '-'],
              'revenue': [10, 20, 30, 40, 50]}
df = pd.DataFrame(dictionary)

I used the re module to remove the Total that follows the timestamp in the date column, as follows:

substring = ' Total'
for i in range(len(df)):
    # Remove the suffix (case-insensitively) from rows that contain it
    if re.search(substring, df.loc[i, 'date'], flags=re.IGNORECASE):
        df.loc[i, 'date'] = re.sub(substring, '', df.loc[i, 'date'], flags=re.IGNORECASE)

Then I used pd.PeriodIndex as follows, re-attaching the suffix afterwards:

for i in range(len(df)):
    # Leave the '-' placeholder rows untouched
    if df.loc[i, 'date'] != '-':
        # Convert the timestamp string to a yearly Period
        year = pd.PeriodIndex(pd.Series(df.loc[i, 'date']), freq='Y')[0]
        # Store the year as a string with the suffix re-attached
        df.loc[i, 'date'] = str(year) + ' Total'

The code above returns:

Out[1]: 
          company        date  revenue
0        Facebook  2019 Total       10
1        Facebook  2020 Total       20
2  Facebook_Total           -       30
3          Google  2021 Total       40
4    Google_Total           -       50
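
The same idea can probably be written without explicit loops. The following is only a sketch, assuming every real date starts with a `YYYY-MM-DD HH:MM:SS` timestamp (as in the question's data) and that anything after it is the suffix to keep. The column names `stamp` and `suffix` are made up for illustration.

```python
import pandas as pd

# Sample data modelled on the question (values shortened for illustration)
df = pd.DataFrame({
    "company": ["Facebook", "Facebook", "Facebook Total", "Google", "Google Total"],
    "date": ["2017-01-01 00:00:00 Total", "2018-07-01 00:00:00 Total", "-",
             "2017-12-01 00:00:00 Total", "-"],
    "revenue": [1796.00, 7423.20, 1704.00, 1938.60, 800.00],
})

# Split each value into a timestamp part and whatever trails it (the suffix).
# 'stamp' and 'suffix' are hypothetical names chosen for this sketch.
pattern = r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?P<suffix>.*)$"
parts = df["date"].str.extract(pattern)

# Rows that do not match the pattern (such as '-') are NaN and stay untouched
matched = parts["stamp"].notna()

# Parse the timestamps, reduce them to yearly periods, and format as strings
years = pd.to_datetime(parts.loc[matched, "stamp"]).dt.to_period("Y").astype(str)

# Re-attach the original suffix and write back only the matched rows
df.loc[matched, "date"] = years + parts.loc[matched, "suffix"]

print(df)
```

Note that the date column still holds plain strings afterwards. A column cannot be a true period dtype and carry a text suffix at the same time, so the period is used only as an intermediate step to get the year.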
