Convert a string column to period in pandas preserving the string
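I would like to know whether a string column can be converted to a PeriodIndex (for example, by year) while preserving the string suffix.
I have the following DataFrame: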
company         date                       ...  revenue     taxes
Facebook        2017-01-01 00:00:00 Total  ...  1796.00      0.00
Facebook        2018-07-01 00:00:00 Total  ...  7423.20    -11.54
Facebook Total  -                          ...  1704.00      0.00
Google          2017-12-01 00:00:00 Total  ...  1938.60  -1938.60
Google          2018-12-01 00:00:00 Total  ...  1403.47   -102.01
Google          2018-01-01 00:00:00 Total  ...  2028.00    -76.38
Google Total    -                          ...   800.00   -256.98
I am trying to apply PeriodIndex to the date column:
df['date'] = pd.PeriodIndex(df['date'].values, freq='Y')
However, this does not work, because pandas cannot convert these strings. I cannot remove the word Total from my DataFrame.
This is what I expect to get:
company         date        ...  revenue     taxes
Facebook        2017 Total  ...  1796.00      0.00
Facebook        2018 Total  ...  7423.20    -11.54
Facebook Total  -           ...  1704.00      0.00
Google          2017 Total  ...  1938.60  -1938.60
Google          2018 Total  ...  1403.47   -102.01
Google          2018 Total  ...  2028.00    -76.38
Google Total    -           ...   800.00   -256.98
Is there any way to solve this?
Thanks!
Assume a dummy DataFrame similar to yours:
import re

import pandas as pd

# Dummy data: timestamp strings carry a ' Total' suffix; subtotal rows use '-'
dictionary = {'company': ['Facebook', 'Facebook', 'Facebook_Total', 'Google', 'Google_Total'],
              'date': ['2019-09-14 09:00:08.279000+09:00 Total',
                       '2020-09-14 09:00:08.279000+09:00 Total',
                       '-',
                       '2021-09-14 09:00:08.279000+09:00 Total',
                       '-'],
              'revenue': [10, 20, 30, 40, 50]}
df = pd.DataFrame(dictionary)
I used the re (regex) module to remove the Total suffix from the date column as follows:
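substring = ' Total'
for i in range(len(df)):
    # Strip the suffix only from rows that contain it (case-insensitive)
    if re.search(substring, df.loc[i, 'date'], flags=re.IGNORECASE):
        # .loc avoids chained assignment, which may not write back to df
        df.loc[i, 'date'] = re.sub(substring, '', df.loc[i, 'date'], flags=re.IGNORECASE)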
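The same suffix removal can also be written without an explicit loop. The snippet below is a minimal sketch, not part of the original answer: it assumes the suffix always sits at the end of the string, and the names `dates` and `stripped` are illustrative.

```python
import pandas as pd

# Illustrative values in the same shape as the dummy 'date' column.
dates = pd.Series([
    "2019-09-14 09:00:08.279000+09:00 Total",
    "-",
    "2021-09-14 09:00:08.279000+09:00 Total",
])

# Remove a trailing " Total" in any letter case; rows without it are unchanged.
stripped = dates.str.replace(r"\s+Total$", "", case=False, regex=True)

print(stripped.tolist())
```

Anchoring the pattern with `$` keeps the replacement from touching a "Total" that might appear elsewhere in a value.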
Then I used pd.PeriodIndex as follows:
for i in range(len(df)):
    # Leave the '-' placeholder rows untouched
    if df.loc[i, 'date'] != '-':
        # Parse the timestamp string and keep only its yearly period (e.g. 2019)
        year = pd.PeriodIndex([df.loc[i, 'date']], freq='Y')[0]
        # Write the year back as text with the ' Total' suffix re-attached
        df.loc[i, 'date'] = str(year) + ' Total'
The code above returns:
Out[1]:
          company        date  revenue
0        Facebook  2019 Total       10
1        Facebook  2020 Total       20
2  Facebook_Total           -       30
3          Google  2021 Total       40
4    Google_Total           -       50
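As an alternative to row-by-row loops, the whole column could be handled with vectorized operations. This is a minimal sketch rather than part of the original answer. It assumes the timestamps look like the ones in the question (`YYYY-MM-DD HH:MM:SS` followed by ` Total`), and the variable names `suffix`, `stamps`, `parsed`, and `is_date` are illustrative.

```python
import pandas as pd

# Sample rows modelled on the question's DataFrame (a subset of its columns).
df = pd.DataFrame({
    "company": ["Facebook", "Facebook", "Facebook Total", "Google", "Google Total"],
    "date": ["2017-01-01 00:00:00 Total",
             "2018-07-01 00:00:00 Total",
             "-",
             "2017-12-01 00:00:00 Total",
             "-"],
    "revenue": [1796.00, 7423.20, 1704.00, 1938.60, 800.00],
})

suffix = " Total"

# Drop the suffix so the remaining text can be parsed as a timestamp.
stamps = df["date"].str.replace(suffix, "", regex=False)

# Rows that are not timestamps (such as '-') become NaT instead of raising.
parsed = pd.to_datetime(stamps, format="%Y-%m-%d %H:%M:%S", errors="coerce")
is_date = parsed.notna()

# Convert the parsed rows to yearly periods, render them as text,
# and re-attach the suffix; the '-' rows are left as they are.
df.loc[is_date, "date"] = parsed[is_date].dt.to_period("Y").astype(str) + suffix

print(df)
```

Because the column has to hold both the year and the word Total, the result stays a column of strings; the yearly period is only used as an intermediate step to obtain the year.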