How do I modify multiple ranges of rows for a newly created column in my dataframe?
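I'm having trouble modifying multiple ranges of row values for a newly created column in my dataframe, and would appreciate some help. I apologize if this has been asked before; if so, I'd be grateful if you could point me in the right direction. I'm new to Python coding.
So, I've imported a bunch of data from a P&L spreadsheet covering multiple companies that roll up into one total, and I'm cleaning it up before making the modification described above for further analysis: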
import pandas as pd
from tabulate import tabulate
dftabulate = lambda df:tabulate(df,headers='keys',tablefmt='psql')
CleanCols = [5,7,8,9,10,11,12,13,14,15,17]
SummaryRows = [0,39,44,58,62,79,87]
VA = pd.read_excel('Columnar BU P&L.xlsx', sheet_name = 'Variance by Co')
VA = VA[98:197]
VA = VA.iloc[:,CleanCols]
VA.columns = ['Expense','A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'All Companies']
VA = VA.dropna(axis = 0, how = 'any')
VA = VA.reset_index(drop = True)
VAtcols = VA.columns.drop('Expense')
VA[VAtcols] = VA[VAtcols].astype(int)
VM = VA.iloc[SummaryRows]
VA['Exp Category'] = 'NA'
print(dftabulate(VA.head()))
The output looks like this:
Expense A ... All Companies Exp Category
0 General and Administrative Expenses (G&A) -4550 ... 133886 NA
1 Communications -17 ... -4793 NA
2 Fuel - Travel 0 ... -1274 NA
3 Mileage & Auto 449 ... -251 NA
4 Travel 0 ... 1187 NA
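What I want to achieve is to set the newly created Exp Category column to different values based on row index. For example, I want to change rows 1:12 to Travel & Entertainment, and so on. When I use the code below to create this categorization, it doesn't raise an error, but it also doesn't change the NA value assigned to the column, and I can't figure out what I'm doing wrong.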
VA[1:12]['Exp Category'] = 'Travel & Entertainment'
VA[13:18]['Exp Category'] = 'Office Supplies & Expenses'
VA[19:24]['Exp Category'] = 'Professional Fees'
VA[25:28]['Exp Category'] = 'Fees & Assessments'
VA[29:30]['Exp Category'] = 'IT Expense'
VA[31:32]['Exp Category'] = 'Bad Debt Expense'
VA[33:38]['Exp Category'] = 'Misc Expense'
VA[40:43]['Exp Category'] = 'Marketing Expenses'
VA[45:57]['Exp Category'] = 'Payroll & Related Expenses'
VA[59:61]['Exp Category'] = 'Utilities Expenses'
VA[63:69]['Exp Category'] = 'Equip Maint & Rental Expenses'
VA[70:78]['Exp Category'] = 'Mill Expenses'
VA[80:82]['Exp Category'] = 'Taxes'
VA[83:86]['Exp Category'] = 'Insurance'
VA[88:89]['Exp Category'] = 'Incentive Compensation'
VA[89:90]['Exp Category'] = 'Strategic Initiative'
The output still looks like this, with a warning message about returning a view versus a copy:
Expense A ... All Companies Exp Category
0 General and Administrative Expenses (G&A) -4550 ... 133886 NA
1 Communications -17 ... -4793 NA
2 Fuel - Travel 0 ... -1274 NA
3 Mileage & Auto 449 ... -251 NA
4 Travel 0 ... 1187 NA
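I tried to read up on the "SettingWithCopyWarning" message, but despite reading the material I still don't understand how to fix it. Any feedback is much appreciated!
Thanks in advance!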
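Using DataFrame.loc should do what you need. VA[1:12]['Exp Category'] = ... is chained indexing: it first builds the slice VA[1:12] and then assigns into that intermediate object, which may be a copy, so VA itself is not guaranteed to change. With .loc, the row selection, the column selection, and the assignment happen in a single step on the original DataFrame:
Example DataFrame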
import pandas as pd
d = {'a': [1, 2, 3, 4],
'b': ['NA', 'NA', 'NA', 'NA']}
df = pd.DataFrame(data = d)
df
a b
0 1 NA
1 2 NA
2 3 NA
3 4 NA
Applying .loc to the DataFrame
df.loc[0:2, 'b'] = 'Test'
df
a b
0 1 Test
1 2 Test
2 3 Test
3 4 NA
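Note that the slice 0:2 above changed three rows, not two. As a minimal sketch of that difference, the snippet below contrasts label-based .loc (end label included) with position-based .iloc (end position excluded). The names by_label and by_position are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["NA"] * 4})

# .loc slices by label and includes the end label: rows 0, 1 and 2 are set.
by_label = df.copy()
by_label.loc[0:2, "b"] = "Test"

# .iloc slices by position and excludes the end: only rows 0 and 1 are set.
by_position = df.copy()
by_position.iloc[0:2, by_position.columns.get_loc("b")] = "Test"

print(by_label["b"].tolist())     # ['Test', 'Test', 'Test', 'NA']
print(by_position["b"].tolist())  # ['Test', 'Test', 'NA', 'NA']
```

So when translating a plain slice such as VA[1:12] to .loc, check whether the last row is meant to be included.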
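Using your data as an example
# .loc slices by label and includes both endpoints.
# This treats "row 1" as the first row (label 0); to set labels 1 through 12, use VA.loc[1:12, ...]
VA.loc[0:11, 'Exp Category'] = 'Travel & Entertainment'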
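Since the question assigns many ranges, the same .loc call can be driven from a mapping instead of being repeated line by line. This is only a sketch: the 30-row VA below is a stand-in for the real data, category_ranges is a hypothetical name, and the endpoints assume the ranges in the question are meant as inclusive row labels, which you should verify against your own rows.

```python
import pandas as pd

# Stand-in for the cleaned VA frame: 30 rows with a default 0..29 index.
VA = pd.DataFrame({"Expense": [f"item {i}" for i in range(30)]})
VA["Exp Category"] = "NA"

# Hypothetical mapping of (first label, last label) -> category.
# Both endpoints are included because .loc slices by label.
category_ranges = {
    (1, 12): "Travel & Entertainment",
    (13, 18): "Office Supplies & Expenses",
    (19, 24): "Professional Fees",
}

for (first, last), category in category_ranges.items():
    VA.loc[first:last, "Exp Category"] = category

print(VA["Exp Category"].value_counts())
```

Rows that fall outside every range, such as the summary row at label 0, keep the 'NA' placeholder.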
Hope this helps!