Issue with TypeError when looping through columns in a list of data frames
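I have a list of data frames, dataframes, a list of names, keeplist, and a dictionary, HydroCap.
I am trying to loop through the columns of each data frame, selected by the names in keeplist, and apply a where function inside the column loop to replace any value in a column with the dictionary value for that column's key if it is greater than the dictionary value. The problem is that I run into TypeError: '>=' not supported between instances of 'str' and 'int', and I'm not sure how to solve it.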
keeplist = ['BOUND','GCOUL','CHIEF','ROCKY','WANAP','PRIRA','LGRAN','LMONU','ICEHA','MCNAR','DALLE']
HydroCap = {'BOUND':55000,'GCOUL':280000,'CHIEF':219000,'ROCKY':220000,'WANAP':161000,'PRIRA':162000,'LGRAN':130000,'LMONU':130000,'ICEHA':106000,'MCNAR':232000,'DALLE':375000}
for i in dataframes:
    for c in i[keeplist]:
        c = np.where(c >= HydroCap[c], HydroCap[c], c)
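Any nudge in the right direction would be greatly appreciated. I think the problem is that it expects an index value such as HydroCap[1] rather than HydroCap[c], but that is only a hunch.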
The first 7 columns of dataframes[0]:
   Week  Month  Day  Year         BOUND          GCOUL          CHIEF  \
0     1      8    5  1979  44999.896673  161241.036388  166497.578098
1     2      8   12  1979  15309.259762   58219.122747   63413.204052
2     3      8   19  1979  15316.965781   56072.024363   60606.956215
3     4      8   26  1979  14371.269016   58574.003087   63311.569888
import pandas as pd
import numpy as np
# Since I don't have all of the dataframes, I just use the sample you shared
df = pd.read_csv('dataframe.tsv', sep = "\t")
# Note, I've changed some values so you can see something actually happens
keeplist = ['BOUND','GCOUL','CHIEF']
HydroCap = {'BOUND':5500,'GCOUL':280000,'CHIEF':21900}
# The inside of the loop has been changed to accomplish the actual goal
# First, there are now two variables inside the loop: col, and c
# col is the column
# c represents a single element in that column at a time
# The code operates over a column at a time,
# using a list comprehension to cycle over each element
# and replace the full column with the new values at once
for col in df[keeplist]:
    df[col] = [np.where(c >= HydroCap[col], HydroCap[col], c) for c in df[col]]
This produces:
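df
| | Week | Month | Day | Year | BOUND | GCOUL | CHIEF |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 8 | 5 | 1979 | 5500.0 | 161241.036388 | 21900.0 |
| 1 | 2 | 8 | 12 | 1979 | 5500.0 | 58219.122747 | 21900.0 |
| 2 | 3 | 8 | 19 | 1979 | 5500.0 | 56072.024363 | 21900.0 |
| 3 | 4 | 8 | 26 | 1979 | 5500.0 | 58574.003087 | 21900.0 |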
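As an aside, the same cap can probably be applied to a whole column in one call instead of element by element. The following is a minimal sketch, not the method used above: it swaps in `Series.clip(upper=...)` for the `np.where` comprehension, assumes the kept columns are numeric, and builds a small stand-in frame from the sample values in the question because `dataframe.tsv` is not available here. The loop variable `frame` is a name chosen for illustration.

```python
import pandas as pd

# Stand-in for the sample data; values copied from the question
df = pd.DataFrame({
    "Week": [1, 2, 3, 4],
    "BOUND": [44999.896673, 15309.259762, 15316.965781, 14371.269016],
    "GCOUL": [161241.036388, 58219.122747, 56072.024363, 58574.003087],
    "CHIEF": [166497.578098, 63413.204052, 60606.956215, 63311.569888],
})
dataframes = [df]  # the question works with a list of such frames

keeplist = ["BOUND", "GCOUL", "CHIEF"]
HydroCap = {"BOUND": 5500, "GCOUL": 280000, "CHIEF": 21900}

for frame in dataframes:
    for col in keeplist:
        # clip(upper=cap) replaces every value above cap with cap
        frame[col] = frame[col].clip(upper=HydroCap[col])

print(dataframes[0])
```

Looping over `keeplist` directly, rather than over `frame[keeplist]`, makes it explicit that `col` is a column name.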
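To replace elements in a data frame, you need to either replace a whole column at once or reassign the value of a cell specified by its row and column coordinates. Iterating over a data frame yields its column names, so in your original loop c is a string such as 'BOUND', and comparing it with an integer is what raises the TypeError. Reassigning the c variable in your original code (assuming it represented the cell value you had in mind, rather than the column name) does not change anything in the data frame.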
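To make these points concrete, here is a small sketch on a made-up two-row frame (the values are illustrative, not taken from the question's data). It shows what iteration yields, that rebinding a loop variable has no effect on the frame, and the two ways of assigning that do. `Series.where` and `.loc` are used here as one possible choice for each option.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({"BOUND": [44999.9, 60000.0], "GCOUL": [161241.0, 300000.0]})
HydroCap = {"BOUND": 55000, "GCOUL": 280000}

# Iterating a DataFrame yields column labels, not columns or cell values
names = [c for c in df]
print(names)  # ['BOUND', 'GCOUL']

# So `c >= HydroCap[c]` compares a str with an int and raises TypeError
try:
    "BOUND" >= HydroCap["BOUND"]
except TypeError as err:
    print(err)

# Rebinding a loop variable leaves the frame untouched
for c in df["BOUND"]:
    c = min(c, HydroCap["BOUND"])
untouched = df["BOUND"].tolist()

# Option 1: replace a whole column at once
# (where keeps values for which the condition holds and substitutes the cap elsewhere)
df["BOUND"] = df["BOUND"].where(df["BOUND"] < HydroCap["BOUND"], HydroCap["BOUND"])

# Option 2: assign to a single cell by row label and column label
df.loc[1, "GCOUL"] = HydroCap["GCOUL"]

print(df)
```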