Update pandas dataframe based on slice?
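I have already seen similar questions, but I could not find an answer that fits my use case.
Consider this code, where my starting table contains the columns "channel" and "value":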
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
TESTDATA = StringIO("""channel,value
A,10
A,11
A,12
A,13
B,20
B,22
B,24
B,26
B,28
C,100
C,105
C,110
C,115
C,120
C,125
C,130
""")
mychans = ["A", "B", "C"]
df = pd.read_csv(TESTDATA)
df.insert(2, "value_rel", df["value"] - df["value"][0])
print("Starting:")
print(df.head())
for tchan in mychans:
    this_ch_data = df[df["channel"]==tchan]
    df.loc[this_ch_data.index, "value_rel"] = this_ch_data["value"] - this_ch_data["value"][0]
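In the end, I want to get the same table with an additional "value_rel" column that shows each value relative to the first value in that channel (slice); that is: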
A, 10, 0
A, 11, 1
A, 12, 2
A, 13, 3
B, 20, 0
B, 22, 2
B, 24, 4
B, 26, 6
B, 28, 8
C,100, 0
C,105, 5
...
If I simply use this_ch_data["value_rel"] = this_ch_data["value"] - this_ch_data["value"][0] inside the for loop, I get "A value is trying to be set on a copy of a slice from a DataFrame", which makes sense.
However, when I run the code above, I get:
$ python3 test1.py
Starting:
channel value value_rel
0 A 10 0
1 A 11 1
2 A 12 2
3 A 13 3
4 B 20 10
Traceback (most recent call last):
File "C:/msys64/mingw64/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\msys64\tmp\test1.py", line 38, in <module>
df.loc[this_ch_data.index, "value_rel"] = this_ch_data["value"] - this_ch_data["value"][0]
File "C:/msys64/mingw64/lib/python3.9/site-packages/pandas/core/series.py", line 942, in __getitem__
return self._get_value(key)
File "C:/msys64/mingw64/lib/python3.9/site-packages/pandas/core/series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "C:/msys64/mingw64/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
So, how can I update this DataFrame based on a calculation made on a (copied) slice of the same DataFrame?
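You need to use iloc, because the index label 0 does not exist in every tchan slice: this_ch_data["value"][0] looks up the row labelled 0, which only channel A contains.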
for tchan in mychans:
    this_ch_data = df[df["channel"]==tchan]
    df.loc[this_ch_data.index, "value_rel"] = \
        this_ch_data["value"] - this_ch_data["value"].iloc[0]
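To make the difference between label-based and position-based lookup concrete, here is a minimal sketch on a hypothetical four-row frame (not the question's full data). It uses .loc for the label lookup, which behaves like the plain [0] in the question when the index holds integers.

```python
import pandas as pd

# Hypothetical miniature of the question's data.
df = pd.DataFrame({"channel": ["A", "A", "B", "B"], "value": [10, 11, 20, 22]})

# Filtering keeps the original row labels, so channel B starts at label 2.
b_values = df.loc[df["channel"] == "B", "value"]
print(list(b_values.index))

# Label-based lookup: no row in this slice carries the label 0.
try:
    b_values.loc[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup: the first row of the slice, whatever its label is.
first_b = b_values.iloc[0]
print(label_lookup_failed, first_b)
```

Resetting the index of each slice would also avoid the KeyError, but the result would then no longer align with the rows of df when it is assigned back.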
That said, this is a good use case for groupby.transform with first, so no loop is needed. You can do:
df['value_rel'] = df['value'] - df.groupby('channel')['value'].transform('first')
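As a rough illustration of why this works: transform('first') returns a Series with the same index as df, in which each row holds the first value of its own channel, so the subtraction lines up row by row. The small frame below is made up for the example.

```python
import pandas as pd

# Hypothetical shortened version of the question's data.
df = pd.DataFrame({
    "channel": ["A", "A", "A", "B", "B", "C", "C"],
    "value": [10, 11, 12, 20, 22, 100, 105],
})

# One entry per original row: the first value seen in that row's channel.
first_per_row = df.groupby("channel")["value"].transform("first")

# Same index on both sides, so this is a plain element-wise subtraction.
df["value_rel"] = df["value"] - first_per_row
print(df)
```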
You can apply a custom function that subtracts each channel group's first value from the group.
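# group_keys=False keeps the original row index so the result aligns with df (needed on pandas 2.x).
df['value_rel'] = df.groupby('channel', group_keys=False)['value'].apply(lambda x: x - x.iloc[0])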
print(df)
# Output:
channel value value_rel
0 A 10 0
1 A 11 1
2 A 12 2
3 A 13 3
4 B 20 0
5 B 22 2
6 B 24 4
7 B 26 6
8 B 28 8
9 C 100 0
10 C 105 5
11 C 110 10
12 C 115 15
13 C 120 20
14 C 125 25
15 C 130 30
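A custom function can also be passed to transform instead of apply. transform always returns a result indexed like the input, so it can be assigned straight back as a column. This is a sketch on hypothetical data, not the answer's original code.

```python
import pandas as pd

# Hypothetical shortened version of the question's data.
df = pd.DataFrame({
    "channel": ["A", "A", "B", "B", "B"],
    "value": [10, 13, 20, 24, 28],
})

# The lambda receives one channel's values at a time as a Series.
grouped = df.groupby("channel")["value"]
df["value_rel"] = grouped.transform(lambda x: x - x.iloc[0])
print(df)
```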