How to replace values in a dataframe column based on the column name, the values in another column, and an index range?
I have a dataframe with these characteristics (the index holds float values):
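import pandas as pd
d = {'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'B': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'C': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'D': ['one', 'one', 'one', 'one', 'one', 'two', 'two', 'two', 'two', 'two']}
# float index, matching the printed dataframe below
idx = [50.0, 50.2, 50.4, 50.6, 50.8, 51.0, 51.2, 51.4, 51.6, 51.8]
df = pd.DataFrame(data=d, index=idx)
df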
       A   B   C    D
50.0   1   1   1  one
50.2   2   2   2  one
50.4   3   3   3  one
50.6   4   4   4  one
50.8   5   5   5  one
51.0   6   6   6  two
51.2   7   7   7  two
51.4   8   8   8  two
51.6   9   9   9  two
51.8  10  10  10  two
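And a list of offsets with these values (they are floats too):
offsets = [[0.4, 0.6, 0.8], [0.2, 0.4, 0.6]]
I need to go over columns A, B and C of my dataframe, select rows by the categorical values in column D, and replace the last values of A, B and C with nan according to the offsets in my list, so that the result is this dataframe: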
        A    B    C    D
50.0    1    1    1  one
50.2    2    2  nan  one
50.4    3  nan  nan  one
50.6  nan  nan  nan  one
50.8  nan  nan  nan  one
51.0    6    6    6  two
51.2    7    7    7  two
51.4    8    8  nan  two
51.6    9  nan  nan  two
51.8  nan  nan  nan  two
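Each offset says how far up from the bottom of a group the values must be set to nan. For example, offsets[0][0]=0.4, so for column A when D == 'one' the two bottom values must be set to nan (rows 4 and 3, counting from zero; 50.8-0.4 = 50.4, and the value at 50.4 stays unchanged). For A when D == 'two', offsets[1][0]=0.2, so the bottom value must be set to nan (row 9; 51.8-0.2 = 51.6, and the value at 51.6 stays unchanged). offsets[0][1]=0.6, so for column B when D == 'one' the three bottom values must be set to nan (rows 4, 3 and 2; 50.8-0.6 = 50.2, and the value at 50.2 stays unchanged). For B when D == 'two', offsets[1][1]=0.4, so the two bottom values must be set to nan (rows 9 and 8; 51.8-0.4 = 51.4, and the value at 51.4 stays unchanged). The same applies to column C.
Any idea how to do this? A quick note: I want to replace these values in the dataframe itself, without creating a new one.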
One approach is to use apply to set the last values of each column to NaN:
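import pandas as pd
# toy data
df = pd.DataFrame(data={'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        'B': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        'C': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        'D': ['one', 'one', 'one', 'one', 'one', 'two', 'two', 'two', 'two', 'two']})
# cast to float up front so that NaN can be stored in these columns
df = df.astype({"A": float, "B": float, "C": float})
# number of trailing rows to blank out, per column
offsets = [2, 3, 4]
offset_lookup = dict(zip(df.columns[:3], offsets))
def funny_shift(x, ofs=None):
    """Set the last `offset` values of each column listed in `ofs` to NaN."""
    x = x.copy()  # avoid mutating the group object passed in by apply
    for column, offset in ofs.items():
        x.loc[x.index[-offset:], column] = None
    return x
# group_keys=False keeps the original index, so the assignment aligns row by row
df.loc[:, ["A", "B", "C"]] = df.groupby("D", group_keys=False).apply(funny_shift, ofs=offset_lookup)
print(df)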
Output
     A    B    C    D
0  1.0  1.0  1.0  one
1  2.0  2.0  NaN  one
2  3.0  NaN  NaN  one
3  NaN  NaN  NaN  one
4  NaN  NaN  NaN  one
5  6.0  6.0  6.0  two
6  7.0  7.0  NaN  two
7  8.0  NaN  NaN  two
8  NaN  NaN  NaN  two
9  NaN  NaN  NaN  two
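The same result can also be sketched without a custom function. This is only a minimal sketch and not part of the answer's code: it assumes the rows of each group are already in the order you want, it numbers the rows from the bottom of each group with `groupby(...).cumcount(ascending=False)`, and it casts A, B and C to float first so NaN can be stored. The name `from_bottom` is purely illustrative.

```python
import numpy as np
import pandas as pd

# toy data, same values as above
df = pd.DataFrame({'A': list(range(1, 11)),
                   'B': list(range(1, 11)),
                   'C': list(range(1, 11)),
                   'D': ['one'] * 5 + ['two'] * 5})
# assumption: float columns are acceptable, so NaN can be stored
df = df.astype({'A': float, 'B': float, 'C': float})

# number of trailing rows to blank out, per column
offsets = {'A': 2, 'B': 3, 'C': 4}

# 0 for the last row of each group, 1 for the row before it, and so on
from_bottom = df.groupby('D').cumcount(ascending=False)

for column, offset in offsets.items():
    # boolean mask: rows that lie within `offset` rows of the group's end
    df.loc[from_bottom < offset, column] = np.nan

print(df)
```

Because the assignment goes through `df.loc` with a boolean mask, the existing dataframe is modified directly rather than rebuilt from the result of apply.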
Update
If each group has its own list of offsets, you can do this:
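# rebuild df from the toy data above first, so earlier NaNs do not carry over
offsets = [[2, 3, 4], [1, 2, 3]]
# one {column: offset} dict per group, consumed in group order ('one', then 'two');
# this relies on apply calling the function exactly once per group
offset_lookup = (dict(zip(df.columns[:3], offset)) for offset in offsets)
def funny_shift(x, ofs=None):
    """Set the last `offset` values of each column to NaN, using the next dict from `ofs`."""
    current = next(ofs)
    x = x.copy()  # avoid mutating the group object passed in by apply
    for column, offset in current.items():
        x.loc[x.index[-offset:], column] = None
    return x
df.loc[:, ["A", "B", "C"]] = df.groupby("D", group_keys=False).apply(funny_shift, ofs=offset_lookup)
print(df)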
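The code above works with row counts, while the question gives float offsets on a float index. A minimal sketch of that variant follows, under a few assumptions that are not in the original post: the i-th inner list of `offsets` belongs to the i-th group label (`'one'`, then `'two'`), the reference point is the largest index value in each group, and a small tolerance `eps` is used because float subtraction such as 50.8 - 0.4 is not exact. The names `offsets_by_group`, `in_group`, `last` and `eps` are illustrative only.

```python
import numpy as np
import pandas as pd

idx = [50.0, 50.2, 50.4, 50.6, 50.8, 51.0, 51.2, 51.4, 51.6, 51.8]
df = pd.DataFrame({'A': list(range(1, 11)),
                   'B': list(range(1, 11)),
                   'C': list(range(1, 11)),
                   'D': ['one'] * 5 + ['two'] * 5},
                  index=idx)
# assumption: float columns are acceptable, so NaN can be stored
df = df.astype({'A': float, 'B': float, 'C': float})

offsets = [[0.4, 0.6, 0.8], [0.2, 0.4, 0.6]]
# assumption: inner lists are ordered like the group labels
offsets_by_group = dict(zip(['one', 'two'], offsets))
eps = 1e-9  # tolerance for float comparisons on the index

index_values = df.index.to_numpy()
for label, group_offsets in offsets_by_group.items():
    in_group = (df['D'] == label).to_numpy()
    last = index_values[in_group].max()  # bottom index value of this group
    for column, offset in zip(['A', 'B', 'C'], group_offsets):
        # rows strictly below (last - offset) within this group become NaN
        mask = in_group & (index_values > last - offset + eps)
        df.loc[mask, column] = np.nan

print(df)
```

The row at `last - offset` itself is left untouched, which matches the "50.8-0.4 = 50.4, and 50.4 stays unchanged" rule in the question, and the values are written into the existing dataframe through `df.loc`.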