Where to add a MultiIndex level in pandas?
I have a MultiIndex DataFrame; here is a minimal reproducible sample.
import pandas as pd
import numpy as np
import string
index = pd.MultiIndex.from_product(
([2020], [1, 2, 3, 4]), names=['year', 'q']
)
columns = pd.MultiIndex.from_product(
(['Items1', 'Items2', 'Items3'], ['new', 'old']),
names=['Items', 'type']
)
np.random.seed(123)
data = list(np.random.choice(list(string.ascii_lowercase), (4,6)))
Ldata = pd.DataFrame(data, index=index, columns=columns)
Ldata
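I want to highlight certain values in the DataFrame, and this is how I tried it. But I don't know where to put multiindex level=1. I tried, but I get an error. Where can I add the level?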
def highlight_cells(x):
    c = 'background-color: white'
    c1 = 'background-color: red'
    c2 = 'background-color: blue'
    c3 = 'background-color: green'
    k1 = Ldata['new'].str.contains("a", na=False)
    k2 = Ldata['new'].str.contains("b", na=False)
    k3 = Ldata['new'].str.contains("c", na=False)
    colordata = pd.DataFrame(c, index=x.index, columns=x.columns)
    colordata.loc[k1, 'new'] = c1
    colordata.loc[k2, 'new'] = c2
    colordata.loc[k3, 'new'] = c3
    return colordata

end = Ldata.style.apply(highlight_cells, axis=None)
end
Thanks in advance!
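The question's `Ldata['new']` fails because plain `[]` selection looks up 'new' on the outermost column level ('Items'). 'new' only exists on the inner 'type' level. `DataFrame.xs` is one pandas method that takes a `level` argument. A minimal sketch of selecting the 'new' columns by level, rebuilding the question's sample frame, might look like this:

```python
import string

import numpy as np
import pandas as pd

# Rebuild the sample frame from the question
np.random.seed(123)
index = pd.MultiIndex.from_product(([2020], [1, 2, 3, 4]), names=['year', 'q'])
columns = pd.MultiIndex.from_product(
    (['Items1', 'Items2', 'Items3'], ['new', 'old']), names=['Items', 'type']
)
Ldata = pd.DataFrame(
    np.random.choice(list(string.ascii_lowercase), (4, 6)),
    index=index, columns=columns,
)

# Ldata['new'] searches the 'Items' level, so it raises KeyError
try:
    Ldata['new']
except KeyError:
    print("'new' is not a label on the 'Items' level")

# xs selects by label on a named level; level=1 would work the same way
new_only = Ldata.xs('new', axis=1, level='type')
print(new_only)
```

By default `xs` drops the 'type' level, so `new_only` has the plain columns `Items1`, `Items2`, `Items3`.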
Perhaps the simplest approach is to stack the DataFrame before applying the function, like this:
Ldata = Ldata.stack('Items')
Then you can keep the rest of your code as is.
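Below is a self-contained sketch of the stacking approach. It assumes the question's sample frame. The question's function is condensed into a loop over hypothetical (letter, color) pairs, which is a stylistic change and not part of the answer. The function also reads from its argument `x` instead of the global `Ldata`. It returns the CSS frame directly, so you can inspect the result without rendering anything.

```python
import string

import numpy as np
import pandas as pd

np.random.seed(123)
index = pd.MultiIndex.from_product(([2020], [1, 2, 3, 4]), names=['year', 'q'])
columns = pd.MultiIndex.from_product(
    (['Items1', 'Items2', 'Items3'], ['new', 'old']), names=['Items', 'type']
)
Ldata = pd.DataFrame(
    np.random.choice(list(string.ascii_lowercase), (4, 6)),
    index=index, columns=columns,
)

# Move the 'Items' column level into the row index; columns become 'new'/'old'
stacked = Ldata.stack('Items')

def highlight_cells(x):
    colordata = pd.DataFrame('background-color: white',
                             index=x.index, columns=x.columns)
    # Later pairs overwrite earlier ones, as in the question's k1/k2/k3 order
    for letter, color in [('a', 'red'), ('b', 'blue'), ('c', 'green')]:
        mask = x['new'].str.contains(letter, na=False)
        colordata.loc[mask, 'new'] = f'background-color: {color}'
    return colordata

colors = highlight_cells(stacked)
print(colors)
```

To render it, pass the same function to `stacked.style.apply(highlight_cells, axis=None)`, as in the question. Rendering with Styler requires the optional jinja2 dependency.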
It looks like you are trying to modify each "new" column across the "Items" column level.
You can use IndexSlice to access the inner levels of the MultiIndex:
idx = pd.IndexSlice

def highlight_cells(x):
    c = 'background-color: white'
    c1 = 'background-color: red'
    c2 = 'background-color: blue'
    c3 = 'background-color: green'
    # Boolean frame covering only the (<Items>, 'new') columns
    def ldata_contains(value):
        return (
            x
            .loc[:, idx[:, 'new']]
            .apply(lambda y: y.str.contains(value, na=False))
        )
    k1 = ldata_contains("a")
    k2 = ldata_contains("b")
    k3 = ldata_contains("c")
    colordata = pd.DataFrame(c, index=x.index, columns=x.columns)
    # Replace the white default only where each boolean frame is True
    colordata.loc[:, idx[:, 'new']] = (
        colordata.loc[:, idx[:, 'new']]
        .mask(k1, c1)
        .mask(k2, c2)
        .mask(k3, c3)
    )
    return colordata

end = Ldata.style.apply(highlight_cells, axis=None)
end
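To see what the IndexSlice selection returns on its own, here is a minimal sketch on the question's sample frame. The `xs(..., drop_level=False)` call is included only as an equivalent way to pick the same columns while keeping both column levels.

```python
import string

import numpy as np
import pandas as pd

np.random.seed(123)
index = pd.MultiIndex.from_product(([2020], [1, 2, 3, 4]), names=['year', 'q'])
columns = pd.MultiIndex.from_product(
    (['Items1', 'Items2', 'Items3'], ['new', 'old']), names=['Items', 'type']
)
Ldata = pd.DataFrame(
    np.random.choice(list(string.ascii_lowercase), (4, 6)),
    index=index, columns=columns,
)

idx = pd.IndexSlice
# ':' keeps every 'Items' label; 'new' picks that label on the inner 'type' level
new_cols = Ldata.loc[:, idx[:, 'new']]

# Same columns via xs, keeping the 'type' level in the result
new_cols_xs = Ldata.xs('new', axis=1, level='type', drop_level=False)
print(new_cols.columns.tolist())
```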
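The most direct modification is to use the subset argument of Styler.apply in conjunction with pd.IndexSlice. This reduces the DataFrame-level styling to Series-level styling, applied to each column of the MultiIndex that matches the subset: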
def highlight_cells(x):
    c = 'background-color: white'
    c1 = 'background-color: red'
    c2 = 'background-color: blue'
    c3 = 'background-color: green'
    k1 = x.str.contains("a", na=False)
    k2 = x.str.contains("b", na=False)
    k3 = x.str.contains("c", na=False)
    # Build Series for _column_ level Styles
    color_data = pd.Series(c, index=x.index)
    color_data[k1] = c1
    color_data[k2] = c2
    color_data[k3] = c3
    return color_data

idx = pd.IndexSlice
end = Ldata.style.apply(highlight_cells, subset=idx[:, idx[:, 'new']])
end
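As far as I understand it, `Styler.apply` with the default `axis=0` takes `Ldata.loc[subset]` and calls the function once per column, each time with a Series. The sketch below imitates that step with plain `DataFrame.apply`, so the per-column behaviour can be checked without rendering. It approximates what Styler does internally and is not its actual code.

```python
import string

import numpy as np
import pandas as pd

np.random.seed(123)
index = pd.MultiIndex.from_product(([2020], [1, 2, 3, 4]), names=['year', 'q'])
columns = pd.MultiIndex.from_product(
    (['Items1', 'Items2', 'Items3'], ['new', 'old']), names=['Items', 'type']
)
Ldata = pd.DataFrame(
    np.random.choice(list(string.ascii_lowercase), (4, 6)),
    index=index, columns=columns,
)

def highlight_cells(x):
    # x is a single column (a Series) from the subset
    color_data = pd.Series('background-color: white', index=x.index)
    color_data[x.str.contains("a", na=False)] = 'background-color: red'
    color_data[x.str.contains("b", na=False)] = 'background-color: blue'
    color_data[x.str.contains("c", na=False)] = 'background-color: green'
    return color_data

idx = pd.IndexSlice
# Same subset as passed to Styler.apply: all rows, every (<Items>, 'new') column
subset = Ldata.loc[idx[:, idx[:, 'new']]]
# One call per column, mirroring Styler.apply(..., axis=0)
styles = subset.apply(highlight_cells, axis=0)
print(styles)
```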
Of course, this can also be done without the separate variables:
def highlight_cells(x):
    color_data = pd.Series('background-color: white', index=x.index)
    color_data[x.str.contains("a", na=False)] = 'background-color: red'
    color_data[x.str.contains("b", na=False)] = 'background-color: blue'
    color_data[x.str.contains("c", na=False)] = 'background-color: green'
    return color_data

idx = pd.IndexSlice
end = Ldata.style.apply(highlight_cells, subset=idx[:, idx[:, 'new']])
end
Or use np.select to avoid building an indexed structure at all:
def highlight_cells(x):
    return np.select(
        [x.str.contains("a", na=False),
         x.str.contains("b", na=False),
         x.str.contains("c", na=False)],
        ['background-color: red',
         'background-color: blue',
         'background-color: green'],
        default='background-color: white'
    )

idx = pd.IndexSlice
end = Ldata.style.apply(highlight_cells, subset=idx[:, idx[:, 'new']])
end
With the sample data, all three options produce the same styled table.
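The `np.select` variant and the assignment variant handle overlapping matches differently. `np.select` returns the choice for the first condition that is True, while successive Series assignments let the last matching condition win. The function names `highlight_select` and `highlight_assign` below are hypothetical, introduced only for this comparison on a small hand-made Series.

```python
import numpy as np
import pandas as pd

def highlight_select(x):
    # np.select: the FIRST True condition decides the color
    return np.select(
        [x.str.contains("a", na=False),
         x.str.contains("b", na=False),
         x.str.contains("c", na=False)],
        ['background-color: red',
         'background-color: blue',
         'background-color: green'],
        default='background-color: white',
    )

def highlight_assign(x):
    # Later assignments overwrite earlier ones: the LAST match decides
    color_data = pd.Series('background-color: white', index=x.index)
    color_data[x.str.contains("a", na=False)] = 'background-color: red'
    color_data[x.str.contains("b", na=False)] = 'background-color: blue'
    color_data[x.str.contains("c", na=False)] = 'background-color: green'
    return color_data

sample = pd.Series(['a', 'b', 'c', 'z', 'abc'])
by_select = list(highlight_select(sample))
by_assign = list(highlight_assign(sample))
print(by_select[-1], '|', by_assign[-1])
```

In the question's data every cell is a single letter, so no cell matches more than one pattern and the variants agree. They only diverge for values such as 'abc'.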