Drawing hierarchy of items with scores
I am new to Python and to programming in general, so any help is much appreciated. I am trying to do the following with Python (preferably with Pandas):
Data
I have a table that looks like this:
+--------------------+-------+
| Parent:Child       | Score |
+--------------------+-------+
| Life:Work          | 3     |
| Work:Money         | 2     |
| Work:Hours         | 3     |
| Work:Hours         | 2     |
| Life:Health        | 2     |
| Money:Life savings | 3     |
+--------------------+-------+
Desired output
- Table:
Identify the unique items and compute their average scores:
If an item has multiple entries, its scores are averaged
+--------------+---------------+
| Unique item  | Average score |
+--------------+---------------+
| Life         | NaN           |
| Work         | 3             |
| Health       | 2             |
| Money        | 2             |
| Hours        | 2.5           |
| Life savings | 3             |
+--------------+---------------+
- Tree:
a) Determine the hierarchy of the items:
Life > Work > Money > Life savings
Life > Work > Hours
Life > Health
b) Draw the tree with the items and their average scores:
            Life (NaN)
           /          \
      Work (3)      Health (2)
      /      \
Money (2)   Hours (2.5)
    |
Life savings (3)
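Some notes:
In the data, the colon (":") denotes the relationship between items. The format is Parent:Child
"Life" has no score, so it should return NaN
"Hours" has two entries in the data, so the average is shown: (2+3)/2 = 2.5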
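For illustration, the averaging rule described in these notes could be sketched roughly as follows. This is only a minimal sketch: the column names `pair`, `score`, `parent` and `child` are made up for the example, and it assumes every entry has exactly one colon.

```python
import pandas as pd

# Sample rows copied from the table above; column names are hypothetical.
df = pd.DataFrame(
    [
        ("Life:Work", 3),
        ("Work:Money", 2),
        ("Work:Hours", 3),
        ("Work:Hours", 2),
        ("Life:Health", 2),
        ("Money:Life savings", 3),
    ],
    columns=["pair", "score"],
)

# Split "Parent:Child" into two columns; the score belongs to the child.
df[["parent", "child"]] = df["pair"].str.split(":", n=1, expand=True)

# Average the score per child item (duplicates such as "Hours" are averaged).
child_means = df.groupby("child")["score"].mean()

# Every item that appears on either side, in order of first appearance.
items = pd.unique(df[["parent", "child"]].to_numpy().ravel())

# Items that never appear as a child (here "Life") end up as NaN.
averages = child_means.reindex(items)
print(averages)
```

Reindexing on the full item list is what gives "Life" a NaN instead of dropping it.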
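Thank you so much for your help!
Edited
Thanks to AKX for the helpful reply. Only one part remains unsolved, so I will clarify it here. For 2) Tree: a) Determine the hierarchy of the items:
The raw data does not say which layer each Parent:Child pair sits in. The problem here is to write code that can work this out and link the pairs together. From "Life:Work" and "Work:Money", we need to work out that the child in the first entry ("Work") matches the parent in the second entry (also "Work", the parent of "Money"). That is:
From:
Life:Work
Work:Money
Combine into:
Life:Work:Money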
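One way this linking step might be sketched, using only the standard library: record each child's parent, then climb from an item to the top. The names `parent_of` and `full_path` are hypothetical, and the sketch assumes each child has a single parent.

```python
# Pairs copied from the raw data; scores are not needed for linking.
pairs = [
    "Life:Work",
    "Work:Money",
    "Work:Hours",
    "Work:Hours",
    "Life:Health",
    "Money:Life savings",
]

# child -> parent lookup built from every "Parent:Child" entry.
parent_of = {}
for pair in pairs:
    parent, child = pair.split(":", 1)
    parent_of[child] = parent


def full_path(item):
    """Climb from `item` to its ultimate parent and join the chain with colons."""
    chain = [item]
    # Stop at an item with no recorded parent; the second condition
    # guards against accidental cycles in the data.
    while chain[-1] in parent_of and parent_of[chain[-1]] not in chain:
        chain.append(parent_of[chain[-1]])
    return ":".join(reversed(chain))


print(full_path("Money"))
print(full_path("Life savings"))
```

An item that never shows up as a child, such as "Life", has no entry in `parent_of`, which is what marks it as the top of the hierarchy.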
Ultimately, from the raw data:
+--------------------+-------+
| Parent:Child       | Score |
+--------------------+-------+
| Life:Work          | 3     |
| Work:Money         | 2     |
| Work:Hours         | 3     |
| Work:Hours         | 2     |
| Life:Health        | 2     |
| Money:Life savings | 3     |
+--------------------+-------+
build a table like this:
+--------+--------+--------+--------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| Layer1 | Layer2 | Layer3 | Layer4       | Avg Score | #Comments                                                                                                                          |
+--------+--------+--------+--------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| Life   | Work   |        |              | 3         | #Directly from "Life:Work" in raw data                                                                                             |
| Life   | Work   | Money  |              | 2         | #Entry Work:Money has score 2. Since there is an entry "Life:Work", we know "Work" isn't an ultimate parent, and sits below "Life" |
| Life   | Work   | Money  | Life savings | 3         | #Entry "Money:Life savings" has score 3. Similarly, we know from other entries that the hierarchy is Life > Work > Money           |
| Life   | Work   | Hours  |              | 2.5       | #There're entries "Work:Money" and another "Work:Hours", so we know both "Money" and "Hours" are direct children of "Work"         |
| Life   | Health |        |              | 2         | #Directly from "Life:Health" which has score 2. And there is no entry above "Life", which makes it the top of the hierarchy        |
| Life   |        |        |              | NaN       | #There is no entry where "Life" is a child, so "Life" is an ultimate parent. Also, no entry tells us the score for "Life"          |
+--------+--------+--------+--------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
Then, from this table, we should be able to build the tree (the format does not matter).
            Life (NaN)
           /          \
      Work (3)      Health (2)
      /      \
Money (2)   Hours (2.5)
    |
Life savings (3)
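A layer table of this shape could perhaps be produced along these lines with Pandas. This is a hedged sketch, not a definitive solution: `path_to`, `parent_of` and the column names `pair`, `score`, `parent` and `child` are invented for the example, and it assumes one colon per entry and one parent per child.

```python
import pandas as pd

# Rows copied from the raw data; column names are hypothetical.
df = pd.DataFrame(
    [
        ("Life:Work", 3),
        ("Work:Money", 2),
        ("Work:Hours", 3),
        ("Work:Hours", 2),
        ("Life:Health", 2),
        ("Money:Life savings", 3),
    ],
    columns=["pair", "score"],
)
df[["parent", "child"]] = df["pair"].str.split(":", n=1, expand=True)

avg = df.groupby("child")["score"].mean()         # average score per child item
parent_of = dict(zip(df["child"], df["parent"]))  # child -> parent lookup


def path_to(item):
    """Return the chain from the ultimate parent down to `item`."""
    path = [item]
    # Climb until an item without a recorded parent is reached;
    # the second condition guards against cycles in the data.
    while path[-1] in parent_of and parent_of[path[-1]] not in path:
        path.append(parent_of[path[-1]])
    return path[::-1]


# One row per unique item, in order of first appearance.
items = pd.unique(df[["parent", "child"]].to_numpy().ravel())
paths = [path_to(item) for item in items]
depth = max(len(p) for p in paths)

# Pad shorter paths with empty strings so every row has `depth` layers.
layers = pd.DataFrame(
    [p + [""] * (depth - len(p)) for p in paths],
    columns=[f"Layer{i + 1}" for i in range(depth)],
)
# The score of a row is the average of its deepest item; NaN if it has none.
layers["Avg Score"] = [avg.get(p[-1], float("nan")) for p in paths]
print(layers)
```

The rows come out in order of first appearance rather than in the order shown in the table above; sorting them is left open.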
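Thanks again for your help!
Here is something that uses the asciitree library I mentioned.
As it turns out, it makes printing out a custom value for each tree node fairly easy, which is exactly what we want here.
I have tried to add helpful comments.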
from asciitree import LeftAligned, DictTraversal
import pandas as pd
from collections import defaultdict


class ShowValueTraversal(DictTraversal):
    def __init__(self, values):
        self.values = values

    def get_text(self, node):
        key = node[0]
        if key in self.values:
            return f"{key} ({self.values[key]})"
        return key


def treeify(averages_dict):
    # Make a recursive tree we can just add children to
    make_tree = lambda: defaultdict(make_tree)
    tree = make_tree()
    for tag, value in averages_dict.items():
        parent = tree
        parts = tag.split(":")
        for i in range(len(parts) + 1):
            joined_tag = ":".join(parts[:i])
            parent = parent[joined_tag]
    return tree


def fixup_names(dct):
    # Break down the keys on colons
    dct = {tuple(key.split(":")): value for (key, value) in dct.items()}
    # Get a mapping of the last "atoms" of each known name to their full name
    last_atom_map = {p[-1]: p for p in dct}
    # Walk through the original dictionary, replacing any known first atom with
    # an entry from the last atom map if possible and reconstitute the keys
    new_dct = {}
    for key, value in dct.items():
        key_parts = list(key)
        while key_parts[0] in last_atom_map:
            # Slice in the new prefix
            key_parts[0:1] = last_atom_map[key_parts[0]]
        new_key = ":".join(key_parts)
        new_dct[new_key] = value
    return new_dct


df = pd.DataFrame(
    [
        ("Life:Work", 3),
        ("Work:Money", 2),
        ("Work:Hours", 3),
        ("Work:Hours", 2),
        ("Life:Health", 2),
        ("Money:Life savings", 3),
        ("Money:Something", 2),
        ("Money:Something:Deeper", 1),
    ],
    columns=["tag", "value"],
)
print("# Original data")
print(df)
print()
print("# Averages")
df_averages = df.groupby("tag").mean()
print(df_averages)
print()
# Turn the averages into a dict of tag -> value
averages_dict = dict(df_averages.itertuples())
# Fix up the names (to infer hierarchy)
averages_dict = fixup_names(averages_dict)
# Generate a tree out of the flat data
tree = treeify(averages_dict)
# Instantiate a custom asciitree traversal object that knows how to
# look up the values from the dict
traverse = ShowValueTraversal(values=averages_dict)
# Print it out!
print("# Tree")
print(LeftAligned(traverse=traverse)(tree))
The output is
# Original data
                      tag  value
0               Life:Work      3
1              Work:Money      2
2              Work:Hours      3
3              Work:Hours      2
4             Life:Health      2
5      Money:Life savings      3
6         Money:Something      2
7  Money:Something:Deeper      1

# Averages
                        value
tag
Life:Health               2.0
Life:Work                 3.0
Money:Life savings        3.0
Money:Something           2.0
Money:Something:Deeper    1.0
Work:Hours                2.5
Work:Money                2.0

# Tree
 +-- Life
     +-- Life:Health (2.0)
     +-- Life:Work (3.0)
         +-- Life:Work:Money (2.0)
         |   +-- Life:Work:Money:Life savings (3.0)
         |   +-- Life:Work:Money:Something (2.0)
         |       +-- Life:Work:Money:Something:Deeper (1.0)
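The tree above labels each node with its full path. If short labels such as "Work (3.0)" are preferred, as in the question's drawing, a plain recursive printer might be enough. The following is a rough sketch using only the standard library, not part of asciitree: `render`, `children`, `scores` and `has_parent` are hypothetical names, and it uses only the six two-part rows from the question.

```python
from collections import defaultdict
from statistics import mean

# The six rows from the question (two-part "Parent:Child" entries only).
rows = [
    ("Life:Work", 3),
    ("Work:Money", 2),
    ("Work:Hours", 3),
    ("Work:Hours", 2),
    ("Life:Health", 2),
    ("Money:Life savings", 3),
]

scores = defaultdict(list)    # child -> list of its scores
children = defaultdict(list)  # parent -> children in order of first appearance
has_parent = set()            # every item that appears as a child

for pair, score in rows:
    parent, child = pair.split(":", 1)
    scores[child].append(score)
    if child not in children[parent]:
        children[parent].append(child)
    has_parent.add(child)


def render(item, depth=0):
    """Return indented lines for `item` and everything below it."""
    avg = float(mean(scores[item])) if item in scores else float("nan")
    lines = [f"{'  ' * depth}{item} ({avg})"]
    for child in children.get(item, []):
        lines.extend(render(child, depth + 1))
    return lines


# Roots are parents that never appear as a child (here only "Life").
roots = [item for item in children if item not in has_parent]
tree_lines = [line for root in roots for line in render(root)]
print("\n".join(tree_lines))
```

Indentation depth stands in for the branch characters here; the same `children` mapping could also be turned into the nested dict that asciitree expects.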