Sorting data dynamically based upon key names and value combos
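So, first off, I'll say that I'm not great at programming. However, I've been around enough great programmers that I feel there must be a more elegant solution to what I'm trying to do, and I'm hoping someone here knows of one.
I'm trying to find a good way to sort data (over which I have very little control). The data is passed in as an array of dictionaries, all identically defined, and I need to sort it according to the following criteria:
Any key name may have a particular property, such as starting with a special character (in this case "o")
One or more keys in a single dictionary may have this property
If the property exists, group together the dictionaries in which the values of all keys possessing the property are the same
The order in which the data is presented and returned is not important
For example, given the following input data in dictionary format: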
+--------+---------+-------+--------------+
| o_last | first | o_zip | likes |
+--------+---------+-------+--------------+
| Smith | Bob | 12345 | Apples |
| Smith | Alice | 12345 | Peaches |
| Smith | Marvin | 54321 | Strawberries |
| Jones | Steve | 98765 | Potatoes |
| Jones | Harold | 98765 | Beets |
| White | Carol | 00001 | Fish |
+--------+---------+-------+--------------+
the following groups would be output:
+--------+---------+-------+--------------+
| Smith | Bob | 12345 | Apples |
| Smith | Alice | 12345 | Peaches |
+--------+---------+-------+--------------+
+--------+---------+-------+--------------+
| Smith | Marvin | 54321 | Strawberries |
+--------+---------+-------+--------------+
+--------+---------+-------+--------------+
| Jones | Steve | 98765 | Potatoes |
| Jones | Harold | 98765 | Beets |
+--------+---------+-------+--------------+
+--------+---------+-------+--------------+
| White | Carol | 00001 | Fish |
+--------+---------+-------+--------------+
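For concreteness, the sample rows above could be written as the following list of dictionaries. This is only a sketch: the variable names `rows`, `prop_keys`, and `combos` are illustrative, and the zip codes are kept as strings on the assumption that `00001` must retain its leading zeros.

```python
# Sample rows from the tables above, as a list of dicts.
# Assumption: every value is a string, including the zip codes.
rows = [
    {"o_last": "Smith", "first": "Bob", "o_zip": "12345", "likes": "Apples"},
    {"o_last": "Smith", "first": "Alice", "o_zip": "12345", "likes": "Peaches"},
    {"o_last": "Smith", "first": "Marvin", "o_zip": "54321", "likes": "Strawberries"},
    {"o_last": "Jones", "first": "Steve", "o_zip": "98765", "likes": "Potatoes"},
    {"o_last": "Jones", "first": "Harold", "o_zip": "98765", "likes": "Beets"},
    {"o_last": "White", "first": "Carol", "o_zip": "00001", "likes": "Fish"},
]

# Keys that carry the property (here simplified to: name starts with "o_").
prop_keys = [k for k in rows[0] if k.startswith("o_")]

# Each distinct combination of property values corresponds to one output group.
combos = {tuple(r[k] for k in prop_keys) for r in rows}
```

With this data there are two property keys and four distinct value combinations, matching the four groups shown above.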
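Below is the function I currently have to accomplish this, and so far it seems to work fine. However, as I mentioned above, I have to believe there is a more elegant library or design pattern I'm not aware of that could be used instead.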
import hashlib
import re

def sort_data(input_d):
    has_prop = False
    prop_fields = []
    prop_dict = {}
    output_list = []
    # create a list of keys that have the property
    fields = list(input_d[0].keys())
    for field in fields:
        if re.match("^o[a-np-zA-NP-Z]*_", field):
            has_prop = True
            prop_fields.append(field)
    # if keys are found:
    if has_prop:
        for d in input_d:
            prop_vals = ""
            for f in prop_fields:
                prop_vals += d[f]
            # create an md5 hash of unique values for keys with property
            # and use it to group dicts with the same value combinations
            # (the string is encoded because md5 requires bytes in Python 3)
            prop_vals_hash = hashlib.md5(prop_vals.encode("utf-8")).hexdigest()
            if prop_vals_hash in prop_dict:
                prop_dict[prop_vals_hash].append(d)
            else:
                prop_dict[prop_vals_hash] = [d]
        # return data as an array of arrays, with each index
        # in that array a grouping of dicts with unique value combinations
        for k in prop_dict.keys():
            output_list.append(prop_dict[k])
    else:
        # default for input data that does not have keys possessing
        # our property of interest
        for d in input_d:
            output_list.append([d])
    return output_list
I'd love to hear any and all replies, suggestions, criticism, or feedback that anyone is willing to offer. Thanks for reading!
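I'd like to know whether it can be assumed that records with identical values in all prop fields are already sorted next to each other. I'll assume so, because otherwise a sort would be needed and I can't guess what sort criteria you'd want to use. So: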
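import itertools
import operator

def sort_data(input_d):
    output_list = []
    # keys whose names carry the property (here: start with 'o')
    prop_fields = [k for k in input_d[0] if k.startswith('o')]
    # if keys are found:
    if prop_fields:
        # groupby collects consecutive records that share the same prop values
        for k, g in itertools.groupby(input_d, key=operator.itemgetter(*prop_fields)):
            output_list.append(list(g))
    else:
        output_list = [[d] for d in input_d]
    return output_list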
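To illustrate why the adjacency assumption matters: `itertools.groupby` starts a new group every time the key changes, so equal keys that are not next to each other end up in separate groups. A minimal demonstration with made-up data (the list `keys` is purely hypothetical):

```python
import itertools

# Hypothetical data: the key "a" appears in two separate runs.
keys = ["a", "a", "b", "a"]

# Without sorting, the trailing "a" forms its own group.
unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(keys)]
# -> [("a", ["a", "a"]), ("b", ["b"]), ("a", ["a"])]

# After sorting, all equal keys are adjacent and land in a single group.
sorted_groups = [(k, list(g)) for k, g in itertools.groupby(sorted(keys))]
# -> [("a", ["a", "a", "a"]), ("b", ["b"])]
```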
If the "order is fine and must be maintained" condition does not apply, you must add the following line right after if prop_fields:
        input_d.sort(key=operator.itemgetter(*prop_fields))
However, this does not match the order-preserving behavior of your example (nor, I believe, does the code you provided).
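If adjacency cannot be assumed and sorting is undesirable, one alternative is to collect the groups in a dictionary keyed by the tuple of property values. The following is a minimal sketch, not a definitive implementation: the names `group_by_props` and `prefix` are hypothetical, and the plain `"o_"` prefix test is a simplifying assumption in place of the regular expression from the question. Using a tuple as the key also avoids a pitfall of concatenating the values into one string, where `"ab" + "c"` and `"a" + "bc"` would collide.

```python
from collections import defaultdict


def group_by_props(rows, prefix="o_"):
    """Group dicts whose prefix-marked keys all hold equal values.

    `prefix` is an assumed simplification of the key-name property.
    """
    if not rows:
        return []
    # All dicts are assumed to share the same keys, so inspect the first one.
    prop_keys = [k for k in rows[0] if k.startswith(prefix)]
    if not prop_keys:
        # No key has the property: every dict becomes its own group.
        return [[row] for row in rows]
    groups = defaultdict(list)
    for row in rows:
        # The tuple of property values identifies the group.
        groups[tuple(row[k] for k in prop_keys)].append(row)
    # Dicts keep insertion order (Python 3.7+), so groups appear in the
    # order their first member was seen.
    return list(groups.values())


# Hypothetical input in which the two matching records are not adjacent.
sample = [
    {"o_last": "Smith", "first": "Bob"},
    {"o_last": "Jones", "first": "Steve"},
    {"o_last": "Smith", "first": "Alice"},
]
grouped = group_by_props(sample)
```

Here `grouped` contains two groups: Bob and Alice together, and Steve on his own. No sort is needed, and the records keep their relative order within each group.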