Complex partial string matching in pandas

Given a dataframe with the following structure and values in json_path:

json_path                                                                  Reporting Group   Entity/Grouping
data.attributes.total.children.[0]                                         Christian Family  Abraham Family
data.attributes.total.children.[0].children.[0]                            Christian Family  In Estate
data.attributes.total.children.[0].children.[0].children.[0].children.[0]  Christian Family  Cash
data.attributes.total.children.[0].children.[0].children.[1].children.[0]  Christian Family  Investment Grade Fixed Income

How can I filter for the rows whose json_path contains children four times? That is, I want to keep index positions 2-3:

json_path                                                                  Reporting Group   Entity/Grouping
data.attributes.total.children.[0].children.[0].children.[0].children.[0]  Christian Family  Cash
data.attributes.total.children.[0].children.[0].children.[1].children.[0]  Christian Family  Investment Grade Fixed Income

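I know how to do a partial match, but the integers inside the square brackets will not be consistent, so my instinct is to count the occurrences of children somehow (i.e. children appears 4 times) and filter on that.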

Any suggestions or resources on how to accomplish this?

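As you said, a naive approach is to count the occurrences of .children and compare the count with 4, which gives a boolean mask that can be used to filter the rows: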

df[df['json_path'].str.count(r'\.children').eq(4)]
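To make the count-based filter reproducible, here is a minimal sketch that rebuilds the sample frame from the question. The variable names counts, mask, and result are illustrative and not part of the original post.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "json_path": [
        "data.attributes.total.children.[0]",
        "data.attributes.total.children.[0].children.[0]",
        "data.attributes.total.children.[0].children.[0].children.[0].children.[0]",
        "data.attributes.total.children.[0].children.[0].children.[1].children.[0]",
    ],
    "Reporting Group": ["Christian Family"] * 4,
    "Entity/Grouping": [
        "Abraham Family",
        "In Estate",
        "Cash",
        "Investment Grade Fixed Income",
    ],
})

# Number of ".children" segments in each path
counts = df["json_path"].str.count(r"\.children")

# Boolean mask: True where a path has exactly four ".children" segments
mask = counts.eq(4)

# Boolean indexing keeps only the rows where the mask is True
result = df[mask]
print(result)
```

Keeping the mask in its own variable makes it easy to inspect or to combine with other conditions using & and |.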

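A more robust approach is to check for 4 consecutive occurrences of children, each followed by a bracketed index. The group is written as non-capturing, (?:...), so that pandas does not warn about match groups in the pattern:

df[df['json_path'].str.contains(r'(?:\.children\.\[\d+\]){4}')]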

                                                                   json_path   Reporting Group                Entity/Grouping
2  data.attributes.total.children.[0].children.[0].children.[0].children.[0]  Christian Family                           Cash
3  data.attributes.total.children.[0].children.[0].children.[1].children.[0]  Christian Family  Investment Grade Fixed Income
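One caveat: str.contains searches for the pattern anywhere in the string, so a {4} repetition also matches paths with five or more consecutive levels. If deeper paths can occur in the data, a minimal sketch like the following combines the pattern with the count to require exactly four. The five-level path is a hypothetical example, not taken from the question, and the names at_least_four and exactly_four are illustrative.

```python
import pandas as pd

paths = pd.Series([
    # four levels (from the question)
    "data.attributes.total.children.[0].children.[0].children.[0].children.[0]",
    # five levels (hypothetical, added to show the difference)
    "data.attributes.total.children.[0].children.[0].children.[0].children.[0].children.[0]",
    # two levels (from the question)
    "data.attributes.total.children.[0].children.[0]",
])

# One ".children.[n]" level; the integer n may vary
level = r"\.children\.\[\d+\]"

# True when four consecutive levels appear somewhere in the path,
# which includes paths that are nested more deeply
at_least_four = paths.str.contains(rf"(?:{level}){{4}}")

# Additionally require that the path has exactly four levels in total
exactly_four = at_least_four & paths.str.count(level).eq(4)

print(at_least_four.tolist())
print(exactly_four.tolist())
```

For the sample data in the question no path is deeper than four levels, so both masks select the same rows.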