Kusto: Remove partial duplicates

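I store my data in a table and am trying to remove the row "Target TargetCheese 4". The logic is: if a given store has two or more entries for the same Product, pick the StoreNumber that best fits that store, judging by the store's other rows. If a StoreNumber does not match the rest of the store but the Product is not duplicated, the number stays unchanged. For example, SafewayEggs keeps StoreNumber 1 even though more Safeway entries have StoreNumber 6, because there is only one SafewayEggs row.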

let storedata=
datatable (Store:string,    Product:string  ,StoreNumber:string)
["Target",  "TargetCheese", "4",
"Target",   "TargetCheese", "5",
"Target",   "TargetApple",  "5",
"Target",   "TargetCorn",   "5",
"Target",   "TargetEggs",   "5",
"Kroger",   "KrogerApple",  "2",
"Kroger",   "KrogerCorn",   "2",
"Kroger",   "KrogerEggs",   "2",
"Safeway",  "SafewayApple", "6",
"Safeway",  "SafewayCorn",  "6",
"Safeway",   "SafewayEggs", "1"
];

I would like to get this result table from storedata:

Store    Product       StoreNumber
Target   TargetCheese  5
Target   TargetApple   5
Target   TargetCorn    5
Target   TargetEggs    5
Kroger   KrogerApple   2
Kroger   KrogerCorn    2
Kroger   KrogerEggs    2
Safeway  SafewayApple  6
Safeway  SafewayCorn   6
Safeway  SafewayEggs   1

I don't understand the logic by which you want to remove the following row:

"Target", "TargetCheese", "4"

But if you want the highest StoreNumber for each Store and Product, you can use the following:

storedata
| summarize max(StoreNumber) by Store, Product
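
One caveat worth checking: StoreNumber is declared as string in the sample data, so as far as I can tell max compares the values as text rather than as numbers, and "10" would then rank below "9". The sketch below assumes every StoreNumber holds an integer (the question does not say so) and converts before aggregating. The sample table is repeated only so the snippet runs on its own.

```kusto
let storedata =
datatable (Store:string, Product:string, StoreNumber:string)
["Target",  "TargetCheese", "4",
 "Target",  "TargetCheese", "5",
 "Target",  "TargetApple",  "5",
 "Target",  "TargetCorn",   "5",
 "Target",  "TargetEggs",   "5",
 "Kroger",  "KrogerApple",  "2",
 "Kroger",  "KrogerCorn",   "2",
 "Kroger",  "KrogerEggs",   "2",
 "Safeway", "SafewayApple", "6",
 "Safeway", "SafewayCorn",  "6",
 "Safeway", "SafewayEggs",  "1"
];
storedata
// Assumption: StoreNumber is always an integer; toint() yields null for anything else
| summarize StoreNumber = max(toint(StoreNumber)) by Store, Product
// Convert back so the column keeps the original string type
| extend StoreNumber = tostring(StoreNumber)
```

This still picks the highest number per Store and Product, not the "best fit" the question describes.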

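You will probably need several steps:

  1. Find the "best-fit" StoreNumber for each store. In my example below this is the one that occurs most often, found with arg_max.
  2. Build the dataset that has to be cleaned using (1): the Store and Product combinations that occur more than once, found with count.
  3. Build the dataset that needs no cleaning: the Store and Product combinations that occur only once.
  4. Take the union of (3) and the corrected dataset.
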
let storedata=
datatable (Store:string,    Product:string  ,StoreNumber:string)
["Target",  "TargetCheese", "5",
"Target",   "TargetCheese", "4",
"Target",   "TargetApple",  "5",
"Target",   "TargetCorn",   "5",
"Target",   "TargetEggs",   "5",
"Kroger",   "KrogerApple",  "2",
"Kroger",   "KrogerCorn",   "2",
"Kroger",   "KrogerEggs",   "2",
"Safeway",  "SafewayApple", "6",
"Safeway",  "SafewayCorn",  "6",
"Safeway",   "SafewayEggs", "1"
];
// (1) evaluate best-fit StoreNumber: the most frequent StoreNumber per Store
// (no sort is needed before summarize, since summarize does not depend on row order)
let storenumber =
storedata
| summarize occ = count() by Store, StoreNumber
| summarize arg_max(occ, *) by Store;
// (2) dataset to be cleaned = more than one occurrence per store and product
let cleanup =
storedata
| summarize occ = count () by Store,  Product
| where occ > 1
| project-away occ;
// (3) dataset with only one occurrence 
let okdata =
storedata
| summarize occ= count () by Store, Product
| where occ==1
| project-away occ;
// (4) final dataset 
let res1 =storenumber
| join cleanup on Store
| project Store, Product, StoreNumber;
let res2 = storedata
| join okdata on Store, Product
| project-away Store1, Product1;
res1
| union res2;
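
The same four steps can also be folded into a single pass. This is only a sketch of one possible variant, not a replacement for the query above: it collapses each Store and Product pair to one row, joins the per-store best fit, and overrides StoreNumber only where the pair was duplicated. The names bestfit, BestStoreNumber and Rows are made up for this illustration, and "best fit" is again assumed to mean the most frequent StoreNumber per store.

```kusto
let storedata =
datatable (Store:string, Product:string, StoreNumber:string)
["Target",  "TargetCheese", "5",
 "Target",  "TargetCheese", "4",
 "Target",  "TargetApple",  "5",
 "Target",  "TargetCorn",   "5",
 "Target",  "TargetEggs",   "5",
 "Kroger",  "KrogerApple",  "2",
 "Kroger",  "KrogerCorn",   "2",
 "Kroger",  "KrogerEggs",   "2",
 "Safeway", "SafewayApple", "6",
 "Safeway", "SafewayCorn",  "6",
 "Safeway", "SafewayEggs",  "1"
];
// Most frequent StoreNumber per Store (assumed meaning of "best fit")
let bestfit =
storedata
| summarize occ = count() by Store, StoreNumber
| summarize arg_max(occ, StoreNumber) by Store
| project Store, BestStoreNumber = StoreNumber;
storedata
// One row per Store and Product; Rows > 1 marks a duplicated pair
| summarize Rows = count(), StoreNumber = take_any(StoreNumber) by Store, Product
| join kind=inner bestfit on Store
// Duplicated pairs take the store's best fit; single rows keep their own value
| project Store, Product, StoreNumber = iff(Rows > 1, BestStoreNumber, StoreNumber)
```

On the sample data this should produce the ten rows from the question, though row order is not guaranteed. If two StoreNumbers tie for most frequent within a store, arg_max picks one of them arbitrarily, so a tie-break rule would have to be added for such data.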