Kusto: remove partial duplicates
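Using the table storedata, I am trying to remove the row "Target TargetCheese 4".
The logic is: if the same product has two or more entries for a given store, pick the StoreNumber that best fits that store, judged by the store's other rows. If a StoreNumber does not match but the Product is not duplicated, the number stays unchanged. For example, SafewayEggs keeps StoreNumber 1 even though more Safeway entries have StoreNumber 6, because there is only one SafewayEggs row.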
let storedata=
datatable (Store:string, Product:string ,StoreNumber:string)
["Target", "TargetCheese", "4",
"Target", "TargetCheese", "5",
"Target", "TargetApple", "5",
"Target", "TargetCorn", "5",
"Target", "TargetEggs", "5",
"Kroger", "KrogerApple", "2",
"Kroger", "KrogerCorn", "2",
"Kroger", "KrogerEggs", "2",
"Safeway", "SafewayApple", "6",
"Safeway", "SafewayCorn", "6",
"Safeway", "SafewayEggs", "1"
];
I would like to get this result table from storedata:
Store Product StoreNumber
Target TargetCheese 5
Target TargetApple 5
Target TargetCorn 5
Target TargetEggs 5
Kroger KrogerApple 2
Kroger KrogerCorn 2
Kroger KrogerEggs 2
Safeway SafewayApple 6
Safeway SafewayCorn 6
Safeway SafewayEggs 1
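I don't understand the logic by which you want to remove the following row:
"Target", "TargetCheese", "4"
But if you want the highest value per Store and Product, you can use the following: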
storedata
| summarize max(StoreNumber) by Store, Product
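One caveat with the query above: StoreNumber is declared as string in storedata, so max compares the values as text rather than as numbers (for example, "9" sorts above "10"). A minimal sketch, assuming the values are always numeric strings, converts them first. The shortened datatable and the StoreNumber output column name are only for illustration:

```kusto
// Shortened sample, taken from the question's data
let storedata =
    datatable (Store:string, Product:string, StoreNumber:string)
    ["Target", "TargetCheese", "4",
     "Target", "TargetCheese", "5",
     "Safeway", "SafewayEggs", "1"
    ];
storedata
// tolong() turns "4" into 4, so max() compares numerically instead of as text
| summarize StoreNumber = max(tolong(StoreNumber)) by Store, Product
```

Note that this keeps the highest value per Store and Product, which is not necessarily the best-fit value the question describes, and that the resulting StoreNumber column is of type long rather than string.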
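You may need several steps:
- (1) Find the "best-fit" StoreNumber; in my example below that is the one occurring most often per store, found with arg_max
- (2) The dataset that has to be cleaned up using (1): Store/Product pairs that occur more than once, found with count
- (3) The dataset that needs no cleanup: Store/Product pairs that occur only once
- (4) The union of (3) and the corrected dataset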
let storedata=
datatable (Store:string, Product:string ,StoreNumber:string)
["Target", "TargetCheese", "5",
"Target", "TargetCheese", "4",
"Target", "TargetApple", "5",
"Target", "TargetCorn", "5",
"Target", "TargetEggs", "5",
"Kroger", "KrogerApple", "2",
"Kroger", "KrogerCorn", "2",
"Kroger", "KrogerEggs", "2",
"Safeway", "SafewayApple", "6",
"Safeway", "SafewayCorn", "6",
"Safeway", "SafewayEggs", "1"
];
// (1) evaluate best-fit StoreNumber
let storenumber =
storedata
| order by Store, StoreNumber
| summarize occ= count () by Store, StoreNumber
| summarize arg_max(occ, *) by Store;
// (2) dataset to be cleaned = more than one occurrence per store and product
let cleanup =
storedata
| summarize occ = count () by Store, Product
| where occ > 1
| project-away occ;
// (3) dataset with only one occurrence
let okdata =
storedata
| summarize occ= count () by Store, Product
| where occ==1
| project-away occ;
// (4) final dataset
let res1 =storenumber
| join cleanup on Store
| project Store, Product, StoreNumber;
let res2 = storedata
| join okdata on Store, Product
| project-away Store1, Product1;
res1
| union res2;
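The same four steps can probably be folded into a single pass. The following is only a sketch under the same assumption as above ("best fit" means the StoreNumber occurring most often within the store). The names bestfit, BestStoreNumber and cnt are made up for this example:

```kusto
let storedata =
    datatable (Store:string, Product:string, StoreNumber:string)
    ["Target", "TargetCheese", "4",
     "Target", "TargetCheese", "5",
     "Target", "TargetApple", "5",
     "Target", "TargetCorn", "5",
     "Target", "TargetEggs", "5",
     "Kroger", "KrogerApple", "2",
     "Kroger", "KrogerCorn", "2",
     "Kroger", "KrogerEggs", "2",
     "Safeway", "SafewayApple", "6",
     "Safeway", "SafewayCorn", "6",
     "Safeway", "SafewayEggs", "1"
    ];
// Assumed meaning of "best fit": the most frequent StoreNumber per store
let bestfit =
    storedata
    | summarize occ = count() by Store, StoreNumber
    | summarize arg_max(occ, StoreNumber) by Store
    | project Store, BestStoreNumber = StoreNumber;
storedata
// One row per Store/Product, remembering how many source rows it had
| summarize cnt = count(), StoreNumber = take_any(StoreNumber) by Store, Product
| join kind=inner bestfit on Store
// Duplicated products take the store's best-fit number; single rows keep their own
| project Store, Product, StoreNumber = iff(cnt > 1, BestStoreNumber, StoreNumber)
```

With the sample data this should return the ten expected rows, with TargetCheese at 5 and SafewayEggs still at 1. Row order is not guaranteed, and if two StoreNumbers tie for most frequent within a store, which one arg_max picks is not defined here.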