Sampling on count of specific column value in PostgreSQL

Question

I have a table with a set of values, for example:

ID  |  Customer_name  | workorder
1   |    abc          | dispatch
2   |    xyz          | not_dispatch
3   |    jdk          | dispatch

There are 1 million rows in total. I want to sample this dataset down to 5000 rows, with 3400 rows where workorder is 'not_dispatch' and 1600 rows where it is 'dispatch'. How can this be done in PostgreSQL?

Answer 1

Far from efficient, because each subquery has to read and randomly order every matching row, but it works:

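-- 1600 random rows with workorder = 'dispatch'
SELECT *
FROM (
  SELECT * FROM my_table
  WHERE workorder = 'dispatch' -- other filters
  ORDER BY random() LIMIT 1600) sub1
-- The two subsets are disjoint, so UNION ALL skips a needless de-duplication pass
UNION ALL
-- 3400 random rows with workorder = 'not_dispatch'
SELECT *
FROM (
  SELECT * FROM my_table
  WHERE workorder = 'not_dispatch' -- other filters
  ORDER BY random() LIMIT 3400) sub2;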
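
The same per-value quotas can also be written as one statement with a window function. This is only a sketch: it assumes the table is called my_table as in the query above, and that the columns were created with unquoted names, so they resolve to id, customer_name and workorder. It numbers the rows of each workorder group in random order and keeps the first 1600 or 3400 of them.

```sql
-- Sketch: stratified sample using row_number() per workorder value.
-- my_table and the lowercase column names are assumptions, not confirmed by the question.
SELECT id, customer_name, workorder
FROM (
  SELECT t.*,
         -- rn restarts at 1 for each workorder value; random() shuffles within the group
         row_number() OVER (PARTITION BY workorder ORDER BY random()) AS rn
  FROM my_table AS t
  WHERE workorder IN ('dispatch', 'not_dispatch') -- other filters
) ranked
WHERE (workorder = 'dispatch'     AND rn <= 1600)
   OR (workorder = 'not_dispatch' AND rn <= 3400);
```

This still has to order every matching row, so it is not expected to be faster than the query above. Its main appeal is that the quotas sit in one place and the table is referenced once. If a group has fewer rows than its quota, the result simply contains all rows of that group.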

postgresql

sampling