Random join in SQL Server with a varying number of random results

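SQL Server's SQL has become so capable that things that look like they need a procedural solution can often be done in pure SQL. I wonder whether this is one of them.
Suppose we have a STATES table and a CITIES table.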

STATES:
State:  NY

CITIES
State: NY
City:  Armonk

Now let's complicate things with a third table: INSTRUCTIONS

INSTRUCTIONS
State: NY
HowMany: 17

State: NJ
HowMany: 11

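Is there any way in SQL Server SQL to select HowMany cities at random from the CITIES table when the three tables are joined on State?

We don't know the "top N" in advance. It varies by state.

Of course, the States table will contain all 50 states, the Cities table all the cities for each state, and Instructions will have one record per state, identifying how many cities (chosen at random) are needed from that state.

P.S. Example desired result (assuming the instruction for New York is HowMany = 5, the instruction for New Jersey is HowMany = 4, and order by STATES.state):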

NJ.....Princeton
NJ.....Newark
NJ.....Camden
NJ.....Princeton
NY.....Armonk
NY.....Schenectady
NY.....White Plains
NY.....Niagara Falls
NY.....Rochester

with
  states as (
    select 'NY' state union
    select 'NJ' state
  ),
  instructions as (
    select 'NY' state, 2 howmany union
    select 'NJ' state, 3 howmany
  ),
  cities as (
    select 'NJ' state,'Princeton' city union
    select 'NJ' state,'Newark' city union
    select 'NJ' state,'Camden' city union
    select 'NJ' state,'Hamilton' city union
    select 'NY' state,'Armonk' city union
    select 'NY' state,'Schenectady' city union
    select 'NY' state,'White Plains' city union
    select 'NY' state,'Niagara Falls' city union
    select 'NY' state,'Rochester' city
  ),
  cities_rnd as (
    -- newid() is evaluated for each row; in SQL Server, rand() would give every row the same value
    select c.*, checksum(newid()) rnd from cities c
  ),
  cities_ranked as (
    -- row_number() rather than dense_rank(), so tied rnd values still get distinct positions
    select c.*, row_number() over (partition by c.state order by c.rnd) rnk from cities_rnd c
  )
select c.*,i.howmany
from cities_ranked c
join instructions  i on i.state=c.state
join states        s on s.state=c.state --needless line
where c.rnk <= i.howmany;

A different approach from the other answer, which uses a query without DDL (data definition language):

-- MySQL-style user variables; rownum2 is one counter across all states, not one per state.
SET @row_num2 = 0;
SELECT *, @row_num2 := @row_num2 + 1 AS rownum2
FROM (
    SELECT States.State, Cities.City, Instructions.HowMany
    FROM States, Cities, Instructions
    WHERE States.State = Cities.State AND States.State = Instructions.State
    ORDER BY RAND()
) AS t
HAVING rownum2 >= t.HowMany;

http://sqlfiddle.com/#!9/b96d3b/37

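I found that the RAND() function used in the other answers causes problems, because it does not produce a new random number for each row.

CHECKSUM(NEWID()) worked well for me in this case. (See "RAND not different for every row in T-SQL UPDATE".)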
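
To make the difference concrete, here is a small sketch for SQL Server. The inline VALUES table and the column aliases are made up for the demonstration and are not part of the question's schema. RAND() without a seed is evaluated once per query, while NEWID() is evaluated for each row:

```sql
-- Three dummy rows; v and n are hypothetical names used only for this demo.
SELECT v.n,
       RAND()            AS rand_value,     -- same value repeated on every row
       CHECKSUM(NEWID()) AS checksum_newid  -- a different integer on each row
FROM (VALUES (1), (2), (3)) AS v(n);
```

Ordering by rand_value would therefore not shuffle anything, whereas ordering by checksum_newid (or by NEWID() directly) gives a different order on each run.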

I think this solution is nice and tidy:

SELECT
RandomCities.[State]
,[RandomCities].City
FROM
    (
        SELECT 
        s.[state]
        ,city
        ,ROW_NUMBER() OVER (PARTITION BY s.[State] ORDER BY CHECKSUM(NEWID())) AS [RandomOrder]
        FROM
        States s
        INNER JOIN Cities c ON c.[state]=s.[state]
    ) AS RandomCities
INNER JOIN instructions i ON i.[state]=RandomCities.[state]
WHERE RandomCities.RandomOrder<=i.HowMany