PostgreSQL - how to use a loop (or apply in R) to reduce the size of an R script

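I have two tables in my database. The columns of both tables and their data types are shown below. Suppose both tables store data for 3 machines. Each machine has two s_id values, and I use them to select the data needed for a particular machine.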

The s_id values for each machine are:

   m1   59,07
   m2   60,92
   m3   95,109



                                    Table "public.table_a"
    Column    |            Type             | Modifiers | Storage | Stats target | Description
--------------+-----------------------------+-----------+---------+--------------+-------------
 ettime       | timestamp without time zone |           | plain   |              |
 sn           | numeric                     |           | main    |              |
 s_id1        | numeric                     |           | main    |              |
 e_id1        | numeric                     |           | main    |              |
Indexes:
    "table_a_sn_key" UNIQUE CONSTRAINT, btree (sn)
Has OIDs: no


                               Table "public.table_b"
    Column    |            Type             | Modifiers | Storage  | Stats target | Description
--------------+-----------------------------+-----------+----------+--------------+-------------
 sn           | numeric                     |           | main     |              |
 ettime       | timestamp without time zone |           | plain    |              |
 value        | text                        |           | extended |              |
 comment      | text                        |           | extended |              |
 l_id         | numeric                     |           | main     |              |
 n_id         | numeric                     |           | main     |              |
 ettime.y     | timestamp without time zone |           | plain    |              |
 s_id2        | numeric                     |           | main     |              |
 e_id2        | numeric                     |           | main     |              |
Indexes:
    "table_b_sn_key" UNIQUE CONSTRAINT, btree (sn)
Has OIDs: no

Using the script below, I get the desired result for one machine:

library(RPostgreSQL)
# `con` is an open connection to the database created with dbConnect()

M1 <- dbGetQuery(con, "select 
    a.r_date::date date, 
    downgraded,
    total, 
    round(downgraded::numeric/total* 100, 2) percentage
from (
    select date_trunc('day', eventtime) r_date, count(*) downgraded
    from table_b
    where s_id2 in (59,07) 
    group by 1
    ) b
join (
    select date_trunc('day', eventtime) r_date, count(*) total
    from table_a
    where s_id1 in (59,07)
    group by 1
    ) a
using (r_date)
order by 1")

Since I don't have a programming background, I repeat the full query above for each machine:

M2 <-  dbGetQuery(con, "select 
        a.r_date::date date, 
        downgraded,
        total, ........
       ........

M3 <- dbGetQuery(con, "select 
        a.r_date::date date, 
        downgraded,
        total,.....
        .........

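In my case, is it possible to use a loop instead of writing a separate query for each machine, so that I get the data for all machines with a single query?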

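Could anyone show me how to do this with my example? In practice I need to run 6 separate scripts, and in each script I need the data for three different machines.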
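For comparison, the per-machine approach described in the question can itself be written as a loop instead of three copy-pasted queries. The sketch below is a hypothetical illustration, not the asker's code: `machines`, `build_machine_query` and `fetch_all_machines` are made-up names, it assumes the `eventtime` column used in the queries above, and running it assumes `library(RPostgreSQL)` and an open connection `con` as in the question.

```r
# Hypothetical sketch: one query template filled in per machine.
# Calling fetch_all_machines() needs library(RPostgreSQL) and an open
# connection `con`, as in the question; defining the functions does not.

# Machine -> s_id mapping from the question (07 is just 7 as a number)
machines <- list(M1 = c(59, 7), M2 = c(60, 92), M3 = c(95, 109))

# Build the question's per-machine query for one vector of s_ids
build_machine_query <- function(ids) {
  id_list <- paste(ids, collapse = ",")
  sprintf("select a.r_date::date date, downgraded, total,
                  round(downgraded::numeric/total * 100, 2) percentage
           from (select date_trunc('day', eventtime) r_date, count(*) downgraded
                 from table_b where s_id2 in (%s) group by 1) b
           join (select date_trunc('day', eventtime) r_date, count(*) total
                 from table_a where s_id1 in (%s) group by 1) a
           using (r_date)
           order by 1", id_list, id_list)
}

# Loop over the machines with lapply: run one query per machine, tag the
# rows with the machine name, then stack the data frames into one
fetch_all_machines <- function(con, machines) {
  results <- lapply(names(machines), function(m) {
    df <- dbGetQuery(con, build_machine_query(machines[[m]]))
    df$machine <- rep(m, nrow(df))  # rep() also works for 0-row results
    df
  })
  do.call(rbind, results)
}
```

With a connection open, `all_machines <- fetch_all_machines(con, machines)` would replace the separate M1, M2 and M3 calls. It still sends one query per machine, which is the round-trip cost the single-query answer below avoids.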

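Simply modify your SQL query to account for the other machines; no R workaround is needed. In fact, this keeps all the data restructuring and processing in the SQL engine. Specifically, either add all the s_id values to the WHERE clause (and a matching GROUP BY) of the derived tables, or use a UNION query. Both versions add a machine column so that each machine can be identified in the imported data frame: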

WHERE clause change

strsql <- "select a.machine, 
                  a.r_date::date date, 
                  downgraded,
                  total, 
                  round(downgraded::numeric/total* 100, 2) percentage
           from (
                  select CASE WHEN s_id2 IN (59,07) THEN 'M1'
                              WHEN s_id2 IN (60,92) THEN 'M2'
                              WHEN s_id2 IN (95,109) THEN 'M3'
                         END As machine, date_trunc('day', eventtime) r_date, 
                         count(*) downgraded
                  from table_b
                  where s_id2 in (59,07,60,92,95,109) 
                  group by CASE WHEN s_id2 IN (59,07) THEN 'M1'
                                WHEN s_id2 IN (60,92) THEN 'M2'
                                WHEN s_id2 IN (95,109) THEN 'M3'
                           END, date_trunc('day', eventtime)
                  ) b
            inner join (
                  select CASE WHEN s_id1 IN (59,07) THEN 'M1'
                              WHEN s_id1 IN (60,92) THEN 'M2'
                              WHEN s_id1 IN (95,109) THEN 'M3'
                         END As machine, date_trunc('day', eventtime) r_date,
                         count(*) total
                  from table_a
                  where s_id1 in (59,07,60,92,95,109) 
                  group by CASE WHEN s_id1 IN (59,07) THEN 'M1'
                                WHEN s_id1 IN (60,92) THEN 'M2'
                                WHEN s_id1 IN (95,109) THEN 'M3'
                           END, date_trunc('day', eventtime)
                  ) a
            on a.machine = b.machine and a.r_date = b.r_date
            order by a.r_date;"

machinesdf <- dbGetQuery(con, strsql) 
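Because the same CASE expression and s_id list appear several times in that query, one option is to generate them in R from a single machine-to-s_id mapping, so that changing the machines in each of the 6 scripts means editing one list. This is a hypothetical sketch: `machines`, `case_expr`, `in_list` and `build_combined_query` are illustrative names, and it uses positional `group by 1, 2` (as the question's own query does with `group by 1`) instead of repeating the CASE expression.

```r
# Hypothetical helpers: keep the machine -> s_id mapping in one place and
# generate the CASE expression and IN list used by the single query.
machines <- list(M1 = c(59, 7), M2 = c(60, 92), M3 = c(95, 109))

# "CASE WHEN <col> IN (...) THEN 'M1' ... END"
case_expr <- function(col, machines) {
  whens <- vapply(names(machines), function(m) {
    sprintf("WHEN %s IN (%s) THEN '%s'", col,
            paste(machines[[m]], collapse = ","), m)
  }, character(1))
  paste("CASE", paste(whens, collapse = " "), "END")
}

# All s_ids of all machines, e.g. "59,7,60,92,95,109"
in_list <- function(machines) paste(unlist(machines), collapse = ",")

build_combined_query <- function(machines) {
  sprintf("select a.machine, a.r_date::date date, downgraded, total,
                  round(downgraded::numeric/total * 100, 2) percentage
           from (select %s as machine, date_trunc('day', eventtime) r_date,
                        count(*) downgraded
                 from table_b where s_id2 in (%s) group by 1, 2) b
           inner join (select %s as machine, date_trunc('day', eventtime) r_date,
                              count(*) total
                       from table_a where s_id1 in (%s) group by 1, 2) a
           on a.machine = b.machine and a.r_date = b.r_date
           order by a.machine, a.r_date",
          case_expr("s_id2", machines), in_list(machines),
          case_expr("s_id1", machines), in_list(machines))
}
```

The generated string can then be passed to `dbGetQuery(con, build_combined_query(machines))` in place of the hand-written `strsql`.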

UNION query

strsql <- "select a.machine, 
                   a.r_date::date date, 
                   downgraded,
                   total, 
                   round(downgraded::numeric/total* 100, 2) percentage
            from (
                   select 'M1' as machine, date_trunc('day', eventtime) r_date, 
                          count(*) downgraded
                   from table_b
                   where s_id2 in (59,07) 
                   group by date_trunc('day', eventtime)
                 ) b
            inner join (
                   select 'M1' as machine, date_trunc('day', eventtime) r_date, 
                          count(*) total
                   from table_a
                   where s_id1 in (59,07) 
                   group by date_trunc('day', eventtime)
                 ) a
            on a.machine = b.machine and a.r_date = b.r_date

            union

            select a.machine,
                   a.r_date::date date,
                   downgraded,
                   total,
                   round(downgraded::numeric/total* 100, 2) percentage
            from (
                   select 'M2' as machine, date_trunc('day', eventtime) r_date,
                          count(*) downgraded
                   from table_b
                   where s_id2 in (60,92)
                   group by date_trunc('day', eventtime)
                 ) b
            inner join (
                   select 'M2' as machine, date_trunc('day', eventtime) r_date,
                          count(*) total
                   from table_a
                   where s_id1 in (60,92)
                   group by date_trunc('day', eventtime)
                 ) a
            on a.machine = b.machine and a.r_date = b.r_date

            union

            select a.machine,
                   a.r_date::date date,
                   downgraded,
                   total,
                   round(downgraded::numeric/total* 100, 2) percentage
            from (
                   select 'M3' as machine, date_trunc('day', eventtime) r_date,
                          count(*) downgraded
                   from table_b
                   where s_id2 in (95,109)
                   group by date_trunc('day', eventtime)
                 ) b
            inner join (
                   select 'M3' as machine, date_trunc('day', eventtime) r_date,
                          count(*) total
                   from table_a
                   where s_id1 in (95,109)
                   group by date_trunc('day', eventtime)
                 ) a
            on a.machine = b.machine and a.r_date = b.r_date
            -- ORDER BY applies to the whole UNION result, so it appears once, at the end
            order by 1, 2;"

machinesdf <- dbGetQuery(con, strsql)
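The three UNION branches differ only in the machine name and its s_ids, so the query can also be generated with an apply-style loop in R. This is a hypothetical sketch: `machines`, `machine_select` and `build_union_query` are illustrative names, and it swaps in UNION ALL (which skips the duplicate-removal step; rows from different machines can never be duplicates here because each branch carries its own machine literal).

```r
# Hypothetical sketch: build one SELECT per machine and join the pieces
# with UNION ALL, so adding a machine only means editing `machines`.
machines <- list(M1 = c(59, 7), M2 = c(60, 92), M3 = c(95, 109))

# One UNION branch; the machine name is a literal, so the join only
# needs to match on r_date
machine_select <- function(machine, ids) {
  id_list <- paste(ids, collapse = ",")
  sprintf("select '%s' as machine, a.r_date::date date, downgraded, total,
                  round(downgraded::numeric/total * 100, 2) percentage
           from (select date_trunc('day', eventtime) r_date, count(*) downgraded
                 from table_b where s_id2 in (%s) group by 1) b
           inner join (select date_trunc('day', eventtime) r_date, count(*) total
                       from table_a where s_id1 in (%s) group by 1) a
           on a.r_date = b.r_date",
          machine, id_list, id_list)
}

build_union_query <- function(machines) {
  # mapply walks the names and the s_id vectors in parallel
  selects <- mapply(machine_select, names(machines), machines)
  # A single ORDER BY at the end sorts the combined result
  paste0(paste(selects, collapse = "\nunion all\n"), "\norder by 1, 2")
}
```

As with the hand-written version, the result would be fetched with `dbGetQuery(con, build_union_query(machines))`.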