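How to generate a new variable (column) based on a specific given column containing cycles of numbers from 0 to n, where n is a positive integer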

Question

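The dataset contains data about COVID-19 patients. It is available in both Excel and CSV formats, and it contains several variables and over 7000 records (rows), which makes solving the problem by hand extremely difficult and very time-consuming. Below are the 4 most important variables (columns) needed to solve the problem: 1: id, which identifies each record (row); 2: day_at_hosp, the day of the patient's hospital stay; 3: sex, the patient's sex; 4: death, whether the patient eventually died or survived. I want to create a new variable, total_days_at_hosp, which should contain the total number of days each patient spent in hospital.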

Old Table:
 _______________________________________
|   id  | day_at_hosp |  sex   | death  |
|_______|_____________|________|________|
|   1   |     0       |  male  |   no   |
|   2   |     1       |        |        |
|   3   |     2       |        |        |
|   4   |     0       | female |   no   |
|   5   |     1       |        |        |
|   6   |     0       |  male  |   no   |
|   7   |     0       | female |   no   |
|   8   |     0       |  male  |   no   |
|   9   |     1       |        |        |
|  10   |     2       |        |        |
|  11   |     3       |        |        |
|  12   |     4       |        |        |
| ...   |    ...      |   ...  |  ...   |
| 7882  |     0       | female |   no   |
| 7883  |     1       |        |        |
|_______|_____________|________|________|

New Table:
I want to convert the table above into the table below:
 ____________________________________________
|   id  |total_days_at_hosp|  sex   | death  |
|_______|__________________|________|________|
|   1   |        3         |  male  |   no   |
|   4   |        2         | female |   no   |
|   6   |        1         |  male  |   no   |
|   7   |        1         | female |   no   |
|   8   |        5         |  male  |   no   |
| ...   |       ...        |   ...  |  ...   |
| 2565  |        2         | female |   no   |
|_______|__________________|________|________|

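Note: the id column numbers every record that was entered, and each patient has multiple records depending on how long they stayed in hospital. The day_at_hosp variable contains the day count: 0 = first day in hospital, 1 = second day in hospital, ..., n = last day in hospital. A record (row) where day_at_hosp is 0 has entries in all the other columns. If a record's day_at_hosp is not 0, say 1, 2, 3, ..., 5, then it belongs to the patient right above it, and all the corresponding variables (columns) are left blank.

However, the dataset I need should look like the new table above. It should include a new variable (column) called total_days_at_hosp, generated from the variable (column) day_at_hosp. The new variable total_days_at_hosp is more useful for statistical tests and will replace day_at_hosp, so that all the blank rows can be removed. To go from the old table to the new table, the required program should do the following: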

day_at_hosp ===> total_days_at_hosp
    0                                 
    1        --->        3               
    2
-------------------------------------                              
    0        --->        2           
    1                              
-------------------------------------
    0        --->        1          
-------------------------------------
    0        --->        1
-------------------------------------
    0                                  
    1                                  
    2        --->        5                
    3
    4                                  
-------------------------------------
   ...                                 
------------------------------------- 
    0         --->       2                                              
    1
-------------------------------------

How can I do this?

Answer 1

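It appears that your data is sorted by patient and that the table you want will be much shorter. The starting point for this answer is therefore to apply an AutoFilter to your original data, set the filter criterion to day_at_hosp = 0, and copy the filtered admission rows to column F onward. After deleting the old data in column G, enter the formula below in cell G2 and copy it down: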

=INDEX(B:B,MATCH(F3,A:A,0)-1)+1

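To keep the formula simple, the same dummy maximum value should be entered at the end of both the old and the new table, so that the last patient also has a "next" id to match.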
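Since the question is also tagged python, the same idea can be sketched outside Excel. This is a minimal sketch and not the answerer's method: it assumes the rows have already been loaded in file order as dictionaries keyed by the question's column names, and the helper name `summarize_stays` is hypothetical. As in the formula, a patient's total is the day_at_hosp value on that patient's last row plus 1.

```python
def summarize_stays(rows):
    """Collapse one-row-per-day records into one row per patient.

    Assumes rows are in file order and that day_at_hosp == 0 marks the
    first row of each patient, as described in the question.
    """
    patients = []
    for row in rows:
        day = int(row["day_at_hosp"])
        if day == 0:
            # Admission row: it carries id, sex and death for the patient.
            patients.append({
                "id": row["id"],
                "total_days_at_hosp": 1,
                "sex": row["sex"],
                "death": row["death"],
            })
        elif not patients:
            raise ValueError("data must start with a day_at_hosp == 0 row")
        else:
            # Continuation row: it belongs to the patient right above it.
            patients[-1]["total_days_at_hosp"] = day + 1
    return patients


# The first twelve records of the old table (hypothetical in-memory copy).
records = [
    (1, 0, "male", "no"), (2, 1, "", ""), (3, 2, "", ""),
    (4, 0, "female", "no"), (5, 1, "", ""),
    (6, 0, "male", "no"),
    (7, 0, "female", "no"),
    (8, 0, "male", "no"), (9, 1, "", ""), (10, 2, "", ""),
    (11, 3, "", ""), (12, 4, "", ""),
]
columns = ("id", "day_at_hosp", "sex", "death")
old_table = [dict(zip(columns, record)) for record in records]

for patient in summarize_stays(old_table):
    print(patient)
```

For the CSV version of the file, `csv.DictReader` from the standard library yields rows in this shape, provided the header names match the ones assumed here.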

Answer 2


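Another formula option, without a dummy value at the end.

1] Create the new table as follows:

Copy all of the old table's data and paste it into an unused area
Click "AutoFilter"
In the "day_at_hosp" column, select the value 0
Copy the filtered admission rows and paste them into the new table starting at column F
Delete the 0s from every row of column G

Then,

2] In G2, enter the formula below and copy it down: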

=IF(F2="","",IF(F3="",MATCH(9^9,A:A)+1,MATCH(F3,A:A,0))-MATCH(F2,A:A,0))

Remark: if your id column contains text values, change the formula to:

=IF(F2="","",IF(F3="",MATCH("zzz",A:A)+1,MATCH(F3,A:A,0))-MATCH(F2,A:A,0))
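The row-counting idea behind this formula can also be sketched in Python, since the question carries that tag. This is only an illustration under stated assumptions: the function name `stay_lengths` is hypothetical, the day_at_hosp values are assumed to be already read into a list in file order, and every hospital day is assumed to have exactly one row. The end of the data plays the role of the next admission for the last patient, so no dummy value is needed.

```python
def stay_lengths(days):
    """Return the number of rows in each 0..n cycle of `days`."""
    # Positions where a new patient starts (day_at_hosp == 0).
    starts = [i for i, day in enumerate(days) if day == 0]
    # For the last patient, the end of the data stands in for the next
    # admission row, much like MATCH(9^9, A:A) + 1 in the formula.
    ends = starts[1:] + [len(days)]
    return [end - start for start, end in zip(starts, ends)]


# day_at_hosp values of the first twelve records of the old table.
day_at_hosp = [0, 1, 2, 0, 1, 0, 0, 0, 1, 2, 3, 4]
print(stay_lengths(day_at_hosp))  # [3, 2, 1, 1, 5]
```

Because this version counts rows instead of reading the last day number, it gives a different result from the first approach if any day is missing from a patient's records.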

Tags: python, variables, excel, vba, excel-formula