Power Query - creating a tracker for multiple studies

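I'm new to Power Query and trying to understand how to use it more to optimize my workflows.

Here, I have multiple tables (1 per study), and I'm trying to create a tracker that identifies subjects, whether they are participating in a specific study, and other data (e.g., study ID). Each table has:

  1. A unique subject identifier (which can be used to link them across studies)
  2. A study-specific identifier
  3. Other miscellaneous information (dates of Test 1, Test 2, etc.)

I'm trying to group the columns from all of my tables by the unique subject identifier, and then add columns for "Study 1", "Study 2", "Study 3" with a simple Y/N to show whether they participated in that study. If yes, then also show the study-specific identifier. However, I'm stuck.

I have appended all 3 tables into 1 master table, which results in duplicates, since participants can be in more than one study. When I try the "Group By" feature, I group by "Unique participant identifier" with new columns Study1, Study2, Study3, etc., using the "All Rows" operation. However, when I expand these, it creates multiple duplicate rows, which defeats the purpose of the "Group By" feature.

If you have any suggestions, they would be greatly appreciated.

Example

Each table has some variation of these columns: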

Unique participant identifier | Unique Study identifier | Date of Test 1, etc.
A                             | 100                     | 01-Apr-2022
B                             | 101                     | 02-Apr-2022
C                             | 102                     | 03-Apr-2022

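Let's say Participant A is only on Study 1. Participant B is on Study 2 (Study ID 201) and Study 3 (Study ID 301). Participant C is on all 3 (Study IDs 102, 202, and 302, respectively).

I'm trying to make a table that will show: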

Unique participant identifier | Study 1 | Study 1 Identifier | Study 2 | Study 2 Identifier | Study 3 | Study 3 Identifier
A                             | Y       | 100                |         |                    |         |
B                             | Y       | 101                | Y       | 201                | Y       | 301
C                             | Y       | 102                |         |                    | Y       | 302

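The test dates would sit alongside these (not shown, but same concept). These tables will be updated as we go along, so Power Query would pull this data from the tables to create a "live" tracker.

My current code in the Advanced Editor is: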

let
    Source = Table.Combine({STUDY1, STUDY2, STUDY3}),
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Unique participant identifier"}, {{"STUDY12", each _, type table}, {"STUDY22", each _, type table}, {"STUDY32", each _, type table}}),
    #"Expanded STUDY12" = Table.ExpandTableColumn(#"Grouped Rows", "STUDY12", {"STUDY1-Study ID", "STUDY1-ABC Study ID", "STUDY1"}, {"STUDY12.STUDY1-Study ID", "STUDY12.STUDY1-ABC Study ID", "STUDY12.STUDY1"}),
    #"Expanded STUDY22" = Table.ExpandTableColumn(#"Expanded STUDY12", "STUDY22", {"STUDY2-Study ID", "STUDY2-OC Study ID", "STUDY2"}, {"STUDY22.STUDY2-Study ID", "STUDY22.STUDY2-ABC Study ID", "STUDY22.STUDY2"}),
    #"Expanded STUDY32" = Table.ExpandTableColumn(#"Expanded STUDY22", "STUDY32", {"STUDY3-Study ID", "STUDY3"}, {"STUDY32.STUDY3-Study ID", "STUDY32.STUDY3"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Expanded STUDY32",{" Unique participant identifier ", "STUDY12.STUDY1-Study ID", "STUDY12.STUDY1", "STUDY12.STUDY1-ABC Study ID", "STUDY22.STUDY2", "STUDY22.STUDY2-Study ID", "STUDY22.STUDY2-ABC Study ID", "STUDY32.STUDY3", "STUDY32.STUDY3-Study ID"})
in
    #"Reordered Columns"

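This was done through the "Group By" feature. However, it creates multiple rows for each participant.

Here is one way to do it. Your sample data is not great, and you also didn't mention how the unique study identifiers are divided into buckets. I am assuming it is just the integer part of the number divided by 100 (so 201 falls in study 2). The code is a bit long in order to make the steps obvious.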

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Unique participant identifier", type text}, {"Unique Study identifier", Int64.Type}, {"Date of Test", type date}}),
// there needs to be some way to identify groups of studies. I am using the hundreds place to find that
#"Added Custom" = Table.AddColumn(#"Changed Type", "StudyNumber", each Number.RoundDown([Unique Study identifier]/100)),
// first table -- studies only
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Date of Test"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns", {{"StudyNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns", {{"StudyNumber", type text}}, "en-US")[StudyNumber]), "StudyNumber", "Unique Study identifier", List.Sum),
// second table -- study dates only
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Unique Study identifier"}),
#"Pivoted Column1" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns2", {{"StudyNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns2", {{"StudyNumber", type text}}, "en-US")[StudyNumber]), "StudyNumber", "Date of Test"),
RenamedColumns = Table.TransformColumnNames(#"Pivoted Column1", each if _="Unique participant identifier" then _ else _&" Date"),
#"Removed Columns1" = Table.RemoveColumns(RenamedColumns,{"Unique participant identifier"}),
// combine the tables
combined = Table.FromColumns(Table.ToColumns(#"Pivoted Column")&Table.ToColumns( #"Removed Columns1"),Table.ColumnNames(#"Pivoted Column") &Table.ColumnNames(#"Removed Columns1"))
in combined
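
Note that the last step above pastes the two pivots together by column position, which only lines up if both pivots return the participants in the same row order. Here is a minimal sketch of matching on the participant key instead. It assumes the same three column names and the same hundreds-based bucketing. The inline rows are made up for illustration, and List.Min is my own choice for picking one date per study:

```powerquery
let
    // Hypothetical sample rows in the same shape as Table1 (made up for illustration)
    Source = #table(
        type table [#"Unique participant identifier" = text, #"Unique Study identifier" = number, #"Date of Test" = date],
        {
            {"A", 100, #date(2022, 4, 1)},
            {"B", 101, #date(2022, 4, 2)},
            {"B", 201, #date(2022, 4, 5)},
            {"C", 102, #date(2022, 4, 3)},
            {"C", 302, #date(2022, 4, 9)}
        }
    ),
    Key = "Unique participant identifier",
    // Same bucketing assumption as above: integer part of identifier / 100 = study number
    AddedStudy = Table.AddColumn(Source, "StudyNumber", each Text.From(Number.RoundDown([Unique Study identifier] / 100)), type text),
    Studies = List.Distinct(AddedStudy[StudyNumber]),
    // Pivot 1: study identifiers, one column per study
    Ids = Table.Pivot(Table.RemoveColumns(AddedStudy, {"Date of Test"}), Studies, "StudyNumber", "Unique Study identifier", List.Sum),
    // Pivot 2: test dates; List.Min keeps the earliest date if a study has several rows (my assumption)
    Dates = Table.Pivot(Table.RemoveColumns(AddedStudy, {"Unique Study identifier"}), Studies, "StudyNumber", "Date of Test", List.Min),
    // Rename the date columns so they do not collide with the identifier columns
    DatesRenamed = Table.TransformColumnNames(Dates, each if _ = Key then _ else _ & " Date"),
    // Match rows on the participant key rather than on row position
    Joined = Table.NestedJoin(Ids, {Key}, DatesRenamed, {Key}, "Dates", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Joined, "Dates", List.RemoveItems(Table.ColumnNames(DatesRenamed), {Key}))
in
    Expanded
```

The join costs one extra step, but the result no longer depends on how each pivot happens to order its rows.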

Version of the code with yes/no columns:

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Unique participant identifier", type text}, {"Unique Study identifier", Int64.Type}, {"Date of Test", type date}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "StudyNumber", each Number.RoundDown([Unique Study identifier]/100)),
// first table -- studies only
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Date of Test"}),
#"Rounded Off" = Table.TransformColumns(#"Removed Columns",{{"StudyNumber", each Text.From(_) &" Identifier"}}),
Part1 = Table.Pivot(Table.TransformColumnTypes(#"Rounded Off", {{"StudyNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Rounded Off", {{"StudyNumber", type text}}, "en-US")[StudyNumber]), "StudyNumber", "Unique Study identifier", List.Sum),
// 2nd  table -- YN only
#"Change to Y" = Table.TransformColumns(#"Removed Columns",{{"Unique Study identifier",each "Y" }}),
#"Rounded Off1" = Table.TransformColumns(#"Change to Y",{{"StudyNumber", each "Study " &Text.From(_)}}),
Pivot = Table.Pivot(Table.TransformColumnTypes(#"Rounded Off1", {{"StudyNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Rounded Off1", {{"StudyNumber", type text}}, "en-US")[StudyNumber]), "StudyNumber", "Unique Study identifier"),
Part2 = Table.RemoveColumns(Pivot,{"Unique participant identifier"}),
// 3rd table -- study dates only
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Unique Study identifier"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns2", {{"StudyNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns2", {{"StudyNumber", type text}}, "en-US")[StudyNumber]), "StudyNumber", "Date of Test"),
RenamedColumns3 = Table.TransformColumnNames(#"Pivoted Column", each if _="Unique participant identifier" then _ else _&" Date"),
Part3 = Table.RemoveColumns(RenamedColumns3,{"Unique participant identifier"}),
// combine the tables
combined = Table.FromColumns(Table.ToColumns(Part1)&Table.ToColumns( Part2)&Table.ToColumns( Part3),Table.ColumnNames(Part1) &Table.ColumnNames(Part2)&Table.ColumnNames(Part3))
in combined
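
If the study tables are still separate queries, another option is to skip the divide-by-100 guess and tag each table with its study before appending. This is a rough sketch, not a drop-in replacement. It assumes each table has already been reduced to the participant key plus a single identifier column renamed to "Study ID" (a hypothetical name), and it uses made-up inline rows as stand-ins for STUDY1, STUDY2 and STUDY3:

```powerquery
let
    Key = "Unique participant identifier",
    // Hypothetical stand-ins for the STUDY1, STUDY2 and STUDY3 queries;
    // the "Study ID" column name and the rows are made up for illustration
    MakeStudy = (rows as list) as table => #table({Key, "Study ID"}, rows),
    STUDY1 = MakeStudy({{"A", 100}, {"B", 101}, {"C", 102}}),
    STUDY2 = MakeStudy({{"B", 201}}),
    STUDY3 = MakeStudy({{"B", 301}, {"C", 302}}),
    // Tag every row with the study it came from before appending
    Tag = (t as table, name as text) as table => Table.AddColumn(t, "Study", each name, type text),
    Combined = Table.Combine({Tag(STUDY1, "Study 1"), Tag(STUDY2, "Study 2"), Tag(STUDY3, "Study 3")}),
    StudyNames = List.Distinct(Combined[Study]),
    // One column per study holding that study's identifier (null if not enrolled)
    Ids = Table.Pivot(Combined, StudyNames, "Study", "Study ID", List.Min),
    // Rename the pivoted columns to "<study> Identifier" to free up the plain study names
    Renamed = Table.TransformColumnNames(Ids, each if _ = Key then _ else _ & " Identifier"),
    // Add one Y/N flag column per study, derived from whether the identifier is null
    WithFlags = List.Accumulate(
        StudyNames,
        Renamed,
        (state, s) => Table.AddColumn(state, s, (row) => if Record.Field(row, s & " Identifier") = null then "N" else "Y", type text)
    ),
    // Order: key first, then each study's flag followed by its identifier
    Ordered = Table.ReorderColumns(WithFlags, {Key} & List.Combine(List.Transform(StudyNames, each {_, _ & " Identifier"})))
in
    Ordered
```

Because the study name comes from the tag rather than from the identifier, this should keep working if the identifiers stop following the 1xx/2xx/3xx pattern. Test dates could be handled the same way with a second pivot on the date column.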