Power Query - creating a tracker for multiple studies
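I'm new to Power Query and trying to understand how to use it more to optimise my workflow.
I have multiple tables (1 per study), and I'm trying to create a tracker that identifies subjects, whether they are on a particular study, and other data (e.g., study ID). Each table has:
- A unique subject identifier (can be used to link subjects across studies)
- A study-specific identifier
- Other miscellaneous information (dates of test 1, test 2, etc.)
I am trying to group the columns from all my tables by the unique subject identifier, and then add a simple Y/N column for each of "Study 1", "Study 2", and "Study 3" to show whether the subject is on that study. If yes, the study-specific identifier should be displayed as well. However, I'm stuck.
I have appended all 3 tables into 1 master table. This produces duplicates, because participants can be on multiple studies. When I try the "Group By" function, I group by "Unique participant identifier" into new columns Study1, Study2, Study3, etc., using the "All Rows" operation. However, when I expand these, I get multiple duplicate rows, which defeats the purpose of "Group By".
Any suggestions would be appreciated.
Example
Each table has some variation of these columns: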
Unique participant identifier | Unique Study identifier | Date of Test 1, etc |
---|---|---|
A | 100 | 01-Apr-2022 |
B | 101 | 02-Apr-2022 |
C | 102 | 03-Apr-2022 |
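Say that participant A is only on study 1. Participant B is on study 2 (study ID 201) and study 3 (study ID 301). Participant C is on all 3 (study IDs 102, 202 and 302 respectively).
I am trying to produce a table that shows: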
Unique participant identifier | Study 1 | Study 1 Identifier | Study 2 | Study 2 Identifier | Study 3 | Study 3 Identifier |
---|---|---|---|---|---|---|
A | Y | 100 | | | | |
B | Y | 101 | Y | 201 | Y | 301 |
C | Y | 102 | | | Y | 302 |
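The dates of the tests would sit alongside these columns (not shown, but the same concept). The tables will be updated as we go along, so Power Query will pull the data from them to create a "live" tracker.
My current code in the Advanced Editor is: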
let
Source = Table.Combine({STUDY1, STUDY2, STUDY3}),
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Unique participant identifier"}, {{"STUDY12", each _, type table}, {"STUDY22", each _, type table}, {"STUDY32", each _, type table}}),
#"Expanded STUDY12" = Table.ExpandTableColumn(#"Grouped Rows", "STUDY12", {"STUDY1-Study ID", "STUDY1-ABC Study ID", "STUDY1"}, {"STUDY12.STUDY1-Study ID", "STUDY12.STUDY1-ABC Study ID", "STUDY12.STUDY1"}),
#"Expanded STUDY22" = Table.ExpandTableColumn(#"Expanded STUDY12", "STUDY22", {"STUDY2-Study ID", "STUDY2-OC Study ID", "STUDY2"}, {"STUDY22.STUDY2-Study ID", "STUDY22.STUDY2-ABC Study ID", "STUDY22.STUDY2"}),
#"Expanded STUDY32" = Table.ExpandTableColumn(#"Expanded STUDY22", "STUDY32", {"STUDY3-Study ID", "STUDY3"}, {"STUDY32.STUDY3-Study ID", "STUDY32.STUDY3"}),
#"Reordered Columns" = Table.ReorderColumns(#"Expanded STUDY32",{" Unique participant identifier ", "STUDY12.STUDY1-Study ID", "STUDY12.STUDY1", "STUDY12.STUDY1-ABC Study ID", "STUDY22.STUDY2", "STUDY22.STUDY2-Study ID", "STUDY22.STUDY2-ABC Study ID", "STUDY32.STUDY3", "STUDY32.STUDY3-Study ID"})
in
#"Reordered Columns"
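This was done via the "Group By" function. It creates multiple rows per participant, though.
Here's one way. Your sample data isn't great, and you don't say how the Unique Study identifier should be divided into buckets. I assume the study number is just the integer part of the identifier divided by 100. The code is a bit long so that each step is obvious: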
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Unique participant identifier", type text}, {"Unique Study identifier", Int64.Type}, {"Date of Test", type date}}),
// there needs to be some way to identify groups of studies. I am using the hundreds place to find that
#"Added Custom" = Table.AddColumn(#"Changed Type", "StudyNumber", each Number.RoundDown([Unique Study identifier]/100)),
// first table -- studies only
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Date of Test"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns", {{"StudyNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns", {{"StudyNumber", type text}}, "en-US")[StudyNumber]), "StudyNumber", "Unique Study identifier", List.Sum),
// second table -- study dates only
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Unique Study identifier"}),
#"Pivoted Column1" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns2", {{"StudyNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns2", {{"StudyNumber", type text}}, "en-US")[StudyNumber]), "StudyNumber", "Date of Test"),
RenamedColumns = Table.TransformColumnNames(#"Pivoted Column1", each if _="Unique participant identifier" then _ else _&" Date"),
#"Removed Columns1" = Table.RemoveColumns(RenamedColumns,{"Unique participant identifier"}),
// combine the tables
combined = Table.FromColumns(Table.ToColumns(#"Pivoted Column")&Table.ToColumns( #"Removed Columns1"),Table.ColumnNames(#"Pivoted Column") &Table.ColumnNames(#"Removed Columns1"))
in combined
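To try the bucketing assumption without a workbook table, here is a minimal sketch that can be pasted into a blank query. The inline `#table` rows are hypothetical stand-ins taken from the desired output in the question, and the step names `AddedStudy` and `Pivoted` are illustrative only. It shows just the identifier pivot, not the dates.

```powerquery
let
    // Hypothetical inline sample: the three study tables already appended into one
    Source = #table(
        type table [#"Unique participant identifier" = text, #"Unique Study identifier" = Int64.Type],
        {
            {"A", 100},
            {"B", 101}, {"B", 201}, {"B", 301},
            {"C", 102}, {"C", 202}, {"C", 302}
        }
    ),
    // Assumption from the answer above: the integer part of ID / 100 is the study number
    AddedStudy = Table.AddColumn(
        Source,
        "StudyNumber",
        each "Study " & Text.From(Number.RoundDown([Unique Study identifier] / 100)),
        type text
    ),
    // Pivot so that each participant gets one row and each study gets one column
    Pivoted = Table.Pivot(
        AddedStudy,
        List.Distinct(AddedStudy[StudyNumber]),
        "StudyNumber",
        "Unique Study identifier",
        List.Sum
    )
in
    Pivoted
```

Because the pivot collapses the appended rows on the participant column, each participant ends up on a single row, with null wherever they are not on a study. That is the step the "Group By" plus expand attempt was missing.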
Version of the code with the yes/no columns:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Unique participant identifier", type text}, {"Unique Study identifier", Int64.Type}, {"Date of Test", type date}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "StudyNumber", each Number.RoundDown([Unique Study identifier]/100)),
// first table -- studies only
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Date of Test"}),
#"Rounded Off" = Table.TransformColumns(#"Removed Columns",{{"StudyNumber", each Text.From(_) &" Identifier"}}),
Part1 = Table.Pivot(Table.TransformColumnTypes(#"Rounded Off", {{"StudyNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Rounded Off", {{"StudyNumber", type text}}, "en-US")[StudyNumber]), "StudyNumber", "Unique Study identifier", List.Sum),
// 2nd table -- YN only
#"Change to Y" = Table.TransformColumns(#"Removed Columns",{{"Unique Study identifier",each "Y" }}),
#"Rounded Off1" = Table.TransformColumns(#"Change to Y",{{"StudyNumber", each "Study " &Text.From(_)}}),
Pivot = Table.Pivot(Table.TransformColumnTypes(#"Rounded Off1", {{"StudyNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Rounded Off1", {{"StudyNumber", type text}}, "en-US")[StudyNumber]), "StudyNumber", "Unique Study identifier"),
Part2 = Table.RemoveColumns(Pivot,{"Unique participant identifier"}),
// 3rd table -- study dates only
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Unique Study identifier"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns2", {{"StudyNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns2", {{"StudyNumber", type text}}, "en-US")[StudyNumber]), "StudyNumber", "Date of Test"),
RenamedColumns3 = Table.TransformColumnNames(#"Pivoted Column", each if _="Unique participant identifier" then _ else _&" Date"),
Part3 = Table.RemoveColumns(RenamedColumns3,{"Unique participant identifier"}),
// combine the tables
combined = Table.FromColumns(Table.ToColumns(Part1)&Table.ToColumns( Part2)&Table.ToColumns( Part3),Table.ColumnNames(Part1) &Table.ColumnNames(Part2)&Table.ColumnNames(Part3))
in combined
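If the study number cannot be derived from the identifier, one possible variation is to tag each study table with a label before appending, then pivot and join on the participant column. This is only a sketch under assumptions: `Study1`, `Study2` and `Study3` are hypothetical inline stand-ins for the question's STUDY1, STUDY2 and STUDY3 queries, every table is assumed to use the same two column names, and the helper functions `MakeStudy` and `Tag` are made up for illustration.

```powerquery
let
    // Hypothetical stand-ins for the per-study queries; real tables would be referenced instead
    MakeStudy = (rows as list) as table =>
        #table({"Unique participant identifier", "Unique Study identifier"}, rows),
    Study1 = MakeStudy({{"A", 100}, {"B", 101}, {"C", 102}}),
    Study2 = MakeStudy({{"B", 201}, {"C", 202}}),
    Study3 = MakeStudy({{"B", 301}, {"C", 302}}),

    // Tag each table with its study label before appending, so nothing is inferred from the ID
    Tag = (t as table, label as text) as table =>
        Table.AddColumn(t, "Study", each label, type text),
    Appended = Table.Combine({Tag(Study1, "Study 1"), Tag(Study2, "Study 2"), Tag(Study3, "Study 3")}),
    Labels = List.Distinct(Appended[Study]),

    // Identifier columns: "Study 1 Identifier", "Study 2 Identifier", ...
    IdRows = Table.TransformColumns(Appended, {{"Study", each _ & " Identifier", type text}}),
    Ids = Table.Pivot(IdRows, List.Distinct(IdRows[Study]), "Study", "Unique Study identifier", List.Max),

    // Flag columns: "Study 1", "Study 2", ... holding "Y" where the participant is on the study
    FlagRows = Table.TransformColumns(Appended, {{"Unique Study identifier", each "Y", type text}}),
    Flags = Table.Pivot(FlagRows, Labels, "Study", "Unique Study identifier", List.Max),

    // Join the two pivots on the participant key instead of relying on matching row order
    Joined = Table.NestedJoin(
        Flags, {"Unique participant identifier"},
        Ids, {"Unique participant identifier"},
        "Ids", JoinKind.LeftOuter
    ),
    Expanded = Table.ExpandTableColumn(
        Joined, "Ids",
        List.RemoveItems(Table.ColumnNames(Ids), {"Unique participant identifier"})
    ),

    // Interleave flag and identifier columns, then show "N" where a flag is missing
    Ordered = Table.ReorderColumns(
        Expanded,
        {"Unique participant identifier"} & List.Combine(List.Transform(Labels, each {_, _ & " Identifier"}))
    ),
    Result = Table.ReplaceValue(Ordered, null, "N", Replacer.ReplaceValue, Labels)
in
    Result
```

Joining on the participant column is a design choice rather than a requirement. The answer's `Table.FromColumns` approach works as long as every pivot returns its rows in the same order, whereas a keyed join does not depend on that. Date columns could be added with a third pivot in the same way.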