How to remove unwanted row values and then merge rows based on a key in ADF/Azure SQL DB?
I have a table that looks like this:
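ID | Supplier Number | Supplier Name | Address | Postcode | City | State | First Name | Last Name |
---|---|---|---|---|---|---|---|---|
1 | 13 | Example.com | Uwanted Data | Unwanted Data | | | | |
1 | | | 15 Example St | 9999 | Brisbane | QLD | | |
1 | Unwanted Data | Uwanted Data | | | | | John | Doe |
2 | 16 | New Example Services | Uwanted Data | Unwanted Data | | | | |
2 | | | 15 Test Drive | 6789 | Melbourne | VIC | | |
2 | Unwanted Data | Uwanted Data | | | | | Jane | Test |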
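Here "Unwanted Data" stands for any value that is not needed in the final dataset.
What I want to do is remove the "Unwanted Data" values and then merge the rows that share the same key, so that we end up with the following table: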
ID | Supplier Number | Supplier Name | Address | Postcode | City | State | First Name | Last Name |
---|---|---|---|---|---|---|---|---|
1 | 13 | Example.com | 15 Example St | 9999 | Brisbane | QLD | John | Doe |
2 | 16 | New Example Services | 15 Test Drive | 6789 | Melbourne | VIC | Jane | Test |
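Is there a way to do this in Azure Data Factory? Alternatively, I can connect to the Azure SQL database and run whatever SQL commands would get the job done.
Many thanks.
Edit: In some cases the unwanted data may have a similar type or value to the wanted value in the same column. For example, in the First Name column a particular ID might have both Joe and John. However, the wanted values are always in exactly the same position relative to each ID. That is, the wanted first name is in the third row for each ID.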
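As @Nick.McDermaid and @Anand Sowmithiran mentioned in the comments, once you have identified the unwanted data you can replace the Unwanted Data/Uwanted Data values in each column with a blank or NULL and apply the max() function to get the aggregated value.
You can write the query directly in Azure SQL Database to get the expected result, like this: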
    -- Blank out both spellings of the placeholder, then let MAX() keep the remaining value per ID.
    SELECT ID,
           MAX(CASE WHEN [Supplier Number] IN ('Uwanted Data', 'Unwanted Data') THEN '' ELSE [Supplier Number] END) AS [Supplier Number],
           MAX(CASE WHEN [Supplier Name] IN ('Uwanted Data', 'Unwanted Data') THEN '' ELSE [Supplier Name] END) AS [Supplier Name],
           MAX(CASE WHEN [Address] IN ('Uwanted Data', 'Unwanted Data') THEN '' ELSE [Address] END) AS [Address],
           MAX(CASE WHEN [Postcode] IN ('Uwanted Data', 'Unwanted Data') THEN '' ELSE [Postcode] END) AS [Postcode],
           MAX(CASE WHEN [City] IN ('Uwanted Data', 'Unwanted Data') THEN '' ELSE [City] END) AS [City],
           MAX(CASE WHEN [State] IN ('Uwanted Data', 'Unwanted Data') THEN '' ELSE [State] END) AS [State],
           MAX(CASE WHEN [First Name] IN ('Uwanted Data', 'Unwanted Data') THEN '' ELSE [First Name] END) AS [First Name],
           MAX(CASE WHEN [Last Name] IN ('Uwanted Data', 'Unwanted Data') THEN '' ELSE [Last Name] END) AS [Last Name]
    FROM tb1
    GROUP BY ID;
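To try this aggregation without an Azure SQL instance, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database from Python and runs the same MAX(CASE ...) pattern. SQLite is only a stand-in (it happens to accept the [bracketed] identifiers). The helper name `merge_rows`, the `SAMPLE` list, and the exact columns in which the placeholder values sit are assumptions made for illustration.

```python
import sqlite3

COLUMNS = ["Supplier Number", "Supplier Name", "Address", "Postcode",
           "City", "State", "First Name", "Last Name"]

# Sample rows from the question (ID first, then the eight columns above).
# Which columns hold the placeholder text is an assumption.
SAMPLE = [
    (1, "13", "Example.com", "Uwanted Data", "Unwanted Data", None, None, None, None),
    (1, None, None, "15 Example St", "9999", "Brisbane", "QLD", None, None),
    (1, "Unwanted Data", "Uwanted Data", None, None, None, None, "John", "Doe"),
    (2, "16", "New Example Services", "Uwanted Data", "Unwanted Data", None, None, None, None),
    (2, None, None, "15 Test Drive", "6789", "Melbourne", "VIC", None, None),
    (2, "Unwanted Data", "Uwanted Data", None, None, None, None, "Jane", "Test"),
]


def merge_rows(rows):
    """Blank out the placeholder values and collapse the rows to one per ID."""
    conn = sqlite3.connect(":memory:")
    ddl = ", ".join(f"[{c}] TEXT" for c in COLUMNS)
    conn.execute(f"CREATE TABLE tb1 (ID INTEGER, {ddl})")
    conn.executemany(f"INSERT INTO tb1 VALUES ({', '.join('?' * 9)})", rows)

    # One MAX(CASE ...) expression per column, mirroring the query in the answer.
    select_list = ",\n".join(
        f"MAX(CASE WHEN [{c}] IN ('Uwanted Data', 'Unwanted Data') "
        f"THEN '' ELSE [{c}] END) AS [{c}]"
        for c in COLUMNS
    )
    merged = conn.execute(
        f"SELECT ID, {select_list} FROM tb1 GROUP BY ID ORDER BY ID"
    ).fetchall()
    conn.close()
    return merged


for row in merge_rows(SAMPLE):
    print(row)
```

MAX() skips NULLs, and an empty string sorts below any non-empty text, so the single real value in each column is the one that survives the grouping.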
- You can use the same query in the source of an Azure Data Factory Copy activity by choosing the Query option.
This gives the expected result only when each column holds a single valid value per ID. If a column holds more than one valid value for an ID, max() returns only the largest of them, which may not be the value you want.
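For the case described in the question's edit, where an unwanted value can look like a valid one (Joe versus John) but the wanted value always sits in the same row position for each ID, one possible variation picks values by row number instead of by content. This is a sketch under assumptions, not a tested Azure SQL solution. It presumes the table has a column that preserves the original row order (called `RowSeq` here, a hypothetical name), because rows in a SQL table have no inherent order. The column-to-row mapping in `WANTED_ROW` is inferred from the sample data, and the stray "Joe" value is added only to illustrate the edit. SQLite from Python again serves as a stand-in; `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` is also available in Azure SQL.

```python
import sqlite3

# Which row (1-based, per ID) carries the wanted value for each column.
# Inferred from the sample data; adjust to the real layout.
WANTED_ROW = {
    "Supplier Number": 1, "Supplier Name": 1,
    "Address": 2, "Postcode": 2, "City": 2, "State": 2,
    "First Name": 3, "Last Name": 3,
}

# (RowSeq, ID, then the eight columns). RowSeq is a hypothetical ordering column.
SAMPLE = [
    (1, 1, "13", "Example.com", "Uwanted Data", "Unwanted Data", None, None, "Joe", None),
    (2, 1, None, None, "15 Example St", "9999", "Brisbane", "QLD", None, None),
    (3, 1, "Unwanted Data", "Uwanted Data", None, None, None, None, "John", "Doe"),
    (4, 2, "16", "New Example Services", "Uwanted Data", "Unwanted Data", None, None, None, None),
    (5, 2, None, None, "15 Test Drive", "6789", "Melbourne", "VIC", None, None),
    (6, 2, "Unwanted Data", "Uwanted Data", None, None, None, None, "Jane", "Test"),
]


def merge_by_position(rows):
    """Keep, for each column, the value found at its expected row position per ID."""
    conn = sqlite3.connect(":memory:")
    ddl = ", ".join(f"[{c}] TEXT" for c in WANTED_ROW)
    conn.execute(f"CREATE TABLE tb1 (RowSeq INTEGER, ID INTEGER, {ddl})")
    conn.executemany(f"INSERT INTO tb1 VALUES ({', '.join('?' * 10)})", rows)

    # For each column, only the row at the expected position contributes a value.
    picks = ",\n".join(
        f"MAX(CASE WHEN rn = {n} THEN [{c}] END) AS [{c}]"
        for c, n in WANTED_ROW.items()
    )
    # Window functions need SQLite 3.25 or later.
    sql = f"""
        WITH numbered AS (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RowSeq) AS rn
            FROM tb1
        )
        SELECT ID, {picks}
        FROM numbered
        GROUP BY ID
        ORDER BY ID
    """
    merged = conn.execute(sql).fetchall()
    conn.close()
    return merged


for row in merge_by_position(SAMPLE):
    print(row)
```

Because the selection depends only on position, the "Joe" in the first row of ID 1 is ignored even though it does not match either placeholder spelling. The trade-off is that the result is only as reliable as the ordering column.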