Break down a table to pivot in columns (SQL, PySpark)
I'm working in a PySpark environment with Python 3.6 on AWS Glue. I have this table:
+----+-----+-----+-----+
|year|month|total| loop|
+----+-----+-----+-----+
|2012| 1| 20|loop1|
|2012| 2| 30|loop1|
|2012| 1| 10|loop2|
|2012| 2| 5|loop2|
|2012| 1| 50|loop3|
|2012| 2| 60|loop3|
+----+-----+-----+-----+
I need to get output like this:
year month total_loop1 total_loop2 total_loop3
2012 1 20 10 50
2012 2 30 5 60
The closest I've gotten is this SQL:
select a.year,a.month, a.total,b.total from test a
left join test b
on a.loop <> b.loop
and a.year = b.year and a.month=b.month
But so far this is the output I still get:
+----+-----+-----+-----+
|year|month|total|total|
+----+-----+-----+-----+
|2012| 1| 20| 10|
|2012| 1| 20| 50|
|2012| 1| 10| 20|
|2012| 1| 10| 50|
|2012| 1| 50| 20|
|2012| 1| 50| 10|
|2012| 2| 30| 5|
|2012| 2| 30| 60|
|2012| 2| 5| 30|
|2012| 2| 5| 60|
|2012| 2| 60| 30|
|2012| 2| 60| 5|
+----+-----+-----+-----+
How can I do this? Thanks a lot.
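You don't need a join. You can use conditional aggregation, which groups by year and month and picks out one total per loop value: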
select year, month,
max(case when loop = 'loop1' then total end) loop1,
max(case when loop = 'loop2' then total end) loop2,
max(case when loop = 'loop3' then total end) loop3
from test a
group by year, month;
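To check the conditional aggregation against the sample data without a Spark cluster, here is a minimal sketch that runs the same query through Python's built-in `sqlite3` module. The in-memory database, the `ROWS` constant, the function name, and the `total_loopN` aliases are assumptions added for illustration; only the table name `test` and the column names come from the question.

```python
import sqlite3

# Sample rows from the question: (year, month, total, loop)
ROWS = [
    (2012, 1, 20, "loop1"),
    (2012, 2, 30, "loop1"),
    (2012, 1, 10, "loop2"),
    (2012, 2, 5, "loop2"),
    (2012, 1, 50, "loop3"),
    (2012, 2, 60, "loop3"),
]

# Same conditional aggregation as above, with the aliases the question asks for.
QUERY = """
SELECT year, month,
       MAX(CASE WHEN loop = 'loop1' THEN total END) AS total_loop1,
       MAX(CASE WHEN loop = 'loop2' THEN total END) AS total_loop2,
       MAX(CASE WHEN loop = 'loop3' THEN total END) AS total_loop3
FROM test
GROUP BY year, month
ORDER BY year, month
"""


def pivot_with_conditional_aggregation(rows):
    """Load rows into an in-memory table named test and return the pivoted rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE test (year INTEGER, month INTEGER, total INTEGER, loop TEXT)"
        )
        conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in pivot_with_conditional_aggregation(ROWS):
        print(row)
```

Each `CASE` expression yields NULL for rows belonging to other loops, and `MAX` ignores NULLs, so every (year, month) group collapses to a single row. In PySpark itself, the same statement can be passed to `spark.sql` after registering the DataFrame as a temporary view, or the DataFrame API's `groupBy("year", "month").pivot("loop").agg(...)` can be used instead.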
You can use PIVOT to turn rows into columns. Note that the bracketed identifiers below are SQL Server (T-SQL) syntax:
SELECT
year,
month,
p.loop1 AS total_loop1,
p.loop2 AS total_loop2,
p.loop3 AS total_loop3
FROM
tablename
PIVOT
(MAX(total)
FOR loop IN ([loop1], [loop2], [loop3])
) AS p;
Table script and sample data
CREATE TABLE [TableName](
[year] [nvarchar](50) NULL,
[month] [int] NULL,
[total] [int] NULL,
[loop] [nvarchar](50) NULL
)
INSERT [TableName] ([year], [month], [total], [loop]) VALUES (N'2012', 1, 20, N'loop1')
INSERT [TableName] ([year], [month], [total], [loop]) VALUES (N'2012', 2, 30, N'loop1')
INSERT [TableName] ([year], [month], [total], [loop]) VALUES (N'2012', 1, 10, N'loop2')
INSERT [TableName] ([year], [month], [total], [loop]) VALUES (N'2012', 2, 5, N'loop2')
INSERT [TableName] ([year], [month], [total], [loop]) VALUES (N'2012', 1, 50, N'loop3')
INSERT [TableName] ([year], [month], [total], [loop]) VALUES (N'2012', 2, 60, N'loop3')
Using the PIVOT operator:
SELECT *
FROM TableName
PIVOT(Max([total])
FOR [loop] IN ([loop1], [loop2], [loop3]) ) pvt
Online demo: http://www.sqlfiddle.com/#!18/164a4/1/0
If you're looking for a dynamic solution, where the loop values are not known in advance, try this (dynamic pivot):
DECLARE @cols AS NVARCHAR(max) = Stuff((SELECT DISTINCT ',' + Quotename([loop])
FROM TableName
FOR xml path(''), type).value('.', 'NVARCHAR(MAX)'), 1, 1, '');
DECLARE @query AS NVARCHAR(max) = 'SELECT *
FROM TableName
PIVOT(Max([total])
FOR [loop] IN ('+ @cols +') ) pvt';
EXECUTE(@query)
Online demo: http://www.sqlfiddle.com/#!18/164a4/3/0
Output
+------+-------+-------+-------+-------+
| year | month | loop1 | loop2 | loop3 |
+------+-------+-------+-------+-------+
| 2012 | 1 | 20 | 10 | 50 |
| 2012 | 2 | 30 | 5 | 60 |
+------+-------+-------+-------+-------+
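The dynamic pivot above relies on T-SQL features (`QUOTENAME`, `FOR XML PATH`, `EXECUTE`). As a rough, engine-neutral sketch of the same idea, the column list can be built in Python from the distinct `loop` values and fed into a conditional aggregation. It uses the standard-library `sqlite3` module purely so it can run anywhere; `make_sample_connection`, `quote_identifier`, and `dynamic_pivot` are hypothetical helper names, and the sketch assumes the `loop` values are non-NULL strings.

```python
import sqlite3

# Sample rows from the question: (year, month, total, loop)
SAMPLE_ROWS = [
    (2012, 1, 20, "loop1"),
    (2012, 2, 30, "loop1"),
    (2012, 1, 10, "loop2"),
    (2012, 2, 5, "loop2"),
    (2012, 1, 50, "loop3"),
    (2012, 2, 60, "loop3"),
]


def make_sample_connection():
    """Create an in-memory SQLite table named test holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (year INTEGER, month INTEGER, total INTEGER, loop TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def quote_identifier(name):
    """Quote a column alias so an unusual loop value cannot break the SQL."""
    return '"' + name.replace('"', '""') + '"'


def dynamic_pivot(conn):
    """Pivot total by loop without hard-coding the loop values."""
    # Step 1: discover the pivot values (assumed to be non-NULL strings).
    loops = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT loop FROM test WHERE loop IS NOT NULL ORDER BY loop"
        )
    ]
    # Step 2: one conditional aggregate per value; the value itself is bound
    # as a parameter, only the quoted alias is spliced into the SQL text.
    select_list = ["year", "month"] + [
        f"MAX(CASE WHEN loop = ? THEN total END) AS {quote_identifier('total_' + value)}"
        for value in loops
    ]
    sql = (
        f"SELECT {', '.join(select_list)} FROM test "
        "GROUP BY year, month ORDER BY year, month"
    )
    cursor = conn.execute(sql, loops)
    header = [column[0] for column in cursor.description]
    return header, cursor.fetchall()


if __name__ == "__main__":
    header, rows = dynamic_pivot(make_sample_connection())
    print(header)
    for row in rows:
        print(row)
```

Binding the loop values as parameters and quoting the aliases plays the role that `QUOTENAME` plays in the T-SQL version. In PySpark, calling `pivot("loop")` without an explicit value list should give a comparable dynamic result, at the cost of an extra pass over the data to find the distinct values.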