How to add multiple new columns with when condition in pyspark dataframe?
I need to add two new columns to my existing pyspark dataframe.
Below is my sample data:
Section  Grade  Promotion_grade  Section_team
Admin    C
Account  B
IT       B
Conditions:
If Section = Admin then Promotion_grade = B
If Section = Account then Promotion_grade = A
If Section = IT then
    If Grade = C then Promotion_grade = B & Section_team = team1
    If Grade = D then Promotion_grade = C & Section_team = team2
    If Grade = A then Promotion_grade = A+ & Section_team = team3
I can add a column for the first two conditions, but I don't know how to handle the remaining ones.
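from pyspark.sql import functions as F

def addCols(data):
    # Covers only the Admin and Account rules; every other row gets the default
    data = (data.withColumn('Promotion_grade', F.when(data.Section == 'Admin', 'B')
                                                .when(data.Section == 'Account', 'A')
                                                .otherwise('Not applicable')))
    return data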
Can someone help me? Maybe I'm going about this the wrong way. Thanks.
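You can nest when expressions to handle the nested conditions. A when chain is itself a Column, so it can be passed as the value of an outer when.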
Working example
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Reuse the active session (already available as `spark` in the pyspark shell)
spark = SparkSession.builder.getOrCreate()

data = [("Admin", "C", ),
        ("Account", "B", ),
        ("IT", "B", ),
        ("IT", "C", ),
        ("IT", "D", ),
        ("IT", "A", ),]
df = spark.createDataFrame(data, ("Section", "Grade", ))

# Define Promotion Grade conditions for IT Section
it_promotion_grade = (F.when(F.col("Grade") == "C", "B")
                       .when(F.col("Grade") == "D", "C")
                       .when(F.col("Grade") == "A", "A+")
                       .otherwise("Not applicable"))

# Define Section Team conditions for IT Section
it_section_team = (F.when(F.col("Grade") == "C", "team1")
                    .when(F.col("Grade") == "D", "team2")
                    .when(F.col("Grade") == "A", "team3")
                    .otherwise("Not applicable"))

(df.withColumn("Promotion_grade", F.when(F.col("Section") == "Admin", "B")
                                   .when(F.col("Section") == "Account", "A")
                                   .when(F.col("Section") == "IT", it_promotion_grade)
                                   .otherwise("Not applicable"))
   .withColumn("Section_team", F.when(F.col("Section") == "IT", it_section_team)
                                .otherwise("Not applicable"))
   .show())
Output
+-------+-----+---------------+--------------+
|Section|Grade|Promotion_grade|  Section_team|
+-------+-----+---------------+--------------+
|  Admin|    C|              B|Not applicable|
|Account|    B|              A|Not applicable|
|     IT|    B| Not applicable|Not applicable|
|     IT|    C|              B|         team1|
|     IT|    D|              C|         team2|
|     IT|    A|             A+|         team3|
+-------+-----+---------------+--------------+
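If you want to double-check the table above without starting Spark, the same rules can be restated in plain Python. This is only a rough sketch for sanity checks or unit tests, not a replacement for the when expressions; the names NOT_APPLICABLE, IT_RULES and expected_row are made up for illustration and do not come from PySpark.

```python
# Hypothetical reference helper: plain-Python version of the rules, no Spark needed.
NOT_APPLICABLE = "Not applicable"

# Grade -> (Promotion_grade, Section_team), used for the IT section only.
IT_RULES = {
    "C": ("B", "team1"),
    "D": ("C", "team2"),
    "A": ("A+", "team3"),
}


def expected_row(section, grade):
    """Return (Promotion_grade, Section_team) for one input row."""
    if section == "Admin":
        return ("B", NOT_APPLICABLE)
    if section == "Account":
        return ("A", NOT_APPLICABLE)
    if section == "IT":
        # IT grades without a rule get the default, mirroring .otherwise()
        return IT_RULES.get(grade, (NOT_APPLICABLE, NOT_APPLICABLE))
    return (NOT_APPLICABLE, NOT_APPLICABLE)


# Print the expected values for a few of the sample rows
for section, grade in [("Admin", "C"), ("IT", "B"), ("IT", "A")]:
    print(section, grade, expected_row(section, grade))
```

Keeping the IT rules in one dictionary means Promotion_grade and Section_team for a given grade are defined side by side, which makes a mismatch between the two Spark when chains easier to spot.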