How to replace NULL values with the mean value of a category in SQL?
I have a dataset that contains null values in the 'revenues_from_appointment' column.

Dataset:

appointment_date | patient_id | practitioner_id | appointment_duration_min | revenues_from_appointment |
---|---|---|---|---|
2021-06-28 | 42734 | 748 | 30 | 90.0 |
2021-06-29 | 42737 | 747 | 60 | 150.0 |
2021-07-01 | 42737 | 747 | 60 | NaN |
2021-07-03 | 42736 | 748 | 30 | 60.0 |
2021-07-03 | 42735 | 747 | 15 | 42.62 |
2021-07-04 | 42734 | 748 | 30 | NaN |
2021-07-05 | 42734 | 748 | 30 | 100.0 |
2021-07-10 | 42738 | 747 | 15 | 50.72 |
2021-08-12 | 42739 | 748 | 30 | 73.43 |

I want to replace each NULL value with the mean of the rows that share the same patient_id, practitioner_id, and appointment_duration_min.
With a pandas DataFrame, I use:
df['revenues_from_appointment'] = df['revenues_from_appointment'].fillna(df.groupby(['patient_id','practitioner_id','appointment_duration_min'])['revenues_from_appointment'].transform('mean'))
How can I get the same result using SQL?

Final output:

appointment_date | patient_id | practitioner_id | appointment_duration_min | revenues_from_appointment |
---|---|---|---|---|
2021-06-28 | 42734 | 748 | 30 | 90.0 |
2021-06-29 | 42737 | 747 | 60 | 150.0 |
2021-07-01 | 42737 | 747 | 60 | 150.0 |
2021-07-03 | 42736 | 748 | 30 | 60.0 |
2021-07-03 | 42735 | 747 | 15 | 42.62 |
2021-07-04 | 42734 | 748 | 30 | 95.0 |
2021-07-05 | 42734 | 748 | 30 | 100.0 |
2021-07-10 | 42738 | 747 | 15 | 50.72 |
2021-08-12 | 42739 | 748 | 30 | 73.43 |

You can use the AVG window function, partitioned by the three columns of interest, together with the COALESCE function to replace the null values:
SELECT appointment_date,
       patient_id,
       practitioner_id,
       appointment_duration_min,
       COALESCE(revenues_from_appointment,
                AVG(revenues_from_appointment) OVER (PARTITION BY patient_id,
                                                                  practitioner_id,
                                                                  appointment_duration_min)
       ) AS revenues_from_appointment
FROM tab
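
To check the query against the sample data, here is a minimal sketch that runs it from Python's built-in `sqlite3` module. The table name `tab` comes from the query above. The column types, the in-memory database, and the added `ORDER BY` are assumptions made for illustration, and the bundled SQLite has to be version 3.25 or newer for window functions to work.

```python
import sqlite3

# Sample rows copied from the question; None is stored as SQL NULL.
rows = [
    ("2021-06-28", 42734, 748, 30, 90.0),
    ("2021-06-29", 42737, 747, 60, 150.0),
    ("2021-07-01", 42737, 747, 60, None),
    ("2021-07-03", 42736, 748, 30, 60.0),
    ("2021-07-03", 42735, 747, 15, 42.62),
    ("2021-07-04", 42734, 748, 30, None),
    ("2021-07-05", 42734, 748, 30, 100.0),
    ("2021-07-10", 42738, 747, 15, 50.72),
    ("2021-08-12", 42739, 748, 30, 73.43),
]

conn = sqlite3.connect(":memory:")
# Column types are an assumption; the question does not show the schema.
conn.execute(
    """
    CREATE TABLE tab (
        appointment_date          TEXT,
        patient_id                INTEGER,
        practitioner_id           INTEGER,
        appointment_duration_min  INTEGER,
        revenues_from_appointment REAL
    )
    """
)
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", rows)

# Same COALESCE + AVG window expression as in the answer; the ORDER BY is
# added only to make the printed order deterministic.
query = """
    SELECT appointment_date,
           patient_id,
           practitioner_id,
           appointment_duration_min,
           COALESCE(revenues_from_appointment,
                    AVG(revenues_from_appointment) OVER (
                        PARTITION BY patient_id,
                                     practitioner_id,
                                     appointment_duration_min
                    )) AS revenues_from_appointment
    FROM tab
    ORDER BY appointment_date, patient_id
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

AVG skips NULLs, so each missing value is filled from the non-NULL rows of its own partition. A partition with no non-NULL value at all still yields NULL, which matches how the pandas version leaves NaN in a group whose values are all missing.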
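
The SELECT only returns a filled result set, whereas the pandas snippet in the question overwrites the column. If the stored values should change as well, one possible approach is an UPDATE with a correlated subquery. This is a sketch of a swapped-in technique, not part of the original answer. It again uses `sqlite3`, an assumed schema, and only the question's rows for the two patients that have a missing value.

```python
import sqlite3

# Subset of the question's rows: the two groups that contain a NULL.
rows = [
    ("2021-06-28", 42734, 748, 30, 90.0),
    ("2021-06-29", 42737, 747, 60, 150.0),
    ("2021-07-01", 42737, 747, 60, None),
    ("2021-07-04", 42734, 748, 30, None),
    ("2021-07-05", 42734, 748, 30, 100.0),
]

conn = sqlite3.connect(":memory:")
# Column types are an assumption; the question does not show the schema.
conn.execute(
    """
    CREATE TABLE tab (
        appointment_date          TEXT,
        patient_id                INTEGER,
        practitioner_id           INTEGER,
        appointment_duration_min  INTEGER,
        revenues_from_appointment REAL
    )
    """
)
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", rows)

# For each row with a NULL revenue, store the average of the rows that
# share the same three key columns (AVG ignores NULLs).
conn.execute(
    """
    UPDATE tab
    SET revenues_from_appointment = (
        SELECT AVG(t2.revenues_from_appointment)
        FROM tab AS t2
        WHERE t2.patient_id = tab.patient_id
          AND t2.practitioner_id = tab.practitioner_id
          AND t2.appointment_duration_min = tab.appointment_duration_min
    )
    WHERE revenues_from_appointment IS NULL
    """
)
conn.commit()

remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM tab WHERE revenues_from_appointment IS NULL"
).fetchone()[0]
print("rows still NULL:", remaining_nulls)
```

Other database engines may need different syntax to reference the updated table inside the subquery, so treat the statement as SQLite-specific until it has been tested elsewhere.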