How to replace NULL values with the mean value of a category in SQL?

I have a dataset with NULL values in the 'revenues_from_appointment' column.

Dataset:

appointment_date  patient_id  practitioner_id  appointment_duration_min  revenues_from_appointment
2021-06-28        42734       748              30                        90.0
2021-06-29        42737       747              60                        150.0
2021-07-01        42737       747              60                        NaN
2021-07-03        42736       748              30                        60.0
2021-07-03        42735       747              15                        42.62
2021-07-04        42734       748              30                        NaN
2021-07-05        42734       748              30                        100.0
2021-07-10        42738       747              15                        50.72
2021-08-12        42739       748              30                        73.43

I want to replace each NULL value with the mean of the rows that share the same 'patient_id', 'practitioner_id', and 'appointment_duration_min'.

With a pandas DataFrame, I do this with:

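# Assign the result back instead of using inplace=True on a column selection,
# which newer pandas versions warn about and may not apply to df.
df['revenues_from_appointment'] = df['revenues_from_appointment'].fillna(
    df.groupby(['patient_id', 'practitioner_id', 'appointment_duration_min'])['revenues_from_appointment'].transform('mean')
)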

How can I get the same result using SQL?

Expected output:

appointment_date  patient_id  practitioner_id  appointment_duration_min  revenues_from_appointment
2021-06-28        42734       748              30                        90.0
2021-06-29        42737       747              60                        150.0
2021-07-01        42737       747              60                        150.0
2021-07-03        42736       748              30                        60.0
2021-07-03        42735       747              15                        42.62
2021-07-04        42734       748              30                        95.0
2021-07-05        42734       748              30                        100.0
2021-07-10        42738       747              15                        50.72
2021-08-12        42739       748              30                        73.43
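
As a sanity check on the expected output, the two filled values can be reproduced by hand: the group (42734, 748, 30) has known revenues 90.0 and 100.0, so its mean is 95.0, and the group (42737, 747, 60) has only 150.0. Below is a minimal sketch in plain Python (no pandas) of the same rule. The `rows` list and the `fill` helper are illustrative names, not part of the original question, and `None` stands in for NaN/NULL.

```python
from collections import defaultdict
from statistics import mean

# (patient_id, practitioner_id, appointment_duration_min, revenue)
# None marks a missing revenue (NaN in the dataset above).
rows = [
    (42734, 748, 30, 90.0),
    (42737, 747, 60, 150.0),
    (42737, 747, 60, None),
    (42736, 748, 30, 60.0),
    (42735, 747, 15, 42.62),
    (42734, 748, 30, None),
    (42734, 748, 30, 100.0),
    (42738, 747, 15, 50.72),
    (42739, 748, 30, 73.43),
]

# Collect the known revenues per group, skipping missing values.
known = defaultdict(list)
for patient, practitioner, duration, revenue in rows:
    if revenue is not None:
        known[(patient, practitioner, duration)].append(revenue)


def fill(row):
    """Hypothetical helper: replace a missing revenue with its group's mean."""
    *key, revenue = row
    values = known.get(tuple(key))
    # Leave the value missing if the whole group has no known revenue.
    if revenue is None and values:
        revenue = mean(values)
    return (*key, revenue)


filled = [fill(row) for row in rows]
for row in filled:
    print(row)
```

Missing values are excluded from the mean rather than counted as zero, which matches what both `transform('mean')` in pandas and `AVG` in SQL do.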

You can use the AVG window function, partitioned by the three columns of interest, together with the COALESCE function to replace the NULL values:

SELECT appointment_date,
       patient_id,
       practitioner_id,
       appointment_duration_min,
       -- keep the original value when present, otherwise use the group average
       COALESCE(revenues_from_appointment,
                AVG(revenues_from_appointment) OVER(PARTITION BY patient_id,
                                                                 practitioner_id,
                                                                 appointment_duration_min)
       ) AS revenues_from_appointment
FROM tab;

Try it here.
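
To try the query locally, here is a minimal sketch that loads the sample data into an in-memory SQLite database through Python's built-in `sqlite3` module and runs the same COALESCE/AVG query. It assumes the table is named `tab` as in the answer, that the bundled SQLite is 3.25 or newer (the first release with window functions), and that missing revenues are stored as SQL NULL. The column types and the ORDER BY clause are my own additions for a stable display order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tab (
        appointment_date          TEXT,
        patient_id                INTEGER,
        practitioner_id           INTEGER,
        appointment_duration_min  INTEGER,
        revenues_from_appointment REAL
    )
""")

# Sample data from the question; None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?)",
    [
        ("2021-06-28", 42734, 748, 30, 90.0),
        ("2021-06-29", 42737, 747, 60, 150.0),
        ("2021-07-01", 42737, 747, 60, None),
        ("2021-07-03", 42736, 748, 30, 60.0),
        ("2021-07-03", 42735, 747, 15, 42.62),
        ("2021-07-04", 42734, 748, 30, None),
        ("2021-07-05", 42734, 748, 30, 100.0),
        ("2021-07-10", 42738, 747, 15, 50.72),
        ("2021-08-12", 42739, 748, 30, 73.43),
    ],
)

query = """
    SELECT appointment_date,
           patient_id,
           practitioner_id,
           appointment_duration_min,
           COALESCE(revenues_from_appointment,
                    AVG(revenues_from_appointment) OVER (
                        PARTITION BY patient_id,
                                     practitioner_id,
                                     appointment_duration_min
                    )) AS revenues_from_appointment
    FROM tab
    ORDER BY appointment_date, patient_id
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

AVG ignores NULLs inside each partition, so a row with a missing revenue gets the mean of the other rows in its group. If every row in a group is NULL, the average is NULL as well and the value stays missing.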