Run a scipy.stats ANOVA test for all unique values in a data frame's column?

Question

I have a data frame containing many cities and their corresponding temperatures:

               CurrentThermostatTemp
City                                
Cradley Heath                   20.0
Cradley Heath                   20.0
Cradley Heath                   18.0
Cradley Heath                   15.0
Cradley Heath                   19.0
...                              ...
Walsall                         16.0
Walsall                         22.0
Walsall                         20.0
Walsall                         20.0
Walsall                         20.0

[6249 rows x 1 columns]

The unique values are:

Index(['Cradley Heath', 'ROWLEY REGIS', 'Smethwick', 'Oldbury',
       'West Bromwich', 'Bradford', 'Bournemouth', 'Poole', 'Wareham',
       'Wimborne',
       ...
       'St. Helens', 'Altrincham', 'Runcorn', 'Widnes', 'St Helens',
       'Wakefield', 'Castleford', 'Pontefract', 'Walsall', 'Wednesbury'],
      dtype='object', name='City', length=137)

My goal is to run a one-way ANOVA test, i.e.

from scipy.stats import f_oneway

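across all unique values in the data frame. So something like

scipy.stats.f_oneway("all unique values")

and receive the output: "The one-way ANOVA test for all variables gives {} with a p-value of {}".

This is what I have tried many times, but it does not work: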

all = Tempvs.index.unique()
Tempvs.sort_index(inplace=True)
for n in range(len(all)):
    truncated = Tempvs.truncate(all[n], all[n])
    print(f_oneway(truncated))

Answer 1

IIUC, you want a single ANOVA test in which each sample consists of the Temp values for one unique City. If that is the case, you can do:

import numpy as np
import pandas as pd
import scipy.stats as sps

# I create a sample dataset
index = ['Cradley Heath', 'ROWLEY REGIS',
         'Smethwick', 'Oldbury',
         'West Bromwich', 'Bradford', 
         'Bournemouth', 'Poole', 'Wareham',
         'Wimborne','St. Helens', 'Altrincham', 
         'Runcorn', 'Widnes', 'St Helens',
         'Wakefield', 'Castleford', 'Pontefract', 
         'Walsall', 'Wednesbury']
np.random.seed(1)
df = pd.DataFrame({
    'City': np.random.choice(index, 500),
    'Temp': np.random.uniform(15, 25, 500)
})

# populate a list with all
# values of unique Cities
values = []
for city in df.City.unique():
    _df = df[df.City==city]
    values.append(_df.Temp.values)

# compute the ANOVA
# with starred *list
# as arguments
sps.f_oneway(*values)

In this case, that gives

F_onewayResult(statistic=0.4513685152123563, pvalue=0.9788508507035195)
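The same idea can also be written with groupby, which may be more convenient when City is the index, as in the question. This is only a minimal sketch: the small `tempvs` frame below is a hypothetical stand-in for the asker's data (three cities, 20 made-up readings each), and the message format is my guess at the output the question asks for.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical stand-in for the asker's frame: City is the index,
# with a single temperature column. The values are random, not real data.
cities = ['Cradley Heath', 'Smethwick', 'Walsall']
rng = np.random.default_rng(0)
tempvs = pd.DataFrame(
    {'CurrentThermostatTemp': rng.uniform(15, 25, 60)},
    index=pd.Index(np.repeat(cities, 20), name='City'),
)

# Build one array of temperatures per city by grouping on the index level.
samples = [
    group.to_numpy()
    for _, group in tempvs.groupby(level='City')['CurrentThermostatTemp']
]

# f_oneway takes each sample as a separate positional argument.
result = f_oneway(*samples)
print(f"The one-way ANOVA test for all cities gives "
      f"{result.statistic:.4f} with a p-value of {result.pvalue:.4f}")
```

Note that this is one test across all cities at once, not one test per city; a single city on its own gives f_oneway nothing to compare against.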

PS: do not use all as a variable name, because it shadows the built-in Python function all(); see https://docs.python.org/3/library/functions.html#all
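To illustrate the point about shadowing, here is a small sketch. The function name `shadowed_call` and the city list are made up for the example; the shadowing is kept inside a function so the built-in stays usable afterwards.

```python
def shadowed_call():
    # Assigning to the name "all" hides the built-in all() in this scope.
    all = ['Walsall', 'Poole']
    try:
        # This now tries to call a list, not the built-in function.
        return all([True, True])
    except TypeError as exc:
        return f"TypeError: {exc}"

message = shadowed_call()
print(message)

# Outside the function the built-in is untouched.
builtin_ok = all([True, True])
print(builtin_ok)
```

A name such as cities or unique_cities avoids the problem entirely.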

