pandas hierarchical index slicing based on year index

I have a dataset, data1. I am trying to slice its index based on user input, where data1 =

                  stats
gender  year    
        
women   2003    cellphone use
        2007    height
        2007    cigarette use
        2008    weight
        2009    cellphone use
        2015    cigarette use
        2018    weight
        2020    height

This is my attempt at index slicing:

 isvalid_yr = False
 while not isvalid_yr:
     year_input = int(input("Input the year you want to compare data from: "))
     if year_input in data1.index.get_level_values('year'):
         idx = pd.IndexSlice
         isvalid_yr = True
         new_data1 = data1.loc(axis=0)[idx[year_input:year_input], idx[:]]
     else:
         isvalid_yr = False
     try:
         if isvalid_yr == True:
             pass
         else:
             raise ValueError("Year not in data!")
     except ValueError as err:
         print("Year not in data!")

It gives me this output, which is not what I want.

Empty DataFrame
Columns: [stats]
Index: []

The final output I want to achieve looks like this:

Input the year you want to compare data from: 2007

The resulting new_data1 =
                  stats
gender  year    

women   
        2007    height
        2007    cigarette use

Use xs to get a cross-section of the DataFrame:

res = df.xs(2007, axis=0, level='year', drop_level=False)

res:

                     stats
gender year               
women  2007         height
       2007  cigarette use
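
For comparison, here is a minimal sketch of two other ways to select the same rows, assuming the same two-level index (gender, year) as in the question. With pd.IndexSlice, the selectors have to be given in index-level order, so the year goes in the second position; the attempt in the question put it in the first position, which addresses the gender level instead. The variable names by_loc, by_mask and by_xs are only illustrative.

```python
import pandas as pd

# Same data as in the question, rebuilt here so the snippet is self-contained.
df = pd.DataFrame({
    'gender': ['women'] * 8,
    'year': [2003, 2007, 2007, 2008, 2009, 2015, 2018, 2020],
    'stats': ['cellphone use', 'height', 'cigarette use', 'weight',
              'cellphone use', 'cigarette use', 'weight', 'height'],
}).set_index(['gender', 'year'])

idx = pd.IndexSlice

# Selectors follow the level order: every gender first, then the year.
by_loc = df.loc[idx[:, 2007], :]

# A boolean mask on the 'year' level; no IndexSlice needed.
by_mask = df[df.index.get_level_values('year') == 2007]

# The xs call from the answer, for reference.
by_xs = df.xs(2007, level='year', drop_level=False)

print(by_loc)
```

All three keep both index levels in the result. The boolean mask returns an empty DataFrame for a missing year rather than raising, so it would need an explicit emptiness check if used for input validation.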

With user input:

while True:
    try:
        year_input = int(
            input("Input the year you want to compare data from: ")
        )
        res = df.xs(year_input, axis=0, level='year', drop_level=False)
        break
    except KeyError:
        print("Year not in data!")
    except ValueError:
        print("Please enter a valid year")
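
The loop above relies on xs raising KeyError when the year is not present in the index, and on int raising ValueError for non-numeric input. As a rough sketch, the lookup can be pulled into a small function so it can be checked without typing at a prompt. The name rows_for_year is hypothetical, not part of pandas, and returning None for a missing year is just one possible convention.

```python
import pandas as pd

# Same data as in the question, rebuilt here so the snippet is self-contained.
df = pd.DataFrame({
    'gender': ['women'] * 8,
    'year': [2003, 2007, 2007, 2008, 2009, 2015, 2018, 2020],
    'stats': ['cellphone use', 'height', 'cigarette use', 'weight',
              'cellphone use', 'cigarette use', 'weight', 'height'],
}).set_index(['gender', 'year'])


def rows_for_year(frame, year):
    """Hypothetical helper: rows whose 'year' level equals `year`, else None."""
    try:
        return frame.xs(year, level='year', drop_level=False)
    except KeyError:
        # xs raises KeyError when the label is absent from the level.
        return None


found = rows_for_year(df, 2007)
missing = rows_for_year(df, 1999)
print(found)
print("Year not in data!" if missing is None else missing)
```

The prompting loop then only has to convert the input with int and call the helper until it returns something other than None.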

The df used:

df = pd.DataFrame({
    'gender': ['women', 'women', 'women', 'women', 'women', 'women', 'women',
               'women'],
    'year': [2003, 2007, 2007, 2008, 2009, 2015, 2018, 2020],
    'stats': ['cellphone use', 'height', 'cigarette use', 'weight',
              'cellphone use', 'cigarette use', 'weight', 'height']
}).set_index(['gender', 'year'])

df:

                     stats
gender year               
women  2003  cellphone use
       2007         height
       2007  cigarette use
       2008         weight
       2009  cellphone use
       2015  cigarette use
       2018         weight
       2020         height