Get name if range values are true Python

Question

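I have a pandas question about getting a name when the values in a column cover a range. If a name has a year in every decade from 1960 to the present, print that name. Here is a sample of my dataframe: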

#,Name,description,year
1,a,foo,1961
2,a,foo2,1977
3,a,foo3,1980
4,a,foo4,1995
5,a,foo5,2001
6,a,foo6,2011
7,a,foo7,2020
8,b,bar,1965
9,b,bar2,1970
10,b,bar3,1983
11,b,bar4,1997
12,b,bar5,2005
13,b,bar6,2016
14,b,bar7,2022
15,c,abc,1965
16,c,ab2,1970
17,c,abc3,1993
18,c,abc4,2007
19,c,abc5,2015
20,c,abc6,2020

Output: a,b

So far, I have tried this:

dataset[Year].str.match(str(year[0:3]))

I think I need a for loop for this, but I am not sure at all. Thanks for your help!

Answer 1

You can use the DataFrame.query method to do the same thing:

dataset.query("year >= 1961", inplace=True)
print(dataset)  # the dataframe now holds only the rows whose year is 1961 or later
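As a minimal sketch of how query behaves, the snippet below uses a made-up three-row dataframe (not the asker's data). The whole condition is passed as a single string, and without inplace=True the call returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

# Hypothetical three-row sample, for illustration only.
dataset = pd.DataFrame({"Name": ["a", "a", "b"], "year": [1955, 1961, 1977]})

# The entire condition lives inside the string; query returns a new dataframe.
recent = dataset.query("year >= 1961")
print(recent)
```

Note that this only filters rows by year. It does not by itself check that a name appears in every decade.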

Answer 2

One way to solve this is to create groups with the pandas groupby method and then filter those groups with the pandas filter method.

import pandas as pd


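def is_within_range(group):
    # Collect the decades (year // 10) this group covers from the 1960s to the 2020s.
    years = sorted(list(group["Year"]))
    check_decade = {}
    for year in years:
        decade = year // 10  # e.g. 1965 -> 196
        if 196 <= decade <= 202:
            check_decade[decade] = True
    # Keep the group only if all seven decades (196 through 202) are present.
    if len(check_decade.keys()) == (202 - 196 + 1):
        return True
    return False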


data = pd.read_csv("years.csv")
filtered_data = data.groupby(['Name']).filter(lambda x: is_within_range(x))
print(list(filtered_data.Name.unique()))

Output:

['a', 'b']

years.csv:

#,Name,Description,Year
1,a,foo,1961
2,a,foo2,1977
3,a,foo3,1980
4,a,foo4,1995
5,a,foo5,2001
6,a,foo6,2011
7,a,foo7,2020
8,b,bar,1965
9,b,bar2,1970
10,b,bar3,1983
11,b,bar4,1997
12,b,bar5,2005
13,b,bar6,2016
14,b,bar7,2022
15,c,abc,1965
16,c,ab2,1970
17,c,abc3,1993
18,c,abc4,2007
19,c,abc5,2015
20,c,abc6,2020

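Explanation:

The is_within_range function checks whether a group has a year in every decade from the 1960s to the 2020s. The decade of a year is year // 10. For example, 1965 and 1969 have the decade value 196, while 1996 and 1998 have the decade value 199.
I use a dictionary to mark each decade as True and then count the number of decades in the group. A group is kept only if that count equals 7, which is 202 - 196 + 1.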
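The same decade-counting idea can also be written without an explicit loop. The following is a minimal sketch, not the original answer's method: it assumes the sample data is read from an in-memory string instead of years.csv, and the "decade" column and the variable names are illustrative choices.

```python
import io
import pandas as pd

# Same rows as years.csv, kept inline so the sketch is self-contained.
csv_text = """#,Name,Description,Year
1,a,foo,1961
2,a,foo2,1977
3,a,foo3,1980
4,a,foo4,1995
5,a,foo5,2001
6,a,foo6,2011
7,a,foo7,2020
8,b,bar,1965
9,b,bar2,1970
10,b,bar3,1983
11,b,bar4,1997
12,b,bar5,2005
13,b,bar6,2016
14,b,bar7,2022
15,c,abc,1965
16,c,ab2,1970
17,c,abc3,1993
18,c,abc4,2007
19,c,abc5,2015
20,c,abc6,2020
"""
data = pd.read_csv(io.StringIO(csv_text))

# Decade of each year, e.g. 1965 -> 196.
data["decade"] = data["Year"] // 10

# Keep only rows in the 1960s..2020s, then count distinct decades per name.
in_range = data[data["decade"].between(196, 202)]
decade_counts = in_range.groupby("Name")["decade"].nunique()

# A name qualifies when it covers all seven decades (196 through 202).
names = decade_counts[decade_counts == 202 - 196 + 1].index.tolist()
print(names)  # ['a', 'b']
```

Name c is dropped because it has no year in the 1980s, so it covers only six of the seven decades.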
