How to define a function that checks any DataFrame for an Age column and returns bins?
I'm trying to define a function that takes any DataFrame with an 'Age' column, bins the ages, and returns how many X fall into each age category.
Consider the following:
```python
def age_range():
    x = input("Enter Dataframe Name: ")
    df = x
    df['Age']
    bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels=['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s','100s']
    pd.df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    return print("Age Ranges:", result)
```
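I keep getting TypeError: string indices must be integers.
I thought that calling df['Age'] would return a single-column Series, which could then be binned and labeled. But it doesn't work for me.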
The problem is here:
```python
x = input("Enter Dataframe Name: ")  # x is a str: the name you typed, not the DataFrame itself
df = x                               # df is now the same str
df['Age']                            # indexing a str with a str raises TypeError: string indices must be integers
```
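A quick way to confirm this diagnosis is to reproduce it. The sketch below uses a hypothetical DataFrame named `people` and a plain string standing in for what `input()` returns: indexing the string with `'Age'` raises the same `TypeError`, while indexing the DataFrame object returns the single-column Series the question expected. The exact error wording differs slightly between Python versions.

```python
import pandas as pd

# Hypothetical DataFrame standing in for the asker's data
people = pd.DataFrame({"Age": [5, 23, 47]})

# input() always returns a str, so this is what df held in the original code
name = "people"

try:
    name["Age"]  # indexing a str with another str
except TypeError as exc:
    print("TypeError:", exc)

# Indexing the DataFrame object itself returns a single-column Series
ages = people["Age"]
print(type(ages).__name__)
```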
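This will solve your problem: pass the DataFrame itself to the function instead of typing its name. Note also that `labels` must have exactly one fewer entry than `bins`, otherwise `pd.cut` raises a ValueError, so the extra '100s' label is dropped:
```python
import pandas as pd

def age_range(df):
    # Ten half-open bins: [0, 10), [10, 20), ..., [90, 100)
    bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    # One label per bin, i.e. one fewer label than bin edges
    labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    result = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    return result
```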
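Because of `right=False`, each bin is closed on the left and open on the right: `[0, 10)`, `[10, 20)`, ..., `[90, 100)`. The sketch below checks the boundaries with a few made-up ages: 0 and 9 land in `'0-9'`, 10 lands in `'10-19'`, 99 lands in `'90s'`, and 100 falls outside every bin, so `pd.cut` returns NaN for it. If ages of 100 or more can occur, an extra edge and label would be needed.

```python
import pandas as pd

def age_range(df):
    # Same binning as the function in this answer: [0, 10), [10, 20), ..., [90, 100)
    bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    return pd.cut(df['Age'], bins=bins, labels=labels, right=False)

# Made-up ages chosen to sit on the bin edges
edge_cases = pd.DataFrame({'Age': [0, 9, 10, 99, 100]})
result = age_range(edge_cases)
print(result)
```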
For example, you can run it like this:
```python
import random
import pandas as pd

df = pd.DataFrame({'Age': [random.randint(1, 99) for i in range(500)]})
df["AgeRange"] = age_range(df)
```
or
```python
df = pd.DataFrame({'Age': [random.randint(1, 99) for i in range(500)]})
AgeRangeDf = pd.DataFrame({"Age_Range": age_range(df)})
```
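Since the question asks how many rows fall into each age category, one option (an alternative to a `groupby` count, not something this answer itself uses) is to call `value_counts` on the categorical Series that `age_range` returns. The sketch repeats the function so it runs on its own; the fixed random seed is an added assumption, only there to make the sample reproducible.

```python
import random

import pandas as pd

def age_range(df):
    # Same binning as this answer's age_range: [0, 10), [10, 20), ..., [90, 100)
    bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    return pd.cut(df['Age'], bins=bins, labels=labels, right=False)

random.seed(0)  # fixed seed so the sample is reproducible
df = pd.DataFrame({'Age': [random.randint(1, 99) for _ in range(500)]})

# On a categorical Series, value_counts reports every category (empty ones too);
# sort=False keeps the bins in label order instead of sorting by frequency.
counts = age_range(df).value_counts(sort=False)
print(counts)
```

Every generated age is between 1 and 99, so all 500 rows land in some bin and the counts sum to 500.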
If you want the total count for each bin across the DataFrame:
```python
from numpy import random
import pandas as pd

df1 = pd.DataFrame({'Age': [random.randint(1, 99) for i in range(100)]})

def age_range(df):
    bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    # Note: this adds an 'AgeGroup' column to the DataFrame that is passed in
    df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    # Count the rows in each age group; observed=False keeps empty groups in the result
    result = pd.DataFrame(df['AgeGroup'].groupby(df['AgeGroup'], observed=False).count())
    return result

print(age_range(df1))
```
This returns a single-column DataFrame.
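The title also asks for a function that checks a DataFrame for an Age column. A minimal sketch of that, assuming a missing column should raise `KeyError` (the helper name `age_bin_counts` and its `column` parameter are hypothetical), could look like the following. Unlike the version that assigns `df['AgeGroup']`, it leaves the caller's DataFrame unchanged.

```python
import pandas as pd

AGE_BINS = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
AGE_LABELS = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']

def age_bin_counts(df, column='Age'):
    """Hypothetical helper: return a one-column DataFrame of counts per age bin.

    Raises KeyError if the DataFrame has no such column and does not
    modify the DataFrame that is passed in.
    """
    if column not in df.columns:
        raise KeyError(f"DataFrame has no {column!r} column")
    groups = pd.cut(df[column], bins=AGE_BINS, labels=AGE_LABELS, right=False)
    # value_counts(sort=False) keeps label order and includes empty bins
    counts = groups.value_counts(sort=False)
    return counts.rename_axis('AgeGroup').to_frame('Count')

# Usage with made-up ages
people = pd.DataFrame({'Age': [3, 15, 15, 42, 99]})
print(age_bin_counts(people))
```

Raising on a missing column makes the failure explicit at the call site; returning `None` would also work, but it pushes the check onto every caller.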