Python pandas: binning a numerical range
I have a requirement where I want to bin a numerical value.
If the student's marks are:
between 0 and 50 (including 50), assign the level column the value "L"
between 50 and 75 (including 75), assign the level column the value "M"
greater than 75, assign the level column the value "H"
Here is what I have so far:
raw_data = {'student':['A','B','C'],'marks_maths':[75,90,99]}
df = pd.DataFrame(raw_data, columns = ['student','marks_maths'])
bins = [0,50,75,>75]
groups = ['L','M','H']
df['maths_level'] = pd.cut(df['marks_maths'], bins, labels=groups)
I get a syntax error:
File "<ipython-input-25-f0b9dd609c63>", line 3
bins = [0,50,75,>75]
^
SyntaxError: invalid syntax
How do I specify a cutoff that means "greater than a particular value"?
Try either of these:
bins = [0, 50, 75, 101]
bins = [0, 50, 75, np.inf]
Just define the upper bound as the highest possible mark:
bins = [0, 50, 75, 100]
The result is what you expected:
student marks_maths maths_level
0 A 75 M
1 B 90 H
2 C 99 H
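For reference, here is a minimal runnable sketch of the corrected question code. It uses np.inf as the last edge so the top bin stays open-ended. The include_lowest=True argument is an assumption on my part (so that a mark of exactly 0 would land in "L"); it does not change the result for this data.

```python
import numpy as np
import pandas as pd

raw_data = {'student': ['A', 'B', 'C'], 'marks_maths': [75, 90, 99]}
df = pd.DataFrame(raw_data, columns=['student', 'marks_maths'])

# np.inf as the last edge makes the top bin open-ended: (75, inf]
bins = [0, 50, 75, np.inf]
groups = ['L', 'M', 'H']

# include_lowest=True is an assumption: it puts a mark of exactly 0 into 'L'
df['maths_level'] = pd.cut(df['marks_maths'], bins=bins, labels=groups,
                           include_lowest=True)
print(df)
```

With the default right=True, a mark of 75 falls in the (50, 75] bin and is labelled "M", matching the table above.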
Hope this helps.
import numpy as np
import pandas as pd

# 20 random integers from 0 to 99 (the upper bound of randint is exclusive)
scores = np.random.randint(0, 100, 20)
df = pd.DataFrame(scores, columns=['scores'])

# np.inf as the last edge leaves the top bin open-ended
bins = [0, 50, 75, np.inf]

# Intervals are (0, 50], (50, 75], (75, inf]; with include_lowest=False a score of exactly 0 becomes NaN
df['binned_scores'] = pd.cut(df.scores, bins=bins, include_lowest=False, right=True)
df['bin_labels'] = pd.cut(df.scores, bins=bins, include_lowest=False, right=True, labels=['L', 'M', 'H'])
The include_lowest and right parameters let you control whether the bin edges are inclusive.
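To make the edge behaviour concrete, here is a small sketch that bins a few hypothetical marks sitting on or next to the bin edges (the values 0, 50, 75 and 76 are my own illustration, not from the question) under three settings of those parameters.

```python
import numpy as np
import pandas as pd

# Hypothetical marks chosen to sit on or next to the bin edges
edges = pd.Series([0, 50, 75, 76])
bins = [0, 50, 75, np.inf]
labels = ['L', 'M', 'H']

# Default (right=True): bins are (0, 50], (50, 75], (75, inf]; 0 is left out
default = pd.cut(edges, bins=bins, labels=labels)

# include_lowest=True: the first bin also includes its left edge, 0
with_lowest = pd.cut(edges, bins=bins, labels=labels, include_lowest=True)

# right=False: bins are [0, 50), [50, 75), [75, inf)
left_closed = pd.cut(edges, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'mark': edges, 'default': default,
                    'include_lowest': with_lowest, 'right=False': left_closed}))
```

For the rule in the question (50 counts as "L", 75 counts as "M"), right=True is the setting that matches; include_lowest only matters if a mark of exactly 0 can occur.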