Is there a more pythonic way to nest conditional statements for filling a new column in a pandas df?
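I'm new to pandas and Python; I've searched but couldn't find an answer to my problem. I'm trying to find the best way to populate a new column, 'Sample Location', in a pandas DataFrame based on the contents of another column, 'NO', so that each row falls into one of a set of defined groups.
The first question is: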
if TestLocation == 'LH Duct':
    df['Sample Location'] = df.apply(
        lambda x: samplePoint(x['NO']),
        axis=1
    )
I'm not sure whether this is written correctly, because my DataFrame is a bit messy.
The second question: is there a more pythonic way to do this check?
def samplePoint(n):
    if n <= 15:
        v = 'P1 S1'
    elif n >= 20 & n <= 35:
        v = 'P1 S2'
    elif n >= 40 & n <= 55:
        v = 'P1 S3'
    elif n >= 60 & n <= 75:
        v = 'P1 S4'
    elif n >= 80 & n <= 95:
        v = 'P1 S5'
    elif n >= 100 & n <= 115:
        v = 'P1 S6'
    elif n >= 150 & n <= 165:
        v = 'P2 S1'
    elif n >= 170 & n <= 185:
        v = 'P2 S2'
    elif n >= 190 & n <= 205:
        v = 'P2 S3'
    elif n >= 210 & n <= 225:
        v = 'P2 S4'
    elif n >= 230 & n <= 245:
        v = 'P2 S5'
    elif n >= 250 & n <= 265:
        v = 'P2 S6'
    else:
        v = 'null'
    return v
I think the whole thing could/should be done as an apply/lambda, but I'm a bit lost. If anyone can explain this or point me to a good link, I'd be forever grateful!
Try the built-in pd.cut method. Assuming x is the DataFrame and NO is the column you are working with, for example:
pd.cut(x['NO'], bins=[15,25,35,40], right=True, labels=False)
Adjust the bins and the edge handling/ordering to suit your needs.
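As a rough sketch of how pd.cut might be adapted to the ranges in the question: the ranges have gaps and are inclusive at both ends, so one option is to pass a pd.IntervalIndex with closed='both' as the bins. Values that fall in a gap come back as missing (category code -1), which is mapped to 'null' here. The sample data is made up, and the first range is assumed to start at 0 (that is, 'NO' is assumed never to be negative).

```python
import pandas as pd

# Inclusive (low, high) ranges and their labels, taken from the question.
# Assumption: 'NO' is never negative, so the first range starts at 0.
ranges = [(0, 15), (20, 35), (40, 55), (60, 75), (80, 95), (100, 115),
          (150, 165), (170, 185), (190, 205), (210, 225), (230, 245),
          (250, 265)]
labels = ['P1 S1', 'P1 S2', 'P1 S3', 'P1 S4', 'P1 S5', 'P1 S6',
          'P2 S1', 'P2 S2', 'P2 S3', 'P2 S4', 'P2 S5', 'P2 S6']

# closed='both' makes each interval include both of its endpoints.
bins = pd.IntervalIndex.from_tuples(ranges, closed='both')

# Made-up sample data for illustration only.
df = pd.DataFrame({'NO': [10, 20, 37, 55, 160, 300]})

# With an IntervalIndex as bins, pd.cut returns a categorical Series;
# values outside every interval are missing and get category code -1.
binned = pd.cut(df['NO'], bins=bins)
df['Sample Location'] = [
    labels[int(code)] if code >= 0 else 'null'
    for code in binned.cat.codes
]
print(df)
```

This avoids calling a Python function once per row, which may matter for large frames.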
Try a chained comparison instead of & (which is bitwise AND, not logical and):
elif n >= 20 & n <= 35:
=>
elif 20 <= n <= 35:
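To see why the original condition misbehaves, here is a small illustration. In Python, & binds more tightly than the comparison operators, so n >= 20 & n <= 35 is parsed as the chained comparison n >= (20 & n) <= 35. The value 50 is chosen purely as an example.

```python
n = 50

# Parsed as: n >= (20 & n) <= 35, i.e. 50 >= 16 <= 35
buggy = n >= 20 & n <= 35

# Chained comparison: 20 <= n and n <= 35
fixed = 20 <= n <= 35

print(20 & n)   # bitwise AND of 20 and 50
print(buggy)    # True, although 50 is not between 20 and 35
print(fixed)    # False
```

With the original function, n = 50 would therefore be labelled 'P1 S2' rather than 'P1 S3'.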
The value of v_code could probably be computed; otherwise I would put the options in a list of dicts and write the function samplePoint as follows:
# Each entry maps an inclusive (low, high) range to its label.
samples = [
    {'range': (0, 15),
     'v_code': 'P1 S1'},
    {'range': (20, 35),
     'v_code': 'P1 S2'},
    {'range': (40, 55),
     'v_code': 'P1 S3'},
    {'range': (60, 75),
     'v_code': 'P1 S4'},
    {'range': (80, 95),
     'v_code': 'P1 S5'},
    {'range': (100, 115),
     'v_code': 'P1 S6'},
    {'range': (150, 165),
     'v_code': 'P2 S1'},
    {'range': (170, 185),
     'v_code': 'P2 S2'},
    {'range': (190, 205),
     'v_code': 'P2 S3'},
    {'range': (210, 225),
     'v_code': 'P2 S4'},
    {'range': (230, 245),
     'v_code': 'P2 S5'},
    {'range': (250, 265),
     'v_code': 'P2 S6'},
]

def samplepoint(n):
    # Return the label of the first range containing n, else 'null'.
    for sample in samples:
        if sample['range'][0] <= n <= sample['range'][1]:
            return sample['v_code']
    return 'null'

if __name__ == '__main__':
    print(samplepoint(10))
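I also renamed samplePoint to samplepoint, following Python naming conventions. To keep the module less cluttered, you can import the list samples from a configuration file that holds all constants and settings. So: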
from my_config import samples

def samplepoint(n):
    for sample in samples:
        if sample['range'][0] <= n <= sample['range'][1]:
            return sample['v_code']
    return 'null'

if __name__ == '__main__':
    print(samplepoint(100))
where the file my_config.py contains:
samples = [
    {'range': (0, 15),
     'v_code': 'P1 S1'},
    {'range': (20, 35),
     'v_code': 'P1 S2'},
    {'range': (40, 55),
     'v_code': 'P1 S3'},
    {'range': (60, 75),
     'v_code': 'P1 S4'},
    {'range': (80, 95),
     'v_code': 'P1 S5'},
    {'range': (100, 115),
     'v_code': 'P1 S6'},
    {'range': (150, 165),
     'v_code': 'P2 S1'},
    {'range': (170, 185),
     'v_code': 'P2 S2'},
    {'range': (190, 205),
     'v_code': 'P2 S3'},
    {'range': (210, 225),
     'v_code': 'P2 S4'},
    {'range': (230, 245),
     'v_code': 'P2 S5'},
    {'range': (250, 265),
     'v_code': 'P2 S6'},
]
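For the first question (filling the new column), a lookup function like this can be applied to the single 'NO' column with Series.map, rather than to every row with df.apply(..., axis=1). The sketch below is only an illustration: the samples list is shortened to three ranges and the DataFrame contents are made up.

```python
import pandas as pd

# Shortened version of the samples list, for illustration only.
samples = [
    {'range': (0, 15), 'v_code': 'P1 S1'},
    {'range': (20, 35), 'v_code': 'P1 S2'},
    {'range': (40, 55), 'v_code': 'P1 S3'},
]

def samplepoint(n):
    # Return the label of the first inclusive range containing n.
    for sample in samples:
        low, high = sample['range']
        if low <= n <= high:
            return sample['v_code']
    return 'null'

# Made-up sample data.
df = pd.DataFrame({'NO': [5, 25, 38, 50]})

# map calls samplepoint once per value in the 'NO' column.
df['Sample Location'] = df['NO'].map(samplepoint)
print(df)
```

Since only one column is needed, mapping over that column keeps the call simpler than a row-wise apply with a lambda.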