How to write a DRY Python for loop

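I have a cannabis dataset with an "Effects" column, and I am trying to add a binary "nice_buds" column that flags strains which do not include certain effects. Here is the code: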

nice_buds = []
undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

for row in sample["Effects"]:
    if "Sleepy" not in row and "Hungry" not in row and "Giggly" not in row and "Tingly" not in row and "Aroused" not in row and "Talkative" not in row:
        nice_buds.append(1)
    else:
        nice_buds.append(0)

sample["nice_buds"] = nice_buds

As of now, the undesired_effects list does nothing, and the code works fine and gives me the output I want.

My question is whether there is a more "Pythonic" or "DRY" way to approach this problem...

You can use all() with a generator expression to simplify the if statement:

nice_buds = []
undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

for row in sample["Effects"]:
    if all(effect not in row for effect in undesired_effects):
        nice_buds.append(1)
    else:
        nice_buds.append(0)

sample["nice_buds"] = nice_buds

Or use any() and check whether any of the effects is present:

nice_buds = []
undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

for row in sample["Effects"]:
    if any(effect in row for effect in undesired_effects):
        nice_buds.append(0)
    else:
        nice_buds.append(1)

sample["nice_buds"] = nice_buds
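
The loop-and-append pattern in both versions can also be collapsed into a list comprehension. This is a minimal sketch, assuming each value in the Effects column is a plain string such as "Sleepy,Hungry". The sample rows are hypothetical and are not taken from the asker's dataset.

```python
import pandas as pd

# Hypothetical rows; the real dataset is assumed to store effects as one string per strain.
sample = pd.DataFrame(
    {"Effects": ["Happy,Relaxed", "Sleepy,Hungry", "Creative,Giggly", "Uplifted"]}
)

undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

# int(True) == 1 and int(False) == 0, so the if/else branch is not needed.
sample["nice_buds"] = [
    int(not any(effect in row for effect in undesired_effects))
    for row in sample["Effects"]
]

print(sample["nice_buds"].tolist())
```

Like the loops above, this uses a case-sensitive substring test, so "sleepy" would not be caught by "Sleepy".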

Given a dataframe sample:

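  • Use np.where
  • Use pandas.Series.str.contains
  • The strings may be upper or lower case, so it is best to force a single case, because Giggly != giggly
  • for row in sample["Effects"] tells me you are working with a dataframe. You should not use a for-loop to iterate through a dataframe.

import pandas as pd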
import numpy as np

# create dataframe
data = {'Effects': ['I feel great', 'I feel sleepy', 'I fell hungry', 'I feel giggly', 'I feel tingly', 'I feel aroused', 'I feel talkative']}

sample = pd.DataFrame(data)

|    | Effects          |
|---:|:-----------------|
|  0 | I feel great     |
|  1 | I feel sleepy    |
|  2 | I fell hungry    |
|  3 | I feel giggly    |
|  4 | I feel tingly    |
|  5 | I feel aroused   |
|  6 | I feel talkative |

undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

# normalize the terms to a single case for matching (lowercase here)
undesired_effects = [effect.lower() for effect in undesired_effects]

# join the terms into one regex pattern separated by | (or)
match_vals = '|'.join(undesired_effects)

# create the nice buds column
sample['nice buds'] = np.where(sample['Effects'].str.lower().str.contains(match_vals), 0, 1)

display(sample)

|    | Effects          |   nice buds |
|---:|:-----------------|------------:|
|  0 | I feel great     |           1 |
|  1 | I feel sleepy    |           0 |
|  2 | I fell hungry    |           0 |
|  3 | I feel giggly    |           0 |
|  4 | I feel tingly    |           0 |
|  5 | I feel aroused   |           0 |
|  6 | I feel talkative |           0 |
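
A variation on the approach above, offered as a sketch rather than as part of the original answer. str.contains accepts case=False and na=False, so the explicit lower-casing can be dropped and missing values count as non-matches. re.escape is applied to each term in case one ever contains a regex metacharacter, and the boolean mask is inverted and cast to int instead of going through np.where. The sample rows, including the missing value, are hypothetical, and the column is named nice_buds here to match the question.

```python
import re

import pandas as pd

# Hypothetical rows: mixed case plus one missing value.
sample = pd.DataFrame(
    {"Effects": ["I feel great", "I feel SLEEPY", "I feel giggly", None]}
)

undesired_effects = ["Sleepy", "Hungry", "Giggly", "Tingly", "Aroused", "Talkative"]

# Escape each term so it is matched literally, then join with | (regex OR).
pattern = "|".join(re.escape(effect) for effect in undesired_effects)

# case=False ignores case; na=False treats missing values as "no match".
has_undesired = sample["Effects"].str.contains(pattern, case=False, na=False)

# Invert the boolean mask and cast it to 0/1.
sample["nice_buds"] = (~has_undesired).astype(int)

print(sample)
```

With na=False, a row that has no Effects value ends up with nice_buds equal to 1. Passing na=True instead would mark such rows as 0, which may be the safer choice if missing data should not count as a nice bud.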