Is overwriting variable names for lengthy operations bad style?

Question

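I often find myself in situations where I need several steps to get from my initial data input to the output I want, for example inside functions or loops. To keep my lines from getting too long, I sometimes overwrite the variable names I use in these operations.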

An example:

df_2 = df_1.loc[df_1['id'] == val]
df_2 = df_2[['c1', 'c2']]
df_2 = df_2.merge(df3, left_on='c1', right_on='c1')

The only alternative I can think of is:

df_2 = df_1.loc[df_1['id'] == val][['c1', 'c2']]\
    .merge(df3, left_on='c1', right_on='c1')

But neither of these options feels very clean. How should these situations be handled?

Answer 1

As another alternative, you can wrap everything in parentheses and break the lines, like this:

df_2 = (df_1
        .loc[df_1['id'] == val][['c1', 'c2']]
        .merge(df3, left_on='c1', right_on='c1'))

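Even with many lines this is usually quite readable, and if you want to rename the output variable, you only have to change it in one place. Compared with overwriting the variable, it is less verbose and easier to modify.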
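To make the comparison concrete, here is a minimal, self-contained sketch that builds the same result both ways and checks that they match. The sample data in `df_1` and `df3` is invented for illustration; only the column names `id`, `c1` and `c2` come from the question. It uses `on='c1'` as shorthand for `left_on='c1', right_on='c1'`, which is equivalent when the key column has the same name in both frames, and it selects rows and columns in a single `.loc[mask, cols]` call.

```python
import pandas as pd

# Invented sample data; the column names id, c1, c2 follow the question.
df_1 = pd.DataFrame({
    "id": [1, 1, 2],
    "c1": ["a", "b", "a"],
    "c2": [10, 20, 30],
})
df3 = pd.DataFrame({"c1": ["a", "b"], "label": ["alpha", "beta"]})
val = 1

# Step-by-step version: the same name is rebound three times.
df_2_steps = df_1.loc[df_1["id"] == val]
df_2_steps = df_2_steps[["c1", "c2"]]
df_2_steps = df_2_steps.merge(df3, on="c1")

# Parenthesized chain: one assignment, one step per line, no backslashes.
df_2_chain = (
    df_1
    .loc[df_1["id"] == val, ["c1", "c2"]]  # filter rows and select columns in one call
    .merge(df3, on="c1")                   # same as left_on="c1", right_on="c1" here
)

print(df_2_chain)
```

Because the whole expression sits inside parentheses, Python's implicit line continuation applies, so each step can be commented out or reordered on its own line.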

Answer 2

You may want to look at this article, which discusses exactly your question:

The pandas core team now encourages the use of "method chaining". This is a style of programming in which you chain together multiple method calls into a single statement. This allows you to pass intermediate results from one method to the next rather than storing the intermediate results using variables.

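Besides using parentheses and indentation to lay out chained code, as in @perl's answer, you may also find methods such as .query() and .assign() very useful when writing in the "method chaining" style.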
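A sketch of how `.query()` and `.assign()` fit into a chain, again on invented sample data. `query` accepts `@name` to refer to a Python variable such as `val`, and `assign` accepts a callable that receives the intermediate DataFrame, so a derived column can be computed without giving that intermediate a name. The column `c2_doubled` is a hypothetical example, not something from the question.

```python
import pandas as pd

# Invented sample data; the column names id, c1, c2 follow the question.
df_1 = pd.DataFrame({
    "id": [1, 1, 2],
    "c1": ["a", "b", "a"],
    "c2": [10, 20, 30],
})
df3 = pd.DataFrame({"c1": ["a", "b"], "label": ["alpha", "beta"]})
val = 1

df_2 = (
    df_1
    .query("id == @val")                        # keep rows whose id equals the variable val
    .loc[:, ["c1", "c2"]]                       # keep only the columns of interest
    .assign(c2_doubled=lambda d: d["c2"] * 2)   # d is the intermediate frame at this step
    .merge(df3, on="c1")                        # attach the label column from df3
)

print(df_2)
```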

Of course, method chaining has some drawbacks too, especially when overdone:

"One drawback to excessively long chains is that debugging can be harder. If something looks wrong at the end, you don't have intermediate values to inspect."
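If you keep the chain but still want to look at intermediate values, one option is `DataFrame.pipe`, which calls a function with the current frame plus any extra arguments. The `peek` helper and the `snapshots` dict below are hypothetical names for this sketch, not part of pandas; `peek` records a copy of the frame and returns the frame unchanged, so it can be inserted into or removed from the chain without affecting the result.

```python
import pandas as pd

# Invented sample data; the column names id, c1, c2 follow the question.
df_1 = pd.DataFrame({
    "id": [1, 1, 2],
    "c1": ["a", "b", "a"],
    "c2": [10, 20, 30],
})
df3 = pd.DataFrame({"c1": ["a", "b"], "label": ["alpha", "beta"]})
val = 1

# Hypothetical store for intermediate frames captured during the chain.
snapshots = {}


def peek(df, label):
    """Hypothetical debugging helper: save a copy of df under label,
    print its shape, and return df unchanged so the chain continues."""
    snapshots[label] = df.copy()
    print(f"{label}: shape={df.shape}")
    return df


df_2 = (
    df_1
    .loc[df_1["id"] == val, ["c1", "c2"]]
    .pipe(peek, "after filter")   # calls peek(<current frame>, "after filter")
    .merge(df3, on="c1")
    .pipe(peek, "after merge")
)
```

Once the chain behaves as expected, the `.pipe(peek, ...)` lines can simply be deleted.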

Tags: python, pep8, pandas