Adding functions for organization to my python script

Question

I have a very long python script that I need to consolidate into functions to organize the code.

#function here?
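import collections
import nltk
import pandas as pd  # used for pd.DataFrame further down
counts = collections.Counter()
# 'message' is the column name used by the later steps
for sent in df["message"]:
    words = nltk.word_tokenize(sent)
    counts.update(nltk.bigrams(words))
# keep only the bigrams seen more than 150 times
counts = {k: v for k, v in counts.items() if v > 150}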

print('\n','bigram counter finished!')

#function here?
df2 = pd.DataFrame.from_dict(counts,orient='index').reset_index()
df2 = df2.sort_values(by=0,ascending=False)
#creating a list of the bigrams after being sorted
my_bigrams = list(df2['index'])
my_bigrams = [i for i in my_bigrams if i[1] != i[0]]
#taking the top 500 bigrams
#my_bigrams = my_bigrams[:500]
print('\n','duplicate bigrams removed!')

#function here?
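import re
# escape each bigram so tokens such as '(' or '?' are matched literally
pat = '|'.join(re.escape(" ".join(x)) for x in my_bigrams)
df['bigram'] = df['message'].str.findall(pat)
df = df.applymap(str)
df = df.drop(['message'], axis=1)
# raw string so that \s is not treated as a string escape
df["bigram"] = df.bigram.str[1:-1].str.split(r",\s").map(set)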



#function here?
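df = df.applymap(str)
# raw string plus regex=True: pandas 2.0+ treats the pattern literally by default
df['bigram'] = df['bigram'].str.replace(r'[^\w\s,]', '', regex=True)
df["bigram"] = df.bigram.str.split(r",\s").map(list)
df = df.applymap(str)
df['bigram'] = df['bigram'].str.replace(r'[^\w\s,]', '', regex=True)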



#function here?
df = df.sort_values(by='date')
def update_col(col):
    col[:] = col.iloc[0]
    return col
# group_keys=False keeps the original row index so the result aligns with df
df['date'] = df.groupby('room', group_keys=False).date.apply(update_col)

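I am having trouble putting my code into functions. I don't understand how to organize this code into functions so that it is neater. Any suggestions? FYI, this is just some random code, so I am not looking to make it run, only for ideas about which parameters to pass, how many, and how to make it neater. Each place where I have '#function here?' in the code above is where I would like to create a function, if that makes sense.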

Answer 1

Edited: It is advisable to group code into functions in the following cases:

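A. Redundancy: Check whether the same task is being performed on different data sets. If so, put it in a function and call that function.

B. Control flow: If the code consists of a group of atomic tasks that logically have to run in sequence, create a function for each task, whether or not it is repeated.

C. Coherence: Debugging often means tracing the code back to the root of a problem. If you feel that a group of lines describes a single action, put it in a function. That makes it much easier to narrow a problem down to its exact location.

D. Data transformation: If a piece of data is being converted to another format, for example a tree to a doubly linked list, you should create a function for that conversion.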
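As a rough sketch of how these points could apply to the script in the question, the bigram steps might be split up as below. This is only one possible layout, not the definitive one. To keep it runnable without extra packages, it works on a plain list of strings instead of a DataFrame, and the hypothetical `tokenize` helper splits on whitespace as a stand-in for `nltk.word_tokenize`. All function names here (`tokenize`, `count_bigrams`, `rank_bigrams`, `find_bigrams`, `tag_messages`) are made up for illustration.

```python
import collections
import re


def tokenize(message):
    """Stand-in for nltk.word_tokenize (assumption): split on whitespace."""
    return message.split()


def count_bigrams(messages, min_count):
    """Count adjacent word pairs; keep those seen more than min_count times."""
    counts = collections.Counter()
    for message in messages:
        words = tokenize(message)
        counts.update(zip(words, words[1:]))
    return {bigram: n for bigram, n in counts.items() if n > min_count}


def rank_bigrams(counts):
    """Sort bigrams by descending count; drop pairs that repeat one word."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return [bigram for bigram in ranked if bigram[0] != bigram[1]]


def find_bigrams(message, bigrams):
    """Return the set of known bigrams that occur in one message."""
    if not bigrams:
        return set()
    pattern = "|".join(re.escape(" ".join(b)) for b in bigrams)
    return set(re.findall(pattern, message))


def tag_messages(messages, min_count=150):
    """Control flow: run the steps in order, feeding each result to the next."""
    counts = count_bigrams(messages, min_count)
    bigrams = rank_bigrams(counts)
    return [find_bigrams(message, bigrams) for message in messages]
```

Each function takes the data it needs as a parameter and returns its result instead of reassigning a global `df`, so a step can be checked on its own, for example `tag_messages(["good morning team", "good morning all"], min_count=1)`. A pandas version would follow the same shape, with each function accepting a DataFrame or column and returning a new one.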

python

organization

pandas