What is the best way to iterate over Pandas rows in my case?
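I'm running into a performance problem when iterating over Pandas rows. My dataset has more than 30k rows, and I need to add a new column whose value in each row is computed from specific columns.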
import pandas as pd

belongs_node_df = pd.DataFrame.from_records(belongs_node, columns=['hashtag', 'tweets_id', 'tokenized_text', 'sentiment_compound'])
posted_node_df = pd.DataFrame.from_records(posted_node, columns=['username', 'num_followers', 'tweets_id'])
df_user_hashtag = pd.merge(posted_node_df, belongs_node_df, on='tweets_id', how='outer').sort_values('username')
df_user_hashtag['p'] = None
for i in range(len(df_user_hashtag)):
    # .loc avoids chained assignment; i is an index label here, not a position
    df_user_hashtag.loc[i, 'p'] = 3 * df_user_hashtag.loc[i, 'num_followers'] / df_user_hashtag.loc[i, 'sentiment_compound']
Is there an efficient way to do this for every row? Thanks a lot. :)
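To see what the loop is actually walking over, here is a sketch of the same merge on a few hypothetical tuples standing in for belongs_node and posted_node (the values are invented for illustration). It shows two things: after sort_values the index labels are no longer 0, 1, 2, ... in order, so lookups by [i] or .loc[i, ...] inside the loop select by label rather than by position; and how='outer' leaves NaN wherever a tweet exists on only one side, which then propagates into any computed column.

```python
import pandas as pd

# Hypothetical stand-ins for the question's belongs_node / posted_node records
belongs_node = [
    ("#a", 1, ["x"], 0.5),
    ("#b", 2, ["y"], -0.25),
    ("#c", 4, ["z"], 0.1),   # tweet 4 has no matching poster
]
posted_node = [
    ("zoe", 100, 1),
    ("amy", 40, 2),
    ("bob", 7, 3),           # tweet 3 has no matching hashtag row
]

belongs_node_df = pd.DataFrame.from_records(
    belongs_node, columns=["hashtag", "tweets_id", "tokenized_text", "sentiment_compound"])
posted_node_df = pd.DataFrame.from_records(
    posted_node, columns=["username", "num_followers", "tweets_id"])

df_user_hashtag = pd.merge(posted_node_df, belongs_node_df,
                           on="tweets_id", how="outer").sort_values("username")

# Index labels are now out of order (rows sorted by username, NaN username last)
print(df_user_hashtag.index.tolist())
# Unmatched tweets produce NaN in the columns coming from the other frame
print(df_user_hashtag[["username", "num_followers", "sentiment_compound"]])
```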
You can use the .tolist() method and iterate over each row of the resulting list.
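You shouldn't iterate over rows at all... that throws away nearly every benefit you get from using pandas. Compute the whole column in one expression instead: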
df_user_hashtag['p'] = 3 * df_user_hashtag['num_followers'] / df_user_hashtag['sentiment_compound']
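A minimal sketch of that one-line answer on a small made-up frame (the numbers are invented). The arithmetic runs element-wise over entire columns, so there is no Python-level loop. One thing worth handling: a sentiment_compound of exactly 0 makes the division produce inf rather than raising an error; the optional replace step below, which is an addition and not part of the answer, converts that to NaN.

```python
import numpy as np
import pandas as pd

# Made-up example data with the two columns used in the formula
df_user_hashtag = pd.DataFrame({
    "num_followers": [100, 40, 7],
    "sentiment_compound": [0.5, -0.25, 0.0],
})

# One vectorized expression computes p for every row at once
df_user_hashtag["p"] = 3 * df_user_hashtag["num_followers"] / df_user_hashtag["sentiment_compound"]

# Optional: division by a zero sentiment yields inf; map it to NaN if preferred
df_user_hashtag["p"] = df_user_hashtag["p"].replace([np.inf, -np.inf], np.nan)
print(df_user_hashtag)
```

Rows that came out of the outer merge with NaN in either column simply get NaN in p, again without any special-casing.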
I solved it using @Riley's suggestion from the comments.
First, I wrote a function that computes the value I want:
def get_p(tfidf, num_followers, compund):
    return (tfidf * num_followers) * compund
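Because get_p only uses the * operator, and * is already element-wise on pandas Series, the function can also be called on whole columns directly. The sketch below reuses get_p unchanged on a small invented frame to show this; it is an observation about the function, not a step from the answer itself.

```python
import pandas as pd

def get_p(tfidf, num_followers, compund):
    return (tfidf * num_followers) * compund

# Invented example data
df_user_hashtag = pd.DataFrame({
    "num_followers": [100, 40, 7],
    "sentiment_compound": [0.5, -0.25, 2.0],
})

# Passing Series in returns a Series: each row is multiplied element-wise
df_user_hashtag["p"] = get_p(1, df_user_hashtag["num_followers"], df_user_hashtag["sentiment_compound"])
print(df_user_hashtag["p"].tolist())
```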
Second, I used NumPy's vectorize function to call my function over the columns:
import numpy

vfunc = numpy.vectorize(get_p)
df_user_hashtag['p'] = vfunc(1, df_user_hashtag['num_followers'], df_user_hashtag['sentiment_compound'])
That's it!
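A small check of what numpy.vectorize returns here, on invented data. It broadcasts the scalar 1 against the two columns and returns a plain NumPy array (not a Series), with the same values as calling get_p on the columns directly. Per the NumPy documentation, vectorize is provided for convenience rather than performance and is essentially a for loop, so on large frames the direct column arithmetic is typically the faster choice.

```python
import numpy as np
import pandas as pd

def get_p(tfidf, num_followers, compund):
    return (tfidf * num_followers) * compund

# Invented example data
df = pd.DataFrame({
    "num_followers": [100, 40, 7],
    "sentiment_compound": [0.5, -0.25, 2.0],
})

vfunc = np.vectorize(get_p)
via_vectorize = vfunc(1, df["num_followers"], df["sentiment_compound"])  # calls get_p once per element
direct = get_p(1, df["num_followers"], df["sentiment_compound"])          # plain Series arithmetic

print(type(via_vectorize).__name__)
print(np.allclose(via_vectorize, direct.to_numpy()))
```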
import pandas as pd

df = pd.read_excel('/content/Endereços CS 2021.xlsx')
data = df.values.tolist()
for d in data:
    print(d)
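In this example, I read my Excel file and converted it to a list with the .values.tolist() method. Then I did what I described earlier: I iterated over each row of the list.
The output is: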
['Ana Lara', 'Alfa', 'Amigo']
['Ana Suely', 'Alfa', 'Amigo']
['Izabelly', 'Alfa', 'Amigo']
['Carol Loiola', 'Alfa', 'Amigo']
['Yasmin', 'Alfa', 'Amigo']
['Mariana', 'Alfa', 'Amigo']
['Tereza', 'Alfa', 'Amigo']
['Rívia', 'Alfa', 'Amigo']
['Stefany', 'Alfa', 'Amigo']
['Maria Eduarda', 'Alfa', 'Amigo']
['Meyssa', 'Alfa', 'Amigo']
['Arthur Figueiró', 'Epsilon', 'Amigo']
['Andriw', 'Epsilon', 'Amigo']
['Gabriel', 'Epsilon', 'Amigo']
['Tiago ', 'Epsilon', 'Amigo']
['João Pedro', 'Epsilon', 'Amigo']
['Carlos', 'Epsilon', 'Amigo']
['José Neto', 'Epsilon', 'Amigo']
['Raissa ', 'Beta', 'Pesquisador']
['Lara Yasmin', 'Beta', 'Pesquisador']
['Letícia', 'Beta', 'Pesquisador']
['Thalita Melo', 'Beta', 'Pesquisador']
['Isabel', 'Beta', 'Pesquisador']
['Melyssa', 'Beta', 'Excursionista']
['Sarah Gabrielle', 'Beta', 'Excursionista']
['Daniel Fernandes', 'Delta', 'Pioneiro']
['Arthur Soares', 'Delta', 'Pioneiro']
['Guido', 'Delta', 'Pioneiro']
['Emanoel', 'Delta', 'Pioneiro']
['Flávio', 'Delta', 'Pioneiro']
['Iohannes', 'Delta', 'Pioneiro']
['Lucas', 'Delta', 'Pesquisador']
['Beatriz Gillianne (Bia)', 'Sigma', 'Guia']
['Emilly Vitória', 'Sigma', 'Excursionista']
['Adriana', 'Sigma', 'Guia']
['Jade', 'Sigma', 'Guia']
['Sarah Leocádio', 'Sigma', 'Guia']
['Maria Eduarda', 'Sigma', 'Guia']
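A sketch comparing the list-based loop above with DataFrame.itertuples, which also iterates row by row but keeps the column names available as attributes. The three column names below are hypothetical, since the headers of the Excel sheet aren't shown; the rows are taken from the output above. The pandas documentation also recommends .to_numpy() over .values, and df.to_numpy().tolist() gives the same lists here.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; column names are assumptions
df = pd.DataFrame({
    "name": ["Ana Lara", "Arthur Figueiró", "Guido"],
    "group": ["Alfa", "Epsilon", "Delta"],
    "role": ["Amigo", "Amigo", "Pioneiro"],
})

# The answer's approach: convert to plain Python lists, then loop
for row in df.values.tolist():
    print(row)

# Alternative: itertuples yields namedtuples, so fields are accessed by name
labels = [f"{r.name} ({r.group})" for r in df.itertuples(index=False)]
print(labels)
```

Either way this remains a Python-level loop, so for computing a new numeric column the single vectorized expression is still the better fit.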