Word frequency by iterating over a list of dictionaries - python

Question

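Following up on my previous question, I want to iterate over a list of dictionaries and convert the output into a new DataFrame. Right now I have a CSV with two columns: one holds a word and the other holds a URL (see below).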

| Keyword  | URL                     | 
| -------- | --------------          |
| word1    | www.example.com/topic-1 |
| word2    | www.example.com/topic-2 |
| word3    | www.example.com/topic-3 |
| word4    | www.example.com/topic-4 |

I have converted this CSV into a list of dictionaries and am trying to iterate over that list to count how often each word appears on its URL.

My code can be seen in this Colab notebook.

I would like the final output to look like this:

| Keyword | URL                        | Count |
|:----    |:------:                    | -----:|
| word1   | www.example.com/topic-1    | 1003  |
| word2   | www.example.com/topic-2    | 405   |
| word3   | www.example.com/topic-3    | 123   |
| word4   | www.example.com/topic-4    | 554   |

The 'Count' column is the frequency of 'word1' on the URL in the same row.

Any help is appreciated!

Answer 1

Use DataFrame.apply to create a new column from a function of the other columns:

import pandas as pd
import requests

df = pd.DataFrame({'Keyword': ['code', 'apply', 'midnight'],
                   'URL': ['
                           'https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html',
                           '
                          ]})

print(df)
#     Keyword                                                URL
# 0      code  
# 1     apply  https://pandas.pydata.org/docs/reference/api/p...
# 2  midnight  



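def get_count(row):
    # Fetch the page for this row; redirects are not followed.
    r = requests.get(row['URL'], allow_redirects=False)
    # Case-insensitive substring count over the raw response text (markup included).
    count = r.text.lower().count(row['Keyword'].lower())
    return count

# axis=1 calls get_count once per row, so each row's Keyword and URL are available.
df['Count'] = df.apply(get_count, axis=1)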

print(df)
#     Keyword                                                URL  Count
# 0      code       32
# 1     apply  https://pandas.pydata.org/docs/reference/api/p...     32
# 2  midnight       18
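If you would rather stay with the list of dictionaries from the question, the same counting rule can be applied in a plain loop. The following is only a sketch: `add_counts`, `fetch_text`, the inline CSV text, and the `fake_pages` mapping are hypothetical names and made-up data, used so the example runs without network access.

```python
import csv
import io


def add_counts(rows, fetch_text):
    """Return a copy of each row with a 'Count' key added.

    rows: list of dicts with 'Keyword' and 'URL' keys.
    fetch_text: callable that takes a URL and returns the page text
                (hypothetical hook; swap in a real HTTP call as needed).
    """
    results = []
    for row in rows:
        text = fetch_text(row['URL'])
        # Case-insensitive substring count, the same rule get_count uses.
        count = text.lower().count(row['Keyword'].lower())
        # Build a new dict so the input rows are left unchanged.
        results.append({**row, 'Count': count})
    return results


# Stand-in CSV and page contents (made-up data, not real pages).
csv_text = (
    "Keyword,URL\n"
    "word1,www.example.com/topic-1\n"
    "word2,www.example.com/topic-2\n"
)
fake_pages = {
    'www.example.com/topic-1': 'Word1 here, word1 there.',
    'www.example.com/topic-2': 'Nothing relevant.',
}

# csv.DictReader yields one dict per CSV row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))
counted = add_counts(rows, fake_pages.get)
print(counted)
```

For real pages, something like `lambda url: requests.get(url).text` could be passed as `fetch_text`, and `pd.DataFrame(counted)` turns the result into a DataFrame with Keyword, URL and Count columns.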

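One caveat with counting on `r.text` directly: the raw response includes HTML markup, scripts and styles, and a substring count also matches inside longer words. If only whole words in the visible text should count, a rough alternative using just the standard library might look like the sketch below. `_TextExtractor` and `count_visible_word` are hypothetical helper names and `sample` is made-up HTML; BeautifulSoup's `get_text()` could play the same role as the extractor.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self.chunks.append(data)


def count_visible_word(html, keyword):
    """Count whole-word, case-insensitive matches of keyword in visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = ' '.join(parser.chunks)
    # \b anchors assume the keyword starts and ends with a word character.
    pattern = r'\b' + re.escape(keyword) + r'\b'
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Made-up HTML: 'code' appears in a style rule, an attribute, a script,
# inside the word 'encode', and once as a visible whole word.
sample = (
    '<html><head><style>.code{}</style></head><body>'
    '<p class="code">Code and encode</p>'
    '<script>var code = 1;</script></body></html>'
)
visible_count = count_visible_word(sample, 'code')
raw_count = sample.lower().count('code')
print(visible_count, raw_count)
```

Which rule is right depends on what "frequency" should mean for your pages; the raw substring count is simpler, while the visible-text count avoids inflating the number with markup.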