A CSV file at one URL (url 1) can be downloaded from the browser only if another URL, the main page (url), is open in the browser. How can this be implemented in Python?

If the URL https://www.nseindia.com/companies-listing/corporate-filings-announcements is open in a browser tab, I can download the CSV file from another tab of the same browser using the URL https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true. Otherwise the download fails and shows "Resource not found". How can I implement this in Python using pandas?

This page uses cookies to check whether the file is being requested after the first page was opened.

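You will have to use requests.Session to get the first page and its cookies, then use the same requests.Session (which sends the cookies from the previous request) to get the CSV file. Finally, you have to use io to pass the data to pandas; io simulates a file in memory.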

BTW: it seems to send the file with a BOM (Byte Order Mark), so I read the bytes from r.content instead of the text from r.text, and pandas will skip the BOM.
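The BOM remark can be checked offline with a small sketch. The payload below is hypothetical (a UTF-8 BOM followed by two made-up CSV lines), and the assumption is that r.text shows stray characters because the bytes get decoded as ISO-8859-1; the real server response may differ. It uses only the standard library, with the 'utf-8-sig' codec standing in for the BOM handling that pandas does on bytes input.

```python
import codecs
import csv
import io

# Hypothetical payload: a UTF-8 BOM followed by a tiny CSV body.
raw = codecs.BOM_UTF8 + b'"SYMBOL","DIFFERENCE"\r\n"RIIL","00:00:10"\r\n'

# Decoding as ISO-8859-1 turns the three BOM bytes into three visible characters.
as_latin1 = raw.decode('iso-8859-1')
print(repr(as_latin1[:6]))

# Decoding as 'utf-8-sig' drops the BOM, so the first header name is clean.
as_utf8_sig = raw.decode('utf-8-sig')
rows = list(csv.reader(io.StringIO(as_utf8_sig)))
print(rows[0])
```

If the first header comes back with extra characters in front of SYMBOL, the data was decoded without removing the BOM.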

import requests
import pandas as pd
import io

# --- create Session with User-Agent from real browser ---

headers = {
    'User-Agent': 'Mozilla/5.0'
}

s = requests.Session()
s.headers.update(headers)

# --- get first page to get cookies --- 

url = 'https://www.nseindia.com/companies-listing/corporate-filings-announcements'
r = s.get(url)

# --- get file ---

url = 'https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true'
r = s.get(url)

print(r.text[:100])  # stray characters at the beginning (the BOM decoded as text) mean the file has a BOM
                     # so I will use `r.content` instead of `r.text`

# --- read file from memory ---

#df = pd.read_csv(io.StringIO(r.text))   # it doesn't remove BOM
df = pd.read_csv(io.BytesIO(r.content))  # it removes BOM

# --- show it ---

print(df.head())

Result:

"SYMBOL","COMPANY NAME","SUBJECT","DETAILS","BROADCAST DATE/TIME","RECEIPT","DISSEMINATION","DIFF


      SYMBOL  ... DIFFERENCE
0  TATAELXSI  ...   00:00:08
1       RIIL  ...   00:00:10
2       ERIS  ...   00:00:06
3       RIIL  ...   00:00:09
4  INGERRAND  ...   00:00:09

[5 rows x 8 columns]
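To reuse the same two steps for other date ranges, they could be wrapped in functions. This is only a sketch: the names build_csv_url and fetch_announcements, the timeout value, and the raise_for_status() call are my additions, and it assumes the API keeps accepting the same query parameters and the DD-MM-YYYY date format seen in the question's URL.

```python
import io
from datetime import date
from urllib.parse import urlencode

MAIN_PAGE = 'https://www.nseindia.com/companies-listing/corporate-filings-announcements'
API_URL = 'https://www.nseindia.com/api/corporate-announcements'


def build_csv_url(from_date, to_date, index='equities'):
    """Build the CSV URL (hypothetical helper); dates use DD-MM-YYYY as in the question."""
    params = {
        'index': index,
        'from_date': from_date.strftime('%d-%m-%Y'),
        'to_date': to_date.strftime('%d-%m-%Y'),
        'csv': 'true',
    }
    return API_URL + '?' + urlencode(params)


def fetch_announcements(from_date, to_date):
    """Visit the main page for cookies, then read the CSV into a DataFrame (hypothetical helper)."""
    # Imported here so build_csv_url() works even without these packages installed.
    import pandas as pd
    import requests

    with requests.Session() as s:
        s.headers.update({'User-Agent': 'Mozilla/5.0'})
        s.get(MAIN_PAGE, timeout=30)       # first request: collects the cookies
        r = s.get(build_csv_url(from_date, to_date), timeout=30)
        r.raise_for_status()               # fail loudly if the resource is not found
        return pd.read_csv(io.BytesIO(r.content))  # bytes input, so pandas skips the BOM


# Builds the same URL as in the question; no network access is needed for this line.
print(build_csv_url(date(2022, 1, 14), date(2022, 1, 20)))
```

Calling fetch_announcements(date(2022, 1, 14), date(2022, 1, 20)) should then behave like the script above, as long as the site still accepts requests of this kind.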