Saving whole Beautifulsoup array into excel using dataframe and xlsxwriter inside for loop

Question

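After going through a lot of documentation and searching Stack Overflow for answers, I couldn't find a solution to my problem.

Basically, I'm using beautifulsoup to scrape a list of data from a website and then store it in Excel. The scraping works fine.

When I run my script, it prints all of the items to the terminal. However, when I try to put the result into a dataframe and save it to Excel, only the last row ends up in the Excel file.

I've tried moving the code that saves to Excel inside the loop, with the same result. I've tried converting the list back to an array inside the for loop, but I get the same problem: only the last row is saved to Excel.

I think I'm missing something in the logic here. If anyone could point me to what I should be looking for, I would greatly appreciate it.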

        soup = BeautifulSoup(html, features="lxml")
        soup.find_all("div", {"id":"tbl-lock"})

        for listing in soup.find_all('tr'):

            listing.attrs = {}

            assetTime = listing.find_all("td", {"class": "locked"})
            assetCell = listing.find_all("td", {"class": "assetCell"})
            assetValue = listing.find_all("td", {"class": "assetValue"})

            for data in assetCell:

                array = [data.get_text()]

                ### Excel Heading + data
                df = pd.DataFrame({'Cell': array
                                    })
                print(array)
                # In here it will print all of the data


        ### Now we need to save the data to excel
        ### Create a Pandas Excel writer using XlsxWriter as the Engine
        writer = pd.ExcelWriter(filename+'.xlsx', engine='xlsxwriter')

        ### Convert the dataframe to an XlsxWriter Excel object and skip first row for custom header
        df.to_excel(writer, sheet_name='SheetName', startrow=1, header=False)

        ### Get the xlsxwriter workbook and worksheet objects

        workbook = writer.book
        worksheet = writer.sheets['SheetName']

        ### Custom header for Excel
        header_format = workbook.add_format({
            'bold': True,
            'text_wrap': True,
            'valign': 'top',
            'fg_color': '#D7E4BC',
            'border': 1
        })

        ### Write the column headers with the defined add_format
        print(df) ### In here it will print only 1 line
        for col_num, value in enumerate(df):

            worksheet.write(0, col_num +1, value, header_format)

        ### Close Pandas Excel writer and output the Excel file
        writer.save()

Answer 1

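This line is the problem: df = pd.DataFrame({'Cell': array}). It overwrites df on every iteration, so only the last row is stored.

Instead, initialize df before the loop as df = pd.DataFrame(columns=['Cell']) (the column name must match the 'Cell' key used in the loop) and do this inside the loop:

df = df.append(pd.DataFrame({'Cell': array}),ignore_index=True)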
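As a side note, `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the line above only runs on older versions. A minimal sketch of the same idea that should work on any version is to collect the values in a plain list and build the dataframe once after the loop. The `scraped` list below is a hypothetical stand-in for the texts returned by `data.get_text()`:

```python
import pandas as pd

# Hypothetical sample values standing in for the scraped cell texts.
scraped = ["asset A", "asset B", "asset C"]

# Overwriting: df_last is rebuilt on every pass, so only the last value survives.
for text in scraped:
    df_last = pd.DataFrame({"Cell": [text]})

# Accumulating: gather plain Python values first, then build the frame once.
rows = []
for text in scraped:
    rows.append(text)
df_all = pd.DataFrame({"Cell": rows})

print(len(df_last), len(df_all))
```

Building the frame once is also usually cheaper than growing it row by row, because each append or concat copies the existing data.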

Edit:

Try this:

soup = BeautifulSoup(html, features="lxml")
soup.find_all("div", {"id":"tbl-lock"})

df = pd.DataFrame(columns=['Cell'])  # same name as the 'Cell' key appended below
for listing in soup.find_all('tr'):

        listing.attrs = {}

        assetTime = listing.find_all("td", {"class": "locked"})
        assetCell = listing.find_all("td", {"class": "assetCell"})
        assetValue = listing.find_all("td", {"class": "assetValue"})

        for data in assetCell:

            array = [data.get_text()]

            ### Excel Heading + data
            df = df.append(pd.DataFrame({'Cell': array}),ignore_index=True)
            ##Or this
            #df = df.append(pd.DataFrame({'Cell': array}))   

            print(array)
            # In here it will print all of the data

. . . rest of the code
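For reference, here is a rough end-to-end sketch of the same approach on current pandas, where neither `DataFrame.append` nor `writer.save()` is available any more. It is not the asker's exact page: the HTML snippet and the helper names `collect_cells` and `write_with_header` are made up for illustration, and `html.parser` is used instead of `lxml` only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML; the real page structure is not shown in the question.
html = """
<table>
  <tr><td class="assetCell">A1</td><td class="assetValue">10</td></tr>
  <tr><td class="assetCell">B2</td><td class="assetValue">20</td></tr>
</table>
"""

def collect_cells(markup):
    """Return a one-column DataFrame with the text of every assetCell td."""
    soup = BeautifulSoup(markup, "html.parser")
    cells = [td.get_text(strip=True)
             for tr in soup.find_all("tr")
             for td in tr.find_all("td", {"class": "assetCell"})]
    return pd.DataFrame({"Cell": cells})

def write_with_header(df, path):
    """Write df to path with a formatted header row (needs xlsxwriter)."""
    # The context manager saves and closes the file when the block exits.
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name="SheetName", startrow=1, header=False)
        workbook = writer.book
        worksheet = writer.sheets["SheetName"]
        header_format = workbook.add_format({"bold": True, "border": 1})
        for col_num, value in enumerate(df.columns):
            # Column 0 holds the index, so headers start at column 1.
            worksheet.write(0, col_num + 1, value, header_format)

df = collect_cells(html)
print(df)
```

Calling `write_with_header(df, filename + '.xlsx')` afterwards should produce the workbook with every scraped row, since the dataframe is complete before anything is written.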

python

beautifulsoup

dataframe

pandas

xlsxwriter