Row values get replaced after new iteration

Question

I have a function that looks like this, which I run in a for loop:

import pandas as pd

def findInfo(url, df):
    allLinks = getAllLinks(url)
    katalogLinks = getKatalogLinks(allLinks)
    if len(katalogLinks) == 0:
        df = df.append({'Company URL' : url,
                    'Potential Client' : 0} , 
                    ignore_index=True)
        return df

    else:
        print("catalog links found", url)
        df["Company URL"] = url
        df["Potential Client"] = 1
        pdfLinks = getPDFLinks(katalogLinks)
        print(pdfLinks)
        
        pdfDetails = checkPDFs(url, pdfLinks)
        df = df.append({'Company URL' : url,
                    'Potential Client' : 1, "Number of PDFs found":len(pdfLinks),"Info":pdfDetails} , 
                    ignore_index=True) 
        return df

df = pd.DataFrame()
df["Company URL"] = ""
df["Potential Client"] = ""
lst = ["http://www.aurednik.de/", "https://www.eltako.de/"]
for i in lst:
    df = findInfo(i, df)
    print("DF", df)

df.head()

On the first iteration, when I print df inside the loop, I get the correct result:

DF                Company URL  Potential Client Info  Number of PDFs found
0  http://www.aurednik.de/                 1   {}                   0.0

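However, on the second iteration I expect the first row to stay as it is and a second row from the returned df to be added after it. Instead, the URL in the first row gets replaced, and my final df looks like this: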

Company URL Potential Client    Info    Number of PDFs found
0   https://www.eltako.de/  1   {}  0.0
1   https://www.eltako.de/  1   {'https://www.eltako.de/wp-content/uploads/2020/11/Eltako_Gesamtkatalog_LowRes.pdf': {'numberOfPages': 440, 'creationDate': '2017-09-20'}}  1.0

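Why is the first row being replaced, and how can I fix it? It may have something to do with how I save or return df, but I can't figure out what the problem is.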

Answer 1

Lines 12-13 of findInfo (in the else branch):

        df["Company URL"] = url
        df["Potential Client"] = 1
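Run in isolation, these two assignments overwrite the existing row. The sketch below starts from the one-row state printed after the first iteration (values taken from the output in the question), applies the else-branch assignments with the second URL, and then appends the new row. It uses pd.concat in place of DataFrame.append, which was removed in pandas 2.0; the scraping helpers are left out.

```python
import pandas as pd

# State after the first iteration: one row for the first URL.
df = pd.DataFrame([{"Company URL": "http://www.aurednik.de/", "Potential Client": 1}])

# Second iteration, else branch: assigning a scalar to a whole column
# writes that value into every existing row, so row 0 is overwritten.
url = "https://www.eltako.de/"
df["Company URL"] = url
df["Potential Client"] = 1

# Only afterwards is the new row appended.
new_row = pd.DataFrame([{"Company URL": url, "Potential Client": 1}])
df = pd.concat([df, new_row], ignore_index=True)

print(df["Company URL"].tolist())
# ['https://www.eltako.de/', 'https://www.eltako.de/']
```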

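You are setting the entire "Company URL" and "Potential Client" columns to the current iteration's values: assigning a scalar to a column writes it into every existing row. On the first iteration the DataFrame is still empty, so nothing visible happens, but on the second iteration row 0 is overwritten with the new URL before the new row is appended. Deleting these two lines should fix the problem.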
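With older pandas, deleting those two lines is enough. On pandas 2.x the remaining df.append calls fail as well, because DataFrame.append was deprecated in 1.4 and removed in 2.0. A minimal sketch of an alternative, assuming pandas 2.x: findInfo returns one plain dict per URL, the loop collects the dicts in a list, and the DataFrame is built once at the end, so no existing row can be touched. The four helper functions are hypothetical stubs that stand in for the question's real scraping code and only exist to make the example runnable.

```python
import pandas as pd

# Hypothetical stubs for the question's helpers; replace with the real ones.
def getAllLinks(url):
    return [url + "katalog.pdf"] if "eltako" in url else []

def getKatalogLinks(links):
    return [link for link in links if "katalog" in link.lower()]

def getPDFLinks(katalogLinks):
    return [link for link in katalogLinks if link.endswith(".pdf")]

def checkPDFs(url, pdfLinks):
    return {link: {} for link in pdfLinks}

def findInfo(url):
    """Return one row for `url` as a dict instead of modifying a DataFrame."""
    katalogLinks = getKatalogLinks(getAllLinks(url))
    if not katalogLinks:
        return {"Company URL": url, "Potential Client": 0}
    pdfLinks = getPDFLinks(katalogLinks)
    return {
        "Company URL": url,
        "Potential Client": 1,
        "Number of PDFs found": len(pdfLinks),
        "Info": checkPDFs(url, pdfLinks),
    }

lst = ["http://www.aurednik.de/", "https://www.eltako.de/"]
rows = [findInfo(url) for url in lst]  # one dict per URL
df = pd.DataFrame(rows)                # missing keys become NaN
print(df[["Company URL", "Potential Client"]])
```

Building the frame once from a list of dicts also avoids copying the whole DataFrame on every iteration, which is what repeated append or concat calls do.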


Tags: python, data-analysis, dataframe, python-3.x, pandas