DataFrame tolist() wraps each value in [], or the loop reads the header when iterating over the DataFrame. How can I get a plain list to work in the for loop?

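I'm using an import from an Alteryx data stream. It is a single column of integers in the following format.

Reading the data from Alteryx automatically converts it into a DataFrame: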

    searches = Alteryx.read("dataimport")

    SUCCESS: reading input data "dataimport"
          RxNorm_Id
    0            99
    1           161
    2           167
    3           168
    4           197
    ...         ...
    6711    2562541
    6712    2565823
    6713    2566308
    6714    2566416
    6715    2571104

I have a for loop that builds a URL for each search, substituting the search value into one segment of the path:

    for search in searches:
        print(f"Scraping {search}")
        url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
        print(url)

When I run the data through the loop, the first value it uses is the header name:

    Scraping RxNorm_Id
    https://rxnav.nlm.nih.gov/REST/rxcui/RxNorm_Id/historystatus.json?caller=RxNav

I'm not sure why it starts with the header, but it obviously causes an error, because no such search exists.
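
This is standard pandas behaviour: iterating over a DataFrame iterates over its column labels, much as iterating over a dict yields its keys. With a single-column frame the loop runs once, and `search` is the header `RxNorm_Id`. A small sketch that reproduces this without Alteryx, assuming `Alteryx.read` returns an ordinary pandas DataFrame (the printed output above suggests it does); the stand-in frame uses only the first few IDs shown:

```python
import pandas as pd

# Stand-in for Alteryx.read("dataimport"): a single-column DataFrame of ints.
searches = pd.DataFrame({"RxNorm_Id": [99, 161, 167]})

# Iterating a DataFrame yields its column labels, so this runs once per column
# and search == "RxNorm_Id".
labels = [search for search in searches]
print(labels)  # ['RxNorm_Id']

# Iterating a single column (a Series) yields the cell values instead.
fragments = [f"{search}" for search in searches["RxNorm_Id"]]
print(fragments)  # ['99', '161', '167']
```

Looping over `searches["RxNorm_Id"]` rather than `searches` is therefore enough to feed the IDs themselves into the URL.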

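If I convert the DataFrame to a list instead, each item gets wrapped in its own square brackets:

    searches = searches.values.tolist()

    Info: Python (2): [[99], [161], [167], [168], [197], [272], [281], [376]]

and the loop then puts the bracketed value into the URL:

    Scraping [99]
    https://rxnav.nlm.nih.gov/REST/rxcui/[99]/historystatus.json?caller=RxNav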
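
The brackets come from the shape of `.values`: on a DataFrame it is a 2-D NumPy array with one row per record and one column per field, and `tolist()` preserves that nesting. A minimal sketch, again using a hand-built stand-in for the Alteryx import:

```python
import pandas as pd

searches = pd.DataFrame({"RxNorm_Id": [99, 161, 167]})

# .values on a DataFrame is 2-D: here 3 rows by 1 column.
arr = searches.values
print(arr.shape)  # (3, 1)

# tolist() keeps both dimensions, so every row becomes a one-element list.
nested = arr.tolist()
print(nested)  # [[99], [161], [167]]

# Flattening to 1-D before converting gives plain Python ints.
flat = arr.ravel().tolist()
print(flat)  # [99, 161, 167]
```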

If I hard-code `searches` as `[99,161,167,168,197,272,281,376]`, my loop works fine.

How can I get the initial DataFrame into that format? Or how can I stop `tolist()` from wrapping each number in square brackets?

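I understand that my data source is secured and that Alteryx prevents me from sharing a copy of it. However, this should be enough to work out the problem.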

Below is the whole script, trimmed to make it easier to reproduce:

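    from ayx import Alteryx
    from numpy import dtype
    import pandas as pd
    import requests

    # Assumed: `s` and `headers` were defined in the part of the script trimmed for this post
    s = requests.Session()
    headers = {}

    searches = Alteryx.read("dataimport")
    # searches = searches.values.tolist()


    # for search in searches:  # attempt with the tolist() function
    for search in [searches]: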
        print(f"Scraping {search}")
        url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"

        print(url)
        data = s.get(url,headers=headers).json()      #results from second redirect
        print(data)
        a = data['rxcuiStatusHistory']['definitionalFeatures']
        b = data['rxcuiStatusHistory']['attributes']
        print(b)
        rxcui = b['rxcui']
        name = b['name']
        print(rxcui)
        print(name)
    
        try: 
            baserxcui = a['ingredientAndStrength'][0]['baseRxcui']
            basename = a['ingredientAndStrength'][0]['baseName']
            print(baserxcui)
            print(basename)
        except KeyError:
            baserxcui = rxcui
            basename = name
            print(baserxcui)
            print(basename)
    
        try:  
            bossrxcui = a['ingredientAndStrength'][0]['bossRxcui']
            bossname = a['ingredientAndStrength'][0]['bossName']
            print(bossrxcui)
            print(bossname)
        except KeyError:
            bossrxcui = rxcui
            bossname = name
            print(bossrxcui)
            print(bossname)
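
The two try/except blocks at the end of the loop apply the same fallback: use the concept's own `rxcui` and `name` when `ingredientAndStrength` doesn't supply base/boss values. Below is a minimal sketch of that parsing as a standalone function, so it can be checked without calling RxNav. The function name `extract_names` is hypothetical, and the response shape is assumed only from the keys the code above reads, not from the RxNav documentation. One deliberate difference: each field falls back on its own, whereas the original falls back for both base fields (or both boss fields) when either key is missing. It also treats an empty `ingredientAndStrength` list as missing instead of raising `IndexError`.

```python
def extract_names(data):
    """Hypothetical helper: pull rxcui/name and base/boss fields from one
    historystatus response, falling back to the concept's own rxcui/name."""
    history = data["rxcuiStatusHistory"]
    attrs = history["attributes"]
    rxcui, name = attrs["rxcui"], attrs["name"]

    # .get() with empty defaults replaces the try/except KeyError blocks;
    # a missing or empty ingredientAndStrength list yields an empty dict.
    features = history.get("definitionalFeatures") or {}
    strengths = features.get("ingredientAndStrength") or []
    first = strengths[0] if strengths else {}

    return {
        "rxcui": rxcui,
        "name": name,
        "baserxcui": first.get("baseRxcui", rxcui),
        "basename": first.get("baseName", name),
        "bossrxcui": first.get("bossRxcui", rxcui),
        "bossname": first.get("bossName", name),
    }
```

Inside the loop, `row = extract_names(s.get(url, headers=headers).json())` would then replace the two try/except blocks.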

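After `searches = searches.values.tolist()`, each `search` is a one-element list, so you can just take its first item. Your code then becomes: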

    for search in searches:
        print(f"Scraping {search[0]}")
        url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search[0]}/historystatus.json?caller=RxNav"
        print(url)
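
A runnable variant of that loop, with a hand-built DataFrame standing in for the Alteryx import. Instead of `search[0]` it unpacks each row with `(search,)`; this does the same job but raises `ValueError` if a row ever holds more than one value, which can catch an unexpected extra column early:

```python
import pandas as pd

# Stand-in for Alteryx.read("dataimport"), using the first two IDs from the question.
searches = pd.DataFrame({"RxNorm_Id": [99, 161]})
searches = searches.values.tolist()  # [[99], [161]]

urls = []
# "(search,)" unpacks each one-element row, equivalent to search[0].
for (search,) in searches:
    print(f"Scraping {search}")
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
    urls.append(url)
    print(url)
```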

Or you can simply change

    searches = searches.values.tolist()

to

    searches = [i[0] for i in searches.values.tolist()]
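
For comparison, selecting the column first and calling `tolist()` on the resulting Series gives the same flat list without the comprehension. A short sketch under the same stand-in assumption; the column name `RxNorm_Id` is taken from the printed output in the question:

```python
import pandas as pd

searches = pd.DataFrame({"RxNorm_Id": [99, 161, 167]})

# The comprehension from the answer: unwrap each one-element row list.
via_comprehension = [i[0] for i in searches.values.tolist()]

# Selecting the column gives a 1-D Series, whose tolist() is already flat.
via_column = searches["RxNorm_Id"].tolist()

print(via_comprehension)  # [99, 161, 167]
print(via_column)         # [99, 161, 167]
```

Either list can be dropped straight into the original `for search in searches:` loop, just like the hard-coded list.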