Dataframe tolist adds [] or the dataframe reads the header while going through a for loop. How can I get the list to work in the for loop?
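I'm using an import from an Alteryx data stream. It is a single column of integers in the format shown below.
Reading the data from Alteryx automatically converts it to a dataframe.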
searches = Alteryx.read("dataimport")
SUCCESS: reading input data "dataimport"
RxNorm_Id
0 99
1 161
2 167
3 168
4 197
... ...
6711 2562541
6712 2565823
6713 2566308
6714 2566416
6715 2571104
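I have a for loop that goes through a URL and replaces one segment with the search value.
for search in searches:
    print(f"Scraping {search}")
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
    print(url)
When I try to run the data through the loop, it starts with the header name first:
Scraping RxNorm_Id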
https://rxnav.nlm.nih.gov/REST/rxcui/**RxNorm_Id**/historystatus.json?caller=RxNav
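I'm not sure why it uses the header first, but obviously that causes an error because that search doesn't exist.
If I try to convert the dataframe to a list, it wraps each item in square brackets. For example: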
Info: Python (2): [[99], [161], [167], [168], [197], [272], [281], [376]]
searches = searches.values.tolist()
Scraping [99]
https://rxnav.nlm.nih.gov/REST/rxcui/[99]/historystatus.json?caller=RxNav
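If I hard-code the searches as [99,161,167,168,197,272,281,376], my loop works fine.
How can I get the initial dataframe into that format? Or how can I get the tolist function to not wrap each number in square brackets?
I understand that my data source is secured and that using Alteryx prevents me from sharing a copy of it. However, this should be enough to work through the problem.
Below is the entire code, trimmed for ease of reproduction: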
from ayx import Alteryx
from numpy import dtype
import pandas as pd
import requests

searches = Alteryx.read("dataimport")
# searches = searches.values.tolist()
# for search in searches: attempt for the tolist() function
for search in [searches]:
    print(f"Scraping {search}")
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
    print(url)
    data = s.get(url,headers=headers).json() #results from second redirect
    print(data)
    a = data['rxcuiStatusHistory']['definitionalFeatures']
    b = data['rxcuiStatusHistory']['attributes']
    print(b)
    rxcui = b['rxcui']
    name = b['name']
    print(rxcui)
    print(name)
    try:
        baserxcui = a['ingredientAndStrength'][0]['baseRxcui']
        basename = a['ingredientAndStrength'][0]['baseName']
        print(baserxcui)
        print(basename)
    except KeyError:
        baserxcui = rxcui
        basename = name
        print(baserxcui)
        print(basename)
    try:
        bossrxcui = a['ingredientAndStrength'][0]['bossRxcui']
        bossname = a['ingredientAndStrength'][0]['bossName']
        print(bossrxcui)
        print(bossname)
    except KeyError:
        bossrxcui = rxcui
        bossname = name
        print(bossrxcui)
        print(bossname)
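A small sketch of what seems to be going on, using a hypothetical three-row DataFrame as a stand-in for the Alteryx input (the real data can't be reproduced here). Iterating over a DataFrame directly yields its column labels, which would explain the header showing up first, and `.values` is a two-dimensional array, so `tolist()` returns one list per row:

```python
import pandas as pd

# Hypothetical stand-in for Alteryx.read("dataimport")
searches = pd.DataFrame({"RxNorm_Id": [99, 161, 167]})

# Iterating a DataFrame yields its column labels, not its row values
labels = [search for search in searches]
print(labels)  # ['RxNorm_Id']

# .values is a 2-D array (rows x columns), so tolist() nests each row
rows = searches.values.tolist()
print(rows)  # [[99], [161], [167]]
```

Note that `for search in [searches]` loops exactly once, with the whole DataFrame as `search`, so it doesn't help either.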
You can just take the first item of each search, because after tolist() each search is itself a one-element list. So your code would become:
for search in searches:
    print(f"Scraping {search[0]}")
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search[0]}/historystatus.json?caller=RxNav"
    print(url)
Or you can simply change
searches = searches.values.tolist()
to
searches = [i[0] for i in searches.values.tolist()]
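Another option, sketched under the assumption that the column is named `RxNorm_Id` as in the printed output: select that column first, so `tolist()` is called on a one-dimensional Series and returns a flat list. The DataFrame below is a hypothetical stand-in for the Alteryx input.

```python
import pandas as pd

# Hypothetical stand-in for Alteryx.read("dataimport")
searches = pd.DataFrame({"RxNorm_Id": [99, 161, 167]})

# Selecting a single column gives a Series; its tolist() is flat
flat = searches["RxNorm_Id"].tolist()
print(flat)  # [99, 161, 167]

# Build the URLs from the flat list of ids
urls = [
    f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
    for search in flat
]
print(urls[0])
```

This avoids building the nested list in the first place, which may be easier to read than unpacking `[0]` afterwards.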