How to create a new column from a function result
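I'm currently using the script below to check a long list of URLs for errors. The code first finds the unique URLs in df['Final_URL'], tests each one, and prints the status of the link. When I run the code below I get the current output in my notebook, which is fine. Now I want to push the status codes (e.g. 200, 404, BAD) into a new column called "Status" in my df, for every row whose URL matches one of the unique URLs collected at the start of the code.
What's the best way to create the new column df['Status']? Since I want to export it to Google Sheets, do you know whether text color is preserved when updating cells with pygsheets?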
Input code:
import re
import time

import requests

#get unique urls and check for errors
URLS = []
for unique_link in df['Final_URL'].unique():
    URLS.append(unique_link)
try:
    # ANSI escape codes for colored terminal output
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    ENDC = '\033[0m'

    def main():
        while True:
            print ("\nTesting URLs.", time.ctime())
            checkUrls()
            time.sleep(10) #Sleep 10 seconds
            break

    def checkUrls():
        for url in URLS:
            status = "N/A"
            try:
                #check if regex contains bet3.com
                if re.search(r".*bet3\.com.*", url):
                    status = checkUrl(url)
                else:
                    status = "BAD"
            except requests.exceptions.ConnectionError:
                status = "DOWN"
            printStatus(url, status)
            #for x in df['Final_URL']:
            #    if x == url:
            #        df['Status'] = printStatus(status)

    def checkUrl(url):
        r = requests.get(url, timeout=5)
        #print r.status_code
        return str(r.status_code)

    def printStatus(url, status):
        color = GREEN
        if status != "200":
            color=RED
        print (color+status+ENDC+' '+ url)
    #
    # Main app
    #
    if __name__ == '__main__':
        main()
except:
    print('Something went wrong!')
Current output:
200 https://www.bet3.com/dl/~offer
404 http://extra.bet3.com/promotions/en/soccer/soccer-accumulator-bonus
BAD https://extra.betting3.com/features/en/bet-builder
200 https://www.bet3.com/dl/6
You can rewrite your function like this:

def checkUrl(url):
    # Only URLs containing bet3.com are requested; anything else is reported as BAD
    if re.search(r".*bet3\.com.*", url):
        try:
            r = requests.get(url, timeout=5)
        except requests.exceptions.ConnectionError:
            return 'DOWN'
        return str(r.status_code)
    return 'BAD'

Then apply it like this:

df['Status'] = df['Final_URL'].apply(checkUrl)
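
The `apply` pattern can be tried offline by swapping in a stand-in for the network call. In the sketch below, the sample frame, `fake_check_url`, and its fixed answers are all hypothetical; the only point is that `Series.apply` runs the function on each row's value and stores the return values in the new column.

```python
import pandas as pd

# Hypothetical sample data; the real df comes from the question.
df = pd.DataFrame({
    "Final_URL": [
        "https://www.bet3.com/dl/6",
        "https://extra.betting3.com/features/en/bet-builder",
        "https://www.bet3.com/dl/6",  # same URL as the first row
    ]
})

calls = []  # records every URL the checker is asked about


def fake_check_url(url):
    """Stand-in for checkUrl: no network access, fixed answers."""
    calls.append(url)
    return "200" if "bet3.com" in url else "BAD"


# One function call per row; the results become the new column.
df["Status"] = df["Final_URL"].apply(fake_check_url)

print(df)
print(len(calls), "calls for", df["Final_URL"].nunique(), "unique URLs")
```

Counting the calls this way makes it easy to compare the number of lookups against the number of distinct URLs in the column.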
However, as user32185 noted, this requests a duplicated URL once for every row it appears in.
To avoid that, you can follow user32185's suggestion and write your function like this:

def checkUrls(urls):
    results = []
    for url in urls:
        if re.search(r".*bet3\.com.*", url):
            try:
                r = requests.get(url, timeout=5)
            except requests.exceptions.ConnectionError:
                results.append([url, 'DOWN'])
            else:
                # Record the status code only when the request succeeded
                results.append([url, str(r.status_code)])
        else:
            results.append([url, 'BAD'])
    return pd.DataFrame(data=results, columns=['Final_URL', 'Status'])

Then use it like this:

# Check each unique URL once, then join the results back onto every row
status_df = checkUrls(df['Final_URL'].unique())
df = df.merge(status_df, how='left', on='Final_URL')
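
As an alternative to the merge, the per-URL results can be kept in a dict and mapped back onto the column with `Series.map`. This is a minimal sketch, not part of the original answer: `add_status_column` and its `check` parameter are hypothetical names, and `check` stands for any function that takes one URL and returns a status string, such as the single-URL `checkUrl` above. The sample data and `fake_check` exist only so the example runs without network access.

```python
import pandas as pd


def add_status_column(df, check, url_col="Final_URL", status_col="Status"):
    """Return a copy of df with a status column, calling check once per unique URL."""
    status_by_url = {url: check(url) for url in df[url_col].unique()}
    out = df.copy()
    # map() looks up each row's URL in the dict, so duplicates reuse one result.
    out[status_col] = out[url_col].map(status_by_url)
    return out


# Offline usage with a stand-in checker (hypothetical results, no network).
calls = []


def fake_check(url):
    calls.append(url)
    return "200" if "bet3.com" in url else "BAD"


sample = pd.DataFrame({
    "Final_URL": [
        "https://www.bet3.com/dl/6",
        "https://extra.betting3.com/features/en/bet-builder",
        "https://www.bet3.com/dl/6",  # duplicate row
    ]
})

result = add_status_column(sample, fake_check)
print(result)
```

Compared with the merge, this keeps the original row order and index untouched and avoids building an intermediate DataFrame; which one reads better is largely a matter of taste.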