Python DataFrames: a concat or append problem
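Programming is not my field, but I have to use Python for a task at work. After a month of studying on my own I wrote the code below, and now I would like to turn it into a proper loop. How should I do that?
- Condition 1: df and mark need to range over 0~900 (not 0~10).
- Condition 2: the access key allows only 500 calls per day, so please do not run it too many times.
Additional question: if the data arrives as XML, how should I code it?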
import urllib.request
import json
import pandas as pd
import datetime

Host = "https://oapi.saramin.co.kr/job-search?access-key=L8ILhlpIElsdz7BvhWQxcON3g8WBCSRyPTBEY7qlitt5ksdVBV6"
headers = { Host: "oapi.saramin.co.kr", "Accept": "application/json"}

for i in range(0, 10) :
    pages = i + 1
    url = Host + "&start=" + str(pages) + "&count=110"
    response = urllib.request.urlopen(url)
    json_str = response.read().decode("utf-8")
    json_object = json.loads(json_str)
    globals()['mark{}'.format(i)] = pd.json_normalize(json_object['jobs']['job'])

df0=pd.DataFrame(mark0)
df1=pd.DataFrame(mark1)
df2=pd.DataFrame(mark2)
df3=pd.DataFrame(mark3)
df4=pd.DataFrame(mark4)
df5=pd.DataFrame(mark5)
df6=pd.DataFrame(mark6)
df7=pd.DataFrame(mark7)
df8=pd.DataFrame(mark8)
df9=pd.DataFrame(mark9)
df_all = pd.concat([df0, df1, df2, df3, df4, df5, df6, df7, df8, df9])

today_string = datetime.datetime.now().strftime('%y.%m.%d_%Hh%Mm%Ss')
df_all.to_excel('Saramin_Raw_Data(' + today_string + ').xlsx')
You can try the code below:
import urllib.request
import json
import pandas as pd
import datetime

Host = "https://oapi.saramin.co.kr/job-search?access-key=L8ILhlpIElsdz7BvhWQxcON3g8WBCSRyPTBEY7qlitt5ksdVBV6"
# Request headers: ask the API for a JSON response
headers = {"Host": "oapi.saramin.co.kr", "Accept": "application/json"}

df_list = list()  # one DataFrame per page
NUM_FILES = 3
for i in range(0, NUM_FILES + 1):
    pages = i + 1
    url = Host + "&start=" + str(pages) + "&count=110"
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request)
    json_str = response.read().decode("utf-8")
    json_object = json.loads(json_str)
    # json_normalize already returns a DataFrame
    df = pd.json_normalize(json_object['jobs']['job'])
    df_list.append(df)

# Combine all pages once, after the loop
df_all = pd.concat(df_list)

today_string = datetime.datetime.now().strftime('%y.%m.%d_%Hh%Mm%Ss')
df_all.to_excel('Saramin_Raw_Data(' + today_string + ').xlsx')
For your case you will need to set NUM_FILES = 900.
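One thing to watch: with NUM_FILES = 900 the loop range(0, NUM_FILES+1) makes 901 requests, which is more than the 500 calls per day mentioned in the question. A minimal sketch of one way to handle that is to split the page numbers into daily batches and run one batch per day. The function name daily_batches and its parameters are made up for this illustration, and it assumes every page costs exactly one call.

```python
from typing import Iterator, List


def daily_batches(first_page: int, last_page: int, daily_limit: int) -> Iterator[List[int]]:
    """Yield lists of page numbers, each list no longer than daily_limit."""
    pages = list(range(first_page, last_page + 1))
    for offset in range(0, len(pages), daily_limit):
        yield pages[offset:offset + daily_limit]


# Pages 1..901 correspond to i = 0..900 in the loop (pages = i + 1).
batches = list(daily_batches(first_page=1, last_page=901, daily_limit=500))
for day, batch in enumerate(batches, start=1):
    print(f"day {day}: pages {batch[0]}..{batch[-1]} ({len(batch)} requests)")
```

Each batch can then replace the range(...) in the download loop for that day, and the Excel files from the separate days can be combined afterwards with pd.concat.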
Hope this helps!
As for your other question, you can follow this nice Medium article:
https://medium.com/@robertopreste/from-xml-to-pandas-dataframes-9292980b1c1c
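As a rough illustration of the XML case, the sketch below parses a response with the standard library and flattens each job into one flat record. The XML layout in SAMPLE_XML is an assumption that simply mirrors the jobs/job keys used in the JSON code; the real layout of the API response is not shown in this thread, so the tag names would need adjusting. The helper name flatten is also made up here.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: tag names are assumed, not taken from the real API.
SAMPLE_XML = """
<job-search>
  <jobs>
    <job>
      <id>1001</id>
      <company><name>Acme</name></company>
      <position><title>Data Analyst</title></position>
    </job>
    <job>
      <id>1002</id>
      <company><name>Globex</name></company>
      <position><title>Engineer</title></position>
    </job>
  </jobs>
</job-search>
"""


def flatten(element, prefix=""):
    """Flatten one element into {dotted.path: text} for its leaf children."""
    row = {}
    for child in element:
        key = f"{prefix}{child.tag}"
        if len(child):  # child has nested elements, so recurse into it
            row.update(flatten(child, prefix=key + "."))
        else:
            row[key] = (child.text or "").strip()
    return row


root = ET.fromstring(SAMPLE_XML)
rows = [flatten(job) for job in root.iter("job")]
print(rows[0])
```

Passing rows to pd.DataFrame(rows) gives one DataFrame per page, which can be appended to df_list in the same way as the JSON pages. Recent pandas versions also provide pandas.read_xml, which may be enough when the response is not deeply nested.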