json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) Scrapy
Hi everyone, I am trying to scrape/crawl this JSON-based website using Scrapy and BeautifulSoup:
https://pk.profdir.com/jobs-for-angular-developer-lahore-punjab-cddb
I have written the code below to read/fetch the JSON from the website:
website_text = response.body.decode("utf-8")
jobs_soup = BeautifulSoup(website_text.replace("<", " <"), "html.parser")
script_tag = jobs_soup.find('script', {"type": 'application/ld+json'}).text
data = json.loads(script_tag, strict=False)
But this error keeps coming up:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
If anyone knows how to fix this, please help; it would be very helpful to me.
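For context, "Expecting value: line 1 column 1 (char 0)" is the message json.loads gives when its input is empty or does not begin with a JSON value, so looking at what was actually extracted from the script tag can narrow things down. A minimal sketch of such a check, using only the standard library (the describe_json_failure helper and the sample strings are hypothetical, not taken from the page):

```python
import json


def describe_json_failure(text: str) -> str:
    """Hypothetical helper: summarize why json.loads rejects `text`."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.msg is the bare message, exc.pos the character offset of the failure.
        return f"{exc.msg} at char {exc.pos}; input starts with {text[:20]!r}"
    return "ok"


# An empty string reproduces the message from the traceback above.
print(describe_json_failure(""))
# An unescaped quote inside a string value fails later in the text instead.
print(describe_json_failure('{"a": "b "c" d"}'))
```

If the first case matches what you see, the extraction step returned nothing usable; if the failure is reported further into the text, the JSON itself is malformed.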
The JSON inside the <script> tag is invalid, so the json module cannot decode it by default. A quick-and-dirty fix is to repair the "description": value with re.sub (also, use html5lib as the BeautifulSoup parser):
import re
import json
import requests
from bs4 import BeautifulSoup
url = "https://pk.profdir.com/jobs-for-angular-developer-lahore-punjab-cddb"
soup = BeautifulSoup(requests.get(url).content, "html5lib")
data = soup.select_one('script[type="application/ld+json"]').contents[0]
# fix "broken" description
data = re.sub(
r'(?<="description" : )"(.*?)"(?=,\s+")',
lambda g: json.dumps(g.group(1)),
data,
flags=re.S,
)
data = json.loads(data)
print(json.dumps(data, indent=4))
Prints:
{
    "@context": "http://schema.org/",
    "@type": "JobPosting",
    "title": "angular-developer",
    "description": "<p>Designing and developing user interfaces using Angular best practices\n</p><p>\n</p><p>Adapting interface for modern internet applications using the latest front-end technologies\n</p><p>\n</p><p>Developing product analysis tasks and optimizing the user experience\n</p><p>\n</p><p>Proficiency in Angular, HTML, CSS, and JavaScript for rapid prototyping.\n</p><p>\n</p><p>Integration of APIs and RESTful Services.\n</p><p>\n</p><p>Creating Maintaining Mobile and Website Responsive Design and Mobile website.\n</p><p>\n</p><p>Developing Across Browsers\n</p><p>\n</p><p>Creating tools that improve site interaction regardless of the browser.\n</p><p>\n</p><p>Managing software workflow.\n</p><p>\n</p><p>Following SEO best practices Fixing bugs and testing for usability\n</p><p>\n</p><p>Conducting performance tests\n</p><p>\n</p><p>Consulting with the design team\n</p><p>\n</p><p>Ensuring high performance of applications and providing support\n</p><p>\n</p><p>\n</p><p>Job Requirements:\n</p><p>\n</p><p>\n</p><p>Expert knowledge of HTML5, CSS3\n</p><p>\n</p><p>Strong knowledge of JavaScript\n</p><p>\n</p><p>Experience in JS frameworks Angular\n</p><p>\n</p><p>Familiarity with Software version control systems e.g., Git\n</p><p>\n</p><p>Experience in Node.js\n</p><p>\n</p><p>Having knowledge of AWS environment is a plus\n</p><p>\n</p><p>AlienVault experience is a plus\n</p><p>\n</p><p>Jira Cloud experience is a plus\n</p><p>\n</p><p>Knowledge of CSS Pre-processor technologies including SASS\n</p><p>\n</p><p>Able to quickly transform visual designs into accurate HTML/CSS\n</p><p>\n</p><p>Ability to write high-performance, reusable code for UI components\n</p><p>\n</p><p>Strong understanding of security and performance fundamentals required\n</p><p>\n</p><p>Familiarity with the whole web stack, including protocols and web server optimization techniques\n</p><p>\n</p><p>Great communication skills You'll be interacting with Product and Development teams\n</p><p>\n</p><p>Experience in Grunt, Rollup, or Webpack is a plus\n</p><p>\n</p><p>Good Technical skills, Communication skills, General problem-solving skills, and Coding skills\n</p><p>\n</p><p>Package: Negotiable</p>",
    "identifier": {
        "@type": "PropertyValue",
        "name": "TTS",
        "value": "cddb"
    },
    "datePosted": "2022-02-18T00:00",
    "validThrough": "2022-05-19T00:00",
    "employmentType": "permanent<br>full time",
    "hiringOrganization": {
        "@type": "Organization",
        "name": "TTS",
        "sameAs": "https://pk.profdir.com/companies/tts-ebfu",
        "logo": "https://pk.profdir.com/apple-icon.png"
    },
    "jobLocation": {
        "@type": "Place",
        "address": {
            "@type": "PostalAddress",
            "streetAddress": "R Block DHA Phase 2",
            "addressLocality": "Lahore",
            "addressRegion": "Punjab",
            "postalCode": "53720",
            "addressCountry": "PK"
        }
    },
    "baseSalary": {
        "@type": "MonetaryAmount",
        "currency": "PKR",
        "value": {
            "@type": "QuantitativeValue",
            "value": "70000",
            "unitText": "MONTH"
        }
    }
}
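To see what the re.sub step does without hitting the network, here is a small offline sketch. The raw string is a hypothetical, shortened stand-in for the script contents (not copied from the page), and it assumes the breakage comes from raw newlines and an unescaped double quote inside the description; the repair_description name is also made up for illustration:

```python
import json
import re

# Hypothetical sample: the description value contains a raw newline and an
# unescaped double quote, so the block as a whole is not valid JSON.
raw = '''{
  "@type" : "JobPosting",
  "description" : "<p>Strong "Angular" skills
</p><p>Package: Negotiable</p>",
  "title" : "angular-developer"
}'''


def repair_description(text: str) -> str:
    """Re-encode the description value as a proper JSON string literal."""
    return re.sub(
        # Match from the quote after '"description" : ' up to the last quote
        # that is followed by a comma and the opening quote of the next key.
        r'(?<="description" : )"(.*?)"(?=,\s+")',
        # json.dumps escapes the inner quotes and newlines and re-adds the
        # surrounding quotes.
        lambda m: json.dumps(m.group(1)),
        text,
        flags=re.S,  # let '.' span the raw newlines inside the value
    )


data = json.loads(repair_description(raw))
print(data["title"])
print(repr(data["description"]))
```

The pattern depends on the exact spacing in '"description" : ' and on another key following the description, so it is likely to need adjusting if the site changes its markup or if a description happens to contain a quote followed by a comma and another quoted string.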