Reading a substring within quotes from a massive string in Python
I have the following string:
{"name":"INPROCEEDINGS","__typename":"PublicationConferencePaper"},"hasPermiss
ionToLike":true,"hasPermissionToFollow":true,"publicationCategory":"researchSu
mmary","hasPublicFulltexts":false,"canClaim":false,"publicationType":"inProcee
dings","fulltextRequesterCount":0,"requests":{"__pagination__":
[{"offset":0,"limit":1,"list":[]}]},"activeFiguresCount":0,"activeFigures":
{"__pagination__":[{"offset":0,"limit":100,"list":
[]}]},"abstract":"Heterogeneous Multiprocessor System-on-Chip (MPSoC) are
progressively becoming predominant in most modern mobile devices. These
devices are required to perform processing of applications within thermal,
energy and performance constraints. However, most stock power and thermal
management mechanisms either neglect some of these constraints or rely on
frequency scaling to achieve energy-efficiency and temperature reduction on
the device. Although this inefficient technique can reduce temporal thermal
gradient, but at the same time hurts the performance of the executing task.
In this paper, we propose a thermal and energy management mechanism which
achieves reduction in thermal gradient as well as energy-efficiency through
resource mapping and thread-partitioning of applications with online
optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is
experimentally appraised using different applications from Polybench benchmark
suite on Odroid-XU4 developmental platform. Results show 28% performance
improvement, 28.32% energy saving and reduced thermal variance of over 76%
when compared to the existing approaches. Additionally, the method is able to
free more than 90% in memory storage on the MPSoC, which would have been
previously utilized to store several task-to-thread mapping
configurations.","hasRequestedAbstract":false,"lockedFields"
I am trying to get the substring between "abstract":" and ","hasRequestedAbstract". To do that, I am using the following code:
import re
import requests
# some more code here ...
to_visit_url = 'https://www.researchgate.net/publication/328749434_TEEM_Online_Thermal-_and_Energy-Efficiency_Management_on_CPU-GPU_MPSoCs'
page = requests.get(to_visit_url)
# decode the raw response bytes into text
content = str(page.content, encoding="utf-8")
abstract = re.search('\"abstract\":\"(.*)\",\"hasRequestedAbstract\"', content)
print('Abstract:\n' + str(abstract))
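However, the abstract variable ends up as None. What could be the problem, and how can I get the substring mentioned above?
Note: Although it looks as if I could read this as a JSON object, that is not an option, because the sample text above is only a small part of the full HTML content, and extracting the JSON object from it is very difficult.
P.S. The full content of the page, i.e. page.content, can be downloaded from here: https://docs.google.com/document/d/1awprvKsLPNoV6NZRmCkktYwMwWJo5aujGyNwGhDf7cA/edit?usp=sharing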
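re.search does not return a list of parsed results. It returns an SRE_Match object (shown as re.Match in recent Python versions), or None when nothing matches.
If you want a list of matches, you need to use the re.findall method.
Test code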
import re
import requests
test_pattern = re.compile('\"abstract\":\"(.*)\",\"hasRequestedAbstract\"')
test_requests = requests.get("https://www.researchgate.net/publication/328749434_TEEM_Online_Thermal-_and_Energy-Efficiency_Management_on_CPU-GPU_MPSoCs")
print(test_pattern.findall(test_requests.text)[0])
Result
'Heterogeneous Multiprocessor System-on-Chip (MPSoC) are progressively becoming predominant in most modern mobile devices. These devices are required to perform processing of applications within thermal, energy and performance constraints. However, most stock power and thermal management mechanisms either neglect some of these constraints or rely on frequency scaling to achieve energy-efficiency and temperature reduction on the device. Although this inefficient technique can reduce temporal thermal gradient, but at the same time hurts the performance of the executing task. In this paper, we propose a thermal and energy management mechanism which achieves reduction in thermal gradient as well as energy-efficiency through resource mapping and thread-partitioning of applications with online optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is experimentally appraised using different applications from Polybench benchmark suite on Odroid-XU4 developmental platform. Results show 28% performance improvement, 28.32% energy saving and reduced thermal variance of over 76% when compared to the existing approaches. Additionally, the method is able to free more than 90% in memory storage on the MPSoC, which would have been previously utilized to store several task-to-thread mapping configurations.'
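As a minimal sketch of a related variation, re.search can also be used directly, as long as the captured group is read with .group(1) rather than printing str() of the match object (which only shows its repr). The sample string and the extract_abstract helper below are hypothetical stand-ins, not the real page. The non-greedy (.*?) and the re.DOTALL flag are assumptions that may help if the closing marker occurs more than once or if the abstract contains line breaks.

```python
import re

# Hypothetical stand-in for the page text; the real input would be the response text.
sample = (
    '"activeFigures":{},"abstract":"First sentence.\nSecond sentence.",'
    '"hasRequestedAbstract":false,"lockedFields"'
)

# Non-greedy (.*?) stops at the first closing marker;
# re.DOTALL lets "." match newline characters as well.
ABSTRACT_RE = re.compile(r'"abstract":"(.*?)","hasRequestedAbstract"', re.DOTALL)


def extract_abstract(text):
    """Return the captured abstract, or None when the pattern is absent."""
    match = ABSTRACT_RE.search(text)
    return match.group(1) if match else None


print(extract_abstract(sample))
```

Checking for None before calling .group(1) avoids an AttributeError when the server returns a page that does not contain the expected markers.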
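When you call requests.get(...), you should get back a Response object, right?
These objects are quite smart: you can use the built-in .json() method to get the string you posted in the question back as a Python dictionary.
That said, I notice that the link you posted does not point to anything like that, but to a full HTML document. If you are trying to parse a site like this, you should look at BeautifulSoup instead. (https://www.crummy.com/software/BeautifulSoup/)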
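Since .json() only helps when the whole response body is JSON, one possible middle ground is to decode just the one JSON value embedded in the HTML. The sketch below uses json.JSONDecoder().raw_decode from the standard library. The html fragment and the json_string_after helper are hypothetical, and the approach assumes the key appears literally as "abstract": with the value starting immediately after the colon.

```python
import json

# Hypothetical HTML fragment with JSON embedded in it, standing in for the real page.
html = (
    '<script>var x = {"id":1,"abstract":"Uses \\"quoted\\" words.",'
    '"hasRequestedAbstract":false};</script>'
)


def json_string_after(text, key):
    """Decode the JSON value that directly follows "key": in text."""
    marker = '"%s":' % key
    start = text.find(marker)
    if start == -1:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever trails it, so the surrounding HTML does not matter.
    value, _end = json.JSONDecoder().raw_decode(text, start + len(marker))
    return value


print(json_string_after(html, "abstract"))
```

Compared with a regular expression, this also unescapes sequences such as \" inside the value. raw_decode raises json.JSONDecodeError if the text at that position is not valid JSON.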
This answer does not use regular expressions (regex), but it gets the job done. The answer is as follows:
import re
import requests

def fetch_abstract(url = "https://www.researchgate.net/publication/328749434_TEEM_Online_Thermal-_and_Energy-Efficiency_Management_on_CPU-GPU_MPSoCs"):
    test_requests = requests.get(url)
    index = 0
    inner_count = 0
    while index < len(test_requests.text):
        index = test_requests.text.find('[Show full abstract]</a><span class=\"lite-page-hidden', index)
        if index == -1:
            break
        inner_count += 1
        if inner_count == 4:
            #extract the abstract from here -->
            temp = test_requests.text[index-1:]
            index2 = temp.find('</span></div><a class=\"nova-e-link nova-e-link--color-blue')
            quote_index = temp.find('\">')
            abstract = test_requests.text[index + quote_index + 2 : index - 1 + index2]
            print(abstract)
        index += 52

if __name__ == '__main__':
    fetch_abstract()
Result:
Heterogeneous Multiprocessor System-on-Chip (MPSoC) are progressively
becoming predominant in most modern mobile devices. These devices are
required to perform processing of applications within thermal, energy
and performance constraints. However, most stock power and thermal
management mechanisms either neglect some of these constraints or rely
on frequency scaling to achieve energy-efficiency and temperature
reduction on the device. Although this inefficient technique can
reduce temporal thermal gradient, but at the same time hurts the
performance of the executing task. In this paper, we propose a thermal
and energy management mechanism which achieves reduction in thermal
gradient as well as energy-efficiency through resource mapping and
thread-partitioning of applications with online optimization in
heterogeneous MPSoCs. The efficacy of the proposed approach is
experimentally appraised using different applications from Polybench
benchmark suite on Odroid-XU4 developmental platform. Results show 28%
performance improvement, 28.32% energy saving and reduced thermal
variance of over 76% when compared to the existing approaches.
Additionally, the method is able to free more than 90% in memory
storage on the MPSoC, which would have been previously utilized to
store several task-to-thread mapping configurations.
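The same no-regex idea can be sketched more compactly with str.partition, applied to the two markers from the question instead of the HTML tags used above. The between helper and the sample string are hypothetical, and the sketch assumes both markers appear literally in the text.

```python
def between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker."""
    # partition returns (before, separator, after); the separator is empty
    # when the marker was not found.
    _, found_start, rest = text.partition(start_marker)
    if not found_start:
        return None
    body, found_end, _ = rest.partition(end_marker)
    return body if found_end else None


# Hypothetical stand-in for the page text.
sample = 'x,"abstract":"Some abstract text.","hasRequestedAbstract":false'
print(between(sample, '"abstract":"', '","hasRequestedAbstract"'))
```

This avoids hard-coded offsets and occurrence counts, at the cost of relying on the exact marker strings staying the same in the page.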