Same code used in multiple functions but with minor differences - how to optimize?
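This is code from a Udacity course that I changed slightly. When it runs, it asks me for a movie name, and the trailer pops up in the browser (that is another part, not shown).
As you can see, the program has a lot of repeated code: the functions extract_name, movie_poster_url, and movie_trailer_url contain nearly the same code. Is there a way to get rid of the repetition while keeping the same output? If so, will it run faster?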
import fresh_tomatoes
import media
import urllib
import requests
from BeautifulSoup import BeautifulSoup

name = raw_input("Enter movie name:- ")
global movie_name

def extract_html(name):
    url = "website name" + name + "continuation of website name" + name + "again continuation of web site name"
    response = requests.get(url)
    page = str(BeautifulSoup(response.content))
    return page

def extract_name(page):
    start_link = page.find(' - IMDb</a></h3><div class="s"><div class="kv"')
    start_url = page.find('>',start_link-140)
    start_url1 = page.find('>', start_link-140)
    end_url = page.find(' - IMDb</a>', start_link-140)
    name_of_movie = page[start_url1+1:end_url]
    return extract_char(name_of_movie)

def extract_char(name_of_movie):
    name_array = []
    for words in name_of_movie:
        word = words.strip('</b>,')
        name_array.append(word)
    return ''.join(name_array)

def movie_poster_url(name_of_movie):
    movie_name, seperator, tail = name_of_movie.partition(' (')
    #movie_name = name_of_movie.rstrip('()0123456789 ')
    page = urllib.urlopen('another web site name' + movie_name + 'continuation of website name').read()
    start_link = page.find('"Poster":')
    start_url = page.find('"',start_link+9)
    end_url = page.find('"',start_url+1)
    poster_url = page[start_url+1:end_url]
    return poster_url

def movie_trailer_url(name_of_movie):
    movie_name, seperator, tail = name_of_movie.partition(' (')
    #movie_name = name_of_movie.rstrip('()0123456789 ')
    page = urllib.urlopen('another website name' + movie_name + " trailer").read()
    start_link = page.find('<div class="yt-lockup-dismissable"><div class="yt-lockup-thumbnail contains-addto"><a aria-hidden="true" href=')
    start_url = page.find('"',start_link+110)
    end_url = page.find('" ',start_url+1)
    trailer_url1 = page[start_url+1:end_url]
    trailer_url = "www.youtube.com" + trailer_url1
    return trailer_url

page = extract_html(name)
movie_name = extract_name(page)
new_movie = media.Movie(movie_name, "Storyline WOW", movie_poster_url(movie_name), movie_trailer_url(movie_name))
movies = [new_movie]
fresh_tomatoes.open_movies_page(movies)
You can move the shared part into its own function:

def find_page(url, name, find, offset, end='"'):
    # Keep only the text before ' (' in the movie name.
    movie_name, separator, tail = name.partition(' (')
    page = urllib.urlopen(url.format(movie_name)).read()
    start_link = page.find(find)
    start_url = page.find('"', start_link + offset)
    # The two original functions end on different delimiters, so pass it in.
    end_url = page.find(end, start_url + 1)
    return page[start_url + 1:end_url]

def movie_poster_url(name_of_movie):
    return find_page("another web site name{}continuation of website name", name_of_movie, '"Poster":', 9)

def movie_trailer_url(name_of_movie):
    trailer_url = find_page("another website name{} trailer", name_of_movie, '<div class="yt-lockup-dismissable"><div class="yt-lockup-thumbnail contains-addto"><a aria-hidden="true" href=', 110, end='" ')
    return "www.youtube.com" + trailer_url
It definitely won't run faster (there is extra work to "switch" between functions), but the performance difference is probably negligible.
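To put a rough number on that function-call overhead, here is a minimal sketch using the standard-library timeit module. The sample PAGE string and the names find_value, inline_version, and helper_version are made up for illustration; nothing is fetched from the network, and the timings will vary from machine to machine.

```python
import timeit

# Made-up sample text standing in for a downloaded page.
PAGE = '{"Title":"Example","Poster":"http://example.invalid/p.jpg","Year":"2000"}'

def find_value(page, marker, offset):
    """Return the text between the first pair of double quotes after marker."""
    start = page.find('"', page.find(marker) + offset)
    end = page.find('"', start + 1)
    return page[start + 1:end]

def inline_version():
    # The search logic written out in place, as in the original functions.
    start_link = PAGE.find('"Poster":')
    start = PAGE.find('"', start_link + 9)
    end = PAGE.find('"', start + 1)
    return PAGE[start + 1:end]

def helper_version():
    # The same logic routed through the shared helper (one extra call).
    return find_value(PAGE, '"Poster":', 9)

inline_time = timeit.timeit(inline_version, number=100000)
helper_time = timeit.timeit(helper_version, number=100000)
print("inline: %.4fs  helper: %.4fs" % (inline_time, helper_time))
```

Both versions return the same string; only the per-call cost differs, and it is typically tiny next to the time spent downloading a page.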
About your second question: profiling is not a technique or a method, it is "finding out what's being bad" in your code:
Profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. (Wikipedia)
So it is not something that speeds up your program; it is the word for what you do to find out how you could speed it up.
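As a small illustration of what profiling looks like in practice, here is a sketch using the standard-library cProfile and pstats modules. It is written for Python 3 (the code in the question is Python 2), and slow_part, fast_part, and main are hypothetical stand-ins, not functions from the question.

```python
import cProfile
import io
import pstats

def slow_part():
    # Hypothetical stand-in for an expensive step.
    return sum(i * i for i in range(200000))

def fast_part():
    # Hypothetical stand-in for a cheap step.
    return "abc".find("c")

def main():
    slow_part()
    fast_part()

# Record every function call made while main() runs.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Write the five entries with the highest cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and times per function, so the place worth optimizing is read off the measurements rather than guessed.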
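Real quick here, because I'm a super-novice, but I can see the repetition. What I would do is find the (mostly) repeated block of code that all 3 functions share, then work out where they differ, and write a new function that takes the differences as arguments. For example:

def extract(page, tarString, delim, startDiff, endDiff):
    start_link = page.find(tarString)
    start_url = page.find(delim, start_link + startDiff)
    end_url = page.find(delim, start_url + endDiff)
    return page[start_url + 1:end_url]

Then, in your poster, trailer, etc. functions, just call this extract function with the appropriate arguments for each case, i.e. the poster function would call

poster_url = extract(page, tarString='"Poster":', delim='"', startDiff=9, endDiff=1)

I can see you already have another answer, most likely written by someone who knows more than I do, but I hope you get something out of my "philosophy of modularizing" from a novice's perspective.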
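The parameterized helper described in this answer could look roughly like the sketch below. It is only an illustration: the separate start_delim and end_delim parameters and the None return for missing markers are additions that are not in the original code, and the two sample strings are made up rather than real responses from any site.

```python
def extract(page, target, start_delim, end_delim, start_offset):
    """Return the text between start_delim and end_delim that follows target.

    Returns None when target or a delimiter is missing, because str.find
    returns -1 in that case and slicing with it would give a wrong result.
    """
    start_link = page.find(target)
    if start_link == -1:
        return None
    start = page.find(start_delim, start_link + start_offset)
    if start == -1:
        return None
    end = page.find(end_delim, start + len(start_delim))
    if end == -1:
        return None
    return page[start + len(start_delim):end]

# Made-up sample inputs for illustration only.
poster_page = '{"Title":"Example","Poster":"http://example.invalid/p.jpg"}'
trailer_page = '<a aria-hidden="true" href="/watch?v=abc123" class="x">'

poster_url = extract(poster_page, '"Poster":', '"', '"', len('"Poster":'))
trailer_path = extract(trailer_page, 'href=', '"', '" ', len('href='))
print(poster_url)
print(trailer_path)
```

Passing len(target) as the offset avoids hand-counted numbers such as 9 or 110, which silently break when the marker string changes.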