Same code used in multiple functions but with minor differences - how to optimize?

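This is code from a Udacity course that I modified slightly. When it runs, it asks me for a movie name, and the trailer then pops up in the browser (that part is elsewhere and not shown).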

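As you can see, there is a lot of repeated code in this program: the functions extract_name, movie_poster_url and movie_trailer_url contain the same code. Is there a way to get rid of the repetition while keeping the same output? And if so, will it run faster?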

import fresh_tomatoes
import media
import urllib
import requests
from BeautifulSoup import BeautifulSoup

name = raw_input("Enter movie name:- ")
global movie_name

def extract_html(name):
    url = "website name" + name + "continuation of website name" + name + "again continuation of web site name"
    response = requests.get(url)
    page = str(BeautifulSoup(response.content))
    return page

def extract_name(page):
    start_link = page.find(' - IMDb</a></h3><div class="s"><div class="kv"')
    start_url = page.find('>',start_link-140)
    start_url1 = page.find('>', start_link-140)
    end_url = page.find(' - IMDb</a>', start_link-140)
    name_of_movie = page[start_url1+1:end_url]
    return extract_char(name_of_movie)

def extract_char(name_of_movie):
    name_array = []
    for words in name_of_movie:
        word = words.strip('</b>,')
        name_array.append(word)
    return ''.join(name_array)

def movie_poster_url(name_of_movie):
    movie_name, seperator, tail = name_of_movie.partition(' (')
    #movie_name = name_of_movie.rstrip('()0123456789 ')
    page = urllib.urlopen('another web site name' + movie_name + 'continuation of website name').read()
    start_link = page.find('"Poster":')
    start_url = page.find('"',start_link+9)
    end_url = page.find('"',start_url+1)
    poster_url = page[start_url+1:end_url]
    return poster_url

def movie_trailer_url(name_of_movie):
    movie_name, seperator, tail = name_of_movie.partition(' (')
    #movie_name = name_of_movie.rstrip('()0123456789 ')
    page = urllib.urlopen('another website name' + movie_name + " trailer").read()
    start_link = page.find('<div class="yt-lockup-dismissable"><div class="yt-lockup-thumbnail contains-addto"><a aria-hidden="true" href=')
    start_url = page.find('"',start_link+110)
    end_url = page.find('" ',start_url+1)
    trailer_url1 = page[start_url+1:end_url]
    trailer_url = "www.youtube.com" + trailer_url1
    return trailer_url

page = extract_html(name)
movie_name = extract_name(page)
new_movie = media.Movie(movie_name, "Storyline WOW", movie_poster_url(movie_name), movie_trailer_url(movie_name))
movies = [new_movie]
fresh_tomatoes.open_movies_page(movies)

You can move the shared parts into their own function:

def find_page(url, name_of_movie, find, offset, end='"'):
    # Drop the trailing " (year)" part, e.g. "Up (2009)" -> "Up"
    movie_name, separator, tail = name_of_movie.partition(' (')
    page = urllib.urlopen(url.format(movie_name)).read()
    # Locate the marker, then slice out the quoted value that follows it
    start_link = page.find(find)
    start_url = page.find('"', start_link + offset)
    end_url = page.find(end, start_url + 1)
    return page[start_url + 1:end_url]


def movie_poster_url(name_of_movie):
    return find_page("another website name{} continuation of website name", name_of_movie, '"Poster":', 9)

def movie_trailer_url(name_of_movie):
    trailer_url = find_page("another website name{} trailer", name_of_movie, '<div class="yt-lockup-dismissable"><div class="yt-lockup-thumbnail contains-addto"><a aria-hidden="true" href=', 110, end='" ')
    return "www.youtube.com" + trailer_url
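The same idea can be tried offline, without any network access, to check that the shared helper gives the expected results. This is a sketch under a few assumptions: `extract_between`, `strip_year`, `poster_from`, `trailer_from`, the shortened `TRAILER_MARKER` and both sample pages are hypothetical stand-ins, not the real sites' markup. One design change from the answer above: the offset is derived from `len(marker)` instead of being passed as a magic number (9 and 110 in the original are just the marker lengths), and a missing marker returns `None` instead of a garbage slice.

```python
def extract_between(page, marker, end='"'):
    """Return the quoted value that follows `marker` in `page`.

    Finds the first '"' at or after the end of `marker`, then slices up to
    the next occurrence of `end`. Returns None if `marker` is not present.
    """
    start_link = page.find(marker)
    if start_link == -1:
        return None
    start_url = page.find('"', start_link + len(marker))
    end_url = page.find(end, start_url + 1)
    return page[start_url + 1:end_url]


def strip_year(name_of_movie):
    # "Up (2009)" -> "Up"
    return name_of_movie.partition(' (')[0]


# Hypothetical, shortened marker used only for this offline demo.
TRAILER_MARKER = '<a aria-hidden="true" href='


def poster_from(page):
    return extract_between(page, '"Poster":')


def trailer_from(page):
    # The trailer href ends at '" ' (quote + space), as in the original code.
    return "www.youtube.com" + extract_between(page, TRAILER_MARKER, end='" ')


# Hypothetical sample pages standing in for the downloaded JSON/HTML.
poster_page = '{"Title":"Up","Poster":"http://example.com/up.jpg","Year":"2009"}'
trailer_page = '<div><a aria-hidden="true" href="/watch?v=abc123" class="x"></a></div>'

print(strip_year("Up (2009)"))     # Up
print(poster_from(poster_page))    # http://example.com/up.jpg
print(trailer_from(trailer_page))  # www.youtube.com/watch?v=abc123
```

With this split, only one place would need to fetch pages; every lookup becomes a one-line call that names its marker.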

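It definitely won't run any faster (calling the extra helper function adds a little work), but the performance difference is likely negligible, especially compared with the time spent fetching the pages over the network.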
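If you want to check the cost of the extra function call yourself, `timeit` from the standard library can compare an inline version with one that goes through a helper. This is a minimal sketch: `inline`, `via_helper`, `_extract` and the synthetic `PAGE` string are hypothetical, and no network is involved, so it measures only the string work plus call overhead. The absolute numbers depend on the machine and Python version, so none are claimed here.

```python
import timeit

# Hypothetical page: a "Poster" value surrounded by filler text.
PAGE = 'x' * 1000 + '"Poster":"http://example.com/p.jpg",' + 'y' * 1000


def inline():
    # The lookup written out directly, as in the question's functions.
    start_link = PAGE.find('"Poster":')
    start_url = PAGE.find('"', start_link + 9)
    end_url = PAGE.find('"', start_url + 1)
    return PAGE[start_url + 1:end_url]


def _extract(page, marker, offset, end):
    # The same lookup factored into a shared helper.
    start_link = page.find(marker)
    start_url = page.find('"', start_link + offset)
    end_url = page.find(end, start_url + 1)
    return page[start_url + 1:end_url]


def via_helper():
    return _extract(PAGE, '"Poster":', 9, '"')


n = 100000
t_inline = timeit.timeit(inline, number=n)
t_helper = timeit.timeit(via_helper, number=n)
print("inline:     %.1f ns per call" % (t_inline / n * 1e9))
print("via helper: %.1f ns per call" % (t_helper / n * 1e9))
```

Whatever the per-call difference turns out to be, it is paid once per lookup, while each page download typically takes far longer.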

As for your second question: profiling isn't a technique or method for making code faster; it's how you find out which parts of your code are slow or wasteful:

Profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. (wikipedia)

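So profiling doesn't speed up your program by itself; it's the name for what you do to find out what could be changed to make your program faster.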
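As an illustration, the standard-library `cProfile` and `pstats` modules report how much time each function call takes. This sketch assumes Python 3 (`io.StringIO`), and `slow_part`, `fast_part` and `main` are hypothetical stand-ins; in the movie script you would profile the top-level calls instead, where the network fetches would likely dominate.

```python
import cProfile
import io
import pstats


def slow_part():
    # Deliberately does more work than fast_part.
    return sum(i * i for i in range(200000))


def fast_part():
    return 42


def main():
    return slow_part() + fast_part()


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists each function with its call count and time, which shows where to focus before changing any code.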

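Really quick, since I'm a super newbie, but I can see the repetition. What I would do is find the (mostly) duplicated block of code that all 3 functions share, then figure out where they differ, and write a new function that takes those differences as parameters. For example: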

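def extract(page, tarString, delim, startDiff, endDiff):
    # Find the marker, then return the text between the next two delimiters
    start_link = page.find(tarString)
    start_url = page.find(delim, start_link + startDiff)
    end_url = page.find(delim, start_url + endDiff)
    url_out = page[start_url + 1:end_url]
    return url_out

Then, in your poster, trailer, etc. functions, just call this extract function with the appropriate arguments for each case. For example, the poster function would call

poster_url = extract(page, tarString='"Poster":', delim='"', startDiff=9, endDiff=1)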
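If the same arguments are passed on every call, `functools.partial` from the standard library can bind them once, as an alternative to writing a separate wrapper function per case. This is a sketch, not part of the answer above: `extract_poster` is a hypothetical name, and the `page` string is made-up sample data in place of a downloaded page.

```python
from functools import partial


def extract(page, tarString, delim, startDiff, endDiff):
    # Find the marker, then return the text between the next two delimiters
    start_link = page.find(tarString)
    start_url = page.find(delim, start_link + startDiff)
    end_url = page.find(delim, start_url + endDiff)
    return page[start_url + 1:end_url]


# Bind the poster-specific settings once; only the page varies per call.
extract_poster = partial(extract, tarString='"Poster":', delim='"',
                         startDiff=9, endDiff=1)

page = '{"Title":"Up","Poster":"http://example.com/up.jpg","Year":"2009"}'
print(extract_poster(page))  # http://example.com/up.jpg
```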

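I can see you already have another answer, most likely written by someone who knows more than I do, but I hope you get something out of my newbie's take on the "philosophy of modularizing".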