I want to replace the HTML code with my own

Question

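I am using the lxml and BeautifulSoup libraries. My goal is to translate the text of specific tags across the whole HTML page: I want to replace the text of those tags with the translated text.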

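I want to loop over a specific XPath so that all the translated texts are inserted one after another, and the edited HTML code should be returned with the translated version.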
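Since the question already parses the page with lxml, the loop can also be written without BeautifulSoup. The sketch below is a minimal illustration, not a definitive implementation: it assumes a hypothetical translate callable (for example a wrapper around ts.google) that maps one English string to its translation. Each paragraph matched by the XPath is visited once, its whole content is replaced with the translated text, and the edited tree is serialized back to HTML.

```python
from typing import Callable, Union

from lxml import html

# XPath from the question: every <p class="text_obisnuit"> under the first inner div
XPATH = '//*[@id="incadrare_text_mijloc_2"]/div[1]//p[@class="text_obisnuit"]'


def translate_paragraphs(page: Union[str, bytes],
                         translate: Callable[[str], str]) -> str:
    """Return the page with every paragraph matched by XPATH translated.

    `translate` is a hypothetical callable supplied by the caller, e.g.
    lambda t: ts.google(t, from_language='en', to_language='ro').
    """
    doc = html.fromstring(page)
    for p in doc.xpath(XPATH):
        # text_content() joins the text of the <p> and of all its descendants
        translated = translate(p.text_content())
        # Drop child elements (<a>, <em>, <br>, ...); in lxml an element's
        # tail text is removed together with the element
        for child in list(p):
            p.remove(child)
        p.text = translated
    # Serialize the whole edited document back to a str
    return html.tostring(doc, encoding="unicode")
```

Possible usage: new_html = translate_paragraphs(r.content, lambda t: ts.google(t, from_language='en', to_language='ro')). Note that, like assigning .string in BeautifulSoup, this flattens any inline markup (links, emphasis) inside the translated paragraphs.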

from bs4 import BeautifulSoup, NavigableString, Tag
import requests
import time
import pandas as pd
import translators as ts
import json
import numpy as np
import regex
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from lxml import html
import time
import lxml.html



#r=requests.get(input('Enter the URL of your HTML page:\n'))
r=requests.get('https://neculaifantanaru.com/en/qualities-of-a-leader-inner-integrity.html')
soup=BeautifulSoup(r.text, 'html.parser')
page=r.content
element = html.fromstring(page)




try:
    articles=[]
    for item in element.xpath('//*[@id="incadrare_text_mijloc_2"]/div[1]//p[@class = "text_obisnuit"]'):  

        texts=item.text_content()
        #texts=texts.split('"',100)
        #articles.append(item.text_content())
        articles.append(texts)
        translated_articles=[]
        for text in articles:
            print(text)
            output=ts.google(text, from_language='en', to_language='ro')
            translated_articles.append(output)
            
            for i,z in zip(translated_articles,soup.find_all('p', attrs={'class':'text_obisnuit'})):
                var=z.string
                var.replace_with(var.replace(var, i))

    
    #print(soup)

except Exception as e:
    print(e)

I am not getting the whole text from this XPath:

element.xpath('//*[@id="incadrare_text_mijloc_2"]/div[1]//p[@class = "text_obisnuit"]')

The output I get:

Everything in Kevin Lomax's life changed after he was recruited by the most powerful law firm in the world, "Milton, Chadwick & Waters". Despite the fact that his mother was not agree, he accepted to provide his services of a professional lawyer to this company headed by none other than John Milton, a very powerful man with a very strange personality, which has aroused some suspicion since their first meeting.
If you saw the movie "The Devil's Advocate (1997)", perhaps you remember the end. Milton proposes to Kevin to take over his company, promising that he will have everything in the world, but with a single price - to sell his soul. But Kevin was hiding virtues that Milton did not believe that he has them.
AttributeError: 'NoneType' object has no attribute 'replace_with'
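The AttributeError comes from var = z.string: BeautifulSoup's .string only returns a value when a tag has a single child (a string, or one tag that itself has a single string). For a <p> that mixes text with tags such as <a> or <em> it returns None, and None has no replace_with method. get_text() collects all the text instead. The markup below is a made-up example, not taken from the page.

```python
from bs4 import BeautifulSoup

# Hypothetical paragraph that mixes plain text with an inline tag
snippet = '<p class="text_obisnuit">Hello <em>big</em> world</p>'
p = BeautifulSoup(snippet, "html.parser").p

print(p.string)      # None: the <p> has three children ("Hello ", <em>, " world")
print(p.get_text())  # Hello big world
```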

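I want to use the XPath above to extract all the text of the p tags with class="text_obisnuit", translate it with the translators library, and return the whole HTML code with the translated text placed inside those same p tags.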

Note:

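There should be a loop that inserts the translated text into all of these tags; in other words, after being translated in the loop, every tag should get its own text back.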
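Because each <p> should receive its own translation, the tags and the translations can be paired one-to-one with a single zip, without the nested loops from the question. A minimal sketch, assuming a hypothetical translate_all callable that turns a list of strings into a list of translations in the same order (it could simply call ts.google once per item):

```python
from typing import Callable, List

from bs4 import BeautifulSoup


def replace_paragraph_texts(page: str,
                            translate_all: Callable[[List[str]], List[str]]) -> str:
    soup = BeautifulSoup(page, "html.parser")
    tags = soup.find_all("p", class_="text_obisnuit")
    # Step 1: collect the full text of every target paragraph, in document order
    originals = [tag.get_text() for tag in tags]
    # Step 2: translate them (hypothetical callable supplied by the caller)
    translations = translate_all(originals)
    if len(translations) != len(tags):
        raise ValueError("expected exactly one translation per paragraph")
    # Step 3: each tag gets its own translation back
    for tag, translated in zip(tags, translations):
        tag.string = translated
    return str(soup)
```

For the real page, translate_all could be lambda texts: [ts.google(t, from_language='en', to_language='ro') for t in texts]. The length check guards against zip silently dropping paragraphs if the two lists ever get out of step.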

I can't explain it any better; could someone please guide me?

Answer 1

Do you need to replace it? Can't you simply set the string/content to the translation?

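Also, you are doing some unnecessary looping here, and you need to fix the indentation: what you want is for the for i,z loop to be moved out two indentation levels.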
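The point about assigning instead of replacing can be checked in isolation: assigning to tag.string clears whatever the tag contained, including nested tags, and leaves a single string, so it works even for paragraphs whose .string getter returns None. The markup and the Romanian text here are made-up stand-ins for a real translation.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="text_obisnuit">Hello <em>big</em> world</p>', "html.parser")
p = soup.p

# Assignment (unlike p.string.replace_with) does not need the getter to return a string:
# it removes all children of <p> and inserts the new text
p.string = "Salut lume mare"
print(p)  # <p class="text_obisnuit">Salut lume mare</p>
```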

Try this:

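import requests
import translators as ts
from bs4 import BeautifulSoup

r = requests.get('https://neculaifantanaru.com/en/qualities-of-a-leader-inner-integrity.html')
soup = BeautifulSoup(r.text, 'html.parser')

try:
    articles = soup.find_all('p', {'class': "text_obisnuit"})
    for item in articles:
        # item.text includes the text of nested tags as well
        original_text = item.text
        translated_output = ts.google(original_text, from_language='en', to_language='ro')
        print(item)  # the paragraph before translation

        # Assigning .string replaces everything inside the <p> with the translation
        item.string = translated_output

except Exception as e:
    print(e)

# To see that it was changed
for item in articles:
    print(item)

# The whole page as HTML, with the translated paragraphs
translated_html = str(soup)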


Tags: python, tags, lxml, beautifulsoup