Removing spaces between lines in HTML response, in Python

Question

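I'm writing a script to help update a small blog hosted on my website. When I request the page's HTML so I can hold it in memory and modify it, the response seems to have an extra blank line between every line:

Expected: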

<html>
    <head>
        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->
        <title>foo</title>
        <meta name="description" content="bar" />

What my script receives:

<html>

    <head>

        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->

        <title>foo</title>

        <meta name="description" content="bar" />

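I've tried removing the \n and \r characters from the response, but that doesn't seem to change anything.

Edit: Sorry, I forgot to post the actual script itself. Here you go: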

import neocities
import requests
import re
nc = neocities.NeoCities(api_key='[no]')

response = nc.info()
print(response)

htmlresponse = requests.get('https://thesite.com/index.html')

oldBlog = open('newindex.html', 'w')
oldBlog.write(str(htmlresponse.text).strip('\n').strip('\r'))
oldBlog.close()

with open('newindex.html', 'r') as blog:
    contents = blog.readlines()

contents.insert(39,'        <p class="header">test lol</p>\n'
                   '        <p class="logpost">foobar</p>\n')

with open('newindex.html', 'w') as blog:
    contents = "".join(contents)
    blog.write(contents)

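I know the method I'm using to strip the characters is pretty terrible, but I just wanted to see whether it would work. If it ends up working, I'll clean it up.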

Answer 1

Change

oldBlog.write(str(htmlresponse.text).strip('\n').strip('\r'))

to

oldBlog.write(str(htmlresponse.text).replace('\n', ''))
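For context on why the original attempt had no visible effect: str.strip only removes characters from the two ends of a string, whereas str.replace acts on every occurrence. A minimal sketch, using a made-up sample string (`sample` is illustrative, not the actual response):

```python
# Hypothetical sample standing in for htmlresponse.text
sample = "<html>\n\n    <head>\n\n"

# strip() only trims the ends; the blank line in the middle survives
stripped = sample.strip('\n').strip('\r')

# replace() touches every occurrence, so all newlines disappear
replaced = sample.replace('\n', '')

print(repr(stripped))
print(repr(replaced))
```

Keep in mind that replacing every '\n' also joins lines that were separated by a single newline, so if the line structure matters later (for example when calling readlines()), dropping only the empty lines may be the safer choice.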

Answer 2

Assuming your HTML is in a Python string (in your code, html_string would be str(htmlresponse.text)):

html_string = '''<html>

    <head>

        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->

        <title>foo</title>

        <meta name="description" content="bar" />
'''

Splitting on newlines with html_string.split('\n') outputs:

['<html>',
 '',
 '    <head>',
 '',
 '        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->',
 '',
 '        <title>foo</title>',
 '',
 '        <meta name="description" content="bar" />',
 '']

This code takes each string in the list and keeps it only if its length is > 0:

list1 = [line for line in html_string.split('\n') if len(line) > 0]

Or, more compactly (an empty string is falsy):

list1 = [line for line in html_string.split('\n') if line]

This gives you:

['<html>',
 '    <head>',
 '        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->',
 '        <title>foo</title>',
 '        <meta name="description" content="bar" />']

But list1 is a list. To convert it back to a string, you need:

new_html_string = '\n'.join(list1)

Printing new_html_string gives:

<html>
    <head>
        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->
        <title>foo</title>
        <meta name="description" content="bar" />

To summarize:

html_string = '''<html>

    <head>

        <!-- <link rel="icon" href="/sort/this/later.jpg" type="image/x-icon" />-->

        <title>foo</title>

        <meta name="description" content="bar" />
'''
list1 = [line for line in html_string.split('\n') if line]
new_html_string = '\n'.join(list1)
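If the response uses Windows-style '\r\n' line endings, or the "empty" lines actually contain stray spaces, splitting on '\n' alone would leave those lines non-empty and they would survive the filter. A hedged variant, assuming that may be the case here (the helper name `drop_blank_lines` and the sample string are made up for illustration), uses str.splitlines() and str.strip():

```python
def drop_blank_lines(text):
    # splitlines() treats LF, CRLF and bare CR as line boundaries;
    # strip() makes whitespace-only lines count as blank.
    kept = [line for line in text.splitlines() if line.strip()]
    return '\n'.join(kept)


# Made-up sample with mixed line endings and a whitespace-only line
sample = "<html>\r\n\r\n    <head>\n   \n        <title>foo</title>\n"
print(drop_blank_lines(sample))
```

In the question's script this could be applied to htmlresponse.text before writing the file. Indentation is preserved because strip() is only used for the emptiness check, not on the lines that are kept.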
