iText HTML to PDF: content overflows the document

I am trying to convert this HTML, which has no CSS at all:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
</head>
<body>
<div id="184981f8-654a-4e90-a0f5-e75d1edaf2ca" class="act">
    <div id="b995877a-0d3c-439f-984e-f9f809d124a5" class="footnotes">
        <table>
            <tbody>
            <tr>
                <td id="f29aca16-143d-1fc6-8f6a-d2aa116cde25">1</td>
                <td>Ezechiel HAVRENNE is a lecturer at the University of Luxembourg on Investment Funds. Views expressed
                    in this article reflect some of the author’s experience to date on the subject matter. As the
                    Luxembourg investment fund market continues to develop these views may – and will most likely
                    continue to – evolve in one way or another. This article should in no way be construed as legal,
                    business or structuring advice rendered by the author or any other entity, nor should it be
                    construed as reflecting the views of such entity(ies)
                </td>
            </tr>
            <tr>
                <td id="434b1865-a5ea-1f96-b0fa-09ea9e4fb76a">2</td>
                <td>The Preqin Quarterly Update: Private Debt, Q3 2020, 7 October 2020, page 12; <a
                        id="0e11d32d-c25b-65c1-8266-39da10bb62f3"
                        href="https://www.preqin.com/insights/research/quarterly-updates/preqin-quarterly-update-private-debt-q3-2020"
                        target="_blank" class="tech_external" rel="noopener">https://www.preqin.com/insights/research/quarterly-updates/preqin-quarterly-update-private-debt-q3-2020</a>
                    (accessed 15 March 2021). These figures drastically contrast with those reported by Lipper as of
                    October 2016, whereby “<em>the gross AuM of all funds that invest primarily in loan participations
                        was approximately USD 218 billon</em>” as mentioned in IOSCO’s final report; IOSCO
                    FR03/2017, ib., page 4
                </td>
            </tr>
            <tr>
                <td id="6bf035e5-d434-1eec-a550-58147bed84a0">3</td>
                <td>According to EU recommendation 2003/361, 2 factors determine whether a business is an SME: (i) the
                    number of employees and (ii) either turnover or balance sheet total. A medium-sized company has up
                    to 250 employees, a turnover of up to €50 million or a balance sheet total of up to €43 million.
                    A small-sized company has up to 50 employees &amp; a turnover or balance sheet total of up to €10
                    million. A micro-company has up to 10 employees &amp; a turnover or balance sheet total of up to
                    €2 million
                </td>
            </tr>
            <tr>
                <td id="5028557e-4efe-1066-9fd4-28809a6d0653">4</td>
                <td>For instance, one of the driving forces that has led European jurisdictions to consider permitting
                    funds to originate loans was the adoption of the EU regulation on European long-term investment
                    funds allowing funds the origination of loans under certain conditions. As a result, many
                    jurisdictions in Europe now allow loan originations by funds
                </td>
            </tr>
            <tr>
                <td id="cd0ac4df-9139-1c0a-9dd0-c15cca78845a">5</td>
                <td>See IOSCO’s final report FR03/2017, <em>Findings of the Survey on Loan Funds</em>, February 2017,
                    page 4 <a id="76d9ff09-04f9-61a4-a311-2cfee0e19245"
                              href="https://www.iosco.org/library/pubdocs/pdf/IOSCOPD555.pdf" target="_blank"
                              class="tech_external" rel="noopener">https://www.iosco.org/library/pubdocs/pdf/IOSCOPD555.pdf</a>
                    (accessed 13 April 2021)
                </td>
            </tr>
            <tr>
                <td id="a0dd548b-cfa4-182c-9472-624a6be46538">6</td>
                <td>See the Glossary of Summaries published on EUR-Lex, <a id="3052c250-b9c1-60f7-b36c-45ab06665101"
                                                                           href="https://eur-lex.europa.eu/summary/glossary/sme.html"
                                                                           target="_blank" class="tech_external"
                                                                           rel="noopener">https://eur-lex.europa.eu/summary/glossary/sme.html</a>
                    (accessed 13 April 2021) as well as the European Commission’s page titled “<em>Access to finance
                        for SMEs</em>”,<a id="b8b721ff-fd48-67aa-aaac-e5b1d0d02b60"
                                            href="https://ec.europa.eu/growth/access-to-finance_en" target="_blank"
                                            class="tech_external" rel="noopener">
                        https://ec.europa.eu/growth/access-to-finance_en</a> (accessed 13 April 2021)
                </td>
            </tr>
            <tr>
                <td id="d98d8f00-f797-1b37-9540-36713cfdc8a7">7</td>
                <td><em>Ib.</em></td>
            </tr>
            <tr>
                <td id="3868e384-a464-1b26-933a-8ec3a95f86d5">8</td>
                <td>For more information see <a id="dc357707-f043-68ce-a7bc-c9a5d9d86c7d"
                                                href="https://ec.europa.eu/growth/smes/cosme_en" target="_blank"
                                                class="tech_external" rel="noopener">https://ec.europa.eu/growth/smes/cosme_en</a>
                    (accessed 13 April 2021)
                </td>
            </tr>
            <tr>
                <td id="6766e322-fdf8-16b8-99e4-006e43fdecbd">9</td>
                <td>See the European Commission’s page titled “COSME Financial Instruments”, <a
                        id="62cbd917-994d-6388-b0db-786a5c792685"
                        href="https://ec.europa.eu/growth/access-to-finance/cosme-financial-instruments_en"
                        target="_blank" class="tech_external" rel="noopener">https://ec.europa.eu/growth/access-to-finance/cosme-financial-instruments_en</a>
                    (accessed 13 April 2021)
                </td>
            </tr>
            <tr>
                <td id="11773190-b10f-1399-b71f-3a5fcfa5a5fc">10</td>
                <td>Even if the eligibility for participation in the COSME LGF programme was extended to Loan
                    Origination funds it does not appear from the EIF’s register published as at 31 January 2021 that
                    any would have made the list. See<a id="cf5536ce-bff2-6220-9ed7-e4011b938b0e"
                                                        href="https://www.eif.org/what_we_do/guarantees/single_eu_debt_instrument/cosme-loan-facility-growth/cosme_lgf_signatures.pdf"
                                                        target="_blank" class="tech_external" rel="noopener">
                        https://www.eif.org/what_we_do/guarantees/single_eu_debt_instrument/cosme-loan-facility-growth/cosme_lgf_signatures.pdf</a>
                    (accessed 13 April 2021)
                </td>
            </tr>
            <tr>
                <td id="12b455e1-ceff-10b6-ba3d-df5b441fe989">11</td>
                <td>Those associated countries include Iceland, Montenegro, Turkey, the Republic of North Macedonia,
                    Albania, Serbia, Bosnia and Herzegovina, and Kosovo
                </td>
            </tr>
            <tr>
                <td id="d8103a16-44fa-1096-8295-d478456b0117">12</td>
                <td>Connor Hussey, Luxembourg private debt industry grows 36% from 2019, Private Funds CFO, 3 December
                    2020, <a id="0facc75b-6776-606c-b47d-e2025d559bf2"
                             href="https://www.privatefundscfo.com/luxembourg-private-debt-industry-grows-36-2-from-2019"
                             target="_blank" class="tech_external" rel="noopener">https://www.privatefundscfo.com/luxembourg-private-debt-industry-grows-36-2-from-2019</a>/
                    (accessed 13 April 2021). These figures should be in line with the then reality based on the 2017
                    final report of IOSCO whereby it stated that “<em>in Luxembourg, the net AuM of all domestic Loan
                        Funds (i.e., Funds with their primary activity engaged in lending and across various loan
                        activities, encompassing also activities such as microfinance, real estate debt or
                        infrastructure financing) is EUR 37.3 bn, constituting 1% of all domestic Funds</em>”, IOSCO
                    FR03/2017, ib., page 9
                </td>
            </tr>
            <tr>
                <td id="228c3276-de18-1393-9860-66ff5272b741">13</td>
                <td>KPMG – ALFI Private Debt Fund Survey 2020, pages 4 and 5, <br><a
                        id="6d4a0dff-557a-603a-8b28-c47bd843b6b4"
                        href="https://assets.kpmg/content/dam/kpmg/lu/pdf/private-debt-fund-survey-2020.pdf"
                        target="_blank" class="tech_external" rel="noopener">https://assets.kpmg/content/dam/kpmg/lu/pdf/private-debt-fund-survey-2020.pdf?</a>utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=Loan%20Note%203%20December%202020&amp;utm_term=PDI_LONENOTE_SUBSCRIBER<br>
                </td>
            </tr>
            </tbody>
        </table>
    </div>
</div>
</body>
</html>

But every time I run HtmlConverter.convertToPdf() with the HTML content passed as a string, the content gets cropped, and the result looks like this:

But when I remove the last tr element, I get the expected result:

What do you think is causing this? Is it because the table element has too many children?

--- Question update ---

After reading @CptCave's comment, I tried changing the HTML to the following form, using the word-break CSS property, which should work in this case:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
    <style>
        .word-break{
            word-break: break-all;
        }
    </style>
</head>
<body>
<div id="b995877a-0d3c-439f-984e-f9f809d124a5" class="footnotes">
    <table class="word-break">
        <tbody>
        <tr>
            <td id="7673aebd-bc37-198d-932f-987fb16fb503">94</td>
            <td>See ESMA Consultation Paper Guidelines on transaction reporting, reference data, order record
                keeping &amp; clock synchronisation, 23 December 2015, ESMA/2015/1909, p. 49; <a
                        id="5326eab7-02a4-69ec-9069-2d0c8eb5f180"
                        href="https://www.esma.europa.eu/sites/default/files/library/2015-1909_guidelines_on_transaction_reporting_reference_data_order_record_keeping_and_clock_synchronisation.pdf"
                        target="_blank" class="tech_external" rel="noopener">https://www.esma.europa.eu/sites/default/files/library/2015-1909_guidelines_on_transaction_reporting_reference_data_order_record_keeping_and_clock_synchronisation.pdf</a>
                (accessed on 13 April 2021)
            </td>
        </tr>
        </tbody>
    </table>
</div>
</body>
</html>

However, the result I get is:

The solution is to add inline CSS:

*<table style="word-wrap: break-word"/>*

So, for completeness, I changed the document structure with jsoup before converting:

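import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Set an inline style on every table so long unbroken strings wrap inside the cells.
// Note: attr("style", ...) replaces any style attribute the table already has.
Document document = Jsoup.parse(html);
document.getElementsByTag("table").forEach(table ->
        table.attr("style", "word-wrap: break-word"));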

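As far as I can tell, your problem is caused by missing word wrapping. Your last table row contains a long unbroken string: the link with UTM tags. If you remove the UTM tags from it, the cropping no longer occurs.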

            <tr>
                <td id="228c3276-de18-1393-9860-66ff5272b741">13</td>
                <td>KPMG – ALFI Private Debt Fund Survey 2020, pages 4 and 5, <br><a
                        id="6d4a0dff-557a-603a-8b28-c47bd843b6b4"
                        href="https://assets.kpmg/content/dam/kpmg/lu/pdf/private-debt-fund-survey-2020.pdf"
                        target="_blank" class="tech_external" rel="noopener">https://assets.kpmg/content/dam/kpmg/lu/pdf/private-debt-fund-survey-2020.pdf</a><br>
                </td>
            </tr>
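
If the links come from data you control, removing the UTM tags could be automated before conversion. The following is only a minimal sketch, assuming plain (not HTML-escaped) URL strings without fragments; `UtmStripper` and `stripUtm` are hypothetical names and not part of iText or jsoup. Note that in the question's markup the UTM tail sits outside the anchor and is HTML-escaped, so it would have to be located and unescaped first.

```java
// Hypothetical helper, not part of any library: drops utm_* query parameters from a URL.
final class UtmStripper {

    // Assumes a plain URL string with no "#fragment" part; other parameters keep their order.
    static String stripUtm(String url) {
        int queryStart = url.indexOf('?');
        if (queryStart < 0) {
            return url; // no query string, nothing to strip
        }
        String base = url.substring(0, queryStart);
        String query = url.substring(queryStart + 1);

        StringBuilder kept = new StringBuilder();
        for (String param : query.split("&")) {
            if (param.isEmpty() || param.startsWith("utm_")) {
                continue; // skip empty pieces and tracking parameters
            }
            if (kept.length() > 0) {
                kept.append('&');
            }
            kept.append(param);
        }
        // Drop the "?" as well when no parameters remain.
        return kept.length() == 0 ? base : base + "?" + kept;
    }
}
```

This only shortens the string. A long URL without UTM tags can still overflow the cell, which is why wrapping via CSS is the sturdier fix.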

A more durable solution is to enable word wrapping through CSS by setting the overflow-wrap property to break-word.
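
One way to apply that rule without editing each document by hand might be to inject a style element into the HTML string before it is converted. This is a sketch under assumptions: `WrapRuleInjector` and `withWrapRule` are hypothetical names, the selector is my choice, and the closing head tag is assumed to be lowercase. Whether a given pdfHTML version honors the property is worth checking against the iText documentation.

```java
// Hypothetical helper, not part of iText: adds a wrapping rule to an HTML string.
final class WrapRuleInjector {

    // Assumed rule. overflow-wrap is an inherited CSS property, so setting it on
    // the table also applies to the text inside its cells.
    static final String RULE = "<style>table { overflow-wrap: break-word; }</style>";

    // Inserts RULE just before a lowercase </head>; prepends it when no such tag exists.
    static String withWrapRule(String html) {
        int headEnd = html.indexOf("</head>");
        if (headEnd < 0) {
            return RULE + html;
        }
        return html.substring(0, headEnd) + RULE + html.substring(headEnd);
    }
}
```

The returned string could then be passed to HtmlConverter.convertToPdf() in place of the original HTML, as in the question.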

A complete example is available in the iText knowledge base: https://kb.itextpdf.com/home/it7kb/examples/pdfhtml-support-for-overflow-wrap-word-break-css-properties