Need to convert an HTML string into text with Python

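This is my string! I want code where I pass in the whole string and get back only the text part. It is not a page, it is simply a string, like an HTML page saved with a .txt extension. Please help: all the other solutions I found use Beautiful Soup and need a URL, but this is not a web page. Any help would be greatly appreciated.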

b'<!DOCTYPE HTML>\r\n
<html>
   \r\n
   <head>
      \r\n
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      \r\n
      <title>TalentHire - Simplified Recruiting and Staffing</title>
      \r\n
   </head>
   \r\n        \r\n
   <body leftmargin="0" rightmargin="0" topmargin="0" bottommargin="0">
      \r\n        
      <div style="width:100%; overflow:auto; float:left; margin: auto;">
         \r\n        
         <table cellpadding="0" cellspacing="0" border="0" style="width:100%; min-width:300px;">
            \r\n                        
            <tr>
               \r\n                
               <td style=" border:none;">
                  \r\n                \t
                  <table cellpadding="0" cellspacing="0" style="width:100%; min-width:280px; margin:0 auto; border:none;">
                     \r\n                        
                     <tr>
                        \r\n                            
                        <td style="font-family: calibri,sans-serif !important; font-size:15px !important; color:#333 !important; line-height:22px; border:none;">
                           \r\n                                
                           <div id="EditorSalutationID">
                              \r\n
                              <p>Position:&nbsp; Azure Architect</p>
                              \r\n\r\n
                              <p>Location: San Antonio, Texas</p>
                              \r\n\r\n
                              <p><br />\r\nResponsibilities-</p>
                              \r\n\r\n
                              <p>Customer is implementing a new POS solution and this program is all about&nbsp; doing the integration work for the new POS along with data migration and some new web app development.<br />\r\nAll the integration and web development work will be done using azure PaaS components.<br />\r\nResponsibilities are:<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp; Provide Inputs to enterprise solution Architecture<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Design secure integration solutions/Architecture<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Implement best practices when using azure components<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Work with 3rd party vendor architects on behalf of Customer to design integration solution<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Provide recommendation to optimize azure cost<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Recommendation and best practices on using various azure resources<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Hands on set up of azure components and design patterns for development teams to follow. Hands on to .Net Technologies</p>
                              \r\n\r\n
                              <p><br />\r\nResponsible for technical solutioning and design the integration Solution in AZURE. Design, develop, and construct detailed Azure architecture. Understand current state gaps and propose secured solutions to ensure roadmap can adapt to changes and integrate with existing environment or propose changes to existing environment. Work with vendors and customers to understand new solutions&rsquo; limitations and capabilities. Work with internal delivery teams to ensure solutions align with roadmap and architecture. Lead a team of engineers and developers to design and build solutions."</p>
                              \r\n\r\n
                              <p>Regards,</p>
                              \r\n\r\n
                              <p>Manish Kumar</p>
                              \r\n\r\n
                              <p><a href="http://http/" onclick="return Webmail.Widgets.Email.Message.evLinkClick(this);" rel="noopener noreferrer" target="_blank" title="This external link will open in a new window">Email-ID:manish.kumar1@idctechnologies.com</a></p>
                              \r\n\r\n
                              <p>Desk NO:315-994-1244</p>
                              \r\n
                           </div>
                           \r\n\r\n
                           <div id="EditorSignatureID">&nbsp;</div>
                           \r\n                             
                        </td>
                        \r\n                        
                     </tr>
                     \r\n                        
                     <tr>
                        \r\n                            
                        <td style="font-family: calibri,sans-serif; font-size:14px; line-height:normal; color:#333; border:none">\r\n                                                           </td>
                        \r\n                        
                     </tr>
                     \r\n                    
                  </table>
                  \r\n                
               </td>
               \r\n            
            </tr>
            \r\n            \r\n                \t
         </table>
         \r\n        
         <p style="border:none; padding-left:10px; font-size:11px; font-family:Arial, Helvetica, sans-serif; color:#6b6c72; text-align:left; line-height:18px;text-transform: uppercase;"> To unsubscribe from future emails or to update your email preferences<a href="http://unsubscribe.idctechnologies.com/users/request_unsubscribe/217a2089eed1fd0f407ea853a29608b1cbaf9bb2/f40908d9c9fddff08cbeeb44f5678cbf48a9a840/YkgrQnRETjZscTQvT0taSDc5dzBFR0p0WXY5dmNQYjJRVDZaWnpac2Exdz0=/" style="color:#0077c5; text-decoration:underline"><b>click here </b></a>.</p>
      </div>
      \r\n<img width="1px" height="1px" alt="" src="http://clicks.mg.idctechnologies.com/o/eJwVzDsOwyAMANDTNCOyifkNLEj0GhXFJkEKRUp6f7XZ3vQ4BiL7xqVHDRrAaIOEZkWFKuVsvHM5pBSMz88HwdhU5_qVun_mMbcul6pzLHu07AkAC3CrWEKzIkTNIgmWlcEtp7RX52jd7XgKU50s_3IbpR_38gNSeihY">
   </body>
   \r\n
</html>
\r\n'
from bs4 import BeautifulSoup
data = """
b'<!DOCTYPE HTML>\r\n
<html>
   \r\n
   <head>
      \r\n
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      \r\n
      <title>TalentHire - Simplified Recruiting and Staffing</title>
      \r\n
   </head>
   \r\n        \r\n
   <body leftmargin="0" rightmargin="0" topmargin="0" bottommargin="0">
      \r\n        
      <div style="width:100%; overflow:auto; float:left; margin: auto;">
         \r\n        
         <table cellpadding="0" cellspacing="0" border="0" style="width:100%; min-width:300px;">
            \r\n                        
            <tr>
               \r\n                
               <td style=" border:none;">
                  \r\n                \t
                  <table cellpadding="0" cellspacing="0" style="width:100%; min-width:280px; margin:0 auto; border:none;">
                     \r\n                        
                     <tr>
                        \r\n                            
                        <td style="font-family: calibri,sans-serif !important; font-size:15px !important; color:#333 !important; line-height:22px; border:none;">
                           \r\n                                
                           <div id="EditorSalutationID">
                              \r\n
                              <p>Position:&nbsp; Azure Architect</p>
                              \r\n\r\n
                              <p>Location: San Antonio, Texas</p>
                              \r\n\r\n
                              <p><br />\r\nResponsibilities-</p>
                              \r\n\r\n
                              <p>Customer is implementing a new POS solution and this program is all about&nbsp; doing the integration work for the new POS along with data migration and some new web app development.<br />\r\nAll the integration and web development work will be done using azure PaaS components.<br />\r\nResponsibilities are:<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp; Provide Inputs to enterprise solution Architecture<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Design secure integration solutions/Architecture<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Implement best practices when using azure components<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Work with 3rd party vendor architects on behalf of Customer to design integration solution<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Provide recommendation to optimize azure cost<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Recommendation and best practices on using various azure resources<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Hands on set up of azure components and design patterns for development teams to follow. Hands on to .Net Technologies</p>
                              \r\n\r\n
                              <p><br />\r\nResponsible for technical solutioning and design the integration Solution in AZURE. Design, develop, and construct detailed Azure architecture. Understand current state gaps and propose secured solutions to ensure roadmap can adapt to changes and integrate with existing environment or propose changes to existing environment. Work with vendors and customers to understand new solutions&rsquo; limitations and capabilities. Work with internal delivery teams to ensure solutions align with roadmap and architecture. Lead a team of engineers and developers to design and build solutions."</p>
                              \r\n\r\n
                              <p>Regards,</p>
                              \r\n\r\n
                              <p>Manish Kumar</p>
                              \r\n\r\n
                              <p><a href="http://http/" onclick="return Webmail.Widgets.Email.Message.evLinkClick(this);" rel="noopener noreferrer" target="_blank" title="This external link will open in a new window">Email-ID:manish.kumar1@idctechnologies.com</a></p>
                              \r\n\r\n
                              <p>Desk NO:315-994-1244</p>
                              \r\n
                           </div>
                           \r\n\r\n
                           <div id="EditorSignatureID">&nbsp;</div>
                           \r\n                             
                        </td>
                        \r\n                        
                     </tr>
                     \r\n                        
                     <tr>
                        \r\n                            
                        <td style="font-family: calibri,sans-serif; font-size:14px; line-height:normal; color:#333; border:none">\r\n                                                           </td>
                        \r\n                        
                     </tr>
                     \r\n                    
                  </table>
                  \r\n                
               </td>
               \r\n            
            </tr>
            \r\n            \r\n                \t
         </table>
         \r\n        
         <p style="border:none; padding-left:10px; font-size:11px; font-family:Arial, Helvetica, sans-serif; color:#6b6c72; text-align:left; line-height:18px;text-transform: uppercase;"> To unsubscribe from future emails or to update your email preferences<a href="http://unsubscribe.idctechnologies.com/users/request_unsubscribe/217a2089eed1fd0f407ea853a29608b1cbaf9bb2/f40908d9c9fddff08cbeeb44f5678cbf48a9a840/YkgrQnRETjZscTQvT0taSDc5dzBFR0p0WXY5dmNQYjJRVDZaWnpac2Exdz0=/" style="color:#0077c5; text-decoration:underline"><b>click here </b></a>.</p>
      </div>
      \r\n<img width="1px" height="1px" alt="" src="http://clicks.mg.idctechnologies.com/o/eJwVzDsOwyAMANDTNCOyifkNLEj0GhXFJkEKRUp6f7XZ3vQ4BiL7xqVHDRrAaIOEZkWFKuVsvHM5pBSMz88HwdhU5_qVun_mMbcul6pzLHu07AkAC3CrWEKzIkTNIgmWlcEtp7RX52jd7XgKU50s_3IbpR_38gNSeihY">
   </body>
   \r\n
</html>
\r\n'
"""

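# Parse the string directly; BeautifulSoup accepts markup text, so no URL is needed
soup = BeautifulSoup(data, 'html.parser')

# .text returns every text node concatenated, with the tags stripped
print(soup.text)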

Output:

b'



TalentHire - Simplified Recruiting and Staffing










Position:  Azure Architect
Location: San Antonio, Texas

Responsibilities-
Customer is implementing a new POS solution and this program is all about  doing the integration work for the new POS along with data migration and some new web app development.
All the integration and web development work will be done using azure PaaS components.
Responsibilities are:
·         Provide Inputs to enterprise solution Architecture
·        Design secure integration solutions/Architecture
·        Implement best practices when using azure components
·        Work with 3rd party vendor architects on behalf of Customer to design integration solution
·        Provide recommendation to optimize azure cost
·        Recommendation and best practices on using various azure resources  
·        Hands on set up of azure components and design patterns for development teams to follow. Hands on to .Net Technologies

Responsible for technical solutioning and design the integration Solution in 
AZURE. Design, develop, and construct detailed Azure architecture. Understand current state gaps and propose secured solutions to ensure roadmap can adapt to changes and integrate with existing environment or propose changes to existing environment. Work with vendors and customers to understand new solutions’ limitations and capabilities. Work with internal delivery teams to ensure 
solutions align with roadmap and architecture. Lead a team of engineers and developers to design and build solutions."
Regards,
Manish Kumar
Email-ID:manish.kumar1@idctechnologies.com
Desk NO:315-994-1244












 To unsubscribe from future emails or to update your email preferencesclick here .





'
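
The stray b' at the top and ' at the bottom of the output above are there because the bytes literal was pasted inside a triple-quoted str, so those characters became part of the text. If the data really is a bytes object, it can be decoded first. Below is a minimal sketch of that idea which swaps in the standard library's html.parser for Beautiful Soup, so it runs without extra packages. The names TextExtractor and html_to_text, the utf-8 default, and the short sample are assumptions made for illustration and are not part of the original answer.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; and &middot;
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse whitespace runs (CR/LF, tabs, non-breaking spaces)
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)


def html_to_text(raw, encoding="utf-8"):
    """Return the text of an HTML document given as bytes or str.

    The utf-8 default is an assumption, taken from the <meta> tag in the question.
    """
    if isinstance(raw, bytes):
        raw = raw.decode(encoding, errors="replace")
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical shortened sample in the same shape as the string in the question
sample = (
    b"<html><head><title>Demo</title></head>\r\n"
    b"<body><p>Position:&nbsp; Azure Architect</p>\r\n"
    b"<p>Location: San Antonio, Texas</p></body></html>\r\n"
)
print(html_to_text(sample))
```

Each text node ends up on its own line, so the long runs of blank lines seen in the output above do not appear. The trade-off is that paragraph spacing from the original markup is lost, which may or may not suit the use case.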