How to extract text using RSelenium

I have the following HTML:

 <h3><a href='jobdetail.php?job=705945'>Job Details</a>: lrp1_vs_Hx1sh2</h3><h4>

With the code below, I am trying to extract the element value lrp1_vs_Hx1sh2:

library(RSelenium)
webpage <- "https://cluspro.bu.edu/models.php?job=705945"
browser <- remoteDriver(port = 5556)
browser$open()
browser$navigate(webpage)

clk <- browser$findElement(using = "link text", "Use the server without the benefits of your own account")
clk$clickElement()

jobs <- browser$findElement(using = 'link text', "Job Details")
jobs$getElementText()

But it gives me "Job Details". How can I do this correctly?


Update

Here is the full HTML:

        <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
            <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
        <title>ClusPro 2.0: protein-protein docking</title>
            <meta http-equiv="content-type" content="text/html; charset=utf-8" />
      <link rel='stylesheet' type='text/css' href='/css/style.css' />
      <link rel='stylesheet' type='text/css' href='/css/loginform.css' />
      <link rel='stylesheet' type='text/css' href='/css/signupform.css' />
      <link rel='stylesheet' type='text/css' href='/css/contactform.css' />
      <link rel='stylesheet' type='text/css' href='/css/jobsform.css' />
      <link rel='stylesheet' type='text/css' href='/css/goodform.css' />
      <link rel="stylesheet" type="text/css" href="//cdnjs.cloudflare.com/ajax/libs/yui/2.9.0/grids/grids-min.css" />
      <style type="text/css">#tabResults { font-weight:bold; }</style>      <link rel="shortcut icon" href="/favicon.png" type="image/png" />
      <script type="text/javascript" src="/js/jquery-3.5.1.min.js"></script>
      <script type="text/javascript" src="/js/jquery.equalheights.js"></script>
      
                     <script type="text/javascript">
                  
   var models = {
      reinit: function(){
         var showmodels = $('#showmodels').prop('value');
         $('td:gt('+showmodels+')').hide();
         $('td:eq('+showmodels+')').hide();
         $('td:lt('+showmodels+')').show();
         $('#modelslink').prop('href', 'zipmodels.php?job=705945&coeffi=0&nmodels='+showmodels);
      }
   }
$(document).ready(function(){
   models.reinit();
   $('#showmodels').change(models.reinit);
})
               </script>
    </head>

    <body>
      <div id="doc">
        <div id="hd">
          <ul id='tabs-menu'>
            <li><a id='tabContact' href='/contact.php'>Contact</a></li>
            <li><a id='tabHelp' href='/help.php'>Help</a></li>
            <li><a id='tabPapers' href='/publications.php'>Papers</a></li>
                        <li><a id='tabResults' href='/results.php'>Results</a></li>
            <li><a id='tabQueue' href='/queue.php'>Queue</a></li>
            <li><a id='tabDimer' href='/dimer_predict/submit.php'>Dimer Classification</a></li>
            <li><a id='tabPeptide' href='/peptide/index.php'>Peptide Docking</a></li>
            <li><a id='tabDock' href='/home.php'>Dock</a></li>
          </ul>
          <img src='/image/ClusPro1.png' width='750' height='160' alt=''/>
        </div>
    <div id="bd">
        
          
        <div id='main-header-right'>
          <a href='/logout.php'>sign out</a>
        </div>
       <h3><a href='jobdetail.php?job=705945'>Job Details</a>: lrp1_vs_Hx1sh2</h3><h4><a href='scores.php?job=705945&coeffi=0'>View Model Scores</a></h4><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=0&amp;filetype=model_bz2'>Download all Models for all Coefficients</a><div style="padding-top:1em;">Balanced | <a href='models.php?job=705945&amp;coeffi=2'>Electrostatic-favored</a> | <a href='models.php?job=705945&amp;coeffi=4'>Hydrophobic-favored</a> | <a href='models.php?job=705945&amp;coeffi=6'>VdW+Elec</a></div><br /><div>Display Models: <form style='display:inline;'><select id='showmodels'><option value='10'>10</option><option value='15'>15</option><option value='20'>20</option><option value='23'>23</option></select></form></div><br /><a id='modelslink' href=''>Download Displayed Models</a><br /><br /><strong>If you use these models in a paper, please cite our <a href='publications.php'>papers</a></strong><br /><br /><table class='nice' id='models'><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=0&amp;filetype=model_file'>0</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=0&amp;filetype=model_img' alt='0' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=1&amp;filetype=model_file'>1</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=1&amp;filetype=model_img' alt='1' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=2&amp;filetype=model_file'>2</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=2&amp;filetype=model_img' alt='2' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=3&amp;filetype=model_file'>3</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=3&amp;filetype=model_img' alt='3' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=4&amp;filetype=model_file'>4</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=4&amp;filetype=model_img' alt='4' /></td><td><a 
href='file.php?jobid=705945&amp;coeffi=0&amp;model=5&amp;filetype=model_file'>5</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=5&amp;filetype=model_img' alt='5' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=6&amp;filetype=model_file'>6</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=6&amp;filetype=model_img' alt='6' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=7&amp;filetype=model_file'>7</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=7&amp;filetype=model_img' alt='7' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=8&amp;filetype=model_file'>8</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=8&amp;filetype=model_img' alt='8' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=9&amp;filetype=model_file'>9</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=9&amp;filetype=model_img' alt='9' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=10&amp;filetype=model_file'>10</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=10&amp;filetype=model_img' alt='10' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=11&amp;filetype=model_file'>11</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=11&amp;filetype=model_img' alt='11' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=12&amp;filetype=model_file'>12</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=12&amp;filetype=model_img' alt='12' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=13&amp;filetype=model_file'>13</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=13&amp;filetype=model_img' alt='13' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=14&amp;filetype=model_file'>14</a><br /><br /><img 
src='file.php?jobid=705945&amp;coeffi=0&amp;model=14&amp;filetype=model_img' alt='14' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=15&amp;filetype=model_file'>15</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=15&amp;filetype=model_img' alt='15' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=16&amp;filetype=model_file'>16</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=16&amp;filetype=model_img' alt='16' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=17&amp;filetype=model_file'>17</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=17&amp;filetype=model_img' alt='17' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=18&amp;filetype=model_file'>18</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=18&amp;filetype=model_img' alt='18' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=19&amp;filetype=model_file'>19</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=19&amp;filetype=model_img' alt='19' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=20&amp;filetype=model_file'>20</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=20&amp;filetype=model_img' alt='20' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=21&amp;filetype=model_file'>21</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=21&amp;filetype=model_img' alt='21' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=22&amp;filetype=model_file'>22</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=22&amp;filetype=model_img' alt='22' /></td><td>23<br /><br />(image not found)</td></tr></table>        </div>
        <div id="ft">
          ClusPro should only be used for noncommercial purposes.
          <br/>
          <a href='https://www.vajdalab.org' target='_blank'>Vajda Lab</a> and <a href='http://abcgroup.cluspro.org'>ABC Group</a>
          <br/>
          <a href='https://www.bu.edu/'>Boston University</a> and <a href='http://www.stonybrook.edu'>Stony Brook University</a>
        </div>
      </div>

    </body>
  </html>

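Locating the element by the link text "Job Details" gives you the a element, whereas the value you want is in the text of the h3 element that contains it. The href attribute belongs to the a element, not to the h3, so the XPath has to select the h3 by its child link.
Try this:

jobs <- browser$findElement(using = 'xpath', "//h3[a[contains(@href,'jobdetail.php?job=705945')]]")
jobs$getElementText()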

Or, if the job ID can change, try this:

jobs <- browser$findElement(using = 'xpath', "//h3[a[contains(@href,'jobdetail.php')]]")
jobs$getElementText()
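
If the job ID varies between runs, the XPath string can be built rather than hard-coded. The following is only a sketch: `job_h3_xpath` is a hypothetical helper name, not part of RSelenium. It assumes, as in the posted HTML, that the href sits on the a element nested inside the h3, so the predicate is placed on the child a.

```r
# Hypothetical helper (not an RSelenium function): build an XPath that selects
# the <h3> whose child <a> links to jobdetail.php, optionally for one job ID.
job_h3_xpath <- function(job_id = NULL) {
  target <- "jobdetail.php"
  if (!is.null(job_id)) {
    # format() avoids scientific notation for large numeric IDs
    id_chr <- format(job_id, scientific = FALSE, trim = TRUE)
    target <- paste0(target, "?job=", id_chr)
  }
  sprintf("//h3[a[contains(@href, '%s')]]", target)
}

xpath_one_job <- job_h3_xpath(705945)
xpath_any_job <- job_h3_xpath()
print(xpath_one_job)
print(xpath_any_job)
```

The resulting string would then be passed as the second argument to browser$findElement(using = 'xpath', ...), as in the calls above.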

You can use this XPath:

//div[@id='main-header-right']//following-sibling::h3

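This should return all of the text inside the h3 tag, that is, the link text together with the value that follows it.

Once you have the text, split it on spaces and pick the piece you need by index. Here the value is the last piece.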
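
The splitting step might look like the sketch below. It assumes the h3 text is exactly "Job Details: lrp1_vs_Hx1sh2", as the posted HTML suggests, and that the job name itself contains no spaces. In a live session the string would come from the located element, for example jobs$getElementText()[[1]], since getElementText() returns a list; here it is hard-coded so the snippet runs without a browser.

```r
# Assumed input: the visible text of the <h3> element.
h3_text <- "Job Details: lrp1_vs_Hx1sh2"

# Option 1: split on single spaces and take the last piece.
parts <- strsplit(h3_text, " ", fixed = TRUE)[[1]]
job_name <- parts[length(parts)]

# Option 2: drop everything up to and including the first colon
# and any whitespace after it.
job_name_alt <- sub("^[^:]*:[[:space:]]*", "", h3_text)

print(job_name)
print(job_name_alt)
```

Option 2 is a little more tolerant if the job name ever contains a space, because it only relies on the "Job Details:" prefix ending in a colon.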