How to fetch a whole row from HTML data through a shell script?
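I need to fetch a whole row from a table. I have converted the table data to HTML format (see the data below). Now I need the data for one specific row. For example, if I want the data for 19.8.2, the whole row should be fetched, from 8/19/19 (Branch Cut) to 8/23/19 (Prod Deploy):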
<div class="table-wrap">
<table class="wrapped confluenceTable">
<colgroup>
<col style="width: 123.0px;" />
<col style="width: 80.0px;" />
<col style="width: 138.0px;" />
<col style="width: 138.0px;" />
<col style="width: 139.0px;" />
<col style="width: 126.0px;" />
<col style="width: 788.0px;" />
</colgroup>
<tbody>
<tr>
<td class="highlight-green confluenceTd" data-highlight-colour="green">Milestone</td>
<td class="highlight-green confluenceTd" data-highlight-colour="green">Branch Cut</td>
<td class="highlight-green confluenceTd" data-highlight-colour="green">FIT Cert</td>
<td class="highlight-green confluenceTd" data-highlight-colour="green">IAT Cert</td>
<td class="highlight-green confluenceTd" data-highlight-colour="green">UAT Deploy</td>
<td class="highlight-green confluenceTd" data-highlight-colour="green">Prod Deploy</td>
<td class="highlight-green confluenceTd" colspan="1" data-highlight-colour="green">Key Features</td>
</tr>
<tr>
<td colspan="1" class="confluenceTd">19.7.1</td>
<td colspan="1" class="confluenceTd">7/8/19</td>
<td colspan="1" class="confluenceTd">7/9/19</td>
<td colspan="1" class="confluenceTd">7/10/19</td>
<td colspan="1" class="confluenceTd">7/11/19</td>
<td colspan="1" class="confluenceTd">7/12/19</td>
<td colspan="1" class="confluenceTd">
<p><span style="color: rgb(51,153,102);">MOVE TO 3 WEEK RELEASE CYCLE ON FRIDAYS</span></p>
</td>
</tr>
<tr>
<td colspan="1" class="confluenceTd">19.8.1</td>
<td colspan="1" class="confluenceTd">7/29/19</td>
<td colspan="1" class="confluenceTd">7/30/19</td>
<td colspan="1" class="confluenceTd">7/31/19</td>
<td colspan="1" class="confluenceTd">8/1/19</td>
<td colspan="1" class="confluenceTd">8/2/19</td>
<td colspan="1" class="confluenceTd">
<br/>
</td>
</tr>
<tr>
<td colspan="1" class="confluenceTd">19.8.2</td>
<td colspan="1" class="confluenceTd">8/19/19</td>
<td colspan="1" class="confluenceTd">8/20/19</td>
<td colspan="1" class="confluenceTd">8/21/19</td>
<td colspan="1" class="confluenceTd">8/22/19</td>
<td colspan="1" class="confluenceTd">8/23/19</td>
<td colspan="1" class="confluenceTd">
<br/>
</td>
</tr>
<tr>
<td colspan="1" class="confluenceTd">19.9.1</td>
<td colspan="1" class="confluenceTd">9/9/19</td>
<td colspan="1" class="confluenceTd">9/10/19</td>
<td colspan="1" class="confluenceTd">9/11/19</td>
<td colspan="1" class="confluenceTd">9/12/19</td>
<td colspan="1" class="confluenceTd">9/13/19</td>
<td colspan="1" class="confluenceTd">
<br/>
</td>
</tr>
<tr>
<td class="confluenceTd">19.10.1</td>
<td class="confluenceTd">9/30/19</td>
<td class="confluenceTd">10/1/19</td>
<td class="confluenceTd">10/2/19</td>
<td class="confluenceTd">10/3/19</td>
<td class="confluenceTd">10/4/19</td>
<td class="confluenceTd"><span>Q1 - Feature Release</span></td>
</tr>
<tr>
<td class="confluenceTd">19.10.2</td>
<td class="confluenceTd">
<br/>
</td>
<td class="confluenceTd">
<br/>
</td>
<td class="confluenceTd">
<br/>
</td>
<td class="confluenceTd">
<br/>
</td>
<td class="confluenceTd">
<br/>
</td>
<td class="confluenceTd">
<p>This deployment was canceled due to a dev group directed deployment freeze.</p>
<p>Original Dates: Branch: 10/21; FIT: 10/22; IAT: 10/23; UAT: 10/24; Prod: 10/25</p>
</td>
</tr>
<tr>
<td class="confluenceTd">19.11.1</td>
<td class="confluenceTd">11/11/19</td>
<td class="confluenceTd">11/12/19</td>
<td class="confluenceTd">11/13/19</td>
<td class="confluenceTd">11/14/19</td>
<td class="confluenceTd">11/15/19</td>
<td class="confluenceTd">
<br/>
</td>
</tr>
<tr>
<td class="confluenceTd">19.12.1</td>
<td class="confluenceTd">12/2/19</td>
<td class="confluenceTd">12/3/19</td>
<td class="confluenceTd">12/4/19</td>
<td class="confluenceTd">12/5/19</td>
<td class="confluenceTd">12/6/19</td>
<td class="confluenceTd">
<br/>
</td>
</tr>
<tr>
<td colspan="1" class="confluenceTd">20.01.1</td>
<td colspan="1" class="confluenceTd">1/20/20</td>
<td colspan="1" class="confluenceTd">1/21/20</td>
<td colspan="1" class="confluenceTd">1/22/20</td>
<td colspan="1" class="confluenceTd">1/23/20</td>
<td colspan="1" class="confluenceTd">1/24/20</td>
<td colspan="1" class="confluenceTd">Q2 - Feature Release</td>
</tr>
</tbody>
</table>
</div>
<p>** FIT certification delayed 1 day due to Holiday</p>
I tried the script below, but the output is only the dates from the Branch Cut column. Please help.
#!/bin/bash
curl -user -pass --noproxy '*' 'https://confluence.es.com/display/Release+Calendar' | awk ' /<div id="main-content" class="wiki-content">/ {flag=1;next} / <\/div>/{flag=0} flag { print }' > page.tmp
xmllint --html --xpath '//table/tbody/tr/td[2]' page.tmp | egrep -o '[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}' > branchdates.tmp
You need to locate the required td first, and then get that td's parent tr. The following XPath selects the required td:
//table//td[contains(text(), '19.8.2')]
To get the parent tr of that td:
//table//td[contains(text(), '19.8.2')]/parent::tr
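As a rough sketch of how the parent::tr expression above could be plugged into the question's pipeline: the snippet below assumes xmllint is installed, uses a hypothetical helper named get_row_dates, and builds a small stand-in file instead of the real page.tmp so it can run on its own. It is one possible arrangement, not the only way to do it.

```shell
#!/bin/bash
# Sketch only: assumes xmllint (libxml2) is installed.
# get_row_dates is a hypothetical helper name, not part of the original script.

get_row_dates() {
    local release="$1" file="$2"
    # Select the <tr> that contains a <td> whose text includes the release number,
    # then keep only the date-like strings from the serialized row.
    xmllint --html --xpath "//table//td[contains(text(), '${release}')]/parent::tr" "$file" 2>/dev/null |
        grep -Eo '[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}'
}

# Small stand-in for page.tmp so the sketch runs on its own.
sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<table><tbody>
<tr><td>19.8.1</td><td>7/29/19</td><td>7/30/19</td><td>7/31/19</td><td>8/1/19</td><td>8/2/19</td><td><br/></td></tr>
<tr><td>19.8.2</td><td>8/19/19</td><td>8/20/19</td><td>8/21/19</td><td>8/22/19</td><td>8/23/19</td><td><br/></td></tr>
</tbody></table>
EOF

# Print one date per line for the 19.8.2 row.
get_row_dates "19.8.2" "$sample"
```

With the real data you would pass page.tmp as the second argument. Note that contains() is a substring match, so a shorter value such as '19.1' would also match rows like 19.10.1 and 19.11.1; if that matters, a predicate such as td[normalize-space(.)='19.8.2'] is a stricter alternative. The release value is interpolated into the XPath as-is, so it should not contain a single quote.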