Replace html tag data in bash scripting

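I want to highlight entire rows in an HTML file, using the same color for rows that share the same date. The date is the first column of the HTML table. I tried writing something like the code below, but it does not work. I am also not sure how to switch colors when records have different dates.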

tdDate="2020-08-24 2020-08-25 2020-08-26 2020-08-27"
tdSet=0
endTrTag="</tr>"
colors="grey blue"
for x in $tdDate
do
awk '{if (($0 ~ /$x/) & ($tdSet -eq 0)) {
sed -i 's@<td@<td bgcolor="grey"@g' 
$tdSet=1
}
elsif (($0 ~ /$endTrTag/) & ($tdSer -eq 1) {
$tdSet=0}
else {
sed -i 's@<td@<td bgcolor="grey"@g'
}}'

file
done

Sample HTML file


    <html>
    <table>
    <tr>
    <td>2020-08-24</td>
    <td>NYC</td>
    <td>75</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-24</td>
    <td>Seattle</td>
    <td>55</td>
    <td>Rainy</td>
    </tr>
    <tr>
    <td>2020-08-24</td>
    <td>Austin</td>
    <td>85</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-25</td>
    <td>Seattle</td>
    <td>70</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-25</td>
    <td>Austin</td>
    <td>95</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-26</td>
    <td>NYC</td>
    <td>68</td>
    <td>Rainy</td>
    </tr>
    <tr>
    <td>2020-08-26</td>
    <td>Austin</td>
    <td>95</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-26</td>
    <td>San Jose</td>
    <td>85</td>
    <td>Sunny</td>
    </tr>
    </table>
    </html>

Desired output


    <html>
    <table>
    <tr>
    <td bgcolor="grey">2020-08-24</td>
    <td bgcolor="grey"> NYC</td>
    <td bgcolor="grey"> 75</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-24</td>
    <td bgcolor="grey"> Seattle</td>
    <td bgcolor="grey"> 55</td>
    <td bgcolor="grey"> Rainy</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-24</td>
    <td bgcolor="grey"> Austin</td>
    <td bgcolor="grey"> 85</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="blue">2020-08-25</td>
    <td bgcolor="blue"> Seattle</td>
    <td bgcolor="blue"> 70</td>
    <td bgcolor="blue"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="blue"> 2020-08-25</td>
    <td bgcolor="blue"> Austin</td>
    <td bgcolor="blue"> 95</td>
    <td bgcolor="blue"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="grey">2020-08-26</td>
    <td bgcolor="grey"> NYC</td>
    <td bgcolor="grey"> 68</td>
    <td bgcolor="grey"> Rainy</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-26</td>
    <td bgcolor="grey"> Austin</td>
    <td bgcolor="grey"> 95</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-26</td>
    <td bgcolor="grey"> San Jose</td>
    <td bgcolor="grey"> 85</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    </table>
    </html>

With xmlstarlet and bash:

#!/bin/bash

tdDate="2020-08-24 2020-08-25 2020-08-26 2020-08-27"
colors=(grey blue)               # put colors in an array

declare -i c=0                   # set integer attribute. Counter for colors

for date in $tdDate; do
  # echo "$date ${colors[$c]}"   # debug 

  xmlstarlet edit -L --insert "//html/table/tr[td='$date']/td" --type attr -n 'bgcolor' -v "${colors[$c]}" file.xml

  c=c+1
  [[ $c -eq ${#colors[@]} ]] && c=0  # reset counter if $c equal array length
done

You can certainly make this more efficient by not rewriting the XML file for each color.


See: xmlstarlet edit --help
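
As a rough sketch of that efficiency remark: xmlstarlet's edit subcommand accepts several actions in a single invocation, so the loop can collect one --insert action per date and the file is then written only once. The `args` array is my own name for illustration, and the final command is only echoed here so you can inspect it before running it for real.

```shell
#!/bin/bash
# Sketch: build one --insert action per date, then call xmlstarlet a single time.
tdDate="2020-08-24 2020-08-25 2020-08-26 2020-08-27"
colors=(grey blue)

args=(edit -L)                   # hypothetical array holding the whole command line
c=0
for date in $tdDate; do
  args+=(--insert "//html/table/tr[td='$date']/td" --type attr -n bgcolor -v "${colors[c]}")
  c=$(( (c + 1) % ${#colors[@]} ))   # cycle through the colors
done

# Print the single command instead of running it; drop the echo to execute.
echo xmlstarlet "${args[@]}" file.xml
```

The colors still alternate per entry in tdDate, exactly as in the loop above; only the number of times file.xml is rewritten changes.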

Yes, reading HTML can be tricky, but if it is flat, why not use gawk?

Note that in the output the color is switched for every row.

#!/bin/bash

tdDate="2020-08-24 2020-08-25 2020-08-26 2020-08-27"
tdSet=0
endTrTag="</tr>"
colors="grey blue"

gawk -v dates="$tdDate" -v colors="$colors" '
        BEGIN{ split(dates,Date); for(i in Date){ tdDate[Date[i]]=tdDate[Date[i]] };
               split(colors,color); c=1;
                FS="[\<\>]";
        }
        $3 in tdDate { c=(c==1?2:1) }
        $0~"<td>"  { gsub("<td>","<td bgcolor=\""color[c]"\">",$0); }
        1
        ' sample.html

Output:

<html>
<table>
<tr>
<td bgcolor="blue">2020-08-24</td>
<td bgcolor="blue">NYC</td>
<td bgcolor="blue">75</td>
<td bgcolor="blue">Sunny</td>
</tr>
<tr>
<td bgcolor="grey">2020-08-24</td>
<td bgcolor="grey">Seattle</td>
<td bgcolor="grey">55</td>
<td bgcolor="grey">Rainy</td>
</tr>
<tr>
<td bgcolor="blue">2020-08-24</td>
<td bgcolor="blue">Austin</td>
<td bgcolor="blu.....

EDIT (because of "I have been trying to print tdDate to see the values it contains"):

Add this line to the awk script:

END{ for(i in tdDate) { print "i:", i," tdDate:",tdDate[i] }}

The output will be (at the end):

i: 2020-08-27  tdDate:
i: 2020-08-24  tdDate:
i: 2020-08-25  tdDate:
i: 2020-08-26  tdDate:
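
If the goal is instead to switch colors only when the date changes, as in the desired output of the question, one possible variation is to remember the previous date and advance the color index only when the first cell of a row differs from it. This is a minimal sketch, not a tested replacement: `color_by_date` is a hypothetical helper name, and it assumes one tag per line and rows already grouped by date, as in the sample.

```shell
#!/bin/bash
# Hypothetical helper: reads HTML on stdin, takes the color list as $1.
color_by_date() {
    awk -v colors="${1:-grey blue}" '
    BEGIN { n = split(colors, color, " "); c = 0 }
    /<tr>/ { firstCell = 1 }            # a new row starts
    /<td>/ {
        if (firstCell) {                # the first <td> of a row holds the date
            date = $0
            sub(/^[ \t]*<td>[ \t]*/, "", date)
            sub(/[ \t]*<\/td>.*$/, "", date)
            if (date != prev) {         # advance the color only on a new date
                c = c % n + 1
                prev = date
            }
            firstCell = 0
        }
        sub(/<td>/, "<td bgcolor=\"" color[c] "\">")
    }
    { print }
    '
}

# Small inline sample: two rows for one date, one row for the next.
color_by_date "grey blue" <<'EOF'
<tr>
<td>2020-08-24</td>
<td>NYC</td>
</tr>
<tr>
<td>2020-08-24</td>
<td>Seattle</td>
</tr>
<tr>
<td>2020-08-25</td>
<td>Austin</td>
</tr>
EOF
```

With this input the first two rows come out grey and the third blue. For the real file you would call it as `color_by_date "grey blue" < sample.html`.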

Assuming what you really want is a different color for each date, then given simple/regular input this is what I would do:

$ cat tst.awk
BEGIN {
    # See https://www.w3schools.com/colors/colors_names.asp
    # for all portable HTML color names, we are just using 4 here.
    numColorsAvail = split("red green blue yellow",colors)
}
/<tr>/ { tdNr=0 }
/<td>/ {
    if ( ++tdNr == 1 ) {
        date = $0
        sub(/[^>]+>[[:space:]]*/,"",date)
        sub(/[[:space:]]*<[^<]+$/,"",date)
        if ( !(date in date2color) ) {
            date2color[date] = colors[++numColorsUsed]
        }
        color = date2color[date]
    }
    sub(/>/," bgcolor=\""color"\">")
}
{ print }


$ awk -f tst.awk file
    <html>
    <table>
    <tr>
    <td bgcolor="red">2020-08-24</td>
    <td bgcolor="red">NYC</td>
    <td bgcolor="red">75</td>
    <td bgcolor="red">Sunny</td>
    </tr>
    <tr>
    <td bgcolor="red">2020-08-24</td>
    <td bgcolor="red">Seattle</td>
    <td bgcolor="red">55</td>
    <td bgcolor="red">Rainy</td>
    </tr>
    <tr>
    <td bgcolor="red">2020-08-24</td>
    <td bgcolor="red">Austin</td>
    <td bgcolor="red">85</td>
    <td bgcolor="red">Sunny</td>
    </tr>
    <tr>
    <td bgcolor="green">2020-08-25</td>
    <td bgcolor="green">Seattle</td>
    <td bgcolor="green">70</td>
    <td bgcolor="green">Sunny</td>
    </tr>
    <tr>
    <td bgcolor="green">2020-08-25</td>
    <td bgcolor="green">Austin</td>
    <td bgcolor="green">95</td>
    <td bgcolor="green">Sunny</td>
    </tr>
    <tr>
    <td bgcolor="blue">2020-08-26</td>
    <td bgcolor="blue">NYC</td>
    <td bgcolor="blue">68</td>
    <td bgcolor="blue">Rainy</td>
    </tr>
    <tr>
    <td bgcolor="blue">2020-08-26</td>
    <td bgcolor="blue">Austin</td>
    <td bgcolor="blue">95</td>
    <td bgcolor="blue">Sunny</td>
    </tr>
    <tr>
    <td bgcolor="blue">2020-08-26</td>
    <td bgcolor="blue">San Jose</td>
    <td bgcolor="blue">85</td>
    <td bgcolor="blue">Sunny</td>
    </tr>
    </table>
    </html>

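Add handling for numColorsUsed exceeding numColorsAvail if you like: issue a warning, set the color to "grey", reset numColorsUsed to start again from the first color, whatever you prefer. It is obvious, trivial stuff to handle.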
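
For instance, one of those options (warn, then start again from the first color) might look roughly like the sketch below. It is the same logic as tst.awk with one extra check; the `color_dates` wrapper function and the wording of the warning are my own assumptions for illustration, and I use `[ \t]` instead of `[[:space:]]` only to stay friendly to older awks.

```shell
#!/bin/bash
# Hypothetical wrapper: HTML on stdin, optional color list as $1.
color_dates() {
    awk -v colorList="${1:-red green blue yellow}" '
    BEGIN { numColorsAvail = split(colorList, colors, " ") }
    /<tr>/ { tdNr = 0 }
    /<td>/ {
        if ( ++tdNr == 1 ) {
            date = $0
            sub(/[^>]+>[ \t]*/, "", date)
            sub(/[ \t]*<[^<]+$/, "", date)
            if ( !(date in date2color) ) {
                if ( numColorsUsed == numColorsAvail ) {
                    # Out of colors: warn on stderr and wrap around.
                    print "warning: reusing colors from " date > "/dev/stderr"
                    numColorsUsed = 0
                }
                date2color[date] = colors[++numColorsUsed]
            }
            color = date2color[date]
        }
        sub(/>/, " bgcolor=\"" color "\">")
    }
    { print }
    '
}

# Three dates but only two colors, so the third date wraps back to red.
color_dates "red green" <<'EOF'
<tr>
<td>2020-08-24</td>
</tr>
<tr>
<td>2020-08-25</td>
</tr>
<tr>
<td>2020-08-26</td>
</tr>
EOF
```

Wrapping means two different dates can end up sharing a color, so whether this or a fixed fallback such as "grey" is better depends on how many dates you expect.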

Here are all of the HTML color names, and how to retrieve them yourself if you want to build that into the script:

$ curl -s https://www.w3schools.com/colors/colors_names.asp | grep -o "colARR.push('[^']*')" | cut -d\' -f2
AliceBlue
AntiqueWhite
Aqua
Aquamarine
Azure
Beige
Bisque
Black
BlanchedAlmond
Blue
BlueViolet
Brown
BurlyWood
CadetBlue
Chartreuse
Chocolate
Coral
CornflowerBlue
Cornsilk
Crimson
Cyan
DarkBlue
DarkCyan
DarkGoldenRod
DarkGray
DarkGrey
DarkGreen
DarkKhaki
DarkMagenta
DarkOliveGreen
DarkOrange
DarkOrchid
DarkRed
DarkSalmon
DarkSeaGreen
DarkSlateBlue
DarkSlateGray
DarkSlateGrey
DarkTurquoise
DarkViolet
DeepPink
DeepSkyBlue
DimGray
DimGrey
DodgerBlue
FireBrick
FloralWhite
ForestGreen
Fuchsia
Gainsboro
GhostWhite
Gold
GoldenRod
Gray
Grey
Green
GreenYellow
HoneyDew
HotPink
IndianRed
Indigo
Ivory
Khaki
Lavender
LavenderBlush
LawnGreen
LemonChiffon
LightBlue
LightCoral
LightCyan
LightGoldenRodYellow
LightGray
LightGrey
LightGreen
LightPink
LightSalmon
LightSeaGreen
LightSkyBlue
LightSlateGray
LightSlateGrey
LightSteelBlue
LightYellow
Lime
LimeGreen
Linen
Magenta
Maroon
MediumAquaMarine
MediumBlue
MediumOrchid
MediumPurple
MediumSeaGreen
MediumSlateBlue
MediumSpringGreen
MediumTurquoise
MediumVioletRed
MidnightBlue
MintCream
MistyRose
Moccasin
NavajoWhite
Navy
OldLace
Olive
OliveDrab
Orange
OrangeRed
Orchid
PaleGoldenRod
PaleGreen
PaleTurquoise
PaleVioletRed
PapayaWhip
PeachPuff
Peru
Pink
Plum
PowderBlue
Purple
RebeccaPurple
Red
RosyBrown
RoyalBlue
SaddleBrown
Salmon
SandyBrown
SeaGreen
SeaShell
Sienna
Silver
SkyBlue
SlateBlue
SlateGray
SlateGrey
Snow
SpringGreen
SteelBlue
Tan
Teal
Thistle
Tomato
Turquoise
Violet
Wheat
White
WhiteSmoke
Yellow
YellowGreen

For example, to have your script automatically use all of the portable HTML color names, you could do:

awk -v htmlColors="$(curl -s https://www.w3schools.com/colors/colors_names.asp | grep -o "colARR.push('[^']*')" | cut -d\' -f2)" '
BEGIN {
   numColorsAvail = split(htmlColors,colors)
}
... rest of the script as above ...
'