sed to process an OFX, extracting payee from <MEMO> and printing on <NAME>
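I'm processing OFX (bank transaction) files. My bank doesn't use the <NAME> tag to specify the payee; instead, this information is a substring of the <MEMO> tag.
So, my file looks like this: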
...ofx headers and other stuff
...line below is a transaction
<STMTTRN>
<TRNTYPE>OTHER</TRNTYPE>
<DTPOSTED>20160609120000</DTPOSTED>
<TRNAMT>-4.00</TRNAMT>
<FITID>2016060914000</FITID>
<CHECKNUM>000000700132</CHECKNUM>
<REFNUM>700.132</REFNUM>
<MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
</STMTTRN>
...continues other transactions and end of file
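I want to match each <MEMO> tag, extract the payee name (in this example, Walmart 2th street), and write it on a new line inside a <NAME> tag.
My output should look like this: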
...ofx headers and other stuff
...line below is a transaction
<STMTTRN>
<TRNTYPE>OTHER</TRNTYPE>
<DTPOSTED>20160609120000</DTPOSTED>
<TRNAMT>-4.00</TRNAMT>
<FITID>2016060914000</FITID>
<CHECKNUM>000000700132</CHECKNUM>
<REFNUM>700.132</REFNUM>
<MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
<NAME>Walmart 2th street</NAME>
</STMTTRN>
...continues other transactions and end of file
A solution using another tool, such as awk, would also be fine.
With GNU sed:
sed -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n<NAME>\1<\/NAME>/' file
Output:
<STMTTRN>
<TRNTYPE>OTHER</TRNTYPE>
<DTPOSTED>20160609120000</DTPOSTED>
<TRNAMT>-4.00</TRNAMT>
<FITID>2016060914000</FITID>
<CHECKNUM>000000700132</CHECKNUM>
<REFNUM>700.132</REFNUM>
<MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
<NAME>Walmart 2th street</NAME>
</STMTTRN>
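Since the question also accepts awk, here is a rough POSIX awk equivalent of the sed command above, offered as a sketch rather than a tested drop-in replacement. The function name add_name_awk is hypothetical. It prints every line unchanged and, after each <MEMO> line containing " HH:MM ", prints a <NAME> line with the text between the time and the next "<".

```shell
# Hypothetical helper: add_name_awk < input.ofx > output.ofx
# Uses only POSIX awk features (match, RSTART, RLENGTH, substr), so it should
# run under gawk, mawk or busybox awk. No {2} interval expressions are used,
# since older awks do not support them.
add_name_awk() {
  awk '
    # Always echo the original line.
    { print }
    # On <MEMO> lines, find " HH:MM " followed by everything up to the next "<".
    /<MEMO>/ && match($0, / [0-9][0-9]:[0-9][0-9] [^<]*/) {
      # Skip the 7 characters " HH:MM " at the start of the match.
      payee = substr($0, RSTART + 7, RLENGTH - 7)
      print "<NAME>" payee "</NAME>"
    }
  '
}
```

Usage: add_name_awk < file > file-with-names. One difference from the sed version: the greedy .* in the sed regex picks the last HH:MM on the line, while awk's match() picks the first. For the sample transaction, both yield Walmart 2th street.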
If you want to edit the file in place, use sed's -i option.
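If you also want to keep a copy of the untouched file, GNU sed accepts a backup suffix attached directly to -i. Here is a minimal sketch assuming GNU sed, with a hypothetical wrapper name add_name_inplace. The \1 back-reference inserts the captured payee into the <NAME> line.

```shell
# Hypothetical helper: add_name_inplace FILE
# Assumes GNU sed: -r enables extended regexes, \n in the replacement emits a
# newline, and -i.bak edits FILE in place while saving the original as FILE.bak.
add_name_inplace() {
  sed -r -i.bak 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n<NAME>\1<\/NAME>/' "$1"
}
```

Usage: add_name_inplace file.ofx leaves the edited result in file.ofx and the original in file.ofx.bak.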
Adding to @Cyrus's answer to handle non-ASCII characters:
I got rid of the non-ASCII characters, and now it works:
iconv -f "windows-1252" -t "UTF-8" file-ansi.ofx -o file-utf8.ofx
rm file-ansi.ofx
sed 'y/áÁàÀãÃâÂéÉêÊíÍóÓõÕôÔúÚüÜçÇ/aAaAaAaAeEeEiIoOoOoOuUuUcC/' -i file-utf8.ofx
sed -i -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n<NAME>\1<\/NAME>/' file-utf8.ofx
My output:
<MEMO>Cartao de Credito - 09/06 18:37 Walmart 2th</MEMO>
<NAME>Walmart 2th street</NAME>
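The same three steps can also be chained into one pipeline that leaves the original file in place instead of deleting it. The sketch below makes several assumptions: the input is windows-1252, GNU sed and iconv are installed, and the function name convert_ofx is hypothetical. It also swaps tr in for the sed y command. Because windows-1252 is a single-byte encoding, each accented letter is exactly one byte before conversion, so tr can map it with octal escapes. In contrast, sed's y only treats á as a single character when running in a UTF-8 locale.

```shell
# Hypothetical helper: convert_ofx INPUT OUTPUT
# Assumptions: INPUT is windows-1252 encoded; GNU sed and iconv are available.
# Pipeline order:
#   1. tr maps the accented windows-1252 bytes to plain ASCII letters, using
#      the same mapping as the sed y command (a->a ... c-cedilla->c).
#      LC_ALL=C makes tr handle the input strictly byte by byte.
#   2. iconv re-encodes any remaining windows-1252 characters as UTF-8.
#   3. sed appends a <NAME> line after each matching <MEMO> line.
# INPUT itself is never modified.
convert_ofx() {
  LC_ALL=C tr \
    '\341\301\340\300\343\303\342\302\351\311\352\312\355\315\363\323\365\325\364\324\372\332\374\334\347\307' \
    'aAaAaAaAeEeEiIoOoOoOuUuUcC' < "$1" |
    iconv -f windows-1252 -t UTF-8 |
    sed -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n<NAME>\1<\/NAME>/' > "$2"
}
```

Usage: convert_ofx file-ansi.ofx file-utf8.ofx. Characters outside the tr mapping (for example ñ) are not stripped; iconv simply carries them over as UTF-8.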