Remove control, hidden, unwanted characters from string
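Please hear me out before closing this as a duplicate: there are other Stack Overflow questions similar to mine, but their solutions don't work for me. I'm trying to remove unwanted characters from inbound messages, without success. I don't know what these characters are or what they represent, but they seem to break up the data the way carriage returns and line feeds do.
I need to keep all whitespace except at the end.
The characters I see are ^M and ^C. Sometimes they appear together, sometimes on their own.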
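^M and ^C look like caret notation, which terminals and editors such as vim use to display control characters: ^X stands for the code of X XOR 0x40, so ^M is carriage return (0x0D) and ^C is ETX, "end of text" (0x03). Because the same two characters could also be plain text, it helps to print what the string really contains. A minimal diagnostic sketch; the class and method names are hypothetical:

```java
public class ShowHiddenChars {
    // Renders every character outside printable ASCII (0x20..0x7E) as <U+XXXX>,
    // so invisible control characters become visible in a log line.
    static String reveal(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp >= 0x20 && cp < 0x7F) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format("<U+%04X>", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Caret notation: ^X is the control code (X XOR 0x40), e.g. 'M' (0x4D) -> 0x0D.
    static int caretToCode(char letter) {
        return letter ^ 0x40;
    }

    public static void main(String[] args) {
        System.out.println(reveal("FAULT\r \u0003")); // FAULT<U+000D> <U+0003>
        System.out.println(reveal("FAULT^M ^C"));     // FAULT^M ^C (literal text, unchanged)
        System.out.printf("^M = 0x%02X, ^C = 0x%02X%n", caretToCode('M'), caretToCode('C'));
    }
}
```

Logging `reveal(msg)` next to `msg` shows whether the message holds real control characters or the literal two-character text.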
My test code is mostly taken from those similar questions:
String msg = exchange.getIn().getBody(String.class);
log.info("Message before apply filter: " + msg);
String filteredMessage = msg.replaceAll("[^\\x00-\\x7F]", "");
log.info("Remove non-ASCII characters: " + filteredMessage);
filteredMessage = msg.replaceAll("[\\p{C}]", "");
log.info("Remove all Control characters: " + filteredMessage);
filteredMessage = msg.replaceAll("[\\p{Cntrl}\\p{Cc}\\p{Cf}\\p{Co}\\p{Cn}]", "");
log.info("Remove some Control characters: " + filteredMessage);
filteredMessage = msg.replaceAll("[^\\p{Print}]", "");
log.info("Remove non printable characters: " + filteredMessage);
filteredMessage = msg.trim();
log.info("Trim: " + filteredMessage);
filteredMessage = msg.replaceAll("\\cM", "");
log.info("Remove ^M Control characters: " + filteredMessage);
filteredMessage = msg.replaceAll("^M", "");
log.info("Remove ^M Control characters: " + filteredMessage);
exchange.getIn().setBody(filteredMessage);
Sample data file:
A 291511191831421742XXXXXXXXXXWRN/WN18111917420077000009ENG 2 IGN B FAULT^M ^C
A 056611191832641742XXXXXXXXXXFLR/FR18111917410032470002BRK TEMP SENSOR4(6GW)/ BTMU(2GW)^M/IDBSCU 1^M ^C
A 080011191830061749XXXXXXXXXXMPF/AN.N306DN/FIDAL800 /DM181119142800/DAKSMF/DSKMSP/WN18111916310034000006NAV ATC/XPDR 1 FAULT^M,18111917480032000009BRAKES HOT^M/FR18111916310034523306ATC 1(1SH1)^M/IDATC 1^M/FR18111916310034723406ATC1(1SH1)/TCAS(1000SG)^M/IDTCAS^M/FR18111917120022833406AFS:FMGC2^M/IDAFS 1^M,IR 1^M,IR 2^M,IR 3^M/FR18111917120022833406AFS:FMGC1^M/IDAFS 1^M,IR 1^M,IR 2^M,IR 3^M ^C
My filters don't work; here are the results. It's as if the regexes don't work at all, or I'm doing something stupid. Thanks, everyone!
Message before apply filter: A 056611191832641742XXXXXXXXXXFLR/FR18111917410032470002BRK TEMP SENSOR4(6GW)/ BTMU(2GW)^M/IDBSCU 1^M ^C
Remove non-ASCII characters: A 056611191832641742XXXXXXXXXXFLR/FR18111917410032470002BRK TEMP SENSOR4(6GW)/ BTMU(2GW)^M/IDBSCU 1^M ^C
Remove all Control characters: A 056611191832641742XXXXXXXXXXFLR/FR18111917410032470002BRK TEMP SENSOR4(6GW)/ BTMU(2GW)^M/IDBSCU 1^M ^C
Remove some Control characters: A 056611191832641742XXXXXXXXXXFLR/FR18111917410032470002BRK TEMP SENSOR4(6GW)/ BTMU(2GW)^M/IDBSCU 1^M ^C
Remove non printable characters: A 056611191832641742XXXXXXXXXXFLR/FR18111917410032470002BRK TEMP SENSOR4(6GW)/ BTMU(2GW)^M/IDBSCU 1^M ^C
Trim: A 056611191832641742XXXXXXXXXXFLR/FR18111917410032470002BRK TEMP SENSOR4(6GW)/ BTMU(2GW)^M/IDBSCU 1^M ^C
Remove ^M Control characters: A 056611191832641742XXXXXXXXXXFLR/FR18111917410032470002BRK TEMP SENSOR4(6GW)/ BTMU(2GW)^M/IDBSCU 1^M ^C
Remove ^M Control characters: A 056611191832641742XXXXXXXXXXFLR/FR18111917410032470002BRK TEMP SENSOR4(6GW)/ BTMU(2GW)^M/IDBSCU 1^M ^C
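Each filter above runs on the unmodified msg, yet \p{C} and [^\p{Print}] should each, on their own, strip a real CR (0x0D) or ETX (0x03). So if the line logged after one of those calls still shows ^M, one plausible reading (an assumption the logs alone cannot confirm) is that the message contains the literal text ^M, a caret followed by M. A quick check using shortened strings modeled on the log; the class and method names are hypothetical:

```java
public class SingleFilterCheck {
    // Two of the question's patterns, with backslashes doubled as Java source requires.
    static final String[] PATTERNS = {"\\p{C}", "[^\\p{Print}]"};

    // Counts how many patterns, each applied on its own, change the input.
    static int patternsThatMatch(String input) {
        int count = 0;
        for (String regex : PATTERNS) {
            if (!input.replaceAll(regex, "").equals(input)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String realControls = "BTMU(2GW)\r/IDBSCU 1\r \u0003"; // real CR and ETX characters
        String literalCarets = "BTMU(2GW)^M/IDBSCU 1^M ^C";    // '^' followed by 'M' or 'C'
        System.out.println(patternsThatMatch(realControls));   // 2
        System.out.println(patternsThatMatch(literalCarets));  // 0
    }
}
```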
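In your code every replaceAll call starts again from the original msg. Java strings are immutable, so each assignment throws away the previous step's result and only the last replacement survives. Try chaining the calls on filteredMessage instead and see how that works: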
String msg = exchange.getIn().getBody(String.class);
log.info("Message before apply filter: " + msg);
String filteredMessage = msg.replaceAll("[^\\x00-\\x7F]", "");
log.info("Remove non-ASCII characters: " + filteredMessage);
filteredMessage = filteredMessage.replaceAll("[\\p{C}]", "");
log.info("Remove all Control characters: " + filteredMessage);
filteredMessage = filteredMessage.replaceAll("[\\p{Cntrl}\\p{Cc}\\p{Cf}\\p{Co}\\p{Cn}]", "");
log.info("Remove some Control characters: " + filteredMessage);
filteredMessage = filteredMessage.replaceAll("[^\\p{Print}]", "");
log.info("Remove non printable characters: " + filteredMessage);
filteredMessage = filteredMessage.trim();
log.info("Trim: " + filteredMessage);
filteredMessage = filteredMessage.replaceAll("\\cM", "");
log.info("Remove ^M Control characters: " + filteredMessage);
// Unescaped, "^M" is an anchor plus 'M': it only removes an 'M' at the very start of the string
filteredMessage = filteredMessage.replaceAll("^M", "");
log.info("Remove ^M Control characters: " + filteredMessage);
exchange.getIn().setBody(filteredMessage);
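The chain above runs several overlapping patterns. Because Java's POSIX classes such as \p{Print} match US-ASCII only by default, [^\p{Print}] on its own removes everything outside 0x20..0x7E, which covers the non-ASCII, \p{C} and \cM steps. A shorter sketch of the same chaining idea, assuming ^M and ^C are real CR and ETX characters and that only printable ASCII should survive; the class name is hypothetical:

```java
public class ChainedCleanup {
    // Each step works on the result of the previous step, not on the original message.
    static String clean(String msg) {
        String filtered = msg;
        // Drops control characters (CR, LF, TAB, ETX, ...) and all non-ASCII characters.
        filtered = filtered.replaceAll("[^\\p{Print}]", "");
        // Drops trailing spaces only; spaces inside the message are kept (unlike trim()).
        filtered = filtered.replaceAll(" +$", "");
        return filtered;
    }

    public static void main(String[] args) {
        String sample = "A 291511191831421742XXXXXXXXXXWRN/WN18111917420077000009ENG 2 IGN B FAULT\r \u0003";
        System.out.println("[" + clean(sample) + "]");
    }
}
```

Note that this also removes tabs and line feeds. If those must survive inside the message, one option is to exclude them from the class (for example [^\\p{Print}\\t\\n]) and trim each line with "(?m)[ \\t]+$".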
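If you still see unwanted characters after this, they may be the literal text ^M rather than a control character. In a regex you need to escape the backslash and the other metacharacters with a backslash:
<>()[]{}\^-=$!|?*+.
In a Java string literal the escaping backslash must itself be doubled, so you need, for example: .replaceAll("\\^M", "");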
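If ^M and ^C in the log are literal text (a caret followed by a letter), the escaped pattern is what removes them. A sketch contrasting the unescaped and escaped forms; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class LiteralCaretRemoval {
    // Removes the literal two-character texts "^M" and "^C".
    // "\\^" in Java source is the regex \^, which matches a literal caret.
    static String removeCaretText(String msg) {
        return msg.replaceAll("\\^[MC]", "");
    }

    // Same idea without hand-written escapes: Pattern.quote makes its argument literal.
    static String removeLiteral(String msg, String text) {
        return msg.replaceAll(Pattern.quote(text), "");
    }

    public static void main(String[] args) {
        String line = "BTMU(2GW)^M/IDBSCU 1^M ^C";
        System.out.println(line.replaceAll("^M", "")); // unchanged: no 'M' at the start
        System.out.println(removeCaretText(line));     // BTMU(2GW)/IDBSCU 1 (plus the space before ^C)
        System.out.println(removeLiteral(line, "^M")); // BTMU(2GW)/IDBSCU 1 ^C
    }
}
```

String.replace(CharSequence, CharSequence) also performs a literal, non-regex replacement, so msg.replace("^M", "") is another option.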