Remove Unnecessary Characters while keeping CSV format using shell

I have a CSV file in the following format:

902610747280285697, possible future hurricaneirma analog 1995\xC2\xA02003\xC2\xA02004\xC2\xA02008 2010 til east leeward doubtful afterward invest93l
902611695239094277, midlevel ridge push invest93l future hurricaneirma wsw ridge east leave som
902642953373577216, midlevel ridge push invest93l future hurricaneirma wsw ridge east leave som
902711459561525248, midlevel ridge push invest93l future hurricaneirma wsw ridge east leave som
902755305158782976, 12z ecmwf setup support major strike east coast high east deep uul west hurricaneirma
902772740507275265, possible future hurricaneirma analog 1995\xC2\xA02003\xC2\xA02004\xC2\xA02008 2010 til east leeward doubtful
902777486186086400, future hurricaneirma satellite look impressive tropicaldepression10 24 hour
903355611810852867, hurricaneirma think f ***
903355689455804416, hurricaneirma tropics weather
903411347337162752, hurricaneirma shiiiiiiitty t *** im possibly wrong
903411365607591936, hurricaneirma 3000 mile cat 3 hurricane watch closely
903989185845088257, 

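How can I remove characters like * and \xC2\xA02003\xC2\xA02004\xC2\xA0, as well as empty records like the last one? These may cause errors later on in Scala processing. I need to keep the CSV structure the same as before, just with these removed.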

Could you please help me do this in a shell script? Thanks again, as I am new to shell scripting.

EDIT:

Can you tell me how to fix the broken lines (the ones without a ',')?

902755305158782976, 12z ecmwf setup support major strike east coast high east deep uul west hurricaneirma
902777486186086400, future hurricaneirma satellite look impressive tropicaldepression10 24 hour
903355611810852867 hurricaneirma think
903355611810852868 hurricagggneirma think
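The sed answer deletes every line that has no comma instead of repairing it. If, as in the lines above, a broken record is just the numeric ID followed by spaces where ", " should be, a single substitution can put the comma back. This is a minimal sketch assuming every record starts with a purely numeric ID. The function name fix_missing_comma is hypothetical.

```shell
# fix_missing_comma (hypothetical name): assumes each record begins with a
# numeric ID. When the ID is followed by one or more spaces instead of a
# comma, replace those spaces with ", ". Lines that already read "ID, text"
# do not match (the character after the digits is a comma) and pass through
# unchanged.
fix_missing_comma() {
  sed -E 's/^([0-9]+) +/\1, /' "$@"
}
```

For example, fix_missing_comma input.csv > fixed.csv (both file names are placeholders). Misspellings inside the text, such as hurricagggneirma, are left alone because sed has no way of knowing the intended word.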

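You can use sed for this, but I'm fairly sure you won't get 100% accurate results. To get the output in exactly the format you need, you should use a tool that natively understands the file format you are processing. Anyway, here is my attempt: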

$ sed -E '/^[^,]*$/d;/^[0-9]+, *$/d;s/ \*+ */ /;s/\\[xX][^\ ,]*//g' case_file_48246326

Output:

902610747280285697, possible future hurricaneirma analog 1995 2010 til east leeward doubtful afterward invest93l
902611695239094277, midlevel ridge push invest93l future hurricaneirma wsw ridge east leave som
902642953373577216, midlevel ridge push invest93l future hurricaneirma wsw ridge east leave som
902711459561525248, midlevel ridge push invest93l future hurricaneirma wsw ridge east leave som
902755305158782976, 12z ecmwf setup support major strike east coast high east deep uul west hurricaneirma
902772740507275265, possible future hurricaneirma analog 1995 2010 til east leeward doubtful
902777486186086400, future hurricaneirma satellite look impressive tropicaldepression10 24 hour
903355611810852867, hurricaneirma think f 
903355689455804416, hurricaneirma tropics weather
903411347337162752, hurricaneirma shiiiiiiitty t im possibly wrong
903411365607591936, hurricaneirma 3000 mile cat 3 hurricane watch closely
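Below is a sketch of the same sed expressions, split out one per line with comments. It adds one extra step at the front so that lines missing the comma after the ID are repaired instead of deleted. Like the one-liner, it assumes \xC2\xA0 appears in the file as literal text (a backslash followed by x, C, 2, ...) and not as raw non-breaking-space bytes. The function name clean_csv is hypothetical.

```shell
# clean_csv (hypothetical name): reads "ID, text" lines from the files given
# as arguments (or from stdin) and writes the cleaned lines to stdout.
# Expressions, applied to each line in this order:
#   1. s/^([0-9]+) +/\1, /   add the missing ", " after a leading numeric ID
#   2. /^[^,]*$/d            drop lines that still contain no comma
#   3. /^[0-9]+, *$/d        drop records that have an ID but no text
#   4. s/ \*+ */ /           collapse the first " *** " run on the line to one space
#   5. s/\\[xX][^ ,]*//g     drop literal \xHH escapes plus the text glued to them
clean_csv() {
  sed -E \
    -e 's/^([0-9]+) +/\1, /' \
    -e '/^[^,]*$/d' \
    -e '/^[0-9]+, *$/d' \
    -e 's/ \*+ */ /' \
    -e 's/\\[xX][^ ,]*//g' \
    "$@"
}
```

For example, clean_csv case_file_48246326 > cleaned.csv (the output file name is arbitrary). The repair expression has to come before /^[^,]*$/d. If it came after, the broken lines would already be deleted before the substitution could reach them.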