Parsing a text file with literals in C#
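I'm having trouble parsing a text file that contains literals.
The literals I'm having trouble with are:
"\(" which is an open bracket "(" and "\)" which is the close bracket ")"
Here is a sample of the text file I'm parsing: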
BT /F1 9 Tf 53.8646616541353 441 Td ( Voucher AADA Trans. Prods CDE TRX Payment) Tj ET
BT /F1 9 Tf 53.8646616541353 432 Td ( Number Num Date WH ID Name Name # Year Inv. # CD Due Date Qty Price Disct % Amount Due) Tj ET
BT /F1 9 Tf 53.8646616541353 423 Td (--------- ---- ---------- -- ------ -------- ----- -- ---- ---------- -- ---------- ------ ------- ------- ------- ------------) Tj ET
BT /F1 9 Tf 53.8646616541353 414 Td ( 21812539 09/30/2015 NA 29264 Symante SUMME 52 2015 1735247 RM 09/30/2015 2 .00 50.0000 100.0% 15.00 ) Tj ET
BT /F1 9 Tf 53.8646616541353 405 Td ( 21827266 10/01/2015 NA 29264 Symante SUMME 52 2015 1735966 RE 10/01/2015 1 .00 50.0000 100.0% \(7.50\)) Tj ET
BT /F1 9 Tf 53.8646616541353 396 Td ( 21832628 10/02/2015 NA 29264 Symante SUMME 52 2015 1736174 RM 10/02/2015 1 .00 50.0000 100.0% 7.50 ) Tj ET
BT /F1 9 Tf 53.8646616541353 387 Td ( 21838251 10/02/2015 NA 29264 Symante SUMME 52 2015 1736429 RE 10/02/2015 1 .00 50.0000 100.0% \(7.50\)) Tj ET
BT /F1 9 Tf 53.8646616541353 378 Td ( 21841821 10/03/2015 NA 29264 Symante SUMME 52 2015 1736583 RM 10/03/2015 1 .00 50.0000 100.0% 7.50 ) Tj ET
BT /F1 9 Tf 53.8646616541353 369 Td ( 21874851 10/08/2015 NA 29264 Symante SUMME 52 2015 1738192 RE 10/08/2015 1 .00 50.0000 100.0% \(7.50\)) Tj ET
BT /F1 9 Tf 53.8646616541353 360 Td ( 21879328 10/09/2015 NA 29264 Symante SUMME 52 2015 1738389 RM 10/09/2015 1 .00 50.0000 100.0% 7.50 ) Tj ET
BT /F1 9 Tf 53.8646616541353 351 Td ( 21933007 10/16/2015 NA 29264 Symante SUMME 52 2015 0000531968 SK 10/16/2015 1 .00 50.0000 100.0% \(7.50\)) Tj ET
BT /F1 9 Tf 53.8646616541353 342 Td ( -------------) Tj ET
BT /F1 9 Tf 53.8646616541353 333 Td ( Sub Total: \(,650.00\)) Tj ET
BT /F1 9 Tf 53.8646616541353 324 Td ( -------------) Tj ET
BT /F1 9 Tf 53.8646616541353 315 Td ( 21827466 10/02/2015 NA 57629 0000531284 PO 10/02/2015 0 100.0% \(1500.00\)) Tj ET
BT /F1 9 Tf 53.8646616541353 306 Td ( -------------) Tj ET
BT /F1 9 Tf 53.8646616541353 297 Td ( Sub Total: \(,500.00\)) Tj ET
BT /F1 9 Tf 53.8646616541353 288 Td ( -------------) Tj ET
BT /F1 9 Tf 53.8646616541353 279 Td ( 21663952 09/02/2015 SN 57629 Zeal \(I\) 61-SE 61 2015 0000529704 IN 11/01/2015 2443 .95 50.0000 100.0% 11111.43 ) Tj ET
BT /F1 9 Tf 53.8646616541353 270 Td ( 21663953 09/02/2015 SN 57629 Zeal \(I\) 61-SE 61 2015 0000529704 SP 11/01/2015 2443 .95 50.0000 100.0% \(200.33\)) Tj ET
BT /F1 9 Tf 53.8646616541353 261 Td ( 21699656 09/09/2015 S2 57629 Zeal \(I\) 61-SE 61 2015 0000530025 IN 11/08/2015 449 .95 50.0000 100.0% 1156.28 ) Tj ET
BT /F1 9 Tf 53.8646616541353 252 Td ( 21699657 09/09/2015 S2 57629 Zeal \(I\) 61-SE 61 2015 0000530025 SP 11/08/2015 449 .95 50.0000 100.0% \(36.82\)) Tj ET
BT /F1 9 Tf 53.8646616541353 243 Td ( 21699658 09/09/2015 SL 57629 Zeal \(I\) 61-SE 61 2015 0000530025 IN 11/08/2015 1320 .95 50.0000 100.0% 1111.00 ) Tj ET
BT /F1 9 Tf 53.8646616541353 234 Td ( 21699659 09/09/2015 SL 57629 Zeal \(I\) 61-SE 61 2015 0000530025 SP 11/08/2015 1320 .95 50.0000 100.0% \(108.24\)) Tj ET
BT /F1 9 Tf 53.8646616541353 225 Td ( 21736996 09/16/2015 S1 57629 Zeal \(I\) 61-SE 61 2015 0000530390 IN 11/15/2015 1016 .95 50.0000 100.0% 1111.60 ) Tj ET
BT /F1 9 Tf 53.8646616541353 216 Td ( 21736997 09/16/2015 S1 57629 Zeal \(I\) 61-SE 61 2015 0000530390 SP 11/15/2015 1016 .95 50.0000 100.0% \(83.31\)) Tj ET
BT /F1 9 Tf 53.8646616541353 207 Td ( 21808378 09/29/2015 NA 57629 Zeal \(I\) 61-SE 61 2015 1735086 RE 09/29/2015 8 .95 50.0000 100.0% \(59.80\)) Tj ET
BT /F1 9 Tf 53.8646616541353 198 Td ( 21838252 10/02/2015 NA 57629 Zeal \(I\) 61-SE 61 2015 1736429 RE 10/02/2015 1 .95 50.0000 100.0% \(7.48\)) Tj ET
BT /F1 9 Tf 53.8646616541353 189 Td ( 21874852 10/08/2015 NA 57629 Zeal \(I\) 61-SE 61 2015 1738192 RE 10/08/2015 4 .95 50.0000 100.0% \(29.90\)) Tj ET
BT /F1 9 Tf 53.8646616541353 180 Td (
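If you look at line 20, the product name is Zeal (I). Negative numbers (the last column, Amount Due) are also wrapped in brackets.
I'm parsing the text file line by line, but when I try
line.Replace(@"\(", "");
it doesn't seem to work. I've never come across these literals in a file before, so I'm not sure how to handle them. Apart from this, I'm nearly done with the parsing.
The way I'm doing it is very simple: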
string line;
int count = 0; // to be removed. Used in testing to cap count.
while ((line = reader.ReadLine()) != null)
{
    if (count <= 10)
    {
        if (line.Length > 170 && line.Length < 200)
        {
            if (!ContainsAny(line))
            {
                line.Replace(@"\(", "");
                indexStart = line.IndexOf("Td (") + 4;
                col0 = line.Substring(indexStart, 9);
                col1 = line.Substring(indexStart + 10, 4);
                col2 = line.Substring(indexStart + 15, 10);
                col3 = line.Substring(indexStart + 26, 2);
                col4 = line.Substring(indexStart + 29, 6);
                col5 = line.Substring(indexStart + 36, 8);
                col6 = line.Substring(indexStart + 45, 5);
                col7 = line.Substring(indexStart + 51, 2);
                col8 = line.Substring(indexStart + 54, 4);
                col9 = line.Substring(indexStart + 59, 10);
                col10 = line.Substring(indexStart + 70, 2);
                col11 = line.Substring(indexStart + 73, 10);
                col12 = line.Substring(indexStart + 84, 6);
                col13 = line.Substring(indexStart + 91, 7).Replace("$", "");
                col14 = line.Substring(indexStart + 99, 7);
                col15 = line.Substring(indexStart + 107, 7).Replace("%", "");
                col16 = line.Substring(indexStart + 115, 12);
                MessageBox.Show(string.Format("{0}; {1}; {2}; {3}; {4}; {5}; {6}; {7}; {8}; {9}; {10}; {11}; {12}; {13}; {14}; {15}; {16};", col0, col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11, col12, col13, col14, col15, col16));
                //writer.WriteLine(lineOut);
                count += 1; // to be removed. Used in testing to cap count.
            }
        }
    }
}
The result I get when I write to the file is:
21841821 10/03/2015 NA 29264 Symante SUMME 52 2015 1736583 RM 10/03/2015 1 15 50 100 7.5
21874851 10/08/2015 NA 29264 Symante SUMME 52 2015 1738192 RE 10/08/2015 1 15 50 100 -7.5
21879328 10/09/2015 NA 29264 Symante SUMME 52 2015 1738389 RM 10/09/2015 1 15 50 100 7.5
21933007 10/16/2015 NA 29264 Symante SUMME 52 2015 531968 SK 10/16/2015 1 15 50 100 -7.5
21827466 10/02/2015 NA 57629 531284 PO 10/02/2015 0 100 -4500
21663952 09/02/2015 SN 57629 Zeal \(I ) 61- E 1 20 5 00005297 4 N 11/01/20 5 24 3 14. 5 50.00 0 100. 18261.40%
21663953 09/02/2015 SN 57629 Zeal \(I ) 61- E 1 20 5 00005297 4 P 11/01/20 5 24 3 14. 5 50.00 0 100. -200.00%
21699656 09/09/2015 S2 57629 Zeal \(I ) 61- E 1 20 5 00005300 5 N 11/08/20 5 4 9 14. 5 50.00 0 100. 3356.20%
line.Replace(@"\(", "");
does not modify the string. Strings in .NET are immutable, so Replace only returns a new string with the change applied. You should write:
line = line.Replace(@"\(", "");
Check the documentation for String.Replace:
Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.
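To make the point above concrete, here is a minimal sketch that contrasts discarding the return value of Replace with assigning it back. The class and method names (ReplaceDemo, Discarded, Assigned) are made up for illustration and are not part of the question's code.

```csharp
using System;

public static class ReplaceDemo
{
    // Calls Replace but throws the result away, as the question's loop does.
    public static string Discarded(string line)
    {
        line.Replace(@"\(", "");
        return line; // still the original text
    }

    // Assigns the result back, as the answers suggest.
    public static string Assigned(string line)
    {
        line = line.Replace(@"\(", "");
        return line; // the new string, without any "\("
    }

    public static void Main()
    {
        string sample = @"Zeal \(I\) 61-SE";
        Console.WriteLine(Discarded(sample)); // prints: Zeal \(I\) 61-SE
        Console.WriteLine(Assigned(sample));  // prints: Zeal I\) 61-SE
    }
}
```

Note that the second call removes only the opening escape; the closing "\)" needs its own Replace call.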
You need to use:
line=line.Replace(@"\(", "");
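It looks like you're writing more code than you actually need.
// requires: using System.IO; using System.Linq;
var allLines = File.ReadAllLines(@"C:\myfile.text");
// Replace returns a new string each time, so the calls can be chained.
var correctedLines = allLines.Select(l => l.Replace(@"\(", "").Replace(@"\)", ""));
// now use correctedLines in your code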
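One thing to watch with deleting the escapes outright: the question says the brackets around the last column mark negative amounts, and the Substring offsets assume fixed-width columns. A possible alternative, sketched below, is to turn "\(" and "\)" back into plain "(" and ")" so each field keeps its width, and then let decimal.Parse read "(7.50)" as a negative number via NumberStyles.AllowParentheses. The helper names (PdfTextHelpers, UnescapeParens, ParseAmount) are hypothetical, and the sketch assumes the amounts use "." as the decimal separator, as in the sample.

```csharp
using System;
using System.Globalization;

public static class PdfTextHelpers
{
    // Hypothetical helper: turns the escaped literals \( and \) back into ( and ).
    public static string UnescapeParens(string text)
    {
        return text.Replace(@"\(", "(").Replace(@"\)", ")");
    }

    // Hypothetical helper: parses an amount field, reading "(7.50)" as -7.50.
    public static decimal ParseAmount(string field)
    {
        return decimal.Parse(
            field.Trim(),
            NumberStyles.Number | NumberStyles.AllowParentheses,
            CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        string raw = @"Zeal \(I\) 61-SE";
        Console.WriteLine(UnescapeParens(raw)); // prints: Zeal (I) 61-SE

        decimal credit = ParseAmount(UnescapeParens(@"\(7.50\)"));
        decimal charge = ParseAmount(" 15.00 ");
        Console.WriteLine(credit < 0); // prints: True
        Console.WriteLine(charge > 0); // prints: True
    }
}
```

In the question's loop, this would mean calling line = PdfTextHelpers.UnescapeParens(line); before the Substring calls, so that "Zeal (I)" again occupies the 8 characters the product-name column expects.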