How to get PDF output in one CSV line
In my program, the CSV gets a new line for every input. Like this:
- John;Feren
- Marblestreet;45
Is there a way to get all of it on one line?
My current code:
    static void Main(string[] args)
    {
        string path = @"C:\Users\burak\Desktop\todo";
        StreamWriter write = new StreamWriter(@"C:\Users\burak\Desktop\todo\test.csv");
        foreach (var file in Directory.GetFiles(path, "*.pdf", SearchOption.TopDirectoryOnly))
        {
            StringBuilder text = new StringBuilder();
            PdfReader pdfReader = new PdfReader(file);
            string currentText = "";
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                currentText = string.Join(";", currentText.Split(' ', ':', '/'));
                currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
                // text.Append(currentText);
                pdfReader.Close();
            }
            text.ToString();
            write.Write(currentText);
            Console.WriteLine(text.ToString());
        }
        write.Close();
    }
What I have tried:
Grabbing the whitespace to merge everything into one line, but that did not work at all.
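To remove all line breaks, we can replace them with an empty string. To get the line-break sequence of the current system, use System.Environment.NewLine. With that, the PDF text from all pages ends up on the same line. To start a new line for each new PDF file, append System.Environment.NewLine to the end of the string and then write the whole PDF to the CSV file.
Example: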
    static void Main(string[] args) {
        // ...
        StreamWriter write = new StreamWriter(@"C:\Users\burak\Desktop\todo\test.csv");
        // ...
        foreach (var file in Directory.GetFiles(path, "*.pdf", SearchOption.TopDirectoryOnly)) {
            // ...
            for (int page = 1; page <= pdfReader.NumberOfPages; page++) {
                // ...
                currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                // ...
            }
            // Replace newLines
            currentText = currentText.Replace(System.Environment.NewLine, string.Empty);
            // Add newLine to currentText
            currentText += System.Environment.NewLine;
            write.Write(currentText);
        }
        write.Close();
    }
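One thing to watch in the example above: currentText is assigned inside the page loop, so unless the elided parts accumulate it, only the last page's text would reach the file. The following is a minimal sketch of collecting every page into one CSV line. PdfCsvLine and BuildCsvLine are hypothetical names, and pageTexts is an assumption standing in for the strings returned by PdfTextExtractor.GetTextFromPage, so the sketch runs without the PDF library. It removes "\r" and "\n" individually, on the assumption that the extracted text may not use the same line-break sequence as System.Environment.NewLine.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PdfCsvLine
{
    // Hypothetical helper: pageTexts stands in for the per-page strings
    // that PdfTextExtractor.GetTextFromPage would return for one PDF.
    public static string BuildCsvLine(IEnumerable<string> pageTexts)
    {
        var line = new StringBuilder();
        bool firstPage = true;
        foreach (string pageText in pageTexts)
        {
            // Separate pages with the same ';' delimiter used for the fields.
            if (!firstPage)
            {
                line.Append(';');
            }
            firstPage = false;

            // Strip carriage returns and line feeds one by one, so the result
            // does not depend on which line-break style the text uses.
            line.Append(pageText.Replace("\r", "").Replace("\n", ""));
        }

        // Exactly one line break per PDF, so each file becomes one CSV row.
        line.Append(Environment.NewLine);
        return line.ToString();
    }
}
```

Inside the foreach over the PDF files, the per-page strings could be added to a List<string> and the result written once per file with write.Write(PdfCsvLine.BuildCsvLine(pages)).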
The input text may contain carriage returns or line feeds. You could try this:
    write.Write(currentText.Replace("\r", "").Replace("\n", ""));
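Replacing line breaks with an empty string glues the last word of one line to the first word of the next. If those should stay separate CSV fields, a variation is to turn each line break into the delimiter instead. This is only a sketch under that assumption; LineBreakFlattener and Flatten are hypothetical names, not part of any library.

```csharp
using System;

public static class LineBreakFlattener
{
    // Hypothetical helper: replaces every run of line breaks with the given
    // delimiter, whichever of "\r\n", "\r" or "\n" the text uses.
    public static string Flatten(string text, string delimiter)
    {
        // RemoveEmptyEntries drops the empty pieces produced by consecutive
        // or trailing line breaks, so no doubled delimiters appear for them.
        string[] lines = text.Split(
            new[] { "\r\n", "\r", "\n" },
            StringSplitOptions.RemoveEmptyEntries);
        return string.Join(delimiter, lines);
    }
}
```

In the loop this would be used as write.Write(LineBreakFlattener.Flatten(currentText, ";")), followed by a single line break after the last page of each PDF.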