Google Vision | Vietnamese: Low Quality OCR Results
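Background
Using the Google Vision API (with Node) to OCR Vietnamese text gives poor-quality results. Some (not all, but some) of the tone marks and vowel diacritics are missing.
By contrast, their online demo returns a decent result (scroll down to the live demo):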
https://cloud.google.com/vision/
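(Since I don't have a corporate account with them, I can't ask Google directly.)
Question
Can I tweak my request to get better results?
I have already set the language hint to "vi" and tried combining it with "en". I also tried the more specific "vi-VN".
Sample image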
https://www.tecc.org/Slatwall/custom/assets/images/product/default/cache/j056vt-_800w_800h_sb.jpg
Sample code
const fs = require("fs");
const path = require("path");
const vision = require("@google-cloud/vision");

async function quickstart() {
  // Start from an empty string; an uninitialized variable would prefix the output with "undefined".
  let text = "";
  const fileName = "j056vt-_800w_800h_sb.jpg";
  const imageFile = fs.readFileSync(fileName);
  const image = Buffer.from(imageFile).toString("base64");
  const client = new vision.ImageAnnotatorClient();
  const request = {
    image: {
      content: image
    },
    imageContext: {
      languageHints: ["vi", "en"]
    }
  };
  // TEXT_DETECTION request: this is the call that loses some diacritics.
  const [result] = await client.textDetection(request);
  for (const tmp of result.textAnnotations) {
    text += tmp.description + "\n";
  }
  // Write the recognized text next to the image, e.g. j056vt-_800w_800h_sb.txt
  const out = path.basename(fileName, path.extname(fileName)) + ".txt";
  fs.writeFileSync(out, text);
}

quickstart();
Solution
// Set the credentials first, e.g. in PowerShell:
// $env:GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
const fs = require("fs");
const path = require("path");
const vision = require("@google-cloud/vision");

async function quickstart() {
  let text = "";
  const fileName = "j056vt-_800w_800h_sb.jpg";
  const imageFile = fs.readFileSync(fileName);
  const image = Buffer.from(imageFile).toString("base64");
  const client = new vision.ImageAnnotatorClient();
  const request = {
    image: {
      content: image
    },
    imageContext: {
      languageHints: ["vi-VN"]
    }
  };
  // DOCUMENT_TEXT_DETECTION instead of TEXT_DETECTION
  const [result] = await client.documentTextDetection(request);

  // OUTPUT METHOD A
  // textAnnotations[0] holds the whole text; the remaining entries are the
  // individual words, so this loop writes the full text followed by each word.
  for (const tmp of result.textAnnotations) {
    text += tmp.description + "\n";
  }
  console.log(text);
  const out = path.basename(fileName, path.extname(fileName)) + ".txt";
  fs.writeFileSync(out, text);

  // OUTPUT METHOD B
  // Walk the page > block > paragraph > word > symbol hierarchy and log confidences.
  const fullTextAnnotation = result.fullTextAnnotation;
  console.log(`Full text: ${fullTextAnnotation.text}`);
  fullTextAnnotation.pages.forEach(page => {
    page.blocks.forEach(block => {
      console.log(`Block confidence: ${block.confidence}`);
      block.paragraphs.forEach(paragraph => {
        console.log(`Paragraph confidence: ${paragraph.confidence}`);
        paragraph.words.forEach(word => {
          const wordText = word.symbols.map(s => s.text).join("");
          console.log(`Word text: ${wordText}`);
          console.log(`Word confidence: ${word.confidence}`);
          word.symbols.forEach(symbol => {
            console.log(`Symbol text: ${symbol.text}`);
            console.log(`Symbol confidence: ${symbol.confidence}`);
          });
        });
      });
    });
  });
}

quickstart().catch(console.error);
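Output method B logs every confidence value, which gets noisy on a real image. As a rough sketch, the same traversal can collect only the words whose confidence falls below a threshold, which may help spot where tone marks were likely dropped. The helper name findLowConfidenceWords, the 0.8 threshold, and the sample object are all assumptions made for illustration; the sample only mimics the shape of fullTextAnnotation and its values are made up.

```javascript
// Hypothetical helper (not part of @google-cloud/vision): collect the words
// whose confidence is below `threshold` from a fullTextAnnotation-shaped object.
function findLowConfidenceWords(fullTextAnnotation, threshold = 0.8) {
  const flagged = [];
  for (const page of fullTextAnnotation.pages || []) {
    for (const block of page.blocks || []) {
      for (const paragraph of block.paragraphs || []) {
        for (const word of paragraph.words || []) {
          // Rebuild the word from its symbols, as in output method B.
          const text = (word.symbols || []).map(s => s.text).join("");
          if (typeof word.confidence === "number" && word.confidence < threshold) {
            flagged.push({ text, confidence: word.confidence });
          }
        }
      }
    }
  }
  return flagged;
}

// Hand-written sample that imitates the response shape; not real API output.
const sampleAnnotation = {
  pages: [{
    blocks: [{
      paragraphs: [{
        words: [
          { confidence: 0.98, symbols: [{ text: "V" }, { text: "i" }, { text: "ệ" }, { text: "t" }] },
          { confidence: 0.55, symbols: [{ text: "N" }, { text: "a" }, { text: "m" }] }
        ]
      }]
    }]
  }]
};

console.log(findLowConfidenceWords(sampleAnnotation, 0.8));
```

In the solution code above, this would be called as findLowConfidenceWords(result.fullTextAnnotation) after the documentTextDetection call.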
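This question has already been answered.
In summary, the demo in this case is probably using DOCUMENT_TEXT_DETECTION, which can sometimes extract strings more thoroughly, while you are using TEXT_DETECTION.
You can try making a client.documentTextDetection request (document_text_detection in the Python client) instead of client.textDetection, and you will likely get results closer to the demo.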
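To make the difference between the two feature types visible in the request itself, here is a minimal sketch that builds a request naming DOCUMENT_TEXT_DETECTION explicitly. The helper name buildDocumentOcrRequest is made up for this example, and the placeholder buffer is not a real image; the request shape is assumed to match what the Node client's generic annotateImage method accepts.

```javascript
// Hypothetical helper: build an annotate request that names the feature type
// explicitly instead of relying on a convenience method.
function buildDocumentOcrRequest(imageBuffer, languageHints = []) {
  const request = {
    image: { content: imageBuffer.toString("base64") },
    features: [{ type: "DOCUMENT_TEXT_DETECTION" }]
  };
  // Only attach imageContext when hints are given.
  if (languageHints.length > 0) {
    request.imageContext = { languageHints };
  }
  return request;
}

// Placeholder bytes stand in for fs.readFileSync("j056vt-_800w_800h_sb.jpg").
const exampleRequest = buildDocumentOcrRequest(Buffer.from("not a real image"), ["vi"]);
console.log(exampleRequest.features[0].type); // prints "DOCUMENT_TEXT_DETECTION"
```

With a real image buffer, the resulting object would then be passed to the client, for example as const [result] = await client.annotateImage(exampleRequest). Swapping the type string for "TEXT_DETECTION" gives the request the question was effectively sending.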
If you want to read the relevant documentation, you can find it here.
Hope this solves your problem!