Flask / postgres - 使用 PDFJS 显示 pdf

Flask / postgres - display pdf with PDFJS

我有一个非常简单的应用程序。用户通过 Web 前端将 pdf 文件上传到 postgres 数据库。然后应该通过 pdfjs 在浏览器中呈现该 pdf。

我相当确定我的问题是编码问题,但我认为我对编码的了解不够深入,无法自行回答这个问题。

我的模型:

class Lesson(Base):
    __tablename__ = 'lessons'

    # Name of the lesson
    lesson_order = db.Column(db.Enum(LessonIndexes), nullable=False)
    name = db.Column(db.String(128), nullable=False)
    summary = db.Column(db.String(500))
    lesson_plan_id = db.Column(db.Integer(), ForeignKey('lesson_plans.id'), nullable=False)
    pdf = db.Column(db.LargeBinary())

我的控制器:

@mod_lp.route('/<lesson_plan_id>/create_lesson', methods=["POST"])
def create_lesson(lesson_plan_id):
    form = LessonForm()
    file = request.files['pdf']  # type: FileStorage

    if form.validate_on_submit():
        file = request.files['pdf']
        lesson = Lesson(form.lesson_order.data, form.name.data, form.summary.data, lesson_plan_id,
                        pdf=file.read() # this line here
                        )
        db.session.add(lesson)
        db.session.commit()
    return redirect(url_for('lesson_plan.show', lesson_plan_id=lesson_plan_id))

这存储的数据类似于:

%PDF-1.4
%����
1 0 obj
<</Creator (Mozilla/5.0 \(Macintosh; Intel Mac OS X 10_12_6\) AppleWebKit/537.36 \(KHTML, like Gecko\) Chrome/60.0.3112.113 Safari/537.36)
/Producer (Skia/PDF m60)
/CreationDate (D:20170916222407+00'00')
/ModDate (D:20170916222407+00'00')>>
endobj
2 0 obj
<</Filter /FlateDecode
/Length 1370>> stream
x���ݎ�4��<�������   qq�@%`aB�H�_�����T�E���ړ�c'�t�Z��[������}�{�I���@���

(etc...)

我的 javasript(取自 PDFJS,你好世界):

var pdfString = "{{ pdf_data}}";
var pdfData = atob(pdfString);
if (pdfData) {
    var loadingTask = PDFJS.getDocument({data: pdfData});
    loadingTask.promise.then(function (pdf) {
        console.log('PDF loaded');

        // Fetch the first page
        var pageNumber = 1;
        pdf.getPage(pageNumber).then(function (page) {
            console.log('Page loaded');

            var scale = 1.5;
            var viewport = page.getViewport(scale);

            // Prepare canvas using PDF page dimensions
            var canvas = document.getElementById('pdf-canvas');
            var context = canvas.getContext('2d');
            canvas.height = viewport.height;
            canvas.width = viewport.width;

            // Render PDF page into canvas context
            var renderContext = {
                canvasContext: context,
                viewport: viewport
            };
            var renderTask = page.render(renderContext);
            renderTask.then(function () {
                console.log('Page rendered');
            });
        });
    }, function (reason) {
        // PDF loading error
        console.error(reason);
    });

我目前的错误是:

6:108 Uncaught DOMException: Failed to execute 'atob' on 'Window': The string to be decoded is not correctly encoded.

我尝试过的事情:

file.stream.getvalue()

file.stream.getvalue().decode("latin-1") # for whatever reason, this was the only 'decode' that didn't throw an error

file.stream.getvalue().decode("latin-1").encode()

base64.b64encode(file.stream.getvalue().decode("latin-1").encode())

但是这些都以各种方式失败了。 更新:

如果我将数据库中的二进制数据发送到我的模板:

pdf_data = lesson.pdf

忘记调用 atob 了:

var pdfData = pdfString;
        if (pdfData) {
...

我收到这个错误:

Error: Invalid XRef stream header
pdf.worker.js:340     at error (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:340:17)
    at XRef_readXRef [as readXRef] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:20943:13)
    at XRef_parse [as parse] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:20613:28)
    at PDFDocument_setup [as setup] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:26445:17)
    at PDFDocument_parse [as parse] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:26336:12)
    at http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:36120:28
    at Promise (<anonymous>)
    at LocalPdfManager_ensure [as ensure] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:36115:14)
    at LocalPdfManager.BasePdfManager_ensureDoc [as ensureDoc] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:36067:19)

atob 需要一个 base64 编码的字符串。我有一个基本示例,至少可以成功调用 atob。很确定这就是您所看到的问题。您可能只需将 base64 编码的内容保存在该 postgres table 中,这样您就不需要一直对其进行解码。 'source.pdf' 只是我在磁盘上的一个示例 pdf。但是,您可以将其与 postgres table.

中的数据交换

flask_app.py

from flask import Flask, request, render_template
import base64

app = Flask(__name__)


@app.route("/testing", methods=["GET"])
def get_test_file():
    with open("source.pdf", "rb") as data_file:
        data = data_file.read()
    encoded_data = base64.b64encode(data).decode('utf-8')
    return render_template("test.html", encoded_data=encoded_data)

test.html

<html>
<head>
</head>
<body>
  <script>
    var encoded_data = '{{ encoded_data }}';
    var pdf_data = atob(encoded_data);
  </script>
</body>
</html>