Getting "Ole::Storage::FormatError: OLE2 signature is invalid" when trying to get content out of a Word doc
I'm using Rails 5. I want to get the text out of a Word document (.doc), so I'm using this code:
text = nil
MSWordDoc::Extractor.load(file_location) do |contents|
text = contents.whole_contents
end
But I get the error below. I have this gem in my Gemfile:
gem 'msworddoc-extractor'
What else do I need to do to get the content out of the Word document? It would be great if I could apply the same code to .docx files as I do to .doc files.
/Users/davea/.rvm/gems/ruby-2.4.0/gems/ruby-ole-1.2.12/lib/ole/support.rb:201: warning: constant ::Fixnum is deprecated
Ole::Storage::FormatError: OLE2 signature is invalid
from /Users/davea/.rvm/gems/ruby-2.4.0/gems/ruby-ole-1.2.12/lib/ole/storage/base.rb:378:in `validate!'
from /Users/davea/.rvm/gems/ruby-2.4.0/gems/ruby-ole-1.2.12/lib/ole/storage/base.rb:370:in `initialize'
from /Users/davea/.rvm/gems/ruby-2.4.0/gems/ruby-ole-1.2.12/lib/ole/storage/base.rb:112:in `new'
from /Users/davea/.rvm/gems/ruby-2.4.0/gems/ruby-ole-1.2.12/lib/ole/storage/base.rb:112:in `load'
from /Users/davea/.rvm/gems/ruby-2.4.0/gems/ruby-ole-1.2.12/lib/ole/storage/base.rb:79:in `initialize'
from /Users/davea/.rvm/gems/ruby-2.4.0/gems/ruby-ole-1.2.12/lib/ole/storage/base.rb:85:in `new'
from /Users/davea/.rvm/gems/ruby-2.4.0/gems/ruby-ole-1.2.12/lib/ole/storage/base.rb:85:in `open'
from /Users/davea/.rvm/gems/ruby-2.4.0/gems/msworddoc-extractor-0.2.0/lib/msworddoc/extractor.rb:11:in `load'
from /Users/davea/Documents/workspace/myproject/app/services/msword_processor_service.rb:12:in `pre_process_data'
from /Users/davea/Documents/workspace/myproject/app/services/abstract_import_service.rb:88:in `process_race_data'
from (irb):2
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/console.rb:65:in `start'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/console_helper.rb:9:in `start'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/commands_tasks.rb:78:in `console'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/commands_tasks.rb:49:in `run_command!'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands.rb:18:in `<top (required)>'
from bin/rails:4:in `require'
from bin/rails:4:in `<main>'
The gem you are using has the ruby-ole gem as a dependency. You can see this in the code:
ole = Ole::Storage.open(file)
When you import a Word document, it is actually opened by the ruby-ole gem. If that gem cannot verify that the file has the expected format, it raises an exception:
raise FormatError, "OLE2 signature is invalid" unless magic == MAGIC
MAGIC refers to the header of a .doc file, which should look like this:
# i have seen it pointed out that the first 4 bytes of hex,
# 0xd0cf11e0, is supposed to spell out docfile. hmmm :)
MAGIC = "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" # expected value of Header#magic
This corresponds to the signature field in the CFBF header format used by Word documents:
BYTE _abSig[8]; // [00H,08] {0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1,
// 0x1a, 0xe1} for current version
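To see whether a given file would pass that check, you can compare its first 8 bytes against the same signature yourself. This is only a minimal sketch: `OLE2_MAGIC`, `ole2_signature?` and `ole2_file?` are hypothetical names made up for illustration, not part of ruby-ole or msworddoc-extractor.

```ruby
# The 8-byte signature quoted above, forced to binary encoding so the
# comparison is done byte for byte.
OLE2_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b

# True when the given bytes start with the OLE2 signature.
# (Hypothetical helper; nil and short input simply return false.)
def ole2_signature?(bytes)
  bytes.to_s.b.start_with?(OLE2_MAGIC)
end

# Reads only the first 8 bytes of the file at path and checks them.
def ole2_file?(path)
  ole2_signature?(File.binread(path, 8))
end
```

Calling `ole2_file?(file_location)` in the Rails console before `MSWordDoc::Extractor.load` should tell you whether the file is the problem rather than your code.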
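Either your .doc file is not a valid Word document, or it was created by a newer version of Word that the ruby-ole gem does not support.
I suggest retrying with several different Word documents to find a compatible type, then re-saving the original document in that format and trying again.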
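When trying out different documents, it can help to look at what each file actually is rather than trusting its extension. The sketch below guesses the container type from the leading bytes; it assumes the failing file might be a .docx (which is a ZIP archive starting with `PK\x03\x04`) or an RTF file (starting with `{\rtf`) saved under a .doc name. `WORD_SIGNATURES` and `sniff_format` are hypothetical names used only for this example.

```ruby
# Leading bytes of a few formats that commonly end up with a .doc extension.
WORD_SIGNATURES = {
  ole2: "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b, # legacy binary .doc (CFBF)
  zip:  "PK\x03\x04".b,                       # ZIP container, e.g. .docx
  rtf:  "{\\rtf".b                            # Rich Text Format
}.freeze

# Returns :ole2, :zip, :rtf or :unknown for the given leading bytes.
# A :zip result only means "some ZIP archive", not necessarily a .docx.
def sniff_format(bytes)
  head = bytes.to_s.b
  match = WORD_SIGNATURES.find { |_name, magic| head.start_with?(magic) }
  match ? match.first : :unknown
end

# Usage in the console (file_location as in the question):
#   sniff_format(File.binread(file_location, 8))
# Only a result of :ole2 would be expected to get past ruby-ole's check.
```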