Simstring (python) installation in Windows

I am trying to install the simstring Python wrapper on Windows from https://github.com/Georgetown-IR-Lab/simstring. It works fine on Linux, but on Windows the installation fails with an error.

    D:\Users\source\repos>python setup.py install
    running install
    running build
    running build_py
    running build_ext
    building '_simstring' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\winrt" /EHsc /Tpexport.cpp /Fobuild\temp.win-amd64-3.6\Release\export.obj
    export.cpp
    export.cpp(7): fatal error C1083: Cannot open include file: 'iconv.h': No such file or directory
    error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX86\x64\cl.exe' failed with exit status 2

After that, I added iconv.h to the project, but now it shows a different error.

    running install
    running build
    running build_py
    running build_ext
    building '_simstring' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\winrt" /EHsc /Tpexport.cpp /Fobuild\temp.win-amd64-3.6\Release\export.obj
    export.cpp
    d:\users\aki\source\repos\simstring\cdbpp.h(101): warning C4267: 'initializing': conversion from 'size_t' to 'uint32_t', possible loss of data
    export.cpp(37): error C2664: 'size_t libiconv(libiconv_t,const char **,size_t *,char **,size_t *)': cannot convert argument 2 from 'char **' to 'const char **'
    export.cpp(37): note: Conversion loses qualifiers
    export.cpp(140): note: see reference to function template instantiation 'bool iconv_convert<std::string,std::wstring>(libiconv_t,const source_type &,destination_type &)' being compiled
            with
            [
                source_type=std::string,
                destination_type=std::wstring
            ]
    error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX86\x64\cl.exe' failed with exit status 2

Any help or guidance is appreciated.

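Notes:

  • I managed to get through the build process, but I got stuck at one point and spent a lot of time on that problem. I somehow got it to work, but there were other (similar?) errors when trying to build simstring, so I had to strip out some (Nix-based) code that didn't compile

  • simstring is written in C++. Building C++ (or C) code produces PE (Portable Executable) files (.exe, .dll). When dealing with an .exe that depends on (loads) .dlls, certain restrictions apply: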

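    • The architecture (32 vs. 64 bit, i.e. x86 vs. x64 (AMD64)) of the .exe (python.exe in this case) must match that of every .dll it loads (and of the .dlls those .dlls load, and so on), that is, of all the .dlls in the dependency tree; otherwise the .dll will not load

    • The build configuration (Debug vs. Release) should match in some cases. If it doesn't, something like [SO]: When using fstream in a library I get linker errors in the executable (@CristiFati's answer) can happen, but I don't think we are in that situation

    • The build tools should also match in some (other) cases. Examples:
      • Compiler type ([SO]: Python extensions with C: staticforward (@CristiFati's answer))
      • CRT runtime
      • CRT runtime version, which matters in our case. Check [Python.Wiki]: WindowsCompilers for the compatibility between Python and VStudio versions. Note that this only applies to downloaded and installed Python versions (if you build Python from source, you should use the same build tools for the extension, but I guess that is not the case here)
        • I see that you are using VStudio 2017, so the compatible versions are Python 3.5 and Python 3.6 [1]. I have ~10 Pythons on my machine (some installed, some built by me with different compilers; most of them are x64; I also have some VEnvs, but that should make no difference). I also have 5 VStudio versions installed, and in my case setup.py automatically picked VStudio 2015 (but that is fine, since the VStudio 2015 (v14.0) and VStudio 2017 (v14.1) compilers are binary compatible)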
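    • simstring depends on libiconv, which also comes as a .dll (there are actually more dependencies, but this is the only one we care about). Checking the .dll with Dependency Walker shows that it is x86 [2]. That means one of the following should be used:
      • 32 bit (x86) Python. This is the variant that I'm going to use. Given [1] and [2], the only version available on my machine is Python 3.6 x86 (Python 3.5 would have been my choice and I had it in 32 bit form too, but I messed it up and didn't reinstall it)
      • libiconv built from source, which gets rid of restriction [2]. However, that could take time and is outside the scope of the current question. If there were a question about building it, I'd spend some time giving it a try, as I enjoy that kind of task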
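
The architecture restriction can also be checked without Dependency Walker, by comparing the interpreter's bitness with the Machine field in a .dll's PE header. This is only a rough sketch: the helper names interpreter_arch and pe_arch are invented for illustration, and only the two machine codes relevant here (0x014C for x86, 0x8664 for x64) are recognized.

```python
import struct

# PE "Machine" values for the two architectures discussed above.
PE_MACHINES = {0x014C: "x86", 0x8664: "x64"}


def interpreter_arch():
    """Return 'x86' or 'x64' based on the running interpreter's pointer size."""
    return "x64" if struct.calcsize("P") * 8 == 64 else "x86"


def pe_arch(path):
    """Read the Machine field from a PE file (.exe, .dll, .pyd)."""
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":
            raise ValueError("not a PE file: missing MZ header")
        # Offset 0x3C of the DOS header stores the file offset of the PE signature.
        f.seek(0x3C)
        (pe_offset,) = struct.unpack("<I", f.read(4))
        f.seek(pe_offset)
        if f.read(4) != b"PE\0\0":
            raise ValueError("not a PE file: missing PE signature")
        # The COFF header follows the signature; its first field is Machine.
        (machine,) = struct.unpack("<H", f.read(2))
    return PE_MACHINES.get(machine, hex(machine))


print("interpreter:", interpreter_arch())
```

Calling pe_arch on the libiconv .dll and on the built .pyd should give the same value as interpreter_arch(); if it does not, the import is expected to fail regardless of how the build went.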

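Walkthrough:

  • Create a directory and cd to it (it should be empty). This will be %ROOT_DIR%; all the paths that I'm going to use will be relative to it (except the absolute ones, of course), and it will be the default directory (when none is specified)
  • Download the simstring sources ([GitHub]: Georgetown-IR-Lab/simstring - simstring-master.zip)
  • Unzip the archive; it will be extracted into the directory simstring-master (which will be created automatically)
  • Create the directory libiconv. Inside it, download: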
    1. [SourceForge]: gnuwin32/GnuWin - libiconv-1.9.2-1-lib.zip
    2. [SourceForge]: gnuwin32/GnuWin - libiconv-1.9.2-1-bin.zip
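    3. Extract what is needed from these files:
      • From #1.
        • The include directory, used in the compile phase
        • The lib directory, used in the link phase
        • Both phases are performed by setup.py (below)
      • From #2.
        • The bin directory, used at runtime (when the module is used (imported))
  • cd to the simstring-master directory. To build the extension, I'm using setup.py's build_ext command (which is invoked by install, as your output shows): [Python 3]: distutils.command.build_ext - Build any extensions in a package
  • Running build_ext yields your error: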

    export.cpp(7): fatal error C1083: Cannot open include file: 'iconv.h': No such file or directory
    

    That's because the Python build system doesn't know about what we did (in the libiconv directory). To let it know, pass these flags (python setup.py build_ext --help shows all of them):

    1. -I (--include-dirs) - translated to [MS.Docs]: /I (Additional include directories)
    2. -L (--library-dirs) - translated to [MS.Docs]: /LIBPATH (Additional Libpath)
    3. -l (--libraries) - translated to [MS.Docs]: LINK Input Files


    For now, don't pass #2. and #3., because we won't reach the link phase (where they are needed):

    (py36x86_test) E:\Work\Dev\Whosebug\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" setup.py build_ext -I"../libiconv/include"
    running build_ext
    building '_simstring' extension
    C:\Install\x86\Microsoft\Visual Studio Community15\VC\BIN\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -I../libiconv/include -Ic:\Install\x86\Python\Python.6\include -Ic:\Install\x86\Python\Python.6\include "-IC:\Install\x86\Microsoft\Visual Studio Community15\VC\INCLUDE" "-IC:\Install\x86\Microsoft\Visual Studio Community15\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\winrt" /EHsc /Tpexport.cpp /Fobuild\temp.win32-3.6\Release\export.obj
    export.cpp
    export.cpp(112): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
    export.cpp(112): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
    export.cpp(126): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
    export.cpp(126): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
    export.cpp(37): error C2664: 'size_t libiconv(libiconv_t,const char **,size_t *,char **,size_t *)': cannot convert argument 2 from 'char **' to 'const char **'
    export.cpp(37): note: Conversion loses qualifiers
    export.cpp(140): note: see reference to function template instantiation 'bool iconv_convert<std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::wstring>(libiconv_t,const source_type &,destination_type &)' being compiled
    with
    [
        source_type=std::basic_string<char,std::char_traits<char>,std::allocator<char>>,
        destination_type=std::wstring
    ]
    error: command 'C:\Install\x86\Microsoft\Visual Studio Community\2015\VC\BIN\cl.exe' failed with exit status 2
    
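  • Things to do (I found and fixed the errors one by one; only export.cpp needs changes):

    1. #define ICONV_CONST const (cl.exe doesn't cast constness automatically)
    2. #define __SIZEOF_WCHAR_T__ 2 (because sizeof(wchar_t) is 2)
    3. Strip out the code that doesn't compile (the one I mentioned at the beginning): STL containers with 4 byte chars don't compile on Win. I wanted to fix the code so that, once Win supports such chars, it would compile OOTB, but I couldn't, so I had to do whatever is done for OSX. Therefore, #ifdef __APPLE__ should be replaced by #if defined(__APPLE__) || defined(WIN32) (5 occurrences)


    Note that #1. and #2. could (should) be done from the cmdline (the -D flag, but I wasn't able to specify a value for the defined flag) or in setup.py (so that they are defined only once, even if needed in many files), but I didn't spend much time on that, so I changed them directly in the source code.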
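
For the setup.py route, a minimal sketch of what the extension arguments might look like is shown here. It is not the repository's actual setup.py: the function name simstring_extension_kwargs and the ../libiconv default are assumptions, while the module name, source files, macro names and library name are taken from the build output and the list above. The wchar_t size is read from ctypes on the build machine instead of being hard-coded.

```python
import ctypes
import sys


def simstring_extension_kwargs(libiconv_root="../libiconv", platform=sys.platform):
    """Build keyword arguments for the _simstring extension (hypothetical helper)."""
    kwargs = {
        "name": "_simstring",
        "sources": ["export.cpp", "export_wrap.cpp"],
    }
    if platform == "win32":
        kwargs.update(
            # (name, value) pairs become /DNAME=value on the cl.exe command line.
            define_macros=[
                ("ICONV_CONST", "const"),
                ("__SIZEOF_WCHAR_T__", str(ctypes.sizeof(ctypes.c_wchar))),
            ],
            include_dirs=[libiconv_root + "/include"],  # same as -I
            library_dirs=[libiconv_root + "/lib"],      # same as -L
            libraries=["libiconv"],                     # same as -l
        )
    return kwargs


print(simstring_extension_kwargs())
```

In a setup.py the dictionary would be passed on as Extension(**simstring_extension_kwargs()), with Extension imported from setuptools. Macros alone would not be enough, though: the #ifdef __APPLE__ replacement from #3. still has to be made in export.cpp.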


    Either apply the changes manually, or save:

    --- export.cpp.orig 2016-11-30 18:53:32.000000000 +0200
    +++ export.cpp  2018-02-14 13:36:31.317953200 +0200
    @@ -19,9 +19,18 @@
     #endif/*USE_LIBICONV_GNU*/
    
     #ifndef ICONV_CONST
    +#if defined (WIN32)
    +#define ICONV_CONST const
    +#else
     #define ICONV_CONST
    +#endif
     #endif/*ICONV_CONST*/
    
    +#if defined (WIN32)
    +#define __SIZEOF_WCHAR_T__ 2
    +#endif
    +
    +
     template <class source_type, class destination_type>
     bool iconv_convert(iconv_t cd, const source_type& src, destination_type& dst)
     {
    @@ -269,7 +278,7 @@
         iconv_close(bwd);
     }
    
    -#ifdef __APPLE__
    +#if defined(__APPLE__) || defined(WIN32)
     #include <cassert>
     #endif
    
    @@ -283,7 +292,7 @@
             retrieve_thru(dbr, query, this->measure, this->threshold, std::back_inserter(ret));
             break;
         case 2:
    -#ifdef __APPLE__
    +#if defined(__APPLE__) || defined(WIN32)
     #if __SIZEOF_WCHAR_T__ == 2
             retrieve_iconv<wchar_t>(dbr, query, UTF16, this->measure, this->threshold, std::back_inserter(ret));
     #else
    @@ -294,7 +303,7 @@
     #endif
             break;
         case 4:
    -#ifdef __APPLE__
    +#if defined(__APPLE__) || defined(WIN32)
     #if __SIZEOF_WCHAR_T__ == 4
             retrieve_iconv<wchar_t>(dbr, query, UTF32, this->measure, this->threshold, std::back_inserter(ret));
     #else
    @@ -317,7 +326,7 @@
             std::string qstr = query;
             return dbr.check(qstr, translate_measure(this->measure), this->threshold);
         } else if (dbr.char_size() == 2) {
    -#ifdef __APPLE__
    +#if defined(__APPLE__) || defined(WIN32)
     #if __SIZEOF_WCHAR_T__ == 2
             std::basic_string<wchar_t> qstr;
     #else
    @@ -333,7 +342,7 @@
             iconv_close(fwd);
             return dbr.check(qstr, translate_measure(this->measure), this->threshold);
         } else if (dbr.char_size() == 4) {
    -#ifdef __APPLE__
    +#if defined(__APPLE__) || defined(WIN32)
     #if __SIZEOF_WCHAR_T__ == 4
             std::basic_string<wchar_t> qstr;
     #else
    

    as simstring_win.diff. That is a diff. See (Patching utrunner section) for how to apply patches on Win (basically, every line that starts with one "+" sign goes in, and every line that starts with one "-" sign goes out). I am using Cygwin, btw.
    I also submitted this patch to [GitHub]: Georgetown-IR-Lab/simstring - Support for Win (merged today, 180222).
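
To make the "+" / "-" rule concrete, here is a small sketch that splits the body of one hunk into its "before" and "after" versions. It is an illustration only, not a replacement for patch.exe: the function name hunk_sides is made up, and file headers, "@@" lines and line offsets are deliberately ignored. The sample hunk is copied from the diff above.

```python
def hunk_sides(hunk_lines):
    """Split the body of one unified-diff hunk into (old_lines, new_lines)."""
    old, new = [], []
    for line in hunk_lines:
        marker, text = line[:1], line[1:]
        if marker == "-":
            old.append(text)  # goes out: present only before the patch
        elif marker == "+":
            new.append(text)  # goes in: present only after the patch
        else:
            old.append(text)  # context line: unchanged, present on both sides
            new.append(text)
    return old, new


hunk = [
    " }",
    "",
    "-#ifdef __APPLE__",
    "+#if defined(__APPLE__) || defined(WIN32)",
    " #include <cassert>",
    " #endif",
]
old_lines, new_lines = hunk_sides(hunk)
print("\n".join(new_lines))
```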

    (py36x86_test) E:\Work\Dev\Whosebug\q048528041\simstring-master>"c:\Install\x64\Cygwin\Cygwin\AllVers\bin\patch.exe" -i "../simstring_win.diff"
    patching file export.cpp
    
    (py36x86_test) E:\Work\Dev\Whosebug\q048528041\simstring-master>rem Looking at export.cpp content, you'll notice the changes
    
    (py36x86_test) E:\Work\Dev\Whosebug\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" setup.py build_ext  -I"../libiconv/include" -L"../libiconv/lib" -llibiconv
    running build_ext
    building '_simstring' extension
    C:\Install\x86\Microsoft\Visual Studio Community15\VC\BIN\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -I../libiconv/include -Ic:\Install\x86\Python\Python.6\include -Ic:\Install\x86\Python\Python.6\include "-IC:\Install\x86\Microsoft\Visual Studio Community15\VC\INCLUDE" "-IC:\Install\x86\Microsoft\Visual Studio Community15\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\winrt" /EHsc /Tpexport.cpp /Fobuild\temp.win32-3.6\Release\export.obj
    export.cpp
    export.cpp(121): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
    export.cpp(121): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
    export.cpp(135): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
    export.cpp(135): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
    C:\Install\x86\Microsoft\Visual Studio Community15\VC\BIN\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -I../libiconv/include -Ic:\Install\x86\Python\Python.6\include -Ic:\Install\x86\Python\Python.6\include "-IC:\Install\x86\Microsoft\Visual Studio Community15\VC\INCLUDE" "-IC:\Install\x86\Microsoft\Visual Studio Community15\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\include.0.16299.0\winrt" /EHsc /Tpexport_wrap.cpp /Fobuild\temp.win32-3.6\Release\export_wrap.obj
    export_wrap.cpp
    C:\Install\x86\Microsoft\Visual Studio Community15\VC\BIN\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:c:\Install\x86\Python\Python.6\Libs /LIBPATH:../libiconv/lib /LIBPATH:e:\Work\Dev\VEnvs\py36x86_test\libs /LIBPATH:e:\Work\Dev\VEnvs\py36x86_test\PCbuild\win32 "/LIBPATH:C:\Install\x86\Microsoft\Visual Studio Community15\VC\LIB" "/LIBPATH:C:\Install\x86\Microsoft\Visual Studio Community15\VC\ATLMFC\LIB" "/LIBPATH:C:\Program Files (x86)\Windows Kits\lib.0.16299.0\ucrt\x86" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK.6.1\lib\um\x86" "/LIBPATH:C:\Program Files (x86)\Windows Kits\lib.0.16299.0\um\x86" libiconv.lib /EXPORT:PyInit__simstring build\temp.win32-3.6\Release\export.obj build\temp.win32-3.6\Release\export_wrap.obj /OUT:build\lib.win32-3.6\_simstring.cp36-win32.pyd /IMPLIB:build\temp.win32-3.6\Release\_simstring.cp36-win32.lib
       Creating library build\temp.win32-3.6\Release\_simstring.cp36-win32.lib and object build\temp.win32-3.6\Release\_simstring.cp36-win32.exp
    Generating code
    Finished generating code
    
    (py36x86_test) E:\Work\Dev\Whosebug\q048528041\simstring-master>dir /b "build\lib.win32-3.6"
    _simstring.cp36-win32.pyd
    
  • Finally, it built. A .pyd is just a .dll, so it can be inspected with Dependency Walker

  • Let's see whether it works:

    (py36x86_test) E:\Work\Dev\Whosebug\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" sample.py
    Traceback (most recent call last):
      File "E:\Work\Dev\Whosebug\q048528041\simstring-master\simstring.py", line 18, in swig_import_helper
        fp, pathname, description = imp.find_module('_simstring', [dirname(__file__)])
      File "e:\Work\Dev\VEnvs\py36x86_test\lib\imp.py", line 296, in find_module
        raise ImportError(_ERR_MSG.format(name), name=name)
    ImportError: No module named '_simstring'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "sample.py", line 3, in <module>
        import simstring
      File "E:\Work\Dev\Whosebug\q048528041\simstring-master\simstring.py", line 28, in <module>
        _simstring = swig_import_helper()
      File "E:\Work\Dev\Whosebug\q048528041\simstring-master\simstring.py", line 20, in swig_import_helper
        import _simstring
    ModuleNotFoundError: No module named '_simstring'
    

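    That's because importing simstring in turn imports _simstring (the .pyd), which Python didn't find. To fix this, add the build directory to PYTHONPATH (so that Python finds the .pyd) and libiconv's bin directory to PATH (so that its .dll is found at runtime):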

    (py36x86_test) E:\Work\Dev\Whosebug\q048528041\simstring-master>set PYTHONPATH=%PYTHONPATH%;build\lib.win32-3.6
    
    (py36x86_test) E:\Work\Dev\Whosebug\q048528041\simstring-master>set PATH=%PATH%;..\libiconv\bin
    
    (py36x86_test) E:\Work\Dev\Whosebug\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" sample.py
    ('Barack Hussein Obama II',)
    ('James Gordon Brown',)
    ()
    ('Barack Hussein Obama II',)
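
The same two settings can be made from inside a script rather than in the console. The sketch below assumes the directory layout used above; prepare_simstring_import is a made-up helper name. One difference from the Python 3.6 session shown here: starting with Python 3.8, Windows no longer searches PATH for the .dlls that an extension module depends on, so the sketch also calls os.add_dll_directory when it exists.

```python
import os
import sys


def prepare_simstring_import(build_dir, dll_dir):
    """Make the built .pyd and libiconv's .dll discoverable before importing simstring."""
    build_dir = os.path.abspath(build_dir)
    dll_dir = os.path.abspath(dll_dir)
    if build_dir not in sys.path:
        sys.path.insert(0, build_dir)  # plays the role of the PYTHONPATH entry
    # Plays the role of "set PATH=%PATH%;..." (enough for Python 3.6 as used above).
    os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + dll_dir
    # Python 3.8+ on Windows: dependent .dlls are resolved through this call instead.
    if hasattr(os, "add_dll_directory") and os.path.isdir(dll_dir):
        os.add_dll_directory(dll_dir)
```

A script would call prepare_simstring_import(r"build\lib.win32-3.6", r"..\libiconv\bin") first and only then run import simstring.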
    

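Final notes:

  • The module produces some output, and it is the same as the output on Lnx (Ubtu) (I built it there as well, without running into any problems), but I'm not sure whether it is semantically correct
  • I didn't run the setup.py install command (and I won't). One thing that I can think of that might go wrong (although I'm not sure it would) is libiconv2.dll not being copied / included into the pkg. If that's the case, you might need to modify setup.py (the change should be minor)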
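
If the .dll does turn out to be missing after install, one possible workaround is to copy it next to the built .pyd before installing. This is an untested assumption rather than something verified against this repository: it relies on the install step copying everything from the build directory, and on Windows looking for a module's dependencies in the module's own directory. The helper name bundle_runtime_dll is made up.

```python
import os
import shutil


def bundle_runtime_dll(dll_path, build_lib_dir):
    """Copy a runtime .dll next to the built .pyd and return the new path."""
    os.makedirs(build_lib_dir, exist_ok=True)
    target = os.path.join(build_lib_dir, os.path.basename(dll_path))
    shutil.copy2(dll_path, target)  # copy2 also preserves the file timestamps
    return target
```

With the layout from the walkthrough, the call would be bundle_runtime_dll(r"..\libiconv\bin\libiconv2.dll", r"build\lib.win32-3.6"), run after build_ext and before install.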

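I was able to build the repository under Cygwin. The libiconv-devel and python3-devel packages need to be installed.

After that, I made one more change to make sure that libiconv is available for Windows builds. I made that single commit here: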

https://github.com/burgersmoke/simstring

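In addition to my other response about building under Cygwin, I made some further changes that allow a seamless build and install on Windows with Anaconda. It turns out that conda can install iconv easily.

Most of this builds on the work that CristiFati added in this thread; the change is meant to simplify the steps and the eventual installation.

This change currently lives in my own fork. The steps are in the README there. I have also submitted a Pull Request for it.

Update: this pull request has now been merged into the Georgetown repo, so you can get it here: https://github.com/Georgetown-IR-Lab/simstring

As a side note, one of the motivations for this was to make this repo easier to set up: https://github.com/Georgetown-IR-Lab/QuickUMLS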