TypeError: expected bytes, str found in custom python function

Question

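I am using a new bioinformatics tool called Giggle and have installed its Python wrapper on my system. Although the scenario is specific, I think the problem is a general one. This function: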

index = Giggle.create("index", "HMEC_hg19_BroadHMM_ALL.bed")

should create an index based on several (or, in this case, one) .bed files. The bed file looks like this:

chr1    10000   10600   15_Repetitive/CNV   0   .   10000   10600   245,245,245
chr1    10600   11137   13_Heterochrom/lo   0   .   10600   11137   245,245,245
chr1    11137   11737   8_Insulator 0   .   11137   11737   10,190,254
chr1    11737   11937   11_Weak_Txn 0   .   11737   11937   153,255,102
chr1    11937   12137   7_Weak_Enhancer 0   .   11937   12137   255,252,4
chr1    12137   14537   11_Weak_Txn 0   .   12137   14537   153,255,102
chr1    14537   20337   10_Txn_Elongation   0   .   14537   20337   0,176,80

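It is basically a large tab-separated file containing genomic intervals and their corresponding chromosomes. When I run the command above, I get the following error: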

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "giggle/giggle.pyx", line 25, in giggle.giggle.Giggle.create
TypeError: expected bytes, str found

I don't know why this happens. I tried converting the file to other encodings, but nothing worked. The code snippet the error refers to is:

def create(self, char *path, char *glob):
    giggle_bulk_insert(to_bytes(glob), to_bytes(path), 1)
    return Giggle(path)

I am using Python 3.6 on the Windows Subsystem for Linux on Windows 10.

Answer 1

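The problem is that in Python 3, strings are represented as Unicode strings, not as byte strings as they are in Python 2. If you installed giggle and ran your code with Python 2, everything should work. In Python 3 you can do either: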

index = Giggle.create("index".encode('utf-8'), "HMEC_hg19_BroadHMM_ALL.bed".encode('utf-8'))

or

index = Giggle.create(b"index", b"HMEC_hg19_BroadHMM_ALL.bed")

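with explicit byte strings. That worked for me until giggle complained that the .bed file was not formatted correctly (I probably messed up the formatting when copying it).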
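To see why both forms above are equivalent, here is a minimal sketch in plain Python 3 (it does not need giggle installed). The variable names are made up for illustration; only the file name comes from the question.

```python
# In Python 3 a plain string literal is text (str), while a b"..." literal is bytes.
text_name = "HMEC_hg19_BroadHMM_ALL.bed"
bytes_name = b"HMEC_hg19_BroadHMM_ALL.bed"

# Encoding the text produces the same bytes object as the b"..." literal.
encoded_name = text_name.encode("utf-8")

print(type(text_name).__name__)     # type of the unencoded literal
print(type(encoded_name).__name__)  # type after .encode()
print(encoded_name == bytes_name)   # whether both spellings give identical bytes
```

Since the file name is pure ASCII, the choice of 'utf-8' here makes no difference to the resulting bytes; it would matter only for paths containing non-ASCII characters.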

Update: Another problem appears when calling it as described above:

File type not supported 'HMEC_hg19_BroadHMM_ALL.bed'

This happens because the underlying giggle library only accepts .bed.gz files, as can be seen in python-giggle/lib/giggle/src/file_read.c:

if ( (strlen(i->file_name) > 7) &&
    strcmp(".bed.gz", file_name + strlen(i->file_name) - 7) == 0) {
    i->type = BED;
}

So I assume the README on the python-giggle site is incorrect when it claims that you can call it with .bed files.

I tested it with one of the files provided in python-giggle\lib\giggle\test\data, and it ran without errors.
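If you want to catch this before calling into the library, a small pre-check can mirror the suffix test from the C snippet quoted above. This is only a sketch: `looks_like_bed_gz` is a hypothetical helper, not part of giggle, and it reflects only the quoted lines (the rest of file_read.c may accept other types not shown here).

```python
def looks_like_bed_gz(file_name):
    """Mirror the quoted C check: longer than 7 characters and ending in '.bed.gz'."""
    suffix = ".bed.gz"
    return len(file_name) > len(suffix) and file_name.endswith(suffix)


# The file from the question fails the check; a compressed name passes.
print(looks_like_bed_gz("HMEC_hg19_BroadHMM_ALL.bed"))
print(looks_like_bed_gz("HMEC_hg19_BroadHMM_ALL.bed.gz"))
```

Note that this checks the name only. Whether the file's contents are compressed in the way giggle expects is a separate question that this helper does not answer.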

Answer 2

The create() method expects byte strings:

create(self, char *path, char *glob):

Cython will only auto-convert bytes objects in Python 3, and str objects in Python 2, to a char array.

Either pass in bytes objects when calling the method (encoding your str objects first), or change the method signature to accept str Unicode strings. See "Accepting strings from Python code" in the Cython tutorial.
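For the first option, one way to avoid sprinkling .encode() calls everywhere is a thin wrapper that normalizes the arguments before they reach the Cython method. The sketch below is an assumption-laden illustration: `as_bytes`, `create_with_text`, and `fake_create` are hypothetical names, and `fake_create` is only a stand-in that imitates a bytes-only function so the example runs without giggle.

```python
import os


def as_bytes(value):
    """Return a path-like argument as bytes; str is encoded, bytes pass through."""
    return os.fsencode(value)


def create_with_text(create, path, glob):
    """Call a bytes-only `create` callable with str or bytes arguments."""
    return create(as_bytes(path), as_bytes(glob))


def fake_create(path, glob):
    """Stand-in (NOT giggle): rejects str the way a char* parameter would."""
    for arg in (path, glob):
        if not isinstance(arg, bytes):
            raise TypeError("expected bytes, %s found" % type(arg).__name__)
    return (path, glob)


result = create_with_text(fake_create, "index", b"example.bed.gz")
print(result)
```

With the real library, the call would presumably look like create_with_text(Giggle.create, "index", "some_file.bed.gz"), but that is untested here. os.fsencode uses the filesystem encoding, which is a reasonable default for file paths.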

Answer 3

Encoding your string as UTF-8 will solve your problem:

yourstr.encode('utf-8')
