Recognizing value as a string

Question

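This code is supposed to read string values from an Excel file. Somehow the value is not recognized as a string. How can I get the query as a string? str(string) doesn't seem to work.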

def main():
    file_location = "/Users/ronald/Desktop/Twitter/TwitterData.xlsx" 
    workbook = xlrd.open_workbook(file_location) #open work book
    worksheet = workbook.sheet_by_index(0)
    num_rows = worksheet.nrows - 1
    num_cells = worksheet.ncols - 1
    curr_row = 0
    curr_cell = 3
    count = 0
    string = 'tweet'
    tweets = []
    while curr_row < num_rows:
        curr_row += 1
        tweet = worksheet.cell_value(curr_row, curr_cell)
        tweet.encode('ascii', 'ignore')
        #print tweet
        query = str(tweet)
        if (isinstance(query, str)):
            print "it is a string"
        else:
            print "it is not a string"

This is the error I keep getting.

UnicodeEncodeError: 'ascii' codec can't encode characters in position 102-104: ordinal not in range(128)

Answer 1

There are two different types in Python, and each represents strings in a different way.

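str or bytes: This is the default in Python 2 (hence str), and it is called bytes in Python 3. It represents a string as a sequence of bytes, which does not work well for Unicode because a character is not necessarily one byte, as it is in ASCII and some other encodings.
unicode or str: This is the default in Python 3 (hence str). Unicode handles accented and international characters, so it is what you want, especially when dealing with something like Twitter. In Python 2, it is also the reason some strings carry a small u'' prefix.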
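To make the difference between the two types concrete, here is a minimal sketch that runs on both Python 2 and Python 3. The sample value is made up for illustration and is not taken from the spreadsheet in the question. It shows that text and bytes have different lengths, that encode() returns a new object instead of changing the string in place, and that a strict ASCII encode fails with the same kind of error as in the question. In Python 2, calling str() on a unicode value attempts this ASCII encoding implicitly.

```python
from __future__ import print_function

# A text (Unicode) string: type `unicode` on Python 2, `str` on Python 3.
# Hypothetical sample value containing two non-ASCII characters.
text = u"caf\u00e9 \u2615"

# Encoding turns text into bytes: `str` on Python 2, `bytes` on Python 3.
utf8_bytes = text.encode("utf-8")

print(len(text))        # number of characters
print(len(utf8_bytes))  # number of bytes (some characters need several)

# encode() returns a new object and leaves `text` unchanged,
# so the result has to be assigned to have any effect.
ascii_bytes = text.encode("ascii", "ignore")  # drops non-ASCII characters
print(ascii_bytes == b"caf ")

# A strict ASCII encode raises UnicodeEncodeError for this value.
try:
    text.encode("ascii")
    strict_ascii_failed = False
except UnicodeEncodeError as exc:
    strict_ascii_failed = True
    print("cannot encode as ASCII:", exc)
```

In the question's loop, the line tweet.encode('ascii', 'ignore') discards its result, so the following str(tweet) still sees the original Unicode value.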

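Your "is this a string?" test consists of isinstance(s, str), which only checks for the first type and ignores the other. Instead, you can test against basestring, as in isinstance(s, basestring), because it is the parent of both str and unicode. That correctly answers the question "is this a string?" for Python 2, and it is why you were getting a misleading result.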

Note that basestring does not exist if you migrate to Python 3. This is a Python 2-only test.
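If the same check needs to survive a move to Python 3, one common approach is to fall back to str when basestring is missing. The following is only a sketch: the names string_types and is_string are made up here and do not come from xlrd or the standard library.

```python
from __future__ import print_function

try:
    # Python 2: basestring is the parent of both str and unicode.
    string_types = (basestring,)
except NameError:
    # Python 3: basestring is gone; str is the text type.
    string_types = (str,)


def is_string(value):
    """Return True if value is a string type on the running Python version."""
    return isinstance(value, string_types)


print(is_string(u"tweet"))  # a Unicode text value
print(is_string("tweet"))   # a plain string literal
print(is_string(42))        # a non-string value
```

Note that on Python 3 this helper does not count bytes as a string, which is usually what you want when working with text such as tweets.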
