How to process rows from an SQL query with MRJob

Question

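I'm having a hard time figuring out how MRJob works. I'm trying to run an SQL query and yield its rows, and the documentation doesn't explain this kind of thing in detail.

My code so far: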

# To be able to give db file as option.
def configure_options(self):
    super(MyClassName, self).configure_options()
    self.add_file_option('--database')

def mapper_init(self):
    # Making sqlite3 database available to mapper.
    self.sqlite_conn = sqlite3.connect(self.options.database)
    self.command= '''
        SELECT id
        FROM names
        '''

def mapper(self,_,val):        
    yield self.sqlite_conn.execute(self.command), 1

Then in the console I run:

python myfile.py text.txt --database=mydb.db

where text.txt is an empty dummy file, so that the script doesn't wait for stdin.

I expect the output to be:

id1, 1
id2, 1

But right now there is no output. What am I missing?

Answer 1

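I found the solution myself; I'm posting it in case someone needs it later. In this example, the database path is passed as a command-line option.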

def configure_options(self):
    super(MyClassName, self).configure_options()
    self.add_file_option('--database')

def mapper_init(self):
    # make sqlite3 database available to mapper
    self.sqlite_conn = sqlite3.connect(self.options.database)
    self.command = '''
        SELECT id
        FROM names
        '''

def mapper(self, _, val):
    # Run the query and emit one (id, 1) pair per result row.
    for row in self.sqlite_conn.execute(self.command):
        yield row[0], 1
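
The row-by-row part of the mapper above can be tried outside MRJob. Here is a minimal sketch that uses only the standard-library sqlite3 module. The in-memory database, the sample rows, and the helper name `iter_id_pairs` are assumptions made for illustration; they are not part of the original job.

```python
import sqlite3


def iter_id_pairs(conn, query):
    """Yield (id, 1) for every row returned by query (hypothetical helper)."""
    # A sqlite3 cursor is iterable and yields one row tuple at a time.
    for row in conn.execute(query):
        yield row[0], 1


# Assumed sample data standing in for the real database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id TEXT)")
conn.executemany("INSERT INTO names (id) VALUES (?)", [("id1",), ("id2",)])

pairs = list(iter_id_pairs(conn, "SELECT id FROM names ORDER BY id"))
print(pairs)
```

Yielding a cursor object itself, as the question's mapper does, would not give one pair per row; each row has to be unpacked and yielded individually.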

Run it from the command line:

python myfilename.py dummy.txt --database=mydatabase.db

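Note that the dummy text file should contain exactly one line: the mapper is called once per input line, so the query would be run, and its rows emitted, once for every line in the file.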
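
To see why the number of input lines matters, the sketch below imitates a mapper being called once per input line, without using MRJob itself. The names `mapper` and `run_mapper_over` and the sample data are hypothetical stand-ins, not MRJob's API.

```python
import sqlite3

# Assumed sample data standing in for the real database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id TEXT)")
conn.executemany("INSERT INTO names (id) VALUES (?)", [("id1",), ("id2",)])


def mapper(_, line):
    # Ignores the input line and emits one (id, 1) pair per table row.
    for row in conn.execute("SELECT id FROM names ORDER BY id"):
        yield row[0], 1


def run_mapper_over(lines):
    """Call mapper once per input line and collect everything it yields."""
    output = []
    for line in lines:
        output.extend(mapper(None, line))
    return output


print(len(run_mapper_over([])))               # empty file: mapper never runs
print(len(run_mapper_over(["dummy"])))        # one line: each row emitted once
print(len(run_mapper_over(["a", "b", "c"])))  # three lines: each row emitted three times
```

With no input lines the mapper is never called, which would be consistent with the empty text.txt in the question producing no output.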

Tags: python, sqlite, python-2.7, mrjob