Python Pandas: append DataFrames from a multiprocessing Pool for loop to an existing DataFrame

Question

I have a DataFrame named df3 with 5 columns.

I am using a multiprocessing Pool to parse data tables from bittrex.com into a DataFrame named df2.

I cut the ticker list down to 2 just to simplify my code for testing.

Here is my code:

import pandas as pd
import json
import urllib.request
import os
from urllib import parse
import csv
import datetime
from multiprocessing import Process, Pool
import time

df3 = pd.DataFrame(columns=['tickers', 'RSIS', 'CCIS', 'ICH', 'SMAS'])
tickers = ["BTC-1ST", "BTC-ADA"]

def http_get(url):
    result = {"url": url, "data": urllib.request.urlopen(url, timeout=60).read()}
    return result

urls = ["https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin" for ticker in tickers ]

pool = Pool(processes=200)

results = pool.map(http_get, urls)

for result in results:
    j = json.loads(result['data'].decode())
    df2 = pd.DataFrame(data=j['result'])

    df2.rename(columns={'BV': 'BaseVolume', 'C': 'Close', 'H': 'High', 'L': 'Low', 'O': 'Open', 'T': 'TimeStamp',
                        'V': 'Volume'}, inplace=True)

    # Tenkan-sen (Conversion Line): (9-period high + 9-period low)/2
    nine_period_high = df2['High'].rolling(window=50).max()
    nine_period_low = df2['Low'].rolling(window=50).min()
    df2['tenkan_sen'] = (nine_period_high + nine_period_low) / 2

    # Kijun-sen (Base Line): (26-period high + 26-period low)/2
    period26_high = df2['High'].rolling(window=250).max()
    period26_low = df2['Low'].rolling(window=250).min()
    df2['kijun_sen'] = (period26_high + period26_low) / 2

    TEN30L = df2.loc[df2.index[-1], 'tenkan_sen']
    TEN30LL = df2.loc[df2.index[-2], 'tenkan_sen']
    KIJ30L = df2.loc[df2.index[-1], 'kijun_sen']
    KIJ30LL = df2.loc[df2.index[-2], 'kijun_sen']

    if (TEN30LL < KIJ30LL) and (TEN30L > KIJ30L):
        df3.at[ticker, 'ICH'] = 'BUY'
    elif (TEN30LL > KIJ30LL) and (TEN30L < KIJ30L):
        df3.at[ticker, 'ICH'] = 'SELL'
    else:
        df3.at[ticker, 'ICH'] = 'NO'

    pool.close()
    pool.join()
    print(df2)
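For reference, the two lines computed in the loop above and the crossover test applied to the last two rows can be written as follows. Here H_s and L_s are the High and Low of row s, t is the last row (df2.index[-1]), and n_1, n_2 are the rolling window lengths (the comments name the standard 9 and 26 periods, while the code actually uses 50 and 250). rolling(window=n) covers the current row and the n-1 rows before it and yields NaN until n rows are available; a comparison involving NaN is False, so in that case the signal falls through to NO.

```latex
\mathrm{tenkan}_t = \frac{\max_{t-n_1 < s \le t} H_s + \min_{t-n_1 < s \le t} L_s}{2},
\qquad
\mathrm{kijun}_t = \frac{\max_{t-n_2 < s \le t} H_s + \min_{t-n_2 < s \le t} L_s}{2}

\text{signal} =
\begin{cases}
\text{BUY}  & \text{if } \mathrm{tenkan}_{t-1} < \mathrm{kijun}_{t-1} \text{ and } \mathrm{tenkan}_t > \mathrm{kijun}_t,\\
\text{SELL} & \text{if } \mathrm{tenkan}_{t-1} > \mathrm{kijun}_{t-1} \text{ and } \mathrm{tenkan}_t < \mathrm{kijun}_t,\\
\text{NO}   & \text{otherwise.}
\end{cases}
```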

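My problem is that I always get the error NameError: name 'ticker' is not defined, and it is driving me crazy. Why do I get this error, even though I defined ticker as the for-loop variable in the line urls = ["https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin" for ticker in tickers ] and Python already used it successfully there?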

I have been searching Google for three days and tried several solutions, without success.

Any ideas?

Answer 1

I don't think you're looking at the right line; when I run your code, I get:

NameError                                 Traceback (most recent call last)
<ipython-input-1-fd766f4a9b8e> in <module>()
     49         df3.at[ticker, 'ICH'] = 'SELL'
     50     else:
---> 51         df3.at[ticker, 'ICH'] = 'NO'
     52 
     53     pool.close()

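So the error is raised on line 51, not on the line where you build the urls list. That makes sense, because ticker is not defined outside the list comprehension on that line. The problem has nothing to do with multiprocessing or pandas; it comes from Python's scoping rules: the temporary variable of a list comprehension is not available outside it. It's hard to see how it could be, since it has taken several values during the iteration; even if it were available, you would only get its last value, which is not what you want here.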
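A minimal sketch of that scoping rule, reusing the question's ticker names (Python 3 behavior; in Python 2 the comprehension variable did leak). The names comprehension_error and loop_ticker are only for illustration. The comprehension variable is gone afterwards, while a plain for loop leaves its variable bound, but only to the last item:

```python
# Python 3: a list comprehension's loop variable is local to the comprehension.
tickers = ["BTC-1ST", "BTC-ADA"]

urls = [
    "https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName="
    + ticker
    + "&tickInterval=thirtyMin"
    for ticker in tickers
]

try:
    ticker  # only ever bound inside the comprehension above
    comprehension_error = None
except NameError as exc:
    comprehension_error = str(exc)

print(comprehension_error)  # name 'ticker' is not defined

# A regular for loop keeps its variable after the loop ends, but it only
# holds the final item, not "the ticker that belongs to this result".
for loop_ticker in tickers:
    pass
print(loop_ticker)  # BTC-ADA
```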

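You probably need to keep track of the ticker throughout the fetching process, so that in the end each result can be associated with the right ticker, for example: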

def http_get(ticker):
    # Build the URL inside the worker so the ticker travels with its data
    url = "https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin"
    result = {"url": url, "data": urllib.request.urlopen(url, timeout=60).read(), "ticker": ticker}
    return result

pool = Pool(processes=200)

# Map over the tickers themselves instead of prebuilt URLs
results = pool.map(http_get, tickers)

for result in results:
    j = json.loads(result['data'].decode())
    df2 = pd.DataFrame(data=j['result'])
    ticker = result['ticker']  # now defined for the df3.at[ticker, ...] lines
    ...
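As a sketch of the whole flow under stated assumptions (not the answerer's code), the pattern can be made runnable offline. fetch_ticks, ichimoku_signal and build_signals are hypothetical names; fetch_ticks stands in for http_get and returns synthetic candles (only the H and L keys of the Bittrex result) instead of calling the API. Each result carries its ticker as suggested above, the windows default to the 9/26 periods named in the question's comments, and the pool is shut down once, after map() returns, instead of inside the result loop. Since Pool.map returns results in input order, zip(tickers, results) would also pair them correctly.

```python
from multiprocessing import Pool

import pandas as pd

TICKERS = ["BTC-1ST", "BTC-ADA"]


def fetch_ticks(ticker):
    """HYPOTHETICAL offline stand-in for http_get: synthetic rising candles."""
    base = 100.0 if ticker == "BTC-1ST" else 50.0
    rows = [{"H": base + i + 1.0, "L": base + i - 1.0} for i in range(60)]
    return {"ticker": ticker, "result": rows}  # the ticker travels with its data


def ichimoku_signal(df2, conv_window=9, base_window=26):
    """Return 'BUY', 'SELL' or 'NO' from a Tenkan/Kijun cross on the last two rows."""
    high, low = df2["High"], df2["Low"]
    tenkan = (high.rolling(conv_window).max() + low.rolling(conv_window).min()) / 2
    kijun = (high.rolling(base_window).max() + low.rolling(base_window).min()) / 2
    ten_prev, ten_last = tenkan.iloc[-2], tenkan.iloc[-1]
    kij_prev, kij_last = kijun.iloc[-2], kijun.iloc[-1]
    if ten_prev < kij_prev and ten_last > kij_last:
        return "BUY"
    if ten_prev > kij_prev and ten_last < kij_last:
        return "SELL"
    return "NO"  # also reached when NaN (too few rows) makes comparisons False


def build_signals(results):
    """Fill a df3-style table indexed by the ticker stored in each result."""
    df3 = pd.DataFrame(
        index=[r["ticker"] for r in results],
        columns=["tickers", "RSIS", "CCIS", "ICH", "SMAS"],
    )
    for result in results:
        ticker = result["ticker"]  # defined per result, no NameError
        df2 = pd.DataFrame(result["result"]).rename(columns={"H": "High", "L": "Low"})
        df3.at[ticker, "ICH"] = ichimoku_signal(df2)
    return df3


if __name__ == "__main__":
    # The guard is required where workers are spawned (e.g. Windows).
    # The with block shuts the pool down after map() has returned.
    with Pool(processes=2) as pool:
        results = pool.map(fetch_ticks, TICKERS)
    print(build_signals(results))
```

With the synthetic rising candles the Tenkan line stays above the Kijun line, so both tickers come out as NO; real Bittrex data would be parsed with json.loads as in the snippet above.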

python

for-loop

pool

multiprocessing

pandas