Merging pandas DataFrames generated with a loop on SQL Database Data

Question

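This works, but the output does not line up on the index (Date). New columns are added, but each one starts after the last row of the previous DataFrame, i.e. the data is stacked on "top" of each other, so the Date index is repeated. Is there a way to iterate and create columns that are matched up by date?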

indexReturnData = pd.DataFrame()
indexcount = int(input("how many indices do you want to use?  Enter a quantity: "))
i=0
while i < indexcount:
    indexList = pd.read_sql_query('SELECT * FROM Instruments', conn)
    indexList = indexList[['InstrumentName','InstrumentID']]
    indexList
    indexListFind = input('Enter partial index name: ')
    indexList = indexList[indexList['InstrumentName'].str.contains(indexListFind, case=False)]
    # #need to add and if else statement in case of errors....
    indexList = pd.DataFrame(indexList)
    print(indexList)
    indexID = input('Pick/Type in an INDEX list ID ("InstrumentID") From the List Above: ')
    
    indexName = indexList.query('InstrumentID ==' + str(indexID))['InstrumentName']
    indexName = list(indexName)  
    indexReturns = pd.read_sql_query("""
        SELECT *
        FROM InstrumentPrices 
        WHERE InstrumentID=""" + indexID
        , conn)

    indexReturns = indexReturns.filter(['ReportingDate', 'Returns'])  
    indexReturns = indexReturns.rename(columns={'ReportingDate': 'Date','Returns': indexName[0]})
    indexReturns = indexReturns.set_index('Date')
    
    indexReturnData = indexReturnData.append(indexReturns)
    i += 1

Output:

     Date       S&P500  S&P600
308 9/1/1995    0.042   
309 10/1/1995  -0.004   
310 11/1/1995   0.044   
311 12/1/1995   0.019   
…..    …..       …..      …..
603 4/1/2020    0.128   
604 5/1/2020    0.048   
605 6/1/2020    0.020   
606 7/1/2020    0.056   
623 9/1/1995             0.025
624 10/1/1995           -0.050
625 11/1/1995            0.038
626 12/1/1995            0.016
…..    …..       …..      …..
918 4/1/2020             0.126
919 5/1/2020             0.041
920 6/1/2020             0.036
921 7/1/2020             0.040

Thanks!

Answer 1

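Going only by what your current output is and what I think your desired output is, I think you can get away with just df.groupby('Date').sum(). Running it groups all the duplicates in the 'Date' column and sums every value it finds for each column. If I understand correctly, each column has only one value per date, so it will "sum" that single number: in other words, it will return that number.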
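To make the idea concrete, here is a minimal sketch of the stacking problem and the groupby fix. The two small frames (`sp500`, `sp600`) are hypothetical stand-ins for what the loop produces, built from a couple of the values shown in the question, and `Date` is kept as an ordinary column here for simplicity.

```python
import pandas as pd

# Hypothetical stand-ins for two per-index frames built inside the loop.
sp500 = pd.DataFrame({"Date": ["9/1/1995", "10/1/1995"], "S&P500": [0.042, -0.004]})
sp600 = pd.DataFrame({"Date": ["9/1/1995", "10/1/1995"], "S&P600": [0.025, -0.050]})

# Row-wise stacking reproduces the problem: every date appears twice,
# with NaN in whichever column the row did not come from.
stacked = pd.concat([sp500, sp600], ignore_index=True)

# Grouping on Date collapses the duplicates. sum() skips NaN, so each cell
# ends up holding the single real value for that date and column.
collapsed = stacked.groupby("Date").sum()
print(collapsed)
```

One thing to watch: if some date had no value at all in a column, sum() would report 0 rather than NaN. Passing `min_count=1` to `sum()` keeps such cells as NaN.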

I copied the small output sample from the question (dropping the blank rows), and df.groupby('Date').sum() gave me this:

           S&P500  S&P600
Date
10/1/1995  -0.004  -0.050
11/1/1995   0.044   0.038
12/1/1995   0.019   0.016
4/1/2020    0.128   0.126
5/1/2020    0.048   0.041
6/1/2020    0.020   0.036
7/1/2020    0.056   0.040
9/1/1995    0.042   0.025
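As an alternative to fixing the result after the fact, the frames could be aligned on the date while they are combined. This is not what the answer above does; it is a sketch that swaps in `pd.concat(..., axis=1)`, which joins frames column-wise on their index. The in-memory SQLite database, the `load_returns` helper, and the `chosen` dict (standing in for the interactive `input()` picks) are all assumptions made so the example runs on its own; only the table and column names come from the question. It assumes each instrument has at most one row per date. It also avoids `DataFrame.append`, which was removed in pandas 2.0.

```python
import sqlite3
import pandas as pd

# Assumption: an in-memory SQLite database stands in for the real connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE InstrumentPrices (InstrumentID INTEGER, ReportingDate TEXT, Returns REAL)"
)
conn.executemany(
    "INSERT INTO InstrumentPrices VALUES (?, ?, ?)",
    [
        (1, "9/1/1995", 0.042), (1, "10/1/1995", -0.004),
        (2, "9/1/1995", 0.025), (2, "10/1/1995", -0.050),
    ],
)

def load_returns(conn, instrument_id, name):
    """Hypothetical helper: one instrument's returns as a Date-indexed, one-column frame."""
    df = pd.read_sql_query(
        "SELECT ReportingDate, Returns FROM InstrumentPrices WHERE InstrumentID = ?",
        conn,
        params=(instrument_id,),  # parameterized instead of string concatenation
    )
    df = df.rename(columns={"ReportingDate": "Date", "Returns": name})
    return df.set_index("Date")

# Hypothetical picks; in the question these come from input() prompts.
chosen = {1: "S&P500", 2: "S&P600"}

# Collect one frame per instrument, then join them column-wise on the Date index.
frames = [load_returns(conn, iid, name) for iid, name in chosen.items()]
index_return_data = pd.concat(frames, axis=1)
print(index_return_data)
```

Building a list and concatenating once at the end also avoids re-copying the accumulated frame on every pass through the loop.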

python

sql

pyodbc

dataframe

pandas