Pandas DataFrame selection based on data from another DataFrame
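I have a DataFrame with a number of ratios. I am creating a grouped DataFrame, pgDF, and want to query summDF based on the mean values calculated from the grouped data.
My idea is to create a selection like the one below. EquityMultiplierRatio is one of the columns in summDF. I want to look up the mean in the pgDF DataFrame based on the Industry and Segment of each summDF row.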
summDF[summDF['EquityMultiplierRatio'] < pgDF.loc[summDF['Industry'], summDF['Segment']]['meanEquityMultiplierRatio']]
dataDF = summDF.select_dtypes(include=['float64']).applymap(dataFormatter)
strDF = summDF.select_dtypes(include=['object'])
#summDF = summDF.apply(dataFormatter)
# print(summDF.head(10))
summDF = pd.concat([strDF,dataDF], axis=1)
#colsd = ['Symbol', 'Name']
colsd = []
for x in summDF.columns:
    if 'Ratio' in x:
        colsd.append(x)
groupInd = summDF.groupby(['Industry', 'Segment'])
pgDF = groupInd[colsd].mean()#.to_csv('GroupedData.csv')
pgDF.columns = ['mean' + x for x in pgDF.columns]
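The grouping step above can be tried in isolation. The following is a minimal sketch on a made-up four-row stand-in for summDF (the symbols and values are hypothetical, not taken from the real data). It uses a list comprehension and `add_prefix` in place of the loop and the manual column rename, which should give the same result.

```python
import pandas as pd

# Hypothetical stand-in for summDF; values are invented for illustration.
summDF = pd.DataFrame({
    "Symbol": ["A", "B", "C", "D"],
    "Industry": ["Basic Materials", "Basic Materials",
                 "Basic Materials", "Consumer Goods"],
    "Segment": ["Chemicals", "Chemicals", "Mining", "Automobiles & Parts"],
    "EquityMultiplierRatio": [2.0, 3.0, 2.5, 4.0],
    "MarginRatio": [0.10, 0.20, 0.30, 0.40],
})

# Pick every column whose name contains 'Ratio'.
ratio_cols = [c for c in summDF.columns if "Ratio" in c]

# Mean per (Industry, Segment); the result is indexed by a two-level MultiIndex.
pgDF = summDF.groupby(["Industry", "Segment"])[ratio_cols].mean().add_prefix("mean")
print(pgDF)
```

Note that pgDF ends up indexed by (Industry, Segment) rather than carrying them as ordinary columns, which matters for how it can be matched back to summDF.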
summDF data
     Symbol                   Name  PriceCCY  YearLowDate  YearHighDate  \
0  NESN:VTX              Nestle SA       CHF  Dec 02 2016   Jun 26 2017
1  NOVN:VTX            Novartis AG       CHF  Nov 04 2016   Jun 23 2017
2   ROG:VTX       Roche Holding AG       CHF  Dec 02 2016   May 09 2017
3  HSBA:LSE      HSBC Holdings PLC       GBX  Aug 31 2016   Jul 31 2017
4  RDSA:LSE  Royal Dutch Shell PLC       GBX  Sep 27 2016   Jan 16 2017

   MarketCapCCY  EPSCCY  AnnualDiviCCY  AnnualDiviYield   DiviExDate  ...  \
0           CHF     CHF            CHF            2.84%  Apr 10 2017  ...
1           CHF     CHF            CHF            3.47%  Mar 02 2017  ...
2           CHF     CHF            CHF            3.40%  Mar 16 2017  ...
3           GBP     GBP            GBX            5.32%  Aug 03 2017  ...
4           EUR     EUR            GBX            6.91%  Aug 10 2017  ...

   STDebttoICRatio  TaxBurdenRatio  InterestBurdenRatio  MarginRatio  \
0           0.1424          0.7092               0.9051       0.1547
1           0.0577          0.8569               0.9168       0.1726
2           0.1189          0.7483               0.9498       0.2708
3              NaN          1.0000                  NaN          NaN
4           0.0298          0.8521               0.7364       0.0326

   AssetTurnoverRatio  EquityMultiplierRatio  ReturnonEquityRatio  \
0              0.6783                 2.0421               0.1375
1              0.3795                 1.7389               0.0895
2              0.6584                 3.2127               0.4071
3                 NaN                13.5415               0.0406
4              0.5680                 2.2035               0.0256

   PtoTBVPerShare  PtoBVPerShare  PtoSales
0         22.5302         3.9019    2.8169
1         15.9968         2.6747    4.0528
2        355.8525         8.6764    4.1020
3          1.2524         1.1000       NaN
4          1.3780         1.2010    0.9597
pgDF data
                                     meanCashOprRatio  meanCurrentRatio  \
Industry        Segment
Basic Materials Chemicals                    0.578000          1.859360
                Forestry & Paper             0.582920          1.328260
                Industrial Metals            0.330800          1.564900
                Mining                       1.566150          2.814140
Consumer Goods  Automobiles & Parts          0.272469          1.476006

                                     meanQuickRatio  meanCashRatio  \
Industry        Segment
Basic Materials Chemicals                  1.180216       0.463956
                Forestry & Paper           0.770320       0.217080
                Industrial Metals          0.774533       0.365133
                Mining                     2.027540       1.401680
Consumer Goods  Automobiles & Parts        1.113837       0.598381

                                     meanDebttoAssetRatio  \
Industry        Segment
Basic Materials Chemicals                        0.538152
                Forestry & Paper                 0.505640
                Industrial Metals                0.524033
                Mining                           0.501900
Consumer Goods  Automobiles & Parts              0.658581

                                     meanDebtoCapitalRatio  \
Industry        Segment
Basic Materials Chemicals                         0.339304
                Forestry & Paper                  0.302560
                Industrial Metals                 0.279833
                Mining                            0.391257
Consumer Goods  Automobiles & Parts               0.426271

                                     meanDebttoEquityRatio  \
Industry        Segment
Basic Materials Chemicals                         0.652592
                Forestry & Paper                  0.446280
                Industrial Metals                 0.442300
                Mining                            0.695986
Consumer Goods  Automobiles & Parts               0.921236

                                     meanInterestCoverageRatio  \
Industry        Segment
Basic Materials Chemicals                            66.589904
                Forestry & Paper                     13.094880
                Industrial Metals                    14.631200
                Mining                               38.101220
Consumer Goods  Automobiles & Parts                  25.721947

                                     meanGrossProfitMarginRatio  \
Industry        Segment
Basic Materials Chemicals                              0.660825
                Forestry & Paper                       0.668180
                Industrial Metals                      0.770800
                Mining                                 0.597567
Consumer Goods  Automobiles & Parts                    1.922875

                                     meanOperatingProfitMarginRatio  \
Industry        Segment
Basic Materials Chemicals                                  0.132728
                Forestry & Paper                           0.105120
                Industrial Metals                          0.077233
                Mining                                     0.201630
Consumer Goods  Automobiles & Parts                       87.706919

                                     meanNetProfitMarginRatio  \
Industry        Segment
Basic Materials Chemicals                            0.103144
                Forestry & Paper                     0.087080
                Industrial Metals                    0.052533
                Mining                               0.163700
Consumer Goods  Automobiles & Parts                 85.941688

                                     meanReturnonAssetsRatio  \
Industry        Segment
Basic Materials Chemicals                           0.108065
                Forestry & Paper                    0.096200
                Industrial Metals                   0.060967
                Mining                              0.109590
Consumer Goods  Automobiles & Parts                 0.074853

                                     meanLTDebttoICRatio  meanSTDebttoICRatio  \
Industry        Segment
Basic Materials Chemicals                       0.258408             0.056579
                Forestry & Paper                0.220620             0.078540
                Industrial Metals               0.182733             0.089633
                Mining                          0.271450             0.070014
Consumer Goods  Automobiles & Parts             0.318544             0.281814

                                     meanTaxBurdenRatio  \
Industry        Segment
Basic Materials Chemicals                      0.880248
                Forestry & Paper               0.913420
                Industrial Metals              0.703733
                Mining                         0.874950
Consumer Goods  Automobiles & Parts            0.875575

                                     meanInterestBurdenRatio  meanMarginRatio  \
Industry        Segment
Basic Materials Chemicals                           0.805548         0.136052
                Forestry & Paper                    0.781280         0.120640
                Industrial Metals                   0.824800         0.086767
                Mining                              0.691380         0.244280
Consumer Goods  Automobiles & Parts                 0.840720        93.815747

                                     meanAssetTurnoverRatio  \
Industry        Segment
Basic Materials Chemicals                          0.895612
                Forestry & Paper                   0.791480
                Industrial Metals                  0.716767
                Mining                             0.531190
Consumer Goods  Automobiles & Parts                0.864700

                                     meanEquityMultiplierRatio  \
Industry        Segment
Basic Materials Chemicals                             2.400252
                Forestry & Paper                      2.043860
                Industrial Metals                     2.195533
                Mining                                2.150580
Consumer Goods  Automobiles & Parts                   3.907062

                                     meanReturnonEquityRatio
Industry        Segment
Basic Materials Chemicals                           0.165624
                Forestry & Paper                    0.143180
                Industrial Metals                   0.075767
                Mining                              0.157280
Consumer Goods  Automobiles & Parts                 0.262512
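It looks like I have solved it, so I thought I would share. Not sure whether it will help anyone else, but..
I wanted something more like a SQL inner join to query the two DataFrames, so I first merged the data with an inner join, matching the Industry and Segment columns of one DF against the index of the other.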
mergeDF = pd.merge(summDF, pgDF, left_on=['Industry', 'Segment'], right_index=True, how='inner')
After that, querying the merged DataFrame became much easier.
mergeDF[mergeDF['EquityMultiplierRatio'] < mergeDF['meanEquityMultiplierRatio']]
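For anyone who wants to try the merge-then-filter approach without the original data, here is a minimal self-contained sketch. The five rows and their values are hypothetical; only the column names and the `pd.merge` call mirror the code above.

```python
import pandas as pd

# Hypothetical stand-in for summDF; values are invented for illustration.
summDF = pd.DataFrame({
    "Symbol": ["A", "B", "C", "D", "E"],
    "Industry": ["Basic Materials", "Basic Materials", "Basic Materials",
                 "Consumer Goods", "Basic Materials"],
    "Segment": ["Chemicals", "Chemicals", "Mining",
                "Automobiles & Parts", "Mining"],
    "EquityMultiplierRatio": [2.0, 3.0, 2.5, 4.0, 1.5],
})

# Group means, indexed by (Industry, Segment), with 'mean'-prefixed columns.
pgDF = (summDF.groupby(["Industry", "Segment"])[["EquityMultiplierRatio"]]
        .mean()
        .add_prefix("mean"))

# Match the two key columns of summDF against the two index levels of pgDF.
mergeDF = pd.merge(summDF, pgDF, left_on=["Industry", "Segment"],
                   right_index=True, how="inner")

# Each row now carries its own group's mean, so a plain comparison works.
below = mergeDF[mergeDF["EquityMultiplierRatio"] < mergeDF["meanEquityMultiplierRatio"]]
print(sorted(below["Symbol"]))
```

The number of names in `left_on` has to match the number of levels in pgDF's index, which is why both Industry and Segment are listed.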
Hope that helps..
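As a possible alternative to the merge (not what the answer above uses), `groupby(...).transform('mean')` returns the group mean aligned to the original rows, so the comparison can be done directly on summDF. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical stand-in for summDF; values are invented for illustration.
summDF = pd.DataFrame({
    "Symbol": ["A", "B", "C", "D", "E"],
    "Industry": ["Basic Materials", "Basic Materials", "Basic Materials",
                 "Consumer Goods", "Basic Materials"],
    "Segment": ["Chemicals", "Chemicals", "Mining",
                "Automobiles & Parts", "Mining"],
    "EquityMultiplierRatio": [2.0, 3.0, 2.5, 4.0, 1.5],
})

# One mean per row, taken from that row's (Industry, Segment) group.
group_mean = (summDF.groupby(["Industry", "Segment"])["EquityMultiplierRatio"]
              .transform("mean"))

# Keep the rows that sit below their own group's mean.
below = summDF[summDF["EquityMultiplierRatio"] < group_mean]
print(below["Symbol"].tolist())  # ['A', 'E']
```

This skips the intermediate pgDF entirely, so it fits best when the group means are only needed for the filter and not as a table of their own.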