Python & Pandas: How to get all the rows that belong to each of the 4 quartiles of the method "describe"?

Question

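Good evening!

I'm new to coding, my English is not very good, and this is my second post, so please be patient with me =]

I have a huuuge csv file (more than 500k rows) with a lot of interest rates in the last column.

I need to:

a) Use the describe method on the vr_tx_jrs column to get the minimum, the maximum, and the 4 quartiles of the interest rates for the whole csv, after cleaning. I have already done this.

b) Create 4 DataFrames, one per quartile, to store all the interest rates (vr_tx_jrs) that belong to each of the 4 quartiles, and then use the describe method on each DataFrame, because I need the median of each of the 4 quartiles. **I am stuck here. I don't know how to proceed and I need your help, guys =D**

c) Calculate the frequency of each of these 4 quartiles. Since I am stuck on letter b, I haven't got here yet. But I think I need to take the len of the rows of each of these 4 DataFrames and divide it by the len of the whole csv after cleaning, which would give me the frequency of each quartile.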
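The calculation described in (c) could be sketched roughly as follows. This is only an illustration on made-up data: the toy rates and the names `bounds`, `total`, and `frequencies` are assumptions, and only the column name `vr_tx_jrs` comes from the post.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned vr_tx_jrs column.
rates = pd.Series(
    [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0], name="vr_tx_jrs"
)

# Quartile boundaries: min, 25%, 50%, 75%, max (the same values describe() reports).
bounds = rates.quantile([0.0, 0.25, 0.5, 0.75, 1.0]).tolist()

total = len(rates)
frequencies = {}
for i in range(4):
    lower, upper = bounds[i], bounds[i + 1]
    if i == 0:
        # The first quartile includes the minimum itself.
        mask = (rates >= lower) & (rates <= upper)
    else:
        # Later quartiles exclude their lower bound so no row is counted twice.
        mask = (rates > lower) & (rates <= upper)
    # Frequency = rows in this quartile / rows in the whole cleaned data.
    frequencies[f"Q{i + 1}"] = mask.sum() / total

print(frequencies)
```

Since quartiles split the data at the 25%, 50%, and 75% points, each frequency should come out close to 0.25; larger deviations usually mean many tied values sit on a boundary.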

This is the code I have so far:

import pandas as pd
import numpy as np

# Importing and cleaning data
df_quart = pd.read_csv(r"C:\Users\base_ob.csv", encoding='Latin-1', sep=";")
df_quart.head()
# Rates use a decimal comma, so swap it for a dot before converting to float
df_quart['vr_tx_jrs'] = df_quart['vr_tx_jrs'].str.replace(',', '.').astype(np.float64)
df_quart['nr_cic'] = df_quart['nr_cic'].astype(np.int64)
df_quart.dtypes
df_quart.describe()
# Cleaning duplicates: rows sharing the same nr_cic are replaced by their mean
df_quart = df_quart.groupby('nr_cic').mean().reset_index()

# Here is the output for letter "a": a new dataframe storing the minimum and
# maximum interest rates and the 4 quartiles of the whole CSV
df_final = df_quart.describe()
df_final.to_excel(r"C:\Users\describe_base_ob.xlsx")

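Now I am stuck on letter "b" and I need your help, guys. I have searched the web a lot, but I can't figure out how to get all the rows that belong to each of the 4 quartiles reported by the "describe" method and store them in 4 new DataFrames, one per quartile.

Can you help me?

Thanks, and have a nice day, everyone!! =D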

Answer 1

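Is this what you are looking for:

# Quartile value (0.25 is the first quartile)
qtile_value = 0.25

# Make a new dataframe as a subset of the original, filtering for all values
# at or below the quartile value
quart_1 = df_quart[df_quart['vr_tx_jrs'] <= np.quantile(df_quart['vr_tx_jrs'], qtile_value)]

Just repeat quart_1 for the other 3 quartiles (0.50, 0.75, 1.00), adding a lower bound at the previous quartile value so that each subset holds only its own quartile.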
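As an alternative to writing four filters by hand, `pd.qcut` can label every row with its quartile in one pass. The sketch below is a minimal example on made-up data, not a drop-in solution: the toy values, the `quartile` helper column, and the names `quartile_frames` and `medians` are assumptions for illustration, and only `vr_tx_jrs` comes from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned DataFrame.
df_quart = pd.DataFrame({"vr_tx_jrs": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]})

# Label each row with the quartile bin its rate falls into.
labels = ["Q1", "Q2", "Q3", "Q4"]
df_quart["quartile"] = pd.qcut(df_quart["vr_tx_jrs"], q=4, labels=labels)

# One DataFrame per quartile, keyed by its label.
quartile_frames = {
    label: group.drop(columns="quartile")
    for label, group in df_quart.groupby("quartile", observed=True)
}

# describe() on each subset; its "50%" row is the median of that quartile.
medians = {
    label: frame["vr_tx_jrs"].describe()["50%"]
    for label, frame in quartile_frames.items()
}

print(medians)
```

One caveat: with heavily repeated rates, two bin edges can coincide and `pd.qcut` raises an error. In that case the explicit quantile filters shown above are the more predictable route.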
