Extracting data from generator object
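I'm new to Python, so thanks in advance for your help.
I'm using several CSV files to build a DataFrame that I want to filter with several pandas.asfreq() options, then create a generator object, sort, and list the top results.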
import pandas as pd
import numpy as np
N = 100
dates = pd.date_range('19971002', periods=N, freq='B')
df=pd.DataFrame(np.random.randn(len(dates),1),index=dates,columns=list('A'))
df1=pd.DataFrame(np.random.randn(len(dates),1),index=dates,columns=list('B'))
pieces = (df, df1)
data = pd.concat((pieces), join='outer', axis = 1)
df['custIndex'] = (df.groupby([df.index.year, df.index.month]).cumcount()+1) # 'CI' = custIndex increments by 1 for each occurrence since month inception
data.head()
time_sets = ['W-Mon', 'W-Tue']
for time_set in time_sets:
    grouped = data.asfreq(time_set).groupby(df.custIndex).sum()
    print(time_set)
    print(grouped.head())
W-Mon
                  A         B
custIndex
1          1.827512 -0.487051
3         -0.463776 -0.002071
6          2.074173 -0.232500
8         -0.282901  0.575820
11         0.505265 -3.844740
W-Tue
                  A         B
custIndex
2          1.347802 -0.738638
4          0.273424  0.218833
7          1.439177  3.671049
9          1.722703 -0.962877
12        -3.415453  1.123824
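This is where I'm having trouble. The goal is to sort the value columns 'A' and 'B' (highest values first), extract the custIndex entries with the highest values, and list the custIndex, value, and column.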
t = (group.sort_index(by='',ascending=True)for key, group in grouped)
I need help with what to sort by; I tried several options ('CI', 'key') without success.
t
<generator object <genexpr> at 0x000000000AA9A318>
top = pd.DataFrame()
for line in t:
    top = top.append(line)
ValueError: need more than 1 value to unpack
The goal is something like this:
custIndex  value     time_set  Column
6          2.074173  W-MON     A
1          1.827512  W-MON     A
9          1.722703  W-TUE     A
Thanks again.
To make your generator expression work, you need to modify it as follows:
t = (group.sort_index(ascending=True) for key, group in grouped.iteritems())
Even though it may 'work', it still may not do what you expect. To see the output, you can try:
for line in t:
    print(line)
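The error in the question most likely comes from iterating over a DataFrame directly: that yields only the column labels, so `for key, group in grouped` tries to unpack the one-character string 'A' into two names. Here is a minimal sketch of that behaviour on a small made-up frame. The values are illustrative, `items()` is the spelling used by current pandas releases in place of `iteritems()`, and `sort_values` is my assumption about what "sort by value" should mean here (`sort_index` only orders by custIndex).

```python
import pandas as pd

# Small illustrative frame standing in for `grouped` (values are made up).
grouped = pd.DataFrame({'A': [1.8, -0.5, 2.1], 'B': [-0.5, 0.0, -0.2]},
                       index=pd.Index([1, 3, 6], name='custIndex'))

# Iterating a DataFrame directly yields its column labels only.
labels = list(grouped)

# Unpacking a one-character label into two names raises ValueError.
try:
    pairs = [(key, group) for key, group in grouped]
    unpack_error = None
except ValueError as exc:
    unpack_error = exc

# items() yields (column label, Series) pairs, which can be unpacked.
# Each column is sorted by value here, highest first.
sorted_columns = {key: col.sort_values(ascending=False)
                  for key, col in grouped.items()}
```

Each entry of `sorted_columns` is a Series indexed by custIndex, so the first index label of a column is the custIndex with its highest value.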
As for a suggested solution, how about:
top_n = 5  # The number of top items returned.
goal = pd.DataFrame([[None] * 4] * top_n,  # 4 = number of columns
                    columns=['custIndex', 'value', 'time_set', 'Column'])
for time_set in time_sets:
    grouped = data.asfreq(time_set).groupby(df.custIndex).sum()
    t = (group for group in grouped.unstack().iteritems())
    for [column, custIndex], val in t:
        if val > min(goal.value):
            # Overwrite the last (smallest) row of goal, then re-sort.
            goal.iloc[-1] = [custIndex, val, time_set, column]
            goal.sort('value', ascending=False, inplace=True)
goal.set_index(['custIndex', 'time_set', 'Column'], inplace=True)
>>> goal
                              value
custIndex time_set Column
12        W-Tue    B       3.048822
5         W-Fri    A        2.63997
18        W-Wed    B       2.570899
10        W-Wed    B       2.493457
19        W-Thu    B       2.164974
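In current pandas releases `DataFrame.sort` and `iteritems` are no longer available, so the loop above would need adapting. The following is a minimal sketch of one alternative, not the method used above: it reshapes each grouped result to long form with `melt` and picks the largest values with `nlargest`. The helper name `top_values`, the seeded random data, and the upper-case frequency strings are my assumptions for illustration.

```python
import numpy as np
import pandas as pd


def top_values(data, cust_index, time_sets, top_n=5):
    """Return the top_n largest grouped sums across all time sets."""
    frames = []
    for time_set in time_sets:
        sampled = data.asfreq(time_set)
        # Align the grouping key with the sampled dates explicitly.
        key = cust_index.reindex(sampled.index).rename('custIndex')
        grouped = sampled.groupby(key).sum()
        # Long form: one row per (custIndex, Column) pair.
        long_form = grouped.reset_index().melt(
            id_vars='custIndex', var_name='Column', value_name='value')
        long_form['time_set'] = time_set
        frames.append(long_form)
    combined = pd.concat(frames, ignore_index=True)
    top = combined.nlargest(top_n, 'value')
    return top[['custIndex', 'value', 'time_set', 'Column']].reset_index(drop=True)


# Illustrative data shaped like the question's (seeded so runs repeat).
rng = np.random.default_rng(0)
dates = pd.date_range('1997-10-02', periods=100, freq='B')
data = pd.DataFrame(rng.standard_normal((len(dates), 2)),
                    index=dates, columns=['A', 'B'])
# custIndex counts business days since the start of each month.
cust_index = data.groupby([data.index.year, data.index.month]).cumcount() + 1

top = top_values(data, cust_index, ['W-MON', 'W-TUE'], top_n=5)
print(top)
```

Collecting every candidate first and selecting once avoids re-sorting inside the loop, and it sidesteps comparing floats against the `None` placeholders, which Python 3 does not allow.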