Pandas

Question

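New to pandas. I'm trying to do some cohort analysis on client usage data. The values are stored by calendar date, but I want to analyze them relative to each client's "start" date. The DataFrame contains zeros before each client's start date.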

The data looks like this:

          2014-06-01 2014-07-01 2014-08-01 2014-09-01 2014-10-01 2014-11-01  \
100003211          0          0          0          0          0          0   
100000006          0          0          0          0         88        334   
100000018          0          0        332          0          0          0   
100000019          0          0          0        138        177          6   
100000023        558        179        243          0          0          0   
100000035          0          0        115          1          0          0

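My mental picture of what I'm trying to do: strip the zeros from each row up to its leftmost non-zero value, then "left align" the row. Every row would then start with a non-zero number and continue as before.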
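For a single row, NumPy's np.trim_zeros already performs the "strip zeros up to the leftmost non-zero value" step when called with trim='f' (front only). A minimal sketch on one row of the sample data:

```python
import numpy as np

# One client's monthly values (client 100000018 in the sample above)
row = np.array([0, 0, 332, 0, 0, 0])

# trim='f' strips zeros from the front only; trailing zeros are kept
aligned = np.trim_zeros(row, trim='f')
print(aligned.tolist())  # [332, 0, 0, 0]
```

Each trimmed row ends up with its own length, so the rows have to be padded (with zeros or NaN) before they can go back into a single DataFrame.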

Here is the loop I tried for turning the DataFrame above into a "cohorted" DataFrame:

for client_id,row in df_raw.iterrows():
    while not row.empty and row[:0] == 0:
        row.pop(0)
    df_cohorted[client_id] = row

...but I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
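The error comes from the while condition. row[:0] is an empty slice of the row, so row[:0] == 0 is an empty boolean Series rather than a single True/False, and `and` has to call bool() on it, which pandas refuses for any Series. The first element as a scalar is row.iloc[0]. A minimal reproduction, with a shortened row made up from the sample data:

```python
import pandas as pd

row = pd.Series([0, 0, 332], index=["2014-06-01", "2014-07-01", "2014-08-01"])

cond = row[:0] == 0               # an empty boolean Series, not a scalar
first_is_zero = row.iloc[0] == 0  # a single scalar bool

try:
    bool(cond)                    # what `while ... and cond` does implicitly
    raised = False
except ValueError:
    raised = True                 # "The truth value of a Series is ambiguous..."
```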

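Beyond the error, this doesn't even look like the right approach. From reading other threads, it seems I might want to transpose and then use a map function?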

Any suggestions are welcome, whether a different approach or (if my approach is the best one) help pinning down the problem.

EDIT

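I'd like the output to look something like this. The first column holds each row's first non-zero month, the second holds the calendar month after that, and so on.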

            1st  2nd  3rd  4th  5th  6th
100003211     0    0    0    0    0    0   
100000006    88  334   
100000018   332    0    0    0   
100000019   138  177    6   
100000023   558  179  243    0    0    0   
100000035   115    1    0    0
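A minimal sketch of the original loop with its two problems fixed, assuming it is fine to work on plain Python lists: compare the scalar first value instead of a Series, and remove elements by position (Series.pop looks items up by label, and these rows are labelled by date). Because the rows end up with different lengths, building the DataFrame pads the shorter rows with NaN, which gives the ragged layout above. The names cohorted and df_cohorted are illustrative, and columns 0, 1, 2, ... stand for the 1st, 2nd, 3rd month.

```python
import pandas as pd

# Two rows shaped like the question's data
df_raw = pd.DataFrame(
    [[0, 0, 0, 0, 88, 334],
     [0, 0, 332, 0, 0, 0]],
    index=[100000006, 100000018],
    columns=["2014-06-01", "2014-07-01", "2014-08-01",
             "2014-09-01", "2014-10-01", "2014-11-01"],
)

cohorted = {}
for client_id, row in df_raw.iterrows():
    values = row.tolist()
    # values[0] is a single scalar, so the condition is a plain bool
    while values and values[0] == 0:
        values.pop(0)  # list.pop(0) removes by position
    cohorted[client_id] = values

# Rows have different lengths; shorter rows are padded with NaN
df_cohorted = pd.DataFrame(list(cohorted.values()), index=list(cohorted.keys()))
print(df_cohorted)
```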

Answer 1

Starting from this DataFrame:

           2014-06-01   2014-07-01  2014-08-01  2014-09-01  2014-10-01  2014-11-01
100003211   0           0           0           0           0           0
100000006   0           0           0           0           88          334
100000018   0           0           332         0           0           0
100000019   0           0           0           138         177         6
100000023   558         179         243         0           0           0
100000035   0           0           115         1           0           0

and defining this function:

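import numpy as np
import pandas as pd

def getLeftAlignSeries(s):
    a = np.trim_zeros(s.to_numpy())  # strip leading/trailing zeros
    b = np.pad(a, (0, len(s) - len(a)), mode='constant', constant_values=0)  # right-pad back to full width
    return pd.Series(b, index=s.index)  # keep column labels so apply() returns a DataFrame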

then calling apply() row-wise (axis=1):

dfNew = df.apply(getLeftAlignSeries, axis=1)
dfNew

            2014-06-01  2014-07-01  2014-08-01  2014-09-01  2014-10-01  2014-11-01
100003211   0           0           0           0           0           0
100000006   88          334         0           0           0           0
100000018   332         0           0           0           0           0
100000019   138         177         6           0           0           0
100000023   558         179         243         0           0           0
100000035   115         1           0           0           0           0
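For larger frames, the same left alignment can be done without calling a Python function once per row. A minimal sketch, assuming NumPy fancy indexing is acceptable and that the columns should be renumbered 1, 2, 3, ... (months since each client's first non-zero month, as in the question's edit). left_align is a hypothetical helper name, not a pandas function. It pads with 0 like the answer above; passing fill=np.nan gives the ragged look from the question instead.

```python
import numpy as np
import pandas as pd

def left_align(df: pd.DataFrame, fill=0) -> pd.DataFrame:
    arr = df.to_numpy()
    n_rows, n_cols = arr.shape
    # argmax on a boolean array returns the first True, i.e. the first
    # non-zero position; an all-zero row gives 0 and is left unchanged
    first = (arr != 0).argmax(axis=1)
    # For row i, read columns first[i], first[i] + 1, ...
    cols = first[:, None] + np.arange(n_cols)
    valid = cols < n_cols  # positions past the last column are filled
    shifted = arr[np.arange(n_rows)[:, None], np.where(valid, cols, 0)]
    out = np.where(valid, shifted, fill)
    return pd.DataFrame(out, index=df.index, columns=range(1, n_cols + 1))

df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 88, 334],
     [0, 0, 332, 0, 0, 0],
     [0, 0, 0, 138, 177, 6],
     [558, 179, 243, 0, 0, 0],
     [0, 0, 115, 1, 0, 0]],
    index=[100003211, 100000006, 100000018, 100000019, 100000023, 100000035],
    columns=["2014-06-01", "2014-07-01", "2014-08-01",
             "2014-09-01", "2014-10-01", "2014-11-01"],
)
print(left_align(df))
```

The per-row Python call is replaced by a single gather over the whole array, which tends to matter once there are many clients; for a handful of rows either version is fine.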

Pandas - cohorting in a dataframe