How to tell if a value changed over dimension(s) in Pandas

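Suppose I have customer data for certain dates, and I want to see whether each customer's address changed across those dates. Ideally, I'd like to copy the two columns where changes occurred into a new table, or just get a metric of the total number of changes.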

So, if I had a table like

Date , Customer , Address
12/31/14, Cust1, 12 Rocky Hill Rd
12/31/15, Cust1, 12 Rocky Hill Rd
12/31/16, Cust1, 14 Rocky Hill Rd
12/31/14, Cust2, 12 Testing Rd
12/31/15, Cust2, 12 Testing Ln
12/31/16, Cust2, 12 Testing Rd

I would end up with a count of two changes: Cust1's change from 12 Rocky Hill Rd to 14 Rocky Hill Rd between 12/31/15 and 12/31/16, and Cust2's change between 12/31/14 and 12/31/15.

Ideally I could get a table like this

Dates , Customer , Change
12/31/15 to 12/31/16, Cust1, 12 Rocky Hill Rd to 14 Rocky Hill Rd
12/31/14 to 12/31/15, Cust2, 12 Testing Rd to 12 Testing Ln

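Or even just a total count of changes would be great. Any ideas? Ideally I'd have more dates, possibly multiple changes across those dates, and possibly additional columns I'd like to check for changes as well. Really, just a summary of the changes made to customer records, per column, over some date period would be enough.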

I'm new to Pandas and not really sure where to start.

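Edit: As I noted on the solution below, I'd like to be able to pass a larger dataframe, rather than just an address, to detect changes. For example, I did this in R using something like the following:

#How many changes have occurred (unique values - 1)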
UniLen <-  function(x){
  x <- length(unique(x))-1
  return(x)
}
#Create a vector of Address Features to check for changes in
Address_Features <- c("AddrLine1", "AddrLine2", "AddrLine3", "CityName", "State", "ZipCodeNum", "County")
#Check for changes in each address 'use this address for description' for each customer
AddressChanges_Detail <- mktData[,c("CustomerNumEID","AddressUniqueRelationDesc",Address_Features)] %>%
  group_by(CustomerNumEID, AddressUniqueRelationDesc) %>%
  summarise_each(funs(UniLen))

#Summarise results (how many changes for each feature)
AddressChanges_Summary <- AddressChanges_Detail[,Address_Features] %>%
  summarise_each(funs(sum))

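This lets us count how many changes occurred, but I'm missing the dates when the changes happened and what each feature changed from and to... It looks like the Python solutions you've suggested use .shift to get at this, rather than just summarising the unique values for some groups. Ideally, I'd like the best of both worlds :).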

df

Input dataframe:

       Date Customer           Address
0  12/31/14    Cust1  12 Rocky Hill Rd
1  12/31/15    Cust1  12 Rocky Hill Rd
2  12/31/16    Cust1  14 Rocky Hill Rd
3  12/31/14    Cust2     12 Testing Rd
4  12/31/15    Cust2     12 Testing Ln
5  12/31/16    Cust2     12 Testing Rd

Address change function:

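import pandas as pd

def changeAdd(x):
    # Rows are assumed to be sorted by date within each customer.
    prev = x.shift(1)
    # A change is a row whose address differs from the previous row's address;
    # the first row of each customer has no previous row and is skipped.
    changed = prev.Address.notnull() & (x.Address != prev.Address)
    return pd.DataFrame({'Date': prev.Date[changed] + ' to ' + x.Date[changed],
                         'Customer': x.Customer[changed],
                         'Address': prev.Address[changed] + ' to ' + x.Address[changed]})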


dm = df.groupby('Customer')\
   .apply(changeAdd)\
   .reset_index(drop=True)[['Date','Customer','Address']]

dm

Output dataframe:

                   Date Customer                               Address
0  12/31/15 to 12/31/16    Cust1  12 Rocky Hill Rd to 14 Rocky Hill Rd
1  12/31/14 to 12/31/15    Cust2        12 Testing Rd to 12 Testing Ln
2  12/31/15 to 12/31/16    Cust2        12 Testing Ln to 12 Testing Rd
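
For the more general case mentioned in the edit to the question (several columns, with the dates and the from/to values for each change), something along these lines might work. This is only a minimal sketch: the names change_log, key, date_col and value_cols are made up for illustration, the City column is added to the sample data just to show a second feature, and it assumes the date column has been parsed to datetime so that it sorts correctly.

```python
import pandas as pd

def change_log(df, key, date_col, value_cols):
    """Return one row per detected change (hypothetical helper, not a pandas API)."""
    # Sort so that "previous row" means the previous date for the same key.
    df = df.sort_values([key, date_col]).reset_index(drop=True)
    # Previous date and previous values within each key group.
    prev = df.groupby(key)[[date_col] + value_cols].shift(1)
    pieces = []
    for col in value_cols:
        # NaN != NaN is True in pandas, so do not treat two missing values as a change.
        both_missing = df[col].isna() & prev[col].isna()
        # A change needs a previous row for the same key and a different value.
        changed = prev[date_col].notna() & (df[col] != prev[col]) & ~both_missing
        pieces.append(pd.DataFrame({
            key: df.loc[changed, key],
            "Column": col,
            "DateFrom": prev.loc[changed, date_col],
            "DateTo": df.loc[changed, date_col],
            "From": prev.loc[changed, col],
            "To": df.loc[changed, col],
        }))
    return pd.concat(pieces, ignore_index=True)

# Sample data from the question, plus an assumed City column.
df = pd.DataFrame({
    "Date": ["12/31/14", "12/31/15", "12/31/16"] * 2,
    "Customer": ["Cust1"] * 3 + ["Cust2"] * 3,
    "Address": ["12 Rocky Hill Rd", "12 Rocky Hill Rd", "14 Rocky Hill Rd",
                "12 Testing Rd", "12 Testing Ln", "12 Testing Rd"],
    "City": ["Boston", "Boston", "Boston", "Salem", "Salem", "Lynn"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

log = change_log(df, "Customer", "Date", ["Address", "City"])
# Number of changes per feature.
counts = log.groupby("Column").size()
print(log)
print(counts)
```

Here log holds one row per change and counts gives the per-feature totals. Note that this counts transitions between consecutive dates, so an address that changes and later changes back is counted twice, which is different from counting unique values minus one.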