How to return the number of records that have passed since EXPR in Python
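I am trying to replicate the IBM SPSS function @SINCE using Python and Pandas, but unfortunately I am stuck on one part of the process.
If anyone knows a direct way to replicate IBM SPSS CLEM @SINCE in Python, I would really appreciate it.
Here is a link for more information: Link
IBM @SINCE function description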
"This function returns the offset of the last record where this condition was true--that is, the number of records before this one in which the condition was true. If the condition has never been true, @SINCE returns @INDEX + 1." (IBM, 2020)
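I have been trying to replicate this function from scratch, but I have not found the right approach yet.
Could you help me solve this using Python / Pandas?
Here is the problem.
My data looks like this: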
Original Data
+------+----------+
| Type | Flag |
+------+----------+
| d | |
+------+----------+
| A | myStatus |
+------+----------+
| c | |
+------+----------+
| B | myStatus |
+------+----------+
| c | |
+------+----------+
| c | myStatus |
+------+----------+
| c | |
+------+----------+
| d | |
+------+----------+
| d | |
+------+----------+
| A | myStatus |
+------+----------+
In IBM SPSS, I apply this formula to the data:
if Type = 'A' or Type = 'B' then @SINCE(Flag = 'myStatus') else -1 endif
This is the output:
+------+----------+----------------+
| Type | Flag | Expected Count |
+------+----------+----------------+
| d | | -1 |
+------+----------+----------------+
| A | myStatus | 0 |
+------+----------+----------------+
| c | | -1 |
+------+----------+----------------+
| B | myStatus | 2 |
+------+----------+----------------+
| c | | -1 |
+------+----------+----------------+
| c | myStatus | -1 |
+------+----------+----------------+
| c | | -1 |
+------+----------+----------------+
| d | | -1 |
+------+----------+----------------+
| d | | -1 |
+------+----------+----------------+
| A | myStatus | 4 |
+------+----------+----------------+
Thanks in advance.
So, I found a way to solve this problem. The code is as follows:
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Type": ["d", "A", "c", "B", "c", "c", "c", "d", "d", "A"],
                   "Flag": [np.nan, "myStatus", np.nan, "myStatus", np.nan, "myStatus", np.nan, np.nan, np.nan, "myStatus"]})
The function that solves the problem:
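def spssSince(df):
    # Keep only the rows where the condition is true; copy to avoid SettingWithCopyWarning
    df_temp = df[df.Flag == "myStatus"].copy()
    # Index of the previous row where the condition was true
    df_temp["last_ind"] = df_temp.index.to_series().shift(1)
    # The first matching row has no predecessor: use its own index so its count is 0
    # (the original fillna(1) only worked because the first match sits at index 1)
    df_temp["last_ind"] = df_temp["last_ind"].fillna(df_temp.index.to_series())
    # Number of records since the previous match
    df_temp["Expected Count"] = df_temp.index - df_temp["last_ind"]
    # Rows whose Type is neither A nor B get -1
    df_temp.loc[~df_temp.Type.isin(["A", "B"]), "Expected Count"] = -1
    # Attach the counts to the full frame; rows without a match get -1
    DFreturn = pd.merge(left=df, right=df_temp[["Expected Count"]], how="left", left_index=True, right_index=True)
    DFreturn["Expected Count"] = DFreturn["Expected Count"].fillna(-1).astype(int)
    return DFreturn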
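Basically, the function computes the SINCE value only for the rows where the condition holds: for each such row it subtracts the index of the previous matching row (obtained with shift()) from the row's own index.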
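The same idea can also be written without the temporary frame and the merge. What follows is only a sketch, not IBM's definition: the helper name `since_previous_match` is made up for illustration, and it assumes that a row with no earlier match should get 0 (which is what the expected table above shows) rather than the `@INDEX + 1` value from the quoted description.

```python
import numpy as np
import pandas as pd

def since_previous_match(cond: pd.Series) -> pd.Series:
    """Rows elapsed since the previous row where cond was True.

    Assumption: rows with no earlier match get 0, as in the expected
    table in the question (not IBM's @INDEX + 1 rule).
    """
    # Positional row numbers 0..n-1, independent of the frame's index labels
    pos = pd.Series(np.arange(len(cond)), index=cond.index)
    # Position of the most recent *earlier* row where cond held (NaN if none)
    prev = pos.where(cond).shift(1).ffill()
    return (pos - prev).fillna(0).astype(int)

df = pd.DataFrame({
    "Type": ["d", "A", "c", "B", "c", "c", "c", "d", "d", "A"],
    "Flag": [np.nan, "myStatus", np.nan, "myStatus", np.nan,
             "myStatus", np.nan, np.nan, np.nan, "myStatus"],
})

# if Type = 'A' or Type = 'B' then @SINCE(Flag = 'myStatus') else -1 endif
since = since_previous_match(df["Flag"].eq("myStatus"))
df["Expected Count"] = since.where(df["Type"].isin(["A", "B"]), -1)

print(df["Expected Count"].tolist())
# [-1, 0, -1, 2, -1, -1, -1, -1, -1, 4]
```

One difference from spssSince is worth noting: this version evaluates the count on every row whose Type is A or B, whether or not that row is flagged itself, which follows the SPSS formula more literally. On the sample data every A/B row is flagged, so both give the same result.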