mapply giving non-numeric argument to binary operator error in R

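I am trying to generate a variable that flags implausible age differences between respondents and their parents in the household roster. mapply() throws a "non-numeric argument to binary operator" error, whereas I do not get this error when I apply the function to just two columns. Any help is much appreciated. Below, I have tried to make a reproducible example.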


# Variables
respbirth <- c(1974, 1950, 1990, 1980 ) 
B1010_1 <- c(1950, 1960, 1960, 1979 ) 
B1040_1 <- c(3,3,3,3)
B1010_2 <- c(1974, NA, NA, 1975 ) 
B1040_2 <- c(3,1,3,3)

# Data frame
df <- data.frame(respbirth, B1010_1, B1040_1, B1010_2, B1040_2 ) 
df

# Generate empty variable for flagging cases
df$flag_parent <- FALSE

## Generate a function flagging implausible differences using year of birth
attach(df)  # the function doesn't work without this for some reason
imp.parent <- function(data=df,parentAge=B1010_1,relationship=B1040_1) {
  df$flag_parent <- with(df, ((respbirth-parentAge)<18) & (relationship==3))
return(df)
}

# Test
df <- imp.parent(parentAge=B1010_1,relationship=B1040_1)

# Apply this function to all columns
parentAge <- c(paste0("B1010_",1:19, sep=""))
relationship <- c(paste0("B1040_",1:19, sep=""))
mapply(imp.parent, parentAge, relationship )
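The error message itself can be reproduced in isolation. mapply() hands each element of parentAge and relationship to imp.parent() as a character string such as "B1010_1", not as the column it names, so inside with(df, ...) the expression respbirth - parentAge subtracts a string from a number. The single-column test call works only because it passes the attached numeric vector B1010_1 directly. A minimal sketch, using values from the example data (the message text shown assumes an English locale):

```r
# respbirth is numeric; mapply() supplies the column *name*, not the column itself
respbirth <- c(1974, 1950, 1990, 1980)
parentAge <- "B1010_1"

# Capture the error message instead of stopping execution
msg <- tryCatch(respbirth - parentAge,
                error = function(e) conditionMessage(e))
print(msg)
# [1] "non-numeric argument to binary operator"
```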

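Your current mapply attempt has a number of problems, including the argument types, how the function is called, and what it returns. The error itself comes from the argument types: mapply() passes each element of parentAge and relationship as a character string such as "B1010_1", so respbirth - parentAge subtracts a string from a number. Rather than list every issue, consider the following refactored code, which passes column names and looks the columns up with [[.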

# Generate a function flagging implausible differences using year of birth
# parentAge and relationship are column names (strings), looked up in `data`
imp.parent <- function(parentAge, relationship, data=df) {
  ((data$respbirth - data[[parentAge]]) < 18) & (data[[relationship]] == 3)
}

# Apply this function to all columns
parentAge <- c(paste0("B1010_", 1:2))
relationship <- c(paste0("B1040_", 1:2))

# Assign columns True/False
df[paste0("false_flag_", 1:2)] <- mapply(imp.parent, parentAge, relationship )

df    
#   respbirth B1010_1 B1040_1 B1010_2 B1040_2 false_flag_1 false_flag_2
# 1      1974    1950       3    1974       3        FALSE         TRUE
# 2      1950    1960       3      NA       1         TRUE        FALSE
# 3      1990    1960       3      NA       3        FALSE           NA
# 4      1980    1979       3    1975       3         TRUE         TRUE
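Because each call of imp.parent() returns a logical vector with one value per row, mapply() simplifies the results into a logical matrix with one column per (parentAge, relationship) pair, and that matrix is what gets assigned to the block of false_flag_ columns. The sketch below rebuilds the example data so it runs on its own, and passes the data frame through mapply()'s MoreArgs argument instead of relying on a default of data = df; the name flags is only for illustration.

```r
# Rebuild the example data so the sketch is self-contained
df <- data.frame(respbirth = c(1974, 1950, 1990, 1980),
                 B1010_1   = c(1950, 1960, 1960, 1979),
                 B1040_1   = c(3, 3, 3, 3),
                 B1010_2   = c(1974, NA, NA, 1975),
                 B1040_2   = c(3, 1, 3, 3))

# Flag parents (relationship code 3) born fewer than 18 years before the respondent
imp.parent <- function(parentAge, relationship, data) {
  ((data$respbirth - data[[parentAge]]) < 18) & (data[[relationship]] == 3)
}

parentAge    <- paste0("B1010_", 1:2)
relationship <- paste0("B1040_", 1:2)

# MoreArgs passes the same data frame to every call
flags <- mapply(imp.parent, parentAge, relationship,
                MoreArgs = list(data = df))

is.matrix(flags)  # TRUE
dim(flags)        # 4 2
```

Passing the data explicitly keeps imp.parent() independent of whatever object happens to be called df in the global environment.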

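In fact, you do not need mapply() (a hidden loop) at all! R can evaluate the logical conditions across equally sized blocks of columns at once, so the whole block of flag columns can be assigned in one vectorized step: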

# Apply this function to all columns
parentAge <- c(paste0("B1010_", 1:2))
relationship <- c(paste0("B1040_", 1:2))

# NOTICE USE OF `[` (NOT `[[`): `[` returns a data frame of several columns, `[[` a single column
df[paste0("false_flag_", 1:2)] <- ((df$respbirth - df[parentAge]) < 18) & (df[relationship] == 3) 

df    
#   respbirth B1010_1 B1040_1 B1010_2 B1040_2 false_flag_1 false_flag_2
# 1      1974    1950       3    1974       3        FALSE         TRUE
# 2      1950    1960       3      NA       1         TRUE        FALSE
# 3      1990    1960       3      NA       3        FALSE           NA
# 4      1980    1979       3    1975       3         TRUE         TRUE
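The question mentions up to 19 parent slots (B1010_1 to B1010_19). A sketch of the same vectorized assignment that picks up whichever slots actually exist, assuming every B1010_k column has a matching B1040_k column. The helper variable slots is hypothetical, and collapsing the per-slot flags into the question's single flag_parent column (treating NA comparisons as "not flagged") is an assumption about the intended rule, not part of the answer above.

```r
df <- data.frame(respbirth = c(1974, 1950, 1990, 1980),
                 B1010_1   = c(1950, 1960, 1960, 1979),
                 B1040_1   = c(3, 3, 3, 3),
                 B1010_2   = c(1974, NA, NA, 1975),
                 B1040_2   = c(3, 1, 3, 3))

# Hypothetical helper: suffixes of the parent-age columns present, e.g. "1", "2"
slots <- sub("^B1010_", "", grep("^B1010_[0-9]+$", names(df), value = TRUE))

parentAge    <- paste0("B1010_", slots)
relationship <- paste0("B1040_", slots)  # assumes a matching B1040_k exists

# Vectorized over the whole block of columns, no mapply() needed
df[paste0("false_flag_", slots)] <-
  ((df$respbirth - df[parentAge]) < 18) & (df[relationship] == 3)

# Assumed rule: flag a respondent if any slot is flagged; NA counts as not flagged
df$flag_parent <- rowSums(df[paste0("false_flag_", slots)], na.rm = TRUE) > 0
df$flag_parent
```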
