Copying a substring to the string below, conditional on the contents of both strings

My data looks like this:

A         toberevised
 8:                                        <NA>
 9:                                        <NA>
10:                           Number of returns
11:                     Number of joint returns
12:       Number with paid preparer's signature
13:                        Number of exemptions
14:             Adjusted gross income (AGI) [3]
15:       Salaries and wages in AGI: [4] Number
16:                                      Amount
17:                   Taxable interest:  Number
18:                                      Amount
19:                 Ordinary dividends:  Number
20:                                      Amount
21:                                        <NA>
22:                                        <NA>
23:                           Number of returns
24:                     Number of joint returns
25:       Number with paid preparer's signature
26:                        Number of exemptions

DF <- structure(list(toberevised = c("[Money amounts are in thousands of dollars]", 
NA, NA, NA, "Item", NA, NA, NA, NA, "Number of returns", "Number of joint returns", 
"Number with paid preparer's signature", "Number of exemptions", 
"Adjusted gross income (AGI) [3]", "Salaries and wages in AGI: [4] Number", 
"Amount", "Taxable interest:  Number", "Amount", "Ordinary dividends:  Number", 
"Amount")), row.names = c(NA, -20L), class = c("data.table", 
"data.frame"))

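I want to write code that copies the part before the : in rows 15, 17 and 19 to the front of Amount in the row below each of them, so that the result is: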

 A        toberevised
 8:                                        <NA>
 9:                                        <NA>
10:                           Number of returns
11:                     Number of joint returns
12:       Number with paid preparer's signature
13:                        Number of exemptions
14:             Adjusted gross income (AGI) [3]
15:       Salaries and wages in AGI: [4] Number
16:           Salaries and wages in AGI: Amount
17:                   Taxable interest:  Number
18:                    Taxable interest: Amount
19:                 Ordinary dividends:  Number
20:                  Ordinary dividends: Amount
21:                                        <NA>
22:                                        <NA>
23:                           Number of returns
24:                     Number of joint returns
25:       Number with paid preparer's signature
26:                        Number of exemptions

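I have tried some very clumsy solutions, such as copying the cells that contain : to a new column, filling that column down, and then trying to strip Number from it so that I can concatenate the two columns. After that I would still have to clean up all the debris.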

library(data.table)
library(zoo)  # provides na.locf()
DF <- setDT(DF)[grepl(":", DF$toberevised), type:=toberevised]
DF$type <- na.locf(DF$type, na.rm=FALSE)
DF$type <- gsub("[[:punct:]]*Number[[:punct:]]*", "", DF$type)
DF$fullname <- paste(DF$type,DF$toberevised)

Besides not working, it is also rather cumbersome.

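What would be a better way to do this? I am thinking of checking whether a cell contains : and Number while the cell below it contains Amount, and then pasting the substring before the : in front of the string below. But I don't know how to write something like that.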

One possible solution:

#Sample data

Sno <- c(1:8)
Values <- c("Number of returns", "Number of joint returns", "Salaries and wages in AGI: [4] Number", "Amount", "Taxable interest:  Number", "Amount", "Ordinary dividends:  Number", "Amount")
df <- data.frame(Sno, Values, stringsAsFactors = FALSE)

df

#  Sno                               Values
#   1                     Number of returns
#   2               Number of joint returns
#   3 Salaries and wages in AGI: [4] Number
#   4                                Amount
#   5             Taxable interest:  Number
#   6                                Amount
#   7           Ordinary dividends:  Number
#   8                                Amount

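# For each "Amount" row that directly follows a row containing "Number",
# prepend the text before the first ":" of the row above.
for (i in seq_len(nrow(df))[-1]) {
    # isTRUE() keeps the test from failing on NA rows
    if (isTRUE(df$Values[i] == "Amount") && grepl("Number", df$Values[i - 1], fixed = TRUE)) {
        prefix <- strsplit(df$Values[i - 1], ":", fixed = TRUE)[[1]][1]
        df$Values[i] <- paste0(prefix, ": ", df$Values[i])
    }
}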

#Updated dataframe

# Sno                                Values
#   1                     Number of returns
#   2               Number of joint returns
#   3 Salaries and wages in AGI: [4] Number
#   4     Salaries and wages in AGI: Amount
#   5             Taxable interest:  Number
#   6              Taxable interest: Amount
#   7           Ordinary dividends:  Number
#   8            Ordinary dividends: Amount
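
The same check can also be written without an explicit loop. The following is only a sketch in base R: `prefix_amount` is a hypothetical helper name introduced here for illustration, and it assumes the input is a character vector in which NA rows should be left untouched.

```r
# Hypothetical helper (not part of the original answer): vectorised version
# of the "Amount directly below a Number row" check, base R only.
prefix_amount <- function(x) {
  # Value from the row above; the first row has no predecessor.
  prev <- c(NA_character_, head(x, -1))
  # Rows that are exactly "Amount" and whose predecessor mentions "Number".
  hit <- !is.na(x) & x == "Amount" &
    !is.na(prev) & grepl("Number", prev, fixed = TRUE)
  # Keep everything before the first ":" of the row above, then add ": Amount".
  x[hit] <- paste0(sub(":.*", "", prev[hit]), ": Amount")
  x
}

Values <- c("Number of returns", "Number of joint returns",
            "Salaries and wages in AGI: [4] Number", "Amount",
            "Taxable interest:  Number", "Amount",
            "Ordinary dividends:  Number", "Amount")
df <- data.frame(Sno = 1:8, Values = Values, stringsAsFactors = FALSE)
df$Values <- prefix_amount(df$Values)
```

Because the helper works on a plain vector, it can be applied to a column of a data.frame or a data.table alike.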

Hope this helps.

You can do it like this:

#Get the index of row where current row has "Amount" and previous had "Number"
library(data.table)
inds <- which(DF$toberevised == 'Amount' & shift(grepl('Number', DF$toberevised)))

#Paste those rows with revised value from previous row.
DF$toberevised[inds] <- paste0(sub(':.*', '', DF$toberevised[inds - 1]), 
                                   ': Amount')
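
For comparison, the fill-down idea described in the question can be made to work as well. What follows is a rough sketch in base R, not a tested recommendation: `fill_down_prefix` is a hypothetical name, and the `cummax()` trick stands in for `zoo::na.locf(na.rm = FALSE)`. Note the behavioural difference: this version prefixes every later `Amount` with the most recent label, even when the row directly above is not a `Number` row.

```r
# Hypothetical helper (an assumption for illustration, not from the thread):
# build a label from rows containing ":", carry it forward, then prefix "Amount".
fill_down_prefix <- function(x) {
  has_colon <- !is.na(x) & grepl(":", x, fixed = TRUE)
  # Text before the first ":" on labelled rows, NA elsewhere.
  label <- ifelse(has_colon, sub(":.*", "", x), NA_character_)
  # Index of the most recent labelled row (0 if none seen yet).
  last_seen <- cummax(seq_along(x) * has_colon)
  # Carry the last label forward; position 1 of the padded vector is NA.
  label <- c(NA_character_, label)[last_seen + 1L]
  hit <- !is.na(x) & x == "Amount" & !is.na(label)
  x[hit] <- paste0(label[hit], ": Amount")
  x
}

x <- c(NA, "Number of returns",
       "Salaries and wages in AGI: [4] Number", "Amount",
       "Taxable interest:  Number", "Amount", NA, "Amount")
res <- fill_down_prefix(x)
```

If only rows immediately below a `Number` row should change, the previous-row check used in the answers above is the safer choice.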