Is there a simple way to extract the first N words from a local macro that is comma-separated or comma+space-separated in Stata?

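Given a local macro containing a string of words separated by commas (","), by a comma and a space (", "), or even just by spaces (" "), is there a simple way to extract the first N levels (or words) of this local macro?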

The string would look like "12, 123, 1321, 41", "12,123,1321,41", or "12 123 1321 41".

Basically, I would be happy with a version of the macro function word # of string that would work more or less like word 1/N of string. (See "Macro functions for parsing", p. 12 of Macro definition and manipulation.)
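As a rough sketch of what such a "word 1/N of string" helper could look like: I am not aware of a single built-in macro function that does this, so the lines below combine `word # of`, `subinstr local`, and `list retokenize`. The macro names `n`, `clean`, `first`, and `csv` are made up for illustration, and the sketch assumes the words themselves contain no commas or spaces.

```stata
* Sketch only: take the first n words of a comma- and/or space-separated macro.
local levels "12, 123, 1321, 41"   // could equally be "12,123,1321,41" or "12 123 1321 41"
local n = 3                        // hypothetical: number of words to keep

* Turn every comma into a space so that all three formats parse the same way
local clean : subinstr local levels "," " ", all

* Collect the first n words
local first
forvalues i = 1/`n' {
    local w : word `i' of `clean'
    local first `first' `w'
}

* Collapse repeated blanks, then rejoin with ", " for use inside inlist()
local first : list retokenize first
local csv : subinstr local first " " ", ", all

display "`csv'"
```

Normalizing to spaces first means the same lines handle all three separator styles, and the final `subinstr` only has to deal with single blanks.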

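For more context, I am working with the output of levelsof, local() sep(), so I can choose whichever separator is easiest to work with. I want to pass the resulting levels as arguments to the inlist() function. The following usually works, but inlist() takes at most 250 arguments. That is why I would like to extract chunks of 250 words from the result of levelsof.
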
sysuse auto, clear
levelsof mpg if trunk > 20, local(levels) sep(", ")
list if inlist(mpg, `levels')

My "solution" so far

I came up with a not-so-simple way of doing it, but it does not look very nice, and I wonder whether there is a simple built-in way to do this.

sysuse auto, clear

levelsof mpg if trunk > 20, local(levels) sep(", ")
scalar number_of_words = 3
forvalues i = 1 (1) `=number_of_words' {
        local word_i = `i'
        local this_level : word `word_i' of `levels'
        local list_of_levels = "`list_of_levels'`this_level'" 
        
        di as text "loop: `i'"
        di as text "this level: `this_level'"
        di as text "list of levels so far: `list_of_levels'"
    }

di "`list_of_levels'"

// trim trailing comma
local trimmed_list_of_levels = substr( "`list_of_levels'" , 1 , strlen( "`list_of_levels'" )-1) 

di "`trimmed_list_of_levels'"
list make mpg price trunk if inlist(mpg, `trimmed_list_of_levels')

Output

. sysuse auto, clear
(1978 Automobile Data)

. 
. levelsof mpg if trunk > 20, local(levels) sep(", ")
12, 15, 17, 18

. scalar number_of_words = 3

. forvalues i = 1 (1) `=number_of_words' {
  2.         local word_i = `i'
  3.         local this_level : word `word_i' of `levels'
  4.         local list_of_levels = "`list_of_levels'`this_level'" 
  5.         
.         di as text "loop: `i'"
  6.         di as text "this level: `this_level'"
  7.         di as text "list of levels so far: `list_of_levels'"
  8.     }
loop: 1
this level: 12,
list of levels so far: 12,
loop: 2
this level: 15,
list of levels so far: 12,15,
loop: 3
this level: 17,
list of levels so far: 12,15,17,

. 
. di "`list_of_levels'"
12,15,17,

. 
. // trim trailing comma
. local trimmed_list_of_levels = substr( "`list_of_levels'" , 1 , strlen( "`list_of_levels'" )-1) 

. 
. di "`trimmed_list_of_levels'"
12,15,17

. list make mpg price trunk if inlist(mpg, `trimmed_list_of_levels')

     +------------------------------------------+
     | make                mpg    price   trunk |
     |------------------------------------------|
  2. | AMC Pacer            17    4,749      11 |
  5. | Buick Electra        15    7,827      20 |
 23. | Dodge St. Regis      17    6,342      21 |
 26. | Linc. Continental    12   11,497      22 |
 27. | Linc. Mark V         12   13,594      18 |
     |------------------------------------------|
 31. | Merc. Marquis        15    6,165      23 |
 53. | Audi 5000            17    9,690      15 |
 74. | Volvo 260            17   11,995      14 |
     +------------------------------------------+

Edits in response to comments.

Edit 01)

For example, the following does not work. It returns error 130, "expression too long".

clear 

set obs 1000
gen id = _n 
gen x1 = rnormal()

sum * 
levelsof id if x1>0, local(levels) sep(", ")
sum * if inlist(id, `levels')

An example where this construct (levelsof + inlist) seems necessary:

clear 

set obs 5000
gen id = round(_n/5)
gen x1 = rnormal()

sum * 
levelsof id if x1>2, local(levels) sep(", ")
sum * if x1>2 // if threshold is small enough, there will be too many values for inlist()
sum * if inlist(id, `levels')

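Using your additional example as a basis, you could use egen max to create a flag that equals 1 for every observation of an id in which any observation has an x1 value above the given threshold. For example: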

clear 
set seed 2021
set obs 5000
gen id = round(_n/5)
gen x1 = rnormal()

sum * 
levelsof id if x1>2, local(levels) sep(", ")
sum * if x1>2 // if threshold is small enough, there will be too many values for inlist()
sum * if inlist(id, `levels')

//This will do the same thing
gen over_threshold = x1>2 
egen id_over_thresh = max(over_threshold), by(id)

sum * if id_over_thresh
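
If you would still rather go the levelsof + inlist() route, one possible workaround is to feed inlist() the levels in chunks and accumulate the matches in an indicator variable. This is only a sketch under assumptions: `chunk_size`, `chunk`, `k`, and `id_in_levels` are names made up here, and the chunk size of 100 is an arbitrary choice that keeps each call well under the 250-argument limit mentioned in the question (the tested variable itself counts as one argument).

```stata
clear
set seed 2021
set obs 5000
gen id = round(_n/5)
gen x1 = rnormal()

* Space-separated levels are easiest to split into words
levelsof id if x1>2, local(levels)

local chunk_size = 100              // hypothetical; must stay below the inlist() limit
local n : word count `levels'
gen byte id_in_levels = 0

local chunk
local k = 0
forvalues i = 1/`n' {
    local w : word `i' of `levels'
    local chunk `chunk' `w'
    local ++k
    * Flush the chunk when it is full or when the last level has been reached
    if `k' == `chunk_size' | `i' == `n' {
        local chunk : list retokenize chunk
        local csv : subinstr local chunk " " ", ", all
        replace id_in_levels = 1 if inlist(id, `csv')
        local chunk
        local k = 0
    }
}

* Cross-check against the egen approach
gen over_threshold = x1>2
egen id_over_thresh = max(over_threshold), by(id)
assert id_in_levels == id_over_thresh

sum * if id_in_levels
```

The egen version is shorter and does not depend on any argument limit, so the chunked loop is mainly worth it when the list of levels comes from somewhere other than the data in memory.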