Obtain values separated by hyphens
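I have a bunch of values in my dataset that are formatted like 2000-3222 and 10/1-10.
I want to split these so that the result lists 2000, 2001, and so on, and 10/1, 10/2, and so on, each in its own row.
Is there any command in Stata or R that does this?
Edit:
Sample data: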
input int SRNo str200 SchemeName str30 CTSNo1 str4 CTSNo2
69 "Khimji Nagar SRA Co-op.Housing Society Ltd." "467" ""
70 "Jai Bhavani CHS Ltd. (Proposed)" "7 (Pt.)" ""
71 "Shivshakti SRA CHS Ltd." "364 ‘A’" ""
72 "Shree Ram CHS Ltd. (Prop.)" "96 (Pt.) -99(Pt.)" ""
end
Assuming all values look like your examples and your variables are of string type:
. clear
. set obs 1
number of observations (_N) was 0, now 1
.
. generate string1 = "2000-3222"
. generate new_string1 = substr(string1, 1, 4)
.
. generate string2 = "10/1-10"
. generate new_string2 = substr(string2, 1, 4)
.
. list
+-------------------------------------------+
| string1 new_st~1 string2 new_st~2 |
|-------------------------------------------|
1. | 2000-3222 2000 10/1-10 10/1 |
+-------------------------------------------+
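This solution is useful if you only need one part of the original variable. Note that it relies on that part having a known, fixed width (four characters here).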
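When the part before the hyphen does not always have the same width, a fixed substr() length no longer works. A minimal sketch of one way around that, assuming the first hyphen in the value is the range separator, is to locate it with strpos(). The variable names value, pos, first and last are made up for this illustration:

```stata
clear
input str12 value
"2000-3222"
"10/1-10"
"467"
end

* Position of the first hyphen (0 if the value contains none)
generate pos = strpos(value, "-")

* Text before the hyphen; the whole value is kept when there is no hyphen
generate first = cond(pos > 0, substr(value, 1, pos - 1), value)

* Text after the hyphen; empty when there is no hyphen
generate last = cond(pos > 0, substr(value, pos + 1, .), "")

list value first last
```

Because the cut point comes from the data rather than from a hard-coded width, the same two lines handle both 2000-3222 and 10/1-10.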
Edit:
Using @Nick's excellent suggestion:
clear
set obs 1
generate string1 = "2000-3222"
generate string2 = "10/1-10"
split string1, parse("-") generate(split_string1)
split string2, parse("/") generate(split_string2)
list
+-----------------------------------------------------------------+
| string1 string2 split~11 split~12 split~21 split~22 |
|-----------------------------------------------------------------|
1. | 2000-3222 10/1-10 2000 3222 10 1-10 |
+-----------------------------------------------------------------+
As you can see, this solution gives you two variables for string1 and another two for string2, each containing one of the two (separate) parts of the original variable.
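Splitting alone does not yet give one row per value, which is what the question asks for with 10/1-10. The following is only a sketch of how the pieces might be combined. It assumes every value has the form "start-end" with an optional "prefix/" in front, and the variable names (value, id, slash, prefix, rest, bound, n, item) are invented for the example:

```stata
clear
input str12 value
"10/1-10"
"2000-2003"
end

* Identifier for each original observation, so copies can be numbered later
generate id = _n

* Prefix up to and including the slash ("10/"); empty when there is no slash
generate slash  = strpos(value, "/")
generate prefix = substr(value, 1, slash)

* Remaining "start-end" part, split on the hyphen and converted to numbers
generate rest = substr(value, slash + 1, .)
split rest, parse("-") generate(bound) destring

* One row per integer between the two bounds
generate n = bound2 - bound1 + 1
expand n
bysort id: generate item = prefix + string(bound1 + _n - 1)

list value item, sepby(id)
```

With these two inputs the result has 14 rows: 10/1 through 10/10, then 2000 through 2003.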
Based on your sample data (to which I have added a few observations to make things more illustrative), you need the following:
clear
input int SRNo str200 SchemeName str30 CTSNo1 str4 CTSNo2
69 "Khimji Nagar SRA Co-op.Housing Society Ltd." "467" ""
70 "Jai Bhavani CHS Ltd. (Proposed)" "7 (Pt.)" ""
71 "Bhavani Housing" "12(Pt.)-21(Pt.)" ""
72 "Shivshakti SRA CHS Ltd." "364 ‘A’" ""
73 "Shree Ram CHS Ltd. (Prop.)" "96 (Pt.)- 99(Pt.)" ""
74 "Ram CHS Ltd. (Prop.)" "107 (Pt.)- 114 (Pt.)" ""
end
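* Flag and keep only the observations whose CTSNo1 contains a hyphen (a range)
generate tag = 0
replace tag = 1 if strmatch(CTSNo1, "*-*")
keep if tag == 1
* part1: first run of digits; part2: text after the hyphen through the last digit
generate part1 = regexs(0) if regexm(CTSNo1, "([0-9]+)")
generate part2 = substr(regexs(0), 2, .) if regexm(CTSNo1, "-.*([0-9])")
* Duplicate each observation so that it has one row per number in its range
local obs = _N
forvalues i = 1 / `obs' {
    local xpa = abs(real(part1[`i']) - real(part2[`i'])) + 1
    expand `xpa' if _n == `i'
}
* Number the copies within each SRNo and add that offset to the start of the range
bysort SRNo (CTSNo1): egen interim = seq()
bysort SRNo (CTSNo1): generate NCTSNo1 = real(part1) + interim - 1
drop tag part1 part2 interim
order SRNo SchemeName CTSNo1 NCTSNo1 CTSNo2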
The code snippet above produces the expected result:
list
+-----------------------------------------------------------------------------+
| SRNo SchemeName CTSNo1 NCTSNo1 CTSNo2 |
|-----------------------------------------------------------------------------|
1. | 71 Bhavani Housing 12(Pt.)-21(Pt.) 12 |
2. | 71 Bhavani Housing 12(Pt.)-21(Pt.) 13 |
3. | 71 Bhavani Housing 12(Pt.)-21(Pt.) 14 |
4. | 71 Bhavani Housing 12(Pt.)-21(Pt.) 15 |
5. | 71 Bhavani Housing 12(Pt.)-21(Pt.) 16 |
|-----------------------------------------------------------------------------|
6. | 71 Bhavani Housing 12(Pt.)-21(Pt.) 17 |
7. | 71 Bhavani Housing 12(Pt.)-21(Pt.) 18 |
8. | 71 Bhavani Housing 12(Pt.)-21(Pt.) 19 |
9. | 71 Bhavani Housing 12(Pt.)-21(Pt.) 20 |
10. | 71 Bhavani Housing 12(Pt.)-21(Pt.) 21 |
|-----------------------------------------------------------------------------|
11. | 73 Shree Ram CHS Ltd. (Prop.) 96 (Pt.)- 99(Pt.) 96 |
12. | 73 Shree Ram CHS Ltd. (Prop.) 96 (Pt.)- 99(Pt.) 97 |
13. | 73 Shree Ram CHS Ltd. (Prop.) 96 (Pt.)- 99(Pt.) 98 |
14. | 73 Shree Ram CHS Ltd. (Prop.) 96 (Pt.)- 99(Pt.) 99 |
15. | 74 Ram CHS Ltd. (Prop.) 107 (Pt.)- 114 (Pt.) 107 |
|-----------------------------------------------------------------------------|
16. | 74 Ram CHS Ltd. (Prop.) 107 (Pt.)- 114 (Pt.) 108 |
17. | 74 Ram CHS Ltd. (Prop.) 107 (Pt.)- 114 (Pt.) 109 |
18. | 74 Ram CHS Ltd. (Prop.) 107 (Pt.)- 114 (Pt.) 110 |
19. | 74 Ram CHS Ltd. (Prop.) 107 (Pt.)- 114 (Pt.) 111 |
20. | 74 Ram CHS Ltd. (Prop.) 107 (Pt.)- 114 (Pt.) 112 |
|-----------------------------------------------------------------------------|
21. | 74 Ram CHS Ltd. (Prop.) 107 (Pt.)- 114 (Pt.) 113 |
22. | 74 Ram CHS Ltd. (Prop.) 107 (Pt.)- 114 (Pt.) 114 |
+-----------------------------------------------------------------------------+
Edit:
The forvalues loop in my solution above is not necessary. An alternative approach that avoids looping over observations is as follows:
bysort SRNo (CTSNo1): generate xpa = abs(real(part1) - real(part2)) + 1
expand xpa
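For completeness, here is a self-contained sketch of how the loop-free variant might look from start to finish. It is not the exact code above: it assumes SRNo uniquely identifies each original observation, uses regexs(1) to pull out the digits directly, and uses min() so that a descending range would also work. Those choices are mine, for illustration only:

```stata
clear
input int SRNo str30 CTSNo1
71 "12(Pt.)-21(Pt.)"
73 "96 (Pt.)- 99(Pt.)"
74 "107 (Pt.)- 114 (Pt.)"
end

* First run of digits, and the first run of digits after the hyphen
generate part1 = real(regexs(1)) if regexm(CTSNo1, "([0-9]+)")
generate part2 = real(regexs(1)) if regexm(CTSNo1, "-[^0-9]*([0-9]+)")

* Number of rows each range should occupy, then expand in a single step
generate xpa = abs(part1 - part2) + 1
expand xpa

* Number the copies within each original observation
bysort SRNo: generate NCTSNo1 = min(part1, part2) + _n - 1

drop part1 part2 xpa
list, sepby(SRNo)
```

Since expand accepts a variable, each observation is duplicated by its own count, and the running _n within SRNo replaces the egen seq() step.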