Stata: importing txt with several multi-character delimiters
My data has very strange delimiters:
1,|ABC1|,|BUD|,|Fed Budget & Appropriations|,|t1|
2,|ABC2|,|LBR|,|Labor, Antitrust & Workplace|,|t2|
3,|ABC3|,|UNM|,|Unemployment|,|t1|
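So the delimiter is a comma, and every variable except the first (the identifier) is enclosed between two pipes. The problem is that the fourth variable also contains commas, so I can't simply use the comma as the delimiter and then remove the pipes. I found a way to clean the data with some find-and-replace operations in the terminal, but I would like to do it in Stata. Does anyone know how?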
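I put your data example into a text file and found that the delimiter was detected automatically without trouble: import delimited splits on the pipe (|), so each comma between fields ends up as a variable of its own. Then I used findname from the Stata Journal to drop all variables that were entirely commas or entirely missing: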
. import delimited "troublesome.txt"
(9 vars, 3 obs)
. list
     +-------------------------------------------------------------------------+
     | v1   v2     v3   v4    v5   v6                             v7   v8   v9 |
     |-------------------------------------------------------------------------|
  1. | 1,   ABC1   ,    BUD   ,    Fed Budget & Appropriations    ,    t1    . |
  2. | 2,   ABC2   ,    LBR   ,    Labor, Antitrust & Workplace   ,    t2    . |
  3. | 3,   ABC3   ,    UNM   ,    Unemployment                   ,    t1    . |
     +-------------------------------------------------------------------------+
. findname, all(@ == ",")
v3 v5 v7
. drop `r(varlist)'
. findname, all(missing(@))
v9
. drop `r(varlist)'
. destring v1, ignore(",") replace
v1: character , removed; replaced as byte
. list
     +-----------------------------------------------------+
     | v1   v2     v4    v6                             v8 |
     |-----------------------------------------------------|
  1. |  1   ABC1   BUD   Fed Budget & Appropriations    t1 |
  2. |  2   ABC2   LBR   Labor, Antitrust & Workplace   t2 |
  3. |  3   ABC3   UNM   Unemployment                   t1 |
     +-----------------------------------------------------+
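If you would rather not rely on automatic delimiter detection or on a community-contributed command, the same steps can be sketched in base Stata. This is a minimal sketch, not the answer's own method, and it rests on a few assumptions: the pipe is declared explicitly with the import delimited options delimiters("|"), varnames(nonames) and stringcols(1); the file name troublesome_demo.txt is hypothetical, and the first lines only recreate your three example rows; and the leftover columns contain nothing but a comma or nothing at all. The foreach loop stands in for findname, all(...).

```stata
* Recreate the three example rows in a demo file (hypothetical file name)
file open fh using "troublesome_demo.txt", write text replace
file write fh "1,|ABC1|,|BUD|,|Fed Budget & Appropriations|,|t1|" _n
file write fh "2,|ABC2|,|LBR|,|Labor, Antitrust & Workplace|,|t2|" _n
file write fh "3,|ABC3|,|UNM|,|Unemployment|,|t1|" _n
file close fh

* Declare the pipe as the only delimiter so the commas inside the
* fourth field are kept; read v1 ("1,") as a string; no header row
import delimited using "troublesome_demo.txt", delimiters("|") ///
    varnames(nonames) stringcols(1) clear

* Base-Stata stand-in for findname: drop string variables that hold
* only "," or "" and numeric variables that are entirely missing
foreach v of varlist _all {
    capture confirm string variable `v'
    if !_rc {
        quietly count if !inlist(`v', ",", "")
    }
    else {
        quietly count if !missing(`v')
    }
    if r(N) == 0 drop `v'
}

* Strip the trailing comma from the identifier and make it numeric
destring v1, ignore(",") replace
list
```

For this exact layout, drop v3 v5 v7 v9 would do the same job; the loop is there so the cleanup does not depend on knowing in advance which positions hold the stray commas.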