Converting day and month variables into numerical values (Stata)

I have data on online job postings, but several variables that I need as numeric values to create time-series plots are stored as strings, as shown here.

The three variables I want to convert to numeric variables look like this:

dataex month posted_date revenue
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 month str19 posted_date str32 revenue
"March_2021" "2021-03-08 10:44:15" "Less than  million (USD)"      
"March_2021" "2021-03-08 10:44:15" "Less than  million (USD)"      
"Dec_2020"   "2020-12-13 08:04:59" "+ billion (USD)"              
"Nov_2020"   "44150.33611"         "+ billion (USD)"              
"Dec_2020"   "2021-01-04 04:59:40" "+ billion (USD)"              
"Nov_2020"   "44167.24444"         "+ billion (USD)"              
"Dec_2020"   "2020-12-16 10:49:38" "+ billion (USD)"              
"Nov_2020"   "44167.24514"         "+ billion (USD)"              
"Nov_2020"   "44172.01319"         "+ billion (USD)"              
"Dec_2020"   "2020-12-30 05:52:25" "+ billion (USD)"              
"April_2021" "2021-04-21 04:16:12" ""                                
"April_2021" "2021-04-21 04:16:12" ""                                
"Feb_2021"   "2021-03-01 01:03:09" ""                                
"Feb_2021"   "2021-03-01 01:03:09" ""                                
"Feb_2021"   "2021-03-01 01:03:09" ""                                
"April_2021" "2021-04-21 05:57:59" ""                                
"April_2021" "2021-04-21 05:57:59" ""                                
"Dec_2020"   "2020-12-22 08:13:06" "0 million to  billion (USD)"

I would like the new variables to look like this:

month_n posted_date_n revenue_n 
02/21   09/02/21       0m_1B
03/21   14/03/21       +10B
04/21   11/04/21       +1m

So, following the instructions here, I ran the code below:

// Destring string variables that hold numerical values
gen posted_date_n = real(posted_date)
gen month_n = real(month)
gen revenue_n = real(revenue)

However, this does not give me what I am looking for. Instead, the data look like this:

dataex revenue_n posted_date_n month_n
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(revenue_n posted_date_n month_n)
.        . .
.        . .
.        . .
. 44150.34 .
.        . .
. 44167.25 .
.        . .
. 44167.25 .
. 44172.01 .
.        . .
.        . .
.        . .

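I was able to run code that gets the data into almost the form you want, except for date values like 44150.33611. These appear to be Excel serial dates, as @JR96 pointed out.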
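If those values really are Excel serial dates in the 1900 date system (an assumption, since the source file is not shown), a minimal sketch along the following lines might convert both kinds of value into one Stata daily date. The variable name excel_serial and the three sample rows are only for illustration.

```stata
* Sketch only: assumes numeric-looking strings are Excel serials (1900 date system)
clear
input str19 posted_date
"2021-03-08 10:44:15"
"44150.33611"
"2020-12-13 08:04:59"
end

* real() returns missing for the timestamp strings, so it flags the Excel rows
generate double excel_serial = real(posted_date)

* Shift the Excel day count onto Stata's daily-date scale (offset is 30dec1899)
generate long posted_date_n = floor(excel_serial) + td(30dec1899) if !missing(excel_serial)

* Remaining rows: parse the first 10 characters (YYYY-MM-DD) as a daily date
replace posted_date_n = date(substr(posted_date, 1, 10), "YMD") if missing(excel_serial)

* Display as DD/MM/YY, the layout requested in the question
format posted_date_n %tdDD/NN/YY
list posted_date posted_date_n
```

The floor() call discards the time of day held in the fractional part of the serial. If the time matters, a %tc datetime variable would be the better target.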

I suggest using the split command; the very handy article on it by Nick Cox is well worth a read (source).

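// Month/Year: split "March_2021" into month1 ("March") and month2 ("2021")
split month, p("_")
drop month
rename month1 month
// Parse the month name only (the mask contains no year)
gen month_n = date(month,"M")
format month_n %td_Month
rename month2 year
destring year, replace
format year %ty
rename year year_n

// Posted Date: split "2021-03-08 10:44:15" into a date part and a time part
split posted_date, p(" ")
drop posted_date
rename posted_date1 date
rename posted_date2 time
// Values such as "44150.33611" are not handled by this step
gen posted_date_n = date(date, "YMD")
format %tdNN/DD/CCYY posted_date_n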

This does not do exactly what you asked for, but in my view it is better than nothing. Sample output:

month_n, year_n, posted_date_n
March, 2021, 03/08/2021
March, 2021, 03/08/2021

Everything is formatted as a date that Stata recognizes. Perhaps someone else can jump in here and merge the month_n and year_n columns?
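One possible way to get a single month-and-year variable, offered as a sketch rather than something tested on the full dataset, is to parse the original month string directly with monthly() instead of splitting it. The same sketch also applies encode to revenue, which the code above leaves out. The three sample rows are taken from the question, and the revenue strings are kept exactly as they appear there.

```stata
* Sketch only: starts from the original string variables in the question
clear
input str10 month str32 revenue
"March_2021" "Less than  million (USD)"
"Dec_2020"   "+ billion (USD)"
"Feb_2021"   ""
end

* "March_2021" -> "March 2021", then parse as a Stata monthly date
generate month_n = monthly(subinstr(month, "_", " ", .), "MY")

* Display as MM/YY, the layout requested in the question
format month_n %tmNN/YY

* encode maps each distinct string to an integer code with a value label;
* empty strings become missing
encode revenue, generate(revenue_n)

list month month_n revenue revenue_n
```

If the split-based code above has already been run, ym(year_n, month(month_n)) should give the same monthly date from the two separate columns. Note that encode assigns its codes in alphabetical order of the strings, so the codes are category identifiers and do not rank revenue by size.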