Converting day and month variables into numerical values (Stata)
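I have data on online job postings, but the variables I want to use as numbers in a time-series plot are stored as strings, as shown here.
The three variables I am interested in converting to numeric variables look like this: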
dataex month posted_date revenue
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 month str19 posted_date str32 revenue
"March_2021" "2021-03-08 10:44:15" "Less than million (USD)"
"March_2021" "2021-03-08 10:44:15" "Less than million (USD)"
"Dec_2020" "2020-12-13 08:04:59" "+ billion (USD)"
"Nov_2020" "44150.33611" "+ billion (USD)"
"Dec_2020" "2021-01-04 04:59:40" "+ billion (USD)"
"Nov_2020" "44167.24444" "+ billion (USD)"
"Dec_2020" "2020-12-16 10:49:38" "+ billion (USD)"
"Nov_2020" "44167.24514" "+ billion (USD)"
"Nov_2020" "44172.01319" "+ billion (USD)"
"Dec_2020" "2020-12-30 05:52:25" "+ billion (USD)"
"April_2021" "2021-04-21 04:16:12" ""
"April_2021" "2021-04-21 04:16:12" ""
"Feb_2021" "2021-03-01 01:03:09" ""
"Feb_2021" "2021-03-01 01:03:09" ""
"Feb_2021" "2021-03-01 01:03:09" ""
"April_2021" "2021-04-21 05:57:59" ""
"April_2021" "2021-04-21 05:57:59" ""
"Dec_2020" "2020-12-22 08:13:06" "0 million to billion (USD)"
end
[/CODE]
I would like the new variables to look like this:
month_n posted_date_n revenue_n
02/21 09/02/21 0m_1B
03/21 14/03/21 +10B
04/21 11/04/21 +1m
So, following the instructions here, I ran the code below:
// Destring string variables that hold numerical values
gen posted_date_n = real(posted_date)
gen month_n = real(month)
gen revenue_n = real(revenue)
However, this does not give me what I am looking for. Instead, the data look like this:
dataex revenue_n posted_date_n month_n
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(revenue_n posted_date_n month_n)
. . .
. . .
. . .
. 44150.34 .
. . .
. 44167.25 .
. . .
. 44167.25 .
. 44172.01 .
. . .
. . .
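. . .
end
[/CODE]
I was able to run code that converts the data into almost the form you want, except for date values such as 44150.33611. These appear to be in Excel format, as @JR96 pointed out.
I suggest using the split command; the very handy article written by Nick Cox is worth a read (source).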
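For those Excel-style values, here is a rough sketch rather than a tested solution. It assumes the numbers are serial dates from Excel's default 1900 date system, which the data themselves do not confirm; in that case adding td(30dec1899) shifts them onto Stata's daily date scale. The variable names excel_serial and posted_excel_n are made up for illustration.

```stata
* Sketch only, run on the original string variable posted_date.
* Assumption: values such as 44150.33611 are Excel serial dates (1900 date system).
gen double excel_serial = real(posted_date)   // missing for "YYYY-MM-DD hh:mm:ss" strings

* Keep the whole-day part and shift from Excel's origin to Stata's (01jan1960)
gen long posted_excel_n = floor(excel_serial) + td(30dec1899)
format posted_excel_n %tdDD/NN/YY

* Inspect only the rows that held Excel-style values
list posted_date posted_excel_n if !missing(posted_excel_n)
```

Under that assumption, 44150 maps to 15 November 2020, which agrees with the Nov_2020 label on that row.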
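// Month/Year
// Split "March_2021" into month1 ("March") and month2 ("2021")
split month, p("_")
drop month
rename month1 month
// date() with mask "M" gives a daily date; the %td_Month format displays only the month name
gen month_n = date(month,"M")
format month_n %td_Month
rename month2 year
destring year, replace
format year %ty
rename year year_n
// Posted Date
// Split "2021-03-08 10:44:15" into a date part and a time part
split posted_date, p(" ")
drop posted_date
rename posted_date1 date
rename posted_date2 time
// Excel-style values such as 44150.33611 are not handled here
gen posted_date_n = date(date, "YMD")
format %tdNN/DD/CCYY posted_date_n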
This does not do exactly what you asked for, but in my opinion it is better than nothing. Sample output:
month_n, year_n, posted_date_n
March, 2021, 03/08/2021
March, 2021, 03/08/2021
Everything is formatted as a date that Stata can recognize. Perhaps someone else can jump in here and combine the month_n and year_n columns?
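On that last point, one possible way to merge the two columns is sketched here. It assumes the code above has already been run, so that month_n is a daily date holding the month and year_n is a numeric year. ym() builds a Stata monthly date from a year and a month number. The name month_year_n is hypothetical.

```stata
* Sketch: combine year_n (numeric year) and month_n (daily date holding the month)
gen month_year_n = ym(year_n, month(month_n))   // month() extracts 1-12 from a daily date

* Monthly display format; March 2021 should display as 03/21
format month_year_n %tmNN/YY
```

Because month_year_n is a true monthly date, it can serve as the time variable for a time-series plot, which a pair of separate month and year columns cannot.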