Tracking changes per observation in a continuous variable

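I am trying to evaluate a property tax policy rolled out in a U.S. county, under which properties above a threshold (i.e., 500 square meters) face a higher property tax rate than those below the cutoff. I have microdata on all properties in the county from 1990 to 2006. Interestingly, I know that some landowners whose properties exceed 500 square meters tried to avoid the tax by splitting their property into several sub-properties so that each falls below the cutoff.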

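I am trying to investigate this empirically by tracking two variables, "lotnumber" and "area", which give the lot number and the area of each property in the county. Specifically, if I observe that a hypothetical "lotnumber" A in "masterplan" 100 changed its "area" from 800 square meters before the tax to 400 square meters after the policy announcement, that would be evidence of tax-avoidance behavior.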

However, I am not sure how to code my data so that I can monitor tax-avoidance behavior as described above.

My dataset looks like this:

* Example generated by -dataex-. For more info, type help dataex
clear
input str109 masterplan str66 lotnumber str40 area str30 pricesqm str42 transactiondate
"/2022"        " 46"                                 "625"      "260.8"     "2004/01/24"
"/2485"        " 261/2"                              "435"      "103.4483"  "2004/01/29"
"/2485"        " 63"                                 "625"      "75.2"      "2004/01/23"
"/3152"        " 114"                                "500"      "170"       "2004/01/28"
"/3152"        " 134"                                "993.05"   "160.6163"  "2004/01/06"
"/3152"        " 141"                                "600"      "131.44"    "2005/01/28"
"/3152"        " 159"                                "500"      "154"       "2003/01/28"
"/3152"        " 161"                                "500"      "155"       "2002/01/29"
end

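One way to approach this is to follow Nick's suggestion and use destring area pricesqm. Note that in the code below I added four rows to your data example, so that it includes a masterplan-lotnumber whose area changes over time: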

clear

input str109 masterplan str66 lotnumber str40 area str30 pricesqm str42 transactiondate
"/2022"        " 46"                                 "625"      "260.8"     "2004/01/24"
"/2485"        " 261/2"                              "435"      "103.4483"  "2004/01/29"
"/2485"        " 63"                                 "625"      "75.2"      "2004/01/23"
"/3152"        " 114"                                "500"      "170"       "2004/01/28"
"/3152"        " 134"                                "993.05"   "160.6163"  "2004/01/06"
"/3152"        " 141"                                "600"      "131.44"    "2005/01/28"
"/3152"        " 159"                                "500"      "154"       "2003/01/28"
"/3152"        " 161"                                "500"      "155"       "2002/01/29"
"/9998"        " 999"                                "800"      "155"       "2003/02/28"
"/9998"        " 999"                                "400"      "155"       "2004/03/15"
"/9999"        " 999"                                "800"      "155"       "2004/02/28"
"/9999"        " 999"                                "800"      "155"       "2005/03/15"

end

compress
destring area pricesqm, replace

*create a clean date from the transaction date and format for ease
gen trans_date_clean = daily(transactiondate, "YMD")
format trans_date_clean %tdnn/dd/YY

*create an id for each masterplan-lotnumber
sort masterplan lotnumber trans_date_clean
egen id = group(masterplan lotnumber)

*create a flag that equals 1 if the id is the same as the previous
*id, the previous area is greater than 500, and the area of this 
*observation is less than 500. This approach depends on the sort
*before the egen command above so that masterplan-lotnumbers are
*grouped together.
gen flag = 0
replace flag = 1 if id == id[_n-1] & area[_n-1] > 500 & area < 500
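
The same idea can also be written with a by-group prefix, so that the comparison with the previous observation does not rely on the data still being in the right sort order. The following is only a minimal sketch on a hypothetical four-row dataset. The variable name flag_by is made up for illustration, and treating a lot of exactly 500 square meters as no longer above the threshold (area <= 500) is an assumption about how the policy defines the cutoff; adjust it if the rule is different.

```stata
* Hypothetical mini dataset: one lot shrinks from 800 to 400, one does not
clear
input str10 masterplan str10 lotnumber double area str10 transactiondate
"/9998" "999" 800 "2003/02/28"
"/9998" "999" 400 "2004/03/15"
"/9999" "999" 800 "2004/02/28"
"/9999" "999" 800 "2005/03/15"
end

gen trans_date_clean = daily(transactiondate, "YMD")
format trans_date_clean %td
egen id = group(masterplan lotnumber)

* Within each lot, ordered by date, flag a drop from above 500 to at most 500.
* Assumption: exactly 500 counts as not above the threshold.
* !missing() is needed because Stata treats missing as larger than any number.
bysort id (trans_date_clean): gen byte flag_by = _n > 1 & ///
    area[_n-1] > 500 & !missing(area[_n-1]) & area <= 500

list masterplan lotnumber area trans_date_clean flag_by, sepby(id)
```

Because the date is named in parentheses after bysort, Stata sorts on it within each id before evaluating area[_n-1], so no separate sort step is required.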

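Also, this is just one way to approach the problem. After this, you may need to take steps to make sure you do not double count any lots that change area frequently, or, if the area change has to occur within a specific time period, you may need to add extra conditions to the flag. Alternatively, you could consider reshaping the data by id (masterplan-lotnumber) and transaction year (taken from the transaction date) and comparing the difference in area between two years.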
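
The reshape alternative might look roughly like the sketch below, again on a hypothetical four-row dataset. Keeping the last transaction within each lot-year, comparing 2003 with 2004, and the name shrank_03_04 are all assumptions chosen for illustration, since the actual policy announcement date is not given here. As in any wide comparison, the 500 cutoff is treated as "at most 500 is below the threshold", which is also an assumption.

```stata
* Hypothetical mini dataset: one lot shrinks from 800 to 400, one does not
clear
input str10 masterplan str10 lotnumber double area str10 transactiondate
"/9998" "999" 800 "2003/02/28"
"/9998" "999" 400 "2004/03/15"
"/9999" "999" 800 "2004/02/28"
"/9999" "999" 800 "2005/03/15"
end

gen trans_date_clean = daily(transactiondate, "YMD")
gen year = year(trans_date_clean)
egen id = group(masterplan lotnumber)

* reshape wide needs one row per id-year; keep the last transaction of each year
bysort id year (trans_date_clean): keep if _n == _N
keep id masterplan lotnumber year area

* One row per lot, with columns area2003, area2004, area2005
reshape wide area, i(id) j(year)

* Compare two years (assumed here: 2003 before and 2004 after the policy)
gen byte shrank_03_04 = area2003 > 500 & !missing(area2003) & area2004 <= 500

list masterplan lotnumber area2003 area2004 area2005 shrank_03_04
```

With one row per lot, each lot can be counted at most once, which also sidesteps the double-counting concern for lots whose area changes several times.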