How to plot a simple line plot in SAS
My data is structured as follows (this is only sample data, because the original data is confidential):
id | crime | location | crimedate
------------------------------
1 | Theft | public | 2019-01-04
1 | Theft | public | 2019-02-06
1 | Theft | public | 2019-02-20
1 | Theft | private | 2019-03-10
1 | Theft | private | 2019-03-21
1 | Theft | public | 2019-03-01
1 | Theft | private | 2019-03-14
1 | Theft | public | 2019-06-15
1 | Murder | private | 2019-01-04
1 | Murder | private | 2019-10-20
1 | Murder | private | 2019-11-18
1 | Murder | private | 2019-01-01
1 | Assault | private | 2019-03-19
1 | Assault | private | 2019-01-21
1 | Assault | public | 2019-04-11
1 | Assault | public | 2019-01-10
… | … | … | …
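My goal is to create a line plot (time series plot) that shows how the number of the three crimes changes over the year. So I want the month (1-12) on the x-axis and the number of crimes in each month on the y-axis. There should be two lines (one per location).
I started with this code: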
DATA new;
SET old;
month=month(datepart(crimedate));
RUN;
PROC sgplot DATA=new;
series x=month y=no_of_crimes / group=location;
run;
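But I don't know how to aggregate the number of crimes per month. Can anyone give me a hint? I have been searching the internet for a solution, but the examples usually only use data that is already aggregated.
If you want to group by location without distinguishing between crime types: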
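* Count crimes per id, location, and month;
* Assumes crimedate is a SAS date. For a datetime, use month(datepart(crimedate));
proc sql noprint;
create table new as
select id,location
, month(crimedate) as month,count(crime) as crime_n
from old
group by id,location,CALCULATED month;
quit;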
proc sgplot data=new;
series x=month y=crime_n /group=location;
run;
To show a separate series for each crime type, you can use sgpanel:
proc sql noprint;
create table new as
select id,crime,location, month(crimedate) as month,count(crime) as crime_n
from old
group by id,crime,location,CALCULATED month;
quit;
proc sgpanel DATA=new;
panelby location;
series x=month y=crime_n /group=crime;
run;
Another variant for the same data:
proc sql noprint;
create table new as
select id,crime,location, month(crimedate) as month,count(crime) as crime_n
from old
group by id,crime,location,CALCULATED month;
quit;
proc sgpanel DATA=new;
panelby crime;
series x=month y=crime_n /group=location GROUPDISPLAY=cluster;
run;
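Of course, you can customize this plot as needed.
The SG procedures aggregate Y-axis values for a VBAR or HBAR statement. To show the same aggregate information with a SERIES statement, the values must come from a prior aggregation step, which is easily done with Proc SUMMARY.
Additionally, to plot the counts for each crime in a separate graph, you need a BY CRIME statement, or Proc SGPANEL with PANELBY crime.
The crime datetime values do not have to be converted down to date values. You can apply an appropriate datetime format in the procedures, and they will aggregate automatically based on the formatted value.
Example with some simulated crime data: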
data have;
do precinct = 1 to 10;
do date = '01jan2018'd to '31dec2018'd;
do seq = 1 to 20*ranuni(123);
length crime location $8;
crime = scan('theft,assault,robbery,dnd', ceil(4*ranuni(123)));
location = scan ('public,private', ceil(2*ranuni(123)));
crime_dt = dhms(date,0,0,floor('24:00't*ranuni(123)));
output;
end;
end;
end;
drop date;
format crime_dt datetime19.;
run;
* shorter graphs for SO answer;
ods graphics / height=300px;
proc sgplot data=have;
title "VBAR all crimes combined by location";
vbar crime_dt
/ group=location
groupdisplay=cluster
;
format crime_dt dtmonyy7.;
run;
proc sgpanel data=have;
title "VBAR crime * location";
panelby crime;
vbar crime_dt
/ group=location
groupdisplay=cluster
;
format crime_dt dtmonyy7.;
run;
proc summary data=have noprint;
class crime_dt crime location;
format crime_dt dtmonyy7.;
output out=freqs;
run;
proc sgplot data=freqs;
title "SERIES all crimes,summary _FREQ_ * location";
where _type_ = 5;
series x=crime_dt y=_freq_ / group=location;
xaxis type=discrete;
run;
proc sgpanel data=freqs;
title "SERIES all crimes,summary _FREQ_ * crime * location";
where _type_ = 7;
panelby crime;
series x=crime_dt y=_freq_ / group=location;
rowaxis min=0;
colaxis type=discrete;
run;
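The BY CRIME alternative mentioned above is not shown in the examples, so here is a minimal sketch of it. It assumes the freqs data set created by the Proc SUMMARY step above; the data set name freqs_by_crime is made up for illustration. Unlike SGPANEL, this produces one separate graph per crime.

```sas
* BY-group processing needs the data sorted by the BY variable;
* freqs_by_crime is a hypothetical name used only for this sketch;
proc sort data=freqs(where=(_type_ = 7)) out=freqs_by_crime;
by crime crime_dt;
run;

* One graph per crime, one line per location;
proc sgplot data=freqs_by_crime;
by crime;
title "SERIES summary _FREQ_ * location, one graph per crime";
series x=crime_dt y=_freq_ / group=location;
xaxis type=discrete;
format crime_dt dtmonyy7.;
run;
```

PANELBY keeps all crimes in one image with shared axes, while BY gives each crime its own graph and its own axis scaling, so which one fits better depends on whether the crimes should be compared directly.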
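To answer the question more directly, a VLINE or HLINE plot will summarize the data for you, similar to running a proc freq and then proc sgplot with series.
Using Richard's test data, you can see that this gives the same plot as his PROC SUMMARY -> SERIES approach: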
data have;
do precinct = 1 to 10;
do date = '01jan2018'd to '31dec2018'd;
do seq = 1 to 20*ranuni(123);
length crime location $8;
crime = scan('theft,assault,robbery,dnd', ceil(4*ranuni(123)));
location = scan ('public,private', ceil(2*ranuni(123)));
crime_dt = dhms(date,0,0,floor('24:00't*ranuni(123)));
output;
end;
end;
end;
drop date;
format crime_dt datetime19.;
run;
proc sgplot data=have;
vline crime_dt/group=location groupdisplay=cluster;
format crime_dt dtmonyy7.;
run;
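For comparison, the two-step route that VLINE replaces (proc freq, then proc sgplot with series) could look roughly like the sketch below. It assumes the have data set created above; the output data set name month_counts is made up for illustration. COUNT and PERCENT are the variables PROC FREQ writes to an OUT= data set.

```sas
* Count crimes per month and location;
* The format makes PROC FREQ group the datetimes by month;
* month_counts is a hypothetical name used only for this sketch;
proc freq data=have noprint;
tables crime_dt*location / out=month_counts(drop=percent);
format crime_dt dtmonyy7.;
run;

* Plot the precomputed counts, one line per location;
proc sgplot data=month_counts;
series x=crime_dt y=count / group=location;
xaxis type=discrete;
format crime_dt dtmonyy7.;
run;
```

The explicit aggregation step is more code than VLINE, but it leaves a data set of monthly counts that can be inspected or reused.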