Fitting a polynomial curve to time series data
I have a time series plot with the monthly frequency of articles on the y-axis. The data look like this:
     Count.V       Date      Month       Week       Year
2637       6 2006-01-02 2006-01-01 2006-01-02 2006-01-01
406        4 2006-01-03 2006-01-01 2006-01-02 2006-01-01
543        4 2006-01-04 2006-01-01 2006-01-02 2006-01-01
998        3 2006-01-05 2006-01-01 2006-01-02 2006-01-01
1400       4 2006-01-06 2006-01-01 2006-01-02 2006-01-01
2218       4 2006-02-01 2006-02-01 2006-01-30 2006-01-01
2792       6 2006-02-02 2006-02-01 2006-01-30 2006-01-01
2488      10 2006-02-03 2006-02-01 2006-01-30 2006-01-01
954        8 2006-02-04 2006-02-01 2006-01-30 2006-01-01
2622       3 2006-02-06 2006-02-01 2006-02-06 2006-01-01
2321      11 2006-02-07 2006-02-01 2006-02-06 2006-01-01
2452      10 2006-03-21 2006-03-01 2006-03-20 2006-01-01
2267       5 2006-03-22 2006-03-01 2006-03-20 2006-01-01
1408       3 2006-03-23 2006-03-01 2006-03-20 2006-01-01
2602       3 2006-03-24 2006-03-01 2006-03-20 2006-01-01
2489       5 2006-03-25 2006-03-01 2006-03-20 2006-01-01
2771       1 2006-03-27 2006-03-01 2006-03-27 2006-01-01
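I plot it with ggplot2:
library(ggplot2)
library(scales)  # provides date_format()
MyPlot <- ggplot(data = df, aes(x = Month, y = Count.V)) +
  stat_summary(fun.y = sum, geom = "line") +  # fun.y is called fun in ggplot2 >= 3.3.0
  scale_x_date(labels = date_format("%m-%y"), breaks = "3 months")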
However, when I try to fit a polynomial curve to the data, for example with
MyPlot + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1)
things go wrong:
What am I doing wrong?
Edit: I have added a portion of the data frame covering several months:
> dput(df)
structure(list(Count.V = c(6L, 4L, 4L, 3L, 4L, 5L, 2L, 8L, 6L,
5L, 12L, 1L, 2L, 3L, 4L, 2L, 2L, 4L, 4L, 4L, 6L, 6L, 2L, 4L,
4L, 6L, 10L, 8L, 3L, 11L, 8L, 13L, 3L, 9L, 7L, 4L, 7L, 9L, 5L,
4L, 5L, 6L, 5L, 9L, 5L, 11L, 4L, 6L, 2L, 8L, 3L, 5L, 4L, 3L,
5L, 4L, 2L, 3L, 3L, 3L, 8L, 6L, 1L, 3L, 10L, 5L, 3L, 3L, 5L,
1L, 8L, 4L, 3L, 2L, 1L, 4L, 4L, 4L, 5L, 7L, 8L, 3L, 4L, 7L, 5L,
3L, 3L, 4L, 6L, 3L, 2L, 3L, 2L, 5L, 6L, 4L, 5L, 8L, 3L, 4L),
Date = structure(c(13150, 13151, 13152, 13153, 13154, 13155,
13157, 13158, 13159, 13161, 13162, 13164, 13165, 13166, 13168,
13169, 13171, 13172, 13173, 13174, 13175, 13176, 13178, 13179,
13180, 13181, 13182, 13183, 13185, 13186, 13187, 13188, 13189,
13190, 13192, 13193, 13194, 13195, 13196, 13197, 13199, 13200,
13201, 13202, 13203, 13204, 13206, 13207, 13208, 13209, 13210,
13211, 13214, 13215, 13216, 13217, 13218, 13220, 13221, 13222,
13223, 13224, 13225, 13227, 13228, 13229, 13230, 13231, 13232,
13234, 13235, 13236, 13237, 13238, 13239, 13241, 13242, 13243,
13244, 13245, 13246, 13248, 13249, 13250, 13251, 13252, 13253,
13256, 13257, 13258, 13259, 13260, 13262, 13263, 13264, 13265,
13266, 13267, 13270, 13271), class = "Date"), Month = structure(c(13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13180, 13180, 13180, 13180,
13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180,
13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180,
13180, 13180, 13208, 13208, 13208, 13208, 13208, 13208, 13208,
13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208,
13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208,
13208, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239,
13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239,
13239, 13239, 13239, 13239, 13239, 13239, 13239, 13269, 13269
), class = "Date"), Week = structure(c(13150, 13150, 13150,
13150, 13150, 13150, 13157, 13157, 13157, 13157, 13157, 13164,
13164, 13164, 13164, 13164, 13171, 13171, 13171, 13171, 13171,
13171, 13178, 13178, 13178, 13178, 13178, 13178, 13185, 13185,
13185, 13185, 13185, 13185, 13192, 13192, 13192, 13192, 13192,
13192, 13199, 13199, 13199, 13199, 13199, 13199, 13206, 13206,
13206, 13206, 13206, 13206, 13213, 13213, 13213, 13213, 13213,
13220, 13220, 13220, 13220, 13220, 13220, 13227, 13227, 13227,
13227, 13227, 13227, 13234, 13234, 13234, 13234, 13234, 13234,
13241, 13241, 13241, 13241, 13241, 13241, 13248, 13248, 13248,
13248, 13248, 13248, 13255, 13255, 13255, 13255, 13255, 13262,
13262, 13262, 13262, 13262, 13262, 13269, 13269), class = "Date"),
Year = structure(c(13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149,
13149, 13149, 13149, 13149), class = "Date")), .Names = c("Count.V",
"Date", "Month", "Week", "Year"), row.names = c(2637L, 406L,
543L, 998L, 1400L, 2667L, 1211L, 140L, 737L, 545L, 2573L, 978L,
2119L, 842L, 1866L, 1002L, 1956L, 1229L, 2278L, 1889L, 1285L,
1020L, 964L, 1584L, 2218L, 2792L, 2488L, 954L, 2622L, 2321L,
796L, 501L, 294L, 2476L, 2541L, 642L, 177L, 1222L, 1249L, 990L,
2776L, 580L, 1181L, 1792L, 431L, 224L, 214L, 679L, 1601L, 1655L,
645L, 2785L, 1507L, 1580L, 1274L, 2083L, 157L, 2491L, 2733L,
1533L, 2332L, 328L, 1995L, 1598L, 2452L, 2267L, 1408L, 2602L,
2489L, 2771L, 2323L, 1714L, 907L, 1522L, 882L, 2727L, 844L, 2105L,
253L, 1160L, 2075L, 1435L, 821L, 1284L, 2406L, 2357L, 1499L,
2145L, 1539L, 1890L, 1856L, 27L, 887L, 1500L, 812L, 1677L, 1965L,
2580L, 823L, 1482L), class = "data.frame")
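Try using mean instead of sum. stat_smooth fits the model to the individual rows of df, not to the monthly sums drawn by stat_summary, so the fitted curve sits on the scale of the per-row counts, which is close to the monthly means rather than the monthly totals: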
ggplot(data = df, aes(x = Month, y = Count.V)) +
  stat_summary(fun.y = mean, geom = "line") +
  stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) +
  geom_point() +
  scale_x_date(labels = date_format("%m-%y"), breaks = "3 months")
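If the monthly totals are what you actually want on the y-axis, an alternative is to aggregate first and then fit the curve to the aggregated values, so that the line and the smoother see the same numbers. The following is only a sketch under assumptions: `df_demo` is made-up Poisson data standing in for the data frame in the question, and `monthly`, `fit` and `p` are names chosen here for illustration.

```r
# Made-up daily counts for one year (NOT the data from the question).
set.seed(1)
dates <- seq(as.Date("2006-01-01"), as.Date("2006-12-31"), by = "day")
df_demo <- data.frame(
  Date    = dates,
  Count.V = rpois(length(dates), lambda = 5)
)
# First day of each month, mirroring the Month column in the question.
df_demo$Month <- as.Date(format(df_demo$Date, "%Y-%m-01"))

# Collapse to one row per month: these totals are what the summary line shows.
monthly <- aggregate(Count.V ~ Month, data = df_demo, FUN = sum)

# Cubic fit on the monthly totals; Dates are converted to numeric day counts.
fit <- lm(Count.V ~ poly(as.numeric(Month), 3), data = monthly)

# Plot only if ggplot2 is installed; the smoother now works on the totals.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(monthly, aes(x = Month, y = Count.V)) +
    geom_line() +
    geom_point() +
    stat_smooth(method = "lm", formula = y ~ poly(x, 3)) +
    scale_x_date(date_labels = "%m-%y", date_breaks = "3 months")
}
```

Calling print(p) draws the plot. Note that poly(x, 3) needs more than three distinct months, so with only a handful of months a lower degree may be the safer choice.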