将 R 中的数据清理并转换为 XTS 的最佳方法
Best way to clean data in R to and convert to XTS
我正在尝试清理从网上下载的一些数据并将其转换为 XTS。我找到了一些关于使用 GREPL 清理数据的 CRAN 的文档,但我想知道是否有比使用 GREPL 更简单的方法来做到这一点。我希望有人能够帮助我使用 GREPL 或 R 中的其他函数清理这些数据的代码。在此先感谢您为我提供的任何帮助。
[1] "{"
[2] " \"Meta Data\": {"
[3] " \"1. Information\": \"Daily Prices (open, high, low, close) and Volumes\","
[4] " \"2. Symbol\": \"MSFT\","
[5] " \"3. Last Refreshed\": \"2017-06-08 15:15:00\","
[6] " \"4. Output Size\": \"Compact\","
[7] " \"5. Time Zone\": \"US/Eastern\""
[8] " },"
[9] " \"2017-01-19\": {"
[10] " \"1. open\": \"62.2400\","
[11] " \"2. high\": \"62.9800\","
[12] " \"3. low\": \"62.1950\","
[13] " \"4. close\": \"62.3000\","
[14] " \"5. volume\": \"18451655\""
[15] " },"
[16] " \"2017-01-18\": {"
[17] " \"1. open\": \"62.6700\","
[18] " \"2. high\": \"62.7000\","
[19] " \"3. low\": \"62.1200\","
[20] " \"4. close\": \"62.5000\","
[21] " \"5. volume\": \"19670102\""
[22] " },"
[23] " \"2017-01-17\": {"
[24] " \"1. open\": \"62.6800\","
[25] " \"2. high\": \"62.7000\","
[26] " \"3. low\": \"62.0300\","
[27] " \"4. close\": \"62.5300\","
[28] " \"5. volume\": \"20663983\""
[29] " }"
[30] " }"
[31] "}"
此数据的最终输出如下所示:
Open High Low Close Volume
2017-01-17 62.68 62.70 62.03 62.53 20663983
2017-01-18 62.67 62.70 62.12 62.50 19670102
2017-01-19 62.24 62.98 62.195 62.30 18451655
作为,您需要做的第一件事是解析JSON。
Lines <-
"{
\"Meta Data\": {
\"1. Information\": \"Daily Prices (open, high, low, close) and Volumes\",
\"2. Symbol\": \"MSFT\",
\"3. Last Refreshed\": \"2017-06-08 15:15:00\",
\"4. Output Size\": \"Compact\",
\"5. Time Zone\": \"US/Eastern\"
},
\"2017-01-19\": {
\"1. open\": \"62.2400\",
\"2. high\": \"62.9800\",
\"3. low\": \"62.1950\",
\"4. close\": \"62.3000\",
\"5. volume\": \"18451655\"
},
\"2017-01-18\": {
\"1. open\": \"62.6700\",
\"2. high\": \"62.7000\",
\"3. low\": \"62.1200\",
\"4. close\": \"62.5000\",
\"5. volume\": \"19670102\"
},
\"2017-01-17\": {
\"1. open\": \"62.6800\",
\"2. high\": \"62.7000\",
\"3. low\": \"62.0300\",
\"4. close\": \"62.5300\",
\"5. volume\": \"20663983\"
}
}"
parsedLines <- jsonlite::fromJSON(Lines)
现在数据处于可用结构中,我们可以开始清理它了。请注意 parsedLines
中的每个元素都是另一个列表。让我们使用 unlist
将它们转换为向量,这样我们将得到一个向量列表而不是列表列表。
parsedLines <- lapply(parsedLines, unlist)
现在您可能已经注意到 parsedLines
中的第一个元素是元数据。我们可以稍后将其附加到最终对象。但首先,让我们 rbind
将所有其他元素放入矩阵中。我们可以使用 do.call
.
对任何长度的列表执行此操作
ohlcv <- do.call(rbind, parsedLines[-1]) # [-1] removes the first element
现在我们可以清理列名并将数据从字符转换为数字。
colnames(ohlcv) <- gsub("^[[:digit:]]\.", "", colnames(ohlcv))
ohlcv <- type.convert(ohlcv)
此时,我会亲自转换为xts对象并附上元数据。但是您可以继续使用 ohlcv
矩阵,将其转换为 data.frame、tibble 等
# convert to xts
x <- as.xts(ohlcv, dateFormat = "Date")
# attach attributes
metadata <- parsedLines[[1]]
names(metadata) <- gsub("[[:digit:]]|\.|[[:space:]]", "", names(metadata))
xtsAttributes(x) <- metadata
# view attributes
str(x)
An 'xts' object on 2017-01-17/2017-01-19 containing:
Data: num [1:3, 1:5] 62.7 62.7 62.2 62.7 62.7 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] " open" " high" " low" " close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 5
$ Information : chr "Daily Prices (open, high, low, close) and Volumes"
$ Symbol : chr "MSFT"
$ LastRefreshed: chr "2017-06-08 15:15:00"
$ OutputSize : chr "Compact"
$ TimeZone : chr "US/Eastern"
我正在尝试清理从网上下载的一些数据并将其转换为 XTS。我找到了一些关于使用 GREPL 清理数据的 CRAN 的文档,但我想知道是否有比使用 GREPL 更简单的方法来做到这一点。我希望有人能够帮助我使用 GREPL 或 R 中的其他函数清理这些数据的代码。在此先感谢您为我提供的任何帮助。
[1] "{"
[2] " \"Meta Data\": {"
[3] " \"1. Information\": \"Daily Prices (open, high, low, close) and Volumes\","
[4] " \"2. Symbol\": \"MSFT\","
[5] " \"3. Last Refreshed\": \"2017-06-08 15:15:00\","
[6] " \"4. Output Size\": \"Compact\","
[7] " \"5. Time Zone\": \"US/Eastern\""
[8] " },"
[9] " \"2017-01-19\": {"
[10] " \"1. open\": \"62.2400\","
[11] " \"2. high\": \"62.9800\","
[12] " \"3. low\": \"62.1950\","
[13] " \"4. close\": \"62.3000\","
[14] " \"5. volume\": \"18451655\""
[15] " },"
[16] " \"2017-01-18\": {"
[17] " \"1. open\": \"62.6700\","
[18] " \"2. high\": \"62.7000\","
[19] " \"3. low\": \"62.1200\","
[20] " \"4. close\": \"62.5000\","
[21] " \"5. volume\": \"19670102\""
[22] " },"
[23] " \"2017-01-17\": {"
[24] " \"1. open\": \"62.6800\","
[25] " \"2. high\": \"62.7000\","
[26] " \"3. low\": \"62.0300\","
[27] " \"4. close\": \"62.5300\","
[28] " \"5. volume\": \"20663983\""
[29] " }"
[30] " }"
[31] "}"
此数据的最终输出如下所示:
Open High Low Close Volume
2017-01-17 62.68 62.70 62.03 62.53 20663983
2017-01-18 62.67 62.70 62.12 62.50 19670102
2017-01-19 62.24 62.98 62.195 62.30 18451655
作为
Lines <-
"{
\"Meta Data\": {
\"1. Information\": \"Daily Prices (open, high, low, close) and Volumes\",
\"2. Symbol\": \"MSFT\",
\"3. Last Refreshed\": \"2017-06-08 15:15:00\",
\"4. Output Size\": \"Compact\",
\"5. Time Zone\": \"US/Eastern\"
},
\"2017-01-19\": {
\"1. open\": \"62.2400\",
\"2. high\": \"62.9800\",
\"3. low\": \"62.1950\",
\"4. close\": \"62.3000\",
\"5. volume\": \"18451655\"
},
\"2017-01-18\": {
\"1. open\": \"62.6700\",
\"2. high\": \"62.7000\",
\"3. low\": \"62.1200\",
\"4. close\": \"62.5000\",
\"5. volume\": \"19670102\"
},
\"2017-01-17\": {
\"1. open\": \"62.6800\",
\"2. high\": \"62.7000\",
\"3. low\": \"62.0300\",
\"4. close\": \"62.5300\",
\"5. volume\": \"20663983\"
}
}"
parsedLines <- jsonlite::fromJSON(Lines)
现在数据处于可用结构中,我们可以开始清理它了。请注意 parsedLines
中的每个元素都是另一个列表。让我们使用 unlist
将它们转换为向量,这样我们将得到一个向量列表而不是列表列表。
parsedLines <- lapply(parsedLines, unlist)
现在您可能已经注意到 parsedLines
中的第一个元素是元数据。我们可以稍后将其附加到最终对象。但首先,让我们 rbind
将所有其他元素放入矩阵中。我们可以使用 do.call
.
ohlcv <- do.call(rbind, parsedLines[-1]) # [-1] removes the first element
现在我们可以清理列名并将数据从字符转换为数字。
colnames(ohlcv) <- gsub("^[[:digit:]]\.", "", colnames(ohlcv))
ohlcv <- type.convert(ohlcv)
此时,我会亲自转换为xts对象并附上元数据。但是您可以继续使用 ohlcv
矩阵,将其转换为 data.frame、tibble 等
# convert to xts
x <- as.xts(ohlcv, dateFormat = "Date")
# attach attributes
metadata <- parsedLines[[1]]
names(metadata) <- gsub("[[:digit:]]|\.|[[:space:]]", "", names(metadata))
xtsAttributes(x) <- metadata
# view attributes
str(x)
An 'xts' object on 2017-01-17/2017-01-19 containing:
Data: num [1:3, 1:5] 62.7 62.7 62.2 62.7 62.7 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] " open" " high" " low" " close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 5
$ Information : chr "Daily Prices (open, high, low, close) and Volumes"
$ Symbol : chr "MSFT"
$ LastRefreshed: chr "2017-06-08 15:15:00"
$ OutputSize : chr "Compact"
$ TimeZone : chr "US/Eastern"