Best way to clean data in R and convert it to xts

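I'm trying to clean some data that I downloaded from the web and convert it to xts. I found some documentation on CRAN about cleaning data with grepl, but I'd like to know whether there is a simpler way to do this than grepl. I'm hoping someone can help me with code that cleans this data, using grepl or any other function in R. Thanks in advance for any help you can give me.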

  [1] "{"                                                                                 
  [2] "    \"Meta Data\": {"                                                              
  [3] "        \"1. Information\": \"Daily Prices (open, high, low, close) and Volumes\","
  [4] "        \"2. Symbol\": \"MSFT\","                                                  
  [5] "        \"3. Last Refreshed\": \"2017-06-08 15:15:00\","                           
  [6] "        \"4. Output Size\": \"Compact\","                                          
  [7] "        \"5. Time Zone\": \"US/Eastern\""     
  [8] "        },"                                                                        
  [9] "        \"2017-01-19\": {"                                                         
 [10] "            \"1. open\": \"62.2400\","                                             
 [11] "            \"2. high\": \"62.9800\","                                             
 [12] "            \"3. low\": \"62.1950\","                                              
 [13] "            \"4. close\": \"62.3000\","                                            
 [14] "            \"5. volume\": \"18451655\""                                           
 [15] "        },"                                                                        
 [16] "        \"2017-01-18\": {"                                                         
 [17] "            \"1. open\": \"62.6700\","                                             
 [18] "            \"2. high\": \"62.7000\","                                             
 [19] "            \"3. low\": \"62.1200\","                                              
 [20] "            \"4. close\": \"62.5000\","                                            
 [21] "            \"5. volume\": \"19670102\""                                           
 [22] "        },"                                                                        
 [23] "        \"2017-01-17\": {"                                                         
 [24] "            \"1. open\": \"62.6800\","                                             
 [25] "            \"2. high\": \"62.7000\","                                             
 [26] "            \"3. low\": \"62.0300\","                                              
 [27] "            \"4. close\": \"62.5300\","                                            
 [28] "            \"5. volume\": \"20663983\""                                           
 [29] "        }"                                                                         
 [30] "    }"                                                                             
 [31] "}"                                  

The final output for this data should look like this:

            Open    High    Low      Close   Volume
2017-01-17  62.68   62.70   62.03    62.53   20663983
2017-01-18  62.67   62.70   62.12    62.50   19670102
2017-01-19  62.24   62.98   62.195   62.30   18451655

The first thing you need to do is parse the JSON.

Lines <-
"{                                                                                 
  \"Meta Data\": {
    \"1. Information\": \"Daily Prices (open, high, low, close) and Volumes\",
    \"2. Symbol\": \"MSFT\",
    \"3. Last Refreshed\": \"2017-06-08 15:15:00\",
    \"4. Output Size\": \"Compact\",
    \"5. Time Zone\": \"US/Eastern\"
  },
  \"2017-01-19\": {
      \"1. open\": \"62.2400\",
      \"2. high\": \"62.9800\",
      \"3. low\": \"62.1950\",
      \"4. close\": \"62.3000\",
      \"5. volume\": \"18451655\"
  },
  \"2017-01-18\": {
      \"1. open\": \"62.6700\",
      \"2. high\": \"62.7000\",
      \"3. low\": \"62.1200\",
      \"4. close\": \"62.5000\",
      \"5. volume\": \"19670102\"
  },
  \"2017-01-17\": {
      \"1. open\": \"62.6800\",
      \"2. high\": \"62.7000\",
      \"3. low\": \"62.0300\",
      \"4. close\": \"62.5300\",
      \"5. volume\": \"20663983\"
  }
}"
parsedLines <- jsonlite::fromJSON(Lines)
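If the JSON is held as a character vector with one element per line, as printed in the question, it can be collapsed into a single string before parsing. This is a minimal sketch that assumes the jsonlite package is installed; `json_lines` and `parsed` are illustrative names, and the data is a shortened stand-in rather than the full download.

```r
library(jsonlite)

# Hypothetical stand-in for the line-by-line character vector in the question
json_lines <- c(
  "{",
  "  \"Meta Data\": {\"2. Symbol\": \"MSFT\"},",
  "  \"2017-01-19\": {\"1. open\": \"62.2400\", \"5. volume\": \"18451655\"}",
  "}"
)

# Collapse the lines into one JSON string, then parse it
parsed <- fromJSON(paste(json_lines, collapse = "\n"))

# Top-level names: the metadata block followed by one entry per date
names(parsed)
```

The values are still character strings at this point, since the source JSON quotes every number.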

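Now that the data is in a usable structure, we can start cleaning it. Notice that each element of parsedLines is itself a list. Let's use unlist to convert each of them to a vector, so that we have a list of vectors instead of a list of lists.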

parsedLines <- lapply(parsedLines, unlist)
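To see what this step does in isolation, here is a small base R sketch on a toy list of lists. The names `nested` and `flat` and the values are made up for illustration.

```r
# Toy list of lists, shaped like the parsed JSON (illustrative values)
nested <- list(
  a = list(open = "1.0", close = "2.0"),
  b = list(open = "3.0", close = "4.0")
)

# unlist() turns each inner list into a named character vector
flat <- lapply(nested, unlist)

str(flat)
```

Each element keeps its names, which is what later lets rbind produce column names automatically.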

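Now, you may have noticed that the first element of parsedLines is the metadata. We can attach that to the final object later. But first, let's rbind all the other elements into a matrix. We can do that for a list of any length using do.call.

ohlcv <- do.call(rbind, parsedLines[-1])  # [-1] removes the first element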
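The same do.call pattern can be tried on a small named list to see where the row and column names come from. This is a base R sketch with illustrative names (`rows`, `m`) and a subset of the values from the question.

```r
# A named list of named character vectors (illustrative subset)
rows <- list(
  "2017-01-19" = c(open = "62.24", close = "62.30"),
  "2017-01-18" = c(open = "62.67", close = "62.50")
)

# do.call passes every list element to rbind as a separate argument,
# so the list names become row names and the vector names become column names
m <- do.call(rbind, rows)

dim(m)
rownames(m)
colnames(m)
```

Because the list names end up as row names, the dates survive into the matrix, and as.xts can later use them as the index.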

Now we can clean up the column names and convert the data from character to numeric.

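# "\\." is a literal dot; a single backslash is an invalid escape in an R string
colnames(ohlcv) <- gsub("^[[:digit:]]\\.", "", colnames(ohlcv))
# as.is = TRUE avoids the missing-argument warning in recent versions of R
ohlcv <- type.convert(ohlcv, as.is = TRUE)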
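If you want column names that match the desired output in the question exactly (Open, High, and so on), the pattern can also strip the space after the dot, and the first letter can be capitalised. The sketch below is a base R variant under those assumptions; `m` is an illustrative two-column matrix, and `storage.mode<-` is used here as an alternative way to make a character matrix numeric while keeping its dimensions.

```r
# Illustrative character matrix with the raw column names from the JSON
m <- matrix(
  c("62.24", "62.67", "18451655", "19670102"),
  nrow = 2,
  dimnames = list(c("2017-01-19", "2017-01-18"), c("1. open", "5. volume"))
)

# Drop the leading digits, the dot, and any whitespace that follows
colnames(m) <- gsub("^[[:digit:]]+\\.[[:space:]]*", "", colnames(m))

# Capitalise the first letter of each name
colnames(m) <- paste0(toupper(substring(colnames(m), 1, 1)),
                      substring(colnames(m), 2))

# Convert the whole matrix from character to double in place
storage.mode(m) <- "double"

m
```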

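At this point, I would personally convert to an xts object and attach the metadata. But you could continue with the ohlcv matrix, or convert it to a data.frame, a tibble, etc.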

# convert to xts (requires the xts package)
library(xts)
x <- as.xts(ohlcv, dateFormat = "Date")
# attach the metadata as xts attributes
metadata <- parsedLines[[1]]
names(metadata) <- gsub("[[:digit:]]|\\.|[[:space:]]", "", names(metadata))
xtsAttributes(x) <- metadata
# view attributes
str(x)

An 'xts' object on 2017-01-17/2017-01-19 containing:
  Data: num [1:3, 1:5] 62.7 62.7 62.2 62.7 62.7 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] " open" " high" " low" " close" ...
  Indexed by objects of class: [Date] TZ: UTC
  xts Attributes:  
List of 5
 $ Information  : chr "Daily Prices (open, high, low, close) and Volumes"
 $ Symbol       : chr "MSFT"
 $ LastRefreshed: chr "2017-06-08 15:15:00"
 $ OutputSize   : chr "Compact"
 $ TimeZone     : chr "US/Eastern"
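For the data.frame route mentioned earlier, a rough base R sketch could look like the following. The matrix `ohlcv_demo` and the column name `date` are illustrative assumptions, not part of the code above; the idea is only to move the dates from the row names into a proper Date column and sort chronologically.

```r
# Illustrative numeric matrix with dates as row names
ohlcv_demo <- matrix(
  c(62.24, 62.67, 62.30, 62.50),
  nrow = 2,
  dimnames = list(c("2017-01-19", "2017-01-18"), c("open", "close"))
)

# Move the row names into a Date column and drop them as row names
df <- data.frame(
  date = as.Date(rownames(ohlcv_demo)),
  ohlcv_demo,
  row.names = NULL
)

# Sort oldest first, matching the order xts uses for its index
df <- df[order(df$date), ]

df
```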