R: Calculating new column (mean/median)

Question

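I want to calculate the mean or median of a column, but be able to select which values are used based on another column (see the data below).

Calculating the mean/median of the Percentage column on its own seems fine, but I'm having some trouble doing it based on a selection from another column. For example: the median percentage of all entries whose Date is "2014".

Any suggestions on how to do this would be greatly appreciated! Apologies if this has already been answered elsewhere on SO, but I couldn't find it.

My code is listed below in case it is needed to reproduce the data.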

# Step 1: Load the needed libraries
library(tidyverse)  # attaches dplyr, stringr (str_replace) and ggplot2
library(jsonlite)

# Step 2: URL of the API endpoint (first page of 10 sold estates)
url <- "https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/10/"

# Step 3: Download and parse the JSON returned by the URL
data <- jsonlite::fromJSON(url, flatten = TRUE)

# Step 4: Read the total number of items available from the API
totalItems <- data$TotalNumberOfItems

# Step 5: Request all items in one call and flatten them into a data frame
allData <- paste0('https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/', totalItems, '/') %>%
  jsonlite::fromJSON(., flatten = TRUE) %>%
  .[1] %>%                       # keep only the first element (ListItems)
  as.data.frame() %>%
  rename_with(~ str_replace(., "ListItems.", ""), everything())  # drop the "ListItems." prefix

# Step 6: Remove the columns that are not needed
allData <- allData[, -c(1,4,8,9,11,12,13,14,15)]

# Step 7: Remove spaces and convert the Tax and SoldAmount columns to numeric
allData[c("Tax", "SoldAmount")] <- lapply(allData[c("Tax", "SoldAmount")], function(z) as.numeric(gsub(" ", "", z)))

# Step 8: Remove rows with NA in any numeric column
# (if_all() replaces across() inside filter(), which is deprecated since dplyr 1.0.8)
alldata <- allData %>%
  filter(if_all(where(is.numeric), ~ !is.na(.)))

# Step 9: Keep rows where at least one numeric column (such as SoldAmount or Tax) is above 10000 NOK
alldata <- alldata %>%
  filter(if_any(where(is.numeric), ~ . > 10000))

# Step 10: Relative change between SoldAmount and Tax, stored in a new Percentage column
# (a fraction: 0.05 means 5 %)
alldata_Percent <- alldata %>% mutate(Percentage = (SoldAmount - Tax) / Tax)
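The steps above need network access to the Forsvarsbygg API. As a minimal offline sketch of steps 7 to 10, assuming a small hypothetical data frame `toy` (values invented) with the same `Date`, `Tax` and `SoldAmount` columns, where the amounts arrive as character strings with spaces as thousands separators; it uses `if_all()`/`if_any()` in place of `filter(across())` and `filter_all(any_vars())`:

```r
library(dplyr)

# Hypothetical toy data standing in for the API response (values are invented)
toy <- data.frame(
  Date       = c("2014", "2014", "2015", "2015"),
  Tax        = c("1 000 000", "500 000", "2 000 000", NA),
  SoldAmount = c("1 100 000", "500 000", "1 800 000", "300 000"),
  stringsAsFactors = FALSE
)

# Step 7: strip spaces and convert the two amount columns to numeric
toy[c("Tax", "SoldAmount")] <- lapply(
  toy[c("Tax", "SoldAmount")],
  function(z) as.numeric(gsub(" ", "", z))
)

# Step 8: drop rows that have NA in any numeric column
toy <- toy %>% filter(if_all(where(is.numeric), ~ !is.na(.x)))

# Step 9: keep rows where at least one numeric column exceeds 10000
toy <- toy %>% filter(if_any(where(is.numeric), ~ .x > 10000))

# Step 10: relative change between sold amount and Tax
toy_percent <- toy %>% mutate(Percentage = (SoldAmount - Tax) / Tax)

print(toy_percent)
```

The row with a missing Tax value is dropped in step 8, leaving three rows with Percentage values of 0.1, 0 and -0.1.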

Answer 1

Are you just looking for group_by and summarize from dplyr?

alldata_Percent %>% 
   group_by(Date) %>%
   summarize(median_percent = median(Percentage),
             mean_percent   = mean(Percentage))
#> # A tibble: 15 x 3
#>    Date  median_percent mean_percent
#>    <chr>          <dbl>        <dbl>
#>  1 1970          0           1.98   
#>  2 2003          0          -0.0345 
#>  3 2004          0           0.141  
#>  4 2005          0.0723      0.156  
#>  5 2006          0.0132      0.204  
#>  6 2007          0.024       0.131  
#>  7 2008          0          -0.00499
#>  8 2009          0.0247      0.0769 
#>  9 2010          0.0340      0.0422 
#> 10 2011          0           0.155  
#> 11 2012          0           0.0103 
#> 12 2013          0           0.0571 
#> 13 2014          0           0.0352 
#> 14 2015          0           0.0646 
#> 15 2016          0          -0.0195
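To restrict the calculation to a subset, such as the question's example of only the entries dated 2014, the same verbs can be combined with `filter()`. A minimal sketch on a hypothetical stand-in for `alldata_Percent` (values invented, `Date` stored as character as in the output above), using `na.rm = TRUE` in case any `Percentage` is `NA`, plus a base R one-liner for comparison:

```r
library(dplyr)

# Hypothetical stand-in for alldata_Percent (values are invented)
toy_percent <- data.frame(
  Date       = c("2014", "2014", "2014", "2015", "2015"),
  Percentage = c(0.10, 0, 0.30, -0.10, NA),
  stringsAsFactors = FALSE
)

# Median and mean for every year: one row per Date
by_year <- toy_percent %>%
  group_by(Date) %>%
  summarize(median_percent = median(Percentage, na.rm = TRUE),
            mean_percent   = mean(Percentage, na.rm = TRUE))

# Median and mean for a single year only
year_2014 <- toy_percent %>%
  filter(Date == "2014") %>%
  summarize(median_percent = median(Percentage, na.rm = TRUE),
            mean_percent   = mean(Percentage, na.rm = TRUE))

# Base R equivalent for a single year, without dplyr
median(toy_percent$Percentage[toy_percent$Date == "2014"], na.rm = TRUE)

print(by_year)
print(year_2014)
```

`group_by()` is the better fit when every year is wanted at once; `filter()` followed by `summarize()` (or plain logical indexing) is enough when only one year matters.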
