Looking for R functionality to split a numeric column in a data frame into a categorical variable with 3 levels
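Problem
I am trying to use logical operators to split a data column that lists the market capitalization of cryptocurrency projects into categorical data (in a single column).
Solutions I have tried
I am using tidyverse, the pipe operator and mutate to implement an if-else statement with logical operators, and I am trying to save the result as a categorical variable. I could not find the right answer on the web, most likely because I lack the coding vocabulary to find the thread and workaround I have been looking for.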
library(httr)
library(jsonlite)
library(dplyr)
library(ggthemes)
library(ggplot2)
library(ggrepel)
library(googlesheets4)
library(tidyverse)
options(scipen=999)
# Get BTC API results from Nomics
NOMICS_API <- GET("https://api.nomics.com/v1/currencies/ticker?key=YOURKEY&interval=30d,365d&convert=EUR")
# Get json request results as text
NOMICS_API_TEXT <- content(NOMICS_API, "text")
# Make a useful object for R analysis
NOMICS_API_DF <- fromJSON(NOMICS_API_TEXT)
# Unnest nested tables 30d and 365d
DF_RAW <- unnest(NOMICS_API_DF, c("30d","365d"), names_repair = "universal")
# Clean, redefine and filter inactive projects
DF_CLEAN <- DF_RAW %>%
  filter(status == "active") %>%
  mutate(rank = as.integer(rank),
         price = as.numeric(price),
         num_pairs = as.integer(num_pairs),
         num_exchanges = as.integer(num_exchanges),
         circulating_supply = as.numeric(circulating_supply),
         max_supply = as.numeric(max_supply),
         market_cap = as.numeric(market_cap)/(1*10^9),
         market_cap_dominance = as.numeric(market_cap_dominance)*100,
         high = as.numeric(high),
         high_timestamp = as.Date(high_timestamp),
         market_cap_tier = if(market_cap => 50) as.character("Big Cap")
         else if (50 > market_cap > 10) as.character("Medium Cap")
         else if (10 > market_cap > 0) as.character("Small Cap")
  )
Any pointers are much appreciated!
Use case_when as shown below. The pipe itself is just an example; case_when does what the nested if/else in the question was trying to do.
DF_RAW %>%
  mutate(
    market_cap_tier = case_when(
      market_cap >= 50 ~ "Big Cap",
      market_cap > 10 & market_cap < 50 ~ "Medium Cap",
      market_cap > 0 & market_cap < 10 ~ "Small Cap",
      TRUE ~ NA_character_  # anything else (including NA) stays unclassified
    )
  )
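For anyone who wants to try this without an API key, here is a minimal sketch on made-up data. The data frame `toy`, its values and the `tiers` vector are hypothetical and only serve as illustration. Two things explain why the attempt in the question fails: `=>` is not an R operator (the comparison is written `>=`), and a chained comparison such as `50 > market_cap > 10` is not valid R syntax. In addition, `if` expects a single TRUE/FALSE value, whereas case_when is vectorised over the whole column. The sketch relies on case_when checking its conditions in order, so the upper bounds can be dropped. As an assumption that differs from the code above, a value of exactly 10 lands in "Small Cap" here instead of staying NA. The last step turns the labels into a factor with 3 ordered levels.

```r
library(dplyr)

# Hypothetical toy data: market caps already numeric and expressed in billions
toy <- data.frame(
  id = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  market_cap = c(120, 50, 25, 3, NA),
  stringsAsFactors = FALSE
)

# Desired order of the three levels (assumed: smallest to largest)
tiers <- c("Small Cap", "Medium Cap", "Big Cap")

toy_tiered <- toy %>%
  mutate(
    # case_when evaluates the conditions top to bottom and uses the first match
    market_cap_tier = case_when(
      market_cap >= 50 ~ "Big Cap",
      market_cap > 10  ~ "Medium Cap",  # only reached when market_cap < 50
      market_cap > 0   ~ "Small Cap",   # only reached when market_cap <= 10
      TRUE             ~ NA_character_  # NA, zero or negative values
    ),
    # Convert the character labels into a categorical variable with 3 levels
    market_cap_tier = factor(market_cap_tier, levels = tiers)
  )

print(toy_tiered)
```

Base R's cut() with breaks = c(0, 10, 50, Inf) and right = FALSE would be another option, though it treats the boundaries differently again: a value of exactly 10 would fall into the medium tier and 0 into the small tier.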