SparkR create table RelativeFrequency

Hello, I am working with SparkR and I am trying to compute the RelativeFrequency of my data.

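library(data.table) # provides fread() and setnames()

SmsInt<-fread("smsCallInt.txt")
setnames(SmsInt,c("V1","V2","V3","V4","V5","V6","V7","V8"),
         c("SquareID","TimeInterval","CountryCode","SmsIn","SmsOut","CallIn","CallOut","Internet"))
# Convert TimeInterval to numeric.
SmsInt$TimeInterval<-as.numeric(SmsInt$TimeInterval)
# Also create a SparkR DataFrame from the first 500 rows.
# sqlContext is assumed to exist already (e.g. created by the sparkR shell).
SmsInt.df<-createDataFrame(sqlContext,SmsInt[1:500,])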

str(SmsInt)
Classes ‘data.table’ and 'data.frame':  2459324 obs. of  8 variables:
 $ SquareID    : int  10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 ...
 $ TimeInterval: num  1.38e+12 1.38e+12 1.38e+12 1.38e+12 1.38e+12 ...
 $ CountryCode : int  0 39 49 0 39 0 39 0 39 49 ...
 $ SmsIn       : num  0.109 1.001 NA 0.193 0.648 ...
 $ SmsOut      : num  NA 1.26 NA NA 1.06 ...
 $ CallIn      : num  NA 0.0876 NA NA 0.1751 ...
 $ CallOut     : num  0.0219 0.2196 NA NA 0.1532 ...
 $ Internet    : num  NA 10.1685 0.0219 NA 11.8671 ...
 - attr(*, ".internal.selfref")=<externalptr> 

What I want to do is create a RelativeFrequency from SmsInt$CountryCode. When I type Country<-table(SmsInt$CountryCode)

I get this error:

Errore: class(objId) == "jobj" is not TRUE

What should I do? Is there a way to compute it manually or with some package?

I wrote an algorithm, but I ran into some problems.
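One likely cause, assuming a SparkR 1.x session: SparkR exports its own table() function for loading Spark SQL tables, and it masks base R's table(), so the call above is sent to Spark with a plain vector. A minimal sketch of a workaround is to call base::table() explicitly and pass the counts to prop.table(). The sample vector below is made up for illustration and stands in for SmsInt$CountryCode.

```r
# Hypothetical stand-in for SmsInt$CountryCode.
CountryCode <- c(0, 39, 49, 0, 39, 0, 39, 0, 39, 49)

# Call the base function by its namespace so an attached package cannot mask it.
Counts <- base::table(CountryCode)

# prop.table() divides each count by the total, giving relative frequencies.
RelFreq <- prop.table(Counts)

print(RelFreq)
print(sum(RelFreq))
```

Because every count is divided by the same total, the relative frequencies sum to 1 by construction, which makes sum(RelFreq) a quick sanity check.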

Country5<-SmsInt$CountryCode[1:90]
UniqueCountry<-unique(Country5)
VectorLen<-c()
Parsed<-c()
Freq<-c()
for(i in 1:length(UniqueCountry)){
    CountryCode.i<-UniqueCountry[i]
    if(CountryCode.i %in% Parsed){
        Vector<-0
        VectorLen[i]<-0
        Freq[i]<-0
    }
    else{
        Vector<-grep(CountryCode.i,Country5)
        Parsed[i]<-CountryCode.i
        VectorLen[i]<-length(Vector)
        Freq[i]<-VectorLen[i]/90
        Vector<-0
    }
}
Vector
VectorLen #92 it needs to be 90
Freq
sum(Freq) #1.022222 needs to be 1

With the first 80 elements, everything works.

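OK, I got it working. The problem was the grep function: when I search for the number 1, for example, it also finds a match in the number 10.

I am posting the solution here.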
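The difference can be reproduced on a tiny vector. This is only an illustrative sketch; the values in Codes are made up and are not taken from the data set.

```r
# Hypothetical country codes.
Codes <- c(1, 10, 21, 1, 39)

# grep() converts the numbers to strings and looks for the pattern "1"
# anywhere inside each string, so "10" and "21" match as well.
GrepHits <- grep(1, Codes)

# Exact comparison only matches elements that are equal to 1.
ExactHits <- which(Codes == 1)

# Anchoring the pattern is another way to force a whole-string match.
AnchoredHits <- grep("^1$", Codes)

print(GrepHits)
print(ExactHits)
print(AnchoredHits)
```

Substring matches like these are counted once for every code they appear in, which would explain why the lengths added up to more than 90.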

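RelativeFrequency<-function(DataSet){
  # Distinct values whose relative frequency will be computed.
  UniqueCountry<-unique(DataSet)
  VectorLen<-c()
  Parsed<-c()
  Freq<-c()
  for(i in seq_along(UniqueCountry)){
    CountryCode.i<-UniqueCountry[i]
    if(CountryCode.i %in% Parsed){
      # Already counted. This cannot happen after unique(); kept as a safeguard.
      Vector<-0
      VectorLen[i]<-0
      Freq[i]<-0
    }
    else{
      # Exact matching with %in%, so 1 no longer matches 10 as it did with grep.
      Vector<-which(DataSet %in% CountryCode.i)
      Parsed[i]<-CountryCode.i
      VectorLen[i]<-length(Vector)
      # Relative frequency = occurrences / total number of elements.
      Freq[i]<-VectorLen[i]/length(DataSet)
    }
  }
  print("Vector of RelativeFrequency")
  print(Freq)
  print("Frequency Sum (Needs to be 1)")
  print(sum(Freq))
  print("Parsed element ")
  print(Parsed)
  # One bar per distinct value, labelled with the value itself.
  barplot(Freq,names.arg=Parsed,space = 0.7,axisnames = TRUE,las=2)
}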