In an array of hashes, how do I list the 'webpages' with the most unique 'page' views?
I have a text file that records each visit by an IP address to a specific page, for example:
/help_page/1 126.318.035.038
/contact 184.123.665.067
/home 184.123.665.067
/about/2 444.701.448.104
/help_page/1 929.398.951.889
/index 444.701.448.104
/help_page/1 722.247.931.582
/about 061.945.150.735
/help_page/1 646.865.545.408
/home 235.313.352.950
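Now I need to parse the log file and print a list of pages sorted from most page views to fewest, and I have managed to get the correct result.
The second task is to print a list of webpages showing their unique page views, and here I ran into a few problems.
Here is the code that prints total page views, sorted from highest to lowest: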
require 'open-uri'
log_read = File.read('webserver.log')
split_log = log_read.split("\n/") # split_log = array
split_log[0] = split_log[0].sub('/', '')
split_array = split_log.map { |line| line.split(' ') }
# Most views
container = Hash.new(0) # empty
split_array.each do |item|
container[item[0]] += 1
end
sorted_container = container.sort_by { |_k, v| v }.reverse
# Number of page visits
sorted_container.each do |k, v|
puts "#{k} has #{v} visits"
end
The result of the above code is:
about/2 has 90 visits
contact has 89 visits
index has 82 visits
about has 81 visits
help_page/1 has 80 visits
home has 78 visits
Now for the second part: I am asked to show a list of webpages with their unique page views, and I thought of mapping 'split_array' like this:
sorted_unique_views = split_array.map { |h| h.to_a }.uniq.map { |k, v| { k => v } }
which gives me an array of hashes:
[
{"help_page/1"=>"126.318.035.038"}
{"contact"=>"184.123.665.067"}
{"home"=>"184.123.665.067"}
{"about/2"=>"444.701.448.104"}
{"help_page/1"=>"929.398.951.889"}
{"index"=>"444.701.448.104"}
{"help_page/1"=>"722.247.931.582"}
{"about"=>"061.945.150.735"}
{"help_page/1"=>"646.865.545.408"}
{"home"=>"235.313.352.950"}
{"help_page/1"=>"543.910.244.929"}
....etc ]
What I really want is to somehow iterate over sorted_unique_views=[{...},{...},etc] and count the unique IPs for each page, so that the final result looks something like this:
help_page/1 23
contact 23
home 22
about/2 22
index 23
about 22
I tried inject, iterating over sorted_unique_views=[{...},{...},etc], but I either get 135, which is the total number of unique views across all pages, or I get
{{"help_page/1"=>"126.318.035.038"}=>1}
If possible, I would like some guidance and feedback on whether splitting and then mapping is the right choice here.
Thanks a lot
Create a test file
Let's first create a file.¹
text =<<-END
/help_page/1 126.318.035.038
/contact 184.123.665.067
/home 184.123.665.067
/about/2 444.701.448.104
/help_page/1 929.398.951.889
/index 444.701.448.104
/help_page/1 722.247.931.582
/about 061.945.150.735
/help_page/1 646.865.545.408
/home 235.313.352.950
END
FNAME = 'log'
File.write(FNAME, text)
#=> 256
Confirm the contents.
puts File.read(FNAME)
/help_page/1 126.318.035.038
/contact 184.123.665.067
/home 184.123.665.067
...
/home 235.313.352.950
Read the file and construct a useful hash
h = File.foreach(FNAME).with_object(Hash.new { |h,k| h[k] = [] }) do |line,h|
# The second field of each line is the visitor's IP address
key, ip = line[1..-2].split
h[key] << ip
end
#=> {"help_page/1"=>["126.318.035.038", "929.398.951.889", "722.247.931.582",
# "646.865.545.408"],
# "contact" =>["184.123.665.067"],
# "home" =>["184.123.665.067", "235.313.352.950"],
# "about/2" =>["444.701.448.104"],
# "index" =>["444.701.448.104"],
# "about" =>["061.945.150.735"]}
Use this hash to compute objects of interest
Determine the number of views for each key
h.transform_values(&:count)
#=> {"help_page/1"=>4, "contact"=>1, "home"=>2, "about/2"=>1, "index"=>1, "about"=>1}
Create a list in decreasing order of page views
h.sort_by { |_,a| -a.size }
#=> [["help_page/1", ["126.318.035.038", "929.398.951.889", "722.247.931.582",
# "646.865.545.408"]],
# ["home", ["184.123.665.067", "235.313.352.950"]],
# ["contact", ["184.123.665.067"]],
# ["about/2", ["444.701.448.104"]],
# ["index", ["444.701.448.104"]],
# ["about", ["061.945.150.735"]]]
Or, if a hash is desired:
h.sort_by { |_,a| -a.size }.to_h
#=> {"help_page/1"=>["126.318.035.038", "929.398.951.889", "722.247.931.582",
# "646.865.545.408"],
# "home" =>["184.123.665.067", "235.313.352.950"],
# "contact" =>["184.123.665.067"],
# "about/2" =>["444.701.448.104"],
# "index" =>["444.701.448.104"],
# "about" =>["061.945.150.735"]}
Determine which keys were viewed only once
h.select { |_,a| a.size == 1 }
#=> {"contact"=>["184.123.665.067"],
# "about/2"=>["444.701.448.104"],
# "index"=>["444.701.448.104"],
# "about"=>["061.945.150.735"]}
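Determine the number of unique views for each key

The question asks for unique page views, which the hash above supports by counting distinct IPs per key. The following is a minimal sketch, not a definitive solution: the hash visits is hypothetical sample data in the same shape as h, and the repeated IP under "home" is an assumption added only to make the effect of uniq visible.

```ruby
# Hypothetical sample data in the same shape as h above; the repeated
# IP under "home" is an assumption added to make uniqueness visible.
visits = {
  "help_page/1" => ["126.318.035.038", "929.398.951.889", "722.247.931.582"],
  "home"        => ["184.123.665.067", "184.123.665.067", "235.313.352.950"],
  "contact"     => ["184.123.665.067"]
}

# Count the distinct IPs for each page.
unique_views = visits.transform_values { |ips| ips.uniq.size }

# Sort from most to fewest unique views.
sorted_unique = unique_views.sort_by { |_page, n| -n }

sorted_unique.each { |page, n| puts "#{page} #{n} unique views" }
```

Applied to the real log, the same two steps would be run on h instead of visits.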
Notes
See IO::write, IO::read, IO::foreach, Enumerator#with_object, Hash::new, Hash#transform_values, Enumerable#count and Enumerable#sort_by.²
The calculation of h could also be written as follows.
h = {}
File.foreach(FNAME) do |line|
# The second field of each line is the visitor's IP address
key, ip = line[1..-2].split
h[key] = [] unless h.key?(key)
h[key] << ip
end
h
This shows what with_object and Hash.new { |h,k| h[k] = [] } do. line[1..-2] removes the first character of the line (/) and the newline at the end of the line ("\n").
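One caveat with line[1..-2] is that it assumes every line ends with a newline. The sketch below, using a hypothetical last_line without a trailing newline, shows what happens in that case and one possible alternative based on String#chomp and String#delete_prefix (the latter needs Ruby 2.5 or later).

```ruby
# A line whose trailing newline is missing, as can happen on the last
# line of a file (hypothetical input for illustration).
last_line = "/home 235.313.352.950"

# line[1..-2] drops the final character even when it is not a newline.
truncated = last_line[1..-2].split
# truncated => ["home", "235.313.352.95"]

# chomp removes only a trailing newline; delete_prefix removes only a
# leading "/", so the IP stays intact.
page, ip = last_line.chomp.delete_prefix("/").split
puts "#{page} #{ip}"
```

Whether this matters depends on how the log file is written; for the test file created above, every line ends with "\n".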
h.transform_values(&:count)
is shorthand for:
h.transform_values { |v| v.count }
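As a quick check of that equivalence, here is a small sketch on a hypothetical hash named counts; both forms should give the same result.

```ruby
# Hypothetical hash of page => list of visitors, for illustration only.
counts = { "home" => ["a", "b"], "about" => ["c"] }

# &:count turns the symbol into a block via Symbol#to_proc.
short = counts.transform_values(&:count)

# The explicit block form.
long = counts.transform_values { |v| v.count }

puts short == long
```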
1. For formatting reasons I have indented each line of the heredoc by 4 spaces. To run the code, first un-indent the lines of the heredoc.
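2. Class and module methods are denoted by a double colon between the class or module and the method name (e.g., IO::write); instance methods are denoted by a pound sign between the class or module and the instance method (e.g., Enumerator#with_object). IO methods are commonly invoked on the class File (e.g., File.foreach ... rather than IO.foreach ...). This is permitted because File is a subclass of IO and therefore inherits IO's class and instance methods.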
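The inheritance mentioned in note 2 can be checked directly in irb. This is only an illustrative sketch; the local variable names are arbitrary.

```ruby
# File's direct superclass is IO.
parent = File.superclass
puts parent

# File responds to foreach even though the method is defined on IO.
has_foreach = File.respond_to?(:foreach)
puts has_foreach

# The owner of File.foreach is IO's singleton class, confirming that the
# class method is inherited rather than redefined on File.
inherited = File.method(:foreach).owner == IO.singleton_class
puts inherited
```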