Calculate variation of IP addresses column using MySQL

Question

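I'm trying to detect people who abuse my website through proxies.

They keep switching proxies, but there is definitely a pattern: they reuse the same proxy address many times, far more often than a legitimate visitor would.

Normally, most traffic to my site comes from unique IP addresses that have visited only once or a handful of times, not repeatedly.

Say I have these IP addresses in a column: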

89.46.74.56
89.46.74.56
89.46.74.56
91.14.37.249
104.233.103.6

That is 3 distinct values out of 5, which gives a "uniqueness score" of 60%.

How can I calculate this efficiently with MySQL?

Answer 1

Plan

get the row count for each ip (group by ip)

divide each count by the total row count (obtained via a cross join)

take the maximum of the resulting repeat ratios

Setup

create table example
(
  id integer primary key auto_increment not null,
  ip varchar(15) not null  -- 15 characters fits the longest dotted IPv4 address
);

insert into example
( ip )
values
( '89.46.74.56'   ),
( '89.46.74.56'   ),
( '89.46.74.56'   ),
( '91.14.37.249'  ),
( '104.233.103.6' )
;
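
With the table populated, the first step of the plan can be checked on its own before any ratios are computed. This is a minimal sketch against the `example` table above; the `hits` alias is just an illustrative name.

```sql
-- Step 1 of the plan on its own: number of rows per IP address,
-- most frequently seen address first.
select ip, count(*) as hits
from example
group by ip
order by hits desc;
```

For the five sample rows, 89.46.74.56 should come back with 3 hits and the other two addresses with 1 each.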

Query

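-- largest share of all rows held by any single ip
select max(repeat_factor)
from
(
  -- per-ip row count divided by the total row count
  select ip, count(*) / rc.row_count as repeat_factor
  from example
  cross join ( select count(*) as row_count from example ) rc
  -- rc.row_count is listed so the query also runs under ONLY_FULL_GROUP_BY
  group by ip, rc.row_count
) q
;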

Output

+--------------------+
| max(repeat_factor) |
+--------------------+
| 0.6                |
+--------------------+
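
Note that the query above returns the largest share held by a single address (3 of 5 rows), whereas the question describes the score as distinct values over total rows (3 distinct of 5). On this sample both come to 0.6, but they generally differ: with four rows of one address and one of another, the distinct ratio is 2/5 and the maximum repeat factor is 4/5. If the distinct ratio is what is wanted, the following sketch may be closer. It assumes the same `example` table; the `uniqueness_score` alias and the 0.5 cutoff in the second statement are illustrative choices, not something from the question.

```sql
-- Distinct-value ratio as described in the question:
-- number of different addresses divided by the number of rows.
select count(distinct ip) / count(*) as uniqueness_score
from example;

-- Optional follow-up: list the addresses that dominate the table.
-- The 0.5 cutoff is an arbitrary threshold for illustration only.
select ip,
       count(*) as hits,
       count(*) / rc.row_count as repeat_factor
from example
cross join ( select count(*) as row_count from example ) rc
group by ip, rc.row_count
having count(*) / rc.row_count > 0.5
order by hits desc;
```

On the sample data the second statement should return only 89.46.74.56, since the other two addresses each account for 1 of 5 rows.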

