Plotting a "perfect" Zipf distribution from data in gnuplot

Question

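My goal is to have a simple .dat file and plot from it both the actual data and the theoretical points of a perfect Zipf distribution, i.e. a distribution in which the value of each item equals 1/(rank), scaled by the value of the top-ranked item.

For example, my data for the most followed Instagram accounts is: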

# List of most followed users on instagram
# By rank and millions of followers
# From Wikipedia
# https://en.wikipedia.org/wiki/List_of_most_followed_users_on_Instagram
# rank, millions of followers

1 222
2 120
3 105
4 101
5 101
6 100
7 99 
8 93 
9 86 
10 85
11 80
12 79
13 76
14 73
15 71
16 69
17 67
18 65
19 63
20 63

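From another thread I learned that I could just append a new column containing the ideal Zipf value for each rank (in this case 222, 111, 74, 55.5, etc.) and then draw a second plot with , '' using 1:3. But that requires doing the calculation by hand and appending it to the original file, which is the step I am trying to avoid. Is this possible? And how could I extend it to other distributions or calculations on the data?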

Answer 1

Use stats to compute the maximum of the second column:

stats 'file.dat' u 2 nooutput
max = STATS_max

Then compute the Zipf distribution with (max/$1), which divides that maximum by the rank in column 1 for every row:

plot 'file.dat' u 1:2 pt 7 t 'data',\
     '' u 1:(max/$1) w l t 'ideal Zipf'
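
To extend this to other distributions or calculations, one option is to move the formula into a user-defined function and call it inside the using expression. The following is only a sketch: it assumes file.dat has the rank in column 1 and the value in column 2, as in the question, and the names zipf and s are hypothetical, introduced here for illustration.

```gnuplot
# Assumption: 'file.dat' holds rank in column 1 and value in column 2.
stats 'file.dat' u 2 nooutput
max = STATS_max

# Hypothetical helper: generalized Zipf curve scaled so that rank 1 equals max.
# With s = 1 it gives the ideal Zipf values from the question (222, 111, 74, ...).
s = 1.0
zipf(r) = max / r**s

# Optional: a power law shows up as a straight line on log-log axes.
set logscale xy

plot 'file.dat' u 1:2 pt 7 t 'data',\
     '' u 1:(zipf($1)) w l t sprintf('Zipf, s = %.1f', s)
```

Any expression in parentheses inside using is evaluated once per row, so changing the body of zipf(r), or the value of s, changes the theoretical curve without touching the data file.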
