Can I mimic a log scale of an axis in matplotlib without transforming the associated data?
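I am trying to display a Zipf plot, which is usually shown on a log-log scale.
I am using a library that gives ranks on a linear scale and frequencies on a log scale. I have the following code, which plots my data fairly correctly: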
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414,
-19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]
data = {
'ranks': ranks,
'freqs': freqs,
}
df = pd.DataFrame(data=data)
_, ax = plt.subplots(figsize=(7, 7))
ax.set(xscale="log", yscale="linear")
ax.set_title("Zipf plot")
# recent seaborn versions require x and y to be passed as keywords
sns.regplot(x="ranks", y="freqs", data=df, ax=ax, fit_reg=False)
ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
ax.grid(True, which="both")
plt.show()
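The resulting plot is:
The plot looks fine, but the y-axis labels are odd. I would like them to be shown in log increments as well. My current workaround is to raise 10 to the power of each element of the freqs list; i.e.,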
freqs = [10**freq for freq in freqs]
# ...
and to change yscale in ax.set to log; i.e.,
_, ax = plt.subplots(figsize=(7, 7))
ax.set(xscale="log", yscale="log")
ax.set_title("Zipf plot")
# ...
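This gives me the expected plot (below), but it requires transforming the data, which is a) relatively expensive, b) redundant, and c) lossy.
Is there a way to mimic a log scale on an axis of a matplotlib plot without transforming the data?
First a comment: personally I would prefer the approach of rescaling the data, since it makes everything a lot easier at the expense of somewhat more memory/CPU time, and the loss of accuracy shouldn't matter.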
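As a rough check of the accuracy remark, here is a minimal sketch that converts some of the question's values to linear scale and back. It assumes the library's logarithm is base 10, as the question's own workaround implies; the variable names are only for illustration.

```python
import math

# Log10 frequencies copied from the question (duplicates omitted)
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977,
         -16.27560806274414, -19.482173919677734, -19.502029418945312]

# Rescale to linear values, as in the question's workaround
linear = [10.0 ** f for f in freqs]

# Convert back and measure how much was lost in the round trip
recovered = [math.log10(v) for v in linear]
max_abs_error = max(abs(a - b) for a, b in zip(freqs, recovered))
print(max_abs_error)
```

The printed round-trip error is at the level of floating-point rounding, which is far below anything visible in a plot.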
Now to the question, which is actually: how do I mimic a log scale on a linear axis?
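Solution 1: Mimic the log scale
This is not easy. Setting an axis to a log scale changes a lot in the background, and all of that has to be mimicked.
- The easy part is setting the major tick spacing to 1 with matplotlib.ticker.MultipleLocator().
- Creating minor ticks at positions that look logarithmic is harder. The best solution I could come up with is to set them manually with matplotlib.ticker.FixedLocator().
- Finally, the tick labels need to represent the actual numbers, which means they should read 10^(-x) instead of -x. I know of two options here:
  - Use a FuncFormatter to show the value 10**x in scientific notation.
  - Use a FuncFormatter to show the value 10^x in LaTeX (MathText) format. This looks much nicer, but contrasts with the rest of the plot.
For this last point I don't know of a better solution, but maybe someone else does.
Here is the code and what it looks like.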
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from matplotlib.ticker import MultipleLocator, FixedLocator, FuncFormatter
###### Locators for Y-axis
# set major ticks at multiples of 1.
majorLocator = MultipleLocator(1.)
# create custom minor tick positions at logarithmic spacing
# (range replaces Python 2's xrange)
ra = np.array([ [n+(1.-np.log10(i))] for n in range(10,20) for i in [2,3,4,5,6,7,8,9][::-1]]).flatten()*-1.
minorLocator = FixedLocator(ra)
###### Formatter for Y-axis (choose either of the following two)
# show labels in scientific notation (looks ugly)
majorFormatter= FuncFormatter(lambda x,p: "{:.1e}".format(10**x) )
# or using MathText (looks nice, but does not match the rest of the layout)
majorFormatter= FuncFormatter(lambda x,p: r"$10^{"+"{x:d}".format(x=int(x))+r"}$" )
ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414,
-19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]
data = {
'ranks': ranks,
'freqs': freqs,
}
df = pd.DataFrame(data=data)
_, ax = plt.subplots(figsize=(6, 6))
ax.set(xscale="log", yscale="linear")
ax.set_title("Zipf plot")
# recent seaborn versions require x and y to be passed as keywords
sns.regplot(x="ranks", y="freqs", data=df, ax=ax, fit_reg=False)
# Set the locators
ax.yaxis.set_major_locator(majorLocator)
ax.yaxis.set_minor_locator(minorLocator)
# Set formatter if you like to have the ticklabels consistently in power notation
ax.yaxis.set_major_formatter(majorFormatter)
ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
ax.grid(True, which="both")
plt.show()
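The minor tick positions above are hard-coded for the decades 10^-20 to 10^-10. As a minimal sketch, and not part of the original answer, the same positions could be derived from arbitrary axis limits. The helper names minor_log_tick_positions and power_of_ten_label are hypothetical; the limits are assumed to be log10 values already.

```python
import math


def minor_log_tick_positions(y_min, y_max):
    """Return minor tick positions, in log10 units, for 2..9 times each power of ten.

    y_min and y_max are axis limits that are already log10 values.
    """
    positions = []
    for decade in range(math.floor(y_min), math.ceil(y_max)):
        for multiple in range(2, 10):
            positions.append(decade + math.log10(multiple))
    # keep only the ticks that fall inside the requested limits
    return [p for p in positions if y_min <= p <= y_max]


def power_of_ten_label(x, pos=None):
    """Build a MathText label for an integer tick; the signature fits FuncFormatter."""
    return r"$10^{%d}$" % round(x)


ticks = minor_log_tick_positions(-20.0, -10.0)
print(len(ticks), power_of_ten_label(-11.0))
```

The resulting list can be passed to FixedLocator, and power_of_ten_label can be wrapped in FuncFormatter, in place of the hard-coded versions.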
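Solution 2: Use different axes
A different solution, which I didn't think of at first, is to use two different axes: one with a log-log scale, which looks nice and produces the correct labels and ticks, and another one on which the data is plotted.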
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414,
-19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]
data = {
'ranks': ranks,
'freqs': freqs,
}
df = pd.DataFrame(data=data)
fig, ax = plt.subplots(figsize=(6, 6))
# use 2 axes
# ax is the log, log scale which produces nice labels and ticks
ax.set(xscale="log", yscale="log")
ax.set_title("Zipf plot")
# ax2 is the axes the values are plotted on
ax2 = ax.twinx()
# plot values on ax2 (recent seaborn versions require x and y as keywords)
sns.regplot(x="ranks", y="freqs", data=df, ax=ax2, fit_reg=False)
# set the limits of the log log axis to 10 to the power of the label of ax2
ax.set_ylim(10**np.array(ax2.get_ylim()) )
ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
# remove ticklabels and axislabel from ax2
ax2.set_yticklabels([])
ax2.set_ylabel("")
ax.grid(True, which="both")
plt.show()
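One caveat with the two-axes approach is that the limits of the log axis are set once, so they can drift apart if the data axis is later zoomed or rescaled. A minimal sketch of one way to keep them aligned, assuming the matplotlib Axes callback signal "ylim_changed"; the helper names to_log_limits and keep_log_axis_in_sync are hypothetical and not part of the original answer.

```python
def to_log_limits(linear_limits):
    """Map (low, high) limits given as log10 values to the matching log-axis limits."""
    low, high = linear_limits
    return (10.0 ** low, 10.0 ** high)


def keep_log_axis_in_sync(ax_log, ax_data):
    """Update the y-limits of ax_log whenever the y-limits of ax_data change.

    Both arguments are expected to be matplotlib Axes, such as ax and ax2 above.
    """
    def on_ylim_changed(changed_axes):
        ax_log.set_ylim(to_log_limits(changed_axes.get_ylim()))

    # Axes.callbacks emits "ylim_changed" when the y view limits change
    ax_data.callbacks.connect("ylim_changed", on_ylim_changed)
    # apply once so the current limits already match
    on_ylim_changed(ax_data)
```

With the code above it would be called as keep_log_axis_in_sync(ax, ax2) after plotting and before plt.show().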