从均匀分布中抽样时没有方法匹配 logpdf
no method matching logpdf when sampling from uniform distribution
我正在尝试在 julia 中使用强化学习来教一辆不断向后加速(但初始速度为正)的汽车施加制动,以便在向后移动之前尽可能接近目标距离.
为此,我使用了 POMDPs.jl
和 crux.jl
,它们有很多求解器(我使用的是 DQN)。我会先列出我认为是脚本的相关部分,然后在最后列出更多内容。
为了定义 MDP,我将制动器的初始位置、速度和力设置为在某些值上均匀分布。
@with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
x0 = Distributions.Uniform(0., 80.)# Distribution to sample initial position
v0 = Distributions.Uniform(0., 25.) # Distribution to sample initial velocity
d0 = Distributions.Uniform(0., 2.) # Distribution to sample brake force
...
end
我的状态保持 (position, velocity, brake force)
的值,初始状态为:
function POMDPs.initialstate(mdp::SliderMDP)
ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
然后,我使用 crux.jl
设置了我的 DQN 求解器并调用了一个函数来求解策略
solver_dqn = DQN(π=Q_network(), S=s, N=30000)
policy_dqn = solve(solver_dqn, mdp)
调用 solve()
给我错误 MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing)
。我很确定这是来自初始状态采样,但我不确定为什么或如何修复它。我只是在很短的时间内从各种书籍和在线讲座中学习 RL,因此,对于错误或我设置的模型(或我忘记的任何其他内容)的任何帮助,我们将不胜感激。
更全面的代码:
包:
using POMDPs
using POMDPModelTools
using POMDPPolicies
using POMDPSimulators
using Parameters
using Random
using Crux
using Flux
using Distributions
其余部分:
@with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
x0 = Distributions.Uniform(0., 80.)# Distribution to sample initial position
v0 = Distributions.Uniform(0., 25.) # Distribution to sample initial velocity
d0 = Distributions.Uniform(0., 2.) # Distribution to sample brake force
m::Float64 = 1.
tension::Float64 = 3.
dmax::Float64 = 2.
target::Float64 = 80.
dt::Float64 = .05
γ::Float32 = 1.
actions::Vector{Float64} = [-.1, 0., .1]
end
function POMDPs.gen(env::SliderMDP, s, a, rng::AbstractRNG = Random.GLOBAL_RNG)
x, ẋ, d = s
if x >= env.target
a = .1
end
if d+a >= env.dmax || d+a <= 0
a = 0.
end
force = (d + env.tension) * -1
ẍ = force/env.m
# Simulation
x_ = x + env.dt * ẋ
ẋ_ = ẋ + env.dt * ẍ
d_ = d + a
sp = vcat(x_, ẋ_, d_)
reward = abs(env.target - x) * -1
return (sp=sp, r=reward)
end
function POMDPs.initialstate(mdp::SliderMDP)
ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
POMDPs.isterminal(mdp::SliderMDP, s) = s[2] <= 0
POMDPs.discount(mdp::SliderMDP) = mdp.γ
mdp = SliderMDP();
s = state_space(mdp); # Using Crux.jl
function Q_network()
layer1 = Dense(3, 64, relu)
layer2 = Dense(64, 64, relu)
layer3 = Dense(64, length(3))
return DiscreteNetwork(Chain(layer1, layer2, layer3), [-.1, 0, .1])
end
solver_dqn = DQN(π=Q_network(), S=s, N=30000) # Using Crux.jl
policy_dqn = solve(solver_dqn, mdp) # Error comes here
堆栈跟踪:
policy_dqn
MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing)
Closest candidates are:
logpdf(::Distributions.DiscreteNonParametric, !Matched::Real) at C:\Users\name\.julia\packages\Distributions\Xrm9e\src\univariate\discrete\discretenonparametric.jl:106
logpdf(::Distributions.UnivariateDistribution{S} where S<:Distributions.ValueSupport, !Matched::AbstractArray) at deprecated.jl:70
logpdf(!Matched::POMDPPolicies.PlaybackPolicy, ::Any) at C:\Users\name\.julia\packages\POMDPPolicies\wMOK3\src\playback.jl:34
...
logpdf(::Crux.ObjectCategorical, ::Float32)@utils.jl:16
logpdf(::Crux.DistributionPolicy, ::Vector{Float64}, ::Float32)@policies.jl:305
var"#exploration#133"(::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::typeof(Crux.exploration), ::Crux.DistributionPolicy, ::Vector{Float64})@policies.jl:302
exploration@policies.jl:297[inlined]
action(::Crux.DistributionPolicy, ::Vector{Float64})@policies.jl:294
var"#exploration#136"(::Crux.DiscreteNetwork, ::Int64, ::typeof(Crux.exploration), ::Crux.MixedPolicy, ::Vector{Float64})@policies.jl:326
var"#step!#173"(::Bool, ::Int64, ::typeof(Crux.step!), ::Dict{Symbol, Array}, ::Int64, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})@sampler.jl:55
var"#steps!#174"(::Int64, ::Bool, ::Int64, ::Bool, ::Bool, ::Bool, ::typeof(Crux.steps!), ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})@sampler.jl:108
var"#fillto!#177"(::Int64, ::Bool, ::typeof(Crux.fillto!), ::Crux.ExperienceBuffer{Array}, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace}, ::Int64)@sampler.jl:156
solve(::Crux.OffPolicySolver, ::Main.workspace#2.SliderMDP)@off_policy.jl:86
top-level scope@Local: 1[inlined]
简答:
将输出向量更改为 Float32
,即 Float32[-.1, 0, .1]
。
长答案:
Crux 在网络的输出值上创建一个 Distribution
,并且在某些时候 (policies.jl:298) samples a random value from it. It then converts this value to a Float32
. Later (utils.jl:15) 它会执行一个 findfirst
以在原始输出数组中找到该值的索引(在分布中存储为 objs
),但是因为原始数组仍然是 Float64
,所以失败并且 returns 一个 nothing
。因此错误。
我相信这(转换采样值而不是 objs
数组 and/or 不使用近似相等检查,即 findfirst(isapprox(x), d.objs)
)是包中的一个错误,并且会鼓励你在 Github.
上提出这个问题
我正在尝试在 julia 中使用强化学习来教一辆不断向后加速(但初始速度为正)的汽车施加制动,以便在向后移动之前尽可能接近目标距离.
为此,我使用了 POMDPs.jl
和 crux.jl
,它们有很多求解器(我使用的是 DQN)。我会先列出我认为是脚本的相关部分,然后在最后列出更多内容。
为了定义 MDP,我将制动器的初始位置、速度和力设置为在某些值上均匀分布。
@with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
x0 = Distributions.Uniform(0., 80.)# Distribution to sample initial position
v0 = Distributions.Uniform(0., 25.) # Distribution to sample initial velocity
d0 = Distributions.Uniform(0., 2.) # Distribution to sample brake force
...
end
我的状态保持 (position, velocity, brake force)
的值,初始状态为:
function POMDPs.initialstate(mdp::SliderMDP)
ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
然后,我使用 crux.jl
设置了我的 DQN 求解器并调用了一个函数来求解策略
solver_dqn = DQN(π=Q_network(), S=s, N=30000)
policy_dqn = solve(solver_dqn, mdp)
调用 solve()
给我错误 MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing)
。我很确定这是来自初始状态采样,但我不确定为什么或如何修复它。我只是在很短的时间内从各种书籍和在线讲座中学习 RL,因此,对于错误或我设置的模型(或我忘记的任何其他内容)的任何帮助,我们将不胜感激。
更全面的代码:
包:
using POMDPs
using POMDPModelTools
using POMDPPolicies
using POMDPSimulators
using Parameters
using Random
using Crux
using Flux
using Distributions
其余部分:
@with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
x0 = Distributions.Uniform(0., 80.)# Distribution to sample initial position
v0 = Distributions.Uniform(0., 25.) # Distribution to sample initial velocity
d0 = Distributions.Uniform(0., 2.) # Distribution to sample brake force
m::Float64 = 1.
tension::Float64 = 3.
dmax::Float64 = 2.
target::Float64 = 80.
dt::Float64 = .05
γ::Float32 = 1.
actions::Vector{Float64} = [-.1, 0., .1]
end
function POMDPs.gen(env::SliderMDP, s, a, rng::AbstractRNG = Random.GLOBAL_RNG)
x, ẋ, d = s
if x >= env.target
a = .1
end
if d+a >= env.dmax || d+a <= 0
a = 0.
end
force = (d + env.tension) * -1
ẍ = force/env.m
# Simulation
x_ = x + env.dt * ẋ
ẋ_ = ẋ + env.dt * ẍ
d_ = d + a
sp = vcat(x_, ẋ_, d_)
reward = abs(env.target - x) * -1
return (sp=sp, r=reward)
end
function POMDPs.initialstate(mdp::SliderMDP)
ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
POMDPs.isterminal(mdp::SliderMDP, s) = s[2] <= 0
POMDPs.discount(mdp::SliderMDP) = mdp.γ
mdp = SliderMDP();
s = state_space(mdp); # Using Crux.jl
function Q_network()
layer1 = Dense(3, 64, relu)
layer2 = Dense(64, 64, relu)
layer3 = Dense(64, length(3))
return DiscreteNetwork(Chain(layer1, layer2, layer3), [-.1, 0, .1])
end
solver_dqn = DQN(π=Q_network(), S=s, N=30000) # Using Crux.jl
policy_dqn = solve(solver_dqn, mdp) # Error comes here
堆栈跟踪:
policy_dqn
MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing)
Closest candidates are:
logpdf(::Distributions.DiscreteNonParametric, !Matched::Real) at C:\Users\name\.julia\packages\Distributions\Xrm9e\src\univariate\discrete\discretenonparametric.jl:106
logpdf(::Distributions.UnivariateDistribution{S} where S<:Distributions.ValueSupport, !Matched::AbstractArray) at deprecated.jl:70
logpdf(!Matched::POMDPPolicies.PlaybackPolicy, ::Any) at C:\Users\name\.julia\packages\POMDPPolicies\wMOK3\src\playback.jl:34
...
logpdf(::Crux.ObjectCategorical, ::Float32)@utils.jl:16
logpdf(::Crux.DistributionPolicy, ::Vector{Float64}, ::Float32)@policies.jl:305
var"#exploration#133"(::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::typeof(Crux.exploration), ::Crux.DistributionPolicy, ::Vector{Float64})@policies.jl:302
exploration@policies.jl:297[inlined]
action(::Crux.DistributionPolicy, ::Vector{Float64})@policies.jl:294
var"#exploration#136"(::Crux.DiscreteNetwork, ::Int64, ::typeof(Crux.exploration), ::Crux.MixedPolicy, ::Vector{Float64})@policies.jl:326
var"#step!#173"(::Bool, ::Int64, ::typeof(Crux.step!), ::Dict{Symbol, Array}, ::Int64, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})@sampler.jl:55
var"#steps!#174"(::Int64, ::Bool, ::Int64, ::Bool, ::Bool, ::Bool, ::typeof(Crux.steps!), ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})@sampler.jl:108
var"#fillto!#177"(::Int64, ::Bool, ::typeof(Crux.fillto!), ::Crux.ExperienceBuffer{Array}, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace}, ::Int64)@sampler.jl:156
solve(::Crux.OffPolicySolver, ::Main.workspace#2.SliderMDP)@off_policy.jl:86
top-level scope@Local: 1[inlined]
简答:
将输出向量更改为 Float32
,即 Float32[-.1, 0, .1]
。
长答案:
Crux 在网络的输出值上创建一个 Distribution
,并且在某些时候 (policies.jl:298) samples a random value from it. It then converts this value to a Float32
. Later (utils.jl:15) 它会执行一个 findfirst
以在原始输出数组中找到该值的索引(在分布中存储为 objs
),但是因为原始数组仍然是 Float64
,所以失败并且 returns 一个 nothing
。因此错误。
我相信这(转换采样值而不是 objs
数组 and/or 不使用近似相等检查,即 findfirst(isapprox(x), d.objs)
)是包中的一个错误,并且会鼓励你在 Github.