DataArray不区分大小写匹配即returns匹配的索引值
DataArray case-insensitive match that returns the index value of the match
我在一个函数中有一个 DataFrame:
using DataFrames
myservs = DataFrame(serverName = ["elmo", "bigBird", "Oscar", "gRover", "BERT"],
ipAddress = ["12.345.6.7", "12.345.6.8", "12.345.6.9", "12.345.6.10", "12.345.6.11"])
myservs
5x2 DataFrame
| Row | serverName | ipAddress |
|-----|------------|---------------|
| 1 | "elmo" | "12.345.6.7" |
| 2 | "bigBird" | "12.345.6.8" |
| 3 | "Oscar" | "12.345.6.9" |
| 4 | "gRover" | "12.345.6.10" |
| 5 | "BERT" | "12.345.6.11" |
我如何编写函数来获取名为 server
的单个参数,不区分大小写匹配 myservs[:serverName]
DataArray 中的 server
参数,以及 return匹配对应的 ipAddress
?
在 R 中,这可以通过使用
来完成
myservs$ipAddress[grep("server", myservs$serverName, ignore.case = T)]
我不希望有人使用 ElMo
或 Elmo
作为 server
,或者 serverName
被保存为 elmo
或 ELMO
.
这是一种方法:
julia> using DataFrames
julia> myservs = DataFrame(serverName = ["elmo", "bigBird", "Oscar", "gRover", "BERT"],
ipAddress = ["12.345.6.7", "12.345.6.8", "12.345.6.9", "12.345.6.10", "12.345.6.11"])
5x2 DataFrames.DataFrame
| Row | serverName | ipAddress |
|-----|------------|---------------|
| 1 | "elmo" | "12.345.6.7" |
| 2 | "bigBird" | "12.345.6.8" |
| 3 | "Oscar" | "12.345.6.9" |
| 4 | "gRover" | "12.345.6.10" |
| 5 | "BERT" | "12.345.6.11" |
julia> grep{T <: String}(pat::String, dat::DataArray{T}, opts::String = "") = Bool[isna(d) ? false : ismatch(Regex(pat, opts), d) for d in dat]
grep (generic function with 2 methods)
julia> myservs[:ipAddress][grep("bigbird", myservs[:serverName], "i")]
1-element DataArrays.DataArray{ASCIIString,1}:
"12.345.6.8"
编辑
这个 grep
在我的平台上运行得更快。
julia> function grep{T <: String}(pat::String, dat::DataArray{T}, opts::String = "")
myreg = Regex(pat, opts)
return convert(Array{Bool}, map(d -> isna(d) ? false : ismatch(myreg, d), dat))
end
我参考了如何在 R 中完成任务并尝试使用 DataFrames
pkg 来完成它,但我这样做只是因为我来自 R
并且正在学习 Julia
。我问了很多同事的问题,以下是我们得出的结论:
This task is much cleaner if I was to stop thinking in terms of
vectors
in R
. Julia
runs plenty fast iterating through a loop.
Even still, looping wouldn't be the best solution here. I was told to look into
Dicts (check here for an example). Dict()
, zip()
, haskey()
, and
get()
blew my mind. These have many applications.
My solution doesn't even need to use the DataFrames
pkg, but instead
uses Julia's Matrix
and Array
data representations. By using let
we keep the global environment clutter free and the server name/ip
list stays hidden from view to those who are only running the
function.
In the sample code, I'm recreating the server matrix every time, but in reality/practice I'll have a permission restricted delimited file that gets read every time. This is OK for now since the delimited files are small, but this may not be efficient or the best way to do it.
# ONLY ALLOW THE FUNCTION TO BE SEEN IN THE GLOBAL ENVIRONMENT
let global myIP
# SERVER MATRIX
myservers = ["elmo" "12.345.6.7"; "bigBird" "12.345.6.8";
"Oscar" "12.345.6.9"; "gRover" "12.345.6.10";
"BERT" "12.345.6.11"]
# SERVER DICT
servDict = Dict(zip(pmap(lowercase, myservers[:, 1]), myservers[:, 2]))
# GET SERVER IP FUNCTION: INPUT = SERVER NAME; OUTPUT = IP ADDRESS
function myIP(servername)
sn = lowercase(servername)
get(servDict, sn, "That name isn't in the server list.")
end
end
# Test it out
myIP("SLIMEY")
#>"That name isn't in the server list."
myIP("elMo")
#>"12.345.6.7"
我在一个函数中有一个 DataFrame:
using DataFrames
myservs = DataFrame(serverName = ["elmo", "bigBird", "Oscar", "gRover", "BERT"],
ipAddress = ["12.345.6.7", "12.345.6.8", "12.345.6.9", "12.345.6.10", "12.345.6.11"])
myservs
5x2 DataFrame
| Row | serverName | ipAddress |
|-----|------------|---------------|
| 1 | "elmo" | "12.345.6.7" |
| 2 | "bigBird" | "12.345.6.8" |
| 3 | "Oscar" | "12.345.6.9" |
| 4 | "gRover" | "12.345.6.10" |
| 5 | "BERT" | "12.345.6.11" |
我如何编写函数来获取名为 server
的单个参数,不区分大小写匹配 myservs[:serverName]
DataArray 中的 server
参数,以及 return匹配对应的 ipAddress
?
在 R 中,这可以通过使用
来完成myservs$ipAddress[grep("server", myservs$serverName, ignore.case = T)]
我不希望有人使用 ElMo
或 Elmo
作为 server
,或者 serverName
被保存为 elmo
或 ELMO
.
这是一种方法:
julia> using DataFrames
julia> myservs = DataFrame(serverName = ["elmo", "bigBird", "Oscar", "gRover", "BERT"],
ipAddress = ["12.345.6.7", "12.345.6.8", "12.345.6.9", "12.345.6.10", "12.345.6.11"])
5x2 DataFrames.DataFrame
| Row | serverName | ipAddress |
|-----|------------|---------------|
| 1 | "elmo" | "12.345.6.7" |
| 2 | "bigBird" | "12.345.6.8" |
| 3 | "Oscar" | "12.345.6.9" |
| 4 | "gRover" | "12.345.6.10" |
| 5 | "BERT" | "12.345.6.11" |
julia> grep{T <: String}(pat::String, dat::DataArray{T}, opts::String = "") = Bool[isna(d) ? false : ismatch(Regex(pat, opts), d) for d in dat]
grep (generic function with 2 methods)
julia> myservs[:ipAddress][grep("bigbird", myservs[:serverName], "i")]
1-element DataArrays.DataArray{ASCIIString,1}:
"12.345.6.8"
编辑
这个 grep
在我的平台上运行得更快。
julia> function grep{T <: String}(pat::String, dat::DataArray{T}, opts::String = "")
myreg = Regex(pat, opts)
return convert(Array{Bool}, map(d -> isna(d) ? false : ismatch(myreg, d), dat))
end
我参考了如何在 R 中完成任务并尝试使用 DataFrames
pkg 来完成它,但我这样做只是因为我来自 R
并且正在学习 Julia
。我问了很多同事的问题,以下是我们得出的结论:
This task is much cleaner if I was to stop thinking in terms of
vectors
inR
.Julia
runs plenty fast iterating through a loop.Even still, looping wouldn't be the best solution here. I was told to look into Dicts (check here for an example).
Dict()
,zip()
,haskey()
, andget()
blew my mind. These have many applications.My solution doesn't even need to use the
DataFrames
pkg, but instead uses Julia'sMatrix
andArray
data representations. By usinglet
we keep the global environment clutter free and the server name/ip list stays hidden from view to those who are only running the function.In the sample code, I'm recreating the server matrix every time, but in reality/practice I'll have a permission restricted delimited file that gets read every time. This is OK for now since the delimited files are small, but this may not be efficient or the best way to do it.
# ONLY ALLOW THE FUNCTION TO BE SEEN IN THE GLOBAL ENVIRONMENT
let global myIP
# SERVER MATRIX
myservers = ["elmo" "12.345.6.7"; "bigBird" "12.345.6.8";
"Oscar" "12.345.6.9"; "gRover" "12.345.6.10";
"BERT" "12.345.6.11"]
# SERVER DICT
servDict = Dict(zip(pmap(lowercase, myservers[:, 1]), myservers[:, 2]))
# GET SERVER IP FUNCTION: INPUT = SERVER NAME; OUTPUT = IP ADDRESS
function myIP(servername)
sn = lowercase(servername)
get(servDict, sn, "That name isn't in the server list.")
end
end
# Test it out
myIP("SLIMEY")
#>"That name isn't in the server list."
myIP("elMo")
#>"12.345.6.7"