DataArray case-insensitive match that returns the index value of the match

I have a DataFrame inside a function:

using DataFrames

myservs = DataFrame(serverName = ["elmo", "bigBird", "Oscar", "gRover", "BERT"],
                    ipAddress = ["12.345.6.7", "12.345.6.8", "12.345.6.9", "12.345.6.10", "12.345.6.11"])
myservs
5x2 DataFrame
| Row | serverName | ipAddress     |
|-----|------------|---------------|
| 1   | "elmo"     | "12.345.6.7"  |
| 2   | "bigBird"  | "12.345.6.8"  |
| 3   | "Oscar"    | "12.345.6.9"  |
| 4   | "gRover"   | "12.345.6.10" |
| 5   | "BERT"     | "12.345.6.11" |

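How can I write the function so that it takes a single parameter called server, case-insensitively matches that parameter against the myservs[:serverName] DataArray, and returns the ipAddress corresponding to the match?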

In R, this can be done with:

myservs$ipAddress[grep("server", myservs$serverName, ignore.case = T)]

It shouldn't matter whether someone passes ElMo or Elmo as server, or whether serverName is stored as elmo or ELMO.

Here is one way to do it:

julia> using DataFrames

julia> myservs = DataFrame(serverName = ["elmo", "bigBird", "Oscar", "gRover", "BERT"],
                           ipAddress = ["12.345.6.7", "12.345.6.8", "12.345.6.9", "12.345.6.10", "12.345.6.11"])
5x2 DataFrames.DataFrame
| Row | serverName | ipAddress     |
|-----|------------|---------------|
| 1   | "elmo"     | "12.345.6.7"  |
| 2   | "bigBird"  | "12.345.6.8"  |
| 3   | "Oscar"    | "12.345.6.9"  |
| 4   | "gRover"   | "12.345.6.10" |
| 5   | "BERT"     | "12.345.6.11" |

julia> grep{T <: String}(pat::String, dat::DataArray{T}, opts::String = "") = Bool[isna(d) ? false : ismatch(Regex(pat, opts), d) for d in dat]
grep (generic function with 2 methods)

julia> myservs[:ipAddress][grep("bigbird", myservs[:serverName], "i")]
1-element DataArrays.DataArray{ASCIIString,1}:
 "12.345.6.8"

Edit

This version of grep runs faster on my platform, since it builds the Regex once instead of once per element.

julia> function grep{T <: String}(pat::String, dat::DataArray{T}, opts::String = "")
           myreg = Regex(pat, opts)
           return convert(Array{Bool}, map(d -> isna(d) ? false : ismatch(myreg, d), dat))
       end
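
For readers on current Julia releases, the same idea can be sketched without DataArrays. This is a minimal sketch, assuming Julia 1.x, where ismatch has been replaced by occursin and NA by missing. The name grepmask and the plain vectors standing in for the two DataFrame columns are illustrative assumptions, not part of the answer above. As in the faster version, the Regex is built once rather than per element.

```julia
# Hypothetical stand-ins for the two DataFrame columns
servernames = ["elmo", "bigBird", "Oscar", "gRover", "BERT"]
ipaddresses = ["12.345.6.7", "12.345.6.8", "12.345.6.9", "12.345.6.10", "12.345.6.11"]

# Return a Bool mask marking the elements of `dat` that match `pat`.
# `opts` holds regex flags, e.g. "i" for case-insensitive matching.
# Missing values never match.
function grepmask(pat::AbstractString, dat::AbstractVector, opts::AbstractString = "")
    myreg = Regex(pat, opts)   # compile the pattern once, outside the loop
    return Bool[!ismissing(d) && occursin(myreg, d) for d in dat]
end

mask = grepmask("bigbird", servernames, "i")
matched_ips = ipaddresses[mask]   # IP addresses of the matching rows
matched_rows = findall(mask)      # row indices of the matches
```

Keeping the mask separate from the indexing step makes it possible to get either the matching IP addresses or, via findall, the matching row indices from the same call.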

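I started from how I would accomplish the task in R and tried to do the same with the DataFrames pkg, but only because I'm coming from R and am still learning Julia. I asked my coworkers a lot of questions, and the following is what we came up with: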

This task is much cleaner once I stop thinking in terms of vectors, as I would in R. Julia runs plenty fast iterating through a loop.

Even so, looping wouldn't be the best solution here. I was told to look into Dicts. Dict(), zip(), haskey(), and get() blew my mind. These have many applications.

My solution doesn't even need the DataFrames pkg; it uses Julia's Matrix and Array data representations instead. By using let, we keep the global environment clutter-free, and the server name/IP list stays hidden from those who are only running the function.

In the sample code, I'm recreating the server matrix every time, but in practice I'll have a permission-restricted delimited file that gets read every time. This is OK for now since the delimited files are small, but it may not be the most efficient or the best way to do it.

# ONLY ALLOW THE FUNCTION TO BE SEEN IN THE GLOBAL ENVIRONMENT
let
  global myIP

  # SERVER MATRIX
  myservers = ["elmo" "12.345.6.7"; "bigBird" "12.345.6.8";
               "Oscar" "12.345.6.9"; "gRover" "12.345.6.10";
               "BERT" "12.345.6.11"]

  # SERVER DICT
  servDict = Dict(zip(map(lowercase, myservers[:, 1]), myservers[:, 2]))

  # GET SERVER IP FUNCTION: INPUT = SERVER NAME; OUTPUT = IP ADDRESS
  function myIP(servername)
    sn = lowercase(servername)
    get(servDict, sn, "That name isn't in the server list.")
  end
end

# Test it out
myIP("SLIMEY")
#> "That name isn't in the server list."

myIP("elMo")
#> "12.345.6.7"
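
The delimited-file variant mentioned earlier could look roughly like the following. This is only a sketch under assumptions: the comma-separated "name,ip" layout, the load_servers and lookup_ip names, and the in-memory IOBuffer standing in for the real permission-restricted file are all hypothetical. It also uses haskey, which lets the caller tell an unknown name apart from a real address instead of receiving a message string.

```julia
# Build a Dict of lowercase server name => IP address from delimited text.
# Assumes one "name,ip" pair per line (hypothetical layout).
function load_servers(io::IO; delim = ',')
    servers = Dict{String,String}()
    for line in eachline(io)
        isempty(strip(line)) && continue          # skip blank lines
        name, ip = split(line, delim; limit = 2)
        servers[lowercase(strip(name))] = String(strip(ip))
    end
    return servers
end

# Return the IP address for `servername`, or `nothing` if it is unknown.
function lookup_ip(servers::Dict{String,String}, servername::AbstractString)
    key = lowercase(servername)
    return haskey(servers, key) ? servers[key] : nothing
end

# An IOBuffer stands in for the real file; with a file, use open(load_servers, path).
servers = load_servers(IOBuffer("elmo,12.345.6.7\nbigBird,12.345.6.8\nBERT,12.345.6.11\n"))
lookup_ip(servers, "elMo")     # "12.345.6.7"
lookup_ip(servers, "SLIMEY")   # nothing
```

Loading the Dict once and passing it to the lookup avoids re-reading the file on every call, at the cost of not picking up changes to the file until it is reloaded.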