Generating Random String of Numbers and Letters Using Go's "testing/quick" Package

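I've been racking my brain over this for a few days now and can't seem to figure it out. Perhaps it's glaringly obvious, but I can't spot it. I've read up on all the basics of Unicode, UTF-8, UTF-16, normalisation, etc., but to no avail. Hopefully somebody can help me out here...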

I'm using Go's Value function from the testing/quick package to generate random values for the fields in my data structs, in order to implement the Generator interface for the structs in question. Specifically, given a Metadata struct, I've defined the implementation as follows:

func (m *Metadata) Generate(r *rand.Rand, size int) (value reflect.Value) {
    value = reflect.ValueOf(m).Elem()
    for i := 0; i < value.NumField(); i++ {
        if t, ok := quick.Value(value.Field(i).Type(), r); ok {
            value.Field(i).Set(t)
        }
    }
    return
}

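Now, in doing so, I end up with both the receiver and the return value being set to randomly generated values of the appropriate type (strings, ints, etc. in the receiver, and reflect.Value in the returned reflect.Value).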

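Now, the implementation of the Value function states that it will return something of type []rune converted to type string. As far as I know, this should allow me to use the functions in the runes, unicode and norm packages to define a filter which removes everything that is not part of 'Latin', 'Letter' or 'Number'. I defined the following filter, which uses a transform to remove characters that are not in those range tables (as defined in the unicode package):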

func runefilter(in reflect.Value) (out reflect.Value) {
    out = in // Make sure you return something
    if in.Kind() == reflect.String {
        instr := in.String()
        t := transform.Chain(norm.NFD, runes.Remove(runes.NotIn(rangetable.Merge(unicode.Letter, unicode.Latin, unicode.Number))), norm.NFC)
        outstr, _, _ := transform.String(t, instr)
        out = reflect.ValueOf(outstr)
    }
    return
}

Now, I think I've tried just about everything, but I always end up with a series of strings that are far from the Latin range, e.g.:

똿穊

嚶
秓䝏小䮋
ท솲
䂾

ʋᦸ
堮憨ꥆ
併怃
鯮

⓿ꐠ槹黟
踁퓺
俇


쩈詢


欓

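So, can anybody explain what I'm overlooking here, and how I could define a transformer which removes/replaces non-letter/number/Latin characters, so that I can use Value as intended (but with a smaller subset of 'random' characters)?

Thanks!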

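Confusingly, the Generator interface needs a method that uses the type itself as its receiver, not a pointer to the type. You want your type signature to look like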

func (m Metadata) Generate(r *rand.Rand, size int) (value reflect.Value)

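You can play with this here. Note: the most important thing to do in that playground is to switch the receiver of the Generate function from m Metadata to m *Metadata and see that "Hi Mom!" never prints.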

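Also, I think you'd be better off using your own type and writing a Generate method for that type using a list of all the characters you want to use. For example: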

type LatinString string
const latin = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

Then use the generator

func (l LatinString) Generate(rand *rand.Rand, size int) reflect.Value {
    var buffer bytes.Buffer
    for i := 0; i < size; i++ {
        buffer.WriteString(string(latin[rand.Intn(len(latin))]))
    }
    s := LatinString(buffer.String())
    return reflect.ValueOf(s)
}
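As a usage sketch, the type can be plugged straight into quick.Check as a function argument. The property onlyLatin and the wrapper checkProperty are hypothetical names made up for this example, and the generator is restated with strings.Builder so that the block compiles on its own.

```go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
	"strings"
	"testing/quick"
)

type LatinString string

const latin = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

// Generate builds a string of exactly size characters drawn from latin.
func (LatinString) Generate(r *rand.Rand, size int) reflect.Value {
	var b strings.Builder
	for i := 0; i < size; i++ {
		b.WriteByte(latin[r.Intn(len(latin))])
	}
	return reflect.ValueOf(LatinString(b.String()))
}

// onlyLatin is the property under test: every rune comes from the alphabet.
func onlyLatin(s LatinString) bool {
	for _, c := range s {
		if !strings.ContainsRune(latin, c) {
			return false
		}
	}
	return true
}

// checkProperty runs the property with quick's default configuration.
func checkProperty() error {
	return quick.Check(onlyLatin, nil)
}

func main() {
	if err := checkProperty(); err != nil {
		fmt.Println("property failed:", err)
		return
	}
	fmt.Println("all generated strings stayed within the alphabet")
}
```

Because the argument type is LatinString rather than string, quick.Check picks up the custom Generate method instead of producing arbitrary code points.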

playground

Edit: this library is also pretty cool, thanks for showing it to me

The answer to my own question is, it seems, a combination of the answers provided in the comments by @nj_ and @jimb and the answer provided by @benjaminkadish.

In short, the answer boils down to:

  1. "Not such a great idea as you thought it was", or "Bit of an ill-posed question"
  2. "You were using the union of 'Letter', 'Latin' and 'Number' (Letter || Number || Latin), instead of the intersection of 'Latin' with the union of 'Letter' and 'Number' ((Letter || Number) && Latin)"
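
To make the second point concrete, here is a minimal sketch of the intersection using only the standard library (strings.Map and the unicode tables) rather than the transform chain from the question. The names isLatinAlnum and keepLatinAlnum are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isLatinAlnum implements (Letter || Number) && Latin.
func isLatinAlnum(r rune) bool {
	return unicode.Is(unicode.Latin, r) && (unicode.IsLetter(r) || unicode.IsNumber(r))
}

// keepLatinAlnum drops every rune that fails isLatinAlnum.
func keepLatinAlnum(s string) string {
	return strings.Map(func(r rune) rune {
		if isLatinAlnum(r) {
			return r
		}
		return -1 // a negative return value tells strings.Map to drop the rune
	}, s)
}

func main() {
	fmt.Println(keepLatinAlnum("James 똿穊Mc")) // JamesMc
}
```

One caveat: the ASCII digits 0-9 belong to the Common script, not Latin, so this predicate drops them as well. If digits are wanted, an explicit check such as unicode.IsDigit would have to be OR-ed in separately.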

Now for the longer version...

The idea behind my using the testing/quick package is that I wanted random data for (fuzzy) testing of my code. In the past, I've always written the code for doing things like that myself, again and again. This meant a lot of the same code across different projects. Now, I could of course have written my own package for it, but it turns out that, even better than that, there's actually a standard package which does just about exactly what I want.

Now, it turns out the package does exactly what I want very well. The codepoints in the strings which it generates are actually random and not just restricted to what we're accustomed to using in everyday life. Now, this is of course exactly the thing which you want in doing fuzzy testing in order to test the code with values outside the usual assumptions.

In practice, that means I'm running into two problems:

  1. There's some limits on what I would consider reasonable input for a string. Meaning that, in testing the processing of a Name field or a URL field, I can reasonably assume there's not going to be a value like 'James Mc⌢' (let alone 'James Mc') or 'www.site.com', but just 'James McFrown' and 'www.website.com'. Hence, I can't expect a reasonable system to be able to support it. Of course, things shouldn't completely break down, but it also can't be expected to handle the former examples without any problems.
  2. When I filter the generated string on values which one might consider reasonable, the chance of ending up with a valid string is very small. The set of possible characters in the set used by the testing/quick is just so large (0x10FFFF) and the set of reasonable characters so small, you end up with empty strings most of the time.
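
The second problem can be put into rough numbers by counting how many code points survive the filter. This sketch assumes each rune is drawn uniformly from the range below 0x10FFFF, which is how I read the testing/quick source; latinAlnumCount and latinAlnumFraction are names made up for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// maxCodePoint is the upper bound assumed for randomly generated runes.
const maxCodePoint = 0x10FFFF

// latinAlnumCount counts code points that are Latin and a letter or number.
func latinAlnumCount() int {
	n := 0
	for r := rune(0); r < maxCodePoint; r++ {
		if unicode.Is(unicode.Latin, r) && (unicode.IsLetter(r) || unicode.IsNumber(r)) {
			n++
		}
	}
	return n
}

// latinAlnumFraction is the chance that one uniformly drawn rune survives.
func latinAlnumFraction() float64 {
	return float64(latinAlnumCount()) / float64(maxCodePoint)
}

func main() {
	fmt.Printf("%d of %d code points survive (%.4f%%)\n",
		latinAlnumCount(), maxCodePoint, 100*latinAlnumFraction())
}
```

The exact count depends on the Unicode version shipped with the Go release, but it stays well below one percent, so a filtered string of a few dozen random runes will usually come out empty.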

So, what do we need to take away from this?

So, whilst I hoped to use the standard testing/quick package to replace my often repeated code to generate random data for fuzzy testing, it does this so well that it provides data outside the range of what I would consider reasonable for the code to be able to handle. It seems that the choice, in the end, is to:

  1. Either be able to actually handle all fuzzy options, meaning that if somebody's name is 'Arnold ' ('Arnold Moneybags'), it shouldn't go arse over end. Or...
  2. Use custom/derived types with their own Generator. This means you're going to have to use the derived type instead of the basic type throughout the code. (Comparable to defining a string as wchar_t instead of char in C++ and working with those by default.). Or...
  3. Don't use testing/quick for fuzzy testing, because as soon as you run into a generated string value, you can (and should) get a very random string.

As always, further comments are of course welcome, as it's quite possible I overlooked something.