使 ULID 字典顺序对时间更敏感

Make ULID lexicographic ordering more sensitive to time

我在一个项目中使用了 this ULID example,我不仅需要 ULID 提供的唯一性,还需要它的字典排序能力。

然而,我发现无论我尝试了多少,我都无法对循环生成的 ID 进行排序

例如

class Test{
    public static void main(String[] args) {
            ArrayList<String> ulids = new ArrayList<>();
    
            for (int i = 0; i < 10; i++) {
               
                ulids.add(ULID.generate());
            }
    
            System.out.println("Original:\n..." + ulids);
    
            Collections.shuffle(ulids);
    
            System.out.println("Shuffled:\n..." + ulids);
    
            ulids.sort(new Comparator<String>() {
                @Override
                public int compare(String o1, String o2) {
                    return o1.compareTo(o2);
                }
            });
    
            System.out.println("Sorted:\n..." + ulids);
    
        }
}

Sample output:
Original:
...[01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng896t1qhsngrz3h251, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8w8qz9qtgy3958r1v, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng8ekj0vt393tw12x8j, 01edrp4ng80wacxxskgej5d8mm]
Shuffled:
...[01edrp4ng896t1qhsngrz3h251, 01edrp4ng8w8qz9qtgy3958r1v, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng80wacxxskgej5d8mm, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8ekj0vt393tw12x8j]
Sorted:
...[01edrp4ng80wacxxskgej5d8mm, 01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng896t1qhsngrz3h251, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng8ekj0vt393tw12x8j, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8w8qz9qtgy3958r1v]

我检查了实现并认为由于时间是生成 ULID 的核心因素,而且由于使用的时间的敏感性是毫秒,即 (System.currentTimeMillis()) ,我可以通过引入一些来对它们进行排序我的 ID 生成循环延迟。

我引入了大约 5 毫秒的延迟,所有 id 都排序了;例如:

class TestWithMsDelay{

   public static void main(String[] args) {
        ArrayList<String> ulids = new ArrayList<>();

        for (int i = 0; i < 10; i++) {
            try {
                Thread.sleep(5L);
                ulids.add(ULID.generate());
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }

        System.out.println("Original:\n..." + ulids);

        Collections.shuffle(ulids);

        System.out.println("Shuffled:\n..." + ulids);

        ulids.sort(new Comparator<String>() {
            @Override
            public int compare(String o1, String o2) {
                return o1.compareTo(o2);
            }
        });

        System.out.println("Sorted:\n..." + ulids);

    }


}
Sample output:
Original:
...[2rjdme5a5h2ntcd20xq4z487tx, 2rjdme63a23ddsy0km21n6n34a, 2rjdme6pnrenx79zd3jj18est4, 2rjdme70bv45b648p82dbj584n, 2rjdme7d8gx9v9db66ftsxbmqq, 2rjdme7psqdykt24qfymn2e4ba, 2rjdme80as7t1h1rr00m676718, 2rjdme8rztp50bad6ktkhrfhk8, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme9ea04x22rpx2f3rp5gez]
Shuffled:
...[2rjdme7psqdykt24qfymn2e4ba, 2rjdme6pnrenx79zd3jj18est4, 2rjdme80as7t1h1rr00m676718, 2rjdme63a23ddsy0km21n6n34a, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme70bv45b648p82dbj584n, 2rjdme9ea04x22rpx2f3rp5gez, 2rjdme8rztp50bad6ktkhrfhk8, 2rjdme7d8gx9v9db66ftsxbmqq, 2rjdme5a5h2ntcd20xq4z487tx]
Sorted:
...[2rjdme5a5h2ntcd20xq4z487tx, 2rjdme63a23ddsy0km21n6n34a, 2rjdme6pnrenx79zd3jj18est4, 2rjdme70bv45b648p82dbj584n, 2rjdme7d8gx9v9db66ftsxbmqq, 2rjdme7psqdykt24qfymn2e4ba, 2rjdme80as7t1h1rr00m676718, 2rjdme8rztp50bad6ktkhrfhk8, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme9ea04x22rpx2f3rp5gez]

这对我的工作来说不够好...我不想等待任何长度的时间来生成 ulids(即使是 10us - 100us),人为延迟的概念非常困扰我, 哈哈

所以,我修改了ULID.java,把时间源从System.currentTimeMillis()改成了System.nanoTime()

令我惊讶的是,我不再需要循环中的任何时间延迟来使输出 ULID 可排序。

不过我觉得肯定有什么问题;因为 Java 规范警告说 System.nanoTime() 不一定比 System.currentTimeMillis()

更准确

例如 Javadoc for System.nanoTime() ,它说:

此方法提供纳秒级精度,但不一定是纳秒级分辨率(即值更改的频率)- 除了分辨率至少一样好外,不做任何保证作为 currentTimeMillis().

此外,System.nanoTime() 的 Java 文档似乎表明它与纪元无关(System.currentTimeMillis()

我认为这可能会导致在 ULID.java 中使用 System.nanoTime() 而不是 System.currentTimeMillis()

问题

  1. 我的恐惧合理吗
  2. 如果 (1.) 为真,如何在不破坏其优点的情况下将 ULID 的时间灵敏度提高到 1 毫秒以上?

ULID有两部分:时间部分和随机部分。

时间部分是自 1970 年以来的毫秒数。

随机分量更新分两种情况:

  1. 当毫秒变化时,产生一个新的随机值;
  2. 毫秒相同时,随机值加1

您在此处显示的实现不执行第二步。

也许你可以包含一些像这样的代码(只是一个例子):

if (timestamp == previousTimestamp) {
    randomComponent++;
} else {
    randomComponent = RANDOM.nextLong();
}

我发现的另一个问题是它使用 Math.random(),而不是 java.security.SecureRandom。要解决这个问题,这是一个建议:

import java.security.SecureRandom;
private static final RANDOM = new SecureRandom();

最后,不建议使用 System.nanoTime(),因为它 returns 自任意时间点以来的纳秒数。这不是从您主板上的实时时钟 (RTC) 返回的白天时间。此函数用于测量代码中两点之间经过的时间,可能用于基准测试。示例:


long startNanos = System.nanoTime();

// do some expensive tasks here

long endNanos = System.nanoTime();

long elapsedNanos = endNanos - startNanos;

如果您愿意,可以查看图书馆ulid-creator。也许它可以帮助。示例:

// Generate a ULID as UUID
UUID ulid = UlidCreator.getUlid();

// Or generate a ULID as String (Crockford's base32)
String ulid = UlidCreator.getUlidString();

项目页面:https://github.com/f4b6a3/ulid-creator

编辑

对不起。我没有回答问题。

我的恐惧合理吗

是的,你的赖特。

如果(1.)为真,如何在不破坏其优点的情况下将 ULID 的时间灵敏度提高到 1 毫秒以上?

您可以增加 ULID 分辨率,但它不符合 Jimmy Wilson 创建的 ULID Spec (which is not a formal standard like RFC-4122 by the way). The resulting UUID is like a COMB GUID。两者的主要思想是相同的。

您可以为时间戳组件预留更多的位,但它会消耗一些位。例如,如果将时间分量从 48 位增加到 64 位,它将在公元 2262 年左右滚动,但随机分量将从 1208925819614629174706176 (2^80) 减少到 18446744073709551616 (2^64)。如果成本影响到ULID的强项,就看你的项目了。

我刚刚实现了一个具有纳秒分辨率的 ULID 生成器。巧合的是,几天前我正在研究它。使用 System.currentTimeMillis() 方法,它实际上具有毫秒精度。纳秒分辨率是 simulated 在两个后续调用之间使用方法 System.nanoTime()

如果你还打算使用纳秒ULID,欢迎测试:

package your.package.name;

import java.security.SecureRandom;
import java.time.Instant;
import java.util.UUID;

/**
 * Utility class that creates a COMB GUID with nanoseconds resolution.
 * 
 * It borrows the main idea from ULID and COMB generators: a concatenation of
 * time and random bytes. It is composed of 64 bits for time and 64 for random
 * bits.
 * 
 * A Nano COMB has two components:
 * 
 * 1. Time camponent (64 bits): nanoseconds since 1970
 * 
 * 2. Random component (64 bits): a value generated by a secure random
 * generator.
 * 
 * Maximum time component year is ~2262 A.D. (2^63/10^9/60/60/24/365.25 + 1970)
 * 
 * @author: Fabio Lima 2020
 */
public final class NanoCombCreator {

    private long prevTime = 0;
    private long prevNano = 0;

    private static final long ONE_MILLION_NANOSECONDS = 1_000_000L;

    private static final SecureRandom SECURE_RANDOM = new SecureRandom();

    /**
     * Returns a time component in nanoseconds.
     * 
     * It uses `System.currentTimeMillis()` to get the system time in milliseconds
     * accuracy. The nanoseconds resolution is simulated by calling
     * `System.nanoTime()` between subsequent calls within the same millisecond.
     * It's not precise, but it provides some monotonicity to the values generates.
     * 
     * @return the current time in nanoseconds
     */
    private synchronized long getTimeComponent() {

        final long time = System.currentTimeMillis();
        final long nano = System.nanoTime();
        final long elapsed; // nanoseconds since last call

        if (time == prevTime) {
            elapsed = (nano - prevNano);
            if (elapsed > ONE_MILLION_NANOSECONDS) {
                try {
                    // make the clock to catch up
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    System.err.println("something went wrong...");
                }
            }
        } else {
            prevTime = time;
            prevNano = nano;
            elapsed = 0;
        }

        return (time * ONE_MILLION_NANOSECONDS) + elapsed;
    }

    /**
     * Returns the random component using a secure random generator.
     * 
     * @return a random value.
     */
    private synchronized long getRandomComponent() {
        return SECURE_RANDOM.nextLong();
    }

    /**
     * Returns a Nano COMB.
     * 
     * A Nano COMB is inspired on ULID and COMB generators.
     * 
     * It is composed of 64 bits for time and 64 for random bits.
     * 
     * @return a UUID
     */
    public synchronized UUID create() {

        final long timeBits = getTimeComponent();
        final long randomBits = getRandomComponent();

        return new UUID(timeBits, randomBits);
    }

    /**
     * Test method that generates many Nano COMBs in a loop.
     * 
     * @param args
     */
    public static void main(String[] args) {

        NanoCombCreator creator = new NanoCombCreator();

        for (int i = 0; i < 100; i++) {
            // Generate a Nano COMB
            UUID uuid = creator.create();

            // Extract the milliseconds and nanoseconds
            long milliseconds = uuid.getMostSignificantBits() / ONE_MILLION_NANOSECONDS;
            long nanoseconds = uuid.getMostSignificantBits() & ONE_MILLION_NANOSECONDS;

            // Instantiate an instant using the milliseconds and nanoseconds
            Instant time = Instant.ofEpochMilli(milliseconds).plusNanos(nanoseconds);

            // Print the UUID and the time it was generated (UTC)
            System.out.println("UUID: '" + uuid + "', time: " + time);
        }
    }
}

OUTPUT:

UUID: '16240ee8-3865-1503-d1fb-b4e85f991c6b', time: 2020-07-22T11:15:58.537327680Z
UUID: '16240ee8-3865-f90a-ca19-3ec529750ef7', time: 2020-07-22T11:15:58.537344064Z
UUID: '16240ee8-3866-dd7c-f32f-7acaebcf7766', time: 2020-07-22T11:15:58.537409664Z
UUID: '16240ee8-3868-0a99-3ead-b114e1d61520', time: 2020-07-22T11:15:58.537524800Z
UUID: '16240ee8-3868-efc8-937d-599c72de71a6', time: 2020-07-22T11:15:58.537541248Z
UUID: '16240ee8-386a-3643-6a5e-e3b5e3b03c71', time: 2020-07-22T11:15:58.537655936Z
UUID: '16240ee8-386b-132f-7016-057ab30a2920', time: 2020-07-22T11:15:58.537721408Z
UUID: '16240ee8-386b-f929-d5b0-f70b68aea3d9', time: 2020-07-22T11:15:58.537737280Z