Why is it faster to sort javascript strings than numbers?
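I was investigating a preconception I had that sorting strings in javascript would be slower than sorting integers. This is based on something I read (and now can't find), which appears to have been incorrect, and which stated that javascript stores strings as Array<Array<int>> instead of just Array<int>. The MDN documentation seems to contradict this: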
JavaScript's String type is used to represent textual data. It is a set of "elements" of 16-bit unsigned integer values. Each element in the String occupies a position in the String. The first element is at index 0, the next at index 1, and so on. The length of a String is the number of elements in it.
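If we define the "size" of an element (number or string) as the length of its text representation (so size = String(x).length for either a numeric element or a string element), then for large arrays of elements of the same size (one of numbers and one of strings), I expected sorting the strings to be equal to or slightly slower than sorting the numbers. But when I ran a simple test (code below), it turned out that the strings were about twice as fast to sort.
I want to know what it is about strings and numbers, and about how javascript does sorting, that makes string sorts faster than numeric sorts. Perhaps there is something I am misunderstanding.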
Results:
~/sandbox > node strings-vs-ints.js 10000 16
Sorting 10000 numbers of magnitude 10^16
Sorting 10000 strings of length 16
Numbers: 18
Strings: 9
~/sandbox > node strings-vs-ints.js 1000000 16
Sorting 1000000 numbers of magnitude 10^16
Sorting 1000000 strings of length 16
Numbers: 3418
Strings: 1529
~/sandbox > node strings-vs-ints.js 1000000 32
Sorting 1000000 numbers of magnitude 10^32
Sorting 1000000 strings of length 32
Numbers: 3634
Strings: 1474
Source:
"use strict";
const CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghjijklmnopqrstuvwxyz0123456789:.";

function generateString(L) {
    const chars = [];
    while(chars.length < L) {
        chars.push(CHARSET[Math.floor(Math.random() * CHARSET.length)]);
    }
    return chars.join("");
}

function generateNumber(L) {
    return Math.floor(Math.random() * Math.pow(10, (L - 1))) + Math.pow(10, L - 1);
}

function generateList(generator, L, N) {
    const elements = [];
    while(elements.length < N) {
        elements.push(generator.call(null, L));
    }
    return elements;
}

function now() {
    return Date.now();
}

function getTime(baseTime) {
    return now() - baseTime;
}

function main(count, size) {
    console.log(`Sorting ${count} numbers of magnitude 10^${size}`);
    const numbers = generateList(generateNumber, size, count);
    const numBaseTime = now();
    numbers.sort();
    const numTime = getTime(numBaseTime);

    console.log(`Sorting ${count} strings of length ${size}`);
    const strings = generateList(generateString, size, count);
    const strBaseTime = now();
    strings.sort();
    const strTime = getTime(strBaseTime);

    console.log(`Numbers: ${numTime}\nStrings: ${strTime}`);
}

main(process.argv[2], process.argv[3]);
I was investigating a preconception I had that sorting strings in javascript would be slower than sorting integers.
That's right, string comparisons are more costly than number comparisons.
This is based on something I read which stated that javascript stores strings as Array<Array<int>> instead of just Array<int>. The MDN documentation seems to contradict this.
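Yes, what you read does seem to be wrong. Strings are just sequences of characters (each character being a 16-bit value), so they are commonly stored as arrays of integers, or rather as pointers to them. Your array of strings can indeed be thought of as an array of arrays.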
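To make the "sequence of 16-bit values" point concrete, here is a minimal sketch. The helper name codeUnits is hypothetical and only used for illustration; it shows how the language exposes a string's elements, not how any particular engine lays strings out in memory.

```javascript
// Hypothetical helper: list the 16-bit code units of a string.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i)); // each value is in the range 0..65535
  }
  return units;
}

console.log(codeUnits("Az9")); // [ 65, 122, 57 ]

// Viewed this way, an array of strings is an array of integer arrays.
const words = ["ab", "c"];
const nested = words.map(codeUnits);
console.log(nested); // [ [ 97, 98 ], [ 99 ] ]

// Relational operators on strings compare these code units in order,
// so "B" (66) sorts before "a" (97).
console.log("B" < "a"); // true
```

Comparing two strings may therefore need to look at several code units, whereas comparing two numbers is a single operation.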
When I ran a simple test, it turned out that the strings were about twice as fast to sort.
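The problem with your code is that you are sorting the numbers as strings: without a comparison function, sort() converts each number to a string and then compares those strings. See How to sort an array of integers correctly. When you fix that, note that calling a comparison function still carries quite some overhead compared with the builtin string comparison, so if you really benchmarked the relational operators (<, ==, >) on the different types, I would expect numbers to perform even better.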
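A minimal sketch of the difference, assuming a small sample array; the helper timeSort and the array sizes are hypothetical choices made for illustration, and the timings will vary by engine and machine.

```javascript
// Without a comparator, sort() orders elements by their string form.
const values = [10, 9, 1, 100, 25];

const lexicographic = [...values].sort();
console.log(lexicographic); // [ 1, 10, 100, 25, 9 ]

// With a numeric comparator, the elements are compared as numbers.
const numeric = [...values].sort((a, b) => a - b);
console.log(numeric); // [ 1, 9, 10, 25, 100 ]

// Hypothetical helper: time one sort of a copy of the array, in milliseconds.
function timeSort(arr, compare) {
  const copy = arr.slice();
  const start = Date.now();
  copy.sort(compare); // compare === undefined means the default string ordering
  return Date.now() - start;
}

// Rough comparison on random 16-digit numbers, as in the question.
const sample = Array.from({ length: 50000 }, () =>
  Math.floor(Math.random() * 9e15) + 1e15
);
console.log("default (as strings):", timeSort(sample));
console.log("numeric comparator:  ", timeSort(sample, (a, b) => a - b));
```

The numeric run still pays for one function call per comparison, which is the overhead mentioned above, so it does not measure the raw cost of the < operator on numbers.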