Is there any way to store bits in Java without memory overhead?
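This has had my interest and attention since yesterday. I'm trying to store bits in Java and am running into memory overhead.
This follows an earlier question of mine.
Based on the answers there, I looked at other references and found the Memory Usage guide.
Then I looked at the source code of BitSet, which looks like this: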
public class BitSet implements Cloneable, java.io.Serializable {
    /*
     * BitSets are packed into arrays of "words." Currently a word is
     * a long, which consists of 64 bits, requiring 6 address bits.
     * The choice of word size is determined purely by performance concerns.
     */
    private final static int ADDRESS_BITS_PER_WORD = 6;
    private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
    private final static int BIT_INDEX_MASK = BITS_PER_WORD - 1;

    /* Used to shift left or right for a partial word mask */
    private static final long WORD_MASK = 0xffffffffffffffffL;

    /**
     * @serialField bits long[]
     *
     * The bits in this BitSet. The ith bit is stored in bits[i/64] at
     * bit position i % 64 (where bit position 0 refers to the least
     * significant bit and 63 refers to the most significant bit).
     */
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("bits", long[].class),
    };

    /**
     * The internal field corresponding to the serialField "bits".
     */
    private long[] words;

    /**
     * The number of words in the logical size of this BitSet.
     */
    private transient int wordsInUse = 0;

    /**
     * Whether the size of "words" is user-specified. If so, we assume
     * the user knows what he's doing and try harder to preserve it.
     */
    private transient boolean sizeIsSticky = false;

    /* use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = 7997698588986878753L;

    /**
     * Given a bit index, return word index containing it.
     */
    private static int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }
    .....
}
Following the Memory Guide, this is what I calculated:
8 Bytes: housekeeping space
12 Bytes: 3 ints
8 Bytes: long
12 Bytes: long[]
4 Bytes: transient int // does it count?
1 Byte : transient boolean
3 Bytes: padding
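This sums to 45 + 3 bytes (padding to reach a multiple of 8).
This means an empty BitSet itself reserves 48 bytes.
But my requirement was only to store bits. What am I missing? What are my options here?
Thanks a lot.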
Update
My requirement is that I want to store a total of 64 bits in two separate fields:
class MyClass {
    BitSet timeStamp;
    BitSet id;
}
I want to store millions of MyClass objects in memory.
My requirement is that I want to store total of 64 bits in two
separate fields
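So just use a long (a 64-bit integer) and treat it as a bit field. I once needed something like this, but 32 bits were enough for me, so I wrote a small library class that uses an int as a bit set:
https://github.com/claudemartin/smallset
Feel free to fork it and replace int with long, 32 with 64, 1 with 1L, and so on.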
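As a rough illustration of treating a long as a bit field, here is a minimal sketch. It is not the API of the linked smallset library; the class name LongBits and its method names are made up for this example. The point it shows is the 1L literal: with a plain 1, the shift would be done in 32-bit int arithmetic and bits 32 to 63 would be wrong.

```java
/** Hypothetical helper: treats a primitive long as a set of 64 bits. */
final class LongBits {
    private LongBits() {}

    /** Returns a copy of bits with bit n (0..63) set. */
    static long set(long bits, int n) {
        return bits | (1L << n);   // 1L, not 1: the shift must happen in 64-bit arithmetic
    }

    /** Returns a copy of bits with bit n (0..63) cleared. */
    static long clear(long bits, int n) {
        return bits & ~(1L << n);
    }

    /** Tests whether bit n (0..63) is set. */
    static boolean get(long bits, int n) {
        return (bits & (1L << n)) != 0;
    }

    public static void main(String[] args) {
        long bits = 0L;
        bits = set(bits, 3);
        bits = set(bits, 63);
        System.out.println(get(bits, 3));                   // true
        System.out.println(get(bits, 4));                   // false
        System.out.println(Long.bitCount(clear(bits, 3)));  // 1
    }
}
```

Because the value is a primitive, a field or array element of this kind costs exactly 8 bytes, with no per-value object header.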
This sums to 45 + 3 bytes (padding to reach multiple of 8) This means
an empty BitSet itself reserves 48 bytes.
First, I'd like to recommend the right tool for analyzing object layout in the JVM: JOL. In your case (java -jar jol-cli/target/jol-cli.jar internals java.util.BitSet), JOL produces the following result:
Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
java.util.BitSet object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) f4 df 9f e0 (11110100 11011111 10011111 11100000) (-526393356)
12 4 int BitSet.wordsInUse 0
16 1 boolean BitSet.sizeIsSticky false
17 3 (alignment/padding gap) N/A
20 4 long[] BitSet.words [0]
Instance size: 24 bytes (reported by Instrumentation API)
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
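Your calculation is incorrect because it counts the static fields, which belong to the class rather than to each instance, so an empty BitSet itself takes 24 bytes. Note that this figure is still not the whole picture, because it does not include the size of the long[] object. The complete result comes from java -jar jol-cli/target/jol-cli.jar externals java.util.BitSet: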
Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
java.util.BitSet@6b25f76bd object externals:
ADDRESS SIZE TYPE PATH VALUE
7ae321a48 24 java.util.BitSet (object)
7ae321a60 24 [J .words [0]
This means an empty BitSet uses 48 bytes in total, including the long array.
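The arithmetic behind those numbers can be reproduced by hand. The sketch below hard-codes the layout shown in the JOL output above (12-byte object header, 4-byte compressed references, 8-byte alignment) and assumes a 16-byte array header (object header plus a 4-byte length). Those constants hold for this particular VM configuration only, and the class and method names are invented for the example.

```java
/** Back-of-the-envelope footprint calculation; constants are assumptions taken from the JOL output above. */
final class BitSetFootprint {
    static final int OBJECT_HEADER = 12; // 8-byte mark word + 4-byte compressed class pointer
    static final int ARRAY_HEADER = 16;  // object header + 4-byte array length
    static final int REFERENCE = 4;      // compressed reference
    static final int ALIGNMENT = 8;      // "Objects are 8 bytes aligned"

    /** Rounds size up to the next multiple of ALIGNMENT. */
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Size of a long[] with the given number of elements. */
    static long longArray(int length) {
        return align(ARRAY_HEADER + 8L * length);
    }

    /** Empty BitSet: the instance plus the one-element long[] it points to. */
    static long emptyBitSet() {
        // header + int wordsInUse + boolean sizeIsSticky + 3-byte gap + reference to words
        long instance = align(OBJECT_HEADER + 4 + 1 + 3 + REFERENCE);
        return instance + longArray(1);
    }

    /** MyClass from the question: two references, each to its own empty BitSet. */
    static long myClassWithTwoBitSets() {
        long instance = align(OBJECT_HEADER + 2 * REFERENCE);
        return instance + 2 * emptyBitSet();
    }

    public static void main(String[] args) {
        System.out.println(emptyBitSet());            // 48
        System.out.println(myClassWithTwoBitSets());  // 120
    }
}
```

Under these assumptions each MyClass from the question would occupy about 120 bytes to hold 64 bits of payload, which is why the BitSet-per-field design is a poor fit for millions of objects.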
To optimize the memory footprint, you can write your own BitSet implementation. For example, for your use case the following options are possible:
public class MyOwnBitSet {
    long word1;
    long word2;
}

public class MyOwnBitSet2 {
    long[] word = new long[2];
}

public class MyOwnBitSet3 {
    int index;
}
JOL produces the following results:
MyOwnBitSet@443b7951d object externals:
ADDRESS SIZE TYPE PATH VALUE
76ea4c7f8 32 MyOwnBitSet (object)
MyOwnBitSet2@69663380d object externals:
ADDRESS SIZE TYPE PATH VALUE
76ea53800 16 MyOwnBitSet2 (object)
76ea53810 32 [J .word [0, 0]
MyOwnBitSet3@5a2e4553d object externals:
ADDRESS SIZE TYPE PATH VALUE
76ea5c070 16 MyOwnBitSet3 (object)
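Let me explain the last example, MyOwnBitSet3. To reduce the memory footprint, you can preallocate one huge array of long/int values and store in each object only an index that points to the right cell. For a large enough number of objects, this option is the most beneficial.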
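A minimal sketch of that idea follows, assuming a fixed-capacity pool with no way to free cells. BitPool and its methods are hypothetical names, not part of any library; the int returned by allocate plays the role of the index field in MyOwnBitSet3.

```java
/** Hypothetical pool: all 64-bit words live in one shared long[]; callers keep only an int index. */
final class BitPool {
    private final long[] words;
    private int next; // index of the next free cell

    BitPool(int capacity) {
        words = new long[capacity];
    }

    /**
     * Reserves one cell, stores the initial value, and returns the cell's index.
     * Throws ArrayIndexOutOfBoundsException once the pool is full.
     */
    int allocate(long initial) {
        int index = next++;
        words[index] = initial;
        return index;
    }

    /** Sets bit number bit (0..63) of the cell at index. */
    void set(int index, int bit) {
        words[index] |= 1L << bit;
    }

    /** Tests bit number bit (0..63) of the cell at index. */
    boolean get(int index, int bit) {
        return (words[index] & (1L << bit)) != 0;
    }

    public static void main(String[] args) {
        BitPool pool = new BitPool(1_000_000); // one array of about 8 MB for a million entries
        int a = pool.allocate(0L);
        int b = pool.allocate(0L);
        pool.set(a, 5);
        System.out.println(pool.get(a, 5)); // true
        System.out.println(pool.get(b, 5)); // false
    }
}
```

If the index itself can be derived from context (for example, the object's position in another array), even the 16-byte wrapper object can be dropped and the cost approaches 8 bytes per entry.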
To store a total of 64 bits in one object, you can do this:
class MyClass {
    int timeStamp;
    int id;
}
Or, if you don't want the overhead of an object, you can do this:
long timeStampAndId;
The question is how to encapsulate your operations. For primitives, Java doesn't help much, but what you can do is:
enum TimeStampAndId {
    /* no instances */ ;

    public static boolean isTimeStampSet(long timeStampAndId, int n) { ... }
    public static boolean isIdSet(long timeStampAndId, int n) { ... }
}
That is, use a utility class to support the primitive type.
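One way such a utility could look is sketched below. The bit layout is an assumption made for this example (time stamp in the upper 32 bits, id in the lower 32 bits), and the name TimeStampAndIdBits and the pack, timeStamp, and id helpers are not from the answer above. In both test methods n is assumed to be in the range 0..31.

```java
/** Hypothetical utility: packs two 32-bit fields into one long. No instances are ever created. */
enum TimeStampAndIdBits {
    ; // no instances

    /** Assumed layout: timeStamp in bits 32..63, id in bits 0..31. */
    static long pack(int timeStamp, int id) {
        // Mask the id so that a negative int does not sign-extend into the upper half.
        return ((long) timeStamp << 32) | (id & 0xFFFFFFFFL);
    }

    static int timeStamp(long packed) {
        return (int) (packed >>> 32);
    }

    static int id(long packed) {
        return (int) packed; // the narrowing cast keeps the low 32 bits
    }

    /** Tests bit n (0..31) of the time stamp half. */
    static boolean isTimeStampSet(long packed, int n) {
        return ((packed >>> (32 + n)) & 1L) != 0;
    }

    /** Tests bit n (0..31) of the id half. */
    static boolean isIdSet(long packed, int n) {
        return ((packed >>> n) & 1L) != 0;
    }

    public static void main(String[] args) {
        long packed = pack(5, -1);
        System.out.println(timeStamp(packed));          // 5
        System.out.println(id(packed));                 // -1
        System.out.println(isTimeStampSet(packed, 1));  // false
        System.out.println(isIdSet(packed, 31));        // true
    }
}
```

With this approach millions of entries can live in a single long[], at 8 bytes each plus one array header.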
In the future, Java will support value types that have no object overhead.