Cassandra: Difference between TEXT (VARCHAR) and ASCII

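I understand that text and varchar are aliases that store UTF-8 strings. Is the ASCII type that the documentation mentions a "US-ASCII character string"? Is there any difference other than the encoding?

Is there any size difference? Is there a preferred choice between these two when I'm storing large strings (~500KB)?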

Regarding:

If the data is a piece of text, for example a String in Java, it is encoded as UTF-16 at runtime, but when it is serialized in Cassandra with the text type, UTF-8 is used. UTF-16 uses 2 bytes for most characters and 4 bytes for the rest, whereas UTF-8 is space efficient and, depending on the character, takes 1, 2, 3 or 4 bytes.

That means there is CPU work involved in encoding and decoding such data. Also, depending on the text, storage can be wasteful: for example, 158786464563 is stored as 12 bytes, one per digit. That means more space is used, and more I/O as well.
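To make the byte counts above concrete, here is a minimal Python sketch. The helper name `encoded_sizes` is made up for illustration, and the comparison with an 8-byte integer assumes a fixed-width 64-bit encoding such as the one used for a bigint column; it only measures encoded lengths and does not talk to Cassandra.

```python
import struct


def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    # "utf-16-be" is used so that no 2-byte byte-order mark is added.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# Characters that need 1, 2, 3 and 4 bytes in UTF-8 respectively.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"{ascii(ch)}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# The number from the quote, stored as text versus as a 64-bit integer.
number = 158786464563
as_text = str(number).encode("utf-8")   # one byte per digit
as_bigint = struct.pack(">q", number)   # 8-byte signed big-endian integer
print(len(as_text), len(as_bigint))
```

The last line shows the point of the quote: the digits take 12 bytes as text, while a fixed-width 64-bit integer takes 8.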

Note that Cassandra offers the ascii type, which follows the US-ASCII character set and always uses 1 byte per character.
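The practical consequence of the 1-byte rule is that an ascii value cannot contain anything outside US-ASCII. As a rough client-side sketch (the function name `fits_ascii_column` is hypothetical, and this only mirrors the constraint locally rather than calling any Cassandra API), one could check a value before writing it:

```python
def fits_ascii_column(value: str) -> bool:
    """Return True if every character of value is plain US-ASCII."""
    try:
        value.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character is outside the 0-127 range.
        return False


for candidate in ["hello", "caf\u00e9"]:
    print(ascii(candidate), fits_ascii_column(candidate))

# For a valid value, the encoded length equals the character count.
print(len("hello".encode("ascii")) == len("hello"))
```

Values that fail this check would need a text/varchar column instead.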


Is there any size difference?

Is there a preferred choice between these two when I'm storing large strings (~500KB)?

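Because ascii is more space efficient than UTF-8, and UTF-8 is more space efficient than UTF-16. Keep in mind that for strings containing only US-ASCII characters, UTF-8 also uses 1 byte per character, so the stored size is the same in that case. Again, it all depends on how you are serializing/encoding/decoding this data. For more information, see "what-is-the-advantage-of-choosing-ascii-encoding-over-utf-8".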
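For the ~500KB case from the question, a quick way to see what each encoding costs is to measure it. This is only a sketch: `stored_sizes` is a made-up helper, the payloads are synthetic, and the numbers are raw encoded lengths, not on-disk sizes (which also depend on things like compression).

```python
def stored_sizes(text: str) -> dict:
    """Return the encoded byte count per encoding; None if not encodable."""
    sizes = {}
    for name, codec in [("ascii", "ascii"), ("utf-8", "utf-8"), ("utf-16", "utf-16-be")]:
        try:
            sizes[name] = len(text.encode(codec))
        except UnicodeEncodeError:
            sizes[name] = None  # the text cannot be represented in this encoding
    return sizes


ascii_only = "x" * 500_000       # 500,000 US-ASCII characters
accented = "\u00e9" * 500_000    # 500,000 non-ASCII characters

print(stored_sizes(ascii_only))
print(stored_sizes(accented))
```

For the ASCII-only payload, ascii and UTF-8 come out at the same size and UTF-16 is double; for the accented payload, ascii is not an option at all.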