Are there downsides to creating a large VARCHAR column in Redshift?
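The source data keeps producing values of greater and greater length for a field. Right now I'm using VARCHAR(200), but I might go to VARCHAR(400). Are there any downsides to using a large number?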
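What do you mean by "downside"? If you don't make the column big enough, you have a really big downside: you can't use it to store the values you want to store there.
As for additional overhead, you don't need to worry about it. A varchar() type basically takes up only the storage needed for the value, plus a small overhead for the length. Also, "400" is not really a big number, particularly when compared to "200".
So, if you need 400 bytes to store the value, alter the table to store it. There may be some overhead for changing the length of the column. I'm not sure whether Redshift would need to copy the data because of the type change. The impact on performance should be negligible, though.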
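To make "alter the table" concrete, here is a minimal sketch that builds the statement for widening a VARCHAR column. The table name `events` and column name `payload` are hypothetical, and the helper `widen_varchar_sql` is made up for illustration. The statement shape follows the `ALTER TABLE ... ALTER COLUMN ... TYPE` form that the Redshift documentation describes for increasing a VARCHAR size; check the current docs for its restrictions before relying on it.

```python
import re

# Conservative identifier check so the generated SQL cannot be injected into.
# (Assumption: plain unquoted identifiers are enough for this sketch.)
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def widen_varchar_sql(table: str, column: str, new_length: int) -> str:
    """Return an ALTER TABLE statement that changes a VARCHAR column's length."""
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    if new_length < 1:
        raise ValueError("new_length must be a positive number of bytes")
    return (
        f"ALTER TABLE {table} "
        f"ALTER COLUMN {column} TYPE VARCHAR({new_length});"
    )


if __name__ == "__main__":
    # Hypothetical table and column names, widening from 200 to 400 bytes.
    print(widen_varchar_sql("events", "payload", 400))
```

The function only produces the SQL text; running it against a cluster is left to whatever client you already use.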
Don’t make it a practice to use the maximum column size for convenience.
Instead, consider the largest values you are likely to store in a VARCHAR column, for example, and size your columns accordingly. Because Amazon Redshift compresses column data very effectively, creating columns much larger than necessary has minimal impact on the size of data tables. During processing for complex queries, however, intermediate query results might need to be stored in temporary tables. Because temporary tables are not compressed, unnecessarily large columns consume excessive memory and temporary disk space, which can affect query performance.
http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-smallest-column-size.html
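One way to act on "consider the largest values you are likely to store" is to measure a sample of the source data before picking a length. The sketch below is only an illustration: the function names, the 25% headroom, and the rounding step of 50 are assumptions, not anything the documentation prescribes. It measures UTF-8 bytes rather than characters, because Redshift VARCHAR lengths are expressed in bytes.

```python
from typing import Iterable


def max_utf8_bytes(values: Iterable[str]) -> int:
    """Length in bytes of the longest value when encoded as UTF-8."""
    return max((len(v.encode("utf-8")) for v in values), default=0)


def suggest_varchar_length(
    values: Iterable[str], headroom_pct: int = 25, step: int = 50
) -> int:
    """Suggest a VARCHAR length: longest value plus headroom, rounded up to step.

    headroom_pct and step are arbitrary illustration defaults.
    """
    longest = max_utf8_bytes(values)
    # Integer ceiling of longest * (1 + headroom_pct / 100).
    needed = (longest * (100 + headroom_pct) + 99) // 100
    # Round up to the next multiple of step, never below one step.
    rounded = ((needed + step - 1) // step) * step
    return max(step, rounded)


if __name__ == "__main__":
    sample = ["short", "é" * 100]  # "é" is 2 bytes in UTF-8, so 200 bytes total
    print(max_utf8_bytes(sample), suggest_varchar_length(sample))
```

Sizing from observed data with modest headroom keeps the column large enough for growth without defaulting to the maximum width that the quoted guidance warns against.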