Implement `memcpy()`: Is `unsigned char *` needed, or just `char *`?

Question

I am implementing a version of memcpy() that can be used with volatile. Is it safe to use char *, or do I need unsigned char *?

volatile void *memcpy_v(volatile void *dest, const volatile void *src, size_t n)
{
    const volatile char *src_c  = (const volatile char *)src;
    volatile char *dest_c       = (volatile char *)dest;

    for (size_t i = 0; i < n; i++) {
        dest_c[i]   = src_c[i];
    }

    return  dest;
}

My reasoning is that if any cell of the buffer holds a value > INT8_MAX, unsigned would be needed to avoid an overflow, which I think might be UB.

Answer 1

You do not need unsigned.

Like so:

volatile void *memcpy_v(volatile void *dest, const volatile void *src, size_t n)
{
    const volatile char *src_c  = (const volatile char *)src;
    volatile char *dest_c       = (volatile char *)dest;

    for (size_t i = 0; i < n; i++) {
        dest_c[i]   = src_c[i];
    }

    return  dest;
}

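Attempting to build a conforming implementation in which char has a trap value eventually leads to a contradiction:

fopen("", "rb") does not require that only fread() and fwrite() be used on the stream.
fgets() takes a char * as its first argument and can be used on binary files.
strlen() finds the distance from a given char * to the next null character. Since fgets() is guaranteed to have written one, strlen() will not read past the end of the array, and therefore must not trap.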

Answer 2

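In theory, your code could run on a machine that forbids one bit pattern in a signed char. It might use ones' complement or sign-magnitude representation of negative integers, in which one bit pattern would be interpreted as a 0 with a negative sign. Even on two's-complement architectures, the standard allows an implementation to restrict the range of negative integers so that INT_MIN == -INT_MAX, although I don't know of any actual machine that does so.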

So, according to §6.2.6.2p2, there may be one signed character value that an implementation treats as a trap representation:

Which of these [representations of negative integers] applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two [sign-magnitude and two's complement]), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones’ complement, if this representation is a normal value it is called a negative zero.

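(There cannot be any other trap values for character types, because §6.2.6.2 requires that signed char have no padding bits, which is the only other way a trap representation can be formed. For the same reason, no bit pattern is a trap representation for unsigned char.)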

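So, if this hypothetical machine has a C implementation in which char is signed, then copying an arbitrary byte through a char may involve copying a trap representation.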
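Whether a particular implementation is even a candidate for this scenario can be read from <limits.h>. The following is only a minimal sketch; the two function names are made up for illustration. A symmetric range does not prove that a trap representation exists, it only means that one bit pattern is either a negative zero or a trap representation.

```c
#include <limits.h>

/* Hypothetical helper: nonzero if plain char is a signed type here. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: nonzero if signed char has a symmetric range
   (SCHAR_MIN == -SCHAR_MAX). Only then can one bit pattern be a
   negative zero or a trap representation. */
int signed_char_range_is_symmetric(void)
{
    return SCHAR_MIN == -SCHAR_MAX;
}
```

On a typical two's-complement implementation with SCHAR_MIN == -128, the second function returns 0, so the scenario above does not arise.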

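For signed integer types other than char (if it happens to be signed) and signed char, reading a value that is a trap representation is undefined behaviour. But §6.2.6.1/5 allows reading and writing such values for character types only: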

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation. (Emphasis added)

(The third sentence is a bit clunky, but to simplify: storing a value into memory is a "side effect that modifies all of the object", so that is permitted as well.)

In short, thanks to that exception, you can use char in an implementation of memcpy without worrying about undefined behaviour.

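However, the same is not true of strcpy. strcpy must check for the trailing NUL byte that terminates the string, which means it has to compare the value it reads from memory with 0. The comparison operators (indeed, all arithmetic operators) first perform integer promotions on their operands, which converts the char to an int. As far as I know, integer promotion of a trap representation is undefined behaviour, so on the hypothetical C implementation running on the hypothetical machine, you would need to use unsigned char in order to implement strcpy.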
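A minimal sketch of that idea follows. The name strcpy_uc is hypothetical, and this is not how any particular library implements strcpy; it only shows the bytes being read and compared as unsigned char.

```c
#include <stddef.h>

/* Hypothetical name: a strcpy-like copy that inspects every byte as
   unsigned char, so the comparison with 0 never promotes a signed
   char value that might be a trap representation. */
char *strcpy_uc(char *dest, const char *src)
{
    unsigned char *d = (unsigned char *)dest;
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0;

    /* Copy each byte, including the terminating zero byte, then stop. */
    while ((d[i] = s[i]) != 0) {
        i++;
    }
    return dest;
}
```

Called as strcpy_uc(buf, "abc") with a buffer of at least four bytes, it copies the three characters and the terminator and returns buf.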

Answer 3

Is it safe to use char * or do I need unsigned char *?

Maybe.

"String handling" functions such as memcpy() come with this specification:

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value). C11dr §7.24.1 3

unsigned char is the specified "as if" type. There is little to gain from trying anything else, which may or may not work.

Using char with memcpy() may work, but extending that paradigm to other similar functions leads to problems.

One significant reason to avoid char in functions like str...() and mem...() is that it sometimes makes an unexpected functional difference.

memcmp() and strcmp() certainly give different results with (signed) char than with unsigned char.
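To make that difference concrete, here is a small sketch that compares a single pair of bytes both ways. The helper names are hypothetical, and the result described afterwards assumes an 8-bit, two's-complement signed char.

```c
/* Hypothetical helper: compare one byte pair as unsigned char,
   the interpretation the standard specifies for memcmp()/strcmp(). */
int cmp_as_unsigned(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: the same comparison through signed char,
   which is what a naive char-based loop does when char is signed. */
int cmp_as_signed(const void *a, const void *b)
{
    signed char x = *(const signed char *)a;
    signed char y = *(const signed char *)b;
    return (x > y) - (x < y);
}
```

For the bytes 0x80 and 0x01, cmp_as_unsigned() reports the first as greater (128 > 1), while under the stated assumptions cmp_as_signed() reports it as smaller (-128 < 1), so a memcmp() written with signed char would order buffers differently from the standard one.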

Pedantic: on relic non-2's-complement machines with a signed char, only '\0' should end a string. Yet negative_zero == 0 as well, and a char holding negative_zero should not indicate the end of a string.

Answer 4

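unsigned is not needed, but there is no reason to use plain char for this function. Plain char should only be used for actual character strings. For other uses, unsigned char, or uint8_t and int8_t, are more precise because the signedness is explicitly specified.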

If you want to simplify the function code, you can remove the casts:

volatile void *memcpy_v(volatile void *dest, const volatile void *src, size_t n) {
    const volatile unsigned char *src_c = src;
    volatile unsigned char *dest_c = dest;

    for (size_t i = 0; i < n; i++) {
        dest_c[i] = src_c[i];
    }
    return dest;
}
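As a usage illustration, the sketch below takes a snapshot of a 32-bit value through memcpy_v. The helper read_u32_v is hypothetical and not part of the answer, and the function is repeated only so the example compiles on its own. Note that the copy is done one byte at a time, so this is not a substitute for a single 32-bit access where the hardware requires one.

```c
#include <stddef.h>
#include <stdint.h>

/* Same function as above, repeated so this sketch is self-contained. */
volatile void *memcpy_v(volatile void *dest, const volatile void *src, size_t n)
{
    const volatile unsigned char *src_c = src;
    volatile unsigned char *dest_c = dest;

    for (size_t i = 0; i < n; i++) {
        dest_c[i] = src_c[i];
    }
    return dest;
}

/* Hypothetical helper: copy a 32-bit value out of a volatile location
   byte by byte into an ordinary local variable. */
uint32_t read_u32_v(const volatile void *src)
{
    uint32_t out;
    memcpy_v(&out, src, sizeof out);
    return out;
}
```

Passing &out works without a cast because a pointer to a non-volatile object converts implicitly to volatile void *.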
