C++ gcc extension for non-zero-based array pointer allocation?
I'm looking for a gcc-supported C++ language extension that enables the allocation of non-zero-based array pointers. Ideally I could simply write:
#include <iostream>
using namespace std;

// Allocate elements array[lo..hi-1], and return the new array.
template<typename Elem>
Elem* Create_Array(int lo, int hi)
{
    return new Elem[hi-lo] - lo;
    // FIXME what about [expr.add]/4.
    // How do we create a pointer outside the array bounds?
}

// Deallocate an array previously allocated via Create_Array.
template<typename Elem>
void Destroy_Array(Elem* array, int lo, int hi)
{
    delete[](array + lo);
}

int main()
{
    const int LO = 1000000000;
    const int HI = LO + 10;
    int* array = Create_Array<int>(LO, HI);
    for (int i=LO; i<HI; i++)
        array[i] = i;
    for (int i=LO; i<HI; i++)
        cout << array[i] << "\n";
    Destroy_Array(array, LO, HI);
}
The above code seems to work, but its behavior is not defined by the C++ standard. Specifically, the problem is [expr.add]/4:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x with n
elements, the expressions P + J and J + P (where J has the value j)
point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤
n; otherwise, the behavior is undefined. Likewise, the expression P -
J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j
≤ n; otherwise, the behavior is undefined.
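In other words, the behavior of the line marked FIXME in the code above is undefined, because it computes a pointer outside the range x[0..n] of the zero-based array x.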
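To make the quoted rule concrete, here is a minimal sketch of which pointer computations stay inside the allowed range. The helper names `one_past_end` and `distance_between` are made up for this illustration and do not come from the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the one-past-the-end pointer of a zero-based
// array of n elements. This is the i + j == n case of [expr.add]/4, so it is
// defined as long as the result is not dereferenced.
template <typename Elem>
Elem* one_past_end(Elem* first, std::size_t n)
{
    assert(first != nullptr);
    return first + n;
}

// Hypothetical helper: number of elements between two pointers into (or one
// past the end of) the same array.
template <typename Elem>
std::ptrdiff_t distance_between(const Elem* first, const Elem* last)
{
    return last - first;
}

// By contrast, `first - 1` or `first + (n + 1)` is undefined even if the
// result is never dereferenced. `new Elem[hi-lo] - lo` with lo > 0 falls
// into that category.
```

For an array `int x[10]`, `one_past_end(x, 10)` is fine to compute and compare against, while `x - 1000000000` is not, no matter how the result is used afterwards.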
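Is there some --std=... option for gcc that tells it to allow direct computation of non-zero-based array pointers?
If not, is there a reasonably portable way to emulate the return new Type[hi-lo] - lo; statement, perhaps by casting to long and back? (But then I would worry about introducing more bugs.)
Furthermore, is it possible to require only one register to keep track of each array, as with the code above? For example, if I have array1[i], array2[i], array3[i], this should require only 3 registers for the array pointers array1, array2, array3, plus one register for i. (Similarly, when cold-fetching array references, we should be able to fetch the non-zero-based pointer directly, without doing calculations merely to establish the reference in registers.)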
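Assuming you are using gcc on linux x86-64, it supports the intptr_t and uintptr_t types, which can hold any pointer value (valid or not) and also support integer arithmetic. uintptr_t is more suitable for this application because it has mod 2^64 semantics, whereas intptr_t has UB cases (signed overflow).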
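As a small illustration of the wraparound property, the sketch below biases an address by `lo` elements and then undoes the bias for a given index, all in unsigned integer arithmetic. The function names `bias_address` and `unbias_address` are hypothetical and exist only for this example; no pointers are involved yet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: subtract lo elements from an address, entirely in
// unsigned arithmetic. The intermediate result may wrap around; that is
// well defined for unsigned types.
std::uintptr_t bias_address(std::uintptr_t addr, std::ptrdiff_t lo,
                            std::size_t elem_size)
{
    assert(elem_size != 0);
    return addr - static_cast<std::uintptr_t>(lo) * elem_size;
}

// Hypothetical helper: add index elements back onto a biased address.
// When index is in [lo, hi), the wraparound cancels out and the result is
// again an address inside the original allocation.
std::uintptr_t unbias_address(std::uintptr_t biased, std::ptrdiff_t index,
                              std::size_t elem_size)
{
    assert(elem_size != 0);
    return biased + static_cast<std::uintptr_t>(index) * elem_size;
}
```

With a small address such as 16 and `lo = 1000000000`, the biased value wraps to a very large number, yet `unbias_address(bias_address(16, lo, 4), lo, 4)` returns 16 again. The same computation on intptr_t could overflow a signed type, which is the UB case mentioned above.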
As suggested in the comments, we can use uintptr_t to build a class that overloads operator[] and performs range checking:
#include <iostream>
#include <cassert>
#include <cstddef>   // ptrdiff_t
#include <cstdint>   // uintptr_t
#include <sstream>   // ostringstream
#include <stdexcept> // range_error, out_of_range
using namespace std;

// Safe non-zero-based array. Includes bounds checking.
template<typename Elem>
class Array {
    uintptr_t array; // base value for non-zero-based access
    int lo;          // lowest valid index
    int hi;          // highest valid index plus 1
public:
    Array(int lo, int hi)
        : array(), lo(lo), hi(hi)
    {
        if (lo > hi)
        {
            ostringstream msg; msg<<"Array(): lo("<<lo<<") > hi("<<hi<< ")";
            throw range_error(msg.str());
        }
        static_assert(sizeof(uintptr_t) == sizeof(void*),
            "Array: uintptr_t size does not match ptr size");
        static_assert(sizeof(ptrdiff_t) == sizeof(uintptr_t),
            "Array: ptrdiff_t size does not match ptr (efficiency issue)");
        Elem* alloc = new Elem[hi-lo];
        assert(alloc); // this is redundant; new throws bad_alloc
        array = (uintptr_t)(alloc) - (uintptr_t)(lo * sizeof(Elem));
        // Convert offset to unsigned to avoid overflow UB.
    }
    // The class owns its allocation, so copying would cause a double delete[].
    Array(const Array&) = delete;
    Array& operator=(const Array&) = delete;

    //////////////////////////////////////////////////////////////////
    // UNCHECKED access utilities (these method names start with "_").
    uintptr_t _get_array(){return array;}
    // Provide direct access to the base pointer (be careful!)
    Elem& _at(ptrdiff_t i)
    {return *(Elem*)(array + (uintptr_t)(i * sizeof(Elem)));}
    // Return reference to element (no bounds checking)
    // On GCC 5.4.0 with -O3, this compiles to an 'lea' instruction
    Elem* _get_alloc(){return &_at(lo);}
    // Return zero-based array that was allocated
    ~Array() {delete[](_get_alloc());}

    //////////////////////////////
    // SAFE access utilities
    Elem& at(ptrdiff_t i)
    {
        if (i < lo || i >= hi)
        {
            ostringstream msg;
            msg << "Array.at(): " << i << " is not in range ["
                << lo << ", " << hi << ")";
            throw out_of_range(msg.str());
        }
        return _at(i);
    }
    int get_lo() const {return lo;}
    int get_hi() const {return hi;}
    int size() const {return hi - lo;}
    Elem& operator[](ptrdiff_t i){return at(i);}
    // std::vector is wrong; operator[] is the typical use and should be safe.
    // It's good practice to fix mistakes as we go along.
};

// Test
int main()
{
    const int LO = 1000000000;
    const int HI = LO + 10;
    Array<int> array(LO, HI);
    for (int i=LO; i<HI; i++)
        array[i] = i;
    for (int i=LO; i<HI; i++)
        cout << array[i] << "\n";
}
Note that it is still not possible to convert the invalid "pointer" computed with intptr_t back into a pointer type, because of GCC 4.7 Arrays and Pointers:
When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.
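This is why the array field must be an integer type (uintptr_t in the code above) rather than Elem*. In other words, the behavior is defined as long as the integer value is adjusted to point back into the original object before it is converted back to Elem*.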
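The same idea can be sketched without a class, closer in shape to the question's Create_Array and Destroy_Array. This is only an illustration under the assumptions stated above (gcc, a flat address space, uintptr_t the same size as a pointer); the names `ArrayBase`, `create_biased`, `element_at` and `destroy_biased` are hypothetical. The biased value is kept as an integer, and the offset is re-added as an integer before any conversion back to a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical handle type: the biased base address, kept as an integer so
// that no out-of-bounds pointer value is ever formed.
using ArrayBase = std::uintptr_t;

// Allocate elements [lo, hi) and return the biased integer base.
template <typename Elem>
ArrayBase create_biased(std::ptrdiff_t lo, std::ptrdiff_t hi)
{
    assert(lo <= hi);
    Elem* alloc = new Elem[static_cast<std::size_t>(hi - lo)];
    return reinterpret_cast<std::uintptr_t>(alloc)
         - static_cast<std::uintptr_t>(lo) * sizeof(Elem);
}

// Access element i (no bounds checking). The offset is added in integer
// arithmetic first; only the final in-bounds address becomes a pointer.
template <typename Elem>
Elem& element_at(ArrayBase base, std::ptrdiff_t i)
{
    return *reinterpret_cast<Elem*>(
        base + static_cast<std::uintptr_t>(i) * sizeof(Elem));
}

// Release an array obtained from create_biased with the same lo.
template <typename Elem>
void destroy_biased(ArrayBase base, std::ptrdiff_t lo)
{
    delete[] &element_at<Elem>(base, lo);
}
```

A caller would write `ArrayBase a = create_biased<int>(LO, HI);`, access elements with `element_at<int>(a, i)` for i in [LO, HI), and finish with `destroy_biased<int>(a, LO)`. The handle fits in one register, at the cost of losing the array[i] syntax that the class above restores through operator[].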