使用非 NULL 指针时出现分段错误
segmentation fault when using a non NULL pointer
使用dpdk时出现标题奇怪的问题,
当我使用 rte_pktmbuf_alloc(struct rte_mempool *) 并且已经验证 rte_pktmbuf_pool_create() 的 return 值不为 NULL 时,进程收到分段错误。
关注
ing message is output of gdb in dpdk source code:
Thread 1 "osw" received signal SIGSEGV, Segmentation fault.
0x00000000005e9f41 in __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdec8, mp=0x101a7df00)at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1449
1449 if (unlikely(cache == NULL || n >= cache->size))
(gdb) p cache
= (struct rte_mempool_cache *) 0x1a7dfc000000000
(gdb) bt
0 0x00000000005e9f41 in __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1449
1 rte_mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1517
2 rte_mempool_get_bulk (n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1552
3 rte_mempool_get (obj_p=0x7fffffffdeb8, mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1578
4 rte_mbuf_raw_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:586
5 rte_pktmbuf_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:896
然后我深入 rte_mempool.h:
并更改线路 1449-1450
1449 if (unlikely(cache == NULL || n >= cache->size))
1450 goto ring_dequeue;
到
1449 if (unlikely(cache == NULL))
1450 goto ring_dequeue;
1451 if (unlikely(n >= cache->size))
1452 goto ring_dequeue;
它也在第 1451 行失败
更改后的 gdb 输出消息:
Thread 1 "osw" received signal SIGSEGV, Segmentation fault.
__mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1451
1451 if (unlikely(n >= cache->size))
(gdb) p cache
= (struct rte_mempool_cache *) 0x1a7dfc000000000
(gdb) bt
0 __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1451
1 rte_mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1519
2 rte_mempool_get_bulk (n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1554
3 rte_mempool_get (obj_p=0x7fffffffdeb8, mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1580
4 rte_mbuf_raw_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:586
5 rte_pktmbuf_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:896
6 main (argc=<optimized out>, argv=<optimized out>) at ofpd.c:150
(gdb) p cache->size
Cannot access memory at address 0x1a7dfc000000000
看起来存储的内存地址“缓存”指针不是NULL,但实际上是一个NULL指针。
我不知道为什么 "cache" 指针地址在前缀 4 字节处为非零而在后缀 4 字节处为零。
DPDK版本是20.05,我也试过18.11和19.11
OS是CentOS8.1内核是4.18.0-147.el8.x86_64.
CPU是AMD EPYC 7401P。
#define RING_SIZE 16384
#define NUM_MBUFS 8191
#define MBUF_CACHE_SIZE 512
int main(int argc, char **argv)
{
int ret;
uint16_t portid;
unsigned cpu_id = 1;
struct rte_mempool *tmp;
int arg = rte_eal_init(argc, argv);
if (arg < 0)
rte_exit(EXIT_FAILURE, "Cannot init EAL: %s\n", rte_strerror(rte_errno));
if (rte_lcore_count() < 10)
rte_exit(EXIT_FAILURE, "We need at least 10 cores.\n");
argc -= arg;
argv += arg;
/* Creates a new mempool in memory to hold the mbufs. */
tmp = rte_pktmbuf_pool_create("TMP", NUM_MBUFS, MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
if (tmp == NULL)
rte_exit(EXIT_FAILURE, "Cannot create mbuf pool, %s\n", rte_strerror(rte_errno));
printf("tmp addr = %x\n", tmp);
struct rte_mbuf *test = rte_pktmbuf_alloc(tmp);
rte_exit(EXIT_FAILURE, "end\n");
}
我在使用 getifaddrs() 的 return 指针时遇到过同样的问题,它也遇到了分段错误,我不得不像
那样移动指针地址
ifa->ifa_addr = (struct sockaddr *)((uintptr_t)(ifa->ifa_addr) >> 32);
然后就可以正常工作了
因此,我认为这不是 dpdk 特有的问题。
有人知道这个问题吗?
谢谢。
我可以 运行 通过修改您的代码
而不会出现任何错误
- 包括headers
- 删除了未使用的变量
- 为
alloc
添加检查返回值是否为 NULL
测试时间:
CPU: Intel(R) Xeon(R) CPU E5-2699
OS: 4.15.0-101-generic
GCC: 7.5.0
DPDK version: 19.11.2, dpdk mainline
Library mode: static
代码:
int main(int argc, char **argv)
{
int ret = 0;
struct rte_mempool *tmp;
int arg = rte_eal_init(argc, argv);
if (arg < 0)
rte_exit(EXIT_FAILURE, "Cannot init EAL: %s\n", rte_strerror(rte_errno));
if (rte_lcore_count() < 10)
rte_exit(EXIT_FAILURE, "We need at least 10 cores.\n");
argc -= arg;
argv += arg;
/* Creates a new mempool in memory to hold the mbufs. */
tmp = rte_pktmbuf_pool_create("TMP", NUM_MBUFS, MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
if (tmp == NULL)
rte_exit(EXIT_FAILURE, "Cannot create mbuf pool, %s\n", rte_strerror(rte_errno));
printf("tmp addr = %p\n", tmp);
struct rte_mbuf *test = rte_pktmbuf_alloc(tmp);
if (test == NULL)
rte_exit(EXIT_FAILURE, "end\n");
return ret;
}
[EDIT-1] 基于评论 Brydon Gibson
注:
- 由于我无法访问您的代码库或工作代码片段,我的建议是 从 DPDK/examples/l2fwd 或 DPDK/examples/skeleton 中查找任何示例代码并复制 headers 供您编译。
- 我假设作者
THE
和 Brydon
是不同的人,可能在不同的代码库上面临相似的问题。
- 当前问题声称 DPDK 版本 20.05、18.11 和 19.11 错误已通过代码片段重现。
- 当前答案清楚地说明了库的 静态链接 相同的代码片段有效
请求@BrydonGibson 打开包含相关信息和环境详细信息的票证,因为它可能有所不同。
使用dpdk时出现标题奇怪的问题,
当我使用 rte_pktmbuf_alloc(struct rte_mempool *) 并且已经验证 rte_pktmbuf_pool_create() 的 return 值不为 NULL 时,进程收到分段错误。
关注
ing message is output of gdb in dpdk source code:
Thread 1 "osw" received signal SIGSEGV, Segmentation fault.
0x00000000005e9f41 in __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdec8, mp=0x101a7df00)at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1449
1449 if (unlikely(cache == NULL || n >= cache->size))
(gdb) p cache
= (struct rte_mempool_cache *) 0x1a7dfc000000000
(gdb) bt
0 0x00000000005e9f41 in __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1449
1 rte_mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1517
2 rte_mempool_get_bulk (n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1552
3 rte_mempool_get (obj_p=0x7fffffffdeb8, mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1578
4 rte_mbuf_raw_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:586
5 rte_pktmbuf_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:896
然后我深入 rte_mempool.h:
并更改线路 1449-1450
1449 if (unlikely(cache == NULL || n >= cache->size))
1450 goto ring_dequeue;
到
1449 if (unlikely(cache == NULL))
1450 goto ring_dequeue;
1451 if (unlikely(n >= cache->size))
1452 goto ring_dequeue;
它也在第 1451 行失败
更改后的 gdb 输出消息:
Thread 1 "osw" received signal SIGSEGV, Segmentation fault.
__mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1451
1451 if (unlikely(n >= cache->size))
(gdb) p cache
= (struct rte_mempool_cache *) 0x1a7dfc000000000
(gdb) bt
0 __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1451
1 rte_mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1519
2 rte_mempool_get_bulk (n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1554
3 rte_mempool_get (obj_p=0x7fffffffdeb8, mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1580
4 rte_mbuf_raw_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:586
5 rte_pktmbuf_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:896
6 main (argc=<optimized out>, argv=<optimized out>) at ofpd.c:150
(gdb) p cache->size
Cannot access memory at address 0x1a7dfc000000000
看起来存储的内存地址“缓存”指针不是NULL,但实际上是一个NULL指针。
我不知道为什么 "cache" 指针地址在前缀 4 字节处为非零而在后缀 4 字节处为零。
DPDK版本是20.05,我也试过18.11和19.11
OS是CentOS8.1内核是4.18.0-147.el8.x86_64.
CPU是AMD EPYC 7401P。
#define RING_SIZE 16384
#define NUM_MBUFS 8191
#define MBUF_CACHE_SIZE 512
int main(int argc, char **argv)
{
int ret;
uint16_t portid;
unsigned cpu_id = 1;
struct rte_mempool *tmp;
int arg = rte_eal_init(argc, argv);
if (arg < 0)
rte_exit(EXIT_FAILURE, "Cannot init EAL: %s\n", rte_strerror(rte_errno));
if (rte_lcore_count() < 10)
rte_exit(EXIT_FAILURE, "We need at least 10 cores.\n");
argc -= arg;
argv += arg;
/* Creates a new mempool in memory to hold the mbufs. */
tmp = rte_pktmbuf_pool_create("TMP", NUM_MBUFS, MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
if (tmp == NULL)
rte_exit(EXIT_FAILURE, "Cannot create mbuf pool, %s\n", rte_strerror(rte_errno));
printf("tmp addr = %x\n", tmp);
struct rte_mbuf *test = rte_pktmbuf_alloc(tmp);
rte_exit(EXIT_FAILURE, "end\n");
}
我在使用 getifaddrs() 的 return 指针时遇到过同样的问题,它也遇到了分段错误,我不得不像
那样移动指针地址ifa->ifa_addr = (struct sockaddr *)((uintptr_t)(ifa->ifa_addr) >> 32);
然后就可以正常工作了
因此,我认为这不是 dpdk 特有的问题。
有人知道这个问题吗?
谢谢。
我可以 运行 通过修改您的代码
而不会出现任何错误- 包括headers
- 删除了未使用的变量
- 为
alloc
添加检查返回值是否为 NULL
测试时间:
CPU: Intel(R) Xeon(R) CPU E5-2699
OS: 4.15.0-101-generic
GCC: 7.5.0
DPDK version: 19.11.2, dpdk mainline
Library mode: static
代码:
int main(int argc, char **argv)
{
int ret = 0;
struct rte_mempool *tmp;
int arg = rte_eal_init(argc, argv);
if (arg < 0)
rte_exit(EXIT_FAILURE, "Cannot init EAL: %s\n", rte_strerror(rte_errno));
if (rte_lcore_count() < 10)
rte_exit(EXIT_FAILURE, "We need at least 10 cores.\n");
argc -= arg;
argv += arg;
/* Creates a new mempool in memory to hold the mbufs. */
tmp = rte_pktmbuf_pool_create("TMP", NUM_MBUFS, MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
if (tmp == NULL)
rte_exit(EXIT_FAILURE, "Cannot create mbuf pool, %s\n", rte_strerror(rte_errno));
printf("tmp addr = %p\n", tmp);
struct rte_mbuf *test = rte_pktmbuf_alloc(tmp);
if (test == NULL)
rte_exit(EXIT_FAILURE, "end\n");
return ret;
}
[EDIT-1] 基于评论 Brydon Gibson
注:
- 由于我无法访问您的代码库或工作代码片段,我的建议是 从 DPDK/examples/l2fwd 或 DPDK/examples/skeleton 中查找任何示例代码并复制 headers 供您编译。
- 我假设作者
THE
和Brydon
是不同的人,可能在不同的代码库上面临相似的问题。 - 当前问题声称 DPDK 版本 20.05、18.11 和 19.11 错误已通过代码片段重现。
- 当前答案清楚地说明了库的 静态链接 相同的代码片段有效
请求@BrydonGibson 打开包含相关信息和环境详细信息的票证,因为它可能有所不同。