How do I fix this segmentation fault?

JiaHao Xu

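(Edit: I just fixed the getpid caching issue and then re-ran gdb and valgrind.)

(Edit: I just increased the child's stack size from 200 bytes to 2000 bytes.)

I wrote the following program to learn how to use clone with CLONE_VM | CLONE_VFORK | CLONE_PARENT on a Linux x86-64 machine: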

// test.c
#define _GNU_SOURCE
#include <stdio.h>
#include <assert.h>
#include <syscall.h>  // For syscall to call getpid
#include <signal.h>   // For SIGCHLD
#include <sys/types.h>// For getppid
#include <unistd.h>   // For getppid and sleep
#include <sched.h>    // For clone
#include <stdlib.h>   // For calloc and free

#define STACK_SIZE 2000

void Puts(const char *str)
{
    assert(fputs(str, stderr) != EOF);
}

void Sleep(unsigned int sec)
{
    do {
        sec = sleep(sec);
    } while(sec > 0);
}

int child(void *useless)
{
    Puts("The new process is created.\n");
    assert(fprintf(stderr, "pid = %d, ppid = %d\n", (pid_t) syscall(SYS_getpid), getppid()) > 0);

    Puts("sleep for 120 secs\n");
    Sleep(120);

    return 0;
}

int main(int argc, char* argv[])
{
    Puts("Allocate stack for new process\n");
    void *stack = calloc(STACK_SIZE, sizeof(char));
    void *stack_top = (void*) ((char*) stack + STACK_SIZE - 1);
    assert(fprintf(stderr, "stack = %p, stack top = %p\n", stack, stack_top) > 0);

    Puts("clone\n");
    int ret = clone(child, stack_top, CLONE_VM | CLONE_VFORK | CLONE_PARENT | SIGCHLD, NULL);
    Puts("clone returns\n");

    Puts("Free the stack\n");
    free(stack);

    if (ret == -1)
        perror("clone(child, stack, CLONE_VM | CLONE_VFORK, NULL)");
    else {
        ret = 0;
        Puts("Child dies...\n");
    }

    return ret;
}

I compiled the program with clang-7 test.c and ran it as ./a.out in bash. It returned immediately with the following output:

Allocate stack for new process
stack = 0x492260, stack top = 0x492a2f
clone
The new process is created.
Segmentation fault

It returned 139 (128 + 11), which means the signal SIGSEGV was sent to my process.
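
The value 139 follows the shell convention of reporting death by a signal as 128 plus the signal number, and SIGSEGV is 11 on Linux x86-64. The following is a minimal sketch of how a parent process can derive the same number itself. The helper name shell_status_after_signal is hypothetical and not part of the program above:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that kills itself with `sig`, then
 * return the value a shell such as bash would report in $? for it. */
int shell_status_after_signal(int sig)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        signal(sig, SIG_DFL);   /* make sure the default action applies */
        raise(sig);
        _exit(0);               /* only reached if the signal did not kill us */
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) == -1)
        return -1;
    if (WIFSIGNALED(wstatus))
        return 128 + WTERMSIG(wstatus);  /* killed by a signal */
    return WEXITSTATUS(wstatus);         /* normal exit */
}
```

Calling shell_status_after_signal(SIGSEGV) mirrors what bash does when it reports the status of ./a.out.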

Then I recompiled with -g and ran valgrind --trace-children=yes ./a.out to debug it:

|| ==14494== Memcheck, a memory error detector
|| ==14494== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
|| ==14494== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
|| ==14494== Command: ./a.out
|| ==14494== 
|| Allocate stack for new process
|| stack = 0x51f3040, stack top = 0x51f380f
|| clone
|| clone returns
|| Free the stack
|| Child dies...
|| ==14495== Invalid write of size 4
|| ==14495==    at 0x201322: ??? (in /home/nobodyxu/a.out)
|| ==14495==    by 0x4F2FCBE: clone (clone.S:95)
|| ==14495==  Address 0xffffffffffffffdc is not stack'd, malloc'd or (recently) free'd
|| ==14495== 
|| ==14495== 
|| ==14495== Process terminating with default action of signal 11 (SIGSEGV)
|| ==14495==  Access not within mapped region at address 0xFFFFFFFFFFFFFFDC
|| ==14495==    at 0x201322: ??? (in /home/nobodyxu/a.out)
|| ==14495==    by 0x4F2FCBE: clone (clone.S:95)
|| ==14495==  If you believe this happened as a result of a stack
|| ==14495==  overflow in your program's main thread (unlikely but
|| ==14495==  possible), you can try to increase the size of the
|| ==14495==  main thread stack using the --main-stacksize= flag.
|| ==14495==  The main thread stack size used in this run was 8388608.
|| ==14495== 
|| ==14495== HEAP SUMMARY:
|| ==14495==     in use at exit: 2,000 bytes in 1 blocks
|| ==14495==   total heap usage: 1 allocs, 0 frees, 2,000 bytes allocated
|| ==14495== 
|| ==14495== LEAK SUMMARY:
|| ==14495==    definitely lost: 0 bytes in 0 blocks
|| ==14495==    indirectly lost: 0 bytes in 0 blocks
|| ==14495==      possibly lost: 0 bytes in 0 blocks
|| ==14495==    still reachable: 2,000 bytes in 1 blocks
|| ==14495==         suppressed: 0 bytes in 0 blocks
|| ==14495== Rerun with --leak-check=full to see details of leaked memory
|| ==14495== 
|| ==14495== For counts of detected and suppressed errors, rerun with: -v
|| ==14495== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
|| ==14494== 
|| ==14494== HEAP SUMMARY:
|| ==14494==     in use at exit: 0 bytes in 0 blocks
|| ==14494==   total heap usage: 1 allocs, 1 frees, 2,000 bytes allocated
|| ==14494== 
|| ==14494== All heap blocks were freed -- no leaks are possible
|| ==14494== 
|| ==14494== For counts of detected and suppressed errors, rerun with: -v
|| ==14494== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

It also returned immediately after printing the above.

I checked the generated assembly at 0x201322 and found that it belongs to int main(int argc, char* argv[]):

||   20131d:    e8 8e 01 00 00          callq  2014b0 <clone@plt>
||   201322:    89 45 dc                mov    %eax,-0x24(%rbp)
||   201325:    48 bf 54 09 20 00 00    movabs $0x200954,%rdi
||   20132c:    00 00 00 
||   20132f:    e8 dc fd ff ff          callq  201110 <Puts>
||   201334:    48 bf ad 08 20 00 00    movabs $0x2008ad,%rdi
||   20133b:    00 00 00 

I also tried debugging it in gdb with set follow-fork-mode child, but that did not work.

How do I fix the segmentation fault?

Hacksaw

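It seems that the functions printf and fprintf are not thread-safe without various guard rails. This is covered in detail in "segfault with clone() and printf".

I found the problem by brute force: I located the last print that succeeded, then commented out lines until the error went away.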
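
One way to keep stdio out of the cloned child is sketched below. It is not the asker's final code and rests on several assumptions: the child reports through write(2) instead of fprintf, the stack is a 1 MiB mmap region (the size used by the example in the clone(2) man page) instead of a 2000-byte calloc block, and the stack top is passed as stack + size so that it is 16-byte aligned, unlike stack + STACK_SIZE - 1 in the question. CLONE_PARENT is deliberately left out so that the caller can reap the child with waitpid. The names child_fn, run_child and CHILD_STACK_SIZE are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>      /* clone */
#include <signal.h>     /* SIGCHLD */
#include <sys/mman.h>   /* mmap, munmap */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* write */

#define CHILD_STACK_SIZE (1024 * 1024)  /* assumption: 1 MiB, as in clone(2)'s example */

/* Child entry point: uses only write(2), no stdio. */
static int child_fn(void *arg)
{
    static const char msg[] = "child: running on its own stack\n";
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0)
        return 1;
    *(int *) arg = 1;   /* visible to the parent because of CLONE_VM */
    return 0;
}

/* Hypothetical helper: run child_fn in a clone()d child.
 * Returns the child's exit status, or -1 on failure. */
int run_child(int *flag)
{
    char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* The stack grows down on x86-64. mmap returns page-aligned memory,
     * so stack + CHILD_STACK_SIZE is 16-byte aligned. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, flag);

    int status = -1;
    int wstatus;
    if (pid != -1 && waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus))
        status = WEXITSTATUS(wstatus);

    /* With CLONE_VFORK the parent only resumes after the child has exited,
     * so the stack is no longer in use here. */
    munmap(stack, CHILD_STACK_SIZE);
    return status;
}
```

A caller would declare int flag = 0; and call run_child(&flag). If the child ran to completion, the call returns 0 and flag is set to 1. Whether this removes the crash in the original program depends on the cause being the stdio call on the small stack, which this sketch assumes rather than proves.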
