Printf in loop not working without initial print statement

mlz7

I am attempting to write a very basic lexer in C. The code below is supposed to do something like this:

Input: "12 142 123"

Output:

NUMBER -- 12
NUMBER -- 142
NUMBER -- 123

However, if I do not include an initial printf("") statement before looping over the input, I get output like this:

NUMBER --
NUMBER -- 142
NUMBER -- 123

where the first number is simply blank. I am really confused as to why this is happening and would really appreciate some help with this!

Here is my code (with several irrelevant definitions, such as token_types, is_digit, is_space and unexpected, omitted):

#include <stdio.h>

#define MAX_LEN 400

char* input;
char* ptr;

char curr_type;
char curr;

enum token_type {
  END,
  NUMBER,
  UNEXPECTED
};

typedef struct {
  enum token_type type;
  char* str;
} Token;
  
void print_tok(Token t) {
  printf("%s -- %s\n", token_types[t.type], t.str);
}

char get(void) {
  return *ptr++;
}

char peek(void) {
  return *ptr;
}

Token number(void) {
  char arr[MAX_LEN];
  arr[0] = peek();
  get();
  int i = 1;
  while (is_digit(peek())) {
    arr[i] = get();
    ++i;
  }
  arr[i] = '\0'; // i is already one past the last digit
  Token ret = {NUMBER, (char*)arr};
  return ret;
}

Token unexpected(void) {
  // omitted
}

Token next(void) {
  while (is_space(peek())) get();

  switch (peek()) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
      return number();
    default: 
      return unexpected();
  }
}

int main(int argc, char **argv) {
  printf(""); // works fine with this line

  input = argv[1];
  ptr = input;

  Token tokens[MAX_LEN];
  Token t;
  int i = 0;
  do {
    t = next();
    print_tok(t);
    
    tokens[i++] = t;

  } while (t.type != END && t.type != UNEXPECTED);

  return 0;
}

user253751

In number, arr is a local variable. It is destroyed when number returns, and after that its contents are unpredictable. Your program nonetheless prints it afterwards, through the pointer stored in the Token struct (a dangling pointer).

The value that gets printed is therefore unpredictable; reading it is undefined behavior. The extra printf("") call may change how the compiler arranges the code, or how later calls use the stack memory that arr used to occupy, so that the old digits happen not to be overwritten before they are printed. You cannot rely on that.

Instead of the local array, you have several options for allocating memory per token:

  • Change str in Token so it's an array of chars instead of a pointer. Then each token has its own space to store the string.
  • Allocate the string with malloc. Then it stays allocated until you free it.
  • Create the array in main so it's valid for both next and print_tok. You'd have to give next a pointer to the array, so it knows where it should store the string. This would only store one token's string at a time.
  • Basically any other way of creating an array other than making it a local variable in number.
  • Make the pointer point to where the token is in the original string, and add another field to Token that stores the token's length.

I think the first option is easiest and the last option uses the least memory, but I included some other options for completeness.
