Question

Why does strtok make almost every character whitespace?

Answer and Explanation

The C function strtok, often a source of confusion, doesn't actually turn characters into whitespace. Instead, it treats a string as a sequence of tokens separated by delimiter characters that you provide in the delimiter string. Each character in that string is a separate delimiter; the string is not matched as a whole word or substring. The problem typically arises from a misunderstanding of how strtok works and what it returns. It's not a direct 'replace' tool.

Here's a breakdown of the issue and why you might perceive it as making 'almost every character whitespace':

1. How `strtok` Works:

- The function `char *strtok(char *str, const char *delim);` modifies the input string `str` in place. It skips any leading delimiter characters, finds the end of the token, and overwrites the first delimiter character after the token with a null terminator `'\0'`.

- On the first call, pass the string as `str`. strtok returns a pointer to the first token (the first run of non-delimiter characters), or `NULL` if the string contains only delimiters. On later calls, pass `NULL` as `str` to continue from where the previous call stopped. Each call returns the next token, or `NULL` when none remain.

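- Importantly, strtok remembers its position between calls in hidden internal state, so you cannot interleave the tokenizing of two different strings with it.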

2. The Core Misunderstanding:

- The delimiter string is a set of characters, not a sequence to match. If `delim` contains most of the characters that appear in your text, strtok treats each of them as a separator. The only tokens left are the short runs of characters that are not in `delim`, often single characters.

- If you then print the original buffer, the result looks wrong. `printf("%s", str)` stops at the first `'\0'`, so it shows only the first token. A viewer that displays every byte may also render the inserted null terminators as blanks. That is why it can look as though characters were turned into whitespace, when strtok has only written `'\0'` over some of the delimiter positions.

3. Example:

Let's illustrate this with code. Suppose you have a string "a,b,c" and your delimiter string is "," (just a comma):

```c
#include <stdio.h>
#include <string.h>

int main(void) {
  char str[] = "a,b,c";           // must be a writable array: strtok modifies it
  char *token = strtok(str, ","); // delimiter set is just the comma

  while (token != NULL) {
    printf("%s\n", token);        // prints a, then b, then c, each on its own line
    token = strtok(NULL, ",");    // NULL: continue where the previous call stopped
  }
  printf("Modified String: '%s'\n", str); // prints only 'a': the comma after it is now '\0'
  return 0;
}
```

In this scenario, the delimiter `,` splits the string into the tokens `a`, `b`, and `c`, and the buffer `str` ends up holding `a\0b\0c\0`. If you instead used a delimiter string such as `",abcdefghijklmnopqrstuvwxyz"`, every character of `"a,b,c"` would be a delimiter. The very first call would then return `NULL`, and you would get no tokens at all.

4. The Issue with Using Too Many Delimiters:

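- If every character in the string is a delimiter, strtok finds no tokens and returns `NULL` on the first call. If almost every character is a delimiter, only the few remaining characters come back as tokens, often one character each. Either way, a delimiter set that large is highly unusual and usually a sign that `delim` is being used as something other than a set of separator characters.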

5. How to Correctly Use strtok:

- Use strtok with a delimiter string containing exactly the characters that separate the tokens you want.
- Be aware that the string passed to strtok is modified in place with null terminators. Pass a writable buffer, not a string literal, and tokenize a copy if you still need the original text.

In summary, strtok does not turn characters into whitespace. It divides the input string into tokens based on the delimiter characters you provide and modifies the original string by inserting null terminators. The confusion usually comes from choosing delimiters that split a string into many single-character tokens, or from not realizing that strtok modifies its input string in place.
