strtok and strtok_r, and the thread-safety problem

news2025/4/12 6:06:05

#include <string.h>

char *strtok(char *str, const char *delim);

char *strtok_r(char *str, const char *delim, char **saveptr);

In short: both functions split a string into tokens, but strtok is not thread-safe, whereas strtok_r is.

Let us start from a practical exercise: read a string from the keyboard and print the number of words in it.

My first attempt looked like this:

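#include<iostream>
#include<cstdio>
int main() {
	int ch; // int rather than char, so that EOF can be told apart from real characters
	for (;;) {
		int numOfWord = 0;
		while ((ch = getchar()) != '\n' && ch != EOF) {
			if (ch == ' ') {
				numOfWord++;
			}
		}
		if (ch == EOF) break; // stop at end of input instead of looping forever
		std::cout << numOfWord + 1 << std::endl;
	}// This code has a bug: with several consecutive spaces, numOfWord is incremented for each one, so the count is wrong.
	return 0;
}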

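Testing confirms what the comment in the code says: when the input contains several consecutive spaces, the code no longer counts the words correctly.

But that is not hard to fix: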

#include<iostream>
#include<cstring>
#include<cstdio>
using namespace std;
#define N 128
int main() {
    char buff[N];
    while (cin.getline(buff, N)) {// Read a line from the keyboard; the '\n' is not stored, so '\0' marks the end of the string.
        int pos = 0, numOfWord = 0;
        bool inWord = false;
        while (buff[pos] != '\0') { // '\0' marks the end of the string
            if (buff[pos] != ' ') {
                if (!inWord) {
                    numOfWord++;  // a new word starts here
                    inWord = true;
                }
            }
            else {
                inWord = false;  // the current word has ended
            }
            pos++;
        }
        cout << numOfWord << endl;
        memset(buff, 0, sizeof(buff));
    }
}

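Now runs of spaces are ignored correctly. Still, this code is not something most people would get right on the first try. Is there an easier way?

Of course there is: strtok.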

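#define _CRT_SECURE_NO_WARNINGS // silences the MSVC deprecation warning for strtok

#include<cstring>
#include<stdio.h>
#include<iostream>

int main() {
	const char s[2] = " ";
	char buffer[128];
	//std::cin >> buffer;
	// fgets replaces the MSVC-specific gets_s(buffer) so the example also builds elsewhere
	if (fgets(buffer, sizeof(buffer), stdin) == NULL) {
		return 1;
	}
	buffer[strcspn(buffer, "\n")] = '\0'; // drop the trailing newline that fgets keeps
	int numOfWord = 0;
	char* p = strtok(buffer, s);
	while (p != NULL) {
		printf("Word %d is %s\n", numOfWord + 1, p);
		numOfWord++;
		p = strtok(NULL, s);
	}
	std::cout << "Total number of words: " << numOfWord << std::endl;
	return 0;
}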

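PS: do not read the line with std::cin >> buffer. The >> operator stops at the first whitespace character, so only the first word would be read.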

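That is one way to use strtok. While studying Linux system programming, however, I found out that it is not thread-safe: several threads can execute the same code and still get different results.

For example: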

#include<iostream>
#include<unistd.h>
#include<cstdio>

#include<semaphore.h>
#include<pthread.h>
#include<cstdlib>
#include<cstring>

using namespace std;


void *fun(void *arg)
{
  char buff[128] = {"a b c d e f"};
  char *s = strtok(buff, " ");
  while (s != NULL)
  {
    printf("fun s=%s\n", s);
    sleep(1);
    s = strtok(NULL, " ");
  }
  return NULL; // a pthread start routine must return a value
}
int main()
{
  pthread_t id;
  pthread_create(&id, NULL, fun, NULL);

  char buff[128] = {"1 2 3 4 5 6"};
  char *s = strtok(buff, " ");
  while (s != NULL)
  {
    printf("main s=%s\n", s);
    sleep(1);
    s = strtok(NULL, " ");
  }

  exit(0);
}

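Running the code above twice produced two different outputs, and other outcomes are possible as well. Two runs are enough to make the point: the same multithreaded code behaved differently from one run to the next. The culprit is strtok itself, which keeps its state in a static variable shared by every caller: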

The point where the last token was found is kept internally by the function to be used on the next call (particular library implementations are not required to avoid data races).

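In our program, main runs as the main thread and fun as the child thread, and the two run concurrently. But there is only one copy of strtok's static variable, so each thread overwrites the value stored by the other. Suppose main runs first and cuts off the token 1. If fun runs next, its line char *s = strtok(buff, " "); starts a fresh tokenization and resets the saved position to fun's buffer. When main runs again, it reaches s = strtok(NULL, " "); inside its while loop and expects to continue after the space in its own string. Instead it is misled by the static variable, which now holds the position recorded in fun's buffer rather than main's, so main prints b. The same pattern continues all the way to f. In particular, in the second of the two runs, both threads read the static variable at the same moment and both printed f.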

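So what should we do? We can use strtok_r, which stores the position in a pointer supplied by the caller instead of in a shared static variable: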

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <pthread.h>

void *fun(void *arg)
{
  char buff[128] = {"a b c d e f"};
  char *ptr = NULL;
  char *s = strtok_r(buff, " ", &ptr);
  while (s != NULL)
  {
    printf("fun s=%s\n", s);
    sleep(1);
    s = strtok_r(NULL, " ", &ptr);
  }
  return NULL; // a pthread start routine must return a value
}
int main()
{
  pthread_t id;
  pthread_create(&id, NULL, fun, NULL);

  char buff[128] = {"1 2 3 4 5 6"};
  char *ptr = NULL;
  char *s = strtok_r(buff, " ", &ptr);
  while (s != NULL)
  {
    printf("main s=%s\n", s);
    sleep(1);
    s = strtok_r(NULL, " ", &ptr);
  }

  exit(0);
}

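However the threads are scheduled, each one now sees only its own tokens: main prints 1 through 6 in order, and fun prints a through f in order.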