在计算进程D状态持续时间及等IO的时间遇到的一处问题

news2025/4/22 12:24:19

一、背景

之前的博客 线程每次iodelay监控及D状态开始和结束监控并做堆栈记录-CSDN博客 里,我们讲到了通过内核模块抓取D状态的进程和等IO事件的方法,里面也用到了通过获取rq的symbol,再去获取rq里的rq_clock_task时间的方法(内核模块里访问struct rq及获取rq_clock_task时间的方法-CSDN博客),通过获取到rq_clock_task时间,可以在同一个时间轴上去判断和计算D状态的持续时间和等IO的持续时间。但是会遇到如下的情况,即抓出的等io的时间或者D状态的时间是一个很大的数值,如下图:

这个数其实就是long类型的负数,为什么会造成这种情况,我们会在第二章里做实验,然后得出结果。然后在第三章里,改造这个抓取D状态持续时间和等io时间的内核模块。

二、进一步实验及原因分析

如第一章里所说,异常时,这个数值其实就是一个long类型的负数,取绝对值后,其值并不大,所以,减数和被减数在时间轴上的位置并不很远。而之前的博客 与调度相关的内核时间接口的分析及实现介绍_中断里的逻辑耗时超过500us-CSDN博客 里,我们讲到了rq_clock_task接口获取到的时间的原理,它是一个单调递增的时间,但是它只是相对于每个核心上,它是单调递增的,另外,rq_clock_task的更新并不像local_clock那样每次返回回来基本都会增加,rq_clock_task的增加的节奏是由调度逻辑被动触发的,并不是实时的。

2.1 尝试使用local_clock来替代my_rq_clock_task来作为被减数

实验方式如下:

虽然这样并不容易复现减数大于被减数的情况,但是运行很久时间以后,仍然还是复现了:

另外,从原理上,它也是不成立的,因为不同的核之间的local_clock在时间轴上的比如时间A和时间B这两个时刻,local_clock并不能确保靠后的时间B这个时刻上的所有的核心上的local_clock数值都比时间A时刻的所有核心上的数值都高。更多的这块细节见之前的博客 与调度相关的内核时间接口的分析及实现介绍_中断里的逻辑耗时超过500us-CSDN博客。

另外,在之前的博客 内核模块里访问struct rq及获取rq_clock_task时间的方法-CSDN博客 里,我们也讲到,如果直接用local_clock来作为被减数来进行运算在某些内核选项打开下是不行的。我们还是得用rq_clock_task这个时间轴来作为减数。

2.1 增加运行的核的判断,如果发生任务迁移则不打印

如下修改:

判断是否上一次运行的核心和进行唤醒时的核心是否不一样,如果不一样则不打印。

这样修改之后,就不在出现异常的打印了。基本也符合理论上的解释,因为不同的核心的rq_clock_task的更新节奏是不一样的,且更新的时间轴也是不一样的;但是同一个核心上rq_clock_task则是单调递增的。

三、改造后的抓取D状态和等io时间的内核模块

我们根据 2.1 里的实验的结果,修改一下被减数,从my_rq_clock_task改成任务原运行核心上的rq_clock_task:

如上图修改,修改了被减数,变成了原任务核心上的rq_clock_task数值,另外,去掉了是否发生迁移的检查,如果发生减数大于被减数的情况则进行打印,可以发现长时间运行后并没有异常情况出现了。

修改过后的内核模块完整源码如下:

#include <linux/module.h>
#include <linux/capability.h>
#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/proc_fs.h>
#include <linux/ctype.h>
#include <linux/seq_file.h>
#include <linux/poll.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#include <linux/errno.h>
#include <linux/stddef.h>
#include <linux/lockdep.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/delay.h>
#include <linux/wait.h>
#include <linux/init.h>
#include <asm/atomic.h>
#include <trace/events/workqueue.h>
#include <linux/sched/clock.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/interrupt.h>
#include <linux/tracepoint.h>
#include <trace/events/osmonitor.h>
#include <trace/events/sched.h>
#include <trace/events/irq.h>
#include <trace/events/kmem.h>
#include <linux/ptrace.h>
#include <linux/uaccess.h>
#include <asm/processor.h>
#include <linux/sched/task_stack.h>
#include <linux/nmi.h>
#include <asm/apic.h>
#include <linux/version.h>
#include <linux/sched/mm.h>
#include <asm/irq_regs.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
#include <linux/stop_machine.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("zhaoxin");
MODULE_DESCRIPTION("Module for monitor D tasks.");
MODULE_VERSION("1.0");

#define IODELAY_TRACEPOINT_ENABLE

#define TEST_STACK_TRACE_ENTRIES   32

typedef unsigned int (*stack_trace_save_tsk_func)(struct task_struct *task,
				  unsigned long *store, unsigned int size,
				  unsigned int skipnr);
stack_trace_save_tsk_func _stack_trace_save_tsk;

typedef int (*get_cmdline_func)(struct task_struct *task, char *buffer, int buflen);
get_cmdline_func _get_cmdline_func;

#define TESTDIOMONITOR_SAMPLEDESC_SWDSTART  "swDstart"
#define TESTDIOMONITOR_SAMPLEDESC_WADSTOP    "waDstop"
#define TESTDIOMONITOR_SAMPLEDESC_SWDIOSTART "swDiostart"
#define TESTDIOMONITOR_SAMPLEDESC_WADIOSTOP  "waDiostop"
#define TESTDIOMONITOR_SAMPLEDESC_DEXCEED    "Dexceed"
#define TESTDIOMONITOR_SAMPLEDESC_DIOEXCEED  "Dioexceed"
#define TESTDIOMONITOR_SAMPLEDESC_IOEXCEED   "Ioexceed"

// 1ms
//#define TESTDIOMONITOR_DEXCEED_THRESHOLD     1000ull//1000000ull


struct uclamp_bucket {
	unsigned long value : bits_per(SCHED_CAPACITY_SCALE);
	unsigned long tasks : BITS_PER_LONG - bits_per(SCHED_CAPACITY_SCALE);
};

struct uclamp_rq {
	unsigned int value;
	struct uclamp_bucket bucket[UCLAMP_BUCKETS];
};

/* CFS-related fields in a runqueue */
struct cfs_rq {
	struct load_weight	load;
	unsigned int		nr_running;
	unsigned int		h_nr_running;      /* SCHED_{NORMAL,BATCH,IDLE} */
	unsigned int		idle_nr_running;   /* SCHED_IDLE */
	unsigned int		idle_h_nr_running; /* SCHED_IDLE */

	u64			exec_clock;
	u64			min_vruntime;
#ifdef CONFIG_SCHED_CORE
	unsigned int		forceidle_seq;
	u64			min_vruntime_fi;
#endif

#ifndef CONFIG_64BIT
	u64			min_vruntime_copy;
#endif

	struct rb_root_cached	tasks_timeline;

	/*
	 * 'curr' points to currently running entity on this cfs_rq.
	 * It is set to NULL otherwise (i.e when none are currently running).
	 */
	struct sched_entity	*curr;
	struct sched_entity	*next;
	struct sched_entity	*last;
	struct sched_entity	*skip;

#ifdef	CONFIG_SCHED_DEBUG
	unsigned int		nr_spread_over;
#endif

#ifdef CONFIG_SMP
	/*
	 * CFS load tracking
	 */
	struct sched_avg	avg;
#ifndef CONFIG_64BIT
	u64			last_update_time_copy;
#endif
	struct {
		raw_spinlock_t	lock ____cacheline_aligned;
		int		nr;
		unsigned long	load_avg;
		unsigned long	util_avg;
		unsigned long	runnable_avg;
	} removed;

#ifdef CONFIG_FAIR_GROUP_SCHED
	unsigned long		tg_load_avg_contrib;
	long			propagate;
	long			prop_runnable_sum;

	/*
	 *   h_load = weight * f(tg)
	 *
	 * Where f(tg) is the recursive weight fraction assigned to
	 * this group.
	 */
	unsigned long		h_load;
	u64			last_h_load_update;
	struct sched_entity	*h_load_next;
#endif /* CONFIG_FAIR_GROUP_SCHED */
#endif /* CONFIG_SMP */

#ifdef CONFIG_FAIR_GROUP_SCHED
	struct rq		*rq;	/* CPU runqueue to which this cfs_rq is attached */

	/*
	 * leaf cfs_rqs are those that hold tasks (lowest schedulable entity in
	 * a hierarchy). Non-leaf lrqs hold other higher schedulable entities
	 * (like users, containers etc.)
	 *
	 * leaf_cfs_rq_list ties together list of leaf cfs_rq's in a CPU.
	 * This list is used during load balance.
	 */
	int			on_list;
	struct list_head	leaf_cfs_rq_list;
	struct task_group	*tg;	/* group that "owns" this runqueue */

	/* Locally cached copy of our task_group's idle value */
	int			idle;

#ifdef CONFIG_CFS_BANDWIDTH
	int			runtime_enabled;
	s64			runtime_remaining;

	u64			throttled_pelt_idle;
#ifndef CONFIG_64BIT
	u64                     throttled_pelt_idle_copy;
#endif
	u64			throttled_clock;
	u64			throttled_clock_pelt;
	u64			throttled_clock_pelt_time;
	int			throttled;
	int			throttle_count;
	struct list_head	throttled_list;
#ifdef CONFIG_SMP
	struct list_head	throttled_csd_list;
#endif
#endif /* CONFIG_CFS_BANDWIDTH */
#endif /* CONFIG_FAIR_GROUP_SCHED */
};

struct rt_prio_array {
	DECLARE_BITMAP(bitmap, MAX_RT_PRIO+1); /* include 1 bit for delimiter */
	struct list_head queue[MAX_RT_PRIO];
};

/* Real-Time classes' related field in a runqueue: */
struct rt_rq {
	struct rt_prio_array	active;
	unsigned int		rt_nr_running;
	unsigned int		rr_nr_running;
#if defined CONFIG_SMP || defined CONFIG_RT_GROUP_SCHED
	struct {
		int		curr; /* highest queued rt task prio */
#ifdef CONFIG_SMP
		int		next; /* next highest */
#endif
	} highest_prio;
#endif
#ifdef CONFIG_SMP
	unsigned int		rt_nr_migratory;
	unsigned int		rt_nr_total;
	int			overloaded;
	struct plist_head	pushable_tasks;

#endif /* CONFIG_SMP */
	int			rt_queued;

	int			rt_throttled;
	u64			rt_time;
	u64			rt_runtime;
	/* Nests inside the rq lock: */
	raw_spinlock_t		rt_runtime_lock;

#ifdef CONFIG_RT_GROUP_SCHED
	unsigned int		rt_nr_boosted;

	struct rq		*rq;
	struct task_group	*tg;
#endif
};

/* Deadline class' related fields in a runqueue */
struct dl_rq {
	/* runqueue is an rbtree, ordered by deadline */
	struct rb_root_cached	root;

	unsigned int		dl_nr_running;

#ifdef CONFIG_SMP
	/*
	 * Deadline values of the currently executing and the
	 * earliest ready task on this rq. Caching these facilitates
	 * the decision whether or not a ready but not running task
	 * should migrate somewhere else.
	 */
	struct {
		u64		curr;
		u64		next;
	} earliest_dl;

	unsigned int		dl_nr_migratory;
	int			overloaded;

	/*
	 * Tasks on this rq that can be pushed away. They are kept in
	 * an rb-tree, ordered by tasks' deadlines, with caching
	 * of the leftmost (earliest deadline) element.
	 */
	struct rb_root_cached	pushable_dl_tasks_root;
#else
	struct dl_bw		dl_bw;
#endif
	/*
	 * "Active utilization" for this runqueue: increased when a
	 * task wakes up (becomes TASK_RUNNING) and decreased when a
	 * task blocks
	 */
	u64			running_bw;

	/*
	 * Utilization of the tasks "assigned" to this runqueue (including
	 * the tasks that are in runqueue and the tasks that executed on this
	 * CPU and blocked). Increased when a task moves to this runqueue, and
	 * decreased when the task moves away (migrates, changes scheduling
	 * policy, or terminates).
	 * This is needed to compute the "inactive utilization" for the
	 * runqueue (inactive utilization = this_bw - running_bw).
	 */
	u64			this_bw;
	u64			extra_bw;

	/*
	 * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
	 * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
	 */
	u64			max_bw;

	/*
	 * Inverse of the fraction of CPU utilization that can be reclaimed
	 * by the GRUB algorithm.
	 */
	u64			bw_ratio;
};

struct rq {
	/* runqueue lock: */
	raw_spinlock_t		__lock;

	/*
	 * nr_running and cpu_load should be in the same cacheline because
	 * remote CPUs use both these fields when doing load calculation.
	 */
	unsigned int		nr_running;
#ifdef CONFIG_NUMA_BALANCING
	unsigned int		nr_numa_running;
	unsigned int		nr_preferred_running;
	unsigned int		numa_migrate_on;
#endif
#ifdef CONFIG_NO_HZ_COMMON
#ifdef CONFIG_SMP
	unsigned long		last_blocked_load_update_tick;
	unsigned int		has_blocked_load;
	call_single_data_t	nohz_csd;
#endif /* CONFIG_SMP */
	unsigned int		nohz_tick_stopped;
	atomic_t		nohz_flags;
#endif /* CONFIG_NO_HZ_COMMON */

#ifdef CONFIG_SMP
	unsigned int		ttwu_pending;
#endif
	u64			nr_switches;

#ifdef CONFIG_UCLAMP_TASK
	/* Utilization clamp values based on CPU's RUNNABLE tasks */
	struct uclamp_rq	uclamp[UCLAMP_CNT] ____cacheline_aligned;
	unsigned int		uclamp_flags;
#define UCLAMP_FLAG_IDLE 0x01
#endif

	struct cfs_rq		cfs;
	struct rt_rq		rt;
	struct dl_rq		dl;

#ifdef CONFIG_FAIR_GROUP_SCHED
	/* list of leaf cfs_rq on this CPU: */
	struct list_head	leaf_cfs_rq_list;
	struct list_head	*tmp_alone_branch;
#endif /* CONFIG_FAIR_GROUP_SCHED */

	/*
	 * This is part of a global counter where only the total sum
	 * over all CPUs matters. A task can increase this counter on
	 * one CPU and if it got migrated afterwards it may decrease
	 * it on another CPU. Always updated under the runqueue lock:
	 */
	unsigned int		nr_uninterruptible;

	struct task_struct __rcu	*curr;
	struct task_struct	*idle;
	struct task_struct	*stop;
	unsigned long		next_balance;
	struct mm_struct	*prev_mm;

	unsigned int		clock_update_flags;
	u64			clock;
	/* Ensure that all clocks are in the same cache line */
	u64			clock_task ____cacheline_aligned;
	u64			clock_pelt;
	unsigned long		lost_idle_time;

	atomic_t		nr_iowait;

#ifdef CONFIG_SCHED_DEBUG
	u64 last_seen_need_resched_ns;
	int ticks_without_resched;
#endif

#ifdef CONFIG_MEMBARRIER
	int membarrier_state;
#endif

#ifdef CONFIG_SMP
	struct root_domain		*rd;
	struct sched_domain __rcu	*sd;

	unsigned long		cpu_capacity;
	unsigned long		cpu_capacity_orig;

	struct callback_head	*balance_callback;

	unsigned char		nohz_idle_balance;
	unsigned char		idle_balance;

	unsigned long		misfit_task_load;

	/* For active balancing */
	int			active_balance;
	int			push_cpu;
	struct cpu_stop_work	active_balance_work;

	/* CPU of this runqueue: */
	int			cpu;
	int			online;

	struct list_head cfs_tasks;

	struct sched_avg	avg_rt;
	struct sched_avg	avg_dl;
#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
	struct sched_avg	avg_irq;
#endif
#ifdef CONFIG_SCHED_THERMAL_PRESSURE
	struct sched_avg	avg_thermal;
#endif
	u64			idle_stamp;
	u64			avg_idle;

	unsigned long		wake_stamp;
	u64			wake_avg_idle;

	/* This is used to determine avg_idle's max value */
	u64			max_idle_balance_cost;

#ifdef CONFIG_HOTPLUG_CPU
	struct rcuwait		hotplug_wait;
#endif
#endif /* CONFIG_SMP */

#ifdef CONFIG_IRQ_TIME_ACCOUNTING
	u64			prev_irq_time;
#endif
#ifdef CONFIG_PARAVIRT
	u64			prev_steal_time;
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
	u64			prev_steal_time_rq;
#endif

	/* calc_load related fields */
	unsigned long		calc_load_update;
	long			calc_load_active;

#ifdef CONFIG_SCHED_HRTICK
#ifdef CONFIG_SMP
	call_single_data_t	hrtick_csd;
#endif
	struct hrtimer		hrtick_timer;
	ktime_t 		hrtick_time;
#endif

#ifdef CONFIG_SCHEDSTATS
	/* latency stats */
	struct sched_info	rq_sched_info;
	unsigned long long	rq_cpu_time;
	/* could above be rq->cfs_rq.exec_clock + rq->rt_rq.rt_runtime ? */

	/* sys_sched_yield() stats */
	unsigned int		yld_count;

	/* schedule() stats */
	unsigned int		sched_count;
	unsigned int		sched_goidle;

	/* try_to_wake_up() stats */
	unsigned int		ttwu_count;
	unsigned int		ttwu_local;
#endif

#ifdef CONFIG_CPU_IDLE
	/* Must be inspected within a rcu lock section */
	struct cpuidle_state	*idle_state;
#endif

#ifdef CONFIG_SMP
	unsigned int		nr_pinned;
#endif
	unsigned int		push_busy;
	struct cpu_stop_work	push_work;

#ifdef CONFIG_SCHED_CORE
	/* per rq */
	struct rq		*core;
	struct task_struct	*core_pick;
	unsigned int		core_enabled;
	unsigned int		core_sched_seq;
	struct rb_root		core_tree;

	/* shared state -- careful with sched_core_cpu_deactivate() */
	unsigned int		core_task_seq;
	unsigned int		core_pick_seq;
	unsigned long		core_cookie;
	unsigned int		core_forceidle_count;
	unsigned int		core_forceidle_seq;
	unsigned int		core_forceidle_occupation;
	u64			core_forceidle_start;
#endif
};

// runqueues (not export symbol)
struct rq* _prq = NULL;

struct rq* my_cpu_rq(int i_cpu)
{
    return per_cpu_ptr(_prq, i_cpu);
}

u64 my_rq_clock_task(void)
{
    struct rq* prq = my_cpu_rq(smp_processor_id());
    return prq->clock_task;
}

typedef struct testdiomonitor_sample {
    struct timespec64 time;
    int cpu;
    int pid;
    int tgid;
    int ppid;
    char comm[TASK_COMM_LEN];
    char ppidcomm[TASK_COMM_LEN];
    // 0 or 1
    int bin_iowait;
    /*
     * "swDstart" // 在sched_switch里
     * "waDstop"  // 在sched_waking里
     * "swDiostart" // 在sched_switch里
     * "waDiostop"  // 在sched_waking里
     * "Dexceed"    // 超出阈值,非iowait
     * "Dioexceed"  // 超出阈值,iowait
     */
    const char* desc;
    u64 dtimens;    // 纳秒单位,D状态持续的时间
    u64 iowaittimens;   // 纳秒单位,等待io的时间
    int stackn;
    void* parray_stack[TEST_STACK_TRACE_ENTRIES];
    u32 writedone;  // 0 or 1
} testdiomonitor_sample;

#define TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT  8192

typedef struct testdiomonitor_sample_ringbuff {
    testdiomonitor_sample* parray_sample;
    volatile u64 wp;    // Index is wp & (TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT - 1).
    volatile u64 rp;    // Index is rp & (TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT - 1).
    u32 skipcount;  // 0 means no skip any abnormal event
} testdiomonitor_sample_ringbuff;

#define TESTDIOMONITOR_LINEBUFF  1024

typedef struct testdiomonitor_env {
    struct file* file;
    char file_linebuff[TESTDIOMONITOR_LINEBUFF];
    int headoffset;
    loff_t file_pos;
    testdiomonitor_sample_ringbuff ringbuff;
} testdiomonitor_env;

static testdiomonitor_env _env;

static struct delayed_work work_write_file;
static struct workqueue_struct *wq_write_file;

#define FILENAME        "test.txt"

void init_file(void)
{
    _env.file = filp_open(FILENAME, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (IS_ERR(_env.file)) {
        _env.file = NULL;
    }
}

void exit_file(void)
{
    if (_env.file) {
        filp_close(_env.file, NULL);
    }
}

void testdiomonitor_write_file(char* i_pchar, int i_size)
{
    if (_env.file) {
        kernel_write(_env.file, i_pchar, i_size, &_env.file_pos);
    }
}

void testdiomonitor_write_file_emptyline(void)
{
    testdiomonitor_write_file("\n", strlen("\n"));
}

void testdiomonitor_file_oneline(const char* i_format, ...)
{
    char* pcontent = &_env.file_linebuff[_env.headoffset];
    va_list args;
    va_start(args, i_format);
    vsnprintf(pcontent, TESTDIOMONITOR_LINEBUFF - _env.headoffset, i_format, args);
    va_end(args);
    testdiomonitor_write_file(_env.file_linebuff, strlen(_env.file_linebuff));
}

void testdiomonitor_replace_null_with_space(char *str, int n) {
    for (int i = 0; i < n - 1; i++) {
        if (str[i] == '\0') {
            str[i] = ' ';
        }
    }
}

void testdiomonitor_set_cmdline(char* i_pbuff, int i_buffsize, struct task_struct* i_ptask)
{
    int ret = _get_cmdline_func(i_ptask, i_pbuff, i_buffsize);
    if (ret <= 0) {
        i_pbuff[0] = '\0';
        return;
    }
    testdiomonitor_replace_null_with_space(i_pbuff, ret);
    i_pbuff[ret - 1] = '\0';
}

void testdiomonitor_checkget_parentinfo_and_cmdline(testdiomonitor_sample* io_psample, struct task_struct* i_ptask)
{
    struct task_struct* parent;
    rcu_read_lock();
    parent = rcu_dereference(i_ptask->real_parent);
    io_psample->ppid = parent->pid;
    strlcpy(io_psample->ppidcomm, parent->comm, TASK_COMM_LEN);
    rcu_read_unlock();
}

#define TESTDIOMONITOR_COMMANDLINE_MAX 128

static void write_file(struct work_struct *w)
{
    ssize_t ret;
    u32 index;
    testdiomonitor_sample* psample;
    struct tm t;
    char timestr[64];
    char exceedstr[64];
    char temp_commandline[TESTDIOMONITOR_COMMANDLINE_MAX];
    struct pid* pid_struct;
    struct task_struct* ptask;
    int stacki;
    while (_env.ringbuff.rp != _env.ringbuff.wp) {
        index = (_env.ringbuff.rp & (TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT - 1));
        psample = &_env.ringbuff.parray_sample[index];
        if (psample->writedone != 1) {
            break;
        }
        testdiomonitor_write_file_emptyline();
        _env.headoffset = sprintf(_env.file_linebuff, "[%llu][%s] ", _env.ringbuff.rp, psample->desc);
        time64_to_tm(psample->time.tv_sec + 8 * 60 * 60, 0, &t);
        snprintf(timestr, 64, "%04ld-%02d-%02d-%02d_%02d_%02d.%09ld",
		    1900 + t.tm_year, t.tm_mon + 1, t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec, psample->time.tv_nsec);
        if (psample->desc == TESTDIOMONITOR_SAMPLEDESC_DEXCEED) {
            snprintf(exceedstr, 64, "dtimens[%llu]", psample->dtimens);
        }
        else if (psample->desc == TESTDIOMONITOR_SAMPLEDESC_DIOEXCEED) {
            snprintf(exceedstr, 64, "iowaittimens[%llu]", psample->iowaittimens);
        }
        else if (psample->desc == TESTDIOMONITOR_SAMPLEDESC_IOEXCEED) {
            snprintf(exceedstr, 64, "delayacct_iowaittimens[%llu]", psample->iowaittimens);
        }
        else {
            exceedstr[0] = '\0';
        }
        testdiomonitor_file_oneline("begin...time[%s]cpu[%d]desc[%s]%s\n", 
            timestr, psample->cpu, psample->desc, exceedstr);
        testdiomonitor_file_oneline("tgid[%d]pid[%d]comm[%s]ppid[%d]ppidcomm[%s]\n",
            psample->tgid, psample->pid, psample->ppidcomm, psample->pid, psample->comm);
        pid_struct = find_get_pid(psample->pid);
        if (pid_struct) {
            ptask = get_pid_task(pid_struct, PIDTYPE_PID);
            if (ptask) {
                testdiomonitor_set_cmdline(temp_commandline, TESTDIOMONITOR_COMMANDLINE_MAX, ptask);
                put_task_struct(ptask);
            }
            else {
                temp_commandline[0] = '\0';
            }
            put_pid(pid_struct);
        }
        else {
            temp_commandline[0] = '\0';
        }
        testdiomonitor_file_oneline("commandline[%s]\n", temp_commandline);
        pid_struct = find_get_pid(psample->ppid);
        if (pid_struct) {
            ptask = get_pid_task(pid_struct, PIDTYPE_PID);
            if (ptask) {
                testdiomonitor_set_cmdline(temp_commandline, TESTDIOMONITOR_COMMANDLINE_MAX, ptask);
                put_task_struct(ptask);
            }
            else {
                temp_commandline[0] = '\0';
            }
            put_pid(pid_struct);
        }
        else {
            temp_commandline[0] = '\0';
        }
        testdiomonitor_file_oneline("ppid_commandline[%s]\n", temp_commandline);
        testdiomonitor_file_oneline("stack[%d]:\n", psample->stackn);
        for (stacki = 0; stacki < psample->stackn; stacki++) {
            testdiomonitor_file_oneline("%*c%pS\n", 5, ' ', (void *)psample->parray_stack[stacki]);
        }
        testdiomonitor_write_file_emptyline();
        psample->writedone = 0;
        _env.ringbuff.rp ++;
    }
    queue_delayed_work_on(nr_cpu_ids - 1, wq_write_file,
        &work_write_file, 1);
}

static void init_write_file(void)
{
    init_file();
    wq_write_file = alloc_workqueue("testdiomonitor_write_file", WQ_MEM_RECLAIM, 0);
    INIT_DELAYED_WORK(&work_write_file, write_file);
    queue_delayed_work_on(nr_cpu_ids - 1, wq_write_file,
        &work_write_file, 3);
}

static void exit_write_file(void)
{
    cancel_delayed_work_sync(&work_write_file);
    destroy_workqueue(wq_write_file);
    exit_file();
}

void init_testdiomonitor_sample_ringbuff(void) 
{
    testdiomonitor_sample* psample;
    _env.ringbuff.parray_sample = kvzalloc(sizeof(testdiomonitor_sample) * TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT, GFP_KERNEL);
}

void exit_testdiomonitor_sample_ringbuff(void)
{
    kvfree(_env.ringbuff.parray_sample);
}

testdiomonitor_sample* testdiomonitor_get_psample(void)
{
    u64 windex_raw, windex_raw_old;
    u32 windex;
    while (1) {
        windex_raw = _env.ringbuff.wp;
        if (windex_raw - _env.ringbuff.rp >= (u64)(TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT)) {
            _env.ringbuff.skipcount ++;
            return NULL;
        }
        // atomic_cmpxchg return old value
		windex_raw_old = atomic64_cmpxchg((atomic64_t*)&_env.ringbuff.wp,
			windex_raw, windex_raw + 1);
        if (windex_raw_old == windex_raw) {
            break;
        }
    }
    windex = (u32)(windex_raw & (u64)(TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT - 1));
    return &_env.ringbuff.parray_sample[windex];
}

void testdiomonitor_add_sample(const char* i_desc, struct task_struct* i_task, u64 i_timens)
{
    testdiomonitor_sample* psample = testdiomonitor_get_psample();
    if (!psample) {
        return;
    }
    ktime_get_real_ts64(&psample->time);
    psample->cpu = task_cpu(i_task);
    psample->pid = i_task->pid;
    psample->tgid = i_task->tgid;
    strlcpy(psample->comm, i_task->comm, TASK_COMM_LEN);
    testdiomonitor_checkget_parentinfo_and_cmdline(psample, i_task);
    psample->bin_iowait = i_task->in_iowait;
    psample->desc = i_desc;
    if (i_desc == TESTDIOMONITOR_SAMPLEDESC_DEXCEED) {
        psample->dtimens = i_timens;
    }
    else if (i_desc == TESTDIOMONITOR_SAMPLEDESC_DIOEXCEED || i_desc == TESTDIOMONITOR_SAMPLEDESC_IOEXCEED) {
        psample->iowaittimens = i_timens;
    }
    psample->stackn = _stack_trace_save_tsk(i_task, (unsigned long*)psample->parray_stack, TEST_STACK_TRACE_ENTRIES, 0);
    psample->writedone = 1;
}

static void cb_sched_switch(void *i_data, bool i_preempt,
	struct task_struct *i_prev,
	struct task_struct *i_next,
	unsigned int i_prev_state)
{
    void* parray_stack[TEST_STACK_TRACE_ENTRIES];
    int num_stack;
    int stacki;
    if (i_prev_state == TASK_UNINTERRUPTIBLE) {
        if (i_prev->in_iowait) {
            testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_SWDIOSTART, i_prev, 0);
        }
        else {
            testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_SWDSTART, i_prev, 0);
        }
    }
}

static void cb_sched_waking(void *i_data, struct task_struct *i_p) {
    if (i_p->__state == TASK_UNINTERRUPTIBLE) {
        //u64 currns = my_rq_clock_task();
        struct rq* prq = my_cpu_rq(task_cpu(i_p));
        u64 currns = prq->clock_task;
        u64 local_c = local_clock();
        int cpuid = smp_processor_id();
        if (i_p->in_iowait) {
            testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_WADIOSTOP, i_p, 0);
            testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_DIOEXCEED, i_p, currns - i_p->se.exec_start);
            if (i_p->se.exec_start > currns) 
            {
                //if (task_cpu(i_p) == cpuid) 
                {
                    printk("comm[%s]pid[%d]exec_start[%llu]currns[%llu]local_clock[%llu]last_cpu[%d]cpuid[%d]\n", 
                        i_p->comm, i_p->pid, i_p->se.exec_start, currns, local_c, task_cpu(i_p), cpuid);
                }
            }
        }
        else {
            testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_WADSTOP, i_p, 0);
            testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_DEXCEED, i_p, my_rq_clock_task() - i_p->se.exec_start);
            if (i_p->se.exec_start > currns) 
            {
                //if (task_cpu(i_p) == cpuid) 
                {
                    printk("comm[%s]pid[%d]exec_start[%llu]currns[%llu]local_clock[%llu]last_cpu[%d]cpuid[%d]\n", 
                        i_p->comm, i_p->pid, i_p->se.exec_start, currns, local_c, task_cpu(i_p), cpuid);
                }
            }
        }
    }
}

static void cb_iodelay_account(void *i_data, struct task_struct *i_curr,
				    unsigned long long i_delta)
{
    testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_IOEXCEED, i_curr, i_delta);
}

struct kern_tracepoint {
    void *callback;
    struct tracepoint *ptr;
    bool bregister;
};
static void clear_kern_tracepoint(struct kern_tracepoint *tp)
{
    if (tp->bregister) {
        tracepoint_probe_unregister(tp->ptr, tp->callback, NULL);
    }
}

#define INIT_KERN_TRACEPOINT(tracepoint_name) \
    static struct kern_tracepoint mykern_##tracepoint_name = {.callback = NULL, .ptr = NULL, .bregister = false};


#define TRACEPOINT_CHECK_AND_SET(tracepoint_name)                                             \
    static void tracepoint_name##_tracepoint_check_and_set(struct tracepoint *tp, void *priv) \
    {                                                                                \
        if (!strcmp(#tracepoint_name, tp->name))                                     \
        {                                                                            \
            ((struct kern_tracepoint *)priv)->ptr = tp;                          \
            return;                                                                  \
        }                                                                            \
    }

INIT_KERN_TRACEPOINT(sched_switch)
TRACEPOINT_CHECK_AND_SET(sched_switch)
INIT_KERN_TRACEPOINT(sched_waking)
TRACEPOINT_CHECK_AND_SET(sched_waking)
#ifdef IODELAY_TRACEPOINT_ENABLE
INIT_KERN_TRACEPOINT(iodelay_account)
TRACEPOINT_CHECK_AND_SET(iodelay_account)
#endif

typedef unsigned long (*kallsyms_lookup_name_func)(const char *name);
kallsyms_lookup_name_func _kallsyms_lookup_name_func;

void* get_func_by_symbol_name_kallsyms_lookup_name(void)
{
    int ret;
    void* pfunc = NULL;
	struct kprobe kp;
	memset(&kp, 0, sizeof(kp));
	kp.symbol_name = "kallsyms_lookup_name";
	kp.pre_handler = NULL;
	kp.addr = NULL;	// 作为强调,提示使用symbol_name
	ret = register_kprobe(&kp);
	if (ret < 0) {
		printk("register_kprobe fail!\n");
		return NULL;
	}
	printk("register_kprobe succeed!\n");
    pfunc = (void*)kp.addr;
	unregister_kprobe(&kp);
    return pfunc;
}

void* get_func_by_symbol_name(const char* i_symbol)
{
    if (_kallsyms_lookup_name_func == NULL) {
        return NULL;
    }
    return _kallsyms_lookup_name_func(i_symbol);
}


static int __init testdiomonitor_init(void)
{
    _kallsyms_lookup_name_func = get_func_by_symbol_name_kallsyms_lookup_name();

    _prq = get_func_by_symbol_name("runqueues");
    if (_prq == NULL) {
        printk(KERN_ERR "get_func_by_symbol_name runqueues failed!\n");
        return -1;
    }

    init_testdiomonitor_sample_ringbuff();
    init_write_file();

    _stack_trace_save_tsk = get_func_by_symbol_name("stack_trace_save_tsk");
    if (_stack_trace_save_tsk == NULL) {
        printk(KERN_ERR "get_func_by_symbol_name stack_trace_save_tsk failed!\n");
        return -1;
    }
    _get_cmdline_func = get_func_by_symbol_name("get_cmdline");
    if (_get_cmdline_func == NULL) {
        printk(KERN_ERR "get_func_by_symbol_name get_cmdline failed!\n");
        return -1;
    }
    mykern_sched_switch.callback = cb_sched_switch;
    for_each_kernel_tracepoint(sched_switch_tracepoint_check_and_set, &mykern_sched_switch);
    if (!mykern_sched_switch.ptr) {
        printk(KERN_ERR "mykern_sched_switch register failed!\n");
        return -1;
    }
    else {
        printk(KERN_INFO "mykern_sched_switch register succeeded!\n");
    }
    tracepoint_probe_register(mykern_sched_switch.ptr, mykern_sched_switch.callback, NULL);
    mykern_sched_switch.bregister = 1;

    mykern_sched_waking.callback = cb_sched_waking;
    for_each_kernel_tracepoint(sched_waking_tracepoint_check_and_set, &mykern_sched_waking);
    if (!mykern_sched_waking.ptr) {
        printk(KERN_ERR "mykern_sched_waking register failed!\n");
        return -1;
    }
    else {
        printk(KERN_INFO "mykern_sched_waking register succeeded!\n");
    }
    tracepoint_probe_register(mykern_sched_waking.ptr, mykern_sched_waking.callback, NULL);
    mykern_sched_waking.bregister = 1;

#ifdef IODELAY_TRACEPOINT_ENABLE
    mykern_iodelay_account.callback = cb_iodelay_account;
    for_each_kernel_tracepoint(iodelay_account_tracepoint_check_and_set, &mykern_iodelay_account);
    if (!mykern_iodelay_account.ptr) {
        printk(KERN_ERR "mykern_iodelay_account register failed!\n");
        return -1;
    }
    else {
        printk(KERN_INFO "mykern_iodelay_account register succeeded!\n");
    }
    tracepoint_probe_register(mykern_iodelay_account.ptr, mykern_iodelay_account.callback, NULL);
    mykern_iodelay_account.bregister = 1;
#endif

    return 0;
}

static void __exit testdiomonitor_exit(void)
{
    clear_kern_tracepoint(&mykern_sched_switch);
    clear_kern_tracepoint(&mykern_sched_waking);
#ifdef IODELAY_TRACEPOINT_ENABLE
    clear_kern_tracepoint(&mykern_iodelay_account);
#endif
    tracepoint_synchronize_unregister();
    exit_write_file();
    exit_testdiomonitor_sample_ringbuff();
}

module_init(testdiomonitor_init);
module_exit(testdiomonitor_exit);

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2321349.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

Android11-12-13 替换系统默认壁纸

替换默认壁纸&#xff0c;是客需中再普通不过的需求&#xff0c;这里整理作为笔记记录 文章目录 需求场景 关联资料需求实现拓展总结 需求 客制化客户壁纸需求&#xff0c;替换客户定制的壁纸。 场景 手机-平板相关产品&#xff0c;各种广告机、工控、消费级产品&#xff0c…

Buffer overFolw---Kryo序列化出现缓冲区溢出的问题解决

问题&#xff1a; 由于我的数据量太大&#xff0c;我设置批次为10000万&#xff0c;50w数据大概有400M左右&#xff0c;然后进行spark数据处理时候报错为org.apache.spark.SparkException:Kryo serialization failed:Buffer overFolw.Available:0,rquired 58900977,To …

【Linux】线程基础

&#x1f525;个人主页&#xff1a;Quitecoder &#x1f525;专栏&#xff1a;linux笔记仓 目录 01.背景知识02.线程概念简单使用线程线程调度成本更低 01.背景知识 OS进行内存管理&#xff0c;不是以字节为单位的&#xff0c;而是以内存块为单位的&#xff0c;默认大小为4kb&…

「宇树科技」13家核心零部件供应商梳理!

2025年2月6日&#xff0c;摩根士丹利&#xff08;Morgan Stanley&#xff09;发布最新人形机器人研报&#xff1a;Humanoid 100: Mapping the Humanoid Robot Value Chain&#xff08;人形机器人100&#xff1a;全球人形机器人产业链梳理&#xff09;。 2025年2月20日&#xf…

Spring Boot 项目打包运行

打包成jar包&#xff0c;执行 java -jar 包名&#xff1b; 保证打出的jar包是独立可运行的包&#xff0c;需要xml中添加插件配置 <!-- SpringBoot应用打包插件--> <build><plugins><plugin><groupId>org.springframework.boot</groupId&…

数据结构八股

线性数据结构 数组:数组的内存空间是连续的&#xff0c;随机访问的时间复杂度是01&#xff0c;适用于需要按索引访问元素的场景&#xff0c;但是插入和删除元素较慢&#xff0c;时间复杂度是On链表:链表是由节点组成&#xff0c;节点之间是分散存储的&#xff0c;内存不连续&a…

7.2 控件和组件

版权声明&#xff1a;本文为博主原创文章&#xff0c;转载请在显著位置标明本文出处以及作者网名&#xff0c;未经作者允许不得用于商业目的 C#工具箱位于编辑窗口的左侧&#xff0c;它默认内置了大量的控件和组件。控件一般派生于System.Windows.Forms.Control类&#xff0c;显…

transform

http://zhihu.com/question/445556653/answer/3254012065 西科技的文章 视频讲解 小白也能听懂的 transformer模型原理详解 self- attention 多头注意力机制 encoder decoder 机器翻译_哔哩哔哩_bilibili

思库拉水厂开业庆典千人大会回顾

近日,思库拉离子水厂在广州隆重举办了开业盛典,现场汇聚了逾千名嘉宾。此次盛会不仅是对思库拉离子水厂正式投产的庆祝,更是对思库拉品牌未来蓝图的一次展示。 现场氛围热烈,洋溢着浓厚的喜庆气息。参与者来自五湖四海,既有思库拉的忠实拥趸,也有对思库拉产品充满兴趣的潜在消费…

JAVA学习*Object类

Object类 Object类是所有类的父类 类中有一些方法&#xff08;都需要掌握&#xff09; toString()方法 在学习类的对象的时候有介绍过了&#xff0c;当我们重新给此方法就会打印类与对象的信息 equals()方法 在Java中的比较&#xff0c; 如果左右两侧是基本类型变量&#…

基于python脚本实现的打砖块小游戏

目录 1. 打砖块游戏 2. 初始化 Pygame 和设置屏幕 3. 定义游戏对象 3.1 定义玩家操作的paddle 3.2 定义球&#xff08;Ball&#xff09; 3.3 砖块&#xff08;Bricks&#xff09; 4. 游戏主循环 4.1 事件处理 4.2 板子移动 4.3 球移动和碰撞检测 4.4 绘制游戏对象 …

地理信息系统(GIS)在智慧城市中的40个应用场景案例

在智慧城市发展进程中&#xff0c;地理信息系统&#xff08;GIS&#xff09;作为关键技术之一&#xff0c;正扮演着不可或缺的角色&#xff0c;堪称智慧城市的神经中枢。通过空间数据分析优化城市管理&#xff0c;GIS技术为智慧城市的构建提供了强大的支持。 本文分享了GIS在智…

XSS Game(DOM型) 靶场 通关

目录 靶场网址 Ma Spaghet! 分析 解题 Jefff 分析 解题 方法一 方法二 Ugandan Knuckles 分析 解题 Ricardo Milos 分析 解题 Ah Thats Hawt 分析 解题 方法一 方法二 Ligma 分析 解题 ​ Mafia 分析 解题 方法一&#xff1a;构造函数 方法二&#xf…

【大模型基础_毛玉仁】3.5 Prompt相关应用

目录 3.5 相关应用3.5.1 基于大语言模型的Agent3.5.2 数据合成3.5.3 Text-to-SQL3.5.4 GPTs 3.5 相关应用 Prompt工程应用广泛&#xff0c;能提升大语言模型处理基础及复杂任务的能力&#xff0c;在构建Agent、数据合成、Text-to-SQL转换和设计个性化GPTs等方面不可或缺。 . …

deepseek搭建本地私有知识库dify安装介绍docker compose图文教程

按照官方介绍&#xff0c;打开教程安装。下载源码&#xff0c; # 假设当前最新版本为 0.15.3 git clone https://github.com/langgenius/dify.git --branch 0.15.3 进入docker目录&#xff0c; cd dify/docker 网络科学的直接执行命令就可以了。 docker compose up -d 镜…

右击没有Word、PPT、Excel功能

右击没有Word、PPT、Excel功能 导航 文章目录 右击没有Word、PPT、Excel功能导航一、问题描述二、事情经过三、解决方案其他思路分享 一、问题描述 ​ 在安装并激活了office之后&#xff0c;业务反馈右击没有出现新建Word功能&#xff0c;仅有Word文档 二、事情经过 ​ 按道…

DeepSeek R1 本地部署指南 (6) - Windows 本地部署使用 GPU 运行

DeepSeek R1 本地部署指南 (1) - Windows 本地部署 上一篇&#xff0c;安装好 Windows 本地步骤后&#xff0c;如果发现在任务管理器中 GPU 显示 0%。 1.在命令行中输入&#xff1a; ollama ps 显示&#xff1a; PROCESSOR CPU 2.安装 CUDA Toolkit CUDA Toolkit Downloads htt…

婚姻的解构与重构 | 一场关于选择与责任的探索

注&#xff1a;本文为 “婚姻的解构与重构” 相关文章合辑。 未整理。 明明渴望爱情 为何反感催婚&#xff1f; 原创 常 晋 人民日报评论 2024 年 04 月 22 日 12:29 北京 没有催促指责&#xff0c;也毫无批评之意。面对单身、失业的 30 岁女儿&#xff0c;只是鼓励孩子&…

jangow靶机攻略

配置网卡 VMware需要配置&#xff0c;不配置扫不到ip,VirtualBox正常打开ip会直接显示出来 网卡配置都改成NAT 打开虚拟机&#xff0c;第一个框选第二行&#xff0c;回车 选第二个&#xff0c;按e键 进入下一个框后&#xff0c;将ro 后面的修改为 rw signin init/bin/bash 按…

自动化测试框架维护成本高怎么办

自动化测试框架维护成本高&#xff0c;可以通过优化测试用例设计、引入持续集成&#xff08;CI&#xff09;策略、强化代码规范和审查机制、建立明确的维护计划、定期进行技术债务清理等方式来降低成本。 其中&#xff0c;优化测试用例设计尤其关键&#xff0c;它不仅能提高测试…