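Contents
PS: read the previous post, the awk introduction, before this one
I. The file
II. First method
1. Steps
III. Second method
awk script
Result
IV. Second requirement
Counting unique IPs
Steps
1. Create a file with the following test data
2. Write the awk script
3. Before the files are generated
4. After the files are generated
5. Check
V. Third requirement
Handling records with missing fields
Data
1. The problem
2. An odd attempt: rebuilding the record (does not work)
3. Tip: keep the blank columns when printing
4. How a field that has a value prints
5. Solution
Summary: a comma inside a quoted field is no longer a separator, so the fields print correctly
VI. Fourth requirement
Filtering logs within a given time range
The problem
Background concepts
Worked example
File contents
awk script
Output
Explanation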
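PS: Read the previous post, the introduction to awk, before this one.
I. The file
Pick an Apache access log from your own machine as the example. I chose a fairly large one: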
127.0.0.1 - - [30/Jul/2023:08:34:54 +0800] "GET /less02/index.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:34:54 +0800] "GET /favicon.ico HTTP/1.1" 404 2659
127.0.0.1 - - [30/Jul/2023:08:36:05 +0800] "GET /less02/index.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:36:55 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:36:55 +0800] "GET /less02/js.js HTTP/1.1" 200 211
127.0.0.1 - - [30/Jul/2023:08:37:55 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:38:17 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:17 +0800] "GET /less02/js.js HTTP/1.1" 200 226
127.0.0.1 - - [30/Jul/2023:08:38:20 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:20 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:21 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:21 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:21 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:21 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:35 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:35 +0800] "GET /less02/js.js HTTP/1.1" 200 226
127.0.0.1 - - [30/Jul/2023:08:38:36 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:36 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:36 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:36 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:36 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:37 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:38 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:38 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:39 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:59 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:38:59 +0800] "GET /less02/js.js HTTP/1.1" 200 249
127.0.0.1 - - [30/Jul/2023:08:39:59 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:42:20 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:42:20 +0800] "GET /less02/js.js HTTP/1.1" 200 178
127.0.0.1 - - [30/Jul/2023:08:43:20 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:44:50 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:44:50 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:45:50 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:50:04 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:50:04 +0800] "GET /less02/js.js HTTP/1.1" 200 271
127.0.0.1 - - [30/Jul/2023:08:50:08 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:50:08 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:51:04 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:58:41 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:58:41 +0800] "GET /less02/js.js HTTP/1.1" 200 472
127.0.0.1 - - [30/Jul/2023:08:58:47 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:58:47 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:58:48 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:58:48 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:58:48 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:59:47 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:14:40:28 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:40:28 +0800] "GET /less02/js.js HTTP/1.1" 200 180
127.0.0.1 - - [30/Jul/2023:14:40:28 +0800] "GET /favicon.ico HTTP/1.1" 404 2659
127.0.0.1 - - [30/Jul/2023:14:40:53 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:40:53 +0800] "GET /less02/js.js HTTP/1.1" 200 180
127.0.0.1 - - [30/Jul/2023:14:40:54 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:40:54 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:39 +0800] "GET /less02/js.js HTTP/1.1" 200 180
127.0.0.1 - - [30/Jul/2023:14:41:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:39 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:40 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:40 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:41:40 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:42:39 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:14:42:51 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:42:51 +0800] "GET /less02/js.js HTTP/1.1" 200 189
127.0.0.1 - - [30/Jul/2023:14:43:51 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:14:44:03 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:44:03 +0800] "GET /less02/js.js HTTP/1.1" 200 231
127.0.0.1 - - [30/Jul/2023:14:45:03 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:14:48:51 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:48:51 +0800] "GET /less02/js.js HTTP/1.1" 200 253
127.0.0.1 - - [30/Jul/2023:14:48:52 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:48:52 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:49:51 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:14:52:27 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:14:52:27 +0800] "GET /less02/js.js HTTP/1.1" 200 281
127.0.0.1 - - [30/Jul/2023:21:56:45 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:56:45 +0800] "GET /less02/js.js HTTP/1.1" 200 36
127.0.0.1 - - [30/Jul/2023:21:57:15 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:15 +0800] "GET /less02/js.js HTTP/1.1" 200 34
127.0.0.1 - - [30/Jul/2023:21:57:36 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:36 +0800] "GET /less02/js.js HTTP/1.1" 200 33
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:38 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:57 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:57 +0800] "GET /less02/js.js HTTP/1.1" 200 31
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:58 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:57:59 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:00 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:01 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:02 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:03 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:03 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:58:57 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:21:59:25 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:25 +0800] "GET /less02/js.js HTTP/1.1" 200 32
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:26 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:27 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:27 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:27 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:27 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:27 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:45 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:45 +0800] "GET /less02/js.js HTTP/1.1" 200 32
127.0.0.1 - - [30/Jul/2023:21:59:46 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:21:59:46 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:00:34 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:00:34 +0800] "GET /less02/js.js HTTP/1.1" 200 33
127.0.0.1 - - [30/Jul/2023:22:00:34 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:00:34 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:00:37 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:00:37 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:01:34 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:22:05:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:05:39 +0800] "GET /less02/js.js HTTP/1.1" 200 51
127.0.0.1 - - [30/Jul/2023:22:05:39 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:05:39 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:05:40 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:05:41 +0800] "GET /less02/js.js HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:05:41 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:22:06:39 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:23:35:11 +0800] "GET /less02/tools HTTP/1.1" 404 2659
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/ HTTP/1.1" 200 56719
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/assets/main.css HTTP/1.1" 200 626596
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/images/file-32x32.png HTTP/1.1" 200 1946
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/images/file-128x128.png HTTP/1.1" 200 19378
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/images/cook_male-32x32.png HTTP/1.1" 200 1624
127.0.0.1 - - [30/Jul/2023:23:35:27 +0800] "GET /tools/assets/main.js HTTP/1.1" 200 4237575
127.0.0.1 - - [30/Jul/2023:23:41:08 +0800] "GET /less/index.html HTTP/1.1" 404 2659
127.0.0.1 - - [30/Jul/2023:23:41:20 +0800] "GET /less01/index.html HTTP/1.1" 200 790
127.0.0.1 - - [30/Jul/2023:23:41:20 +0800] "GET /less01/css/style.css HTTP/1.1" 200 55
127.0.0.1 - - [30/Jul/2023:23:41:34 +0800] "GET /less02/index.html HTTP/1.1" 200 323
127.0.0.1 - - [30/Jul/2023:23:42:34 +0800] "-" 408 -
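II. First method
Count how many requests from each IP were answered with status code 304.
1. Steps
First, a quick test to confirm that the client IP and the status code print correctly:
awk '{print $1, $9}' access.log
Lines such as "-" 408 - carry no request string, so on those lines the status code sits in $7 and $9 is empty.
Next, count the requests per IP for that status code.
Note: my local Apache is only used for testing, so every request comes from localhost. A real log would show many different client IPs.
An IP that appears an unusually large number of times may be running a brute-force attack or a scan, and is a candidate for blocking.
Using my own log as the example:
awk '$9==304{arr[$1]++}END{for(i in arr){print arr[i],i}}' access.log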
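The order of a for (i in arr) loop is unspecified, so once the log holds more than one client the counts come out in arbitrary order. A minimal sketch, assuming a POSIX shell with sort available, pipes the result through sort -rn so the busiest IP comes first. The function name count_304 and the 192.0.2.10 address in the sample data are made up for illustration.

```shell
# count_304 FILE: print "count ip" for requests answered with 304, busiest first.
count_304() {
    awk '$9 == 304 { hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -rn
}

# Sample data: the 192.0.2.10 line is invented; the rest follow the log above.
log=$(mktemp)
cat > "$log" <<'EOF'
127.0.0.1 - - [30/Jul/2023:08:34:54 +0800] "GET /less02/index.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:34:54 +0800] "GET /favicon.ico HTTP/1.1" 404 2659
192.0.2.10 - - [30/Jul/2023:08:36:05 +0800] "GET /less02/index.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:36:55 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:37:55 +0800] "-" 408 -
EOF

count_304 "$log"
```

With this sample the function prints 2 127.0.0.1 followed by 1 192.0.2.10.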
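III. Second method
awk script:
$9 != 200 {
    arr[$1]++
}
END {
    # gawk only: walk arr by value, numerically, largest first
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (i in arr) {
        if (cnt++ == 10) {
            break
        }
        print arr[i], i
    }
}
Result:
Handling an IP that hits the server too often: block it automatically.
How to do it:
Count the requests per IP whose status code is not 200, and take the 10 IPs with the highest counts.
Sorting functions in gawk: asort() and asorti().
The traversal order of for (i in arr) is set through PROCINFO["sorted_in"].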
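PROCINFO["sorted_in"] exists only in gawk. As a rough portable alternative, the sketch below lets awk do the counting and hands the ordering to sort and head. It reads the status code from $(NF-1), the second-to-last field, which in this log format holds the status even on the short "-" 408 - lines. That choice, the function name top_non200, and the 192.0.2.10 sample line are assumptions of this sketch, not part of the original script.

```shell
# top_non200 FILE [N]: the N (default 10) IPs with the most non-200 responses.
top_non200() {
    awk '$(NF-1) != 200 { hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -rn | head -n "${2:-10}"
}

# Sample data: the 192.0.2.10 line is invented for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
127.0.0.1 - - [30/Jul/2023:08:36:55 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:36:55 +0800] "GET /less02/js.js HTTP/1.1" 200 211
127.0.0.1 - - [30/Jul/2023:08:37:55 +0800] "-" 408 -
192.0.2.10 - - [30/Jul/2023:23:35:11 +0800] "GET /less02/tools HTTP/1.1" 404 2659
EOF

top_non200 "$log"
```

Here 127.0.0.1 is counted twice (the 304 and the 408) and 192.0.2.10 once; the 200 line is ignored.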
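IV. Second requirement
Counting unique IPs
Requirement: for each URL, count how many distinct IPs accessed it (duplicates removed), and save a separate file for each URL.
Steps:
1. Create a file with the following test data: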
a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.23|2015-11-20 20:34:48|guest
c.com.cn|202.109.134.24|2015-11-20 20:34:48|guest
a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
a.com.cn|202.109.134.24|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.25|2015-11-20 20:34:48|guest
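2. Write the awk script:
BEGIN {
    FS = "|"
}
# Runs only the first time a given URL/IP pair is seen
!arr[$1,$2]++ {
    arr1[$1]++
}
END {
    for (i in arr1) {
        # arr1[i] is the number of distinct IPs for URL i
        print i, arr1[i] > (i ".txt")
    }
}
3. Before the files are generated:
4. After the files are generated:
5. Check: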
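One way to check the result is to run the script against the test data in a scratch directory and print the generated files. The sketch below does that; the temporary directory, the input file name urls.dat, and the dir variable passed to awk are all choices made for this illustration.

```shell
# Work in a scratch directory so the generated *.txt files are easy to inspect.
workdir=$(mktemp -d)
cat > "$workdir/urls.dat" <<'EOF'
a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.23|2015-11-20 20:34:48|guest
c.com.cn|202.109.134.24|2015-11-20 20:34:48|guest
a.com.cn|202.109.134.23|2015-11-20 20:34:43|guest
a.com.cn|202.109.134.24|2015-11-20 20:34:43|guest
b.com.cn|202.109.134.25|2015-11-20 20:34:48|guest
EOF

awk -F'|' -v dir="$workdir" '
!seen[$1, $2]++ { uniq[$1]++ }    # count each URL/IP pair only once
END {
    for (url in uniq) {
        out = dir "/" url ".txt"
        print url, uniq[url] > out
        close(out)                # keeps the number of open files low with many URLs
    }
}' "$workdir/urls.dat"

# One file per URL, each holding "url distinct_ip_count".
cat "$workdir"/*.com.cn.txt
```

For the six test lines this gives a.com.cn 2, b.com.cn 2, and c.com.cn 1, since the repeated a.com.cn/202.109.134.23 pair is counted once.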
V. Third requirement
Handling records with missing fields
Data:
ID name gender age email phone
1 Bob male 28 abc@qq.com 18023394012
2 Alice female 24 def@gmail.com 18084925203
3 Tony male 21 17048792503
4 Kevin male 21 bbb@189.com 17023929033
5 Alex male 18 ccc@xyz.com 18185904230
6 Andy female ddd@139.com 18923902352
7 Jerry female 25 exdsa@189.com 18785234906
8 Peter male 20 bax@qq.com 17729348758
9 Steven 23 bc@sohu.com 15947893212
10 Bruce female 27 bcbd@139.com 13942943905
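1. The problem:
When a field is missing, awk's default whitespace splitting shifts the remaining fields to the left, so the wrong column is printed.
2. An odd attempt: rebuilding the record (does not work)
3. Tip: keep the blank columns when printing
awk '{print $0}' FIELDWIDTHS="2 2:6 2:6 2:3 2:13 2:11" a.txt
FIELDWIDTHS (a gawk extension) splits each record by character width. The first field, ID, is 2 characters wide, so its width is given as 2.
The second field is at most 6 characters wide, but two spaces separate it from the ID, so you can either give it a width of 8 or skip two characters first with 2:6. The skip:width form requires gawk 4.2 or later.
Line 4 of the file is the row for Tony (the header is line 1), whose email is missing, so the following command prints a blank field instead of the phone number:
awk 'NR==4{print $5}' FIELDWIDTHS="2 2:6 2:6 2:3 2:13 2:11" a.txt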
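Where FIELDWIDTHS is not available (it is specific to gawk), the same fixed columns can be cut out with substr(). The sketch below is only an approximation of the idea: the column positions (name at columns 5 to 10, email at columns 26 to 38) are derived from the width list 2 2:6 2:6 2:3 2:13 2:11, the sample rows are regenerated with printf because the exact spacing of a.txt is an assumption, and the function name print_name_email and the (none) placeholder are invented here.

```shell
# Rebuild three fixed-width rows using the layout implied by the width list:
# 2-char ID, 6-char name, 6-char gender, 3-char age, 13-char email, 11-char phone,
# each separated by two spaces.
fmt='%-2s  %-6s  %-6s  %-3s  %-13s  %-11s\n'
data=$(mktemp)
{
    printf "$fmt" 1 Bob  male   28 abc@qq.com  18023394012
    printf "$fmt" 3 Tony male   21 ""          17048792503
    printf "$fmt" 6 Andy female "" ddd@139.com 18923902352
} > "$data"

# print_name_email FILE: cut name and email by column position, not by whitespace.
print_name_email() {
    awk '{
        name  = substr($0, 5, 6);   sub(/ +$/, "", name)    # trim right padding
        email = substr($0, 26, 13); sub(/ +$/, "", email)
        if (email == "") email = "(none)"                  # field was blank
        print name, email
    }' "$1"
}

print_name_email "$data"
```

Because the columns are addressed by position, the row with the missing email still reports the right value for every other field.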
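4. How a field that has a value prints:
5. Solution:
FPAT (also a gawk extension) collects the text matched by a regular expression and stores each match in a field. grep highlights the parts of a line that match; splitting with FPAT instead saves the matched parts in $1, $2, $3, and so on.
Summary: a comma inside a quoted field is no longer treated as a separator, so the fields print correctly.
awk 'BEGIN{FPAT="([^,]+)|(\"[^\"]+\")"}{print $1, $3}' demo2.txt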
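The contents of demo2.txt are not shown, so here is a small stand-in to try FPAT on. This is a sketch under assumptions: the sample line is hypothetical CSV with one quoted field that contains a comma, the function name first_and_third is invented, and the pattern used is the common one that treats a field as either a run of non-comma characters or a double-quoted string. It calls gawk explicitly because FPAT does not exist in other awk implementations.

```shell
# Hypothetical CSV line: the third field is quoted and contains a comma.
demo=$(mktemp)
cat > "$demo" <<'EOF'
Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA
EOF

# first_and_third FILE: print fields 1 and 3, where a field is whatever FPAT matches.
first_and_third() {
    gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $1, $3 }' "$1"
}

first_and_third "$demo"
```

The quoted address comes back as one field, quotes included. Note that this pattern does not match empty fields, so a line with two adjacent commas would need a different pattern.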
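VI. Fourth requirement
Filtering logs within a given time range
The problem:
Filtering logs with regular expressions in grep, sed, or awk becomes very hard once the range has to be precise to the hour, minute, or second.
gawk, however, provides the mktime() function, which converts a time specification into an epoch value (seconds since 1970-01-01 00:00:00 UTC).
So you can take the timestamp string from each log line, extract the year, month, day, hour, minute, and second, and pass them to mktime() to build the corresponding epoch value. Epoch values are plain numbers, so they can be compared directly to decide which time is earlier or later.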
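Applied to the Apache log from section I, the idea could look like the sketch below. It assumes an awk that provides mktime() (gawk does), takes the two bounds in the "YYYY MM DD HH MM SS" layout that mktime() expects, and ignores the +0800 offset by interpreting both the log times and the bounds in the machine's local time zone. The function name filter_by_time and its argument order are invented for this example.

```shell
# filter_by_time FROM TO FILE: print log lines with FROM <= timestamp <= TO.
filter_by_time() {
    awk -v from="$1" -v to="$2" '
    BEGIN {
        # Map Apache month abbreviations to month numbers.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) mon[names[i]] = i
        start = mktime(from)
        stop  = mktime(to)
    }
    {
        stamp = substr($4, 2)        # "[30/Jul/2023:08:34:54" without the "["
        split(stamp, t, "[/:]")      # day, month name, year, hour, minute, second
        ts = mktime(t[3] " " mon[t[2]] " " t[1] " " t[4] " " t[5] " " t[6])
        if (ts >= start && ts <= stop) print
    }' "$3"
}

# The first seven lines of the log shown in section I.
log=$(mktemp)
cat > "$log" <<'EOF'
127.0.0.1 - - [30/Jul/2023:08:34:54 +0800] "GET /less02/index.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:34:54 +0800] "GET /favicon.ico HTTP/1.1" 404 2659
127.0.0.1 - - [30/Jul/2023:08:36:05 +0800] "GET /less02/index.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:36:55 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
127.0.0.1 - - [30/Jul/2023:08:36:55 +0800] "GET /less02/js.js HTTP/1.1" 200 211
127.0.0.1 - - [30/Jul/2023:08:37:55 +0800] "-" 408 -
127.0.0.1 - - [30/Jul/2023:08:38:17 +0800] "GET /less02/index1.html HTTP/1.1" 304 -
EOF

filter_by_time "2023 07 30 08 36 00" "2023 07 30 08 38 00" "$log"
```

Only the four lines stamped between 08:36:00 and 08:38:00 are printed; the two 08:34:54 lines and the 08:38:17 line fall outside the range.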
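Background concepts
In awk, timestamps usually come up when the text being processed contains time information. awk is a language for text processing and data extraction that uses patterns and actions to work on the lines and fields of a text file.
gawk has a built-in function, systime(), that returns the current Unix timestamp: the number of seconds from the epoch (1970-01-01 00:00:00 UTC) to now. It is useful for timestamp-related work.
Worked example:
File contents: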
John 2023-08-01 15:30:45
Alice 2023-08-02 12:45:00
Bob 2023-08-03 09:15:30
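awk script:
awk '{
    # Build a GNU date command that converts "YYYY-MM-DD HH:MM:SS" to epoch seconds
    cmd = "date -d \"" $2 " " $3 "\" +%s"
    # Read the first line of the command output into timestamp
    if ((cmd | getline timestamp) > 0)
        print $1, timestamp
    close(cmd)
}' data.txt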
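Output:
Explanation:
$2 and $3 are the second and third fields of the input line, that is, the date and the time. The date -d command (GNU date) converts them to a Unix timestamp, and the +%s format prints the result as seconds since the epoch. The cmd | getline timestamp form runs the external command and reads one line of its output into the variable timestamp; print then outputs the name and the corresponding Unix timestamp.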
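The same conversion can also be done inside awk with mktime(), which avoids starting one date process per input line. The following is a minimal sketch, assuming an awk that provides mktime() (gawk does). TZ=UTC is set only so that the result does not depend on the machine's time zone, and the function name to_epoch is hypothetical.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
John 2023-08-01 15:30:45
Alice 2023-08-02 12:45:00
Bob 2023-08-03 09:15:30
EOF

# to_epoch FILE: print "name epoch_seconds" for lines of "name YYYY-MM-DD HH:MM:SS".
to_epoch() {
    TZ=UTC awk '{
        spec = $2 " " $3            # e.g. "2023-08-01 15:30:45"
        gsub(/[-:]/, " ", spec)     # mktime() expects "YYYY MM DD HH MM SS"
        print $1, mktime(spec)
    }' "$1"
}

to_epoch "$data"
```

Since the output is plain numbers, two rows can be compared or subtracted directly; for example, the Alice and John timestamps differ by 76455 seconds.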