A quick tip for reading messy spreadsheet (CSV) data in Python

2025/2/26 3:46:18

About the CSV format

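Comma-separated values (CSV; sometimes called character-separated values, because the delimiter need not be a comma) is a file format that stores tabular data (numbers and text) as plain text. "CSV" is not a single, well-defined format, although RFC 4180 gives a commonly used definition.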
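To make the point about delimiters concrete, here is a minimal sketch. The two sample strings are made up for illustration and are not taken from the file discussed in this article. It reads the same records once with a comma and once with a semicolon as the separator, using the `delimiter` parameter of `csv.reader`.

```python
import csv
import io

# Hypothetical sample data: the same two records with different delimiters.
comma_text = 'id,name\n1,"Smith, John"\n'
semicolon_text = 'id;name\n1;"Smith, John"\n'

# The default dialect ("excel") splits on commas.
comma_rows = list(csv.reader(io.StringIO(comma_text)))

# Any other single-character delimiter is passed explicitly.
semicolon_rows = list(csv.reader(io.StringIO(semicolon_text), delimiter=";"))

print(comma_rows)
print(semicolon_rows)
```

Both calls produce the same rows. In both cases the comma inside the quoted name is kept as part of the field rather than treated as a separator.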

A function defined in Python's csv module:

csv.reader(csvfile, dialect='excel', **fmtparams)

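Returns a reader object that iterates over the lines of the CSV file. The dialect parameter selects a set of parameters specific to a particular CSV dialect; it may be an instance of a subclass of the Dialect class or one of the strings returned by list_dialects(). Each row read from the CSV file is returned as a list of strings. No automatic data type conversion is performed unless the QUOTE_NONNUMERIC format option is specified, in which case unquoted fields are converted to floats.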
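The behaviour described above can be checked with a short sketch. The one-line sample string is made up for illustration. It compares the default reader with one created with `quoting=csv.QUOTE_NONNUMERIC`, and lists the registered dialect names.

```python
import csv
import io

# Hypothetical one-row sample: one quoted field and two unquoted numbers.
text = '"95号",8.21,58.83\n'

# Default behaviour: every field comes back as a string.
default_row = next(csv.reader(io.StringIO(text)))

# With QUOTE_NONNUMERIC, unquoted fields are converted to float.
numeric_row = next(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC))

print(default_row)          # all strings
print(numeric_row)          # quoted field stays a string, the rest are floats
print(csv.list_dialects())  # names accepted by the dialect parameter
```

Note that in the sample file shown later in this article every field is quoted, so QUOTE_NONNUMERIC would leave all of them as strings. Numeric columns there would have to be converted after reading.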

The CSV file to process

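This file comes from an external interface. Perhaps because the software that produces it is quite old, or for some other reason, its layout is rather messy: values are scattered across inconsistent columns.
Sample data: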

"","","","","","","","","","","","","","","","","油品销售明细表","","","","","","","","","","","",""
"","","加油站名称:","","","","广州*********加油站        ","","","","","","","","","","","","","","","","","","","","","",""
"","从:","","2021-10-01 00:00:00","","","","","","","到:","2022-10-01 23:59:59","","","","","","","","","","","","","","","","",""
"","流水号","","","","","","交易时间","","","","","油枪号码","","油品名称","","","油品单价","","体积","","交易金额","","起泵码","","止泵码","","","备注"
"","","","","479171","","","","2021-10-23 16:21:00","","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","58.83","","479.49","","380693.19","","","380752.02",""
"","","","","","259635","","","","2021-10-23 16:32:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","60.90","","493.00","","380752.02","","","380812.92",""
"","","","","","259636","","","","2021-10-23 16:34:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","56.19","","446.95","","380812.92","","","380869.11",""
"","","","","","479251","","","","2021-10-23 18:30:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","70.76","","573.94","","380869.11","","","380939.87",""
"","","","","","86765","","","","2021-10-23 18:35:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","44.03","","361.49","","380939.87","","","380983.90",""
"","","","","479289","","","","","2021-10-23 20:11:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","6.09","","50.00","","380983.90","","","380989.99",""
"","","","","","86775","","","","2021-10-23 20:30:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","49.13","","393.53","","380989.99","","","381039.12",""
"","","","","479309","","","","","2021-10-23 21:23:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","24.36","","200.00","","381039.12","","","381063.48",""
"","","","","","479413","","","","2021-10-24 03:29:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","33.47","","271.29","","381063.48","","","381096.95",""
"打印时间:2022-10-28","","","","","","","","","","","","","","","","","","","","","","","","","","填表人:","",""
"","流水号","","","交易时间","","油枪号码","","油品名称","","油品单价","","体积","","交易金额","","起泵码","","止泵码","","","备注"
"","","","86814","","2021-10-24 09:47:00","","1","","95号 车用汽油(ⅥA)","","8.21","","52.29","","429.30","","381157.85","","","381210.14",""
"","","259822","","","2021-10-24 09:59:00","","1","","95号 车用汽油(ⅥA)","","8.21","","46.09","","374.90","","381210.14","","","381256.23",""

Parsing the data with Python's csv module

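Method 1 is to parse the file cell by cell and row by row with custom handling, as in my earlier article on XLS data, 《Python按单元格读取复杂电子表格(Excel)数据实践》. That approach is quite tedious, and once I found the second method I dropped it without hesitation.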

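Method 2 is to extract only the valid data. Because no record in this CSV file spans more than one row, we can strip the empty cells from each row and take what remains directly. The code is very simple: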

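import csv
import pandas as pd

# Detail rows that pass the filter below are collected here
dat_row = []
# Open the file for reading (newline="" is what the csv module recommends;
# pass encoding=... if the file is not in the platform's default encoding)
with open("油品销售明细202110-202210.CSV", mode="r", newline="") as f:
    # Create a csv.reader instance on the open file
    reader = csv.reader(f)

    # Walk the file row by row
    for row in reader:
        # Drop the empty cells of this row
        dat_col = [v for v in row if len(v)>0]

        # Detail rows have exactly 9 non-empty cells;
        # title, header and footer rows have a different count
        if len(dat_col)==9:
            dat_row.append(dat_col)

cols_list = ['流水号', '交易时间', '油枪号码', '油品名称', '油品单价', '体积', '交易金额', '起泵码', '止泵码']
df = pd.DataFrame(dat_row,columns=cols_list)
df.to_csv('detail.csv',encoding='utf_8_sig',index=False)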

Note: the line "dat_col = [v for v in row if len(v)>0]" filters out, row by row, the cells that contain no data.
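To see why the 9-cell test singles out the detail rows, the following sketch applies the same filter to three rows copied from the sample data above (a header row, a detail row and the footer row), read from an in-memory string instead of the file. The helper names `non_empty` and `is_detail` and the extra timestamp check are assumptions added for illustration; they are not part of the original code.

```python
import csv
import io
from datetime import datetime

# Three rows copied from the sample data: header, one detail row, footer.
SAMPLE = '''"","流水号","","","","","","交易时间","","","","","油枪号码","","油品名称","","","油品单价","","体积","","交易金额","","起泵码","","止泵码","","","备注"
"","","","","479171","","","","2021-10-23 16:21:00","","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","58.83","","479.49","","380693.19","","","380752.02",""
"打印时间:2022-10-28","","","","","","","","","","","","","","","","","","","","","","","","","","填表人:","",""
'''


def non_empty(row):
    """Return only the cells of a row that contain data."""
    return [v for v in row if len(v) > 0]


def is_detail(cells):
    """Hypothetical stricter check: 9 cells and a parseable timestamp in the second one."""
    if len(cells) != 9:
        return False
    try:
        datetime.strptime(cells[1], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False
    return True


rows = [non_empty(r) for r in csv.reader(io.StringIO(SAMPLE))]
counts = [len(cells) for cells in rows]
details = [cells for cells in rows if is_detail(cells)]

print(counts)
print(details)
```

The header row keeps 10 cells because it includes the 备注 (remarks) column, the footer keeps 2, and only the detail row keeps 9. One limitation follows from this: a detail row with a non-empty remark would also have 10 cells and would be skipped by a count-only test, so the count may need adjusting if such rows occur in the real file.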

Summary

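For data files without merged cells (here, meaning that no record spans multiple rows), parsing is quite simple once you pick a suitable method. I really like simple methods!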

References:

快乐江小鱼. 《Python基础 - csv文件格式》. CSDN blog. 2022.08
肖永威. 《Python按单元格读取复杂电子表格(Excel)数据实践》. CSDN blog. 2022.11
