Data Analysis With Python

news2024/11/15 10:52:40

文章目录

  • Data Analysis With Python
    • Analyzing Numerical Data with NumPy
    • Creating NumPy Array
    • NumPy Array Slicing
    • NumPy Array Broadcasting
    • Analyzing Data Using Pandas

In this article, we will discuss how to do data analysis with Python. We will discuss all sorts of data analysis i.e. analyzing numerical data with NumPy, Tabular data with Pandas, data visualization Matplotlib, and Exploratory data analysis.

Data Analysis With Python

Data Analysis is the technique of collecting, transforming, and organizing data to make future predictions and informed data-driven decisions. It also helps to find possible solutions for a business problem. There are six steps for Data Analysis. They are:

  • Ask or Specify Data Requirements
  • Prepare or Collect Data
  • Clean and Process
  • Analyze
  • Share
  • Act or Report

Analyzing Numerical Data with NumPy

NumPy is an array processing package in Python and provides a high-performance multidimensional array object and tools for working with these arrays. It is the fundamental package for scientific computing with Python.

Creating NumPy Array

NumPy arrays can be created in multiple ways, with various ranks. It can also be created with the use of different data types like lists, tuples, etc. The type of the resultant array is deduced from the type of elements in the sequences. NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.

Create Array using numpy.empty(shape, dtype=float, order=’C’)


import numpy as np
  
b = np.empty(2, dtype = int)
print("Matrix b : \n", b)
  
a = np.empty([2, 2], dtype = int)
print("\nMatrix a : \n", a)
  
c = np.empty([3, 3])
print("\nMatrix c : \n", c)

NumPy Array Slicing

Consider the syntax x[obj] where x is the array and obj is the index. The slice object is the index in the case of basic slicing. Basic slicing occurs when obj is :

a slice object that is of the form start: stop: step

  • an integer
  • or a tuple of slice objects and integers
  • All arrays generated by basic slicing are always the view in the original array.
# Python program for basic slicing.
import numpy as np
 
# Arrange elements from 0 to 19
a = np.arrange(20)
print("\n Array is:\n ",a)
 
# a[start:stop:step]
print("\n a[-8:17:1] = ",a[-8:17:1])
 
# The : operator means all elements till the end.
print("\n a[10:] = ",a[10:])

Ellipsis can also be used along with basic slicing. Ellipsis (…) is the number of : objects needed to make a selection tuple of the same length as the dimensions of the array.

# Python program for indexing using basic slicing with ellipsis
import numpy as np
 
# A 3 dimensional array.
b = np.array([[[1, 2, 3],[4, 5, 6]],
            [[7, 8, 9],[10, 11, 12]]])
 
print(b[...,1]) #Equivalent to b[: ,: ,1 ]

NumPy Array Broadcasting

The term broadcasting refers to how numpy treats arrays with different Dimensions during arithmetic operations which lead to certain constraints, the smaller array is broadcast across the larger array so that they have compatible shapes.

Let’s assume that we have a large data set, each datum is a list of parameters. In Numpy we have a 2-D array, where each row is a datum and the number of rows is the size of the data set. Suppose we want to apply some sort of scaling to all these data every parameter gets its own scaling factor or say Every parameter is multiplied by some factor.

Just to have a clear understanding, let’s count calories in foods using a macro-nutrient breakdown. Roughly put, the caloric parts of food are made of fats (9 calories per gram), protein (4 CPG), and carbs (4 CPG). So if we list some foods (our data), and for each food list its macro-nutrient breakdown (parameters), we can then multiply each nutrient by its caloric value (apply scaling) to compute the caloric breakdown of every food item.请添加图片描述
With this transformation, we can now compute all kinds of useful information. For example, what is the total number of calories present in some food or, given a breakdown of my dinner know how many calories did I get from protein and so on.

Let’s see a naive way of producing this computation with Numpy:

import numpy as np
 
macros = np.array([
[0.8, 2.9, 3.9],
[52.4, 23.6, 36.5],
[55.2, 31.7, 23.9],
[14.4, 11, 4.9]
])
 
# Create a new array filled with zeros,
# of the same shape as macros.
result = np.zeros_like(macros)
 
cal_per_macro = np.array([3, 3, 8])
 
# Now multiply each row of macros by
# cal_per_macro. In Numpy, `*` is
# element-wise multiplication between two arrays.
for i in range(macros.shape[0]):
    result[i, :] = macros[i, :] * cal_per_macro
 
result

Output:

array([[ 2.4, 8.7, 31.2],
[157.2, 70.8, 292. ],
[165.6, 95.1, 191.2],
[ 43.2, 33. , 39.2]])

Broadcasting Rules: Broadcasting two arrays together follow these rules:

  • If the arrays don’t have the same rank then prepend the shape of the lower rank array with 1s until both shapes have the same length.
  • The two arrays are compatible in a dimension if they have the same size in the dimension or if one of the arrays has size 1 in that dimension.
  • The arrays can be broadcast together if they are compatible with all dimensions.
  • After broadcasting, each array behaves as if it had a shape equal to the element-wise maximum of shapes of the two input arrays.
  • In any dimension where one array had a size of 1 and the other array had a size greater than 1, the first array behaves as if it were copied along that dimension.

import numpy as np
 
v = np.array([12, 24, 36])
w = np.array([45, 55])
 
# To compute an outer product we first
# reshape v to a column vector of shape 3x1
# then broadcast it against w to yield an output
# of shape 3x2 which is the outer product of v and w
print(np.reshape(v, (3, 1)) * w)
 
X = np.array([[12, 22, 33], [45, 55, 66]])
 
# x has shape 2x3 and v has shape (3, )
# so they broadcast to 2x3,
print(X + v)
 
# Add a vector to each column of a matrix X has
# shape 2x3 and w has shape (2, ) If we transpose X
# then it has shape 3x2 and can be broadcast against w
# to yield a result of shape 3x2.
 
# Transposing this yields the final result
# of shape 2x3 which is the matrix.
print((X.T + w).T)
 
# Another solution is to reshape w to be a column
# vector of shape 2X1 we can then broadcast it
# directly against X to produce the same output.
print(X + np.reshape(w, (2, 1)))
 
# Multiply a matrix by a constant, X has shape 2x3.
# Numpy treats scalars as arrays of shape();
# these can be broadcast together to shape 2x3.
print(X * 2)

Note: For more information, refer to our Python NumPy Tutorial.

Analyzing Data Using Pandas

Python Pandas Is used for relational or labeled data and provides various data structures for manipulating such data and time series. This library is built on top of the NumPy library. This module is generally imported as:

import pandas as pd

Here, pd is referred to as an alias to the Pandas. However, it is not necessary to import the library using the alias, it just helps in writing less amount code every time a method or property is called. Pandas generally provide two data structures for manipulating data, They are:

  • Series
  • Dataframe

Series:

Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.). The axis labels are collectively called indexes. Pandas Series is nothing but a column in an excel sheet. Labels need not be unique but must be a hashable type. The object supports both integer and label-based indexing and provides a host of methods for performing operations involving the index.
在这里插入图片描述

It can be created using the Series() function by loading the dataset from the existing storage like SQL, Database, CSV Files, Excel Files, etc., or from data structures like lists, dictionaries, etc.

Python Pandas Creating Series


import pandas as pd
import numpy as np
 
 
# Creating empty series
ser = pd.Series()
 
print(ser)
 
# simple array
data = np.array(['g', 'e', 'e', 'k', 's'])
 
ser = pd.Series(data)
print(ser)

Dataframe:

Pandas DataFrame is a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. Pandas DataFrame consists of three principal components, the data, rows, and columns.
在这里插入图片描述
It can be created using the Dataframe() method and just like a series, it can also be from different file types and data structures.

Python Pandas Creating Dataframe

import pandas as pd
 
# Calling DataFrame constructor
df = pd.DataFrame()
print(df)
 
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
            'portal', 'for', 'Geeks']
 
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)

Creating Dataframe from CSV
We can create a dataframe from the CSV files using the read_csv() function.

Note: This dataset can be downloaded from here.

Python Pandas read CSV


import pandas as pd
 
# Reading the CSV file
df = pd.read_csv("Iris.csv")
 
# Printing top 5 rows
df.head()

Filtering DataFrame
Pandas dataframe.filter() function is used to Subset rows or columns of dataframe according to labels in the specified index. Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.

Python Pandas Filter Dataframe

import pandas as pd
 
# Reading the CSV file
df = pd.read_csv("Iris.csv")
 
# applying filter function
df.filter(["Species", "SepalLengthCm", "SepalLengthCm"]).head()

Sorting DataFrame
In order to sort the data frame in pandas, the function sort_values() is used. Pandas sort_values() can sort the data frame in Ascending or Descending order.

Python Pandas Sorting Dataframe in Ascending Order
在这里插入图片描述

Pandas GroupBy
Groupby is a pretty simple concept. We can create a grouping of categories and apply a function to the categories. In real data science projects, you’ll be dealing with large amounts of data and trying things over and over, so for efficiency, we use the Groupby concept. Groupby mainly refers to a process involving one or more of the following steps they are:

→More

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1143619.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

平衡二叉树AVL的插入删除

在AVL树的插入操作中,假设插入一个结点后,当前节点p的平衡因子是﹣2,其左子结点的平衡因子是+1,左子结点的右子结点的平衡因子是﹣1。如图所示,请给出票转调整之后的结构。

微机原理与接口技术-第八章常用接口技术

文章目录 定时控制接口8253/8254定时器定时器的应用 并行接口并行接口电路8255内部引脚工作方式工作方式0:基本输入输出方式工作方式1:选通输入输出方式 编程 并行接口的应用用8255方式0与打印机接口 数码管及其接口数码管的工作原理单个数码管的显示多个…

目录和文件操作

在自己电脑任一盘符中新建以OS_Test命名的文件夹,并在该文件夹中新建新建3个以.txt,3个 .xlsx为扩展名的文件(文件名由代码随机生成,长度为8,由字母数字组成)。,请写一个程序,删除掉…

Unity的unity_ObjectToWorld里的每一列分别代表什么意思?换个方向反向理解-更简单

官方关键UnityObjectToWorldNormal() 代码 从乐乐姐的书中得知,当我们在shader想获得法线,大概会这么些 o.wordDir UnityObjectToWorldNormal(i.normal) (这行代码就包含了官方对“unity_ObjectToWorld”的终极理解…

视频批量剪辑技巧:如何实现震撼的嵌套合并效果

随着视频制作需求的不断增长,视频批量剪辑技巧在提高制作效率和质量方面显得尤为重要。本文将介绍云炫AI智剪实现震撼嵌套合并效果的方法,帮助您在视频制作过程中更高效地完成任务。 视频批量剪辑技巧是一种利用计算机技术实现自动化视频剪辑的方法。通…

串口占用检测工具

串口占用检测工具 平时需要检测哪个程序占用了串口,下面介绍一款非常方便的工具,它的工具箱里包含一个串口占用检测工具,可以非常方便的检测出来哪个程序占用了串口,并给出程序名和PID。 官网下载地址:http://www.red…

2023枣庄麒瑞音乐嘉年华济南新闻发布会

2023枣庄麒瑞音乐嘉年华新闻发布会今日在济南市西元大厦隆重举行,演唱会主办方枣庄恒立城市发展投资有限公司副经理刘畅先生、枣庄麒瑞文化董事长孙振敏女士,演唱会冠名方滕州爱啤士精酿啤酒有限公司总经理周静女士等和国内及山东省近30家主流新闻媒体到…

命令模式——让程序舒畅执行

● 命令模式介绍 命令模式(Command Pattern),是行为型设计模式之一。命令模式相对于其他的设计模式来说并没有那么多条条框框,其实并不是一个很“规矩”的模式,不过,就是基于一点,命令模式相对于…

局域网内两台电脑共享文件夹(通过网线直连共享数据)

文章目录 2.设置共享文件夹3.访问共享文件夹 1.将两台电脑置于同一局域网下 用网线将两台电脑连接关闭两台电脑防火墙将两台电脑IP地址设置在同一局域网下 测试是否在同一局域网下,使用ping命令 ping 192.168.0.122.设置共享文件夹 选择想要共享的文件夹&#xff…

刷题学习记录

sql注入(bugkuctf) 打开显示一个登录框 照常用admin用户名登录,密码随便填一个,显示密码错误 接着用admin为用户名登录,密码照样随便填,结果显示用户名不存在 题目提示基于布尔的SQL盲注,猜测后端是判断用…

【torch高级】一种新型的概率学语言pyro(02/2)

前文链接:【torch高级】一种新型的概率学语言pyro(01/2) 七、Pyro 中的推理 7.1 背景:变分推理 引言中的每项计算(后验分布、边际似然和后验预测分布)都需要执行积分,而这通常是不可能的或计算…

静力触探数据智能预处理(4)

静力触探数据智能预处理(4) 前言 数据处理方式已由手工1.0、计算机辅助2.0向人工智能3.0的趋势发展。机器学习是人工智能的基础,本文尝试应用机器学习中K均值聚类算法对孔压静力触探数据进行土的分类,分类结果不理想&#xff0c…

buuctf_练[安洵杯 2019]easy_web

[安洵杯 2019]easy_web 文章目录 [安洵杯 2019]easy_web掌握知识解题思路代码分析正式解题 关键paylaod 掌握知识 url地址和源代码的信息捕捉;图片和base64之间转换;base64和十六进制编码的了解;代码审计,绕过正则匹配对关键字的…

简易但很实用的javaswing/gui音乐播放器

视频浏览地址 很实用的一个javaswing音乐播放器。可以展示歌名,上一曲下一曲。 源码下载地址 支持:远程部署/安装/调试、讲解、二次开发/修改/定制

Java八股文 ----Redis篇

问题大纲 缓存穿透 原因:入侵者大量查询不存在的数据 使得Redis不断去访问数据库 然而Redis也无法缓存,就导致每次都会查询数据库...数据库的并发度不高 就会宕机 解决办法 布隆过滤器:作用:拦截不存在的数据 布隆过滤器 原理:把数据的id通过多次哈希计算标记数组,新来个数…

Easex样式样式

eg1&#xff1a;线形样式和描边 #include <stdio.h> #include <easyx.h> #include <iostream> #include <math.h> #define PI 3.14 // 1PI 180度 2PI 360度int main() {initgraph(800, 600);setorigin(400, 300);setaspectratio(1, -1);/*void setl…

基于Ubuntu20.04安装ROS系统

文章目录 一、ROS简介二、ROS安装三、ROS安装测试四、安装问题解决1. sudo rosdepc init&#xff1a;找不到命令2. ERROR: cannot download default sources list from...3. Command roscore not found...4. Resource not found: roslaunch... 一、ROS简介 ROS是用于编写机器人…

行业追踪,2023-10-27

自动复盘 2023-10-27 凡所有相&#xff0c;皆是虚妄。若见诸相非相&#xff0c;即见如来。 k 线图是最好的老师&#xff0c;每天持续发布板块的rps排名&#xff0c;追踪板块&#xff0c;板块来开仓&#xff0c;板块去清仓&#xff0c;丢弃自以为是的想法&#xff0c;板块去留让…

前后端分离不可忽视的陷阱,深入剖析挑战,分享解决方案,助你顺利实施分离开发。

不管你设计的系统架构是怎么样&#xff0c;最后都是你的组织内的沟通结构胜出。这个观点一直在组织内不断地被证明&#xff0c;但也不断地被忽略。 前后端分离的利与弊 近几年&#xff0c;随着微服务架构风格的引入、前后端生态的快速发展、多端产品化的出现&#xff0c;前后…

DevChat:VSCode中的AI黑马

前言 编程对于很多人来说&#xff0c;可能是一件复杂且耗时的事情。在结合当下各类AI产品层出不穷的情况下&#xff0c;我是有在认真的去拥抱AI来结合我们的工作&#xff0c;帮助我们的工作提升效率&#xff0c;尝试过我们的官方G P T&#xff0c;以及各类国产AI产品&#xff…