Pandas is an extension library for Python, built for data analysis.
It is a powerful toolset for analyzing structured data and is built on NumPy, which provides high-performance array and matrix operations.
Pandas can import data from a variety of sources, such as CSV, JSON, SQL, and Microsoft Excel.
It supports operations such as merging, reshaping, and selection, and also provides data cleaning and data processing features.
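As a rough illustration of importing, merging, and selecting, the sketch below reads CSV text from memory instead of a file. The table contents and the names users, incomes, and older are made up for the example.

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory, standing in for a file on disk.
csv_text = "user,age\nTOM,10\nBOB,12\n"
users = pd.read_csv(io.StringIO(csv_text))

# A second hypothetical table that shares the "user" column.
incomes = pd.DataFrame({"user": ["TOM", "BOB"], "income": [1000, 2000]})

# Merging: join the two tables on the shared column.
merged = users.merge(incomes, on="user")

# Selection: keep only the rows where age is greater than 10.
older = merged[merged["age"] > 10]
print(older)
```

The same read_csv call accepts a file path in place of the StringIO object.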
In Pandas, the main data structures are Series (one-dimensional data) and DataFrame (two-dimensional data):
A Series is a one-dimensional array-like object that consists of a set of data (of any NumPy data type) and a set of associated data labels (i.e., the index).
A DataFrame is a tabular data structure that contains an ordered set of columns, each of which can hold a different value type (numeric, string, boolean). A DataFrame has both a row index and a column index, and can be regarded as a dictionary of Series that share a common index.
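The "dictionary of Series" view can be checked directly. In this minimal sketch the names ages and scores and their values are invented for illustration; it builds a DataFrame from two Series and pulls one column back out.

```python
import pandas as pd

# Two hypothetical Series that share the same index labels.
ages = pd.Series([10, 12, 13], index=["TOM", "BOB", "AOA"])
scores = pd.Series([90.5, 82.0, 77.5], index=["TOM", "BOB", "AOA"])

# Each dictionary key becomes a column; the shared labels become the row index.
df = pd.DataFrame({"age": ages, "score": scores})

# Selecting a single column returns a Series with the same row index.
col = df["age"]
print(type(col).__name__)
print(col.index.equals(df.index))
```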
1. Pandas installation
Like NumPy, Pandas can be installed with pip or conda. If you have already installed the Anaconda distribution, which ships with both NumPy and Pandas, you do not need to install it again.
1.1 Install Pandas using pip
pip install pandas
After successful installation, it can be used by importing the pandas package:
import pandas as pd
1.2 Test example
import numpy as np
import pandas as pd
s = pd.Series([1,2,3,4,np.nan,6,8])
s
2. Series data structure
2.1 Series Introduction
A Pandas Series is similar to a single column of a table or a one-dimensional array, and can hold any data type.
A Series consists of an index and a set of values. Its constructor is as follows:
pandas.Series(data, index, dtype, name, copy)
- data : The data to store (an ndarray, list, dict, or scalar value).
- index : The index labels; if not specified, they default to integers starting from 0.
- dtype : The data type; inferred from the data by default.
- name : The name of the Series.
- copy : Whether to copy the input data; the default is False.
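A short sketch of how these parameters fit together; the values and the name "score" are arbitrary and chosen only for illustration.

```python
import pandas as pd

# Explicit index labels, an explicit dtype, and a name.
s = pd.Series(data=[1, 2, 3], index=["a", "b", "c"], dtype="float64", name="score")
print(s.dtype)           # float64
print(s.name)            # score
print(s.index.tolist())  # ['a', 'b', 'c']

# Without an index argument, the labels default to 0, 1, 2, ...
default = pd.Series([10, 20, 30])
print(default.index.tolist())  # [0, 1, 2]
```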
2.2 Series creation example
1. Create a normal example and set the index value:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
s
Output result:
100    a
101    b
102    c
103    d
dtype: object
Extract the elements according to the index:
print(s[102])
Output result:
c
2. Create a Series from a key/value object such as a dictionary (the keys become the index):
data = {
'user1':100,
'user2':200,
'user3':250,
}
s = pd.Series(data)
s
Output result:
user1    100
user2    200
user3    250
dtype: int64
3. Create a Series from a list with custom index labels:
s = pd.Series([1,2,3,4,5,6],index=['a','b','c','d','e','f'])
print(s)
Output result:
a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64
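For comparison, passing a single scalar together with an index repeats that value for every label. A minimal sketch, with an arbitrary value and arbitrary labels:

```python
import pandas as pd

# The scalar 5 is broadcast to every label in the index.
s = pd.Series(5, index=["a", "b", "c"])
print(s.tolist())        # [5, 5, 5]
print(s.index.tolist())  # ['a', 'b', 'c']
```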
2.3 Series data extraction (index)
To extract data from a Series, you can index it either by position, like an array subscript, or by the labels set through the index parameter.
s = pd.Series([1,2,3,4,5,6],index=['a','b','c','d','e','f'])
s
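Index it as follows. Positional access goes through .iloc, because plain integer keys such as s[0] on a string-labeled Series are deprecated in recent pandas releases:
s.iloc[0]           # first element, by position
s.iloc[0:3]         # first three elements, by position
s.iloc[-3:]         # last three elements
s['a']              # a single label
s[['a','c','f']]    # a list of labels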
The output is:
1
a    1
b    2
c    3
dtype: int64
d    4
e    5
f    6
dtype: int64
1
a    1
c    3
f    6
dtype: int64
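One detail worth seeing in code is that label slices and position slices treat the end point differently. The sketch below reuses the same Series; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=["a", "b", "c", "d", "e", "f"])

# Label-based slicing with .loc includes the end label.
by_label = s.loc["a":"c"]      # rows a, b, c

# Position-based slicing with .iloc excludes the end position.
by_position = s.iloc[0:2]      # rows a, b

# A boolean mask selects the elements that satisfy a condition.
large = s[s > 4]               # rows e, f

print(by_label.tolist(), by_position.tolist(), large.tolist())
```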
3. DataFrame
3.1 Introduction to DataFrame
A DataFrame is a tabular data structure that contains an ordered set of columns, each of which can hold a different value type (numeric, string, boolean). A DataFrame has both a row index and a column index, and can be regarded as a dictionary of Series that share a common index.
The DataFrame constructor is as follows:
pandas.DataFrame(data, index, columns, dtype, copy)
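- data : The data to store (ndarray, Series, list, dict, etc.).
- index : The index values, also called row labels.
- columns : The column labels; defaults to RangeIndex (0, 1, 2, …, n) if none are given.
- dtype : The data type.
- copy : Whether to copy the input data. In recent pandas versions the default is None, which copies dict input but not DataFrame or 2-D ndarray input.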
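A brief sketch of the constructor arguments in use. The array contents and the labels r1 to r3, x, and y are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

# A 3x2 array holding the values 0 through 5.
values = np.arange(6).reshape(3, 2)

# Explicit row labels, column labels, and dtype.
df = pd.DataFrame(values, index=["r1", "r2", "r3"], columns=["x", "y"], dtype="float64")
print(df.loc["r2", "y"])         # 3.0

# With no labels given, both axes fall back to a default integer range.
default = pd.DataFrame(values)
print(default.columns.tolist())  # [0, 1]
print(default.index.tolist())    # [0, 1, 2]
```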
3.2 DataFrame creation example
1. Basic table creation
import pandas as pd
import numpy as np
# Two columns of data: the first holds the name, the second the age
data = [['TOM',10],['BOB',12],['AOA',13]]
df = pd.DataFrame(data,columns=['username','age'])
df
Output result:
   username  age
0       TOM   10
1       BOB   12
2       AOA   13
2. Create from a dictionary
# Create a DataFrame from a dictionary
data = {
"username":['小黑','小白','小刘'],
'income':[1000,2000,3000]
}
df = pd.DataFrame(data,index=[1,2,3])
df
Output result:
   username  income
1        小黑    1000
2        小白    2000
3        小刘    3000
3. Create from a dictionary of Series
d = {
'one':pd.Series([1,2,3],index=['a','b','c']),
'two':pd.Series([1,2,3,4],index=['a','b','c','d'])
}
df = pd.DataFrame(d)
df
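Output result:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
Missing entries are filled with NaN: the Series 'one' has no value for index d, so that cell is NaN and the column is stored as floating point.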
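The NaN entries can be detected, replaced, or dropped. The sketch below rebuilds the same DataFrame and shows one possible way to handle them; the fill value 0 is an arbitrary choice.

```python
import pandas as pd

d = {
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)

# Flag the missing entries in column 'one'.
missing = df["one"].isna()

# Replace NaN with a fill value (0 here, chosen arbitrarily).
filled = df.fillna(0)

# Or drop every row that contains a NaN.
dropped = df.dropna()

print(missing.tolist())         # [False, False, False, True]
print(dropped.index.tolist())   # ['a', 'b', 'c']
```

Both fillna and dropna return new objects and leave df unchanged.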
3.3 DataFrame Data Extraction (Index)
In the example above, you can get the first column of data with df['one'].
You can also index data with the loc and iloc indexers (written with square brackets, as in df.loc[...] and df.iloc[...]) or with attribute access such as df.one.
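A minimal sketch of these access styles on the DataFrame from section 3.2; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
})

col = df["one"]             # a column, by name
attr = df.one               # the same column, through attribute access
row = df.loc["b"]           # a row, by label
cell = df.loc["b", "two"]   # a single cell, by row and column label
first_row = df.iloc[0]      # a row, by position
cell_pos = df.iloc[3, 1]    # a single cell, by row and column position

print(cell, cell_pos)       # 2 4
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so bracket access is the safer default.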
3.4 DataFrame data operations
3.4.1 Add column data
To add a column to a DataFrame, assign to it directly with df['column_name'] = data:
df['three'] = pd.Series([4,5,6],index=['a','b','c'])
df['four'] = df['one']+df['three']
print(df)
3.4.2 Column data deletion
To delete an entire column, use del or the DataFrame.pop method:
del df['four']
df.pop('two')
print(df)
Output result:
   one  three
a  1.0    4.0
b  2.0    5.0
c  3.0    6.0
d  NaN    NaN
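The two approaches differ in what they give back: del discards the column, while pop removes it and returns it as a Series. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical three-column table.
df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]})

# pop removes the column in place and returns it.
removed = df.pop("two")

# del removes the column in place and returns nothing.
del df["three"]

print(removed.tolist())     # [4, 5, 6]
print(df.columns.tolist())  # ['one']
```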
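3.4.3 Data Append
To append rows to a DataFrame, use pd.concat. The older DataFrame.append method was removed in pandas 2.0:
d = {
'one':pd.Series([1,2,3],index=['a','b','c']),
'two':pd.Series([1,2,3,4],index=['a','b','c','d'])
}
df = pd.DataFrame(d)
df2 = pd.DataFrame([[14,15],[15,16]],columns=['one','two'],index=['e','f'])
# Stack df2 below df; pd.concat returns a new DataFrame
df = pd.concat([df, df2])
print(df)
Output result:
    one  two
a   1.0    1
b   2.0    2
c   3.0    3
d   NaN    4
e  14.0   15
f  15.0   16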
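When the tables being combined use default integer indexes, the row labels repeat unless they are renumbered. A short sketch with invented data:

```python
import pandas as pd

# Two hypothetical tables, each with the default 0-based index.
df1 = pd.DataFrame({"one": [1, 2], "two": [3, 4]})
df2 = pd.DataFrame({"one": [5], "two": [6]})

# By default the original row labels are kept, so 0 appears twice.
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())     # [0, 1, 0]

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```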
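3.4.4 Delete rows by index
To delete rows from a DataFrame, use DataFrame.drop(index).
For example, delete the row with index d from the data above:
# Delete the row whose index is d; drop returns a new DataFrame and leaves df unchanged
df.drop("d")
Output result:
    one  two
a   1.0    1
b   2.0    2
c   3.0    3
e  14.0   15
f  15.0   16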
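Because drop returns a new object, the result has to be assigned if the change should persist. The sketch below uses made-up data and also shows the columns keyword for dropping columns.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6]}, index=["a", "b", "c"])

# drop returns a new DataFrame; the original still has row c.
result = df.drop("c")
print(result.index.tolist())   # ['a', 'b']
print(df.index.tolist())       # ['a', 'b', 'c']

# The columns keyword drops columns instead of rows.
no_two = df.drop(columns=["two"])
print(no_two.columns.tolist())  # ['one']
```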