Pandas: unconventional but practical tricks (part 2)

Table of contents

1. Handy operations

1. Use shift for displacement operations

2. Use resample to resample the data

3. Use expanding to perform cumulative calculation operations

4. Use nlargest and nsmallest to perform maximum and minimum value operations

5. Use map and applymap for mapping operations

6. Use stack and unstack to perform stacking and unstacking operations

7. Use rolling for sliding window operations

8. Use replace to perform replacement operations

9. Use melt for reshaping operations

10. Use agg for aggregation operations

2. Follow

1. Handy operations

1. Use `shift` for displacement operations

The `shift` method of a DataFrame or Series moves the data up or down by a specified number of rows. The positions vacated by the shift are filled with NaN.

import pandas as pd
import numpy as np

# Create a DataFrame that contains a NaN value
df = pd.DataFrame({'A': [1, 2, np.nan, 4]})
print(df)

# Shift column A up by one row; the vacated last row is filled with NaN
result = df['A'].shift(-1)
print(result)  # values: [2.0, NaN, 4.0, NaN]
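
A common use of `shift` is to compare each row with the one before it. The following is a minimal sketch under that assumption; the `sales` series and its numbers are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical daily sales figures, used only for illustration
sales = pd.Series([100, 120, 90, 150], name='sales')

# shift(1) moves the values down one row, so each row sees the previous value
previous = sales.shift(1)

# Row-over-row change; the first row has no predecessor and stays NaN
change = sales - previous

# fill_value replaces the NaN that the shift would otherwise introduce
previous_filled = sales.shift(1, fill_value=0)

print(change)
print(previous_filled)
```

A positive argument shifts the data down and a negative one shifts it up, so `shift(-1)` would compare each row with the next one instead.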

2. Use `resample` to resample the data

The `resample` method of a DataFrame regroups the data by a specified time interval. It requires a datetime-like index and is often used for processing time series data.

import pandas as pd
import numpy as np

# Create a time series DataFrame with one row per day
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=pd.date_range('2023-01-01', periods=5))
print(df)

# Resample by day and take the mean of each bin
# The data is already daily, so every bin holds exactly one row
result = df.resample('D').mean()
print(result)  # column A: [1.0, 2.0, 3.0, 4.0, 5.0]
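
Resampling daily data by day leaves the values unchanged, so the effect is easier to see with a coarser interval. The sketch below reuses the same data and assumes a two-day bin (`'2D'`) purely for illustration.

```python
import pandas as pd

# Same daily data as above
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]},
                  index=pd.date_range('2023-01-01', periods=5))

# Group the rows into two-day bins and average each bin
# Bins start on 2023-01-01, 2023-01-03 and 2023-01-05
two_day_mean = df.resample('2D').mean()
print(two_day_mean)  # column A: [1.5, 3.5, 5.0]
```

The last bin contains only 2023-01-05, which is why its mean equals the single value 5.0.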

3. Use `expanding` to perform cumulative calculation operations

The `expanding` method of a DataFrame computes a statistic over all rows from the first one up to the current one. It is often used for processing time series data.

import pandas as pd
import numpy as np

# Create a time series DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=pd.date_range('2023-01-01', periods=5))
print(df)

# Use expanding to compute the cumulative sum from the first row up to each row
result = df.expanding().sum()
print(result)  # column A: [1.0, 3.0, 6.0, 10.0, 15.0]
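
An expanding window is not limited to one statistic. As a rough sketch, the example below applies a running mean, a running maximum, and a running sum with `min_periods` to a small hypothetical series `s`.

```python
import pandas as pd

# Hypothetical values, used only for illustration
s = pd.Series([3, 1, 4, 1, 5])

# Mean of all values seen so far
running_mean = s.expanding().mean()

# Largest value seen so far: [3.0, 3.0, 4.0, 4.0, 5.0]
running_max = s.expanding().max()

# min_periods=2 leaves the first row as NaN because it has only one observation
running_sum = s.expanding(min_periods=2).sum()

print(running_mean)
print(running_max)
print(running_sum)
```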

4. Use `nlargest` and `nsmallest` to perform maximum and minimum value operations

The `nlargest` and `nsmallest` methods find the N largest and the N smallest values.

import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({'A': [3, 1, 2], 'B': [6, 4, 5]})
print(df)

# Use nlargest to find the 2 largest values in column B
result = df['B'].nlargest(2)
print(result)  # values [6, 5], at index 0 and 2

# Use nsmallest to find the 2 smallest values in column A
result = df['A'].nsmallest(2)
print(result)  # values [1, 2], at index 1 and 2
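
The same methods also exist on the DataFrame itself, where they return whole rows ordered by a chosen column. The following sketch assumes row labels `r0` to `r2`, which are made up here to make the selected rows easy to identify.

```python
import pandas as pd

# Row labels r0..r2 are hypothetical, added only to make the result readable
df = pd.DataFrame({'A': [3, 1, 2], 'B': [6, 4, 5]}, index=['r0', 'r1', 'r2'])

# The 2 rows with the largest values in column B, largest first
top_rows = df.nlargest(2, 'B')

# The single row with the smallest value in column A
bottom_rows = df.nsmallest(1, 'A')

print(top_rows)
print(bottom_rows)
```

This avoids sorting the whole DataFrame when only a few extreme rows are needed.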

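5. Use `map` and `applymap` for mapping operations

The `map` method of a Series maps the values of a single column, while the `applymap` method of a DataFrame applies a function to every element of the DataFrame. Note that pandas 2.1 deprecated `applymap` in favor of `DataFrame.map`.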

import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df)

# Use map to translate the values of column A through a dictionary
mapping_dict = {1: 'one', 2: 'two', 3: 'three'}
result = df['A'].map(mapping_dict)
print(result)  # values: ['one', 'two', 'three']

# Use applymap to apply a function to every element of the DataFrame
mapping_func = lambda x: x * 2
result = df.applymap(mapping_func)
print(result)  # A: [2, 4, 6], B: [8, 10, 12]
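
Two details of mapping are easy to miss: values absent from the dictionary become NaN, and newer pandas versions rename the element-wise DataFrame method. The sketch below illustrates both; the `elementwise` helper name is an assumption introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# The value 3 is missing from the dictionary, so it maps to NaN
partial = df['A'].map({1: 'one', 2: 'two'})

# Series.map also accepts a function instead of a dictionary
squared = df['A'].map(lambda x: x ** 2)

# pandas 2.1 added DataFrame.map and deprecated applymap;
# pick whichever one the installed version provides
elementwise = df.map if hasattr(df, 'map') else df.applymap
doubled = elementwise(lambda x: x * 2)

print(partial)
print(squared)
print(doubled)
```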

6. Use `stack` and `unstack` to perform stacking and unstacking operations

The `stack` method of a DataFrame moves the column labels into the row index, turning columns into rows. The `unstack` method does the opposite and moves an index level back into the columns.

import pandas as pd
import numpy as np

# Create a DataFrame with row labels a, b, c and columns x, y
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}, index=['a', 'b', 'c'])
print(df)

# Use stack to turn the columns into rows; the column labels become the inner index level
stacked = df.stack()
print(stacked)  # Series with values [1, 4, 2, 5, 3, 6] and MultiIndex [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y'), ('c', 'x'), ('c', 'y')]

# Use unstack to turn the inner index level back into columns
result = stacked.unstack()
print(result)  # same layout as df: x = [1, 2, 3], y = [4, 5, 6]
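
By default `unstack` moves the innermost index level, but the `level` argument selects a different one. The following is a small sketch of a round trip and of unstacking the outer level; the column and row labels are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}, index=['a', 'b', 'c'])

# Series indexed by (row label, column label)
stacked = df.stack()

# Moving the inner level back to the columns restores the original layout
restored = stacked.unstack()

# Moving the outer level instead puts the row labels a, b, c in the columns
swapped = stacked.unstack(level=0)

print(restored)
print(swapped)
```

Unstacking the outer level produces the same layout as transposing the original DataFrame.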

7. Use `rolling` for sliding window operations

The `rolling` method of a DataFrame computes a statistic over a fixed-size window that slides along the rows.

import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
print(df)

# Slide a window of size 3 over the rows and compute the mean of each window
# The first two rows have fewer than 3 observations, so they are NaN
result = df.rolling(window=3).mean()
print(result)  # column A: [NaN, NaN, 2.0, 3.0, 4.0]
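
The leading NaN values appear because a window needs `window` observations by default. As a sketch of one way to relax that, `min_periods=1` lets the first windows use however many rows are available.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# With min_periods=1 the first windows are computed from 1 and 2 values
partial_means = s.rolling(window=3, min_periods=1).mean()

# Other statistics work the same way, for example a rolling sum
window_sums = s.rolling(window=3).sum()

print(partial_means)  # [1.0, 1.5, 2.0, 3.0, 4.0]
print(window_sums)    # [NaN, NaN, 6.0, 9.0, 12.0]
```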

8. Use `replace` to perform replacement operations

The `replace` method of a DataFrame substitutes specified values with new ones.

import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['foo', 'bar', 'foo', 'bar', 'foo']})
print(df)

# Use replace to change 'foo' to 'baz' in column B only
result = df.replace({'B': {'foo': 'baz'}})
print(result)  # A: [1, 2, 3, 4, 5], B: ['baz', 'bar', 'baz', 'bar', 'baz']
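
`replace` also accepts lists of values and regular expressions. The sketch below shows both on the same data; the replacement strings `'qux'` and `'F'` are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': ['foo', 'bar', 'foo', 'bar', 'foo']})

# Replace several values at once: 'foo' -> 'baz' and 'bar' -> 'qux'
swapped = df['B'].replace(['foo', 'bar'], ['baz', 'qux'])

# With regex=True the first argument is treated as a regular expression
regexed = df['B'].replace(r'^f.*', 'F', regex=True)

# Numeric values are replaced the same way
ones = df['A'].replace(1, 100)

print(swapped)
print(regexed)
print(ones)
```

Without `regex=True`, `replace` matches whole values only, so `'fo'` would not match `'foo'`.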

9. Use `melt` for reshaping operations

The `melt` method of a DataFrame reshapes data from wide format into long format: the column names go into a `variable` column and the cell values go into a `value` column.

import pandas as pd
import numpy as np

# Create a wide-format DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(df)

# Use melt to convert the wide-format data into long format
# Without id_vars, every column is melted
result = df.melt()
print(result)  # variable: ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], value: [1, 2, 3, 4, 5, 6, 7, 8, 9]
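
In practice one or more identifier columns are usually kept as they are while the rest are melted. The following is a minimal sketch with a hypothetical score table; the column names `name`, `math`, `art`, `subject` and `score` are assumptions made for this example.

```python
import pandas as pd

# Hypothetical wide table: one row per person, one column per subject
wide = pd.DataFrame({'name': ['Ann', 'Bob'],
                     'math': [90, 80],
                     'art': [70, 60]})

# Keep 'name' as the identifier and melt the subject columns;
# var_name and value_name rename the default 'variable' and 'value' columns
long = wide.melt(id_vars='name', var_name='subject', value_name='score')

print(long)
```

Each person now appears once per subject, giving four rows in total.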

10. Use `agg` for aggregation operations

The `agg` method of a DataFrame aggregates specified columns, reducing the values of each column to a single value.

import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})
print(df)

# Use agg to compute the sum of column A and the mean of column B
result = df.agg({'A': 'sum', 'B': 'mean'})
print(result)  # Series: A = 15.0, B = 30.0
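
`agg` can also apply several functions at once, in which case the result is a DataFrame with one row per function. The sketch below reuses the same data to show this.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# Apply both functions to every column; rows are labeled 'sum' and 'mean'
summary = df.agg(['sum', 'mean'])

# Different function lists per column; combinations that were not requested are NaN
per_column = df.agg({'A': ['min', 'max'], 'B': ['mean']})

print(summary)
print(per_column)
```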

2. Follow

Follow the WeChat official account "Python risk control model and data analysis" for more theory and code sharing.