Table of contents
1. Use shift for displacement operations
2. Use resample to resample the data
3. Use expanding to perform cumulative calculation operations
4. Use nlargest and nsmallest to perform maximum and minimum value operations
5. Use map and applymap for mapping operations
6. Use stack and unstack to perform stacking and unstacking operations
7. Use rolling for sliding window operations
8. Use replace to perform replacement operations
9. Use melt for reshaping operations
10. Use agg for aggregation operations
1. Handy tricks
1. Use shift for displacement operations
The DataFrame shift method can shift data up or down by a specified number of rows.
import pandas as pd
import numpy as np
# Create a DataFrame containing a NaN value
df = pd.DataFrame({'A': [1, 2, np.nan, 4]})
print(df)
# Shift the values up by one row with shift(-1); the last row is filled with NaN
result = df['A'].shift(-1)
print(result) # Output: Series([2.0, nan, 4.0, nan])
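A common use of shift is comparing each row with the one before it. The sketch below is an illustrative addition: the `sales` column and its values are made up, and `shift(1)` is used (rather than `shift(-1)` as above) so that each row sees the previous row's value.

```python
import pandas as pd

# Hypothetical sales figures, used only for illustration
df = pd.DataFrame({'sales': [100, 120, 90, 150]})

# shift(1) moves values down one row, so each row holds the previous row's value
df['prev_sales'] = df['sales'].shift(1)

# Change relative to the previous row; the first row has no predecessor, so it is NaN
df['change'] = df['sales'] - df['prev_sales']
print(df)
```

The first row of `change` is NaN because there is nothing to compare it with.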
2. Use resample for data resampling
The DataFrame resample method can resample data at specified time intervals and is often used for processing time series data.
import pandas as pd
import numpy as np
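# Create a time series DataFrame with a daily index
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=pd.date_range('2023-01-01', periods=5))
print(df)
# Resample by day with resample, using the mean as the sampled value
result = df.resample('D').mean()
# The data is already daily, so each bin holds exactly one value
print(result) # Output: DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0]})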
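Resampling daily data by day leaves each value in its own bin, so the effect is easier to see with a coarser interval. As a minimal sketch, assuming the same five-day series, the following uses a 2-day frequency (`'2D'`, chosen here only for illustration) so that each bin averages up to two rows.

```python
import pandas as pd

# Same daily series as in the example above
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]},
                  index=pd.date_range('2023-01-01', periods=5))

# Group the daily rows into 2-day bins and take the mean of each bin
result = df.resample('2D').mean()
print(result)
```

The bins start on 2023-01-01, 2023-01-03 and 2023-01-05; the last bin contains only one row.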
3. Use expanding to perform cumulative calculation operations
The DataFrame expanding method can perform cumulative calculations on data and is often used for processing time series data.
import pandas as pd
import numpy as np
# Create a time series DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=pd.date_range('2023-01-01', periods=5))
print(df)
# Use expanding to compute the cumulative mean: each row averages all values up to and including it
result = df.expanding().mean()
print(result) # Output: DataFrame({'A': [1.0, 1.5, 2.0, 2.5, 3.0]})
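An expanding window grows by one row at each step, so any aggregation applied to it becomes a running statistic. The sketch below is an illustrative addition with made-up values, showing a running maximum and a running sum on a Series.

```python
import pandas as pd

# Made-up values, used only for illustration
s = pd.Series([3, 1, 4, 1, 5])

# Running maximum: the largest value seen so far at each position
running_max = s.expanding().max()

# Running sum: equivalent in value to s.cumsum(), but returned as floats
running_sum = s.expanding().sum()

print(running_max)
print(running_sum)
```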
4. Use nlargest and nsmallest to perform maximum and minimum value operations
The nlargest and nsmallest methods can find the largest N values and the smallest N values, respectively.
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({'A': [3, 1, 2], 'B': [6, 4, 5]})
print(df)
# Use nlargest to find the 2 largest values in column B
result = df['B'].nlargest(2)
print(result) # Output: Series([6, 5])
# Use nsmallest to find the 2 smallest values in column A
result = df['A'].nsmallest(2)
print(result) # Output: Series([1, 2])
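The example above works on a single column. When the whole rows are needed, `DataFrame.nlargest(n, columns)` can be called on the DataFrame itself. A minimal sketch, reusing the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 2], 'B': [6, 4, 5]})

# Return the 2 full rows with the largest values in column B,
# ordered from largest to smallest
top_rows = df.nlargest(2, 'B')
print(top_rows)
```

The original row labels are kept, which makes it easy to trace the result back to the source data.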
5. Use map and applymap for mapping operations
The map method can map the values of a single column (a Series), and the applymap method can map every element of an entire DataFrame.
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df)
# Use map to map the values of column A to new values
mapping_dict = {1: 'one', 2: 'two', 3: 'three'}
result = df['A'].map(mapping_dict)
print(result) # Output: Series(['one', 'two', 'three'])
# Use applymap to map every value in the DataFrame
# (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map)
mapping_func = lambda x: x * 2
result = df.applymap(mapping_func)
print(result) # Output: DataFrame({'A': [2, 4, 6], 'B': [8, 10, 12]})
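Since pandas 2.1, `DataFrame.applymap` is deprecated and `DataFrame.map` is the element-wise method. The sketch below is one possible way to write code that runs on both older and newer versions; the helper name `elementwise` is hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def elementwise(frame, func):
    """Hypothetical helper: apply func to every element of frame."""
    # Check the class, not the instance, so a column named 'map' cannot interfere
    if hasattr(pd.DataFrame, 'map'):
        return frame.map(func)       # pandas 2.1 and later
    return frame.applymap(func)      # older pandas versions

doubled = elementwise(df, lambda x: x * 2)
print(doubled)
```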
6. Use stack and unstack to perform stacking and unstacking operations
The DataFrame stack method can convert columns of data into rows, and the unstack method can convert rows of data into columns.
import pandas as pd
import numpy as np
# Create a DataFrame with row labels a, b, c and columns x, y
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}, index=['a', 'b', 'c'])
print(df)
# Use stack to convert the columns into rows; the result is a Series with a MultiIndex
stacked = df.stack()
print(stacked) # Output: Series([1, 4, 2, 5, 3, 6], index=MultiIndex([('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y'), ('c', 'x'), ('c', 'y')]))
# Use unstack on the stacked Series to convert the inner row level back into columns
result = stacked.unstack()
print(result) # Output: DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}, index=['a', 'b', 'c'])
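By default, unstack moves the innermost index level into the columns, but the `level` argument can select a different one. The following sketch is an illustrative addition; the level names `row` and `col` are made up for the example.

```python
import pandas as pd

# A Series with a two-level MultiIndex; the level names are illustrative
index = pd.MultiIndex.from_product([['a', 'b'], ['x', 'y']], names=['row', 'col'])
s = pd.Series([1, 2, 3, 4], index=index)

# Default: the innermost level ('col') becomes the columns
wide_by_col = s.unstack()

# Choose the outer level ('row') instead
wide_by_row = s.unstack(level='row')

print(wide_by_col)
print(wide_by_row)
```

The two results are transposes of each other: the value stored under ('a', 'y') appears at row a, column y in the first and at row y, column a in the second.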
7. Use rolling for sliding window operations
The DataFrame rolling method can perform sliding window operations and is commonly used for data processing and analysis.
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
print(df)
# Use rolling with a window size of 3 and compute the mean within each window
result = df.rolling(window=3).mean()
print(result) # Output: DataFrame({'A': [nan, nan, 2.0, 3.0, 4.0]})
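The first two rows above are NaN because a window of 3 is not yet full. If partial windows are acceptable, the `min_periods` parameter lowers the number of observations required. A minimal sketch, assuming the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# min_periods=1 lets a window produce a result as soon as it holds one value,
# so the leading rows are averaged over fewer than 3 observations
result = df.rolling(window=3, min_periods=1).mean()
print(result)
```

Whether partial windows are appropriate depends on the analysis; the leading values are averages over fewer points and are therefore noisier.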
8. Use replace to perform replacement operations
The DataFrame replace method can perform substitution operations on specified values.
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['foo', 'bar', 'foo', 'bar', 'foo']})
print(df)
# Use replace to substitute 'baz' for 'foo' in column B
result = df.replace({'B': {'foo': 'baz'}})
print(result) # Output: DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['baz', 'bar', 'baz', 'bar', 'baz']})
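replace also accepts several substitutions at once and, with `regex=True`, pattern-based matching. The sketch below is an illustrative addition; the replacement strings `'qux'` and `'F'` are arbitrary.

```python
import pandas as pd

s = pd.Series(['foo', 'bar', 'foo'])

# Replace several exact values in one call using a dict
multi = s.replace({'foo': 'baz', 'bar': 'qux'})

# Replace every string that starts with 'f' using a regular expression
by_pattern = s.replace(r'^f.*', 'F', regex=True)

print(multi)
print(by_pattern)
```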
9. Use melt for reshaping operations
The DataFrame melt method can reshape data by converting wide-format data into long format, which is often used for data processing and analysis.
import pandas as pd
import numpy as np
# Create a wide-format DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(df)
# Use melt to reshape the wide-format data into long format; with no id_vars, every column is melted
result = df.melt()
print(result) # Output: DataFrame({'variable': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], 'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
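In practice, one or more columns usually identify each record and should stay as columns. The `id_vars` parameter does this. A minimal sketch, assuming column A is the identifier:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Keep A as an identifier column; only B and C are melted into rows
long_df = df.melt(id_vars=['A'])
print(long_df)
```

Each original row now appears once per melted column, giving 6 rows with the columns A, variable and value.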
10. Use agg for aggregation operations
The DataFrame agg method can perform aggregation operations on specified columns, combining multiple values into one value, and is often used for data processing and analysis.
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})
print(df)
# Use agg to compute the sum of column A and the mean of column B
result = df.agg({'A': 'sum', 'B': 'mean'})
print(result) # Output: Series({'A': 15.0, 'B': 30.0})
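agg can also apply several functions to every column at once by passing a list of function names. In that case the result is a DataFrame with one row per function. A minimal sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# Apply both 'sum' and 'mean' to every column;
# rows of the result are labeled by function name
summary = df.agg(['sum', 'mean'])
print(summary)
```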