pandas.DataFrame.merge() parameters explained

pd.merge combines DataFrame objects or named Series objects using a database-style join.

Signature:

DataFrame.merge(self, right, how='inner', on=None, left_on=None, right_on=None,
                left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'),
                copy=True, indicator=False, validate=None)

Main parameters:

right : DataFrame or named Series. The object to merge with.

how : {'left', 'right', 'outer', 'inner'}, default 'inner'

Type of merge to be performed. The default, 'inner', keeps only the keys present in both frames.

left: use only keys from left frame, similar to a SQL left outer join; preserve key order.

right: use only keys from right frame, similar to a SQL right outer join; preserve key order.

outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.

inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
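The four join types can be compared on a small pair of frames. This is a minimal sketch; the frames and the column names key, lval and rval are made up for illustration and are not the data used later in this article.

```python
import pandas as pd

# Hypothetical frames: keys "b" and "c" are shared, "a" and "d" are not.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# Collect the set of keys that survives each join type.
keys_by_how = {
    how: set(left.merge(right, how=how, on="key")["key"])
    for how in ("left", "right", "outer", "inner")
}

for how, keys in keys_by_how.items():
    print(how, sorted(keys))
```

left keeps a, b, c; right keeps b, c, d; outer keeps all four keys; inner keeps only b and c.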

on : label or list

Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
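The default for on can be checked directly. The sketch below uses two small frames shaped like the ones in the examples further down and compares an implicit merge with one that names the shared columns explicitly.

```python
import pandas as pd

left = pd.DataFrame({"Age": [37, 39], "Class": ["a", "b"],
                     "Marital": ["Divorced", "Married"]})
right = pd.DataFrame({"Age": [37, 39], "Class": ["a", "a"],
                      "Income": [5993, 12742]})

# With on=None the join keys are the shared column names, here Age and Class.
implicit = left.merge(right)
explicit = left.merge(right, on=["Age", "Class"])

print(implicit.equals(explicit))
print(implicit)
```

Only the key (37, a) exists in both frames, so the inner join returns a single row either way.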

left_on : label or list, or array-like

Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.

right_on : label or list, or array-like

Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.
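left_on and right_on matter most when the key columns are named differently. A minimal sketch, with hypothetical frames employees and salaries that are not part of this article's data:

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
salaries = pd.DataFrame({"staff_id": [2, 3, 4], "income": [5000, 6000, 7000]})

# Each side names its own key column; rows pair up where the values are equal.
joined = employees.merge(salaries, left_on="emp_id", right_on="staff_id")
print(joined)
```

Both key columns are kept in the result because their names differ.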

left_index : bool, default False

Use the index from the left DataFrame as the join key(s).

right_index : bool, default False

Use the index from the right DataFrame as the join key.
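A short sketch of an index-on-index join. The frames are illustrative only; their index values are assumed to play the role of an Age key.

```python
import pandas as pd

# Hypothetical frames keyed by their index rather than by a column.
left = pd.DataFrame({"Marital": ["Divorced", "Single"]}, index=[37, 34])
right = pd.DataFrame({"Income": [6074, 5993]}, index=[34, 37])

# Rows are paired by index label, not by position.
by_index = left.merge(right, left_index=True, right_index=True)
print(by_index)
```

The label 37 picks up Income 5993 and the label 34 picks up 6074, even though the two frames list the labels in a different order.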

sort : bool, default False

Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword).
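The effect of sort is easiest to see on a left join, where the unsorted result follows the left key order. A minimal sketch with made-up frames:

```python
import pandas as pd

# Hypothetical frames; the left keys are deliberately out of order.
left = pd.DataFrame({"key": ["c", "a", "b"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "c"], "rval": [10, 20, 30]})

unsorted_keys = left.merge(right, on="key", how="left", sort=False)["key"].tolist()
sorted_keys = left.merge(right, on="key", how="left", sort=True)["key"].tolist()

print(unsorted_keys)  # left key order is preserved
print(sorted_keys)    # keys in lexicographic order
```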

copy : bool, default True

If False, avoid copy if possible. The default, True, always copies the data into the resulting structure.

Examples:

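1.1. how='left': only the keys of the left frame are used. With on=None the keys are the shared columns Age and Class. In the row with Age=39 the two frames have different Class values, so the left key (39, b) has no match in right: the result keeps the left Class value and Income is NaN.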

result = pd.merge(left, right, how='left', on=None, left_on=None, right_on=None,
                  left_index=False, right_index=False)

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age	Marital	Class	Income
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	5993
1	54	Divorced	a	1	54	10502	a	1	54	Divorced	a	10502
2	34	Single	a	2	34	6074	a	2	34	Single	a	6074
3	39	Married	b	3	39	12742	a	3	39	Married	b	NaN
4	28	Divorced	b	4	28	2596	b	4	28	Divorced	b	2596
5	24	Married	b	5	24	4162	b	5	24	Married	b	4162
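The frames in example 1.1 can be rebuilt as follows. This is a sketch that copies the values from the table above; how the author originally loaded the data is not stated.

```python
import pandas as pd

left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# on=None: the shared columns Age and Class together form the join key.
result = pd.merge(left, right, how="left")
print(result)
```

The Income column becomes float because the unmatched row (39, b) holds NaN.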

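1.2. how='right': only the keys of the right frame are used. In the row with Age=39 the Class values differ, so the right key (39, a) has no match in left: the result keeps the right Class value, and Marital, which exists only in left, is NaN.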

result = pd.merge(left, right, how='right', on=None, left_on=None, right_on=None,
                  left_index=False, right_index=False)

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age	Marital	Class	Income
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	5993
1	54	Divorced	a	1	54	10502	a	1	54	Divorced	a	10502
2	34	Single	a	2	34	6074	a	2	34	Single	a	6074
3	39	Married	b	3	39	12742	a	3	28	Divorced	b	2596
4	28	Divorced	b	4	28	2596	b	4	24	Married	b	4162
5	24	Married	b	5	24	4162	b	5	39	NaN	a	12742

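1.3. how='inner' (the default): the intersection of the keys of both frames is used, so the Age=39 row, whose Class differs between the frames, is dropped.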

result = pd.merge(left, right, how='inner', on=None, left_on=None, 
                  right_on=None, left_index=False, right_index=False)

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age	Marital	Class	Income
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	5993
1	54	Divorced	a	1	54	10502	a	1	54	Divorced	a	10502
2	34	Single	a	2	34	6074	a	2	34	Single	a	6074
3	39	Married	b	3	39	12742	a	3	28	Divorced	b	2596
4	28	Divorced	b	4	28	2596	b	4	24	Married	b	4162
5	24	Married	b	5	24	4162	b

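1.4. how='outer': the union of the keys of both frames is used, so both versions of the Age=39 row appear, each with NaN in the column the other frame would have supplied.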

result = pd.merge(left, right, how='outer', on=None, left_on=None, right_on=None,
                  left_index=False, right_index=False)

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age	Marital	Class	Income
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	5993
1	54	Divorced	a	1	54	10502	a	1	54	Divorced	a	10502
2	34	Single	a	2	34	6074	a	2	34	Single	a	6074
3	39	Married	b	3	39	12742	a	3	39	Married	b	NaN
4	28	Divorced	b	4	28	2596	b	4	28	Divorced	b	2596
5	24	Married	b	5	24	4162	b	5	24	Married	b	4162
								6	39	NaN	a	12742
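The indicator parameter from the signature above helps to see where each outer-join row came from. A sketch on the same data, assuming the frames are built from the table values:

```python
import pandas as pd

left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# indicator=True adds a _merge column: left_only, right_only or both.
result = pd.merge(left, right, how="outer", indicator=True)
counts = result["_merge"].value_counts().to_dict()
print(counts)
```

Five keys are found in both frames; (39, b) exists only in left and (39, a) only in right.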

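2.1. on='Age': the column named in on must exist in both frames. Every Age value appears in both frames, so the result here is the same for every how. Class, which is in both frames but is not a key, is split into Class_x and Class_y.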

result = pd.merge(left, right, how='inner', on='Age', left_on=None, right_on=None,
                  left_index=False, right_index=False)

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age	Marital	Class_x	Income	Class_y
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	5993	a
1	54	Divorced	a	1	54	10502	a	1	54	Divorced	a	10502	a
2	34	Single	a	2	34	6074	a	2	34	Single	a	6074	a
3	39	Married	b	3	39	12742	a	3	39	Married	b	12742	a
4	28	Divorced	b	4	28	2596	b	4	28	Divorced	b	2596	b
5	24	Married	b	5	24	4162	b	5	24	Married	b	4162	b

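2.2. on='Class': the column named in on must exist in both frames. Class values repeat, so each left row is paired with every right row of the same Class (a many-to-many join): 3 x 4 rows for a plus 3 x 2 rows for b gives 18 rows. Every Class value appears in both frames, so the result is again the same for every how.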

result = pd.merge(left, right, how='inner', on='Class', left_on=None, right_on=None,
                  left_index=False, right_index=False)

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age_x	Marital	Class	Age_y	Income
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	37	5993
1	54	Divorced	a	1	54	10502	a	1	37	Divorced	a	54	10502
2	34	Single	a	2	34	6074	a	2	37	Divorced	a	34	6074
3	39	Married	b	3	39	12742	a	3	37	Divorced	a	39	12742
4	28	Divorced	b	4	28	2596	b	4	54	Divorced	a	37	5993
5	24	Married	b	5	24	4162	b	5	54	Divorced	a	54	10502
								6	54	Divorced	a	34	6074
								7	54	Divorced	a	39	12742
								8	34	Single	a	37	5993
								9	34	Single	a	54	10502
								10	34	Single	a	34	6074
								11	34	Single	a	39	12742
								12	39	Married	b	28	2596
								13	39	Married	b	24	4162
								14	28	Divorced	b	28	2596
								15	28	Divorced	b	24	4162
								16	24	Married	b	28	2596
								17	24	Married	b	24	4162
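The row count of this many-to-many join can be verified, and the validate parameter from the signature can be used to guard against it when a one-to-one relationship is expected. A sketch on the table data:

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# Class "a": 3 left rows x 4 right rows; class "b": 3 x 2.
many = pd.merge(left, right, on="Class")
print(len(many))

# validate checks key uniqueness before merging and raises if it is violated.
error_message = None
try:
    pd.merge(left, right, on="Class", validate="one_to_one")
except MergeError as exc:
    error_message = str(exc)
print(error_message)
```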

2.3. on=['Age', 'Class'], how='left': on lists every column name the two frames share, so the result is the same as with on=None.

result = pd.merge(left, right, how='left', on=['Age', 'Class'], 
                  left_on=None, right_on=None,
                  left_index=False, right_index=False)

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age	Marital	Class	Income
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	5993
1	54	Divorced	a	1	54	10502	a	1	54	Divorced	a	10502
2	34	Single	a	2	34	6074	a	2	34	Single	a	6074
3	39	Married	b	3	39	12742	a	3	39	Married	b	NaN
4	28	Divorced	b	4	28	2596	b	4	28	Divorced	b	2596
5	24	Married	b	5	24	4162	b	5	24	Married	b	4162

3.1 on='Age', left_on='Age', right_on='Age'.

result = pd.merge(left, right, how='inner', on='Age', 
                  left_on='Age', right_on='Age',
                  left_index=False, right_index=False)

This raises an error: "on" cannot be combined with "left_on" and "right_on".

    'Can only pass argument "on" OR "left_on" '
pandas.errors.MergeError: Can only pass argument "on" OR "left_on" and "right_on", not a combination of both.
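The error can be reproduced and caught as below. This is a minimal sketch on two one-row frames; only one style of key specification should be kept in real code.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"Age": [37], "Marital": ["Divorced"]})
right = pd.DataFrame({"Age": [37], "Income": [5993]})

# Passing on together with left_on/right_on is rejected.
raised = False
try:
    pd.merge(left, right, on="Age", left_on="Age", right_on="Age")
except MergeError as exc:
    raised = True
    print(exc)

# Either form alone is accepted.
result = pd.merge(left, right, on="Age")
print(result)
```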

3.2 left_on='Age', right_on='Age': the key columns have the same name and the same dtype in both frames, so the result matches on='Age'.

result = pd.merge(left, right, how='inner', on=None, 
                  left_on='Age', right_on='Age',
                  left_index=False, right_index=False)

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age	Marital	Class_x	Income	Class_y
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	5993	a
1	54	Divorced	a	1	54	10502	a	1	54	Divorced	a	10502	a
2	34	Single	a	2	34	6074	a	2	34	Single	a	6074	a
3	39	Married	b	3	39	12742	a	3	39	Married	b	12742	a
4	28	Divorced	b	4	28	2596	b	4	28	Divorced	b	2596	b
5	24	Married	b	5	24	4162	b	5	24	Married	b	4162	b

3.3 left_on='Age', right_on='Income': the key columns have different names but the same dtype. Rows are paired only where a left Age value equals a right Income value. With the data shown no such pair exists, so an inner join returns an empty frame; only the column layout is visible, with both key columns kept and the overlapping Age and Class columns suffixed.

result = pd.merge(left, right, how='inner', on=None,
                  left_on='Age', right_on='Income',
                  left_index=False, right_index=False)

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age_x	Marital	Class_x	Age_y	Income	Class_y
0	37	Divorced	a	0	37	5993	a	(no rows)
1	54	Divorced	a	1	54	10502	a
2	34	Single	a	2	34	6074	a
3	39	Married	b	3	39	12742	a
4	28	Divorced	b	4	28	2596	b
5	24	Married	b	5	24	4162	b

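3.4 left_on='Age_1', right_on='Age_2': the key columns have different names but the same dtype and the same values, so every row finds a match and both key columns are kept.

suffixes=('_l', '_r') sets the suffixes appended to overlapping column names from the left and right frames. According to the pandas documentation this article follows, (False, False) raises an exception on overlapping columns instead.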

result = pd.merge(left, right, how='inner', on=None, 
                  left_on='Age_1', right_on='Age_2',
                  left_index=False, right_index=False,
                  suffixes=('_l', '_r'))

	left:				right:				result:
	Age_1	Marital	Class		Age_2	Income	Class		Age_1	Marital	Class_l	Age_2	Income	Class_r
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	37	5993	a
1	54	Divorced	a	1	54	10502	a	1	54	Divorced	a	54	10502	a
2	34	Single	a	2	34	6074	a	2	34	Single	a	34	6074	a
3	39	Married	b	3	39	12742	a	3	39	Married	b	39	12742	a
4	28	Divorced	b	4	28	2596	b	4	28	Divorced	b	28	2596	b
5	24	Married	b	5	24	4162	b	5	24	Married	b	24	4162	b
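The column naming in example 3.4 can be checked on a two-row version of the frames. A sketch; only the overlapping non-key column Class receives a suffix.

```python
import pandas as pd

left = pd.DataFrame({"Age_1": [37, 39], "Class": ["a", "b"]})
right = pd.DataFrame({"Age_2": [37, 39], "Class": ["a", "a"]})

# Age_1 and Age_2 do not collide, so only Class is suffixed.
result = pd.merge(left, right, left_on="Age_1", right_on="Age_2",
                  suffixes=("_l", "_r"))
print(list(result.columns))
```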

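4.1 left_index=False, right_index=False (the defaults): the join uses columns. With on=None the keys are the shared columns Age and Class, so no suffix is applied.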

result = pd.merge(left, right, how='inner', on=None, 
                  left_on=None, right_on=None,
                  left_index=False, right_index=False,
                  suffixes=('_l', '_r'))

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age	Marital	Class	Income
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	5993
1	54	Divorced	a	1	54	10502	a	1	54	Divorced	a	10502
2	34	Single	a	2	34	6074	a	2	34	Single	a	6074
3	39	Married	b	3	39	12742	a	3	28	Divorced	b	2596
4	28	Divorced	b	4	28	2596	b	4	24	Married	b	4162
5	24	Married	b	5	24	4162	b

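4.2 left_index=True, right_index=True: the indexes of the left and right DataFrames are used as the join keys, so every overlapping column, including Age, is suffixed.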

result = pd.merge(left, right, how='inner', on=None, 
                  left_on=None, right_on=None,
                  left_index=True, right_index=True,
                  suffixes=('_l', '_r'))

	left:				right:				result:
	Age	Marital	Class		Age	Income	Class		Age_l	Marital	Class_l	Age_r	Income	Class_r
0	37	Divorced	a	0	37	5993	a	0	37	Divorced	a	37	5993	a
1	54	Divorced	a	1	54	10502	a	1	54	Divorced	a	54	10502	a
2	34	Single	a	2	34	6074	a	2	34	Single	a	34	6074	a
3	39	Married	b	3	39	12742	a	3	39	Married	b	39	12742	a
4	28	Divorced	b	4	28	2596	b	4	28	Divorced	b	28	2596	b
5	24	Married	b	5	24	4162	b	5	24	Married	b	24	4162	b

4.3 left_index=True, right_index=False: the left index is named as the join key, but no key is given for the right frame.

result = pd.merge(left, right, how='inner', on=None, 
                  left_on=None, right_on=None,
                  left_index=True, right_index=False,
                  suffixes=('_l', '_r'))

This raises an error:

pandas.errors.MergeError: Must pass right_on or right_index=True

The right frame needs a key as well: either pass right_on, or set right_index=True together with left_index=True.
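The first option, joining the left index against a right column, might look like the sketch below. The frames are illustrative; the left index is assumed to hold Age values.

```python
import pandas as pd

# Hypothetical left frame indexed by Age; right keeps Age as a column.
left = pd.DataFrame({"Marital": ["Divorced", "Married"]}, index=[37, 39])
right = pd.DataFrame({"Age": [37, 39], "Income": [5993, 12742]})

# The left index is matched against the right Age column.
result = pd.merge(left, right, left_index=True, right_on="Age")
print(result)

income_by_age = dict(zip(result["Age"], result["Income"]))
```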

Fargo的火
