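Using Kettle, Part 18: Computing Percentiles with the Group by Step
Requirement: use Kettle steps to compute the 25th, 50th, 75th, and 90th percentiles of the sal column in the emp table.
Solution: combine the Table input, Sort rows, and Group by steps. The key is the Percentile aggregate type offered by the Group by step.
Preparation:
Create the emp (employee) table and load its data (MySQL). Note that the foreign key on deptno requires a dept table to exist already; if it does not, create it first or drop the foreign key line.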
create table emp(
    empno int unsigned auto_increment COMMENT 'employee number',
    ename varchar(15) COMMENT 'employee name',
    job varchar(10) COMMENT 'job title',
    mgr int unsigned COMMENT 'employee number of the manager',
    hiredate date COMMENT 'hire date',
    sal decimal(7,2) COMMENT 'base salary',
    comm decimal(7,2) COMMENT 'commission',
    deptno int unsigned COMMENT 'department number',
    primary key(empno),
    foreign key(deptno) references dept(deptno)
) COMMENT='employee table';
INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'1980-12-17',800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'1981-2-20',1600,300,30);
INSERT INTO emp VALUES (7521,'WARD','SALESMAN',7698,'1981-2-22',1250,500,30);
INSERT INTO emp VALUES (7566,'JONES','MANAGER',7839,'1981-4-2',2975,NULL,20);
INSERT INTO emp VALUES (7654,'MARTIN','SALESMAN',7698,'1981-9-28',1250,1400,30);
INSERT INTO emp VALUES (7698,'BLAKE','MANAGER',7839,'1981-5-1',2850,NULL,30);
INSERT INTO emp VALUES (7782,'CLARK','MANAGER',7839,'1981-6-9',2450,NULL,10);
INSERT INTO emp VALUES (7788,'SCOTT','ANALYST',7566,'1987-7-13',3000,NULL,20);
INSERT INTO emp VALUES (7839,'KING','PRESIDENT',NULL,'1981-11-17',5000,NULL,10);
INSERT INTO emp VALUES (7844,'TURNER','SALESMAN',7698,'1981-9-8',1500,0,30);
INSERT INTO emp VALUES (7876,'ADAMS','CLERK',7788,'1987-7-13',1100,NULL,20);
INSERT INTO emp VALUES (7900,'JAMES','CLERK',7698,'1981-12-3',950,NULL,30);
INSERT INTO emp VALUES (7902,'FORD','ANALYST',7566,'1981-12-3',3000,NULL,20);
INSERT INTO emp VALUES (7934,'MILLER','CLERK',7782,'1982-1-23',1300,NULL,10);
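Step 1: Create a new transformation.
Step 2: Drag in a Table input step (found under the Input category of the transformation's step list) and configure it as follows: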

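Step 3: Drag in a Sort rows step and connect the Table input step to it by holding SHIFT and dragging. Configure Sort rows as follows:
Step 4: Drag in a Group by step and connect the Sort rows step to it, again with SHIFT-drag. Configure the Group by step.
Step 5: Save the transformation and run it to verify the result.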
Complete flow:
Supplement (how Percentile is calculated, using the 25th percentile as an example):
Percentile (linear interpolation) versus Percentile (nearest-rank method)
Percentile (linear interpolation) and Excel PERCENTILE.EXC
Percentile (linear interpolation) and Excel PERCENTILE.EXC (PERCENTILE)
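
To check the Group by output by hand, the two percentile methods named above can be sketched in Python. This is only an illustration, not Kettle's own code. It assumes that the linear interpolation option uses the position rule p/100 × (n + 1), as Excel's PERCENTILE.EXC does, and that the nearest-rank option takes the value at rank ceil(p/100 × n). The function names are hypothetical, and the salary list is copied from the INSERT statements above.

```python
import math

# sal values copied from the INSERT statements for emp (14 rows)
SALARIES = [800, 1600, 1250, 2975, 1250, 2850, 2450,
            3000, 5000, 1500, 1100, 950, 3000, 1300]


def percentile_linear(values, p):
    """Linear interpolation at 1-based position p/100 * (n + 1).

    Assumption: this is the PERCENTILE.EXC-style rule; positions that fall
    outside 1..n are clamped to the smallest or largest value.
    """
    data = sorted(values)
    n = len(data)
    pos = p * (n + 1) / 100           # 1-based, possibly fractional position
    if pos < 1:
        return data[0]
    if pos >= n:
        return data[-1]
    lower = math.floor(pos)           # rank of the value just below pos
    frac = pos - lower                # how far pos lies towards the next rank
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


def percentile_nearest_rank(values, p):
    """Nearest-rank method: the value at 1-based rank ceil(p/100 * n)."""
    data = sorted(values)
    rank = max(1, math.ceil(p * len(data) / 100))
    return data[rank - 1]


if __name__ == "__main__":
    for p in (25, 50, 75, 90):
        print(p, percentile_linear(SALARIES, p), percentile_nearest_rank(SALARIES, p))
    # 25 1212.5 1250
    # 50 1550.0 1500
    # 75 2981.25 2975
    # 90 4000.0 3000
```

For the 25th percentile with linear interpolation, the position is 0.25 × 15 = 3.75, which lies between the 3rd sorted value (1100) and the 4th (1250), giving 1100 + 0.75 × 150 = 1212.5. The nearest-rank method instead takes rank ceil(0.25 × 14) = 4, which is 1250.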