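Requirements
A batch of records in the database has to be processed by business staff. Because the data contains a large number of duplicates, it needs to be deduplicated so that only one copy of each record is kept, which reduces the staff's workload.
First, I searched online for some approaches, as follows:
Use SQL to delete the redundant duplicate rows and keep only one of each.
1. Find the redundant duplicate records in the table, where duplicates are identified by a single field (teamId)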
select * from team where teamId in (select teamId from team group by teamId having count(teamId) > 1)
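2. Delete the redundant duplicate records in the table, where duplicates are identified by a single field (teamName), keeping only the record with the smallest teamId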
delete from team where
teamName in(select teamName from team group by teamName having count(teamName) > 1)
and teamId not in (select min(teamId) from team group by teamName having count(teamName)>1)
1. Find the redundant duplicate records in the table (multiple fields)
select * from team t
where (t.teamId,t.teamOrg) in (select teamId,teamOrg from team group by teamId,teamOrg having count(*) > 1)
2. Delete the redundant duplicate records in the table (multiple fields), keeping only the record with the smallest rowid
delete from team t
where (t.teamId,t.teamOrg) in (select teamId,teamOrg from team group by teamId,teamOrg having count(*) > 1)
and rowid not in (select min(rowid) from team group by teamId,teamOrg having count(*)>1)
3. Find the redundant duplicate records in the table (multiple fields), excluding the record with the smallest rowid
select * from team t
where (t.teamId,t.teamOrg) in (select teamId,teamOrg from team group by teamId,teamOrg having count(*) > 1)
and rowid not in (select min(rowid) from team group by teamId,teamOrg having count(*)>1)
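To make the semantics of the delete statements above concrete, here is a minimal in-memory sketch in Java. The class KeepMinRowSketch, the Row type, and the method rowIdsToDelete are hypothetical names introduced only for illustration, and the sketch assumes each row carries a unique numeric identifier standing in for rowid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the team table (illustration only).
final class KeepMinRowSketch {

    // One row: a unique identifier (standing in for rowid) plus the two
    // fields that decide whether rows count as duplicates.
    static final class Row {
        final long rowId;
        final String teamId;
        final String teamOrg;

        Row(long rowId, String teamId, String teamOrg) {
            this.rowId = rowId;
            this.teamId = teamId;
            this.teamOrg = teamOrg;
        }
    }

    // Returns, in input order, the rowIds the DELETE would remove: every row
    // except the one with the smallest rowId in its (teamId, teamOrg) group.
    static List<Long> rowIdsToDelete(List<Row> rows) {
        Map<List<String>, Long> minRowId = new HashMap<>();
        for (Row r : rows) {
            minRowId.merge(Arrays.asList(r.teamId, r.teamOrg), r.rowId, Long::min);
        }
        List<Long> toDelete = new ArrayList<>();
        for (Row r : rows) {
            long keep = minRowId.get(Arrays.asList(r.teamId, r.teamOrg));
            if (r.rowId != keep) {
                toDelete.add(r.rowId);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, "A", "X"), new Row(2, "A", "X"),
                new Row(3, "B", "Y"), new Row(4, "A", "X"));
        // Rows 2 and 4 are redundant copies of row 1; row 3 is unique.
        System.out.println(rowIdsToDelete(rows));
    }
}
```

Rows whose key occurs only once are their own minimum, so they are never selected, which matches the having count(*) > 1 filter in the SQL.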
1. Remove the first character on the left of a field:
update tableName set [Title]=Right([Title],(len([Title])-1)) where Title like '村%'
2. Remove the last character on the right of a field:
update tableName set [Title]=left([Title],(len([Title])-1)) where Title like '%村'
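The same trimming can be done on the Java side before the value is written back. The following is only a sketch: TrimCharSketch, dropLeading, and dropTrailing are hypothetical names, and the marker character is passed in rather than hard-coded.

```java
// Hypothetical helper mirroring the two UPDATE statements (illustration only).
final class TrimCharSketch {

    // Drops the first character when the title starts with the marker,
    // like Right(Title, len(Title) - 1) guarded by LIKE 'marker%'.
    static String dropLeading(String title, char marker) {
        if (!title.isEmpty() && title.charAt(0) == marker) {
            return title.substring(1);
        }
        return title;
    }

    // Drops the last character when the title ends with the marker,
    // like left(Title, len(Title) - 1) guarded by LIKE '%marker'.
    static String dropTrailing(String title, char marker) {
        if (!title.isEmpty() && title.charAt(title.length() - 1) == marker) {
            return title.substring(0, title.length() - 1);
        }
        return title;
    }

    public static void main(String[] args) {
        char marker = '\u6751'; // the character used in the LIKE patterns
        System.out.println(dropLeading(marker + "Title", marker));
        System.out.println(dropTrailing("Title" + marker, marker));
    }
}
```

Titles that do not match the pattern are returned unchanged, just as the where clause leaves non-matching rows alone.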
1. Soft-delete the redundant duplicate records in the table (multiple fields), excluding the record with the smallest rowid
update team set ispass=-1
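where teamId in (select teamId from team group by teamId having count(teamId) > 1)
and rowid not in (select min(rowid) from team group by teamId having count(teamId) > 1)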
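1. Using rowid in Oracle to delete redundant data:
(1). In Oracle, every record has a rowid that is unique across the whole database; it identifies the data file, block, and row in which the record is stored.
(2). Duplicate records may be identical in every column, but their rowids always differ, so it is enough to identify the record with the largest rowid in each group of duplicates and delete all the others.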
delete from team t where rowid <>
(select max(rowid) from team where teamName=t.teamName)
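However, while trying the statements above I ran into a big problem. They all rely on in, and this one in particular: delete from team t where (t.teamId,t.teamOrg) in (select teamId,teamOrg from team group by teamId,teamOrg having count(*) > 1) and rowid not in (select min(rowid) from team group by teamId,teamOrg having count(*)>1)
is barely practical to run, especially when the table is large and 'teamId' and 'teamOrg' are neither primary keys nor indexed.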
The second deduplication approach uses Java code. The idea: collect the ids of the duplicate rows, then run delete from team where id in (the ids of the duplicate rows)
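As a rough sketch of that final step, the collected ids can be turned into delete statements. Splitting the ids into batches is an assumption added here to keep each in list short; DeleteInBatchesSketch and buildDeletes are hypothetical names, and only the table name team and the column id come from the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper that renders the final DELETE statements (illustration only).
final class DeleteInBatchesSketch {

    // Splits the ids into chunks of at most batchSize and renders one
    // DELETE ... IN (...) statement per chunk.
    static List<String> buildDeletes(List<Integer> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> statements = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            StringJoiner in = new StringJoiner(",", "DELETE FROM team WHERE id IN (", ")");
            for (Integer id : ids.subList(from, to)) {
                in.add(String.valueOf(id));
            }
            statements.add(in.toString());
        }
        return statements;
    }

    public static void main(String[] args) {
        // Three duplicate ids, at most two per statement.
        for (String sql : buildDeletes(Arrays.asList(3, 5, 8), 2)) {
            System.out.println(sql);
        }
    }
}
```

Because the ids are integers read from the table itself, concatenating them is safe in this sketch; values that come from user input should go through bound parameters instead.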
The test code is as follows:
Mapper layer
/**
 * Description: find the ids of the rows that share a task number and link,
 * in ascending id order
 *
 * @param taskNo task number
 * @param link link
 * @return list
 * @date 2018/8/7 16:02
 */
@Select("SELECT id FROM team " +
        "WHERE task_no=#{taskNo} AND link = #{link} ORDER BY id")
List<Integer> selectDuplicateIds(@Param("taskNo") String taskNo, @Param("link") String link);

/**
 * Description: query the duplicated data
 * The limit is added with database performance and user experience in mind
 * @return list
 * @date 2018/8/7 16:42
 */
@Select("SELECT task_no,`link` FROM `team` " +
        "GROUP BY `link`,task_no HAVING count(*) > 1 "
        + "limit 1000"
)
List<Team> selectDuplicate();
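What the GROUP BY ... HAVING count(*) > 1 query with a limit returns can be illustrated with a small in-memory sketch. FindDuplicateKeysSketch and duplicateKeys are hypothetical names, and each row is reduced to a two-element array of task number and link. The sketch returns keys in first-seen order, whereas the SQL query without an ORDER BY guarantees no particular order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory version of the selectDuplicate query (illustration only).
final class FindDuplicateKeysSketch {

    // Returns at most limit distinct (taskNo, link) pairs that occur more
    // than once. Each input row is a two-element array {taskNo, link}.
    static List<List<String>> duplicateKeys(List<String[]> rows, int limit) {
        // Count the occurrences of each key, remembering first-seen order.
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            counts.merge(Arrays.asList(row[0], row[1]), 1, Integer::sum);
        }
        List<List<String>> duplicates = new ArrayList<>();
        for (Map.Entry<List<String>, Integer> e : counts.entrySet()) {
            if (duplicates.size() >= limit) {
                break; // mirrors LIMIT
            }
            if (e.getValue() > 1) { // mirrors HAVING count(*) > 1
                duplicates.add(e.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"t1", "a"}, new String[] {"t2", "b"},
                new String[] {"t1", "a"}, new String[] {"t3", "c"});
        // Only the pair (t1, a) occurs more than once.
        System.out.println(duplicateKeys(rows, 1000));
    }
}
```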
Service layer (the interface is omitted here; the implementation follows)
/**
 * Description: deduplicate the database
 * @return the number of duplicate rows deleted
 * @date 2018/8/7 16:47
 */
@Override
public Integer removeDuplicate() {
    int result = 0;
    List<Team> list = teamMapper.selectDuplicate();
    for (Team info : list) {
        List<Integer> ids = teamMapper.selectDuplicateIds(info.getTaskNo(), info.getLink());
        // Delete every id except the last one, so one row per group survives.
        for (int i = 0; i < ids.size() - 1; i++) {
            teamMapper.deleteByPrimaryKey(ids.get(i));
            result++;
        }
    }
    return result;
}
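Controller layer omitted
Front end: a deduplicate button. When duplicates are noticed in the data, click it; it returns the number of rows removed. Keep clicking until it returns 0, which means no duplicates remain.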
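Why several clicks may be needed follows from the limit in selectDuplicate: each call handles only a bounded number of duplicate groups. The sketch below simulates this in memory. RepeatUntilCleanSketch, its Row type, removeDuplicateOnce, and removeUntilClean are hypothetical names, and the loop stands in for the user clicking the button repeatedly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory simulation of the button workflow (illustration only).
final class RepeatUntilCleanSketch {

    static final class Row {
        final int id;
        final String taskNo;
        final String link;

        Row(int id, String taskNo, String link) {
            this.id = id;
            this.taskNo = taskNo;
            this.link = link;
        }
    }

    // One click: handles at most maxGroups duplicate groups (mirroring the
    // limit in selectDuplicate), keeps the last row of each handled group,
    // and returns how many rows were removed from the mutable table.
    static int removeDuplicateOnce(List<Row> table, int maxGroups) {
        Map<List<String>, List<Row>> groups = new LinkedHashMap<>();
        for (Row r : table) {
            groups.computeIfAbsent(Arrays.asList(r.taskNo, r.link), k -> new ArrayList<>()).add(r);
        }
        int deleted = 0;
        int handled = 0;
        for (List<Row> group : groups.values()) {
            if (group.size() < 2) {
                continue; // not a duplicate group
            }
            if (handled == maxGroups) {
                break; // limit reached, the rest waits for the next click
            }
            handled++;
            for (int i = 0; i < group.size() - 1; i++) {
                table.remove(group.get(i));
                deleted++;
            }
        }
        return deleted;
    }

    // Repeats the single pass until it reports 0, returning the total removed.
    static int removeUntilClean(List<Row> table, int maxGroups) {
        int total = 0;
        int removed;
        while ((removed = removeDuplicateOnce(table, maxGroups)) > 0) {
            total += removed;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>(Arrays.asList(
                new Row(1, "t1", "a"), new Row(2, "t1", "a"),
                new Row(3, "t2", "b"), new Row(4, "t2", "b")));
        // With one group per pass, two passes are needed before a pass returns 0.
        System.out.println(removeUntilClean(table, 1));
    }
}
```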
The above implements the API. Below is a test that collects the duplicate ids.
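import java.util.ArrayList;
import java.util.List;
import javax.annotation.Resource;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
// Team and TeamMapper are imported from the project's own packages.

@RunWith(SpringRunner.class)
@SpringBootTest
public class AppTests {
    @Resource
    private TeamMapper teamMapper;

    @Test
    public void contextLoads() {
        System.out.println("hello world");
    }

    /**
     * Collect the ids of the duplicate rows, or delete the duplicate rows
     */
    @Test
    public void remDuplicate() {
        List<Team> list = teamMapper.selectDuplicate();
        List<Integer> all = new ArrayList<>();
        for (Team info : list) {
            List<Integer> ids = teamMapper.selectDuplicateIds(info.getTaskNo(), info.getLink());
            for (int i = 0; i < ids.size() - 1; i++) {
                // Uncomment to delete instead of only collecting the ids.
                // teamMapper.deleteByPrimaryKey(ids.get(i));
                all.add(ids.get(i));
            }
        }
        System.out.println(all.toString());
    }
}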