Python Text Deduplication

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/muumian123/article/details/81942266

First, a few ways to deduplicate a list:

1. Clear and explicit version (preserves the original order):

ids = [1,2,3,3,4,2,3,4,5,6,1]
news_ids = []
for item in ids:  # "item" avoids shadowing the built-in id()
    if item not in news_ids:
        news_ids.append(item)
print(news_ids)
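
The loop above checks membership in a list, which gets slow as the list grows. A minimal sketch of the same idea with a set for the membership test, assuming the items are hashable (the function name dedupe_keep_order is only illustrative, not from the original post):

```python
def dedupe_keep_order(items):
    """Return a new list without duplicates, keeping first occurrences in order."""
    seen = set()   # hashable items that were already emitted
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

ids = [1, 2, 3, 3, 4, 2, 3, 4, 5, 6, 1]
print(dedupe_keep_order(ids))  # [1, 2, 3, 4, 5, 6]
```

The result is the same as in the version above; only the lookup structure differs.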

2. Concise and fast version

Use the automatic deduplication of set:

li=[1,2,3,4,5,1,2,3]
li=list(set(li))
print(li)

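A set is unordered, so this approach does not keep the original order of the list. To keep the order, sort the result by each element's first position in the original list: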

li=[1,2,3,4,5,1,2,3]
new_li=list(set(li))
new_li.sort(key=li.index)
print(new_li)
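
Sorting with key=li.index scans the list once per element. As a possible alternative, here is a short sketch that relies on dict keys keeping insertion order, which is guaranteed from Python 3.7 on. It assumes the elements are hashable:

```python
li = [1, 2, 3, 4, 5, 1, 2, 3]
# dict.fromkeys keeps only the first occurrence of each key, in insertion order.
new_li = list(dict.fromkeys(li))
print(new_li)  # [1, 2, 3, 4, 5]
```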

3. Lambda (anonymous function) version

from functools import reduce  # in Python 3, reduce lives in functools

ids = [1,4,3,3,4,2,3,4,5,6,1]
func = lambda x,y:x if y in x else x + [y]
print(reduce(func, [[], ] + ids))

4. Standard-library module version (itertools; the result comes out sorted)

import itertools
ids = [1,4,3,3,4,2,3,4,5,6,1]
ids.sort()  # groupby only merges adjacent equal items, so sort first (this reorders ids in place)
it = itertools.groupby(ids)
for k, g in it:
    print(k)
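
To illustrate why the sort matters here, the sketch below runs groupby on the same data with and without sorting. It uses sorted() instead of ids.sort() so that the original list is left untouched; the variable names are only illustrative:

```python
import itertools

ids = [1, 4, 3, 3, 4, 2, 3, 4, 5, 6, 1]

# Without sorting, only adjacent repeats (the two 3s) are collapsed.
adjacent_only = [k for k, _ in itertools.groupby(ids)]
print(adjacent_only)  # [1, 4, 3, 4, 2, 3, 4, 5, 6, 1]

# sorted() returns a new list, so ids itself keeps its order.
unique_sorted = [k for k, _ in itertools.groupby(sorted(ids))]
print(unique_sorted)  # [1, 2, 3, 4, 5, 6]
```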

5. Fast deduplication of text files around a gigabyte in size

# coding=utf-8
def quchong(infile, outfile):
    # The whole file is read into memory, so there must be enough RAM to hold it.
    with open(infile, 'r', encoding='utf-8') as inopen:
        data = inopen.read()
    # A set drops duplicate lines; the original line order is not preserved.
    list_1 = list(set(data.split('\n')))
    with open(outfile, 'w', encoding='utf-8') as outopen:
        for line in list_1:
            if line != '':  # skip empty lines
                outopen.write(line + '\n')
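
The function above loads the entire file at once. When memory is tight, one possible variation is to read line by line and remember only a short fingerprint of each line already written. The sketch below is not the method from the original post: the function name dedupe_lines_streaming and the 16-byte BLAKE2b digest are assumptions made for illustration, and two different lines could in theory share a digest, although that is extremely unlikely.

```python
import hashlib

def dedupe_lines_streaming(infile, outfile):
    """Write each distinct non-empty line of infile to outfile once, in first-seen order."""
    seen = set()  # 16-byte digests of the lines already written
    with open(infile, 'r', encoding='utf-8') as src, \
         open(outfile, 'w', encoding='utf-8') as dst:
        for line in src:
            line = line.rstrip('\n')
            if line == '':
                continue  # skip empty lines, as in quchong
            digest = hashlib.blake2b(line.encode('utf-8'), digest_size=16).digest()
            if digest not in seen:
                seen.add(digest)
                dst.write(line + '\n')
```

Unlike the set-based version, this keeps the order in which lines first appear, and its memory use grows with the number of distinct lines rather than with the file size.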

If you know a better method, feedback and corrections are welcome!
