Batch-converting the encoding of txt files to UTF-8 with Python


Category: Other · 2020-04-07 15:03:47

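import os

from chardet.universaldetector import UniversalDetector


def get_encode_info(file):
    """Return the encoding chardet detects for the file (may be None)."""
    detector = UniversalDetector()
    with open(file, 'rb') as f:
        # Feed the detector line by line and stop once it is confident
        for line in f:
            detector.feed(line)
            if detector.done:
                break
    detector.close()
    return detector.result['encoding']


def read_file(file):
    with open(file, 'rb') as f:
        return f.read()


def write_file(content, file):
    with open(file, 'wb') as f:
        f.write(content)


def convert_encode2utf8(file, original_encode, des_encode):
    file_content = read_file(file)
    # 'ignore' silently drops any bytes that cannot be decoded
    file_decode = file_content.decode(original_encode, 'ignore')
    file_encode = file_decode.encode(des_encode)
    # The original file is overwritten in place
    write_file(file_encode, file)


if __name__ == "__main__":
    # Only regular .txt files in the current directory
    files = [f for f in os.listdir(".")
             if os.path.isfile(f) and f.lower().endswith(".txt")]
    for filename in files:
        encode_info = get_encode_info(filename)
        # Skip files chardet cannot identify and files already in UTF-8 or ASCII
        if encode_info and encode_info.lower() not in ('utf-8', 'ascii'):
            convert_encode2utf8(filename, encode_info, 'utf-8')
            # Detect again to confirm the result of the conversion
            encode_info = get_encode_info(filename)
        print(filename, encode_info)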
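
The script above decodes with 'ignore' and overwrites each file in place, so a wrong guess from chardet can lose characters without any warning. A more conservative variant might look like the sketch below. It is not part of the original post: the name convert_to_utf8_safely is hypothetical, it assumes the source encoding is already known (for example from get_encode_info), and it uses only the standard library.

```python
import os
import tempfile


def convert_to_utf8_safely(path, source_encoding):
    """Re-encode `path` as UTF-8; leave it untouched if decoding fails.

    Hypothetical helper, not from the original post.
    Returns True when the file was rewritten, False otherwise.
    """
    with open(path, 'rb') as f:
        raw = f.read()
    try:
        # Strict decoding: raise instead of dropping undecodable bytes
        text = raw.decode(source_encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown encoding name: keep the original file as it is
        return False

    # Write to a temporary file in the same directory, then swap it in,
    # so a failure midway never leaves a half-written file behind
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as tmp:
            tmp.write(text.encode('utf-8'))
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return True
```

In the main loop this could stand in for convert_encode2utf8(filename, encode_info, 'utf-8'), with a False return value used to report files that need a manual check.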

Reposted from blog.csdn.net/weixin_42342968/article/details/104553130
