Problem when reading XML with lxml's etree: ValueError: Unicode strings with encoding declaration are not supported.

ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
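The message already hints at the two accepted inputs: bytes, or a string with no declaration. The sketch below reproduces all three cases with lxml alone. The short sample document and the helper name `try_parse` are illustrative assumptions, not part of the original script.

```python
from lxml import etree

# A tiny document that, like the Lara file, starts with an encoding declaration.
XML_WITH_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n<dataset name="demo"><frame number="1"/></dataset>'

def try_parse(source):
    """Hypothetical helper: return the root tag on success, or the ValueError text."""
    try:
        return etree.fromstring(source).tag
    except ValueError as exc:
        return f"ValueError: {exc}"

# 1. str (Unicode) input that still carries the declaration: rejected with the error above
print(try_parse(XML_WITH_DECL))
# 2. The same document encoded to bytes: lxml reads the declaration itself
print(try_parse(XML_WITH_DECL.encode("utf-8")))  # -> dataset
# 3. str input with the declaration stripped off: also accepted
print(try_parse(XML_WITH_DECL.split("?>", 1)[1].lstrip()))  # -> dataset
```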
Contents of the original XML file:

<?xml version="1.0" encoding="UTF-8"?>
<dataset name="Lara_UrbanSeq1" version="0.5" comments="Public database: http://www.lara.prd.fr/benchmarks/trafficlightsrecognition">
    <frame number="6695" sec="487" ms="829">
        <objectlist>
            <object id="18">
                <orientation>90</orientation>
                <box h="39" w="18" xc="294" yc="34"/>
                <appearance>appear</appearance>
                <hypothesislist>
                    <hypothesis evaluation="1.0" id="1" prev="1.0">
                        <type evaluation="1.0">Traffic Light</type>
                        <subtype evaluation="1.0">go</subtype>
                    </hypothesis>
                </hypothesislist>
            </object>
            <object id="19">
                <orientation>90</orientation>
                <box h="15" w="6" xc="518" yc="123"/>
                <appearance>appear</appearance>
                <hypothesislist>
                    <hypothesis evaluation="1.0" id="1" prev="1.0">
                        <type evaluation="1.0">Traffic Light</type>
                        <subtype evaluation="1.0">go</subtype>
                    </hypothesis>
                </hypothesislist>
            </object>
            <object id="20">
                <orientation>90</orientation>
                <box h="15" w="6" xc="382" yc="122"/>
                <appearance>appear</appearance>
                <hypothesislist>
                    <hypothesis evaluation="1.0" id="1" prev="1.0">
                        <type evaluation="1.0">Traffic Light</type>
                        <subtype evaluation="1.0">go</subtype>
                    </hypothesis>
                </hypothesislist>
            </object>
        </objectlist>
        <grouplist>
        </grouplist>
    </frame>
</dataset>

Original reading code:

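import tensorflow as tf
from lxml import etree

from object_detection.utils import dataset_util
# (numpy, PIL.Image, tf_record_creation_util and label_map_util were also imported
#  in the original create_tf_record script but are not used in this snippet)

# xml_path = "./Annotations/Abyssinian_12_test.xml"
xml_path = "./Annotations/Lara_test.xml"

# tf.gfile.GFile is the TF 1.x name; TF 2.x uses tf.io.gfile.GFile
with tf.io.gfile.GFile(xml_path, 'r') as fid:
    xml_str = fid.read()  # text mode: xml_str is a str that still contains the declaration
# xml = etree.fromstring(xml_str)                  # raises the ValueError above
# xml = etree.fromstring(xml_str).encode('utf-8')  # still fails: fromstring() runs before .encode()
xml = etree.fromstring(xml_str.encode('utf-8'))    # modified line: passing bytes makes the error go away
# The top-level key is the root tag: 'annotation' for Pascal VOC files such as the pet
# dataset, but 'dataset' for this Lara file, so use xml.tag instead of hard-coding it.
data = dataset_util.recursive_parse_xml_to_dict(xml)[xml.tag]
print(data)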
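An alternative to encoding the string afterwards is to read the file as bytes in the first place. The sketch below assumes only lxml and the standard library: the built-in open() stands in for TensorFlow's GFile (opening GFile with 'rb' should behave the same way), the sample is a trimmed copy of the Lara XML above, and `read_xml_root` is a hypothetical helper name.

```python
import os
import tempfile

from lxml import etree

# Trimmed copy of the Lara annotation shown above, used only for this demo.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<dataset name="Lara_UrbanSeq1" version="0.5">
    <frame number="6695" sec="487" ms="829">
        <objectlist>
            <object id="18"><box h="39" w="18" xc="294" yc="34"/></object>
        </objectlist>
    </frame>
</dataset>
"""

def read_xml_root(path):
    """Read the file as raw bytes so lxml can honour the encoding declaration."""
    with open(path, "rb") as fid:  # 'rb' returns bytes, so no manual .encode() is needed
        return etree.fromstring(fid.read())

# Write the sample to a temporary file, parse it, then clean up.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(SAMPLE.encode("utf-8"))
try:
    root = read_xml_root(tmp.name)
    print(root.tag, root.find("frame").get("number"))  # -> dataset 6695
finally:
    os.remove(tmp.name)
```

Reading bytes also avoids assuming the file is UTF-8: lxml decodes it with whatever encoding the declaration names.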

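The error says that Unicode strings with an encoding declaration are not supported, so I assumed the annotated XML files were broken and deleted the corresponding images and annotation files, but the error persisted. Many thanks to the author of this blog post: https://blog.csdn.net/Fkk921912333/article/details/78537726 . Its "Parsing XML strings" section introduces the relevant XML parsing methods, and the key line is: print(etree.tostring(root, pretty_print=True).decode('utf-8')).

Comparing that with my own create_tf_record file, I changed the format of the data passed to the parser by adding 'utf-8', i.e. encoding the XML string that had been read back into UTF-8 bytes, and the data then converted without problems. The reason is that opening the file in text mode ('r') yields a Python str, and lxml refuses to parse a str that still carries a declaration such as encoding="UTF-8", because the text has already been decoded; with bytes input, lxml applies the declared encoding itself. Since I was using the conversion script provided by TensorFlow, the concrete change was to replace xml = etree.fromstring(xml_str) with xml = etree.fromstring(xml_str.encode('utf-8')). You can change the corresponding line in your own script the same way.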
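The pretty-print line quoted from the referenced post mirrors the fix on the output side: etree.tostring() returns bytes, so .decode('utf-8') is needed to get printable text back. A minimal sketch, assuming a shortened version of the Lara XML as input:

```python
from lxml import etree

# Shortened stand-in for the annotation file; it keeps the encoding declaration.
xml_str = '<?xml version="1.0" encoding="UTF-8"?>\n<dataset name="Lara_UrbanSeq1"><frame number="6695"><grouplist/></frame></dataset>'

# The fix from the post: hand lxml bytes instead of a str with a declaration.
root = etree.fromstring(xml_str.encode("utf-8"))

# tostring() produces bytes; decode them to text before printing.
pretty = etree.tostring(root, pretty_print=True).decode("utf-8")
print(pretty)
```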
————————————————
The text above is reposted from:
Original link: https://blog.csdn.net/mingyang_wang/article/details/82912636

Reposted from blog.csdn.net/suiyuan2009/article/details/104097491