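Errors when converting a string to JSON with json.loads()

Today, while scraping street-snap galleries from Toutiao (今日头条), I needed to convert one of the strings in the page to JSON. Converting it directly produced this error: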

json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

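On inspection, the string in the page turned out to contain backslashes. As the string below shows, every double quote and every forward slash is preceded by a \.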

{\"count\":9,\"sub_images\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/pgc-image\\/15308911861360624e6e374\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/pgc-image\\/15308911861360624e6e374\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/pgc-image\\/15308911861360624e6e374\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/15308911861360624e6e374\"}],\"uri\":\"origin\\/pgc-image\\/15308911861360624e6e374\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/1530891187640293d64a75b\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/1530891187640293d64a75b\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/1530891187640293d64a75b\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/1530891187640293d64a75b\"}],\"uri\":\"origin\\/pgc-image\\/1530891187640293d64a75b\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/15308911869350d7e224617\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/15308911869350d7e224617\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/15308911869350d7e224617\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/15308911869350d7e224617\"}],\"uri\":\"origin\\/pgc-image\\/15308911869350d7e224617\",\"height\":6000},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/pgc-image\\/1530891187266752e4a248a\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/pgc-image\\/1530891187266752e4a248a\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/pgc-image\\/1530891187266752e4a248a\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/1530891187266752e4a248a\"}],\"uri\":\"origin\\/pgc-image\\/1530891187266752e4a248a\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/1530891187573e72c879774\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-imag
e\\/1530891187573e72c879774\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/1530891187573e72c879774\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/1530891187573e72c879774\"}],\"uri\":\"origin\\/pgc-image\\/1530891187573e72c879774\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/153089118689443f3c70490\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/153089118689443f3c70490\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/153089118689443f3c70490\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/153089118689443f3c70490\"}],\"uri\":\"origin\\/pgc-image\\/153089118689443f3c70490\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/1530891186908d2f0efbf63\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/1530891186908d2f0efbf63\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/1530891186908d2f0efbf63\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/1530891186908d2f0efbf63\"}],\"uri\":\"origin\\/pgc-image\\/1530891186908d2f0efbf63\",\"height\":6000},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/pgc-image\\/15308911853816554ab3238\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/pgc-image\\/15308911853816554ab3238\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/15308911853816554ab3238\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/15308911853816554ab3238\"}],\"uri\":\"origin\\/pgc-image\\/15308911853816554ab3238\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/15308912219659b671a7fad\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/15308912219659b671a7fad\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/15308912219659b671a7fad\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/15308912219659b671a7
fad\"}],\"uri\":\"origin\\/pgc-image\\/15308912219659b671a7fad\",\"height\":6000}],\"max_img_width\":4000,\"labels\":[\"\\u4e09\\u91cc\\u5c6f\",\"\\u6444\\u5f71\"],\"sub_abstracts\":[\" \",\" \",\" \",\" \",\" \",\" \",\" \",\" \",\" \"],\"sub_titles\":[\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\"]}

The fix is to use replace() to replace every \ with '' (the empty string).
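
A minimal sketch of this step, using a shortened, made-up sample in the same shape as the string above (the variable names are mine, not from the original script). It also shows one thing to watch for: removing every backslash turns the \\uXXXX escapes into plain text such as u6444. An alternative, offered here only as a suggestion, is to treat the text as the body of a JSON string literal and decode it twice, which keeps those escapes intact.

```python
import json

# Shortened, hypothetical sample shaped like the page data:
# every " is preceded by \, every / and \u escape by \\.
raw = r'{\"count\":9,\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\",\"labels\":[\"\\u6444\\u5f71\"]}'

# Approach described in the text: remove every backslash, then parse.
stripped = json.loads(raw.replace('\\', ''))
print(stripped["url"])     # the URL parses cleanly
print(stripped["labels"])  # the \u escapes have lost their backslashes

# Alternative (an assumption, not the author's method): wrap the text in
# double quotes so it becomes a JSON string literal, decode that to get
# the inner JSON text, then decode the inner text.
inner_text = json.loads('"' + raw + '"')
decoded = json.loads(inner_text)
print(decoded["labels"])   # the \u escapes are decoded to real characters
```

If the data contains no \u escapes, or the escaped fields are not needed, the simple replace() is enough.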

I thought it would now run smoothly, but... another error appeared:

json.decoder.JSONDecodeError: Extra data: 

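This error usually points to a problem with the JSON format: json.loads() found more text after the first complete JSON value. One possible cause is that the text contains two or more records at the top level. In JSON, multiple records have to be placed inside a list, as in the JSON below, where the two name records sit in a list under the key foo: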

{
    "foo" : [
       {"name": "XYZ", "address": "54.7168,94.0215", "country_of_residence": "PQR", "countries": "LMN;PQRST", "date": "28-AUG-2008", "type": null},
       {"name": "OLMS", "address": null, "country_of_residence": null, "countries": "Not identified;No", "date": "23-FEB-2017", "type": null}
    ]
}
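
To illustrate, here is a small sketch with cut-down records (the field values are borrowed from the example above; the variable names are assumptions). Two objects placed side by side trigger "Extra data", while the same records wrapped in a list under foo parse normally.

```python
import json

# Two records side by side at the top level: not a single JSON value.
two_records = '{"name": "XYZ"} {"name": "OLMS"}'
try:
    json.loads(two_records)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg  # short message without the position details

# The same records inside a list under the key "foo".
as_list = '{"foo": [{"name": "XYZ"}, {"name": "OLMS"}]}'
records = json.loads(as_list)["foo"]

print(error_message)
print([record["name"] for record in records])
```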

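There are also many websites that check JSON format online, which can help track down the problem.

However, when I pasted my JSON into one of them, it reported no format problem. The real cause was that, when matching the string, I had also captured the double quotes around it, and these clashed with the double quotes inside the string, which led to the error above. As shown below, there is an extra pair of double quotes outside the outermost braces.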

"{
    "foo" : [
       {"name": "XYZ", "address": "54.7168,94.0215", "country_of_residence": "PQR", "countries": "LMN;PQRST", "date": "28-AUG-2008", "type": null},
       {"name": "OLMS", "address": null, "country_of_residence": null, "countries": "Not identified;No", "date": "23-FEB-2017", "type": null}
    ]
}"
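
The following sketch reproduces this under stated assumptions: the page fragment, the JSON.parse(...) wrapper, and both regular expressions are hypothetical stand-ins, since the original matching code is not shown. When the capture group includes the surrounding quotes, the parser reads the short string "{" as a complete value and reports everything after it as extra data. Moving the quotes outside the group avoids that.

```python
import json
import re

# Hypothetical page fragment; the real page layout may differ.
html = r'gallery: JSON.parse("{\"count\":9,\"labels\":[\"\\u6444\\u5f71\"]}"),'

# Capture group includes the surrounding double quotes.
with_quotes = re.search(r'JSON\.parse\((.*?)\),', html).group(1)
try:
    json.loads(with_quotes.replace('\\', ''))
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg

# Capture group excludes the surrounding double quotes.
without_quotes = re.search(r'JSON\.parse\("(.*?)"\),', html).group(1)
data = json.loads(without_quotes.replace('\\', ''))

print(error_message)
print(data["count"])
```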

This is the website I used to check the JSON format: https://www.bejson.com/

Reposted from blog.csdn.net/zx1245773445/article/details/84111121