Converting Between Strings, Lists, Tuples, and Dictionaries

Background

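This is a real case I ran into while writing a web scraper for a page that loads its content dynamically. The data we got back was not in a convenient shape. After some further parsing, it looked like this: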

([{"id":"680501","title":"揪住疫苗不放，公益诉讼请不要辜负民众期待","linkurl":"http://m.mp.oeeee.com/a/BAAFRD00002018072592681.html","time":"1532566055"},{"id":"680500","title":"支持性侵受害者发声，告诉她“你没错”","linkurl":"http://m.mp.oeeee.com/a/BAAFRD00002018072592682.html","time":"1532566055"}])

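Formatted for readability, the structure appears to be: a tuple on the outside, a list as the second layer, and several dictionaries innermost.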

(
	[
		{"id":"680501","title":"揪住疫苗不放，公益诉讼请不要辜负民众期待","linkurl":"http://m.mp.oeeee.com/a/BAAFRD00002018072592681.html","time":"1532566055"},
		{"id":"680500","title":"支持性侵受害者发声，告诉她“你没错”","linkurl":"http://m.mp.oeeee.com/a/BAAFRD00002018072592682.html","time":"1532566055"}
	]
)
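Whether the outer layer really is a tuple is worth checking. In Python, parentheses around a single expression only group it; it is the trailing comma that creates a tuple. The sketch below uses small made-up values instead of the scraped data:

```python
# Parentheses alone only group an expression; they do not create a tuple.
grouped = ([1, 2])
print(type(grouped))    # <class 'list'>

# A trailing comma is what makes a one-element tuple.
one_tuple = ([1, 2],)
print(type(one_tuple))  # <class 'tuple'>
print(len(one_tuple))   # 1
```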

Goal: extract the information inside the dictionaries, namely id, title, and linkurl.

Analysis

1. When I first saw this string, my immediate thought was: can't I just convert it straight into a tuple?

str_data = "the string we need to process"  # placeholder for the scraped text

tuple_data = tuple(str_data)
print(tuple_data)

I assumed that would be enough: the string would become a tuple and I could access its elements directly. In reality, the output was:

('(', '[', '{', '"', 'i', 'd', '"', ':', '"', '6', '8', '0', '5', '0', '1', '"', ',', '"', 
't', 'i', 't', 'l', 'e', '"', ':', '"', '\xe6', '\x8f', '\xaa', '\xe4', '\xbd', '\x8f', 
'\xe7', '\x96', '\xab', '\xe8', '\x8b', '\x97', '\xe4', '\xb8', '\x8d', '\xe6', '\x94', 
'\xbe', '\xef', '\xbc', '\x8c', '\xe5', '\x85', '\xac', '\xe7', '\x9b', '\x8a', '\xe8', 
'\xaf', '\x89', '\xe8', '\xae', '\xbc', '\xe8', '\xaf', '\xb7', '\xe4', '\xb8', '\x8d', 
'\xe8', '\xa6', '\x81', '\xe8', '\xbe', '\x9c', '\xe8', '\xb4', '\x9f', '\xe6', '\xb0', 
'\x91', '\xe4', '\xbc', '\x97', '\xe6', '\x9c', '\x9f', '\xe5', '\xbe', '\x85', '"', ',', 
'"', 'l', 'i', 'n', 'k', 'u', 'r', 'l', '"', ':', '"', 'h', 't', 't', 'p', ':', '/', '/', 
'm', '.', 'm', 'p', '.', 'o', 'e', 'e', 'e', 'e', '.', 'c', 'o', 'm', '/', 'a', '/', 'B', 
'A', 'A', 'F', 'R', 'D', '0', '0', '0', '0', '2', '0', '1', '8', '0', '7', '2', '5', '9', 
'2', '6', '8', '1', '.', 'h', 't', 'm', 'l', '"', ',', '"', 't', 'i', 'm', 'e', '"', ':', 
'"', '1', '5', '3', '2', '5', '6', '6', '0', '5', '5', '"', '}', ']', ')')

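2. Since the tuple didn't work, I tried a list instead: strip the parentheses from both ends of the string, then process what remains.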

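The result was the same: a list made up of every single character. This is because list() and tuple() simply iterate over the string, and iterating over a string yields one character at a time. In Python 2, where str is a byte string, it actually yields one byte at a time, which is why each Chinese character shows up as three escapes such as '\xe6', '\x8f', '\xaa'.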
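The sketch below, written for Python 3, shows the character-by-character behavior and one way to finish the strip-the-parentheses approach. Handing the inner text to json.loads is a swapped-in technique, not what this post uses; it only works on the assumption that the text inside the parentheses is valid JSON (double-quoted keys and strings), which holds for this sample but may not for other pages. The sample is the first record from the scraped text above.

```python
import json

# The first record from the scraped text, used as sample input.
str_data = ('([{"id":"680501","title":"揪住疫苗不放，公益诉讼请不要辜负民众期待",'
            '"linkurl":"http://m.mp.oeeee.com/a/BAAFRD00002018072592681.html",'
            '"time":"1532566055"}])')

# list() and tuple() just iterate over the string, one character at a time.
chars = list(str_data)
print(chars[:3])      # ['(', '[', '{']

# In Python 3, str is Unicode, so each Chinese character is a single element.
print(list("揪住"))    # ['揪', '住']

# Strip one pair of outer parentheses (assumed present),
# then let a JSON parser rebuild the nested list of dicts.
inner = str_data.strip()
if inner.startswith("(") and inner.endswith(")"):
    inner = inner[1:-1]
data = json.loads(inner)
print(type(data))     # <class 'list'>
print(data[0]["id"])  # 680501
```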

Converting with eval()

1. Converting the string above into a list.

list_data = list(eval(str_data))  # eval() parses the text as a Python expression
print(type(list_data))
print(list_data)

The output is:

<type 'list'>

[{'time': '1532566055', 'linkurl': 'http://m.mp.oeeee.com/a/BAAFRD00002018072592681.html', 
'id': '680501', 'title': 
'\xe6\x8f\xaa\xe4\xbd\x8f\xe7\x96\xab\xe8\x8b\x97\xe4\xb8\x8d\xe6\x94\xbe\xef\xbc\x8c\xe5\x85\xac\xe7\x9b\x8a\xe8\xaf\x89\xe8\xae\xbc\xe8\xaf\xb7\xe4\xb8\x8d\xe8\xa6\x81\xe8\xbe\x9c\xe8\xb4\x9f\xe6\xb0\x91\xe4\xbc\x97\xe6\x9c\x9f\xe5\xbe\x85'}, 
{'time': '1532566055', 'linkurl': 'http://m.mp.oeeee.com/a/BAAFRD00002018072592682.html', 
'id': '680500', 'title': 
'\xe6\x94\xaf\xe6\x8c\x81\xe6\x80\xa7\xe4\xbe\xb5\xe5\x8f\x97\xe5\xae\xb3\xe8\x80\x85\xe5\x8f\x91\xe5\xa3\xb0\xef\xbc\x8c\xe5\x91\x8a\xe8\xaf\x89\xe5\xa5\xb9\xe2\x80\x9c\xe4\xbd\xa0\xe6\xb2\xa1\xe9\x94\x99\xe2\x80\x9d'}]

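With eval(), the string is converted directly into the iterable we want (a list). eval() parses the string as a Python expression and evaluates it; since the text is a list literal containing dict literals, the result is a list whose two elements are those dictionaries, and list() simply copies it. The titles look garbled only because Python 2 prints byte strings containing Chinese text as UTF-8 escapes such as '\xe6\x8f\xaa'.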
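One caution: eval() executes whatever expression it is given, which is risky for text scraped from the web. A commonly recommended substitute, swapped in here rather than taken from this post, is ast.literal_eval from the standard library, which only accepts Python literals. A minimal Python 3 sketch using the scraped string from above:

```python
import ast

# The scraped string from above (both records).
str_data = ('([{"id":"680501","title":"揪住疫苗不放，公益诉讼请不要辜负民众期待",'
            '"linkurl":"http://m.mp.oeeee.com/a/BAAFRD00002018072592681.html","time":"1532566055"},'
            '{"id":"680500","title":"支持性侵受害者发声，告诉她“你没错”",'
            '"linkurl":"http://m.mp.oeeee.com/a/BAAFRD00002018072592682.html","time":"1532566055"}])')

# literal_eval evaluates only literals (strings, numbers, lists, dicts, tuples, ...),
# so for this text it gives the same result as eval().
list_data = list(ast.literal_eval(str_data))
tuple_data = tuple(ast.literal_eval(str_data))
print(type(list_data), len(list_data))    # <class 'list'> 2
print(type(tuple_data), len(tuple_data))  # <class 'tuple'> 2

# Anything that is not a literal, such as a function call, is rejected.
try:
    ast.literal_eval('__import__("os").getcwd()')
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```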

2. Converting the string into a tuple.

tuple_data = tuple(eval(str_data))  # eval() returns a list; tuple() converts it
print(type(tuple_data))
print(tuple_data)

The output is:

<type 'tuple'>

({'time': '1532566055', 'linkurl': 'http://m.mp.oeeee.com/a/BAAFRD00002018072592681.html', 
'id': '680501', 'title': 
'\xe6\x8f\xaa\xe4\xbd\x8f\xe7\x96\xab\xe8\x8b\x97\xe4\xb8\x8d\xe6\x94\xbe\xef\xbc\x8c\xe5\x85\xac\xe7\x9b\x8a\xe8\xaf\x89\xe8\xae\xbc\xe8\xaf\xb7\xe4\xb8\x8d\xe8\xa6\x81\xe8\xbe\x9c\xe8\xb4\x9f\xe6\xb0\x91\xe4\xbc\x97\xe6\x9c\x9f\xe5\xbe\x85'}, 
{'time': '1532566055', 'linkurl': 'http://m.mp.oeeee.com/a/BAAFRD00002018072592682.html', 
'id': '680500', 'title': 
'\xe6\x94\xaf\xe6\x8c\x81\xe6\x80\xa7\xe4\xbe\xb5\xe5\x8f\x97\xe5\xae\xb3\xe8\x80\x85\xe5\x8f\x91\xe5\xa3\xb0\xef\xbc\x8c\xe5\x91\x8a\xe8\xaf\x89\xe5\xa5\xb9\xe2\x80\x9c\xe4\xbd\xa0\xe6\xb2\xa1\xe9\x94\x99\xe2\x80\x9d'})

Likewise, eval() produces the same list of two dictionaries, and tuple() turns it into a tuple holding those two dictionaries.
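Once the string has been turned into a list or tuple of dictionaries, reaching the goal (id, title, linkurl) is ordinary dictionary access. A minimal Python 3 sketch; the records variable is a hypothetical stand-in for the list_data or tuple_data produced above, filled with the two records from the scraped text:

```python
# Stand-in for list_data / tuple_data: the two records from the scraped text.
records = [
    {"id": "680501", "title": "揪住疫苗不放，公益诉讼请不要辜负民众期待",
     "linkurl": "http://m.mp.oeeee.com/a/BAAFRD00002018072592681.html", "time": "1532566055"},
    {"id": "680500", "title": "支持性侵受害者发声，告诉她“你没错”",
     "linkurl": "http://m.mp.oeeee.com/a/BAAFRD00002018072592682.html", "time": "1532566055"},
]

# Keep only the three fields we are after.
wanted = [(r["id"], r["title"], r["linkurl"]) for r in records]
for item_id, title, url in wanted:
    print(item_id, title, url)

# Alternatively, index the records by id for direct lookup.
by_id = {r["id"]: r for r in records}
print(by_id["680500"]["linkurl"])
```

Iterating works the same whether records is a list or a tuple, so either conversion is fine for this step.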

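3. In fact, if we don't wrap the result in list() or tuple(), eval() already returns a list. The reason is that the outer parentheses in ([...]) only group the expression; without a trailing comma they do not form a tuple, so the value is simply the inner list.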

data = eval(str_data)