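Sequences
Python has several built-in sequence types, and all of them support a common set of operations. The most frequently used are concatenation, repetition, indexing, slicing, and membership testing (checking whether an element belongs to the sequence). This post looks at two built-in containers, the list and the dictionary, and compares only their lookup efficiency. Strictly speaking, a dictionary is a mapping rather than a sequence, but it is the natural point of comparison for lookups.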
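As a quick illustration of the operations listed above, here is a minimal sketch on a list. The variable names are made up for the example; the same operations also work on strings and tuples.

```python
# Common operations shared by built-in sequences, shown on a list.
letters = ["a", "b", "c"]

joined = letters + ["d"]        # concatenation ("add")
repeated = letters * 2          # repetition ("multiply")
second = letters[1]             # indexing by offset
tail = letters[1:]              # slicing
has_b = "b" in letters          # membership test

print(joined, repeated, second, tail, has_b)
```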
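Lists
The list is the most flexible ordered collection type in Python. Its properties:
1. An ordered collection of arbitrary objects
2. Accessed by offset
3. Variable-length, heterogeneous, and arbitrarily nestable
4. Belongs to the mutable sequence category
5. An array of object references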
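The properties above can be seen in a few lines. This is only a sketch with invented names (`inner`, `items`), not code from the experiment later in this post.

```python
# A list holds references to arbitrary objects, so it can mix types and nest.
inner = [1, 2]
items = [42, "text", 3.14, inner]   # heterogeneous, with a nested list

print(items[1])        # read by offset
items.append(None)     # grows in place: lists are mutable and variable-length
print(len(items))

# The list stores a reference to `inner`, not a copy of it,
# so a change made through `inner` is visible through `items`.
inner.append(3)
print(items[3])
print(items[3] is inner)
```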
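Dictionaries
Apart from lists, the dictionary is the most flexible built-in data structure in Python. Its properties:
1. Accessed by key rather than by offset
2. A collection of arbitrary objects with no positional order: entries are located by the hash of the key. (Since Python 3.7, iteration follows insertion order, but there is still no access by offset.)
3. Variable-length, heterogeneous, and arbitrarily nestable
4. Belongs to the mapping category (a mutable mapping)
5. A table of object references (a hash table)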
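A short sketch of these properties follows. The dictionary `user` and its contents are hypothetical; the last part relies on the fact that dictionary keys must be hashable.

```python
# Values are read by key rather than by position.
user = {"name": "Ann", "tags": ["a", "b"], 7: 3.5}   # mixed key and value types, nested list

print(user["name"])
print(user.get("missing"))      # get() returns None for an absent key instead of raising KeyError

user["age"] = 30                # dicts are mutable and grow in place
print(len(user))

# Keys must be hashable; a list is not, so it cannot be used as a key.
error = None
try:
    user[["x"]] = 1
except TypeError as exc:
    error = str(exc)
    print("rejected key:", error)
```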
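In Python, dictionaries are implemented as hash tables. In other words, a dictionary is backed by an array, and the index into that array is derived by applying a hash function to the key. The hash function aims to spread keys evenly across the array, so that a lookup can usually go straight to the right slot.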
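To make the idea concrete, here is a toy hash table. It is a simplified sketch and not how CPython implements `dict`: the class name `ToyHashTable` is invented, collisions are handled with per-slot lists (separate chaining), and the array never resizes.

```python
class ToyHashTable:
    """Illustrative hash table using separate chaining (not CPython's design)."""

    def __init__(self, size=8):
        # The underlying "array": one bucket (a list of pairs) per slot.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # The hash function turns a key into an array index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only the one bucket selected by the hash is scanned.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default


table = ToyHashTable()
table.put("apple", 1)
table.put("pear", 2)
table.put("apple", 3)
print(table.get("apple"), table.get("pear"), table.get("plum"))
```

Because a lookup only inspects the bucket that the hash selects, its cost depends on how evenly the keys are spread rather than on the total number of keys.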
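Lookup efficiency analysis
The descriptions above show the basic difference: a list is read by offset, while a dictionary is read by the hash of the key. As a result, finding a given value in a list means scanning it element by element, whereas a dictionary can go from the key to its entry directly.
The following experiment measures the difference: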
#!/usr/bin/env python
# -*- coding:utf-8 -*-
###############################################
# File Name : test.py
# Author : Younger Liu
# Mail : [email protected]
# Created Time: Tue 19 Jun 2018 08:58:42 AM CST
# Description :
###############################################
import random
import string
import time

MAX_COUNT = 1000000
CHARS = string.ascii_letters + string.digits


def elapsed_us(start):
    # Microseconds elapsed since `start`, a time.perf_counter() reading.
    return int((time.perf_counter() - start) * 1000 * 1000)


def test_dict(data, key=None):
    # Time a single hash lookup by key.
    start = time.perf_counter()
    data.get(key)
    elapsed = elapsed_us(start)
    print("Elapsed time %d us to query [%s]" % (elapsed, key))


def test_array(data, key=None):
    # Time a linear scan that stops at the first element equal to key.
    start = time.perf_counter()
    for ele in data:
        if ele == key:
            break
    elapsed = elapsed_us(start)
    print("Elapsed time %d us to query [%s]" % (elapsed, key))


def random_key():
    # 8 distinct characters drawn from letters and digits.
    return ''.join(random.sample(CHARS, 8))


def gen_arr():
    # Build the list and return it with its first and last elements.
    arr = [random_key() for _ in range(MAX_COUNT)]
    return arr, arr[0], arr[-1]


def gen_dict():
    # Build the dict and return it with the first and last generated keys.
    # The name `table` avoids shadowing the built-in `dict`.
    table = {}
    first = last = None
    for i in range(MAX_COUNT):
        ele = random_key()
        table[ele] = ele
        if i == 0:
            first = ele
        last = ele
    return table, first, last


if __name__ == '__main__':
    arr, first_ele, last_ele = gen_arr()
    print("-----Query first ele in array-------")
    test_array(arr, first_ele)
    print("-----Query last ele in array-------")
    test_array(arr, last_ele)
    print("-----Query ele not in array-------")
    test_array(arr, '111111')

    table, first_key, last_key = gen_dict()
    print("-----Query first generated ele in dict-------")
    test_dict(table, first_key)
    print("-----Query last generated ele in dict-------")
    test_dict(table, last_key)
    print("-----Query ele not in dict-------")
    test_dict(table, '111111')
The output is as follows:
-----Query first ele in array-------
Elapsed time 10 us to query [dRP3aEZQ]
-----Query last ele in array-------
Elapsed time 37236 us to query [6agsBq37]
-----Query ele not in array-------
Elapsed time 38666 us to query [111111]
-----Query first generated ele in dict-------
Elapsed time 12 us to query [DkYN2xJL]
-----Query last generated ele in dict-------
Elapsed time 3 us to query [BrNCeWP2]
-----Query ele not in dict-------
Elapsed time 2 us to query [111111]
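Repeated runs give similar results.
This shows that looking up a key in a dictionary is far faster than searching a list for a value: the list scan took about 37 ms to reach the last element and about 39 ms for a missing one, while every dictionary lookup finished within about 12 us. The list search time grows with the element's position (O(n)), whereas a dictionary lookup takes roughly constant time on average (O(1)) and does not depend on when the key was inserted. Note that reading a list element by its offset is also a constant-time operation; the slow case is searching a list by value.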
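A single timestamp pair around one operation is noisy at the microsecond scale, which may be why the first dictionary lookup in the output (12 us) looks slower than the later ones. As a cross-check, the comparison can be sketched with the standard library's `timeit` module, which repeats each statement. The data size `N`, the key `missing`, and the repeat count of 20 are arbitrary choices for this sketch, not values from the experiment above.

```python
import timeit

N = 100_000
data_list = [str(i) for i in range(N)]
data_dict = {s: s for s in data_list}
missing = "not-present"   # worst case for the list: every element is compared

# Each statement is run 20 times; timeit returns the total elapsed seconds.
list_seconds = timeit.timeit(lambda: missing in data_list, number=20)
dict_seconds = timeit.timeit(lambda: missing in data_dict, number=20)

print("list membership:  %.6f s" % list_seconds)
print("dict membership:  %.6f s" % dict_seconds)

# Reading a list element by offset does not scan the list at all.
offset_seconds = timeit.timeit(lambda: data_list[N - 1], number=20)
print("list offset read: %.6f s" % offset_seconds)
```

Exact timings depend on the machine and Python version, so no expected output is shown; the point is the relative gap between the list search and the other two operations.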