Day 6-7: Encoding Notes and Sets

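=  assignment
== compares whether two values are equal
is compares identity, i.e. whether two names refer to the same object in memory
id(obj) returns the object's identity (its memory address in CPython)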

li1 = [1, 2, 3]
li2 = li1
print(id(li1), id(li2))  # e.g. 18298952 18298952 (both ids match; the exact number varies per run)
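To make the difference between == and is concrete, here is a minimal sketch. The name li3 is introduced only for illustration; it is a copy with equal contents, not an alias.

```python
li1 = [1, 2, 3]
li2 = li1          # a second name for the same list object
li3 = list(li1)    # a new list object with equal contents

same_value = li1 == li3          # compares contents
same_object = li1 is li3         # compares identity
alias_is_same = li1 is li2       # li2 is just another name for li1
ids_match = id(li1) == id(li2)

print(same_value, same_object, alias_is_same, ids_match)  # True False True True

# Mutating through the alias changes the shared object, but not the copy.
li2.append(4)
print(li1)  # [1, 2, 3, 4]
print(li3)  # [1, 2, 3]
```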

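Integers and strings both have a small data pool (an object cache)
Integers: in CPython, values from -5 to 256 are cached, so equal values share one memory address
Strings: the interning rules depend on the implementation and are hard to pin down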

I. Encoding

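ascii: English only
    character: 00000000, 8 bits, 1 byte per character

unicode (stored at fixed width, as in UTF-32): every character takes 32 bits
    English character: 00000000 00000000 00000000 00000000, 32 bits, 4 bytes per character
    Chinese character: 00000000 00000000 00000000 00000000, 32 bits, 4 bytes per character

utf-8:
    English character: 00000000, 8 bits, 1 byte per character
    Chinese character: 00000000 00000000 00000000, 24 bits, 3 bytes per character (true of most common Chinese characters)

gbk:
    English character: 00000000, 8 bits, 1 byte per character
    Chinese character: 00000000 00000000, 16 bits, 2 bytes per character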
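The byte counts listed above can be checked directly. This is a small sketch; it uses the "utf-32-le" codec as a stand-in for fixed-width unicode (an assumption made here so that no byte-order mark is added to the count).

```python
samples = ["a", "中"]
codecs = ["utf-32-le", "utf-8", "gbk"]

# Map (character, codec) -> number of bytes in the encoded form.
byte_counts = {
    (ch, codec): len(ch.encode(codec))
    for ch in samples
    for codec in codecs
}

for (ch, codec), n in byte_counts.items():
    print(f"{ch!r} in {codec}: {n} byte(s)")

# ascii only covers English characters, one byte each.
ascii_len = len("a".encode("ascii"))
```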

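1. Bytes produced by one encoding cannot be read with another; decoding with the wrong encoding produces garbled text

2. Files cannot be stored or transmitted as raw unicode; they must use a concrete encoding (utf-8, utf-16, gbk, gb2312, ascii, etc.)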

3. In Python 3:

    str is held in memory as unicode; it cannot be stored or transmitted directly and must first be converted to bytes
        For English text:
            str: literal form: s = "ppd"; print(s, type(s))  # ppd <class 'str'>
            encoding: 010101010 (unicode)
            bytes: literal form: s1 = b"ppd"; print(s1, type(s1))  # b'ppd' <class 'bytes'>
            encoding: 000101010 (utf-8, gbk, ...)

        For Chinese text:
            str: literal form: s2 = "中国"; print(s2, type(s2))  # 中国 <class 'str'>
            encoding: 010101010 (unicode)
            bytes: literal form: s3 = b"中国"  # SyntaxError: bytes can only contain ASCII literal characters (a bytes literal cannot hold Chinese characters directly)
            encoding: 000101010 (utf-8, gbk, ...)
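Since a bytes literal cannot contain Chinese characters directly, the bytes have to come from an encoding step or from escape sequences. A minimal sketch of three equivalent ways to get the utf-8 bytes for "中国" (the variable names are illustrative only):

```python
text = "中国"

from_encode = text.encode("utf-8")                 # str method
from_constructor = bytes(text, encoding="utf-8")   # bytes constructor
from_escapes = b"\xe4\xb8\xad\xe5\x9b\xbd"         # ASCII-only literal using \x escapes

print(from_encode == from_constructor == from_escapes)  # True

# len() counts characters for str and bytes for bytes.
print(len(text), len(from_encode))  # 2 6
```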

4. encode: how to convert str to bytes

s = "ppd"
s1 = s.encode("utf-8")
print(s1)    #b'ppd'
s2 = s.encode("gbk")
print(s2)    #b'ppd'

s = '中国'
s1 = s.encode("utf-8")    
print(s1)    #b'\xe4\xb8\xad\xe5\x9b\xbd'
s2 = s.encode("gbk")    
print(s2)    #b'\xd6\xd0\xb9\xfa'
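The notes above only cover encode. As a hedged sketch of the reverse direction, bytes.decode turns bytes back into str, and it must be given the same encoding that produced the bytes. The errors="replace" argument is used here only so the wrong-codec case returns a string instead of possibly raising.

```python
s = "中国"
utf8_bytes = s.encode("utf-8")
gbk_bytes = s.encode("gbk")

# Decoding with the matching codec restores the original text.
round_trip_utf8 = utf8_bytes.decode("utf-8")
round_trip_gbk = gbk_bytes.decode("gbk")
print(round_trip_utf8 == s, round_trip_gbk == s)  # True True

# Decoding utf-8 bytes as gbk does not restore the text (garbled output).
garbled = utf8_bytes.decode("gbk", errors="replace")
print(garbled == s)  # False

# To convert gbk bytes to utf-8 bytes, go through str (unicode) in between.
converted = gbk_bytes.decode("gbk").encode("utf-8")
print(converted == utf8_bytes)  # True
```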

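II. Tuple note: if the parentheses hold a single element with no trailing comma, the result is not a tuple; it keeps the type of that element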

tu1 = (1)
tu2 = (1,)
print(tu1,type(tu1))    #1 <class 'int'>
print(tu2,type(tu2))    #(1,) <class 'tuple'>
tu1 = ([1])
tu2 = ([1],)
print(tu1,type(tu1))    #[1] <class 'list'>
print(tu2,type(tu2))    #([1],) <class 'tuple'>
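The rule is that the comma makes the tuple, not the parentheses. A short sketch with illustrative names:

```python
not_a_tuple = (1)     # just the int 1 in parentheses
one_tuple = (1,)      # trailing comma makes a 1-element tuple
also_tuple = 1,       # parentheses are optional
empty_tuple = ()      # the empty tuple is the one case that needs no comma

print(type(not_a_tuple).__name__)  # int
print(type(one_tuple).__name__)    # tuple
print(type(also_tuple).__name__)   # tuple
print(len(one_tuple), len(empty_tuple))  # 1 0
```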

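III. Sets

Set: an unordered collection of unique elements. Its elements must be hashable (immutable types), but the set itself is unhashable, so a set cannot be used as a dictionary key.
    The two most important uses of sets:
        1. Deduplication: converting a list to a set removes duplicates automatically.
        2. Relationship testing: finding the intersection, difference, union, etc. between two groups of data.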
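Both points can be illustrated briefly. In this sketch, is_hashable is a hypothetical helper written for the example, and dict.fromkeys is shown as one possible way to deduplicate while keeping the original order (a set does not keep it).

```python
li = [3, 1, 3, 2, 1]

# Deduplicate with a set; sorted() is used because a set has no defined order.
unique_sorted = sorted(set(li))
print(unique_sorted)  # [1, 2, 3]

# Deduplicate while keeping first-seen order (dict keys preserve insertion order).
unique_in_order = list(dict.fromkeys(li))
print(unique_in_order)  # [3, 1, 2]


def is_hashable(obj):
    """Hypothetical helper: return True if obj can be a set element or dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable((1, 2)))  # True: tuples of hashable items are hashable
print(is_hashable([1, 2]))  # False: lists are mutable
print(is_hashable({1, 2}))  # False: sets themselves are unhashable
```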

1. Creating a set

set1 = set({123,"ppd"})
set2 = {123,"ppd"}
print(set1,set2)    #{123, 'ppd'} {123, 'ppd'}

2. Adding to a set

set1 = {123,"ppd"}
set1.add("苹果")
print(set1)        #{'苹果', 123, 'ppd'}

#update: iterates over its argument and adds each item
set1 = {123, "ppd"}
set1.update("ppd")
print(set1)        #{'ppd', 123, 'p', 'd'}
set1.update("苹果")
print(set1)        #{'ppd', 'p', 'd', '苹', 123, '果'}
set1.update([1, 2, 3])
print(set1)        #{1, 2, 3, 'ppd', 'p', 'd', '苹', 123, '果'}
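A common surprise is that update splits a string into characters. The sketch below contrasts add with update; wrapping the string in a list is one way to add it whole through update.

```python
set1 = {123}
set1.add("ppd")            # adds the whole string as one element
print(set1 == {123, "ppd"})  # True

set2 = {123}
set2.update("ppd")         # iterates the string: adds 'p' and 'd'
print(set2 == {123, "p", "d"})  # True

set3 = {123}
set3.update(["ppd"])       # iterates the list: adds the whole string
set3.update([1, 2], (3,))  # update accepts several iterables at once
print(set3 == {123, "ppd", 1, 2, 3})  # True
```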

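3. Removing from a set

set1 = {123, "ppd", "苹果"}

set1.remove(123)  # remove a specific element (raises KeyError if it is absent)
print(set1)

set1.pop()  # remove and return an arbitrary element
print(set1)

set1.clear()  # empty the set
print(set1)  #set()

del set1  # delete the name set1 itself
print(set1)  #NameError: name 'set1' is not defined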
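The notes do not mention it, but set also offers discard, which behaves like remove except that a missing element is ignored. A brief sketch of the error behaviour of the removal methods:

```python
set1 = {123, "ppd"}

set1.discard("missing")      # absent element: silently ignored
try:
    set1.remove("missing")   # absent element: raises KeyError
    remove_raised = False
except KeyError:
    remove_raised = True
print(remove_raised)  # True

popped = set1.pop()          # returns whichever element was removed
print(popped in {123, "ppd"}, len(set1))  # True 1

empty = set()
try:
    empty.pop()              # popping from an empty set raises KeyError
    pop_raised = False
except KeyError:
    pop_raised = True
print(pop_raised)  # True
```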

4. Reading a set (sets have no index, so elements are reached by iterating)

set1 = {123, "ppd", "苹果"}
for i in set1:
    print(i)
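Besides looping, membership can be tested with in. The sketch below also shows that indexing fails, and uses sorted with key=str (a choice made for this example, since the elements have mixed types) to get a repeatable order.

```python
set1 = {123, "ppd", "苹果"}

has_ppd = "ppd" in set1        # fast membership test
has_pear = "梨" in set1
print(has_ppd, has_pear)  # True False

try:
    set1[0]                    # sets are unordered, so indexing is not supported
    indexable = True
except TypeError:
    indexable = False
print(indexable)  # False

# Sort by string form to iterate in a stable order.
ordered = sorted(set1, key=str)
print(ordered)  # [123, 'ppd', '苹果']
```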

5. Other set operations

5.1 Intersection: & or intersection

set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
print(set1 & set2)  # {4, 5}
print(set1.intersection(set2))  # {4, 5}

5.2 Symmetric difference: ^ or symmetric_difference

set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
print(set1 ^ set2)  # {1, 2, 3, 6, 7, 8}
print(set1.symmetric_difference(set2))  #{1, 2, 3, 6, 7, 8}

5.3 Union: | or union

set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
print(set1 | set2)  # {1, 2, 3, 4, 5, 6, 7, 8}
print(set1.union(set2))  # {1, 2, 3, 4, 5, 6, 7, 8}

5.4 Difference: - or difference

set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
print(set1 - set2)  # {1, 2, 3}    the elements of set1 that are not in set2
print(set1.difference(set2))  #{1, 2, 3}

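5.5 Subsets and supersets

set1 = {1, 2, 3}
set2 = {1, 2, 3, 4, 5, 6}

print(set1 < set2)  # True: set1 is a proper subset of set2
print(set1.issubset(set2))  # True: set1 is a subset of set2 (same as set1 <= set2)

print(set2 > set1)  # True: set2 is a proper superset of set1
print(set2.issuperset(set1))  # True: set2 is a superset of set1 (same as set2 >= set1)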
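The operations in 5.1 to 5.5 relate to each other in simple ways. The sketch below checks a few of those relationships and shows where < differs from issubset; isdisjoint is an extra method not covered above.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

# Symmetric difference is the union minus the intersection,
# or equivalently the two one-sided differences combined.
sym_via_union = (a | b) - (a & b)
sym_via_diffs = (a - b) | (b - a)
print(sym_via_union == a ^ b, sym_via_diffs == a ^ b)  # True True

# < means proper subset: a set is a subset of itself but not a proper subset.
print(a <= a, a.issubset(a), a < a)  # True True False

# isdisjoint is True when the two sets share no elements.
print(a.isdisjoint(b), a.isdisjoint({9, 10}))  # False True
```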

6. frozenset: an immutable set; it turns a set into an immutable (and therefore hashable) type
s = frozenset("meppd")
print(s, type(s))  #frozenset({'p', 'd', 'e', 'm'}) <class 'frozenset'>
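Because a frozenset is hashable, it can be used where an ordinary set cannot. A minimal sketch (the names lookup and nested are illustrative only):

```python
fs = frozenset("meppd")

# A frozenset can be a dictionary key, unlike a regular set.
lookup = {fs: "letters"}
print(lookup[frozenset("dpem")])  # letters (same elements, so same key)

# A frozenset can also be an element of another set.
nested = {frozenset({1, 2}), frozenset({2, 1})}
print(len(nested))  # 1

# There are no mutating methods such as add or remove.
print(hasattr(fs, "add"))  # False

# Set operators still work and return a new frozenset.
bigger = fs | {"x"}
print(type(bigger).__name__, len(bigger))  # frozenset 5
```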
