Project problem

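While using translate() to filter characters, I found that the call would not filter a str, only bytes. In Python 3, str.translate() takes a single translation table, whereas bytes.translate(None, delete) also accepts the bytes to delete. So I first encode the string to bytes, filter it, then decode the result back to str before using it.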

The project code is as follows

import os

def rename_files():
    # (1) Get the file names from a folder.
    # The r prefix marks a raw string: backslashes are taken literally, not as escapes.
    folder = r'E:\datamining\spyder\Udacity\course2_使用函数\prank\prank'
    file_list = os.listdir(folder)
    print(file_list)
    # saved_path = os.getcwd()  # cwd: Current Working Directory
    # Set the current working directory so the bare file names resolve.
    os.chdir(folder)

    # (2) Rename each file.
    for file_name in file_list:
        # file_name is a str; encode it as UTF-8 bytes so that
        # bytes.translate() can delete characters from it.
        name_bytes = file_name.encode(encoding='utf-8')
        # Filter out the digits (this step works on bytes).
        new_bytes = name_bytes.translate(None, b'0123456789')
        # After filtering, decode the new name back to str and rename the file.
        new_name = new_bytes.decode(encoding='utf-8')
        os.rename(file_name, new_name)


rename_files()
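For comparison, the digits can also be removed without the bytes round trip. The sketch below assumes Python 3, where `str.translate()` takes one translation table and `str.maketrans('', '', chars)` builds a table that deletes `chars`. The helper name `strip_digits` and the sample file name are made up for illustration and are not part of the project code.

```python
import string


def strip_digits(name: str) -> str:
    """Return name with every ASCII digit removed (hypothetical helper)."""
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans('', '', string.digits)
    return name.translate(table)


# The sample file name is illustrative only.
print(strip_digits('48athens.jpg'))
```

Either approach gives the same new name for names that encode cleanly as UTF-8; this one simply stays in str the whole time.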

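## Solution ##

Using the built-in bytes and str types in Python 3, and converting between bytes and strings with different encodings

Arguably the most important new feature of Python 3 is a much clearer distinction between text and binary data. Text is always Unicode and is represented by the str type; binary data is represented by the bytes type. Python 3 never mixes str and bytes implicitly, which is what makes the distinction so clear. You cannot concatenate a string and a bytes object, you cannot search for a string inside a bytes object (or vice versa), and you cannot pass a string to a function that expects bytes (or vice versa).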
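A small sketch of that rule, assuming Python 3: adding a str to a bytes object raises TypeError, and the two only combine after an explicit decode. The function names `mixes_implicitly` and `join_explicitly` are invented for this illustration.

```python
def mixes_implicitly(text: str, data: bytes) -> bool:
    """Return True if str + bytes succeeds, False if Python raises TypeError."""
    try:
        text + data  # Python 3 refuses to combine the two types implicitly
    except TypeError:
        return False
    return True


def join_explicitly(text: str, data: bytes, encoding: str = 'utf-8') -> str:
    """Decode the bytes first, then concatenate two str values."""
    return text + data.decode(encoding)


print(mixes_implicitly('abc', b'def'))
print(join_explicitly('abc', b'def'))
```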

How to create bytes data in Python 3.0

bytes([1, 2, 3, 4, 5, 6, 7, 8, 9])

bytes("python", 'ascii')  # string, encoding
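The two constructor forms above can be checked side by side with a bytes literal. This is a minimal sketch assuming Python 3; the variable names are chosen only for this example.

```python
# From an iterable of integers in range(256): each integer becomes one byte.
from_ints = bytes([1, 2, 3])

# From a string plus an encoding name.
from_text = bytes('python', 'ascii')

# A bytes literal gives the same value as the constructor above.
from_literal = b'python'

# From a single integer n: n zero bytes.
zeros = bytes(3)

print(list(from_ints))            # the bytes as a list of integers
print(from_text == from_literal)  # both spell the same six bytes
print(from_text[0])               # indexing a bytes object yields an int
print(len(zeros))
```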

Set up an initial string

 
    >>> website = 'http://www.169it.com/os'
    >>> type(website)
    <class 'str'>
    >>> website
    'http://www.169it.com/os'

Encode it as UTF-8 to get bytes

 
    >>> website_bytes_utf8 = website.encode(encoding="utf-8")
    >>> type(website_bytes_utf8)
    <class 'bytes'>
    >>> website_bytes_utf8
    b'http://www.169it.com/os'

Encode it as GB2312 to get bytes

 
    >>> website_bytes_gb2312 = website.encode(encoding="gb2312")
    >>> type(website_bytes_gb2312)
    <class 'bytes'>
    >>> website_bytes_gb2312
    b'http://www.169it.com/os'
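The URL used here contains only ASCII characters, so the UTF-8 and GB2312 results look identical. The sketch below, assuming Python 3 with its standard `gb2312` codec, uses a short Chinese sample string (chosen only for illustration) to show that the two encodings produce different bytes once non-ASCII text is involved.

```python
url = 'http://www.169it.com/os'
text = '中文'  # illustrative non-ASCII sample, not from the session above

# ASCII-only text encodes to the same bytes under both codecs.
same_for_ascii = url.encode('utf-8') == url.encode('gb2312')

# Non-ASCII text does not: UTF-8 uses 3 bytes per character here,
# GB2312 uses 2 bytes per character.
utf8_bytes = text.encode('utf-8')
gb2312_bytes = text.encode('gb2312')

print(same_for_ascii)
print(len(utf8_bytes), len(gb2312_bytes))
print(utf8_bytes == gb2312_bytes)
```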

Decode to a string, leaving the encoding argument out (it defaults to UTF-8)

 
    >>> website_string = website_bytes_utf8.decode()
    >>> type(website_string)
    <class 'str'>
    >>> website_string
    'http://www.169it.com/os'

Decode to a string using GB2312

 
    >>> website_string_gb2312 = website_bytes_gb2312.decode("gb2312")
    >>> type(website_string_gb2312)
    <class 'str'>
    >>> website_string_gb2312
    'http://www.169it.com/os'
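Decoding only round-trips when the codec matches the one used for encoding. The following sketch, assuming Python 3, decodes GB2312 bytes with the right and the wrong codec; `decode_or_none` is a hypothetical helper and the Chinese sample string is illustrative.

```python
from typing import Optional


def decode_or_none(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if data is not valid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


gb_bytes = '中文'.encode('gb2312')

# Matching codec: the original text comes back.
print(decode_or_none(gb_bytes, 'gb2312'))

# Mismatched codec: these bytes are not valid UTF-8, so decoding fails.
print(decode_or_none(gb_bytes, 'utf-8'))

# errors='replace' substitutes U+FFFD instead of raising.
lossy = gb_bytes.decode('utf-8', errors='replace')
print('\ufffd' in lossy)
```

With ASCII-only data such as the URL above, the mismatch goes unnoticed, which is why it is worth passing the encoding explicitly rather than relying on the default.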