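iOS Development: URL Encoding and Decoding

The problem: when the URL of a network request contains Chinese or other special characters, the request fails (this mostly affects GET requests, where the parameters travel in the URL itself). In that case the request URL needs to be percent-encoded.

URL encoding and decoding in Objective-C

encode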

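// Category method on NSString: percent-encodes every character outside URLQueryAllowedCharacterSet.
- (NSString *)urlEncode
{
    NSString *encode = [self stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLQueryAllowedCharacterSet]];
    // Fall back to the receiver if encoding fails (nil) or yields an empty string.
    if (encode.length) {
        return encode;
    }
    return self;
}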

The encoding uses [NSCharacterSet URLQueryAllowedCharacterSet], which we will look at in detail below.

decode

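// Category method on NSString: replaces %XX sequences with the characters they represent.
- (NSString *)urlDecode
{
    NSString *decode = [self stringByRemovingPercentEncoding];
    // Fall back to the receiver if decoding fails (nil) or yields an empty string.
    if (decode.length) {
        return decode;
    }
    return self;
}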
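As a quick sanity check of the two methods above, here is a minimal sketch written as free functions so it can be compiled on its own. The names EncodeQueryValue and DecodeValue are hypothetical; the bodies mirror the category methods.

```objc
#import <Foundation/Foundation.h>

// Hypothetical free-function version of urlEncode.
static NSString *EncodeQueryValue(NSString *value) {
    NSString *encoded = [value stringByAddingPercentEncodingWithAllowedCharacters:
                            [NSCharacterSet URLQueryAllowedCharacterSet]];
    return encoded.length ? encoded : value;
}

// Hypothetical free-function version of urlDecode.
static NSString *DecodeValue(NSString *value) {
    // stringByRemovingPercentEncoding returns nil for an invalid
    // percent sequence, so fall back to the input in that case.
    NSString *decoded = [value stringByRemovingPercentEncoding];
    return decoded.length ? decoded : value;
}
```

For example, EncodeQueryValue(@"中国") gives %E4%B8%AD%E5%9B%BD, and passing that result to DecodeValue gives back 中国.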

The NSCharacterSet character set

An NSCharacterSet object represents a set of Unicode-compliant characters. The API we use to encode a string is:

// Returns a new string made from the receiver by replacing all characters not in the allowedCharacters set with percent encoded characters. UTF-8 encoding is used to determine the correct percent encoded characters. Entire URL strings cannot be percent-encoded. This method is intended to percent-encode a URL component or subcomponent string, NOT the entire URL string. Any characters in allowedCharacters outside of the 7-bit ASCII range are ignored.
- (nullable NSString *)stringByAddingPercentEncodingWithAllowedCharacters:(NSCharacterSet *)allowedCharacters API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));

In other words, the method returns a new string in which every character that is not in allowedCharacters is replaced by its percent-encoded form, with the character's UTF-8 bytes determining the %XX sequences. It is meant for a single URL component or subcomponent, not for an entire URL string, so scheme, host, path and query should each be handled separately. Any characters in allowedCharacters outside the 7-bit ASCII range are ignored.
Note: the characters that get encoded are the ones NOT in allowedCharacters. This is the point that most other blog posts fail to spell out.

The allowedCharacters set can be one you define yourself, or one of the class properties of NSCharacterSet.

Common character sets

NSCharacterSet class property API
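To illustrate what "handle each component separately" can look like with the predefined class properties, here is a minimal sketch. BuildEncodedURLString is a hypothetical helper, and the scheme and host are hard-coded purely for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: encode each piece with the set that matches its role,
// then assemble the URL string by hand. Scheme and host are fixed sample values.
static NSString *BuildEncodedURLString(NSString *user, NSString *path, NSString *name) {
    NSString *encodedUser = [user stringByAddingPercentEncodingWithAllowedCharacters:
                                [NSCharacterSet URLUserAllowedCharacterSet]];
    NSString *encodedPath = [path stringByAddingPercentEncodingWithAllowedCharacters:
                                [NSCharacterSet URLPathAllowedCharacterSet]];
    NSString *encodedName = [name stringByAddingPercentEncodingWithAllowedCharacters:
                                [NSCharacterSet URLQueryAllowedCharacterSet]];
    // "%@@" is the user placeholder followed by a literal '@'.
    return [NSString stringWithFormat:@"https://%@@192.168.1.1%@?name=%@",
            encodedUser, encodedPath, encodedName];
}
```

Calling BuildEncodedURLString(@"小明", @"/app/home/list", @"中国") encodes only the user name and the query value, while the separators between components stay intact. A production version would also need to handle a nil result from the encoding calls.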

@interface NSCharacterSet (NSURLUtilities)
// Predefined character sets for the six URL components and subcomponents which allow percent encoding. These character sets are passed to -stringByAddingPercentEncodingWithAllowedCharacters:.

// Returns a character set containing the characters allowed in a URL's user subcomponent.
@property (class, readonly, copy) NSCharacterSet *URLUserAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));

// Returns a character set containing the characters allowed in a URL's password subcomponent.
@property (class, readonly, copy) NSCharacterSet *URLPasswordAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));

// Returns a character set containing the characters allowed in a URL's host subcomponent.
@property (class, readonly, copy) NSCharacterSet *URLHostAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));

// Returns a character set containing the characters allowed in a URL's path component. ';' is a legal path character, but it is recommended that it be percent-encoded for best compatibility with NSURL (-stringByAddingPercentEncodingWithAllowedCharacters: will percent-encode any ';' characters if you pass the URLPathAllowedCharacterSet).
@property (class, readonly, copy) NSCharacterSet *URLPathAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));

// Returns a character set containing the characters allowed in a URL's query component.
@property (class, readonly, copy) NSCharacterSet *URLQueryAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));

// Returns a character set containing the characters allowed in a URL's fragment component.
@property (class, readonly, copy) NSCharacterSet *URLFragmentAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));

@end

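How do these class properties differ? The official documentation alone does not make the concrete differences easy to see, so let's write some code for a quick test and encode https://小明:pwd123@192.168.1.1:80/app/home/list?name=中国&address=BJ&page=2&pageCount=&role=1#index with each of them.

URL structure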

                    hierarchical part
        ┌───────────────────┴─────────────────────┐
                    authority               path
        ┌───────────────┴───────────────┐┌───┴────┐
  abc://username:password@example.com:123/path/data?key=value&key2=value2#fragid1
  └┬┘   └───────┬───────┘ └────┬────┘ └┬┘           └─────────┬─────────┘ └──┬──┘
scheme  user information     host     port                  query         fragment

  urn:example:mammal:monotreme:echidna
  └┬┘ └────────────┬───────────────┘
scheme              path

Breaking down the URL

Component  Value
scheme     https
host       192.168.1.1
path       /app/home/list
query      name=中国&address=BJ&page=2&pageCount=&role=1
port       80
user       小明
password   pwd123
fragment   index

Encoding results

Class property                  Encoded text
URLUserAllowedCharacterSet      https%3A%2F%2F%E5%B0%8F%E6%98%8E%3Apwd123%40192.168.1.1%3A80%2Fapp%2Fhome%2Flist%3Fname=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index
URLPasswordAllowedCharacterSet  https%3A%2F%2F%E5%B0%8F%E6%98%8E%3Apwd123%40192.168.1.1%3A80%2Fapp%2Fhome%2Flist%3Fname=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index
URLHostAllowedCharacterSet      https%3A%2F%2F%E5%B0%8F%E6%98%8E%3Apwd123%40192.168.1.1%3A80%2Fapp%2Fhome%2Flist%3Fname=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index
URLPathAllowedCharacterSet      https%3A//%E5%B0%8F%E6%98%8E:pwd123@192.168.1.1:80/app/home/list%3Fname=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index
URLQueryAllowedCharacterSet     https://%E5%B0%8F%E6%98%8E:pwd123@192.168.1.1:80/app/home/list?name=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index
URLFragmentAllowedCharacterSet  https://%E5%B0%8F%E6%98%8E:pwd123@192.168.1.1:80/app/home/list?name=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index

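The details are hard to compare in the table above, but it is clear that the sets differ in which parts and which characters they encode. The list most widely circulated online (the characters each set percent-encodes) looks like this: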

URLFragmentAllowedCharacterSet  "#%<>[\]^`{|}
URLHostAllowedCharacterSet      "#%/<>?@\^`{|}
URLPasswordAllowedCharacterSet  "#%/:<>?@[\]^`{|}
URLPathAllowedCharacterSet      "#%;<>?[\]^`{|}
URLQueryAllowedCharacterSet     "#%<>[\]^`{|}
URLUserAllowedCharacterSet      "#%/:<>?@[\]^`

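So is that list correct, and what is it based on? I could not find anything on Apple's site to confirm it, so let's run an experiment: encode the ASCII characters with each NSCharacterSet.
The string to encode is NSString *code = @" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"; that is, positions 32 through 126 of the ASCII table.

Encoding results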
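The experiment described above could be run with something like the following sketch. EncodeASCIIWithAllSets is a hypothetical name; it builds the test string from the code points 32 through 126 instead of a string literal, which sidesteps escaping mistakes.

```objc
#import <Foundation/Foundation.h>

// Sketch of the experiment: encode printable ASCII (32...126) with each
// predefined set. EncodeASCIIWithAllSets is a hypothetical name.
static NSDictionary<NSString *, NSString *> *EncodeASCIIWithAllSets(void) {
    // Build the test string programmatically from the code points.
    NSMutableString *ascii = [NSMutableString string];
    for (unichar c = 32; c <= 126; c++) {
        [ascii appendFormat:@"%C", c];
    }

    NSDictionary<NSString *, NSCharacterSet *> *sets = @{
        @"URLUserAllowedCharacterSet"     : [NSCharacterSet URLUserAllowedCharacterSet],
        @"URLPasswordAllowedCharacterSet" : [NSCharacterSet URLPasswordAllowedCharacterSet],
        @"URLHostAllowedCharacterSet"     : [NSCharacterSet URLHostAllowedCharacterSet],
        @"URLPathAllowedCharacterSet"     : [NSCharacterSet URLPathAllowedCharacterSet],
        @"URLQueryAllowedCharacterSet"    : [NSCharacterSet URLQueryAllowedCharacterSet],
        @"URLFragmentAllowedCharacterSet" : [NSCharacterSet URLFragmentAllowedCharacterSet],
    };

    // Encode the same string with every set and collect the results by name.
    NSMutableDictionary<NSString *, NSString *> *results = [NSMutableDictionary dictionary];
    for (NSString *name in sets) {
        NSString *encoded = [ascii stringByAddingPercentEncodingWithAllowedCharacters:sets[name]];
        NSLog(@"%@ -> %@", name, encoded);
        results[name] = encoded;
    }
    return results;
}
```

Comparing each logged string against the input shows which characters a given set leaves alone and which it turns into %XX sequences. Results may vary between OS versions, so it is worth rerunning on the system you target.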

Each entry gives the class property, the encoded text, and the characters that were encoded.

URLUserAllowedCharacterSet
    Encoded text:       %20!%22%23$%25&'()*+,-.%2F0123456789%3A;%3C=%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
    Encoded characters: space "#%/:<>?@[\]^`{|}
URLPasswordAllowedCharacterSet
    Encoded text:       %20!%22%23$%25&'()*+,-.%2F0123456789%3A;%3C=%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
    Encoded characters: space "#%/:<>?@[\]^`{|}
URLHostAllowedCharacterSet
    Encoded text:       %20!%22%23$%25&'()*+,-.%2F0123456789%3A;%3C=%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
    Encoded characters: space "#%/:<>?@[\]^`{|}
URLPathAllowedCharacterSet
    Encoded text:       %20!%22%23$%25&'()*+,-./0123456789:%3B%3C=%3E%3F@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
    Encoded characters: space "#%;<>?[\]^`{|}
URLQueryAllowedCharacterSet
    Encoded text:       %20!%22%23$%25&'()*+,-./0123456789:;%3C=%3E?@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
    Encoded characters: space "#%<>[\]^`{|}
URLFragmentAllowedCharacterSet
    Encoded text:       %20!%22%23$%25&'()*+,-./0123456789:;%3C=%3E?@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
    Encoded characters: space "#%<>[\]^`{|}

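Conclusion: the list circulated online does not fully match these results, which I obtained by testing it myself. In day-to-day development, URLQueryAllowedCharacterSet or URLFragmentAllowedCharacterSet is the usual choice (both encode the same characters), because neither encodes the ?, / and : that commonly appear in URLs.

Custom character sets

With the analysis above we have a better understanding of the encoding. However, URLQueryAllowedCharacterSet leaves special characters such as '()*+,-. unencoded. What if that leads to garbled data when exchanging requests with other platforms? That is when a custom character set is needed.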

    NSString *code = @" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";
    NSCharacterSet *invertedSet = [[NSCharacterSet characterSetWithCharactersInString:@" \"#%<>[\\]^`{|}'()*+,-."] invertedSet];
    NSString *encode = [code stringByAddingPercentEncodingWithAllowedCharacters:invertedSet];

// Resulting encode: %20!%22%23$%25&%27%28%29%2A%2B%2C%2D%2E/0123456789:;%3C=%3E?@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~

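Why call invertedSet on the set built from the string  "#%<>[\]^`{|}'()*+,-. (which starts with a space)? Because the character set passed to stringByAddingPercentEncodingWithAllowedCharacters: is the set of characters that will NOT be encoded. After inverting, the characters in our custom string are exactly the ones that get encoded.

End.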
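An alternative to inverting a hand-written set is to start from the predefined query set and remove the extra characters. The sketch below is one way to do that; EncodeStrictQueryValue is a hypothetical helper name. One difference worth noting: an inverted set treats every character not listed in the string as allowed, which by the API's contract includes ASCII control characters such as newline, whereas a set derived from URLQueryAllowedCharacterSet keeps whatever that set already excludes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: derive a stricter set from URLQueryAllowedCharacterSet
// by removing the extra characters that should also be percent-encoded.
static NSString *EncodeStrictQueryValue(NSString *value) {
    NSMutableCharacterSet *allowed =
        [[NSCharacterSet URLQueryAllowedCharacterSet] mutableCopy];
    // After removal these characters are no longer allowed, so they get encoded.
    [allowed removeCharactersInString:@"'()*+,-."];
    return [value stringByAddingPercentEncodingWithAllowedCharacters:allowed];
}
```

For example, EncodeStrictQueryValue(@"a+b (c)") yields a%2Bb%20%28c%29.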

Reposted from blog.csdn.net/wujakf/article/details/130541286