Part 4: Advanced Chapter (8 lectures), Lecture 15: Embracing Diversity: HTTP Entity Data

Today's topic is "Embracing diversity: HTTP entity data."

This is the first lecture of the "Advanced" chapter. Over the next eight lectures, I will analyze the many HTTP header fields in detail, covering their definitions, purposes, usage, and caveats. Once you have finished these lectures, you will have a thorough grasp of the HTTP protocol.

In the earlier "Basics" chapter we examined the structure of HTTP messages and learned that an HTTP message consists of a "header + body". However, we focused on the header and did not touch the body. So the "Advanced" chapter begins with the HTTP body.

Data types and encoding types

In the TCP/IP protocol stack, almost all data is transmitted in a "header + body" format. Transport-layer protocols such as TCP and UDP, however, do not care what the body contains. As long as the data reaches the other party, their job is done.

HTTP is different. It is an application-layer protocol, so its work is only half done once the data arrives. It must also tell the upper-layer application what the data is; otherwise the application will not know what to do with it.

Imagine that HTTP had no way to describe the data type. The server sends "a big lump" of data to the browser, and all the browser sees is a "black box". What should it do then?

Of course, it could "guess". Many kinds of data have a fixed format, so by checking the first few bytes the browser might recognize a GIF image or an MP3 music file. But this approach is clearly inefficient, and there is a good chance it will fail to identify the file type.
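To make the "guessing" approach concrete, here is a minimal Python sketch of content sniffing that compares the first bytes against a few well-known file signatures ("magic numbers"). The signature table and the function name `sniff_type` are illustrative assumptions, not how any real browser works. Real sniffing, such as the WHATWG MIME Sniffing rules, is far more involved. Note that the sketch already shows the weakness described above: an MP3 file without a leading ID3 tag would not be recognized.

```python
from typing import Optional

# A tiny, illustrative table of file signatures ("magic numbers").
SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"ID3", "audio/mpeg"),  # only MP3 files that start with an ID3v2 tag
    (b"%PDF-", "application/pdf"),
]


def sniff_type(data: bytes) -> Optional[str]:
    """Guess a MIME type from the leading bytes, or return None if unknown."""
    for prefix, mime in SIGNATURES:
        if data.startswith(prefix):
            return mime
    return None  # still a "black box"


if __name__ == "__main__":
    print(sniff_type(b"GIF89a\x01\x00\x01\x00"))  # image/gif
    print(sniff_type(b"hello, world"))            # None
```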

Fortunately, a solution to this problem existed long before HTTP was born. It was used in the e-mail system so that e-mail could carry arbitrary data, not just ASCII text. The scheme is called "Multipurpose Internet Mail Extensions", or MIME for short.

MIME is a large set of standards, but HTTP "borrowed" only a part of it, to mark the data type of the body. This is the "MIME type" we so often hear about.

MIME divides data into eight major types, and each is subdivided into many subtypes. The "type/subtype" string form is simple and neat, and it happens to fit HTTP's plain-text nature, so it goes easily into HTTP header fields.

Here is a brief look at the types most often encountered in HTTP:

text: readable data in text form. The most familiar is text/html, which denotes a hypertext document. Others include plain text (text/plain), stylesheets (text/css), and so on.

image: image files, such as image/gif, image/jpeg, and image/png.

audio/video: audio and video data, such as audio/mpeg and video/mp4.

application: data whose format is not fixed. It may be text or binary, and it must be interpreted by the upper-layer application. Common examples are application/json, application/javascript, and application/pdf. Data whose type is genuinely unknown (the "black box" just mentioned) is labeled application/octet-stream, meaning opaque binary data.
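A server can no more see inside a file than a browser can, so simple static file servers commonly choose the MIME type from the file extension. The sketch below assumes a small hypothetical table and helper names (`MIME_BY_EXT`, `mime_for`, `split_mime`). Anything unknown falls back to application/octet-stream, the "black box" type described above. Python's standard `mimetypes` module provides a much fuller table if you need one.

```python
import os
from typing import Tuple

# Hypothetical extension table; real servers ship much larger ones.
MIME_BY_EXT = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".css": "text/css",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".mp3": "audio/mpeg",
    ".mp4": "video/mp4",
    ".json": "application/json",
    ".js": "application/javascript",
    ".pdf": "application/pdf",
}


def mime_for(filename: str) -> str:
    """Return the MIME type for a file name, falling back to opaque binary."""
    _, ext = os.path.splitext(filename)
    return MIME_BY_EXT.get(ext.lower(), "application/octet-stream")


def split_mime(mime: str) -> Tuple[str, str]:
    """Split a "type/subtype" string into its two parts."""
    major, _, minor = mime.partition("/")
    return major, minor


print(mime_for("a.json"))          # application/json
print(mime_for("archive.xyz"))     # application/octet-stream
print(split_mime("text/html"))     # ('text', 'html')
```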

A MIME type alone is not enough, however. To save transmission bandwidth, HTTP sometimes compresses the data. To keep the browser from having to "guess" again, an "Encoding type" is also needed. It states how the data was encoded so that the receiver can decompress it correctly and restore the original data.

There are far fewer Encoding types than MIME types; only the following three are common:

gzip: the GNU zip compression format, the most popular compression format on the Internet;

deflate: the zlib (deflate) compression format, second only to gzip in popularity;

br: a newer compression algorithm (Brotli) optimized for HTTP.
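As a quick illustration of the first two formats, the sketch below compresses the same made-up body with Python's standard `gzip` and `zlib` modules and then restores it. One detail worth knowing is that, in HTTP, "deflate" names the zlib wrapper format (RFC 1950) around deflate-compressed data, which is exactly what `zlib.compress` produces. Brotli has no encoder in the Python standard library, so it is not shown.

```python
import gzip
import zlib

# A made-up, highly repetitive HTML body (repetition compresses well)
body = ("<html><body>" + "Hello, HTTP! " * 200 + "</body></html>").encode("utf-8")

gzipped = gzip.compress(body)   # what Content-Encoding: gzip would carry
deflated = zlib.compress(body)  # what Content-Encoding: deflate would carry (zlib format)

print(len(body), len(gzipped), len(deflated))

# The receiver reverses whichever coding the response names
assert gzip.decompress(gzipped) == body
assert zlib.decompress(deflated) == body
```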

Header fields for data types

With MIME types and Encoding types, both the browser and the server can easily identify the type of the body and process the data correctly.

For this purpose HTTP defines two Accept request header fields and two Content entity header fields. Client and server use them for "content negotiation": the client uses the Accept headers to tell the server what kind of data it would like to receive, and the server uses the Content headers to tell the client what kind of data it actually sent.

The Accept field lists the MIME types the client can understand. Multiple types can be listed, separated by ",", to give the server more choices. For example:

Accept: text/html,application/xml,image/webp,image/png

This tells the server: "I can understand HTML and XML text, as well as WebP and PNG images. Please send me data in one of these four formats."

Correspondingly, the server uses the Content-Type header field in the response to tell the client the actual type of the entity data:

Content-Type: text/html

Content-Type: image/png

When the browser sees the type "text/html" in a response, it knows this is an HTML file and calls its layout engine to render the page. When it sees "image/png", it knows this is a PNG file and displays the image on the page.

The Accept-Encoding field lists the compression formats the client supports, such as the gzip and deflate mentioned above, again separated by ",". The server may pick one of them to compress the data, and it reports the format actually used in the Content-Encoding response header field:

Accept-Encoding: gzip, deflate, br

Content-Encoding: gzip

Both fields can be omitted, however. If a request has no Accept-Encoding field, the client is treated as not supporting compressed data. If a response has no Content-Encoding field, the response data is not compressed.
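The server side of this exchange can be sketched as a small function. The names `SUPPORTED`, `choose_encoding`, and `encode_body` are hypothetical. The server's own preference order (gzip before deflate) is an assumed design choice, and q-values are deliberately ignored here, so a rejected entry such as "gzip;q=0" would be mishandled. Weights are covered later in this lecture. A missing Accept-Encoding header leads to an uncompressed body with no Content-Encoding, matching the rule above.

```python
import gzip
import zlib
from typing import Dict, Optional, Tuple

# Codings this hypothetical server can produce, in its own order of preference.
# Brotli ("br") is omitted because Python's standard library has no encoder for it.
SUPPORTED = ("gzip", "deflate")


def choose_encoding(accept_encoding: Optional[str]) -> Optional[str]:
    """Return the coding to use, or None to send the body uncompressed."""
    if not accept_encoding:
        return None  # no Accept-Encoding: do not compress
    # Keep only the coding names; q-values are ignored in this sketch
    offered = {item.split(";")[0].strip().lower() for item in accept_encoding.split(",")}
    for coding in SUPPORTED:
        if coding in offered:
            return coding
    return None


def encode_body(body: bytes, accept_encoding: Optional[str]) -> Tuple[Dict[str, str], bytes]:
    """Compress the body if possible and return (response headers, body)."""
    coding = choose_encoding(accept_encoding)
    if coding == "gzip":
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    if coding == "deflate":
        return {"Content-Encoding": "deflate"}, zlib.compress(body)
    return {}, body  # no Content-Encoding header: the body is sent as-is


headers, payload = encode_body(b"<html>...</html>" * 100, "gzip, deflate, br")
print(headers)  # {'Content-Encoding': 'gzip'}
```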

Language types and character sets

MIME types and Encoding types solve the problem of computers understanding the body data. But the Internet spans the whole world, and people in different countries and regions use many different languages. Even when the content is always text/html, how can the browser display it in a language each user can read?

This is the "internationalization" problem. HTTP solves it in a way similar to data types, introducing two concepts: the language type and the character set.

The so-called "language type" is the natural language people use, such as English, Chinese, or Japanese. These languages may also have regional variants, so to tell them apart clearly a "type-subtype" form is used here as well. The difference is that the separator is "-" rather than the "/" used in data types.

A few examples: en means any English, en-US means US English, en-GB means British English, and zh-CN means Chinese as used in mainland China, the one we use most often.

A related and more troublesome issue in processing natural language on computers is the "character set".

In the early days of computing, countries and regions each went their own way and invented many character encodings for text. The English-speaking world used ASCII, the Chinese-speaking world used GBK and BIG5, the Japanese-speaking world used Shift_JIS, and so on. A piece of text that displays correctly under one encoding may become garbled under another.

Later, Unicode and UTF-8 appeared, bringing all the world's languages into a single encoding scheme. UTF-8 has since become the standard character encoding on the Internet.
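To see why both sides must agree on the character set, the sketch below uses Python's built-in codecs to encode the same two Chinese characters as UTF-8 and as GBK. It then decodes the UTF-8 bytes with the wrong codec. `errors="replace"` is used so that undecodable bytes become replacement characters instead of raising an exception.

```python
text = "中文"

utf8_bytes = text.encode("utf-8")  # 3 bytes per character here
gbk_bytes = text.encode("gbk")     # 2 bytes per character here

print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'

# Decoding with the right character set restores the text...
assert gbk_bytes.decode("gbk") == text

# ...but decoding UTF-8 bytes as GBK produces garbled text
garbled = utf8_bytes.decode("gbk", errors="replace")
print(garbled)
```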

Header fields for language types

Likewise, HTTP uses Accept request header fields and Content entity header fields for "content negotiation" of language and character set between client and server.

The Accept-Language field lists the natural languages the client can understand, again separated by ",". For example:

Accept-Language: zh-CN, zh, en

This request header tells the server: "Ideally give me zh-CN Chinese. If that is not available, any other Chinese variant will do. Failing that, give me English."

Correspondingly, the server should use the Content-Language field in the response header to tell the client which language the entity data actually uses:

Content-Language: zh-CN
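The negotiation just described can be sketched as a small matching function. The name `pick_language` is hypothetical, and this is not any particular server's algorithm. The function walks the client's list in order, ignoring q-values, which come later in this lecture. A requested tag matches either itself or any longer tag that begins with it followed by "-", so "zh" can match "zh-TW". This prefix rule resembles the "basic filtering" scheme of RFC 4647.

```python
from typing import List, Optional


def pick_language(accept_language: str, available: List[str]) -> Optional[str]:
    """Return the first available language the client accepts, in the client's order."""
    wanted = [item.split(";")[0].strip() for item in accept_language.split(",") if item.strip()]
    for tag in wanted:
        for lang in available:
            # Case-insensitive: exact match, or the requested tag is a "-"-delimited prefix
            if lang.lower() == tag.lower() or lang.lower().startswith(tag.lower() + "-"):
                return lang
    return None  # nothing acceptable


print(pick_language("zh-CN, zh, en", ["en", "zh-CN"]))  # zh-CN
print(pick_language("zh-CN, zh, en", ["en", "zh-TW"]))  # zh-TW
print(pick_language("zh-CN, zh, en", ["en", "fr"]))     # en
```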

For character sets, the request header field is Accept-Charset, but the response has no corresponding Content-Charset header. Instead, the character set is given as "charset=xxx" after the data type in the Content-Type field. This is worth keeping in mind.

For example, if the browser asks for the GBK or UTF-8 character set and the server returns UTF-8, the headers look like this:

Accept-Charset: gbk, utf-8

Content-Type: text/html; charset=utf-8
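Because the charset travels inside Content-Type, the receiver has to split that value before it can decode the body. The sketch below uses a hypothetical helper `parse_content_type`. It handles the simple "type/subtype; name=value" form shown above and does not cover edge cases such as semicolons inside quoted values. The fallback to UTF-8 when no charset is given is an assumption made for the example.

```python
from typing import Dict, Tuple


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type value into the MIME type and its parameters."""
    mime, *rest = value.split(";")
    params = {}
    for part in rest:
        name, sep, val = part.partition("=")
        if sep:
            # Parameter names are case-insensitive; values may be quoted
            params[name.strip().lower()] = val.strip().strip('"')
    return mime.strip().lower(), params


mime, params = parse_content_type("text/html; charset=utf-8")
print(mime, params)  # text/html {'charset': 'utf-8'}

body = "<p>中文</p>".encode("utf-8")
# Decode with the announced charset (UTF-8 is an assumed default if none is given)
text = body.decode(params.get("charset", "utf-8"))
```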

Nowadays, however, browsers support many character sets, so they usually do not send Accept-Charset. Servers do not send Content-Language either, because the language can usually be inferred from the character set. As a result, requests generally carry only the Accept-Language field, and responses carry only the Content-Type field.

Quality values in content negotiation

When Accept, Accept-Encoding, Accept-Language, and similar request header fields are used for content negotiation, HTTP also lets the client attach a special "q" parameter to set priorities by weight. Here "q" stands for "quality factor".

The maximum weight is 1, which is also the default. A weight of 0 means the item is rejected. Weights have at most three decimal places, so the smallest nonzero value is 0.001. The syntax is to append ";" after a data type, encoding, or language code, followed by "q=value".

Note a subtlety here. In most programming languages ";" marks a stronger break than ",", but in HTTP content negotiation it is exactly the reverse. The "," separates whole alternatives, while ";" only attaches parameters such as q to a single alternative.

Take the following Accept field as an example:

Accept: text/html,application/xml;q=0.9,*/*;q=0.8

It means the browser would most like an HTML file, with weight 1. Next comes an XML file, with weight 0.9, and last any data type at all, with weight 0.8. After receiving this request header, the server computes the weights and, depending on what it actually has available, sends HTML or XML first.
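The weighting step can be sketched as a parser that turns such a header into a priority list. `parse_q_list` is a hypothetical helper. It keeps only the value before the first ";", applies the default weight of 1, drops entries whose weight is 0, and treats a malformed weight as 0, which is an assumed policy. It does not perform wildcard matching against what the server actually has.

```python
from typing import List, Tuple


def parse_q_list(header: str) -> List[Tuple[str, float]]:
    """Parse e.g. "text/html,application/xml;q=0.9,*/*;q=0.8" into
    (value, q) pairs, highest weight first."""
    items = []
    for raw in header.split(","):
        raw = raw.strip()
        if not raw:
            continue
        value, *params = [p.strip() for p in raw.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, number = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(number)
                except ValueError:
                    q = 0.0  # assumed policy: a malformed weight counts as "not acceptable"
        if q > 0:  # q=0 means the client explicitly rejects this item
            items.append((value, q))
    # sorted() is stable, so items with equal weight keep the client's order
    return sorted(items, key=lambda item: item[1], reverse=True)


print(parse_q_list("text/html,application/xml;q=0.9,*/*;q=0.8"))
# [('text/html', 1.0), ('application/xml', 0.9), ('*/*', 0.8)]
```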

Content negotiation results

The content negotiation process is opaque, and every web server uses its own algorithm. Sometimes, however, the server adds a Vary field to the response. This field lists the request header fields the server consulted during content negotiation, which gives a little insight into the result. For example:

Vary: Accept-Encoding,User-Agent,Accept

This Vary field indicates that the server chose the response it sent back based on three request header fields: Accept-Encoding, User-Agent, and Accept.

The Vary field can be seen as a special "version marker" for the response. Whenever the Accept-family headers in a request change, the response and its Vary field may change along with them. In other words, the same URI may have several different "versions". This mainly matters to proxy servers along the transmission path that cache responses; we will return to it when we discuss "HTTP caching".
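To show why Vary matters to caches, here is a minimal sketch of how a cache might build its lookup key from the URI plus the request headers named in Vary. The function `cache_key` and the sample requests are hypothetical. Real caches normalize header values more carefully and treat "Vary: *" specially, which this sketch does not handle.

```python
from typing import Dict


def cache_key(uri: str, request_headers: Dict[str, str], vary: str) -> tuple:
    """Build a cache key from the URI plus the request headers named in Vary."""
    # Header names are case-insensitive, so compare them in lower case
    headers = {name.lower(): value.strip() for name, value in request_headers.items()}
    varying = sorted({name.strip().lower() for name in vary.split(",") if name.strip()})
    return (uri,) + tuple((name, headers.get(name, "")) for name in varying)


vary = "Accept-Encoding,User-Agent,Accept"
a = cache_key("/15-1", {"Accept-Encoding": "gzip", "User-Agent": "demo", "Accept": "*/*"}, vary)
b = cache_key("/15-1", {"Accept-Encoding": "br", "User-Agent": "demo", "Accept": "*/*"}, vary)
c = cache_key("/15-1", {"accept-encoding": "gzip", "User-Agent": "demo", "Accept": "*/*", "Cookie": "x=1"}, vary)

print(a != b)  # True: a different Accept-Encoding means a different cached version
print(a == c)  # True: Cookie is not listed in Vary, and header-name case is ignored
```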

Hands-on experiments

That completes the theory; next comes the hands-on part. You can use our test environment, where a mime directory under the www directory holds several prepared files. You can access them with a URI of the form "/15-1?name=file", for example:

http://www.chrono.com/15-1?name=a.json

http://www.chrono.com/15-1?name=a.xml

Open Chrome's Developer Tools and you can see the Accept headers in the request and the Content headers in the response.

You can also copy other files into the mime directory, such as archives, MP3s, images, and videos, and then access them in Chrome to observe more MIME types.

After these experiments, you can also leave the test environment and visit major websites directly to see what real-world HTTP messages look like.

Summary

Today we learned about data types and language types in HTTP. Here is a brief summary of today's content:

Data types indicate what kind of data the entity is, using MIME types; the related header fields are Accept and Content-Type;

Encoding types indicate how the entity data is compressed; the related header fields are Accept-Encoding and Content-Encoding;

Language types indicate the natural language of the entity data; the related header fields are Accept-Language and Content-Language;

Character sets indicate how the entity data's text is encoded; the related header fields are Accept-Charset and Content-Type;

For "content negotiation", the client sends Accept and similar request header fields, asking the server to return the data that best fits the request;

Accept and similar header fields can list several options in order, separated by ",", and can also use the ";q=" parameter to specify exact weights.

Homework

Try to explain the request header "Accept-Encoding: gzip, deflate;q=1.0, *;q=0.5, br;q=0", and then work out what response header the server might send back.

Suppose you want to use the POST method to submit some JSON data containing Chinese text to the server. What should the request headers look like?

Try using the analogy of a parcel delivery to explain the concepts of MIME type and Encoding type.


Origin www.cnblogs.com/wxcx/p/12616574.html