Computer Networking: A Top-Down Approach (Chapter 2 Learning Notes)

This chapter covers the application layer.

Network applications are the raison d'être of a computer network.

Web Application Architecture

There are two dominant architectural paradigms for modern network applications: the client-server architecture and the peer-to-peer (P2P) architecture.

In a client-server architecture, there is an always-on host called the server, which serves requests from many other hosts, called clients.

(A typical example is the Web, where an always-on Web server services requests from browsers running on client hosts. When it receives a request for an object, it responds by sending the requested object to the client.)

This architecture has several characteristics: clients do not communicate directly with each other, and the server has a fixed, well-known address (its IP address).

Well-known applications with this architecture include the Web, FTP, Telnet, and e-mail.

In a P2P architecture, there is minimal (or no) reliance on dedicated servers in data centers; instead, the application exploits direct communication between pairs of intermittently connected hosts, called peers. Because the peers communicate without passing through a dedicated server, the architecture is called peer-to-peer.

One of the most compelling features of this architecture is its self-scalability.

(For example, in a P2P file-sharing application, each peer generates workload by requesting files, but each peer also adds service capacity to the system by distributing files to other peers.)

Process Communication

In operating system terms, it is actually processes rather than programs that communicate. A process can be thought of as a program running on an end system. Processes on two different end systems communicate with each other by exchanging messages across the computer network: the sending process creates and sends messages into the network, and the receiving process receives these messages and possibly responds by sending messages back.

Client and Server Processes

Network applications consist of pairs of processes that send messages to each other over the network.

(In a Web application, a client browser process exchanges messages with a Web server process; in a P2P system, files are transferred from a process in one peer to a process in another peer.) For each pair of communicating processes, one of the two processes is typically labeled the client, and the other the server.

Specifically, in the context of a communication session between a pair of processes, the process that initiates the communication is labeled the client, and the process that waits to be contacted at the beginning of the session is the server.

The Interface Between the Process and the Computer Network

Most applications consist of pairs of communicating processes, with the two processes in each pair sending messages to each other. A message sent from one process to another must pass through the underlying network; a process sends messages into, and receives messages from, the network through a software interface called a socket.

A socket is the interface between the application layer and the transport layer within a host. Because the socket is the programming interface through which network applications are built, it is also called the Application Programming Interface (API) between the application and the network. The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side.

Process Addressing

For a process running on one host to send packets to a process running on another host, the receiving process needs an address. To identify the receiving process, two pieces of information are needed: ① the address of the host, and ② an identifier that specifies the receiving process within the destination host. In the Internet, the host is identified by its IP address (a 32-bit quantity that uniquely identifies the host), and the receiving process is identified by a port number.
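As a small illustration (a sketch, not tied to any particular application), the (IP address, port number) pair that addresses a process is exactly what the sockets API exposes. Here a UDP socket is bound to an ephemeral port on the loopback interface, and the OS-assigned address is inspected:

```python
import socket

# A process is addressed by the pair (IP address, port number).
# Bind a UDP socket to an ephemeral port on the loopback interface
# and inspect the address the operating system assigned.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))      # port 0 => the OS picks a free port
ip, port = sock.getsockname()
print(ip, port)                  # e.g. 127.0.0.1 54321
sock.close()
```

The specific port number printed will vary from run to run, since it is chosen by the OS.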

Transport Services Provided by the Internet

The Internet makes two transport-layer protocols available to applications: UDP and TCP. When a software developer creates a new application for the Internet, one of the first decisions is whether to use UDP or TCP. Each of these protocols offers a different set of services to the invoking applications.

TCP Services

The TCP service model includes connection-oriented services and reliable data transmission services. When an application invokes TCP as its transport protocol, the application can obtain both services from TCP.

  • Connection-oriented service: before application-level messages begin to flow, TCP has the client and server exchange transport-layer control information with each other. This handshaking procedure alerts the client and server, allowing them to prepare for the arrival of packets. After the handshaking phase, a TCP connection is said to exist between the sockets of the two processes. The connection is full-duplex: the two processes can send and receive messages over the connection at the same time. When the application finishes sending messages, it must tear down the connection.
  • Reliable data transfer service: the communicating processes can rely on TCP to deliver all sent data without error and in the proper order. When one side of the application passes a stream of bytes into a socket, it can count on TCP to deliver the same stream of bytes to the receiving socket, with no missing or duplicated bytes.

TCP also includes a congestion-control mechanism. This service does not necessarily benefit the communicating processes directly, but it benefits the Internet as a whole: when the network between sender and receiver becomes congested, TCP's congestion-control mechanism throttles the sending process.
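A minimal sketch of the handshake and the reliable, in-order byte-stream service, using Python's standard sockets over the loopback interface (the message content is illustrative):

```python
import socket

# Sketch: TCP's connection-oriented, reliable byte-stream service,
# demonstrated entirely on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
addr = listener.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(addr)              # the three-way handshake happens here
server_side, _ = listener.accept()

client.sendall(b"hello, TCP")     # bytes pushed into the socket...
data = server_side.recv(1024)     # ...arrive intact and in order
print(data)                       # b'hello, TCP'

client.close(); server_side.close(); listener.close()
```

Note the connection is full-duplex: `server_side` could equally send bytes back to `client` over the same connection.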

UDP Services

UDP is a lightweight, no-frills transport protocol that provides only minimal services.

  • UDP is connectionless, so there is no handshaking before the two processes start to communicate.
  • UDP provides an unreliable data transfer service: when a process sends a message into a UDP socket, UDP provides no guarantee that the message will ever reach the receiving process, and messages that do arrive may arrive out of order.

Because UDP has no congestion-control mechanism, the sending side of UDP can pump data into the layer below (the network layer) at any rate it pleases.
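For contrast, a minimal UDP sketch over loopback; note there is no connection setup, and on a real network the datagram could be lost or reordered (loopback delivery just happens to be dependable):

```python
import socket

# Sketch: connectionless UDP -- no handshake; each sendto() is an
# independent datagram with no delivery or ordering guarantee.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"datagram 1", addr)   # no connect() required

msg, _ = receiver.recvfrom(1024)
print(msg)                           # b'datagram 1'
sender.close(); receiver.close()
```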

Note that neither TCP nor UDP provides any encryption: the data the sending process passes into its socket travel to the destination process in cleartext, so the information can be observed at any intermediate link. For this reason an enhancement for TCP, the Secure Sockets Layer (SSL), was developed. TCP enhanced with SSL does everything traditional TCP does and additionally provides critical process-to-process security services, including encryption, data integrity, and end-point authentication.

Application-Layer Protocols

An application-layer protocol defines how application processes, running on different end systems, pass messages to each other. In particular, an application-layer protocol defines:

  • The types of messages exchanged, such as request messages and response messages
  • The syntax of the various message types, i.e., the fields in the message and how the fields are delineated
  • The semantics of the fields, that is, the meaning of the information in those fields
  • Rules for determining when and how a process sends messages and responds to messages

Web and HTTP

Overview of HTTP

The application layer protocol of the Web is the Hypertext Transfer Protocol (HyperText Transfer Protocol, HTTP), which is the core of the Web. HTTP is implemented by two programs: a client program and a server program. The client program and the server program run in different end systems and conduct conversations by exchanging HTTP messages. HTTP defines the structure of these messages and the way the client and server exchange messages.

(A Web page consists of objects. An object is simply a file, such as an HTML file, a JPEG image, or a Java applet, that is addressable by a single URL. Each URL has two components: the hostname of the server that houses the object and the object's path name.)

HTTP defines how Web clients request Web pages from Web servers and how servers transfer Web pages to clients. When a user requests a Web page (for example, by clicking a hyperlink), the browser sends HTTP request messages for the objects in the page to the server; the server receives the requests and responds with HTTP response messages that contain the objects, as shown in the figure.

HTTP uses TCP as its underlying transport protocol (rather than UDP). The HTTP client first initiates a TCP connection with the server. Once the connection is established, the browser and server processes access TCP through their socket interfaces: the client's socket interface is the door between the client process and the TCP connection, and the server's socket interface is the door between the server process and the TCP connection. The client sends HTTP request messages into its socket interface and receives HTTP response messages from it, and similarly on the server side. Once the client sends a request message into its socket interface, the message is out of the client's hands and in the hands of TCP, which provides a reliable data transfer service to HTTP.

The server sends requested files to clients without storing any state information about the client. If a client requests the same object twice in a short interval, the server responds to each request in turn; it does not refuse to resend the object, because it has retained no memory of the earlier request. Because an HTTP server maintains no information about its clients, HTTP is said to be a stateless protocol.

Non-Persistent and Persistent Connections

When client and server interact over a substantial period of time, with the client making a series of requests and the server responding to each, the requests can be sent over separate TCP connections, one per request/response pair (non-persistent connections), or all of the requests and responses can be sent over the same TCP connection (a persistent connection).

  • HTTP with non-persistent connections

With non-persistent connections, transferring a Web page from server to client proceeds as follows (assume the page consists of a base HTML file and 10 JPEG images, and that all 11 objects reside on the same server):

(1) The HTTP client process initiates a TCP connection to the server on port 80, the default port for HTTP. A socket is associated with the connection at both the client and the server.

(2) The HTTP client sends an HTTP request message to the server via its socket.

(3) The HTTP server process receives the request message through its socket and sends an HTTP response message, encapsulating the requested object, back to the client.

(4) The HTTP server process tells TCP to close the TCP connection.

(5) The HTTP client receives the response message, and the TCP connection is closed.

The steps above illustrate the use of non-persistent connections: each TCP connection is closed after the server sends one object, that is, the connection does not persist for other objects. Each TCP connection transports exactly one request message and one response message.
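The five steps can be sketched end to end over the loopback interface; the request line and response body below are made up for illustration:

```python
# A minimal HTTP/1.0-style exchange over loopback, mirroring steps (1)-(5).
import socket

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
host, port = listener.getsockname()

client = socket.socket()
client.connect((host, port))                       # step (1): TCP connection
client.sendall(b"GET /index.html HTTP/1.0\r\n"
               b"Connection: close\r\n\r\n")       # step (2): request

conn, _ = listener.accept()
request = conn.recv(4096)                          # step (3): server reads...
conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n<html>hi</html>")  # ...and responds
conn.close()                                       # step (4): server closes

response = b""
while True:                                        # step (5): read until EOF
    chunk = client.recv(4096)
    if not chunk:
        break
    response += chunk
client.close()
print(response.split(b"\r\n")[0])                  # b'HTTP/1.0 200 OK'
```

After the close in step (4), fetching a second object would require repeating all five steps on a fresh connection, which is exactly the cost persistent connections avoid.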

Round-trip time (Round-Trip Time, RTT) refers to the time it takes for a short packet to travel from the client to the server and back again. RTT includes packet propagation delay, packet queuing delay on intermediate routers and switches, and packet processing delay

How much time elapses from when the client requests the base HTML file until it receives the entire file?

When the user clicks the link, the browser initiates a TCP connection to the Web server; this involves a "three-way handshake": the client sends a small TCP segment to the server, the server acknowledges and responds with a small TCP segment, and finally the client acknowledges back to the server.

The first two parts of the three-way handshake take one RTT. After they complete, the client sends the HTTP request message into the TCP connection, and the server responds with the HTML file; this request/response exchange consumes another RTT. Therefore the total response time is two RTTs plus the time the server takes to transmit the HTML file.
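A quick back-of-envelope calculation with assumed numbers (100 ms RTT, a 100 kB file, and a 1 Mb/s transmission rate; all three values are illustrative):

```python
# Total time for non-persistent HTTP ~= 2*RTT + file transmission time.
rtt = 0.1                  # assumed round-trip time: 100 ms
file_bits = 8 * 100_000    # assumed file size: 100 kB = 800,000 bits
rate = 1_000_000           # assumed transmission rate: 1 Mb/s

transmit = file_bits / rate          # 0.8 s to push the file onto the link
total = 2 * rtt + transmit           # one RTT handshake + one RTT req/resp
print(total)                         # 1.0 (seconds)
```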

  • HTTP with persistent connections

Non-persistent connections have several shortcomings. First, a brand-new connection must be established and maintained for each requested object; for each connection, TCP buffers and TCP variables must be allocated in both the client and the server, placing a significant burden on the Web server. Second, each object suffers a delivery delay of two RTTs: one RTT to establish the TCP connection and one RTT to request and receive the object.

In the case of a persistent connection, the server keeps the TCP connection open after sending the response. Subsequent request and response messages can be sent over the same connection between the same client and server. The default mode of HTTP is to use persistent connections with pipelining.

User-Server Interaction: Cookies

We have seen that HTTP servers are stateless, but a Web site often wants to identify users; for this purpose, HTTP uses cookies.

As shown in the figure, cookie technology has four components: ① a cookie header line in the HTTP response message; ② a cookie header line in the HTTP request message; ③ a cookie file kept on the user's end system and managed by the user's browser; ④ a back-end database at the Web site.

Cookies can be used to identify a user. The first time a user visits a site, the user may be asked to supply a user identification. During subsequent sessions, the browser passes a cookie header to the server, thereby identifying the user to the server. Cookies can thus be used to create a user session layer on top of stateless HTTP. Their use remains controversial, however, because they can be considered an invasion of user privacy.
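A sketch of the cookie round trip using Python's standard `http.cookies` module; the identification value `1678` is hypothetical:

```python
from http.cookies import SimpleCookie

# Server side: attach an identifier to the HTTP response
# via a Set-Cookie header line (component (1) above).
server_cookie = SimpleCookie()
server_cookie["id"] = "1678"                 # hypothetical user ID
set_cookie_header = server_cookie.output(header="Set-Cookie:")
print(set_cookie_header)                     # Set-Cookie: id=1678

# Browser side: on later requests the browser sends back
# "Cookie: id=1678" (component (2)), which the server parses
# to recognize the returning user.
browser = SimpleCookie()
browser.load("id=1678")
print(browser["id"].value)                   # 1678
```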

Web Caching

A Web cache, also called a proxy server, is a network entity that satisfies HTTP requests on behalf of an origin Web server.

 

A user's browser can be configured so that all of the user's HTTP requests are first directed to the Web cache; once the browser is configured, every browser request for an object goes first to the Web cache.

Suppose the browser requests an object; here is what happens:

(1) The browser establishes a TCP connection to the Web cache and sends an HTTP request for the object to the Web cache.

(2) The Web cache checks whether it has a copy of the object stored locally. If it does, the Web cache returns the object to the client's browser within an HTTP response message.

(3) If the object is not in the Web cache, the cache opens a TCP connection to the object's origin server. The Web cache then sends an HTTP request for the object over this cache-to-server TCP connection; upon receiving the request, the origin server sends an HTTP response containing the object to the Web cache.

(4) When the Web cache receives the object, it stores a copy locally and sends a copy, within an HTTP response message, to the client's browser (over the existing TCP connection between the browser and the Web cache).

Note that a Web cache is both a server and a client: it is a server when it receives requests from a browser and sends back responses, and it is a client when it sends requests to an origin server and receives responses.

There are two reasons for deploying Web caches on the Internet. First, Web caching can greatly reduce the response time to client requests; second, Web caching can greatly reduce the traffic on an organization's access link to the Internet.
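Both benefits follow from serving repeat requests locally. A toy in-memory cache, with a dictionary standing in for the origin server (the path and body are made up), illustrates the idea:

```python
# Toy Web cache: serve from the local copy on a hit,
# fetch from the (simulated) origin server on a miss.
origin = {"/index.html": b"<html>home</html>"}  # stands in for the origin server
cache = {}
origin_fetches = 0   # counts traffic on the access link to the origin

def get(path):
    global origin_fetches
    if path in cache:            # hit: fast response, no origin traffic
        return cache[path]
    origin_fetches += 1          # miss: one request crosses the access link
    body = origin[path]
    cache[path] = body           # store a copy for future requests
    return body

get("/index.html")
get("/index.html")
print(origin_fetches)            # 1 -- the second request was a cache hit
```

Real caches additionally expire and revalidate objects (e.g., with conditional GETs), which this sketch omits.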

E-mail in the Internet

Email is an asynchronous communication medium.

The figure below gives an overview of the Internet e-mail system.

It has three major components: user agents, mail servers, and the Simple Mail Transfer Protocol (SMTP).

A typical message takes the following path: from the sender's user agent to the sender's mail server, then to the recipient's mail server, where it is deposited in the recipient's mailbox.

SMTP is the main application layer protocol in Internet e-mail. It uses the TCP reliable data transfer service to send mail from the sender's mail server to the receiver's mail server.

Suppose Alice wants to send a simple message to Bob:

(1) Alice invokes her mail agent program and provides Bob's mail address, composes a message, and instructs the user agent to send the message.

(2) Alice's user agent sends the message to her mail server, where it is placed in a message queue.

(3) The SMTP client running on Alice's mail server finds the message in the message queue, and it creates a TCP connection to the SMTP server running on Bob's mail server.

(4) After some initial SMTP handshakes, the SMTP client sends Alice's message over the TCP connection.

(5) On Bob's mail server, the SMTP server receives the message. Bob's mail server then places the message in Bob's mailbox.

(6) At his convenience, Bob invokes his user agent to read the message.
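The SMTP dialogue underlying steps (3) to (5) can be sketched as the command sequence Alice's mail server, acting as an SMTP client, would issue; the hostnames, addresses, and message body below are made-up examples, not a real exchange:

```python
# Sketch of the client side of an SMTP dialogue (illustrative values).
commands = [
    "HELO crepes.fr",                     # identify the sending server
    "MAIL FROM: <alice@crepes.fr>",       # envelope sender
    "RCPT TO: <bob@hamburger.edu>",       # envelope recipient
    "DATA",                               # start of the message itself
    "Do you like ketchup?\r\n.",          # body, terminated by a lone "."
    "QUIT",                               # close the SMTP session
]
for c in commands:
    print("C:", c)
```

After each command, the server replies with a numeric status code (e.g., 250 for success) before the client proceeds; those replies are omitted here.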

 

SMTP generally does not use intermediate mail servers, even when the two mail servers are on opposite ends of the world. If Bob's mail server is down, the message remains in Alice's mail server and waits for a new attempt; the mail does not get deposited in some intermediate mail server.

Comparison of SMTP and HTTP

Both protocols transfer files from one host to another: HTTP transfers files (objects) from a Web server to a Web client (typically a browser), while SMTP transfers files (e-mail messages) from one mail server to another. Both persistent HTTP and SMTP use persistent connections when transferring files.

The differences between the two:

(1) HTTP is mainly a pull protocol: users use HTTP to pull information from servers, and the TCP connection is initiated by the machine that wants to receive the file. SMTP, on the other hand, is primarily a push protocol: the sending mail server pushes the file to the receiving mail server, and the TCP connection is initiated by the machine that wants to send the file.

(2) SMTP requires each message, including the body, to be in 7-bit ASCII format. If a message contains characters outside 7-bit ASCII, or binary data, it must be encoded into 7-bit ASCII. HTTP data are not subject to this restriction.
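The encoding step can be sketched with base64, which MIME messages commonly use to turn binary data into ASCII-safe text (the byte values below are arbitrary illustrations):

```python
import base64

# SMTP's 7-bit ASCII restriction: binary data must be encoded into
# ASCII before it can be mailed; MIME typically uses base64.
binary = bytes([0x00, 0xFF, 0x10, 0x80])   # not valid 7-bit ASCII
encoded = base64.b64encode(binary)         # ASCII-safe representation
print(encoded)                             # b'AP8QgA=='
decoded = base64.b64decode(encoded)        # the receiver recovers the original
```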

(3) The third important difference concerns how a document containing both text and images (and possibly other media types) is handled. HTTP encapsulates each object in its own HTTP response message, whereas SMTP places all of the message's objects into one message.

Mail Access Protocols

Once SMTP delivers the message from Alice's mail server to Bob's mail server, the message is placed in Bob's mailbox. As shown in the figure below, Alice's user agent uses SMTP to push the message to her mail server, and her mail server (acting as an SMTP client) then uses SMTP to relay the message to Bob's mail server. Why the two-step procedure? Chiefly because without relaying through Alice's mail server, her user agent has no recourse when the destination mail server is unreachable; by first depositing the message in her own mail server, that server can repeatedly try to send it.

How does Bob get his mail on one of his ISP's mail servers by running a user agent on his local PC?

Bob's user agent cannot use SMTP to obtain the messages, because obtaining messages is a pull operation while SMTP is a push protocol. The problem is solved by introducing a special mail access protocol that transfers messages from Bob's mail server to his local PC. Popular mail access protocols include the Post Office Protocol, Version 3 (POP3), the Internet Mail Access Protocol (IMAP), and HTTP.

As shown in the figure above (Figure 2-16), SMTP is used to transfer mail from the sender's mail server to the recipient's mail server, and also to transfer mail from the sender's user agent to the sender's mail server, whereas a mail access protocol such as POP3 is used to transfer mail from the recipient's mail server to the recipient's user agent.

 POP3

POP3, defined in RFC 1939, is an extremely simple mail access protocol. POP3 begins when the user agent (the client) opens a TCP connection to the mail server (the server) on port 110. With the TCP connection established, POP3 progresses through three phases: authorization, transaction, and update.
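The three phases can be sketched as the commands a user agent issues in each (the user name, password, and message number below are made up for illustration):

```python
# Sketch of a POP3 session, phase by phase (illustrative values).
phases = {
    # authorization: the user agent identifies and authenticates itself
    "authorization": ["USER bob", "PASS hungry"],
    # transaction: list, retrieve, and mark messages for deletion
    "transaction":   ["LIST", "RETR 1", "DELE 1"],
    # update: QUIT ends the session; the server then applies deletions
    "update":        ["QUIT"],
}
for phase, cmds in phases.items():
    print(phase, cmds)
```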

 IMAP

Web-based email

DNS: Directory Service for the Internet 

Just as people can be identified in various ways, so can hosts on the Internet. One way to identify a host is by its hostname; hosts can also be identified by their IP addresses.

Services provided by DNS

There are two ways to identify a host: by hostname or by IP address. People prefer the mnemonic hostname, while routers prefer fixed-length, hierarchically structured IP addresses. To reconcile these preferences, we need a directory service that translates hostnames to IP addresses; this is the main task of the Domain Name System (DNS). The DNS is ① a distributed database implemented in a hierarchy of DNS servers, and ② an application-layer protocol that allows hosts to query the distributed database. The DNS protocol runs over UDP and uses port 53.

DNS is commonly used by other application-layer protocols, including HTTP, SMTP, and FTP, to resolve user-supplied hostnames into IP addresses.

As an example, consider what happens when a browser (i.e., an HTTP client) running on a user's host requests the URL www.someschool.edu/index.html. For the user's host to send an HTTP request message to the Web server www.someschool.edu, it must first obtain the server's IP address. This is done as follows:

(1) The same user host runs the client side of the DNS application.

(2) The browser extracts the hostname www.someschool.edu from the URL and passes it to the client side of the DNS application.

(3) The DNS client sends a query containing the hostname to a DNS server.

(4) The DNS client eventually receives a reply containing the IP address for the hostname.

(5) Once the browser receives the IP address from DNS, it can initiate a TCP connection to the HTTP server process located at port 80 at that IP address.
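From the application's point of view, steps (1) to (4) collapse into one blocking resolver call. Resolving `localhost` keeps the sketch self-contained, since it needs no external DNS traffic:

```python
import socket

# Steps (1)-(4) as seen by the invoking application: one blocking
# call that returns the name-to-address mapping. "localhost" is
# resolved locally, so this works without network access.
ip = socket.gethostbyname("localhost")
print(ip)                      # typically 127.0.0.1
```

For a real hostname such as www.someschool.edu, the same call would trigger the DNS query/reply exchange described above.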

In addition to translating hostnames to IP addresses, DNS provides several other important services:

  • Host aliasing: a host with a complicated hostname can have one or more alias names. An application can invoke DNS to obtain the canonical hostname for a supplied alias, as well as the host's IP address.
  • Mail server aliasing
  • Load distribution: DNS is also used to perform load distribution among replicated servers

Overview of DNS working mechanism

Suppose an application running on a user's host needs to translate a hostname to an IP address. The application invokes the client side of DNS, specifying the hostname to be translated. The DNS client on the user's host then sends a DNS query message into the network. All DNS query and reply messages are sent within UDP datagrams to port 53. After a delay, the DNS client receives a DNS reply message providing the desired mapping, which is passed to the invoking application. Thus, from the perspective of the invoking application on the user's host, DNS is a black box providing a simple, straightforward translation service.

A simple design for DNS would use a single DNS server for the entire Internet, containing all the mappings. In this centralized design, clients direct all queries to the single DNS server, which responds directly to the querying clients. The problems with a centralized design include:

(1) A single point of failure: if the DNS server crashes, the entire Internet goes down with it.

(2) Traffic volume: a single DNS server would have to handle all DNS queries.

(3) A distant centralized database: a single DNS server cannot be "close to" all querying clients, so some queries would suffer significant delays.

(4) Maintenance: a single DNS server would have to keep records for all Internet hosts, making the central database enormous.

For these reasons, DNS uses a distributed design.

(1) A distributed, hierarchical database

To address the issue of scale, DNS uses a large number of servers, organized hierarchically and distributed around the world. No single DNS server has all the mappings for all hosts in the Internet; instead, the mappings are distributed across the DNS servers. Broadly speaking, there are three classes of DNS servers: root DNS servers, top-level domain (TLD) DNS servers, and authoritative DNS servers.

  • Root DNS servers: there are more than 400 root server instances worldwide, managed by 13 different organizations. The root servers provide the IP addresses of the TLD servers.
  • Top-level domain (TLD) DNS servers: there is a TLD server (or server cluster) for each top-level domain and for all of the country top-level domains. The TLD servers provide the IP addresses of the authoritative DNS servers.
  • Authoritative DNS servers: every organization with publicly accessible hosts on the Internet must provide publicly accessible DNS records that map the names of those hosts to IP addresses.

The root, TLD, and authoritative DNS servers all belong to the hierarchy of DNS servers. There is another important class of DNS server, the local DNS server. A local DNS server does not strictly belong to the hierarchy, but it is central to the DNS architecture. Each ISP has a local DNS server; when a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers.

DNS Caching

To improve delay performance and reduce the number of DNS messages ricocheting around the Internet, DNS makes extensive use of caching. The idea is simple: in a chain of queries, when a DNS server receives a DNS reply, it can cache the mapping in its local memory. In fact, because of caching, root servers are bypassed for all but a small fraction of DNS queries.

P2P file distribution

The applications described so far all use the client-server model and rely heavily on always-on infrastructure servers. With a P2P architecture, there is minimal (or no) reliance on always-on infrastructure servers; instead, intermittently connected hosts, called peers, communicate directly with each other. The peers are not owned by a service provider.

With client-server file distribution, the server must send a copy of the file to each peer, placing an enormous burden on the server and consuming a large amount of server bandwidth. With P2P file distribution, each peer can redistribute any portion of the file it has received to any other peer, thereby assisting the server in the distribution process.

Applications with P2P architectures can be self-scaling. The source of this scalability is that peers are redistributors as well as consumers of bits.

 

 


Origin blog.csdn.net/yangSHU21/article/details/131299155