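Notes on TCP/IP Illustrated, Volume 2: The TCP Output Function tcp_output

Copyright notice: this is an original article by the blogger. Please credit the original source when reposting. https://blog.csdn.net/woay2008/article/details/79620953
The tcp_output function is responsible for sending TCP segments, and it is called from many places in the code.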
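    tcp_usrreq calls it while handling several requests: PRU_CONNECT, to send the initial SYN; PRU_SHUTDOWN, to send a FIN; PRU_RCVD, because a new window advertisement may be needed after the application has read data from the socket receive buffer; PRU_SEND, to send data; and PRU_SENDOOB, to send out-of-band data.
    tcp_fasttimo calls it to send a delayed ACK.
    tcp_timers calls it to retransmit a segment when the retransmission timer expires.
    tcp_timers calls it to send a window probe when the persist timer expires.
    tcp_drop calls it to send an RST.
    tcp_disconnect calls it to send a FIN.
    tcp_input calls it when output is required or an ACK must be sent immediately.
    tcp_input calls it when a valid pure ACK arrives and there is local data to send.
    tcp_input calls it to fast-retransmit a segment after receiving three consecutive duplicate ACKs.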

Functions listed above that the reader has not seen yet will be covered in later articles.

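The TCP header contains six flag bits, and several of them can be set at the same time. Their meanings are:
    URG    The urgent pointer is valid.
    ACK    The acknowledgment number is valid.
    PSH    The receiver should pass this segment's data to the application as soon as possible.
    RST    Reset the connection.
    SYN    Synchronize sequence numbers to initiate a connection.
    FIN    The sender has finished sending data; used to terminate a connection.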

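The ACK, RST, SYN, and FIN flags are tied to connection establishment and teardown, and they are set according to the connection state. The TCP connection states are defined as follows: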

/*
 * TCP FSM state definitions.
 * Per RFC793, September, 1981.
 */

#define	TCP_NSTATES	11

#define	TCPS_CLOSED		0	/* closed */
#define	TCPS_LISTEN		1	/* listening for connection */
#define	TCPS_SYN_SENT		2	/* active, have sent syn */
#define	TCPS_SYN_RECEIVED	3	/* have send and received syn */
/* states < TCPS_ESTABLISHED are those where connections not established */
#define	TCPS_ESTABLISHED	4	/* established */
#define	TCPS_CLOSE_WAIT		5	/* rcvd fin, waiting for close */
/* states > TCPS_CLOSE_WAIT are those where user has closed */
#define	TCPS_FIN_WAIT_1		6	/* have closed, sent fin */
#define	TCPS_CLOSING		7	/* closed xchd FIN; await FIN ACK */
#define	TCPS_LAST_ACK		8	/* had fin and close; await FIN ACK */
/* states > TCPS_CLOSE_WAIT && < TCPS_FIN_WAIT_2 await ACK of FIN */
#define	TCPS_FIN_WAIT_2		9	/* have closed, fin is acked */
#define	TCPS_TIME_WAIT		10	/* in 2*msl quiet wait after close */

#define	TCPS_HAVERCVDSYN(s)	((s) >= TCPS_SYN_RECEIVED)
#define	TCPS_HAVERCVDFIN(s)	((s) >= TCPS_TIME_WAIT)

char *tcpstates[] = {
	"CLOSED",	"LISTEN",	"SYN_SENT",	"SYN_RCVD",
	"ESTABLISHED",	"CLOSE_WAIT",	"FIN_WAIT_1",	"CLOSING",
	"LAST_ACK",	"FIN_WAIT_2",	"TIME_WAIT",
};

Which flags are sent in which state is determined by the tcp_outflags array, defined as follows:

/*
 * Flags used when sending segments in tcp_output.
 * Basic flags (TH_RST,TH_ACK,TH_SYN,TH_FIN) are totally
 * determined by state, with the proviso that TH_FIN is sent only
 * if all data queued for output is included in the segment.
 */
u_char	tcp_outflags[TCP_NSTATES] = {
    TH_RST|TH_ACK, 0, TH_SYN, TH_SYN|TH_ACK,
    TH_ACK, TH_ACK,
    TH_FIN|TH_ACK, TH_FIN|TH_ACK, TH_FIN|TH_ACK, TH_ACK, TH_ACK,
};

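For example, when a client performs an active open, the connection state becomes SYN_SENT. The flag for that state is TH_SYN, so tcp_output sends a SYN segment. When a server performing a passive open receives a SYN, the connection state becomes SYN_RCVD, whose flags are TH_SYN|TH_ACK, so TCP replies with a SYN/ACK segment.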
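
The state-to-flags lookup described above can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: the flag values follow the standard TCP header bit layout, while the helper names flags_for_state and flags_to_string are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* TCP header flag bits (standard bit positions in the flags byte). */
#define TH_FIN  0x01
#define TH_SYN  0x02
#define TH_RST  0x04
#define TH_PUSH 0x08
#define TH_ACK  0x10
#define TH_URG  0x20

#define TCP_NSTATES 11

/* Same contents as the tcp_outflags table, indexed by connection state. */
static const unsigned char outflags[TCP_NSTATES] = {
    TH_RST|TH_ACK, 0, TH_SYN, TH_SYN|TH_ACK,
    TH_ACK, TH_ACK,
    TH_FIN|TH_ACK, TH_FIN|TH_ACK, TH_FIN|TH_ACK, TH_ACK, TH_ACK,
};

/* Hypothetical helper: default flags for a segment sent in `state`.
 * Returns 0 for an out-of-range state. */
unsigned char flags_for_state(int state)
{
    if (state < 0 || state >= TCP_NSTATES)
        return 0;
    return outflags[state];
}

/* Hypothetical helper: write the names of the set flags into buf. */
const char *flags_to_string(unsigned char f, char *buf, size_t n)
{
    snprintf(buf, n, "%s%s%s%s%s%s",
        (f & TH_SYN)  ? "SYN " : "", (f & TH_FIN) ? "FIN " : "",
        (f & TH_RST)  ? "RST " : "", (f & TH_PUSH) ? "PSH " : "",
        (f & TH_ACK)  ? "ACK " : "", (f & TH_URG) ? "URG " : "");
    return buf;
}
```

Calling flags_for_state(2) (SYN_SENT) yields TH_SYN, and flags_for_state(3) (SYN_RCVD) yields TH_SYN|TH_ACK, matching the two cases in the paragraph above. Keep in mind that tcp_output may still clear TH_FIN or add TH_PUSH and TH_URG after this lookup.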

For reference, here is the TCP connection state transition diagram:

[Figure: TCP connection state transition diagram]

The code of tcp_output is as follows:

/*
 * Tcp output routine: figure out what should be sent and send it.
 */
int
tcp_output(tp)
	register struct tcpcb *tp;
{
	register struct socket *so = tp->t_inpcb->inp_socket;
	register long len, win;
	int off, flags, error;
	register struct mbuf *m;
	register struct tcpiphdr *ti;
	u_char opt[MAX_TCPOPTLEN];
	unsigned optlen, hdrlen;
	int idle, sendalot;

	/*
	 * Determine length of data that should be transmitted,
	 * and flags that will be used.
	 * If there is some data or critical controls (SYN, RST)
	 * to send, then transmit; otherwise, investigate further.
	 */
	idle = (tp->snd_max == tp->snd_una); /* no data awaiting acknowledgment: treat the connection as idle */
	if (idle && tp->t_idle >= tp->t_rxtcur) /* idle for at least one retransmission timeout */
		/*
		 * We have been idle for "a while" and no acks are
		 * expected to clock out any data we send --
		 * slow start to get ack "clock" running again.
		 */
		tp->snd_cwnd = tp->t_maxseg; /* shrink cwnd to one segment to force slow start */
again:
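	sendalot = 0; /* set to 1 when more data remains to be sent after this segment */
	off = tp->snd_nxt - tp->snd_una; /* offset of the next byte to send from the left edge of the send window, i.e. bytes sent but not yet acknowledged */
	win = min(tp->snd_wnd, tp->snd_cwnd); /* effective send window: the smaller of the peer's advertised window and cwnd */

	flags = tcp_outflags[tp->t_state]; /* flag bits for the TCP header */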
	/*
	 * If in persist timeout with window of 0, send 1 byte.
	 * Otherwise, if window is small but nonzero
	 * and timer expired, we will send what we can
	 * and go to transmit state.
	 */
	if (tp->t_force) { /* t_force is set when the persist timer expires or out-of-band data must be sent */
		if (win == 0) { /* zero window: send a window probe */
			/*
			 * If we still have some data to send, then
			 * clear the FIN bit.  Usually this would
			 * happen below when it realizes that we
			 * aren't sending all the data.  However,
			 * if we have exactly 1 byte of unsent data,
			 * then it won't clear the FIN bit below,
			 * and if we are in persist state, we wind
			 * up sending the packet without recording
			 * that we sent the FIN bit.
			 *
			 * We can't just blindly clear the FIN bit,
			 * because if we don't have any more data
			 * to send then the probe will be the FIN
			 * itself.
			 */
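			if (off < so->so_snd.sb_cc) /* data still waits to be sent: clear FIN */
				flags &= ~TH_FIN;
			win = 1; /* use a window of 1 so that a 1-byte window probe can be sent */
		} else { /* window is nonzero: out-of-band data, or a forced send into a small window */
			tp->t_timer[TCPT_PERSIST] = 0; /* cancel the persist timer */
			tp->t_rxtshift = 0; /* reset the exponential backoff index */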
		}
	}

	len = min(so->so_snd.sb_cc, win) - off; /* amount of data that can be sent now */

	if (len < 0) { /* either (1) FIN was sent but not acked and is not yet being retransmitted, or (2) the peer shrank its window */
		/*
		 * If FIN has been sent but not acked,
		 * but we haven't been called to retransmit,
		 * len will be -1.  Otherwise, window shrank
		 * after we sent into it.  If window shrank to 0,
		 * cancel pending retransmit and pull snd_nxt
		 * back to (closed) window.  We will enter persist
		 * state below.  If the window didn't close completely,
		 * just wait for an ACK.
		 */
		len = 0;
		if (win == 0) { /* the peer advertised a zero window */
			tp->t_timer[TCPT_REXMT] = 0; /* cancel the retransmission timer */
			tp->snd_nxt = tp->snd_una; /* pull snd_nxt back, ready to enter the persist state */
		}
	}
	if (len > tp->t_maxseg) { /* one segment carries at most t_maxseg bytes */
		len = tp->t_maxseg;
		sendalot = 1;
	}
	if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + so->so_snd.sb_cc)) /* this segment does not empty the send buffer: clear FIN */
		flags &= ~TH_FIN;

	win = sbspace(&so->so_rcv); /* from here on, win holds the free space in the receive buffer */

	/*
	 * Sender silly window avoidance.  If connection is idle
	 * and can send all data, a maximum segment,
	 * at least a maximum default-size segment do it,
	 * or are forced, do it; otherwise don't bother.
	 * If peer's buffer is tiny, then send
	 * when window is at least half open.
	 * If retransmitting (possibly after persist timer forced us
	 * to send into a small window), then must resend.
	 */
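	if (len) { /* sender-side silly window syndrome avoidance */
		if (len == tp->t_maxseg) /* a full-sized segment can be sent: send now */
			goto send;
		/* Nagle algorithm: while a connection has unacknowledged data outstanding, TCP must not send a segment smaller than the MSS */
		if ((idle || tp->t_flags & TF_NODELAY) && /* (idle or Nagle disabled) and this segment empties the send buffer: send now */
		    len + off >= so->so_snd.sb_cc)
			goto send;
		if (tp->t_force) /* forced send: send now */
			goto send;
		if (len >= tp->max_sndwnd / 2) /* data covers at least half of the largest window the peer has ever advertised: send */
			goto send;
		if (SEQ_LT(tp->snd_nxt, tp->snd_max)) /* retransmission: send now */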
			goto send;
	}

	/*
	 * Compare available window to amount of window
	 * known to peer (as advertised window less
	 * next expected input).  If the difference is at least two
	 * max size segments, or at least 50% of the maximum possible
	 * window, then want to send a window update to peer.
	 */
	if (win > 0) { /* receive buffer has free space: decide whether to send a window update */
		/* 
		 * "adv" is the amount we can increase the window,
		 * taking into account that we are limited by
		 * TCP_MAXWIN << tp->rcv_scale.
		 */
		long adv = min(win, (long)TCP_MAXWIN << tp->rcv_scale) -
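			(tp->rcv_adv - tp->rcv_nxt); /* adv: bytes by which the window advertised to the peer can grow */

		if (adv >= (long) (2 * tp->t_maxseg)) /* window can grow by at least two segments: send a window update */
			goto send;
		if (2 * adv >= (long) so->so_rcv.sb_hiwat) /* window can grow by at least half the receive buffer size: send a window update */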
			goto send;
	}

	/*
	 * Send if we owe peer an ACK.
	 */
	if (tp->t_flags & TF_ACKNOW) /* an immediate ACK is owed: send */
		goto send;
	if (flags & (TH_SYN|TH_RST)) /* a SYN or RST must go out: send */
		goto send;
	if (SEQ_GT(tp->snd_up, tp->snd_una)) /* urgent data is pending: send */
		goto send;
	/*
	 * If our state indicates that FIN should be sent
	 * and we have not yet done so, or we're retransmitting the FIN,
	 * then we need to send.
	 */
	if (flags & TH_FIN && /* a FIN is due: send if it was never sent or is being retransmitted */
	    ((tp->t_flags & TF_SENTFIN) == 0 || tp->snd_nxt == tp->snd_una))
		goto send;

	/*
	 * TCP window updates are not reliable, rather a polling protocol
	 * using ``persist'' packets is used to insure receipt of window
	 * updates.  The three ``states'' for the output side are:
	 *	idle			not doing retransmits or persists
	 *	persisting		to move a small or zero window
	 *	(re)transmitting	and thereby not persisting
	 *
	 * tp->t_timer[TCPT_PERSIST]
	 *	is set when we are in persist state.
	 * tp->t_force
	 *	is set when we are called to send a persist packet.
	 * tp->t_timer[TCPT_REXMT]
	 *	is set when we are retransmitting
	 * The output side is idle when both timers are zero.
	 *
	 * If send window is too small, there is data to transmit, and no
	 * retransmit or persist is pending, then go to persist state.
	 * If nothing happens soon, send when timer expires:
	 * if window is nonzero, transmit what we can,
	 * otherwise force out a byte.
	 */
	if (so->so_snd.sb_cc && tp->t_timer[TCPT_REXMT] == 0 &&
	    tp->t_timer[TCPT_PERSIST] == 0) { /* data is queued, the peer's window is too small, and nothing else forces a send */
		tp->t_rxtshift = 0;
		tcp_setpersist(tp); /* enter the persist state */
	}

	/*
	 * No reason to send a segment, just return.
	 */
	return (0); /* no reason left to send a segment: return */

send:
	/*
	 * Before ESTABLISHED, force sending of initial options
	 * unless TCP set not to do any options.
	 * NOTE: we assume that the IP/TCP header plus TCP options
	 * always fit in a single mbuf, leaving room for a maximum
	 * link header, i.e.
	 *	max_linkhdr + sizeof (struct tcpiphdr) + optlen <= MHLEN
	 */
	optlen = 0;
	hdrlen = sizeof (struct tcpiphdr); /* combined IP and TCP header size, 40 bytes */
	if (flags & TH_SYN) { /* sending a SYN (an initial SYN or a SYN|ACK) */
		tp->snd_nxt = tp->iss;
		if ((tp->t_flags & TF_NOOPT) == 0) { /* TCP options are enabled */
			u_short mss;

			opt[0] = TCPOPT_MAXSEG;
			opt[1] = 4;
			mss = htons((u_short) tcp_mss(tp, 0)); /* fill in the MSS option */
			bcopy((caddr_t)&mss, (caddr_t)(opt + 2), sizeof(mss));
			optlen = 4;
	 
			if ((tp->t_flags & TF_REQ_SCALE) &&
			    ((flags & TH_ACK) == 0 ||
			    (tp->t_flags & TF_RCVD_SCALE))) { /* window scaling requested (and, for a SYN|ACK, also offered by the peer) */
				*((u_long *) (opt + optlen)) = htonl( /* fill in the window scale option */
					TCPOPT_NOP << 24 |
					TCPOPT_WINDOW << 16 |
					TCPOLEN_WINDOW << 8 |
					tp->request_r_scale);
				optlen += 4;
			}
		}
 	}
 
 	/*
	 * Send a timestamp and echo-reply if this is a SYN and our side 
	 * wants to use timestamps (TF_REQ_TSTMP is set) or both our side
	 * and our peer have sent timestamps in our SYN's.
 	 */
 	if ((tp->t_flags & (TF_REQ_TSTMP|TF_NOOPT)) == TF_REQ_TSTMP && /* timestamps requested and options enabled */
 	     (flags & TH_RST) == 0 &&
 	    ((flags & (TH_SYN|TH_ACK)) == TH_SYN || /* an initial SYN, or the peer has sent a timestamp */
	     (tp->t_flags & TF_RCVD_TSTMP))) { /* fill in the timestamp option */
		u_long *lp = (u_long *)(opt + optlen);
 
 		/* Form timestamp option as shown in appendix A of RFC 1323. */
 		*lp++ = htonl(TCPOPT_TSTAMP_HDR);
 		*lp++ = htonl(tcp_now);
 		*lp   = htonl(tp->ts_recent);
 		optlen += TCPOLEN_TSTAMP_APPA;
 	}

 	hdrlen += optlen; /* IP and TCP header length including TCP options */
 
	/*
	 * Adjust data length if insertion of options will
	 * bump the packet length beyond the t_maxseg length.
	 */
	if (len > tp->t_maxseg - optlen) { /* shrink the data length to make room for the options */
		len = tp->t_maxseg - optlen;
		sendalot = 1; /* more data remains to be sent */
		flags &= ~TH_FIN;
	 }


#ifdef DIAGNOSTIC
 	if (max_linkhdr + hdrlen > MHLEN)
		panic("tcphdr too big");
#endif

	/*
	 * Grab a header mbuf, attaching a copy of data to
	 * be transmitted, and initialize the header from
	 * the template for sends on this connection.
	 */
	if (len) { /* segment carries data */
		if (tp->t_force && len == 1)
			tcpstat.tcps_sndprobe++; /* window probe segments */
		else if (SEQ_LT(tp->snd_nxt, tp->snd_max)) {
			tcpstat.tcps_sndrexmitpack++; /* retransmitted data segments */
			tcpstat.tcps_sndrexmitbyte += len;
		} else {
			tcpstat.tcps_sndpack++; /* normal data segments */
			tcpstat.tcps_sndbyte += len;
		}

		MGETHDR(m, M_DONTWAIT, MT_HEADER); /* get an mbuf with a packet header */
		if (m == NULL) {
			error = ENOBUFS;
			goto out;
		}
		m->m_data += max_linkhdr; /* leave room for the link-layer header */
		m->m_len = hdrlen;
		if (len <= MHLEN - hdrlen - max_linkhdr) { /* data fits in the header mbuf: copy it straight from the send buffer */
			m_copydata(so->so_snd.sb_mb, off, (int) len,
			    mtod(m, caddr_t) + hdrlen);
			m->m_len += len;
		} else { /* data is larger: build a new mbuf chain and link it to the header mbuf */
			m->m_next = m_copy(so->so_snd.sb_mb, off, (int) len);
			if (m->m_next == 0) {
				(void) m_free(m);
				error = ENOBUFS;
				goto out;
			}
		}
		
		/*
		 * If we're sending everything we've got, set PUSH.
		 * (This will keep happy those implementations which only
		 * give data to the user when a buffer fills or
		 * a PUSH comes in.)
		 */
		if (off + len == so->so_snd.sb_cc) /* this segment empties the send buffer: set PUSH */
			flags |= TH_PUSH;
	} else { /* segment carries no data */
		if (tp->t_flags & TF_ACKNOW)
			tcpstat.tcps_sndacks++; /* pure ACK segments */
		else if (flags & (TH_SYN|TH_FIN|TH_RST))
			tcpstat.tcps_sndctrl++; /* control segments */
		else if (SEQ_GT(tp->snd_up, tp->snd_una))
			tcpstat.tcps_sndurg++; /* URG-only segments */
		else
			tcpstat.tcps_sndwinup++; /* window update segments */

		MGETHDR(m, M_DONTWAIT, MT_HEADER); /* get an mbuf with a packet header */
		if (m == NULL) {
			error = ENOBUFS;
			goto out;
		}
		m->m_data += max_linkhdr; /* leave room for the link-layer header */
		m->m_len = hdrlen;
	}
	m->m_pkthdr.rcvif = (struct ifnet *)0;
	ti = mtod(m, struct tcpiphdr *);
	if (tp->t_template == 0)
		panic("tcp_output");
	bcopy((caddr_t)tp->t_template, (caddr_t)ti, sizeof (struct tcpiphdr)); /* copy the IP/TCP header template */

	/*
	 * Fill in fields, remembering maximum advertised
	 * window for use in delaying messages about window sizes.
	 * If resending a FIN, be sure not to use a new sequence number.
	 */
	if (flags & TH_FIN && tp->t_flags & TF_SENTFIN && 
	    tp->snd_nxt == tp->snd_max) /* retransmitting a FIN: do not use a new sequence number */
		tp->snd_nxt--;
	/*
	 * If we are doing retransmissions, then snd_nxt will
	 * not reflect the first unsent octet.  For ACK only
	 * packets, we do not want the sequence number of the
	 * retransmitted packet, we want the sequence number
	 * of the next unsent octet.  So, if there is no data
	 * (and no SYN or FIN), use snd_max instead of snd_nxt
	 * when filling in ti_seq.  But if we are in persist
	 * state, snd_max might reflect one byte beyond the
	 * right edge of the window, so use snd_nxt in that
	 * case, since we know we aren't doing a retransmission.
	 * (retransmit and persist are mutually exclusive...)
	 */
	if (len || (flags & (TH_SYN|TH_FIN)) || tp->t_timer[TCPT_PERSIST]) /* data, SYN, or FIN present, or in persist state */
		ti->ti_seq = htonl(tp->snd_nxt); /* sequence number is snd_nxt */
	else /* pure ACK: use the sequence number of the next unsent byte */
		ti->ti_seq = htonl(tp->snd_max); /* sequence number is snd_max */
	ti->ti_ack = htonl(tp->rcv_nxt); /* acknowledgment number */
	if (optlen) {
		bcopy((caddr_t)opt, (caddr_t)(ti + 1), optlen);
		ti->ti_off = (sizeof (struct tcphdr) + optlen) >> 2; /* TCP header length in 32-bit words */
	}
	ti->ti_flags = flags; /* flag bits in the TCP header */
	/*
	 * Calculate receive window.  Don't shrink window,
	 * but avoid silly window syndrome.
	 */
	/* compute the window to advertise while avoiding silly window syndrome */
	if (win < (long)(so->so_rcv.sb_hiwat / 4) && win < (long)tp->t_maxseg) /* do not advertise a small window */
		win = 0;
	if (win > (long)TCP_MAXWIN << tp->rcv_scale)
		win = (long)TCP_MAXWIN << tp->rcv_scale;
	if (win < (long)(tp->rcv_adv - tp->rcv_nxt)) /* never shrink the advertised window */
		win = (long)(tp->rcv_adv - tp->rcv_nxt);
	ti->ti_win = htons((u_short) (win>>tp->rcv_scale)); /* window field in the TCP header */
	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
		ti->ti_urp = htons((u_short)(tp->snd_up - tp->snd_nxt)); /* urgent pointer field in the TCP header */
		ti->ti_flags |= TH_URG;
	} else
		/*
		 * If no urgent pointer to send, then we pull
		 * the urgent pointer to the left edge of the send window
		 * so that it doesn't drift into the send window on sequence
		 * number wraparound.
		 */
		tp->snd_up = tp->snd_una;		/* drag it along */

	/*
	 * Put TCP length in extended header, and then
	 * checksum extended header and data.
	 */
	if (len + optlen)
		ti->ti_len = htons((u_short)(sizeof (struct tcphdr) +
		    optlen + len)); /* TCP length in the pseudo-header: TCP header + options + data */
	ti->ti_sum = in_cksum(m, (int)(hdrlen + len)); /* TCP checksum */

	/*
	 * In transmit state, time the transmission and arrange for
	 * the retransmit.  In persist state, just set snd_max.
	 */
	if (tp->t_force == 0 || tp->t_timer[TCPT_PERSIST] == 0) { /* not sending a persist probe */
		tcp_seq startseq = tp->snd_nxt;

		/*
		 * Advance snd_nxt over sequence space of this segment.
		 */
		if (flags & (TH_SYN|TH_FIN)) {
			if (flags & TH_SYN)
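				tp->snd_nxt++; /* SYN consumes one sequence number */
			if (flags & TH_FIN) {
				tp->snd_nxt++; /* FIN consumes one sequence number */
				tp->t_flags |= TF_SENTFIN; /* record that a FIN has been sent */
			}
		}
		tp->snd_nxt += len; /* advance the next sequence number to send */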
		if (SEQ_GT(tp->snd_nxt, tp->snd_max)) {
			tp->snd_max = tp->snd_nxt;
			/*
			 * Time this transmission if not a retransmission and
			 * not currently timing anything.
			 */
			if (tp->t_rtt == 0) { /* no segment is currently being timed for RTT measurement */
				tp->t_rtt = 1;
				tp->t_rtseq = startseq; /* starting sequence number of the timed segment */
				tcpstat.tcps_segstimed++;
			}
		}

		/*
		 * Set retransmit timer if not currently set,
		 * and not doing an ack or a keep-alive probe.
		 * Initial value for retransmit timer is smoothed
		 * round-trip time + 2 * round-trip time variance.
		 * Initialize shift counter which is used for backoff
		 * of retransmit time.
		 */
		if (tp->t_timer[TCPT_REXMT] == 0 &&
		    tp->snd_nxt != tp->snd_una) { /* data is unacknowledged and the retransmission timer is not running */
			tp->t_timer[TCPT_REXMT] = tp->t_rxtcur; /* start the retransmission timer */
			if (tp->t_timer[TCPT_PERSIST]) { /* cancel the persist timer */
				tp->t_timer[TCPT_PERSIST] = 0;
				tp->t_rxtshift = 0;
			}
		}
	} else /* persist state */
		if (SEQ_GT(tp->snd_nxt + len, tp->snd_max))
			tp->snd_max = tp->snd_nxt + len;

	/*
	 * Trace.
	 */
	if (so->so_options & SO_DEBUG) /* TCP debugging */
		tcp_trace(TA_OUTPUT, tp->t_state, tp, ti, 0);

	/*
	 * Fill in IP length and desired time to live and
	 * send to IP level.  There should be a better way
	 * to handle ttl and tos; we could keep them in
	 * the template, but need a way to checksum without them.
	 */
	m->m_pkthdr.len = hdrlen + len; /* total length in the mbuf packet header */

	((struct ip *)ti)->ip_len = m->m_pkthdr.len; /* IP datagram length */
	((struct ip *)ti)->ip_ttl = tp->t_inpcb->inp_ip.ip_ttl;	/* XXX TTL field */
	((struct ip *)ti)->ip_tos = tp->t_inpcb->inp_ip.ip_tos;	/* XXX TOS field */
	error = ip_output(m, tp->t_inpcb->inp_options, &tp->t_inpcb->inp_route, /* hand the segment to ip_output */
	    so->so_options & SO_DONTROUTE, 0);
		
	if (error) {
out:
		if (error == ENOBUFS) {
			tcp_quench(tp->t_inpcb, 0);
			return (0);
		}
		if ((error == EHOSTUNREACH || error == ENETDOWN)
		    && TCPS_HAVERCVDSYN(tp->t_state)) {
			tp->t_softerror = error;
			return (0);
		}
		return (error);
	}
	tcpstat.tcps_sndtotal++; /* count of all segments sent */

	/*
	 * Data sent (as far as we can tell).
	 * If this advertises a larger window than any other segment,
	 * then remember the size of the advertised window.
	 * Any pending ACK has now been sent.
	 */
	if (win > 0 && SEQ_GT(tp->rcv_nxt+win, tp->rcv_adv))
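		tp->rcv_adv = tp->rcv_nxt + win; /* move the right edge of the advertised receive window */
	tp->last_ack_sent = tp->rcv_nxt; /* remember the most recent ACK value sent */
	tp->t_flags &= ~(TF_ACKNOW|TF_DELACK); /* clear the immediate-ACK and delayed-ACK flags */
	if (sendalot) /* more data to send: loop */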
		goto again;
	return (0);
}
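
The first half of the function (computing len, then the `if (len)` tests) is easier to reason about in isolation. The following is a simplified user-space model, offered only as a sketch: struct send_state and both function names are hypothetical, the fields are a hand-picked subset of tcpcb and socket state, and sequence-number wraparound and the FIN handling are ignored.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the fields the send decision consults. */
struct send_state {
    long sb_cc;       /* bytes in the send buffer (so_snd.sb_cc) */
    long off;         /* snd_nxt - snd_una: bytes sent but unacknowledged */
    long snd_wnd;     /* window advertised by the peer */
    long snd_cwnd;    /* congestion window */
    long maxseg;      /* t_maxseg */
    long max_sndwnd;  /* largest window the peer has ever advertised */
    bool idle;        /* snd_max == snd_una */
    bool nodelay;     /* TF_NODELAY set, i.e. Nagle disabled */
    bool force;       /* t_force */
    bool rexmit;      /* snd_nxt < snd_max */
};

static long lmin(long a, long b) { return a < b ? a : b; }

/* Models len = min(sb_cc, win) - off, clamped to [0, maxseg]. */
long sendable_len(const struct send_state *s)
{
    long win = lmin(s->snd_wnd, s->snd_cwnd);
    if (s->force && win == 0)
        win = 1;                 /* window probe: force out one byte */
    long len = lmin(s->sb_cc, win) - s->off;
    if (len < 0)
        len = 0;
    if (len > s->maxseg)
        len = s->maxseg;
    return len;
}

/* True if one of the data-related tests would jump to the send label. */
bool should_send_data(const struct send_state *s)
{
    long len = sendable_len(s);
    if (len == 0)
        return false;
    if (len == s->maxseg)
        return true;             /* full-sized segment */
    if ((s->idle || s->nodelay) && len + s->off >= s->sb_cc)
        return true;             /* Nagle permits it and the buffer empties */
    if (s->force)
        return true;             /* persist probe or out-of-band data */
    if (len >= s->max_sndwnd / 2)
        return true;             /* at least half of the peer's largest window */
    return s->rexmit;            /* retransmissions always go out */
}
```

With 100 bytes queued, a 512-byte MSS, and a 4096-byte window, the model sends when the connection is idle and holds the data back when it is not idle and TF_NODELAY is clear, which is the Nagle behavior annotated in the code. The real function goes on to check window updates, pending ACKs, SYN/RST, urgent data, and FIN before it gives up.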

tcp_output ultimately calls ip_output, the IP-layer output function, to send the segment.
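
Before the segment reaches ip_output, tcp_output also decides whether a window update is worth sending and which window value to advertise. Those two calculations can be modeled in user space as below. This is a sketch under stated assumptions: struct recv_state and both function names are hypothetical, and TCP_MAXWIN is taken to be 65535, the largest unscaled window.

```c
#define TCP_MAXWIN 65535L  /* largest window value before scaling */

/* Hypothetical snapshot of the receive-side fields involved. */
struct recv_state {
    long space;       /* sbspace(&so->so_rcv): free receive buffer space */
    long hiwat;       /* so_rcv.sb_hiwat: receive buffer high-water mark */
    long maxseg;      /* t_maxseg */
    long advertised;  /* rcv_adv - rcv_nxt: window the peer still sees */
    int  rcv_scale;   /* window scale shift applied to our window */
};

/* Models the `if (win > 0)` block: 1 if a window update should be sent. */
int want_window_update(const struct recv_state *r)
{
    if (r->space <= 0)
        return 0;
    long cap = TCP_MAXWIN << r->rcv_scale;
    long adv = (r->space < cap ? r->space : cap) - r->advertised;
    if (adv >= 2 * r->maxseg)
        return 1;                /* window can grow by two segments */
    if (2 * adv >= r->hiwat)
        return 1;                /* or by half the buffer size */
    return 0;
}

/* Models the receive window calculation; returns the window in bytes,
 * before it is shifted right by rcv_scale for the header field. */
long window_to_advertise(const struct recv_state *r)
{
    long win = r->space;
    long cap = TCP_MAXWIN << r->rcv_scale;
    if (win < r->hiwat / 4 && win < r->maxseg)
        win = 0;                 /* receiver silly window avoidance */
    if (win > cap)
        win = cap;
    if (win < r->advertised)
        win = r->advertised;     /* never shrink what was already offered */
    return win;
}
```

For an 8192-byte buffer with 300 bytes free and nothing outstanding, the model advertises 0 instead of a tiny window. If 1000 bytes had already been offered to the peer, it keeps advertising 1000 so the right edge of the window never moves left.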

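The tcp_mss function determines the value of the MSS option in the TCP header. The value depends on several factors. I do not reproduce the code here, only its comment: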

/*
 * Determine a reasonable value for maxseg size.
 * If the route is known, check route for mtu.
 * If none, use an mss that can be handled on the outgoing
 * interface without forcing IP to fragment; if bigger than
 * an mbuf cluster (MCLBYTES), round down to nearest multiple of MCLBYTES
 * to utilize large mbufs.  If no route is found, route has no mtu,
 * or the destination isn't local, use a default, hopefully conservative
 * size (usually 512 or the default IP max size, but no more than the mtu
 * of the interface), as we can't discover anything about intervening
 * gateways or networks.  We also initialize the congestion/slow start
 * window to be a single segment if the destination isn't local.
 * While looking at the routing entry, we also initialize other path-dependent
 * parameters from pre-set or cached values in the routing entry.
 */

Once the MSS is decided, the function sets the receive and send buffer sizes to multiples of the MSS, then sets both t_maxseg and snd_cwnd in the tcpcb structure to the MSS.
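
Rounding a buffer size up to a multiple of the MSS is simple integer arithmetic. The sketch below is not the tcp_mss code: the function and struct names are hypothetical, and both the upper limit buf_max and the handling of a buffer smaller than the MSS are assumptions added to keep the example well defined.

```c
/* Round x up to the next multiple of y (y must be nonzero). */
unsigned long round_up_to_multiple(unsigned long x, unsigned long y)
{
    return ((x + y - 1) / y) * y;
}

/* Hypothetical result pair for the illustration. */
struct mss_buf {
    unsigned long mss;      /* segment size to use */
    unsigned long bufsize;  /* socket buffer size to reserve */
};

/* Assumed policy: if the buffer is smaller than the MSS, shrink the MSS
 * to the buffer size; otherwise round the buffer up to a multiple of the
 * MSS, without exceeding buf_max. */
struct mss_buf fit_buffer_to_mss(unsigned long bufsize, unsigned long mss,
                                 unsigned long buf_max)
{
    struct mss_buf r = { mss, bufsize };
    if (bufsize < mss) {
        r.mss = bufsize;
    } else {
        r.bufsize = round_up_to_multiple(bufsize, mss);
        if (r.bufsize > buf_max)
            r.bufsize = buf_max;
    }
    return r;
}
```

For an 8192-byte buffer and a 1460-byte MSS this gives 8760 bytes, exactly six segments, so a full buffer never ends in a partial segment.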

Reposted from blog.csdn.net/woay2008/article/details/79620953