libnl-route Explained

Routing Family Netlink Library (libnl-route)

Thomas Graf
<[email protected]>
version 3.1, Aug 11 2011

1. Introduction

This library provides APIs to the kernel interfaces of the routing family.

2. Addresses

3. Links (Network Devices)

The link configuration interface is part of the NETLINK_ROUTE protocol family. It allows applications to:

  • View and modify the configuration of physical and virtual network devices.

  • Create and delete virtual network devices (e.g. dummy devices, VLAN devices, tun devices, bridging devices, …)

  • View and modify per-link network configuration settings (e.g. net.ipv6.conf.eth0.accept_ra, net.ipv4.conf.eth1.forwarding, …)

Naming Convention (network device, link, interface)

In networking, several terms are commonly used to refer to network devices. While they have distinct meanings, they have been used interchangeably in the past. Within the Linux kernel, the term network device or netdev is commonly used. In user space, the term network interface is very common. The routing netlink protocol uses the term link, and so do the iproute2 utility and most routing daemons.

This section describes the protocol semantics of the netlink based link configuration interface. The following messages are defined:

Message Type | User → Kernel                              | Kernel → User
RTM_NEWLINK  | Create or update virtual network device    | Reply to RTM_GETLINK request or notification of link added or updated
RTM_DELLINK  | Delete virtual network device              | Notification of link deleted or disappeared
RTM_GETLINK  | Retrieve link configuration and statistics | -
RTM_SETLINK  | Modify link configuration                  | -

See Netlink Library - Message Types for more information on common semantics of these message types.

3.1.1. Link Message Format

All netlink link messages share a common header (struct ifinfomsg) which is appended after the netlink header (struct nlmsghdr).

[Figure: Link Message Header, showing the netlink header (gray, top) followed by the link header (red, bottom)]

The meaning of each field may differ depending on the message type. A struct ifinfomsg is defined in <linux/rtnetlink.h> to represent the header.

Address Family (8 bit): The address family is usually set to AF_UNSPEC but may be specified in RTM_GETLINK requests to limit the returned links to a specific address family.

Link Layer Type (16 bit): Currently only used in kernel→user messages to report the link layer type of a link. The value corresponds to the ARPHRD_* defines found in <linux/if_arp.h>. Translation from/to strings can be done using the functions nl_llproto2str()/nl_str2llproto().

Link Index (32 bit): Carries the interface index and is used to identify existing links.

Flags (32 bit): In kernel→user messages the value of this field represents the current state of the link flags. In user→kernel messages this field is used to change flags or set the initial flag state of new links. Note that in order to change a flag, the flag must also be set in the Flags Change Mask field.

Flags Change Mask (32 bit): The primary use of this field is to specify a mask of flags that should be changed based on the value of the Flags field. A special meaning is given to this field when present in link notifications, see TODO.

Attributes (variable): All link message types may carry netlink attributes. They are defined in the header file <linux/if_link.h> and share the prefix IFLA_.

3.1.2. Link Message Types

RTM_GETLINK (user→kernel)

Look up a link by (1) interface index or (2) link name (IFLA_IFNAME) and return a single RTM_NEWLINK message containing the link configuration and statistics, or a netlink error message if no such link was found.

Parameters:

  • Address family

    • If the address family is set to PF_BRIDGE, only bridging devices will be returned.

    • If the address family is set to PF_INET6, only IPv6-enabled devices will be returned.

Flags:

  • NLM_F_DUMP If set, all links will be returned in the form of a multipart message.

Returns:

  • EINVAL if neither interface index nor link name is set

  • ENODEV if no link was found

  • ENOBUFS if allocation failed
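
As a rough sketch of what such a request looks like on the wire, the following C fragment fills in the netlink header and the link header for an RTM_GETLINK request without attributes. The struct link_request type and the function getlink_request_init() are hypothetical names used for illustration only; libnl builds these messages internally, and lookup by name would additionally require an IFLA_IFNAME attribute, which is not shown.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical layout: netlink header followed directly by the link header. */
struct link_request {
        struct nlmsghdr  nh;
        struct ifinfomsg ifi;
};

/*
 * Prepare an RTM_GETLINK request.
 * dump == 0: look up the single link identified by ifindex.
 * dump != 0: request a dump of all links, optionally limited by family.
 */
static void getlink_request_init(struct link_request *req,
                                 unsigned char family, int ifindex, int dump)
{
        memset(req, 0, sizeof(*req));
        req->nh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg));
        req->nh.nlmsg_type  = RTM_GETLINK;
        req->nh.nlmsg_flags = NLM_F_REQUEST | (dump ? NLM_F_DUMP : 0);
        req->ifi.ifi_family = family;
        req->ifi.ifi_index  = dump ? 0 : ifindex;
}
```

The prepared structure could then be written to a NETLINK_ROUTE socket; sending and receiving are left out to keep the sketch short.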

RTM_NEWLINK (user→kernel)

Creates a new link or updates an existing one. Only virtual links may be created, but all links may be updated.

Flags:

  • NLM_F_CREATE Create link if it does not exist

  • NLM_F_EXCL Return EEXIST if link already exists

Returns:

  • EINVAL malformed message or invalid configuration parameters

  • EAFNOSUPPORT if an address family specific configuration (IFLA_AF_SPEC) is not supported

  • EOPNOTSUPP if the link does not support modification of parameters

  • EEXIST if NLM_F_EXCL was set and the link exists already

  • ENODEV if the link does not exist and NLM_F_CREATE is not set
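
The flag combinations above map onto three typical intents. The sketch below shows one way to express them in C; the enum and the function newlink_flags() are hypothetical, and the use of NLM_F_ACK (to request an acknowledgement) is an assumption not stated in the text above.

```c
#include <linux/netlink.h>

/* Hypothetical intents for an RTM_NEWLINK request. */
enum newlink_mode {
        NEWLINK_UPDATE_ONLY,      /* ENODEV if the link does not exist   */
        NEWLINK_CREATE_OR_UPDATE, /* create if missing, otherwise update */
        NEWLINK_CREATE_ONLY       /* EEXIST if the link already exists   */
};

/* Return the netlink message flags matching the requested intent. */
static unsigned short newlink_flags(enum newlink_mode mode)
{
        unsigned short flags = NLM_F_REQUEST | NLM_F_ACK;

        switch (mode) {
        case NEWLINK_CREATE_ONLY:
                flags |= NLM_F_CREATE | NLM_F_EXCL;
                break;
        case NEWLINK_CREATE_OR_UPDATE:
                flags |= NLM_F_CREATE;
                break;
        case NEWLINK_UPDATE_ONLY:
                break;
        }
        return flags;
}
```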

RTM_NEWLINK (kernel→user)

This message type is used in reply to an RTM_GETLINK request and carries the configuration and statistics of a link. If multiple links need to be sent, the messages are sent in the form of a multipart message.

The message type is also used for notifications sent by the kernel to the multicast group RTNLGRP_LINK to inform about various link events. It is therefore recommended to always use a separate socket for link notifications so that replies and notifications can be told apart.

TODO: document how to detect different notifications

RTM_DELLINK (user→kernel)

Look up a link by (1) interface index or (2) link name (IFLA_IFNAME) and delete the virtual link.

Returns:

  • EINVAL if neither interface index nor link name is set

  • ENODEV if no link was found

  • EOPNOTSUPP if the operation is not supported (not a virtual link)

RTM_DELLINK (kernel→user)

Notification sent by the kernel to the multicast group RTNLGRP_LINK when:

  1. a network device was unregistered (change == ~0)

  2. a bridging device was deleted (address family will be PF_BRIDGE)
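
A receiver of link notifications could tell these cases apart roughly as follows. This is a minimal sketch based only on the two conditions listed above; the enum values and the function classify_link_event() are hypothetical, and real notifications may need further inspection of attributes.

```c
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Hypothetical classification of a link notification. */
enum link_event {
        LINK_EVENT_ADDED_OR_UPDATED,
        LINK_EVENT_UNREGISTERED,
        LINK_EVENT_BRIDGE_DELETED,
        LINK_EVENT_OTHER
};

/* Classify a notification from its message type and link header. */
static enum link_event classify_link_event(unsigned short nlmsg_type,
                                           const struct ifinfomsg *ifi)
{
        if (nlmsg_type == RTM_NEWLINK)
                return LINK_EVENT_ADDED_OR_UPDATED;
        if (nlmsg_type != RTM_DELLINK)
                return LINK_EVENT_OTHER;
        if (ifi->ifi_family == PF_BRIDGE)
                return LINK_EVENT_BRIDGE_DELETED;
        if (ifi->ifi_change == ~0U)
                return LINK_EVENT_UNREGISTERED;
        return LINK_EVENT_OTHER;
}
```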

3.2. Get / List

3.2.1. Get list of links

To retrieve the list of links in the kernel, allocate a new link cache using rtnl_link_alloc_cache() to hold the links. It will automatically construct and send an RTM_GETLINK message requesting a dump of all links from the kernel and feed the returned RTM_NEWLINK messages to the internal link message parser, which adds the returned links to the cache.

#include <netlink/route/link.h>
/**
 * Allocate link cache and fill in all configured links.
 * @arg sk              Netlink socket.
 * @arg family          Link address family or AF_UNSPEC
 * @arg result          Pointer to store resulting cache.
 *
 * Allocates and initializes a new link cache. A netlink message is sent to
 * the kernel requesting a full dump of all configured links. The returned
 * messages are parsed and filled into the cache. If the operation succeeds
 * the resulting cache will contain a link object for each link configured in the
 * kernel.
 *
 * If \c family is set to an address family other than \c AF_UNSPEC the
 * contents of the cache can be limited to a specific address family.
 * Currently the following address families are supported:
 * - AF_BRIDGE
 * - AF_INET6
 *
 * @route_doc{link_list, Get List of Links}
 * @see rtnl_link_get()
 * @see rtnl_link_get_by_name()
 * @return 0 on success or a negative error code.
 */


int rtnl_link_alloc_cache(struct nl_sock *sk, int family, struct nl_cache **result);

The cache will contain link objects (struct rtnl_link, see Link Object) and can be accessed using the standard cache functions. By setting the family parameter to an address family other than AF_UNSPEC, the resulting cache will only contain links supporting the specified address family.

The following direct search functions are provided to search by interface index and by link name:

#include <netlink/route/link.h>
/**
 * Lookup link in cache by interface index
 * @arg cache           Link cache
 * @arg ifindex         Interface index
 *
 * Searches through the provided cache looking for a link with matching
 * interface index.
 *
 * @attention The reference counter of the returned link object will be
 *            incremented. Use rtnl_link_put() to release the reference.
 *
 * @route_doc{link_list, Get List of Links}
 * @see rtnl_link_get_by_name()
 * @return Link object or NULL if no match was found.
 */
struct rtnl_link *rtnl_link_get(struct nl_cache *cache, int ifindex);

/**
 * Lookup link in cache by link name
 * @arg cache           Link cache
 * @arg name            Name of link
 *
 * Searches through the provided cache looking for a link with matching
 * link name
 *
 * @attention The reference counter of the returned link object will be
 *            incremented. Use rtnl_link_put() to release the reference.
 *
 * @route_doc{link_list, Get List of Links}
 * @see rtnl_link_get()
 * @return Link object or NULL if no match was found.
 */
struct rtnl_link *rtnl_link_get_by_name(struct nl_cache *cache, const char *name);

Example: Link Cache

struct nl_cache *cache;
struct rtnl_link *link;

if (rtnl_link_alloc_cache(sock, AF_UNSPEC, &cache) < 0) {
        /* error */
}

if (!(link = rtnl_link_get_by_name(cache, "eth1"))) {
        /* link does not exist */
}

/* do something with link */

/* release the reference taken by rtnl_link_get_by_name() */
rtnl_link_put(link);
nl_cache_put(cache);

3.2.2. Lookup Single Link (Direct Lookup)

If only a single link is of interest, the link can be looked up directly without the use of a link cache using the function rtnl_link_get_kernel().

#include <netlink/route/link.h>

int rtnl_link_get_kernel(struct nl_sock *sk, int ifindex, const char *name, struct rtnl_link **result);

It constructs and sends an RTM_GETLINK request using the parameters provided and waits for an RTM_NEWLINK or netlink error message in return. If the link exists, it is returned as a link object (see Link Object).

Example: Direct link lookup

struct rtnl_link *link;

if (rtnl_link_get_kernel(sock, 0, "eth1", &link) < 0) {
        /* error */
}

/* do something with link */

rtnl_link_put(link);

3.2.3. Translating interface index to link name

Applications that need to translate an interface index to a link name or vice versa may use the following functions to do so. Both functions require a filled link cache to work with.

 /**
  * Translate interface index to corresponding link name
  * @arg cache           Link cache
  * @arg ifindex         Interface index
  * @arg dst             String to store name
  * @arg len             Length of destination string
  *
  * Translates the specified interface index to the corresponding
  * link name and stores the name in the destination string.
  *
  * @route_doc{link_translate_ifindex, Translating interface index to link name}
  * @see rtnl_link_name2i()
  * @return Name of link or NULL if no match was found.
  */
char *rtnl_link_i2name (struct nl_cache *cache, int ifindex, char *dst, size_t len);

/**
 * Translate link name to corresponding interface index
 * @arg cache           Link cache
 * @arg name            Name of link
 *
 * @route_doc{link_translate_ifindex, Translating interface index to link name}
 * @see rtnl_link_i2name()
 * @return Interface index or 0 if no match was found.
 */

int rtnl_link_name2i (struct nl_cache *cache, const char *name);

3.3. Add / Modify

Several types of virtual links can be added on the fly using the function rtnl_link_add().

#include <netlink/route/link.h>

 /**
  * Add virtual link
  * @arg sk              netlink socket.
  * @arg link            new link to add
  * @arg flags           additional netlink message flags
  *
  * Builds a \c RTM_NEWLINK netlink message requesting the addition of
  * a new virtual link.
  *
  * After sending, the function will wait for the ACK or an eventual
  * error message to be received and will therefore block until the
  * operation has been completed.
  *
  * @copydoc auto_ack_warning
  *
  * @return 0 on success or a negative error code.
  */

int rtnl_link_add(struct nl_sock *sk, struct rtnl_link *link, int flags);

3.4. Delete

The deletion of virtual links such as VLAN devices or dummy devices is done using the function rtnl_link_delete(). The link passed to the function can be a link from a link cache, or it can be constructed with the minimal attributes needed to identify the link.

#include <netlink/route/link.h>

int rtnl_link_delete(struct nl_sock *sk, const struct rtnl_link *link);

The function constructs and sends an RTM_DELLINK request message and returns any error reported by the kernel.

Example: Delete link by name

struct rtnl_link *link;

if (!(link = rtnl_link_alloc())) {
        /* error */
}

/* the name is enough to identify the link to delete */
rtnl_link_set_name(link, "my_vlan");

if (rtnl_link_delete(sock, link) < 0) {
        /* error */
}

rtnl_link_put(link);

3.5. Link Object

A link is represented by the structure struct rtnl_link. Instances may be created with the function rtnl_link_alloc() or via a link cache (see Get list of links) and are freed again using the function rtnl_link_put().

#include <netlink/route/link.h>

struct rtnl_link *rtnl_link_alloc(void);

void rtnl_link_put(struct rtnl_link *link);

3.5.1. Name

The name serves as a unique, human-readable description of the link. By default, links are named based on their type and then enumerated, e.g. eth0, eth1, ethN, but they may be renamed at any time.

Kernels >= 2.6.11 support identification by link name.

#include <netlink/route/link.h>

void rtnl_link_set_name(struct rtnl_link *link, const char *name);

char *rtnl_link_get_name(struct rtnl_link *link);

Accepted link name format: [^ /]* (maximum length: 15 characters)
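
A caller may want to check a name against this format before handing it to rtnl_link_set_name(). The following is a minimal sketch under the rules stated above; the helper link_name_is_valid() is hypothetical, rejecting the empty string is an added assumption, and the kernel may apply further restrictions that are not covered here.

```c
#include <stdbool.h>
#include <string.h>

#define LINK_NAME_MAX 15 /* maximum length stated above */

/* Hypothetical check: at most 15 characters, no spaces and no slashes. */
static bool link_name_is_valid(const char *name)
{
        size_t len = strlen(name);

        /* Assumption: an empty name is not useful, so reject it as well. */
        if (len == 0 || len > LINK_NAME_MAX)
                return false;

        /* Reject the two characters excluded by the pattern [^ /]* */
        return strpbrk(name, " /") == NULL;
}
```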

3.5.2. Interface Index (Identifier)

The interface index is an integer uniquely identifying a link. If present in any link message, it will be used to identify an existing link.

#include <netlink/route/link.h>

void rtnl_link_set_ifindex(struct rtnl_link *link, int ifindex);

int rtnl_link_get_ifindex(struct rtnl_link *link);

3.5.3. Group

Each link can be assigned a numeric group identifier to group a bunch of links together and apply a set of changes to a group instead of just a single link.

#include <netlink/route/link.h>

void rtnl_link_set_group(struct rtnl_link *link, uint32_t group);

uint32_t rtnl_link_get_group(struct rtnl_link *link);

3.5.4. Link Layer Address

The link layer address (e.g. MAC address).

#include <netlink/route/link.h>

void rtnl_link_set_addr(struct rtnl_link *link, struct nl_addr *addr);

struct nl_addr *rtnl_link_get_addr(struct rtnl_link *link);

3.5.5. Broadcast Address

The link layer broadcast address

#include <netlink/route/link.h>

void rtnl_link_set_broadcast(struct rtnl_link *link, struct nl_addr *addr);

struct nl_addr *rtnl_link_get_broadcast(struct rtnl_link *link);

3.5.6. MTU (Maximum Transmission Unit)

The maximum transmission unit specifies the maximum packet size a network device can transmit or receive. This value may be lower than the capability of the physical network device.

#include <netlink/route/link.h>

void rtnl_link_set_mtu(struct rtnl_link *link, unsigned int mtu);

unsigned int rtnl_link_get_mtu(struct rtnl_link *link);

3.5.7. Flags

The flags of a link enable or disable various link features or inform about the state of the link.

#include <netlink/route/link.h>

void rtnl_link_set_flags(struct rtnl_link *link, unsigned int flags);

void rtnl_link_unset_flags(struct rtnl_link *link, unsigned int flags);

unsigned int rtnl_link_get_flags(struct rtnl_link *link);

IFF_UP            Link is up (administratively)
IFF_RUNNING       Link is up and carrier is OK (RFC2863 OPER_UP)
IFF_LOWER_UP      Link layer is operational
IFF_DORMANT       Driver signals dormant
IFF_BROADCAST     Link supports broadcasting
IFF_MULTICAST     Link supports multicasting
IFF_ALLMULTI      Link receives all multicast packets
IFF_DEBUG         Tell driver to do debugging (currently unused)
IFF_LOOPBACK      Link is a loopback device
IFF_POINTOPOINT   Point-to-point link
IFF_NOARP         ARP is not supported
IFF_PROMISC       Status of promiscuous mode
IFF_MASTER        Master of a load balancer (bonding)
IFF_SLAVE         Slave to a master link
IFF_PORTSEL       Driver supports setting media type (only used by ARM ethernet)
IFF_AUTOMEDIA     Link selects port automatically (only used by ARM ethernet)
IFF_ECHO          Echo sent packets (testing feature, CAN only)
IFF_DYNAMIC       Unused (BSD compatibility)
IFF_NOTRAILERS    Unused (BSD compatibility)

To translate a link flag to a link flag name or vice versa:

#include <netlink/route/link.h>

char *rtnl_link_flags2str(int flags, char *buf, size_t size);

int rtnl_link_str2flags(const char *flag_name);

3.5.8. Transmission Queue Length

The transmission queue holds packets before packets are delivered to the driver for transmission. It is usually specified in number of packets but the unit may be specific to the link type.

#include <netlink/route/link.h>

void rtnl_link_set_txqlen(struct rtnl_link *link, unsigned int txqlen);

unsigned int rtnl_link_get_txqlen(struct rtnl_link *link);

3.5.9. Operational Status

The operational status has been introduced to provide extended information on the link status. Traditionally the link state has been described using the link flags IFF_UP, IFF_RUNNING, IFF_LOWER_UP, and IFF_DORMANT, which were no longer sufficient for some link types.

#include <netlink/route/link.h>

void rtnl_link_set_operstate(struct rtnl_link *link, uint8_t state);

uint8_t rtnl_link_get_operstate(struct rtnl_link *link);

IF_OPER_UNKNOWN          Unknown state
IF_OPER_NOTPRESENT       Link not present
IF_OPER_DOWN             Link down
IF_OPER_LOWERLAYERDOWN   L1 down
IF_OPER_TESTING          Testing
IF_OPER_DORMANT          Dormant
IF_OPER_UP               Link up

Translation of operational status code to string and vice versa:

#include <netlink/route/link.h>

char *rtnl_link_operstate2str(uint8_t state, char *buf, size_t size);

int rtnl_link_str2operstate(const char *name);

3.5.10. Mode

Currently known link modes are:

IF_LINK_MODE_DEFAULT     Default link mode
IF_LINK_MODE_DORMANT     Limit upward transition to dormant

#include <netlink/route/link.h>

void rtnl_link_set_linkmode(struct rtnl_link *link, uint8_t mode);

uint8_t rtnl_link_get_linkmode(struct rtnl_link *link);

Translation of link mode to string and vice versa:

char *rtnl_link_mode2str(uint8_t mode, char *buf, size_t len);

uint8_t rtnl_link_str2mode(const char *name);

3.5.11. IfAlias

Alternative name for the link, primarily used for SNMP IfAlias.

#include <netlink/route/link.h>

const char *rtnl_link_get_ifalias(struct rtnl_link *link);

void rtnl_link_set_ifalias(struct rtnl_link *link, const char *alias);

Length limit: 256

3.5.12. Hardware Type

#include <netlink/route/link.h>

#include <linux/if_arp.h>

void rtnl_link_set_arptype(struct rtnl_link *link, unsigned int arptype);

unsigned int rtnl_link_get_arptype(struct rtnl_link *link);

Translation of hardware type to character string and vice versa:

#include <netlink/utils.h>

char *nl_llproto2str(int arptype, char *buf, size_t len);

int nl_str2llproto(const char *name);

3.5.13. Qdisc

The name of the queueing discipline used by the link is of informational nature only. It is a read-only attribute provided by the kernel and cannot be modified. The set function is provided solely for the purpose of creating link objects to be used for comparison.

For more information on how to modify the qdisc of a link, see section Traffic Control.

#include <netlink/route/link.h>

void rtnl_link_set_qdisc(struct rtnl_link *link, const char *name);

char *rtnl_link_get_qdisc(struct rtnl_link *link);

3.5.14. Promiscuity

The number of subsystems currently depending on the link being in promiscuous mode. A value of 0 indicates that the link is not in promiscuous mode. It is a read-only attribute provided by the kernel and cannot be modified. The set function is provided solely for the purpose of creating link objects to be used for comparison.

#include <netlink/route/link.h>

void rtnl_link_set_promiscuity(struct rtnl_link *link, uint32_t count);

uint32_t rtnl_link_get_promiscuity(struct rtnl_link *link);

3.5.15. RX/TX Queues

The number of RX/TX queues the link provides. The attribute is writable but will only be considered when creating a new network device via netlink.

#include <netlink/route/link.h>

void rtnl_link_set_num_tx_queues(struct rtnl_link *link, uint32_t nqueues);

uint32_t rtnl_link_get_num_tx_queues(struct rtnl_link *link);


void rtnl_link_set_num_rx_queues(struct rtnl_link *link, uint32_t nqueues);

uint32_t rtnl_link_get_num_rx_queues(struct rtnl_link *link);

3.5.16. Weight

This attribute is unused and obsolete in all recent kernels.

3.6.1. Bonding

Example: Add bonding link

#include <netlink/route/link.h>

struct rtnl_link *link;
int err;

if (!(link = rtnl_link_bond_alloc())) {
        /* error: allocation failed */
}

rtnl_link_set_name(link, "my_bond");

/* requires admin privileges */
if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0) {
        /* error */
}

rtnl_link_put(link);

3.6.2. VLAN

extern char *           rtnl_link_vlan_flags2str(int, char *, size_t);

extern int              rtnl_link_vlan_str2flags(const char *);

extern int              rtnl_link_vlan_set_id(struct rtnl_link *, int);

extern int              rtnl_link_vlan_get_id(struct rtnl_link *);

extern int              rtnl_link_vlan_set_flags(struct rtnl_link *,unsigned int);

extern int              rtnl_link_vlan_unset_flags(struct rtnl_link *,unsigned int);

extern unsigned int     rtnl_link_vlan_get_flags(struct rtnl_link *);

extern int              rtnl_link_vlan_set_ingress_map(struct rtnl_link *,int, uint32_t);

extern uint32_t *       rtnl_link_vlan_get_ingress_map(struct rtnl_link *);

extern int              rtnl_link_vlan_set_egress_map(struct rtnl_link *,uint32_t, int);

extern struct vlan_map *rtnl_link_vlan_get_egress_map(struct rtnl_link *,int *);

Example: Add a VLAN device

struct rtnl_link *link;
int master_index, err;

/* lookup interface index of eth0 */
if (!(master_index = rtnl_link_name2i(link_cache, "eth0"))) {
        /* error */
}

/* allocate new link object of type vlan */
link = rtnl_link_vlan_alloc();

/* set eth0 to be our master device */
rtnl_link_set_link(link, master_index);

rtnl_link_vlan_set_id(link, 10);

if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0) {
        /* error */
}

rtnl_link_put(link);

3.6.3. MACVLAN

extern struct rtnl_link *rtnl_link_macvlan_alloc(void);

extern int              rtnl_link_is_macvlan(struct rtnl_link *);

extern char *           rtnl_link_macvlan_mode2str(int, char *, size_t);

extern int              rtnl_link_macvlan_str2mode(const char *);

extern char *           rtnl_link_macvlan_flags2str(int, char *, size_t);

extern int              rtnl_link_macvlan_str2flags(const char *);

extern int              rtnl_link_macvlan_set_mode(struct rtnl_link *,uint32_t);

extern uint32_t         rtnl_link_macvlan_get_mode(struct rtnl_link *);

extern int              rtnl_link_macvlan_set_flags(struct rtnl_link *,uint16_t);

extern int              rtnl_link_macvlan_unset_flags(struct rtnl_link *,uint16_t);

extern uint16_t         rtnl_link_macvlan_get_flags(struct rtnl_link *);

Example: Add a MACVLAN device

struct rtnl_link *link;
struct nl_addr *addr;
int master_index, err;

/* lookup interface index of eth0 */
if (!(master_index = rtnl_link_name2i(link_cache, "eth0"))) {
        /* error */
}

/* allocate new link object of type macvlan */
link = rtnl_link_macvlan_alloc();

/* set eth0 to be our master device */
rtnl_link_set_link(link, master_index);

/* set address of virtual interface */
addr = nl_addr_build(AF_LLC, ether_aton("00:11:22:33:44:55"), ETH_ALEN);
rtnl_link_set_addr(link, addr);
nl_addr_put(addr);

/* set mode of virtual interface */
rtnl_link_macvlan_set_mode(link, rtnl_link_macvlan_str2mode("bridge"));

if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0) {
        /* error */
}

rtnl_link_put(link);

3.6.4. VXLAN

extern struct rtnl_link *rtnl_link_vxlan_alloc(void);

extern int      rtnl_link_is_vxlan(struct rtnl_link *);

extern int      rtnl_link_vxlan_set_id(struct rtnl_link *, uint32_t);

extern int      rtnl_link_vxlan_get_id(struct rtnl_link *, uint32_t *);

extern int      rtnl_link_vxlan_set_group(struct rtnl_link *, struct nl_addr *);

extern int      rtnl_link_vxlan_get_group(struct rtnl_link *, struct nl_addr **);

extern int      rtnl_link_vxlan_set_link(struct rtnl_link *, uint32_t);

extern int      rtnl_link_vxlan_get_link(struct rtnl_link *, uint32_t *);

extern int      rtnl_link_vxlan_set_local(struct rtnl_link *, struct nl_addr *);

extern int      rtnl_link_vxlan_get_local(struct rtnl_link *, struct nl_addr **);

extern int      rtnl_link_vxlan_set_ttl(struct rtnl_link *, uint8_t);

extern int      rtnl_link_vxlan_get_ttl(struct rtnl_link *);

extern int      rtnl_link_vxlan_set_tos(struct rtnl_link *, uint8_t);

extern int      rtnl_link_vxlan_get_tos(struct rtnl_link *);

extern int      rtnl_link_vxlan_set_learning(struct rtnl_link *, uint8_t);

extern int      rtnl_link_vxlan_get_learning(struct rtnl_link *);

extern int      rtnl_link_vxlan_enable_learning(struct rtnl_link *);

extern int      rtnl_link_vxlan_disable_learning(struct rtnl_link *);

extern int      rtnl_link_vxlan_set_ageing(struct rtnl_link *, uint32_t);

extern int      rtnl_link_vxlan_get_ageing(struct rtnl_link *, uint32_t *);

extern int      rtnl_link_vxlan_set_limit(struct rtnl_link *, uint32_t);

extern int      rtnl_link_vxlan_get_limit(struct rtnl_link *, uint32_t *);

extern int      rtnl_link_vxlan_set_port_range(struct rtnl_link *,struct ifla_vxlan_port_range *);

extern int      rtnl_link_vxlan_get_port_range(struct rtnl_link *,struct ifla_vxlan_port_range *);

extern int      rtnl_link_vxlan_set_proxy(struct rtnl_link *, uint8_t);

extern int      rtnl_link_vxlan_get_proxy(struct rtnl_link *);

extern int      rtnl_link_vxlan_enable_proxy(struct rtnl_link *);

extern int      rtnl_link_vxlan_disable_proxy(struct rtnl_link *);

extern int      rtnl_link_vxlan_set_rsc(struct rtnl_link *, uint8_t);

extern int      rtnl_link_vxlan_get_rsc(struct rtnl_link *);

extern int      rtnl_link_vxlan_enable_rsc(struct rtnl_link *);

extern int      rtnl_link_vxlan_disable_rsc(struct rtnl_link *);

extern int      rtnl_link_vxlan_set_l2miss(struct rtnl_link *, uint8_t);

extern int      rtnl_link_vxlan_get_l2miss(struct rtnl_link *);

extern int      rtnl_link_vxlan_enable_l2miss(struct rtnl_link *);

extern int      rtnl_link_vxlan_disable_l2miss(struct rtnl_link *);

extern int      rtnl_link_vxlan_set_l3miss(struct rtnl_link *, uint8_t);

extern int      rtnl_link_vxlan_get_l3miss(struct rtnl_link *);

extern int      rtnl_link_vxlan_enable_l3miss(struct rtnl_link *);

extern int      rtnl_link_vxlan_disable_l3miss(struct rtnl_link *);

Example: Add a VXLAN device

struct rtnl_link *link;
struct nl_addr *addr;
int err;

/* allocate new link object of type vxlan */
link = rtnl_link_vxlan_alloc();

/* set interface name */
rtnl_link_set_name(link, "vxlan128");

/* set VXLAN network identifier */
if ((err = rtnl_link_vxlan_set_id(link, 128)) < 0) {
        /* error */
}

/* set multicast address to join */
if ((err = nl_addr_parse("239.0.0.1", AF_INET, &addr)) < 0) {
        /* error */
}

/* the VXLAN-specific setter takes the address; rtnl_link_set_group() expects a numeric group id */
if ((err = rtnl_link_vxlan_set_group(link, addr)) < 0) {
        /* error */
}

nl_addr_put(addr);

if ((err = rtnl_link_add(sk, link, NLM_F_CREATE)) < 0) {
        /* error */
}

rtnl_link_put(link);

4. Neighbouring

5. Routing

6. Traffic Control

The traffic control architecture allows the queueing and prioritization of packets before they are enqueued to the network driver. To a limited degree it is also possible to take control of network traffic as it enters the network stack.

The architecture consists of three different types of modules:

  • Queueing disciplines (qdisc) provide a mechanism to enqueue packets in different forms. They may be used to implement fair queueing, prioritization of differentiated services, enforce bandwidth limitations, or even to simulate network behaviour such as packet loss and packet delay. Qdiscs can be classful, in which case they allow traffic classes, described in the next paragraph, to be attached to them.

  • Traffic classes (class) are supported by several qdiscs to build a tree structure for different types of traffic. Each class may be assigned its own set of attributes such as bandwidth limits or queueing priorities. Some qdiscs even allow borrowing of bandwidth between classes.

  • Classifiers (cls) are used to decide which qdisc/class the packet should be enqueued to. Different types of classifiers exist, ranging from classification based on protocol header values to classification based on packet priority or firewall marks. Additionally, most classifiers support extended matches (ematch), which allow extending classifiers by a set of matcher modules, and actions, which allow classifiers to take actions such as mangling, mirroring, or even rerouting of packets.

Default Qdisc

The default qdisc used on all network devices is pfifo_fast. Network devices which do not require a transmit queue such as the loopback device do not have a default qdisc attached. The pfifo_fast qdisc provides three bands to prioritize interactive traffic over bulk traffic. Classification is based on the packet priority (diffserv).
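
The three-band behaviour described above can be sketched in plain C. This is a minimal model, assuming strict priority between bands (band 0 is always served first); the struct, the capacity, and the demo_* names are hypothetical and are not part of libnl or the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BANDS    3   /* pfifo_fast provides three bands */
#define DEMO_BAND_CAP 8   /* hypothetical per-band capacity */

/* One small FIFO per band, holding hypothetical packet ids. */
struct demo_prio_qdisc {
    int pkts[DEMO_BANDS][DEMO_BAND_CAP];
    size_t head[DEMO_BANDS];
    size_t len[DEMO_BANDS];
};

/* Append a packet to the tail of a band. Returns 0, or -1 if the band is full. */
static int demo_enqueue(struct demo_prio_qdisc *q, int band, int pkt)
{
    assert(band >= 0 && band < DEMO_BANDS);
    if (q->len[band] == DEMO_BAND_CAP)
        return -1;
    size_t tail = (q->head[band] + q->len[band]) % DEMO_BAND_CAP;
    q->pkts[band][tail] = pkt;
    q->len[band]++;
    return 0;
}

/* Strict priority: serve the lowest-numbered non-empty band.
 * Returns the packet id, or -1 if every band is empty. */
static int demo_dequeue(struct demo_prio_qdisc *q)
{
    for (int band = 0; band < DEMO_BANDS; band++) {
        if (q->len[band] > 0) {
            int pkt = q->pkts[band][q->head[band]];
            q->head[band] = (q->head[band] + 1) % DEMO_BAND_CAP;
            q->len[band]--;
            return pkt;
        }
    }
    return -1;
}
```

A packet placed in band 0 (interactive) leaves before packets that were queued earlier in bands 1 or 2 (bulk).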

[Figure: Default Qdisc]

Multiqueue Default Qdisc

If the network device provides multiple transmit queues the mq qdisc is used by default. It will automatically create a separate class for each transmit queue available and will also replace the single per device tx lock with a per queue lock.

[Figure: Multiqueue default Qdisc]

Example of a customized classful qdisc setup

The following figure illustrates a possible combination of different queueing and classification modules to implement quality of service needs.

[Figure: Classful Qdisc diagram]

6.1. Traffic Control Object

Each type of traffic control module (qdisc, class, classifier) is represented by its own structure. All of them are based on the traffic control object represented by struct rtnl_tc, which itself is based on the generic object struct nl_object to make it cacheable. The traffic control object contains all attributes, implementation details, and statistics that are shared by all of the traffic control object types.

[Figure: struct rtnl_tc hierarchy]

It is not possible to allocate a struct rtnl_tc object directly. Instead, the actual tc object types must be allocated using rtnl_qdisc_alloc(), rtnl_class_alloc(), or rtnl_cls_alloc() and then cast to struct rtnl_tc using the TC_CAST() macro.

Usage Example: Allocation, Casting, Freeing

#include <netlink/route/tc.h>

#include <netlink/route/qdisc.h>

struct rtnl_qdisc *qdisc;

/* Allocation of a qdisc object */

qdisc = rtnl_qdisc_alloc();

/* Cast the qdisc to a tc object using TC_CAST() to use rtnl_tc_ functions. */

rtnl_tc_set_mpu(TC_CAST(qdisc), 64);

/* Free the qdisc object */

rtnl_qdisc_put(qdisc);

6.1.1. Attributes

Handle

The handle uniquely identifies a tc object and is used to refer to other tc objects when constructing tc trees.

void rtnl_tc_set_handle(struct rtnl_tc *tc, uint32_t handle);

uint32_t rtnl_tc_get_handle(struct rtnl_tc *tc);
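
Handles are usually written as major:minor (for example 1:0 or 1:5). The sketch below shows how such a pair packs into the 32-bit handle value, with the major number in the upper 16 bits and the minor number in the lower 16 bits. The demo_* helpers and DEMO_* constants are hypothetical stand-ins that mirror the kernel's TC_H_* definitions; in real code the TC_HANDLE() macro would normally be used instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the kernel's special handle values. */
#define DEMO_TC_H_UNSPEC  0x00000000U
#define DEMO_TC_H_ROOT    0xFFFFFFFFU
#define DEMO_TC_H_INGRESS 0xFFFFFFF1U

/* Pack major:minor into a 32-bit handle. */
static uint32_t demo_tc_handle(unsigned int major, unsigned int minor)
{
    assert(major <= 0xFFFFU && minor <= 0xFFFFU);
    return ((uint32_t)major << 16) | (uint32_t)minor;
}

/* Extract the major (upper 16 bits) and minor (lower 16 bits) parts. */
static unsigned int demo_tc_major(uint32_t handle) { return handle >> 16; }
static unsigned int demo_tc_minor(uint32_t handle) { return handle & 0xFFFFU; }

/* By convention a qdisc handle has a zero minor number (e.g. 1:0),
 * while classes below it use the same major number (e.g. 1:5). */
static int demo_is_qdisc_handle(uint32_t handle)
{
    return handle != DEMO_TC_H_UNSPEC &&
           handle != DEMO_TC_H_ROOT &&
           handle != DEMO_TC_H_INGRESS &&
           demo_tc_minor(handle) == 0;
}
```

With this layout, 1:5 is stored as 0x00010005, and the qdisc that owns it is found by clearing the minor part.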

Interface Index

The interface index specifies the network device the traffic object is attached to. The function rtnl_tc_set_link() should be preferred when setting the interface index. It stores the reference to the link object in the tc object and allows retrieving the mtu and linktype automatically.

void rtnl_tc_set_ifindex(struct rtnl_tc *tc, int ifindex);

void rtnl_tc_set_link(struct rtnl_tc *tc, struct rtnl_link *link);

int rtnl_tc_get_ifindex(struct rtnl_tc *tc);

Link Type

The link type specifies the kind of link that is used by the network device (e.g. ethernet, ATM, …). It is derived automatically when the network device is specified with rtnl_tc_set_link(). The default fallback is ARPHRD_ETHER (ethernet).

void rtnl_tc_set_linktype(struct rtnl_tc *tc, uint32_t type);

uint32_t rtnl_tc_get_linktype(struct rtnl_tc *tc);

Kind

The kind character string specifies the type of qdisc, class, or classifier. Setting the kind results in the module-specific structure being allocated. Therefore it is imperative to call rtnl_tc_set_kind() before using any type-specific API functions such as rtnl_htb_set_rate().

int rtnl_tc_set_kind(struct rtnl_tc *tc, const char *kind);

char *rtnl_tc_get_kind(struct rtnl_tc *tc);

MPU

The Minimum Packet Unit specifies the minimum packet size that will ever be seen by this traffic control object. This value is used for rate calculations. Not all object implementations will make use of this value. The default value is 0.

void rtnl_tc_set_mpu(struct rtnl_tc *tc, uint32_t mpu);

uint32_t rtnl_tc_get_mpu(struct rtnl_tc *tc);

MTU

The Maximum Transmission Unit specifies the maximum packet size which will be transmitted. The value is derived from the link specified with rtnl_tc_set_link() if not overwritten with rtnl_tc_set_mtu(). If neither a link nor an MTU is specified, the value defaults to 1500 (ethernet).

void rtnl_tc_set_mtu(struct rtnl_tc *tc, uint32_t mtu);

uint32_t rtnl_tc_get_mtu(struct rtnl_tc *tc);

Overhead

The overhead specifies the additional overhead per packet caused by the network layer. This value can be used to correct packet size calculations if the packet size on the wire does not match the packet size seen by the kernel. The default value is 0.

void rtnl_tc_set_overhead(struct rtnl_tc *tc, uint32_t overhead);

uint32_t rtnl_tc_get_overhead(struct rtnl_tc *tc);
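
To illustrate how MPU and overhead can enter a rate calculation, here is a small model in C. It assumes the per-packet overhead is added first and the result is then raised to the MPU if it is still smaller; that ordering, the struct, and the demo_* names are assumptions for illustration only and do not describe the exact computation performed by libnl or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the sizing attributes described above. */
struct demo_tc_sizing {
    uint32_t mpu;       /* minimum packet unit, default 0 */
    uint32_t mtu;       /* maximum transmission unit, default 1500 */
    uint32_t overhead;  /* additional bytes per packet, default 0 */
};

/* Size charged for a packet of len bytes under the assumed model:
 * add the overhead, then never charge less than the MPU. */
static uint32_t demo_accounted_size(const struct demo_tc_sizing *s, uint32_t len)
{
    assert(len <= s->mtu);
    uint32_t size = len + s->overhead;
    if (size < s->mpu)
        size = s->mpu;
    return size;
}

/* Transmission time in microseconds at the given rate (bytes/s). */
static uint64_t demo_xmit_time_us(const struct demo_tc_sizing *s,
                                  uint32_t len, uint32_t rate)
{
    assert(rate > 0);
    return (uint64_t)demo_accounted_size(s, len) * 1000000ULL / rate;
}
```

With an MPU of 64, a 40-byte packet is charged as 64 bytes, so small packets cannot consume less than a minimum share of the configured rate.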

Parent

Specifies the parent traffic control object. The parent is identified by its handle. Special values are:

  • TC_H_ROOT: attach tc object directly to network device (root qdisc, root classifier)

  • TC_H_INGRESS: same as TC_H_ROOT but on the ingress side of the network stack.

void rtnl_tc_set_parent(struct rtnl_tc *tc, uint32_t parent);

uint32_t rtnl_tc_get_parent(struct rtnl_tc *tc);

Statistics

Generic statistics, see Accessing Statistics for additional information.

uint64_t rtnl_tc_get_stat(struct rtnl_tc *tc, enum rtnl_tc_stat id);

6.1.2. Accessing Statistics

The traffic control object holds a set of generic statistics. Not all traffic control modules will make use of all of these statistics. Some modules may provide additional statistics via their own APIs.

Table 1. Statistic identifiers (enum rtnl_tc_stat)

ID | Type | Description
RTNL_TC_PACKETS | Counter | Total number of packets transmitted
RTNL_TC_BYTES | Counter | Total number of bytes transmitted
RTNL_TC_RATE_BPS | Rate | Current bytes/s rate
RTNL_TC_RATE_PPS | Rate | Current packets/s rate
RTNL_TC_QLEN | Rate | Current length of the queue
RTNL_TC_BACKLOG | Rate | Number of packets currently backlogged
RTNL_TC_DROPS | Counter | Number of packets dropped
RTNL_TC_REQUEUES | Counter | Number of packets requeued
RTNL_TC_OVERLIMITS | Counter | Number of packets that exceeded the limit

  RTNL_TC_RATE_BPS and RTNL_TC_RATE_PPS only return meaningful values if a rate estimator has been configured.

Usage Example: Retrieving tc statistics

#include <netlink/route/tc.h>

uint64_t drops, qlen;

drops = rtnl_tc_get_stat(TC_CAST(qdisc), RTNL_TC_DROPS);

qlen  = rtnl_tc_get_stat(TC_CAST(qdisc), RTNL_TC_QLEN);
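
The rate statistics depend on a rate estimator turning counters into a rate. One common way to do that is an exponentially weighted moving average over counter deltas, sketched below. This is an illustrative assumption and not a description of the kernel's estimator; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical estimator state. */
struct demo_rate_est {
    uint64_t last_bytes;    /* byte counter at the previous sample */
    uint64_t avg_bps;       /* smoothed rate in bytes/s */
    unsigned int ewma_log;  /* a new sample is weighted 1 / 2^ewma_log */
};

/* Feed the current byte counter, sampled interval_ms after the last call.
 * Returns the updated smoothed rate in bytes/s. */
static uint64_t demo_rate_sample(struct demo_rate_est *e,
                                 uint64_t bytes_now, uint32_t interval_ms)
{
    assert(interval_ms > 0 && bytes_now >= e->last_bytes);

    /* Instantaneous rate over the last interval. */
    uint64_t inst = (bytes_now - e->last_bytes) * 1000 / interval_ms;
    e->last_bytes = bytes_now;

    /* avg += (inst - avg) / 2^ewma_log, using signed arithmetic. */
    int64_t diff = (int64_t)inst - (int64_t)e->avg_bps;
    e->avg_bps = (uint64_t)((int64_t)e->avg_bps + diff / (1LL << e->ewma_log));
    return e->avg_bps;
}
```

Without such periodic sampling there is nothing to derive a rate from, which is consistent with the rate identifiers being meaningless when no estimator is configured.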

6.1.3. Rate Table Calculations

6.2. Queueing Discipline (qdisc)

Classless Qdisc

The queueing discipline (qdisc) is used to implement fair queueing, prioritization, or rate control. It provides an enqueue() and a dequeue() operation. Whenever a network packet leaves the networking stack over a network device, be it a physical or virtual device, it will be enqueued to a qdisc unless the device is queueless. The enqueue() operation is followed by an immediate call to dequeue() for the same qdisc to eventually retrieve a packet which can be scheduled for transmission by the driver. Additionally, the networking stack runs a watchdog which polls the qdisc regularly to dequeue and send packets even if no new packets are being enqueued.

This additional watchdog is required due to the fact that qdiscs may hold on to packets and not return any packets upon dequeue() in order to enforce bandwidth restrictions.
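
The need for the watchdog can be seen in a tiny token bucket model. In this sketch dequeue() returns nothing while the bucket holds too few tokens, so the packet only leaves once something calls dequeue() again later. The single-packet queue, the millisecond clock, and all demo_* names are simplifying assumptions and are not taken from any kernel qdisc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical token bucket holding at most one queued packet. */
struct demo_tbf {
    uint32_t rate;      /* refill rate in bytes/s */
    uint32_t burst;     /* bucket size in bytes */
    uint32_t tokens;    /* tokens currently available, in bytes */
    uint64_t last_ms;   /* time of the last refill */
    uint32_t head_len;  /* length of the queued packet, 0 if empty */
};

/* Add tokens for the elapsed time, capped at the bucket size.
 * Fractions of a byte are dropped, which is good enough for a sketch. */
static void demo_tbf_refill(struct demo_tbf *q, uint64_t now_ms)
{
    assert(now_ms >= q->last_ms);
    uint64_t total = q->tokens + (now_ms - q->last_ms) * q->rate / 1000;
    q->tokens = total > q->burst ? q->burst : (uint32_t)total;
    q->last_ms = now_ms;
}

/* Returns the length of the released packet, or 0 if the queue is empty
 * or the packet is being held back to enforce the rate. */
static uint32_t demo_tbf_dequeue(struct demo_tbf *q, uint64_t now_ms)
{
    demo_tbf_refill(q, now_ms);
    if (q->head_len == 0 || q->head_len > q->tokens)
        return 0;
    uint32_t len = q->head_len;
    q->tokens -= len;
    q->head_len = 0;
    return len;
}
```

If dequeue() were only called right after enqueue(), the held packet would never be sent; the periodic poll plays the role of the later calls.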

[Figure: Multiband Qdisc]

The figure illustrates a trivial example of a classless qdisc consisting of three bands (queues). Use of multiple bands is a common technique in qdiscs to implement fair queueing between flows or prioritize differentiated services.

Classless qdiscs can be regarded as a black box: their inner workings can only be steered using the configuration parameters provided by the qdisc. There is no way to influence the structure of the internal queues themselves.

Classful Qdisc

Classful qdiscs allow for the queueing structure and classification process to be created by the user.

[Figure: Classful Qdisc]

The figure above shows a classful qdisc with a classifier attached to it which will make the decision whether to enqueue a packet to traffic class 1:1 or 1:2. Unlike with classless qdiscs, classful qdiscs allow the classification process and the structure of the queues to be defined by the user. This allows for complex traffic class rules to be applied.
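
The classification step can be sketched as a first-match rule table that maps a packet to a class id such as 1:1 or 1:2. The sketch matches on a firewall mark only; the structs, the rule format, and the fallback class are hypothetical and much simpler than a real classifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet metadata used for classification. */
struct demo_pkt {
    uint32_t fwmark;
};

/* Hypothetical rule: packets carrying fwmark go to classid. */
struct demo_rule {
    uint32_t fwmark;
    uint32_t classid;   /* major:minor packed as (major << 16) | minor */
};

/* Return the class id of the first matching rule, or default_classid
 * if no rule matches. */
static uint32_t demo_classify(const struct demo_rule *rules, size_t n,
                              const struct demo_pkt *pkt,
                              uint32_t default_classid)
{
    assert(pkt != NULL && (rules != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (rules[i].fwmark == pkt->fwmark)
            return rules[i].classid;
    }
    return default_classid;
}
```

The qdisc then enqueues the packet to whichever class the classifier returned, which is how the user controls the queue structure.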

Table 2. List of Qdisc Implementations

Qdisc | Classful | Description
ATM | Yes | FIXME
Blackhole | No | This qdisc will drop all packets passed to it.
CBQ | Yes | CBQ (Class Based Queueing) is a classful qdisc which allows creating traffic classes and enforcing bandwidth limitations for each class.
DRR | Yes | The DRR (Deficit Round Robin) scheduler is a classful qdisc implementing fair queueing. Each class is assigned a quantum specifying the maximum number of bytes that can be served per round. Unused quantum at the end of the round is carried over to the next round.
DSMARK | Yes | FIXME
FIFO | No | FIXME
GRED | No | FIXME
HFSC | Yes | FIXME
HTB | Yes | FIXME
mq | Yes | FIXME
multiq | Yes | FIXME
netem | No | FIXME
Prio | Yes | FIXME
RED | Yes | FIXME
SFQ | Yes | FIXME
TBF | Yes | FIXME
teql | No | FIXME
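
The DRR entry in the table above describes a quantum per class with unused quantum carried over to the next round. A minimal sketch of one service round for a single class follows. The fixed-size queue and the demo_* names are hypothetical, and resetting the deficit of an emptied class is a usual DRR convention assumed here rather than something stated in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QCAP 8   /* hypothetical queue capacity */

struct demo_drr_class {
    uint32_t quantum;           /* bytes granted per round */
    uint32_t deficit;           /* unused credit carried over */
    uint32_t pkts[DEMO_QCAP];   /* FIFO of packet sizes in bytes */
    size_t head, len;
};

/* Serve one round for this class. Returns the number of bytes sent. */
static uint32_t demo_drr_serve(struct demo_drr_class *c)
{
    assert(c->quantum > 0);
    uint32_t sent = 0;

    if (c->len == 0)
        return 0;

    c->deficit += c->quantum;
    /* Send packets while the head packet fits into the current credit. */
    while (c->len > 0 && c->pkts[c->head] <= c->deficit) {
        uint32_t size = c->pkts[c->head];
        c->deficit -= size;
        sent += size;
        c->head = (c->head + 1) % DEMO_QCAP;
        c->len--;
    }
    /* An emptied class does not keep accumulating credit. */
    if (c->len == 0)
        c->deficit = 0;
    return sent;
}
```

With a quantum of 500 and three 300-byte packets queued, the first round sends one packet and carries 200 bytes over, which lets the second round send the remaining two.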

Table 3. QDisc API Overview

Attribute | C Interface

Allocation / Freeing

struct rtnl_qdisc *rtnl_qdisc_alloc(void);
void rtnl_qdisc_put(struct rtnl_qdisc *qdisc);

Addition

int rtnl_qdisc_build_add_request(struct rtnl_qdisc *qdisc, int flags,
                                 struct nl_msg **result);
int rtnl_qdisc_add(struct nl_sock *sock, struct rtnl_qdisc *qdisc,
                   int flags);

Modification

int rtnl_qdisc_build_change_request(struct rtnl_qdisc *old,
                                    struct rtnl_qdisc *new,
                                    struct nl_msg **result);
int rtnl_qdisc_change(struct nl_sock *sock, struct rtnl_qdisc *old,
                      struct rtnl_qdisc *new);

Deletion

int rtnl_qdisc_build_delete_request(struct rtnl_qdisc *qdisc,
                                    struct nl_msg **result);
int rtnl_qdisc_delete(struct nl_sock *sock, struct rtnl_qdisc *qdisc);

Cache

int rtnl_qdisc_alloc_cache(struct nl_sock *sock,
                           struct nl_cache **cache);
struct rtnl_qdisc *rtnl_qdisc_get(struct nl_cache *cache, int ifindex, uint32_t handle);
struct rtnl_qdisc *rtnl_qdisc_get_by_parent(struct nl_cache *cache, int ifindex, uint32_t parent);

6.2.1. Retrieving Qdisc Configuration

The function rtnl_qdisc_alloc_cache() is used to retrieve the current qdisc configuration in the kernel. It will construct a RTM_GETQDISC netlink message, requesting the complete list of qdiscs configured in the kernel.

#include <netlink/route/qdisc.h>

struct nl_cache *all_qdiscs;

/* Fill the cache with all qdiscs currently configured in the kernel */
if (rtnl_qdisc_alloc_cache(sock, &all_qdiscs) < 0) {
        /* error while retrieving qdisc cfg */
}

The cache can be accessed using the following functions:

  • Search qdisc with matching ifindex and handle:

    struct rtnl_qdisc *rtnl_qdisc_get(struct nl_cache *cache, int ifindex, uint32_t handle);
    
    
  • Search qdisc with matching ifindex and parent:

    struct rtnl_qdisc *rtnl_qdisc_get_by_parent(struct nl_cache *cache, int ifindex, uint32_t parent);
    
    
  • Or any of the generic cache functions (e.g. nl_cache_search(), nl_cache_dump(), etc.)
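
The difference between the two lookups is only the key: one matches on (ifindex, handle), the other on (ifindex, parent). The following sketch models that with a plain array instead of a real nl_cache. The struct and the demo_* functions are hypothetical and only illustrate the matching rule, not the libnl implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a cached qdisc. */
struct demo_qdisc {
    int ifindex;
    uint32_t handle;
    uint32_t parent;
};

/* Find the qdisc on ifindex whose own handle matches. */
static const struct demo_qdisc *demo_get(const struct demo_qdisc *cache,
                                         size_t n, int ifindex,
                                         uint32_t handle)
{
    assert(cache != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (cache[i].ifindex == ifindex && cache[i].handle == handle)
            return &cache[i];
    return NULL;
}

/* Find the qdisc on ifindex that is attached to the given parent. */
static const struct demo_qdisc *demo_get_by_parent(const struct demo_qdisc *cache,
                                                   size_t n, int ifindex,
                                                   uint32_t parent)
{
    assert(cache != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (cache[i].ifindex == ifindex && cache[i].parent == parent)
            return &cache[i];
    return NULL;
}
```

Searching by parent is useful when the handle of a leaf qdisc is unknown but the class it hangs off (for example 1:1) is known.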

Example: Search and print qdisc

struct rtnl_qdisc *qdisc;
int ifindex;

ifindex = rtnl_link_get_ifindex(eth0_obj);

/* search for qdisc on eth0 with handle 1:0 */
if (!(qdisc = rtnl_qdisc_get(all_qdiscs, ifindex, TC_HANDLE(1, 0)))) {
        /* no such qdisc found */
} else {
        nl_object_dump(OBJ_CAST(qdisc), NULL);

        /* Give back the reference returned by rtnl_qdisc_get() */
        rtnl_qdisc_put(qdisc);
}

6.2.2. Adding a Qdisc

In order to add a new qdisc to the kernel, a qdisc object needs to be allocated. It will hold all attributes of the new qdisc.

#include <netlink/route/qdisc.h>

struct rtnl_qdisc *qdisc;

if (!(qdisc = rtnl_qdisc_alloc()))

        /* OOM error */

The next step is to specify all generic qdisc attributes using the tc object interface described in the section Attributes.

The following attributes must be specified:

  • IfIndex

  • Parent

  • Kind

/* Attach qdisc to device eth0 */

rtnl_tc_set_link(TC_CAST(qdisc), eth0_obj);

/* Make this the root qdisc */

rtnl_tc_set_parent(TC_CAST(qdisc), TC_H_ROOT);

/* Set qdisc identifier to 1:0, if left unspecified, a handle will be generated by the kernel. */

rtnl_tc_set_handle(TC_CAST(qdisc), TC_HANDLE(1, 0));

/* Make this a HTB qdisc */

rtnl_tc_set_kind(TC_CAST(qdisc), "htb");

After specifying the qdisc kind (rtnl_tc_set_kind()), the qdisc type specific interface can be used to set attributes which are specific to the respective qdisc implementations:

/* HTB feature: Make unclassified packets go to traffic class 1:5 */

rtnl_htb_set_defcls(qdisc, TC_HANDLE(1, 5));

Finally, the qdisc is ready to be added and can be passed on to the function rtnl_qdisc_add(), which constructs a netlink message requesting the addition of the new qdisc, sends the message to the kernel, and waits for the kernel's response. The function returns 0 if the qdisc has been added or updated successfully, or a negative error code if an error occurred.

  The kernel operation for updating and adding a qdisc is the same. Therefore when calling rtnl_qdisc_add() any existing qdisc with matching handle will be updated unless the flag NLM_F_EXCL is specified.

The following flags may be specified:

NLM_F_CREATE

Create the qdisc if it does not exist. Without this flag, -NLE_OBJ_NOTFOUND is returned if the qdisc does not exist.

NLM_F_REPLACE

If another qdisc is already attached to the same parent and their handles mismatch, replace the qdisc instead of returning -EEXIST.

NLM_F_EXCL

Return -NLE_EXIST if a qdisc with a matching handle exists already.

  The function rtnl_qdisc_add() requires administrator privileges.

/* Submit request to kernel and wait for response */

err = rtnl_qdisc_add(sock, qdisc, NLM_F_CREATE);



/* Return the qdisc object to free memory resources */

rtnl_qdisc_put(qdisc);



if (err < 0) {

        fprintf(stderr, "Unable to add qdisc: %s\n", nl_geterror(err));

        return err;

}
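
The flag descriptions above can be condensed into a small decision function. This is a simplified reading of those descriptions only; the kernel's real decision logic has more cases. The DEMO_F_* constants are hypothetical names carrying the same values as NLM_F_REPLACE, NLM_F_EXCL, and NLM_F_CREATE in linux/netlink.h, and the outcome enum is invented for illustration.

```c
#include <assert.h>

/* Hypothetical names with the values of the NLM_F_* flags. */
#define DEMO_F_REPLACE 0x100U
#define DEMO_F_EXCL    0x200U
#define DEMO_F_CREATE  0x400U

enum demo_outcome {
    DEMO_CREATED,
    DEMO_UPDATED,
    DEMO_REPLACED,
    DEMO_ERR_EXIST,
    DEMO_ERR_NOTFOUND
};

/* handle_exists:   a qdisc with the requested handle already exists.
 * parent_occupied: a different qdisc is already attached to the parent. */
static enum demo_outcome demo_add_outcome(int handle_exists,
                                          int parent_occupied,
                                          unsigned int flags)
{
    assert((flags & ~(DEMO_F_REPLACE | DEMO_F_EXCL | DEMO_F_CREATE)) == 0);

    if (handle_exists)
        return (flags & DEMO_F_EXCL) ? DEMO_ERR_EXIST : DEMO_UPDATED;
    if (parent_occupied)
        return (flags & DEMO_F_REPLACE) ? DEMO_REPLACED : DEMO_ERR_EXIST;
    return (flags & DEMO_F_CREATE) ? DEMO_CREATED : DEMO_ERR_NOTFOUND;
}
```

Passing NLM_F_CREATE together with NLM_F_EXCL therefore expresses "create, but never silently update an existing qdisc".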

6.2.3. Deleting a qdisc

#include <netlink/route/qdisc.h>

struct rtnl_qdisc *qdisc;

qdisc = rtnl_qdisc_alloc();

/* Identify the qdisc to delete: the root qdisc on eth0 */
rtnl_tc_set_link(TC_CAST(qdisc), eth0_obj);
rtnl_tc_set_parent(TC_CAST(qdisc), TC_H_ROOT);

rtnl_qdisc_delete(sock, qdisc);

rtnl_qdisc_put(qdisc);

  The function rtnl_qdisc_delete() requires administrator privileges.

6.2.4. HTB - Hierarchical Token Bucket

HTB Qdisc Attributes

Default Class

The default class is the fallback class to which all unclassified traffic is directed. If no default class or an invalid default class is specified, packets are transmitted directly to the next layer (direct transmissions).

uint32_t rtnl_htb_get_defcls(struct rtnl_qdisc *qdisc);

int rtnl_htb_set_defcls(struct rtnl_qdisc *qdisc, uint32_t defcls);

Rate to Quantum (r2q)

TODO

uint32_t rtnl_htb_get_rate2quantum(struct rtnl_qdisc *qdisc);

int rtnl_htb_set_rate2quantum(struct rtnl_qdisc *qdisc, uint32_t rate2quantum);

HTB Class Attributes

Priority

uint32_t rtnl_htb_get_prio(struct rtnl_class *class);

int rtnl_htb_set_prio(struct rtnl_class *class, uint32_t prio);

Rate

The rate (bytes/s) specifies the maximum bandwidth an individual class can use without borrowing. The rate of a class should always be greater than or equal to the rate of its children.

uint32_t rtnl_htb_get_rate(struct rtnl_class *class);

int rtnl_htb_set_rate(struct rtnl_class *class, uint32_t rate);

Ceil Rate

The ceil rate specifies the maximum bandwidth an individual class can use. This includes bandwidth that is being borrowed from other classes. Ceil defaults to the class rate, implying that by default the class will not borrow. The ceil rate of a class should always be greater than or equal to the ceil rate of its children.

uint32_t rtnl_htb_get_ceil(struct rtnl_class *class);

int rtnl_htb_set_ceil(struct rtnl_class *class, uint32_t ceil);
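
The two hierarchy rules for rate and ceil can be checked mechanically before a class tree is submitted. The sketch below does that on a flat array of classes; the struct, the use of 0 to mean "ceil not set", and the demo_* names are assumptions for illustration and are not part of the HTB API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one HTB class in a planned tree. */
struct demo_htb_class {
    uint32_t rate;   /* bytes/s */
    uint32_t ceil;   /* bytes/s, 0 = not set (defaults to rate) */
    int parent;      /* index of the parent class, -1 for a top-level class */
};

/* Ceil defaults to the class rate when it has not been set. */
static uint32_t demo_effective_ceil(const struct demo_htb_class *c)
{
    return c->ceil ? c->ceil : c->rate;
}

/* Returns the index of the first class whose rate or ceil exceeds that
 * of its parent, or -1 if the whole tree satisfies both rules. */
static int demo_htb_check(const struct demo_htb_class *cls, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int p = cls[i].parent;
        if (p < 0)
            continue;
        assert((size_t)p < n);
        if (cls[i].rate > cls[p].rate)
            return (int)i;
        if (demo_effective_ceil(&cls[i]) > demo_effective_ceil(&cls[p]))
            return (int)i;
    }
    return -1;
}
```

A child that leaves ceil unset cannot borrow, while a child whose ceil equals the parent's ceil may borrow up to everything the parent is allowed to use.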

Burst

TODO

uint32_t rtnl_htb_get_rbuffer(struct rtnl_class *class);

int rtnl_htb_set_rbuffer(struct rtnl_class *class, uint32_t burst);

Ceil Burst

TODO

uint32_t rtnl_htb_get_bbuffer(struct rtnl_class *class);

int rtnl_htb_set_bbuffer(struct rtnl_class *class, uint32_t burst);

Quantum

TODO

int rtnl_htb_set_quantum(struct rtnl_class *class, uint32_t quantum);

extern int rtnl_htb_set_cbuffer(struct rtnl_class *, uint32_t);

6.3. Class

Handle (rows) / Parent (columns) | UNSPEC, TC_H_ROOT, 0:pY | pX:pY
UNSPEC | qdisc = root-qdisc, class = root-qdisc:0 | qdisc = pX:0, class = pX:0
0:hY | qdisc = root-qdisc, class = root-qdisc:hY | qdisc = pX:0, class = pX:hY
hX:hY | qdisc = hX:, class = hX:hY | if pX != hX return -EINVAL; qdisc = hX:, class = hX:hY

6.4. Classifier (cls)

TODO

6.5. ClassID Management

TODO

6.6. Packet Location Aliasing (pktloc)

TODO

6.7. Traffic Control Module API

TODO

Version 3.1
Last updated 2014-01-21 20:43:12 CET

Reposted from blog.csdn.net/arv002/article/details/112649518