GNN+RA Literature Reading: Modeling Resource Allocation (RA) with GNNs

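Summary: a handful of papers on how to model resource allocation with GNNs. [1] and [2] both model wireless links, [3] is the most useful reference, and [4] leans toward RL and describes its GNN only vaguely.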

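Questions to answer when modeling a network with a GNN:

1. Is the graph directed or undirected?

2. What does a node represent, and what are its features?

3. What does an edge represent, and what are its features?

4. How is the GNN updated? The usual update first aggregates the messages from neighboring nodes and then combines all the information.

Different functions can be chosen for AGGREGATE and for COMBINE.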
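The AGGREGATE and COMBINE split above can be sketched in plain Python. This is a minimal illustration and does not come from any of the papers below. The function names (`aggregate_sum`, `aggregate_mean`, `combine_concat`, `message_passing_round`) and the toy graph are assumptions made for the example, and the learned layer that would normally follow COMBINE is left out.

```python
from typing import Callable, Dict, List

Vector = List[float]


def aggregate_sum(messages: List[Vector], dim: int) -> Vector:
    """AGGREGATE choice 1: element-wise sum of the neighbor messages."""
    out = [0.0] * dim
    for message in messages:
        for i, value in enumerate(message):
            out[i] += value
    return out


def aggregate_mean(messages: List[Vector], dim: int) -> Vector:
    """AGGREGATE choice 2: element-wise mean (zero vector without neighbors)."""
    if not messages:
        return [0.0] * dim
    total = aggregate_sum(messages, dim)
    return [value / len(messages) for value in total]


def combine_concat(own: Vector, aggregated: Vector) -> Vector:
    """COMBINE choice: concatenate the node's own state with the aggregate."""
    return own + aggregated


def message_passing_round(
    h: Dict[str, Vector],
    in_neighbors: Dict[str, List[str]],
    aggregate: Callable[[List[Vector], int], Vector],
    combine: Callable[[Vector, Vector], Vector],
) -> Dict[str, Vector]:
    """One synchronous round: every node reads the old states of its in-neighbors."""
    dim = len(next(iter(h.values())))
    new_h = {}
    for node, own in h.items():
        messages = [h[u] for u in in_neighbors.get(node, [])]
        new_h[node] = combine(own, aggregate(messages, dim))
    return new_h


# Toy directed graph: b -> a, c -> a, a -> b; c has no incoming edge.
h = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [3.0, 3.0]}
in_neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}

h_sum = message_passing_round(h, in_neighbors, aggregate_sum, combine_concat)
h_mean = message_passing_round(h, in_neighbors, aggregate_mean, combine_concat)
```

Swapping `aggregate_sum` for `aggregate_mean` changes only how neighbor messages are pooled; the rest of the round stays the same, which is the point of keeping the two functions separate.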

[1]

Z. He, L. Wang, H. Ye, G. Y. Li, and B.-H. F. Juang, ‘Resource Allocation based on Graph Neural Networks in Vehicular Communications’, in 2020 IEEE Global Communications Conference (GLOBECOM), New York: IEEE, 2020. doi: 10.1109/GLOBECOM42002.2020.9322537.

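Abstract: The paper develops a distributed, GNN-augmented RL spectrum-sharing scheme for V2X networks. In the proposed method the V2V network is represented as a graph. The local observations of the V2V pairs and the channel gains of the interference links are treated as node and edge information, respectively. A GNN then learns, from this graph information, a low-dimensional feature for each node, that is, for each V2V pair.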

Model:

Graph: modeled as a directed graph.

Node: each of the V2V pairs is regarded as a node.

Edge: the interference links between V2V pairs are the edges.

Node features: the node observation contains the VUE's channel gain and its corresponding transmit power.

Edge features: edge weights are represented by the interference channel gain.

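A GNN is used to extract the node features.

How the paper passes messages between node features:

|| denotes concatenation. In effect, the observation at node v, the node's $\mu$ from the previous round, the sum over all edges to its current neighbors, and the sum of all previous-round $\mu$ values (presumably over the neighbors) are concatenated and multiplied by a weight matrix.

Why can the dimension of $\mu$ be chosen freely?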
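One possible reading of the update described in these notes is sketched below in plain Python. It is not the paper's exact equation. The ReLU nonlinearity, the scalar edge feature, the random weights, and the names `make_weights` and `update_mu` are all assumptions made for illustration. Under this reading, the weight matrix has shape p x (d + 2p + 1), where d is the observation width, so the width p of $\mu$ is simply the number of rows chosen for that matrix and can be treated as a hyperparameter.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]


def make_weights(p: int, obs_dim: int, rng: random.Random) -> List[Vector]:
    """Random p x (obs_dim + 2p + 1) matrix; p is a free hyperparameter."""
    in_dim = obs_dim + p + 1 + p  # o_v || mu_v || edge sum || neighbor-mu sum
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(p)]


def update_mu(
    obs: Dict[str, Vector],
    mu_prev: Dict[str, Vector],
    edge_gain: Dict[Tuple[str, str], float],
    in_neighbors: Dict[str, List[str]],
    weights: List[Vector],
) -> Dict[str, Vector]:
    """One round: mu_v <- ReLU(W [o_v || mu_v || sum_u e_uv || sum_u mu_u])."""
    p = len(weights)
    mu_new = {}
    for v, o_v in obs.items():
        nbrs = in_neighbors.get(v, [])
        edge_sum = float(sum(edge_gain[(u, v)] for u in nbrs))
        nbr_mu_sum = [float(sum(mu_prev[u][i] for u in nbrs)) for i in range(p)]
        z = o_v + mu_prev[v] + [edge_sum] + nbr_mu_sum
        # ReLU is an assumed nonlinearity, not taken from the paper.
        mu_new[v] = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in weights]
    return mu_new


# Toy example: two V2V pairs that interfere with each other.
rng = random.Random(0)
obs = {"v1": [0.8, 0.1], "v2": [0.5, 0.2]}  # [channel gain, transmit power]
edge_gain = {("v1", "v2"): 0.05, ("v2", "v1"): 0.02}
in_neighbors = {"v1": ["v2"], "v2": ["v1"]}

p = 3  # embedding width, chosen freely
weights = make_weights(p, obs_dim=2, rng=rng)
mu = {v: [0.0] * p for v in obs}
for _ in range(4):  # a few message-passing rounds
    mu = update_mu(obs, mu, edge_gain, in_neighbors, weights)
```

Changing `p` changes only the shape of `weights` and of the zero initialization; the graph and the observations are untouched.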

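Algorithm framework

Roughly, there are two nested loops. The outer loop runs over t and trains the RL network, which makes the decisions. The RL network's input state s(t) is not the raw observation, though: a GNN first extracts features from the observation, and these features become part of s(t). Producing the GNN features requires iterating a network, and that iteration is the inner loop.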
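The two-loop structure described above can be sketched as follows. This shows only the control flow and is not the paper's algorithm. The embedding update inside `embed` is a toy rule without learned weights, the RL agent is omitted, and the names `embed`, `build_state`, and `run` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def embed(
    obs: Dict[str, Vector], in_neighbors: Dict[str, List[str]], rounds: int
) -> Dict[str, Vector]:
    """Inner loop: `rounds` message-passing iterations (toy update, no training)."""
    mu = {v: [0.0] for v in obs}
    for _ in range(rounds):
        # Each node adds its own observation sum to its neighbors' old embeddings.
        mu = {
            v: [sum(obs[v]) + sum(mu[u][0] for u in in_neighbors.get(v, []))]
            for v in obs
        }
    return mu


def build_state(obs_v: Vector, mu_v: Vector) -> Vector:
    """State = raw local observation concatenated with the graph embedding."""
    return obs_v + mu_v


def run(
    observations_over_time: List[Dict[str, Vector]],
    in_neighbors: Dict[str, List[str]],
    rounds: int,
) -> List[Dict[str, Vector]]:
    states = []
    for obs in observations_over_time:  # outer loop over t
        mu = embed(obs, in_neighbors, rounds)  # inner loop
        states.append({v: build_state(obs[v], mu[v]) for v in obs})
        # An RL agent would choose an action from each state and be trained here.
    return states


in_neighbors = {"a": ["b"], "b": ["a"]}
trace = [{"a": [1.0, 2.0], "b": [0.5, 0.5]}]
states = run(trace, in_neighbors, rounds=2)
```

In this toy setup the embedding of each node after two rounds already depends on the other node's observation, while the first part of the state stays purely local.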

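Which features does the feature extraction actually extract? And why are the observation O and the extracted feature both placed in the state: doesn't the feature already contain the observation?

The simulation section has very little data and compares only against a random baseline. Random reaches about 83% of the RL-GNN performance, which suggests this GNN is not especially effective.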

[2]

T. Chen, X. Zhang, M. You, G. Zheng, and S. Lambotharan, ‘A GNN-Based Supervised Learning Framework for Resource Allocation in Wireless IoT Networks’, IEEE Internet of Things Journal, vol. 9, no. 3, pp. 1712–1724, Feb. 2022, doi: 10.1109/JIOT.2021.3091551.

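Abstract: The paper proposes a framework based on graph neural networks (GNNs) that addresses this challenge in a supervised manner. Specifically, the wireless network is modeled as a directed graph in which the desired communication links are the nodes and the harmful interference links are the edges.

GAP: Earlier GNN work [23][24] is limited to homogeneous wireless systems and may not be compatible with heterogeneous IoT systems. Those works also study only continuous optimization problems, so their methods may not handle discrete optimization problems.

GNN modeling: the whole network is modeled as a directed graph.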

Node: the communication link between a transceiver pair is treated as a node.

Edge: the interference link between two nodes is treated as an edge.

Node features: properties related to the communication links, such as distance, channel information, weight, and priority, can be taken as node features.

Edge features: properties related to the interference links, such as distance and channel information, can be treated as edge features.
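The link-as-node modeling above can be sketched as follows, assuming the network is summarized by a gain matrix in which entry `gain[i][j]` is the channel gain from transmitter i to receiver j. The matrix layout, the `threshold` used to drop negligible interference, and the function name `build_interference_graph` are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def build_interference_graph(
    gain: List[List[float]],
    weights: List[float],
    threshold: float = 0.0,
) -> Tuple[Dict[int, List[float]], Dict[Tuple[int, int], List[float]]]:
    """Nodes are communication links; directed edges are interference links.

    Node i carries [direct gain, link weight]. Edge (i, j) means transmitter i
    interferes with receiver j and carries [interference gain].
    """
    n = len(gain)
    node_features = {i: [gain[i][i], weights[i]] for i in range(n)}
    edge_features = {
        (i, j): [gain[i][j]]
        for i in range(n)
        for j in range(n)
        if i != j and gain[i][j] > threshold
    }
    return node_features, edge_features


# Toy 3-link network with made-up gains.
gain = [
    [1.00, 0.20, 0.00],
    [0.10, 0.90, 0.30],
    [0.05, 0.00, 0.80],
]
weights = [1.0, 2.0, 1.0]
nodes, edges = build_interference_graph(gain, weights)
```

Here `edges[(0, 1)]` and `edges[(1, 0)]` differ, which is one reason a directed graph is the natural choice: interference from link 0 onto link 1 is generally not the same as in the reverse direction.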

How the GNN is updated:

[3]

Z. Sun, Y. Mo, and C. Yu, ‘Graph-Reinforcement-Learning-Based Task Offloading for Multiaccess Edge Computing’, IEEE Internet of Things Journal, vol. 10, no. 4, pp. 3138–3150, Feb. 2023, doi: 10.1109/JIOT.2021.3123822.

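Abstract: Heuristic algorithms depend heavily on an accurate mathematical model of the MEC system, and DRL does not make proper use of the relations among devices in the MEC graph. To address this, the paper proposes a task-offloading mechanism based on graph neural networks (GNNs) that learns directly on graph data through message passing and aggregation.

GAP: Heuristic algorithms need expert knowledge, struggle with real-time decisions in dynamic MEC scenarios, and often have to be solved again when the environment changes. Deep reinforcement learning (DRL), which combines reinforcement learning with deep neural networks (DNNs), is a promising answer to these challenges, and researchers have applied DRL to a variety of MEC task-offloading problems. The MEC system, made up of wireless access devices, wireless channels, and MEC servers, is treated as the RL environment. The DNN's modeling capacity provides a comprehensive representation, and the offloading policy is learned by interacting with the environment. DNNs do not represent graph data well, however, so the topological relations among the MEC components, with wireless links as edges, are ignored. As a result, conventional DRL methods adapt poorly and become unsuitable once the MEC topology grows complex.

In practice, this paper mainly optimizes the offloading variable.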

Model:

N wireless devices (WDs) and A servers; each device can connect to one or more MEC servers.

The system time $\mathcal{T}$ is divided into consecutive time frames of equal length t.

Dynamics: a task can be generated in any slot of $\mathcal{T}$. A task can be offloaded or computed locally.

GNN modeling: the system is modeled as an undirected graph.

Node: network entities in the MEC system (two types: WD and server).

Edge: communication connections between WDs and MEC servers.

Node features: WD nodes are updated according to $\mathbb{T}_t$, i.e. the task information generated in each slot.

Time-varying: applications on the WDs randomly generate tasks, and the wireless channel gain and background noise of the MEC system also vary over time.

The topology of devices may change over time, such as node addition or removal and edge establishment or disconnection.

Other parameters (e.g., the maximum computational performance of a particular server node and the total communication bandwidth of a wireless AP) are fixed.
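The graph described above, with two node types, undirected WD-server edges, per-slot task updates, and a topology that can change, can be sketched with a small container class. The class name `MecGraph`, its method names, and the feature layouts (`[task size, task weight]` for WDs, `[max compute]` for servers) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass
class MecGraph:
    kind: Dict[str, str] = field(default_factory=dict)  # "wd" or "server"
    features: Dict[str, List[float]] = field(default_factory=dict)
    edges: Set[FrozenSet[str]] = field(default_factory=set)  # undirected

    def add_node(self, name: str, kind: str, feats: List[float]) -> None:
        if kind not in ("wd", "server"):
            raise ValueError(f"unknown node kind: {kind}")
        self.kind[name] = kind
        self.features[name] = list(feats)

    def connect(self, wd: str, server: str) -> None:
        """Add an undirected WD-server edge (a wireless connection)."""
        if self.kind.get(wd) != "wd" or self.kind.get(server) != "server":
            raise ValueError("edges must join a WD and a server")
        self.edges.add(frozenset((wd, server)))

    def disconnect(self, wd: str, server: str) -> None:
        self.edges.discard(frozenset((wd, server)))

    def neighbors(self, name: str) -> List[str]:
        return sorted(n for e in self.edges if name in e for n in e if n != name)

    def update_tasks(self, tasks: Dict[str, List[float]]) -> None:
        """Per-slot update: only WD features change; server features stay fixed."""
        for wd, feats in tasks.items():
            if self.kind.get(wd) != "wd":
                raise KeyError(f"{wd} is not a WD node")
            self.features[wd] = list(feats)


g = MecGraph()
g.add_node("wd1", "wd", [0.0, 0.0])   # [task size, task weight], assumed layout
g.add_node("wd2", "wd", [0.0, 0.0])
g.add_node("s1", "server", [10.0])    # [max compute], assumed layout
g.add_node("s2", "server", [20.0])
g.connect("wd1", "s1")
g.connect("wd1", "s2")
g.connect("wd2", "s2")

g.update_tasks({"wd1": [4.0, 1.5]})   # tasks generated in the current slot
wd1_servers = g.neighbors("wd1")
s2_devices_before = g.neighbors("s2")
g.disconnect("wd1", "s2")             # topology change over time
s2_devices_after = g.neighbors("s2")
```

Storing each edge as a `frozenset` makes the graph undirected by construction, since `{wd, server}` and `{server, wd}` are the same key.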

Goal: Our goal is to minimize the total task-weighted response time of the MEC system over time by designing a reasonable computation offloading policy and resource scheduling policy.

For RL, the task is to find the optimal offloading policy $\pi$ given the graph $\mathcal{G}$ and the tasks $\mathbb{T}_t$.

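Main framework of the algorithm:

1. Model the MEC servers and WDs as a graph, then use a GNN to extract the graph's characteristics and predict x, where x is the offloading variable. Note that every element of the predicted x is a continuous value.

2. Given the prediction vector $\hat{x}$, quantize it to 0 or 1 with a quantization method (a modified order-preserving algorithm). The quantization depends on the exploration space K, and the final quantized action can be selected by Q value.

Suppose that the GNN has prediction $\hat{x}_t$ at time slot t; we then quantize $\hat{x}_t$ and extend it to K binarized offloading actions. In general, K is related to the size of the search space of the extended algorithm. When the search space is constant, the larger K is, the better the extended algorithm performs, but the computational complexity increases accordingly.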
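The quantize-then-select step above can be sketched with the basic order-preserving quantization known from the DROO line of work. This is not the paper's modified variant, whose details are not recorded in these notes. The sketch treats $\hat{x}$ as a flat vector with entries in [0, 1], and the `score` function is a made-up stand-in for the Q value.

```python
from typing import Callable, List


def order_preserving_quantize(x_hat: List[float], k: int) -> List[List[int]]:
    """Return k binary candidate actions from a relaxed prediction in [0, 1].

    The first candidate thresholds at 0.5. Each later candidate uses as its
    threshold the entry of x_hat that is next-closest to 0.5, so the ordering
    of the entries in x_hat is preserved in every candidate.
    """
    n = len(x_hat)
    if not 1 <= k <= n + 1:
        raise ValueError("k must be between 1 and len(x_hat) + 1")
    actions = [[1 if x > 0.5 else 0 for x in x_hat]]
    thresholds = sorted(x_hat, key=lambda x: abs(x - 0.5))
    for thr in thresholds[: k - 1]:
        action = []
        for x in x_hat:
            if x > thr:
                action.append(1)
            elif x == thr and thr <= 0.5:
                action.append(1)  # flip the borderline entry up
            else:
                action.append(0)  # includes the borderline entry flipped down
        actions.append(action)
    return actions


def pick_best(
    actions: List[List[int]], score: Callable[[List[int]], float]
) -> List[int]:
    """Select the candidate with the highest score (stand-in for the Q value)."""
    return max(actions, key=score)


x_hat = [0.2, 0.4, 0.7, 0.9]
candidates = order_preserving_quantize(x_hat, k=4)


def score(action: List[int]) -> float:
    """Hypothetical score: prefer offloading exactly two tasks."""
    return -abs(sum(action) - 2)


best = pick_best(candidates, score)
```

A larger `k` yields more candidates to evaluate, which matches the trade-off noted above: better coverage of the search space at a higher computational cost.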

Graph Node Embedding

The neighborhood information aggregation process is as follows:

Edges embedding:

[4]

K. Li, W. Ni, X. Yuan, A. Noor, and A. Jamalipour, ‘Deep-Graph-Based Reinforcement Learning for Joint Cruise Control and Task Offloading for Aerial Edge Internet of Things (EdgeIoT)’, IEEE Internet of Things Journal, vol. 9, no. 21, pp. 21676–21686, Nov. 2022, doi: 10.1109/JIOT.2022.3182119.

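Abstract: The paper studies a new joint optimization of UAV cruise control and task-offloading allocation that maximizes the tasks offloaded to the UAV, subject to the computing capacity and battery budget of the IoT devices and the speed limit of the UAV. Because the optimization has a large solution space and the instantaneous network state is unknown to the UAV, the authors propose a new deep-graph-based reinforcement learning framework. An advantage actor-critic (A2C) structure is developed to train the UAV's real-time continuous actions in terms of flight speed, heading, and offloading schedule.

In effect, the method uses the current network state to decide where the UAV should fly and which WD's task to process.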

Model:

N ground devices

1 UAV

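GNN modeling:

This part is vague. It states only that a GNN infers the action, i.e. the trajectory and the resource allocation, from the network state.

The paper defines the state and the action, but it does not specify how each vertex and edge is defined in the GNN or which features they carry. Message passing is given only as a generic definition, with no explanation of how it is done in this paper or which kind of activation function is used.

State: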