Computing Resource Representation in Computing Aware Networking

Introduction

Traditionally, the network performs traffic engineering based only on network status. In the traditional network architecture, the network is only responsible for delivering packets between servers and clients, and is not aware of computing information. With the trend toward computing and network convergence, several proposals make the network aware of service information so that it can make better traffic steering decisions. Dynamic Anycast (Dyncast) and Computing Aware Networking (CAN) make routing decisions based on both network and computing status, and are considered advanced mechanisms for computing and network convergence.

Existing work on Dyncast shows that, when service instances are deployed at multiple geographical edge sites, Dyncast can achieve service equivalence and load balancing by considering both service metrics and network metrics. However, how to notify the network of service metrics, how to represent computing resources, and how to signal computing resources to the network remain open questions. Answering them is essential for the network domain to learn about the computing domain. This document further explores the encoding and signaling of service metrics.

Consideration and Representation of Computing Metrics

Comparison of Network Metrics and Computing Metrics

The main job of the network is to forward user packets from source to destination, while the main job of computing is to complete the various tasks of the users. Network metrics include bandwidth, latency, jitter, and so on. They describe the capabilities of the network and are independent of the detailed realization of the underlying technologies, such as the mode of the optical fiber or the structure of a switch.

Service metrics are more complex and are harder to map to QoS/QoE. For example, if the task is AI computing, such as image processing, the computing resource can be measured in FLOPS (Floating-point Operations Per Second) or TFLOPS (Tera FLOPS). However, the processing time is more difficult to obtain, because it is influenced by the current utilization of the CPU, the cache, and so on. Even when a real-time OS or protocol is used, a task can sometimes still fail because of deadlock or other OS mechanisms. This does not mean there is a problem with the OS; it reflects the complexity of the environment in which the OS runs. A service metric therefore has to take more factors into account to judge performance, and has to be defined so that other domains can use it to guarantee end-to-end (E2E) service quality.

Representation of Computing Metrics

Because computing resources are diverse, there are two ways to represent computing resource information for use by the network.

First, general computing load information can be offered to the ingress nodes. For example, three values may be sufficient: a red value for busy status, a yellow value for relatively busy status, and a green value for free status. The ingress node then only needs to consider the yellow and green MEC (Multi-access Edge Computing) sites when doing load balancing, with the green ones preferred. This is similar to SR Policy, and the two could be used together, for example to choose a yellow path and a yellow service instance.

Second, more specific computing-related information can be offered to the ingress nodes for a specific service, such as: the information of the service deployed on the MEC (for example, the Service ID), the maximum number of sessions that the MEC can provide, the number of sessions currently in use, the CPU/GPU utilization, the FLOPS/HASH capability of the server, the available computing infrastructure of the server, and so on. This additional information may be optional and encoded as TLVs. A specific service may have a specific preferred set of TLVs.

When more information is offered, the ingress node can make a better decision across more dimensions. For example, if multiple instances have the same free status, the ingress node can choose among them according to the additional TLVs. The detailed decision algorithm is out of scope of this document.
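The two representations above can be illustrated with a minimal Python sketch. It is not a wire format defined by this document: the TLV type codes, the 1-byte type and length fields, the 4-byte unsigned value, and all function names are assumptions chosen only for illustration.

```python
import struct
from enum import IntEnum


class LoadStatus(IntEnum):
    """General load status; the numeric values are illustrative only."""
    GREEN = 0   # free
    YELLOW = 1  # relatively busy
    RED = 2     # busy


# Hypothetical TLV type codes; not assigned by IANA or by this document.
TLV_SERVICE_ID = 1
TLV_MAX_SESSIONS = 2
TLV_CURRENT_SESSIONS = 3
TLV_CPU_UTILIZATION = 4  # percent


def encode_tlv(tlv_type: int, value: int) -> bytes:
    """Encode one TLV as 1-byte type, 1-byte length, 4-byte unsigned value."""
    payload = struct.pack("!I", value)
    return struct.pack("!BB", tlv_type, len(payload)) + payload


def decode_tlvs(data: bytes) -> dict:
    """Decode a concatenation of TLVs into a {type: value} mapping."""
    tlvs = {}
    offset = 0
    while offset < len(data):
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        tlvs[tlv_type] = int.from_bytes(data[offset:offset + length], "big")
        offset += length
    return tlvs


def candidates(instances):
    """Drop red instances and list green ones before yellow ones.

    `instances` is a list of (name, LoadStatus) pairs.
    """
    usable = [item for item in instances if item[1] != LoadStatus.RED]
    return sorted(usable, key=lambda item: item[1])
```

In this sketch an ingress node would first call `candidates()` on the general status, and would consult the decoded optional TLVs only to break ties among instances with the same status.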

Example Process for Computing Load Information

For a specific service, both general computing load information and more specific computing information can be offered. A general process is described below.

Step 1: The service instances are deployed in multiple MECs. The ingress nodes of the network, acting as load balancing points, need to obtain the computing information. The service should have a specific Service ID (SID) in the network, for example ServiceID1 (SID1 for short), so that an ingress node can recognize service requests and treat them differently according to the SID.

Step 2: After obtaining the computing information of the service related to ServiceID1 from multiple MECs, the ingress nodes should record it. An ingress node should also be able to obtain the network status, for example the latency to the egress of an MEC, and record it.

Step 3: An ingress node receives a packet targeted to ServiceID1. According to the service metrics and network metrics it has recorded, the ingress node decides which MEC to use and forwards the packet to the corresponding egress. The selection method may depend on the service. For example, it may select the instance with the lowest latency among those that can offer the service, or the instance with the best computing resource among those whose latency fulfills the service requirements, or use a hybrid method.

The purpose of the procedure is to find an MEC that is relatively near to the client and also has enough computing resources for the service. However, the MECs that provide the service may vary and may have different computing capabilities. Therefore, a load balancing method that considers computing resources is useful in this scenario.
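The two selection methods mentioned in Step 3 can be sketched as follows. This is only an illustration under stated assumptions: the record fields, the use of free sessions as the measure of "best computing resource", and the function names are hypothetical, and the actual decision algorithm is out of scope of this document.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class InstanceRecord:
    """What an ingress node records per service instance (hypothetical fields)."""
    egress: str            # egress node in front of the MEC
    latency_ms: float      # measured latency from this ingress to the egress
    max_sessions: int      # maximum sessions the MEC can provide
    current_sessions: int  # sessions currently in use

    @property
    def free_sessions(self) -> int:
        return self.max_sessions - self.current_sessions


def lowest_latency(records: Iterable[InstanceRecord]) -> Optional[InstanceRecord]:
    """Pick the nearest instance that can still offer the service."""
    usable = [r for r in records if r.free_sessions > 0]
    return min(usable, key=lambda r: r.latency_ms, default=None)


def best_compute_within(records: Iterable[InstanceRecord],
                        latency_budget_ms: float) -> Optional[InstanceRecord]:
    """Pick the instance with the most free sessions among those whose
    latency fulfills the service requirement."""
    usable = [r for r in records
              if r.latency_ms <= latency_budget_ms and r.free_sessions > 0]
    return max(usable, key=lambda r: r.free_sessions, default=None)
```

Both functions return `None` when no instance qualifies, which leaves the fallback behavior (for example, dropping the request or relaxing the latency budget) to the caller. A hybrid method could combine the two, for instance by weighting latency against free capacity.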

Signaling of Computing Load Information

A general signaling process is described below.

Step 1: The gateway of the MEC collects the status information of a service, such as SID1. For example, the controller in the MEC can collect the information and notify the gateway of the MEC.

Step 2: The egress of the MEC receives the service status information of SID1 from the gateway of the MEC, and notifies other network nodes, including the ingress nodes.

In the first step, the controller and the gateway may communicate by using PCEP or another protocol used with SDN controllers. In the second step, an SDN method can also be used; however, it may require communication between the controller of the MEC and the controller of the network, which is complicated. This document suggests transferring computing information by using BGP. When the egress advertises that the MEC can support ServiceID1, that is, when it advertises the route for ServiceID1, it can include the additional computing information in the Extended Community of that route.
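As an illustration of how computing information might fit into an Extended Community, the sketch below packs a few fields into the 8-octet layout of a BGP Extended Community (a 1-octet type, here followed by a 1-octet sub-type and 6 octets of value). The type and sub-type values, the choice of fields, and their widths are placeholders assumed for this example; this document does not define them and no code points have been requested.

```python
import struct

# Placeholder code points, assumed for illustration only.
EC_TYPE = 0x80
EC_SUBTYPE = 0x01

# type, sub-type, load status, CPU percent, current sessions, max sessions
_EC_FORMAT = "!BBBBHH"


def pack_computing_ec(status: int, cpu_percent: int,
                      current_sessions: int, max_sessions: int) -> bytes:
    """Pack computing load information into an 8-octet value."""
    if not 0 <= cpu_percent <= 100:
        raise ValueError("cpu_percent must be between 0 and 100")
    return struct.pack(_EC_FORMAT, EC_TYPE, EC_SUBTYPE, status,
                       cpu_percent, current_sessions, max_sessions)


def unpack_computing_ec(data: bytes) -> dict:
    """Parse the 8-octet value produced by pack_computing_ec()."""
    if len(data) != 8:
        raise ValueError("an Extended Community is exactly 8 octets")
    ec_type, subtype, status, cpu, current, maximum = struct.unpack(_EC_FORMAT, data)
    if (ec_type, subtype) != (EC_TYPE, EC_SUBTYPE):
        raise ValueError("not a computing load community")
    return {
        "status": status,
        "cpu_percent": cpu,
        "current_sessions": current,
        "max_sessions": maximum,
    }
```

With only 6 octets of value available, this layout carries the general status plus a few counters. Richer information, such as a full set of optional TLVs, would not fit in a single community and would need several communities or a different attribute.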

IANA Considerations

TBD.

Security Considerations

TBD.

Acknowledgements

TBD.