Network Coding Rate Regions

## Network Coding Rate Regions

Network Coding (NC) is a communication paradigm for transferring and storing information in networks. The traditional approach to transferring information across networks is routing, where nodes in the network simply forward packets appearing on incoming links to other nodes via outgoing links. A network coding scheme allows nodes to combine the incoming packets via linear or non-linear functions and forward the result and outperforms routing in simplest of networks (e.g. see the celebrated butterfly network). Codes for distributed storage systems with certain repair and reliability requirements can be thought of as network coding schemes with the following objectives: 1) use as less storage space as possible 2) transfer as less information as possible during repair of failed nodes, all the while, satisfying the reliability requirements.

We study the limits on performance of network coding in a very general setting called multi-source multi-sink network coding over directed acyclic graphs (DAGs), which allows multiple senders and multiple receivers demanding to receive packets from arbitrary subsets of senders. The limits on performance of network coding for a given network are best described using its rate region which is the set containing all rate vectors for which communication is possible using some network coding scheme. Characterizing these sets using linear/non-linear inequalities is known to be a notoriously difficult problem in information theory. Yan et. al. [B1] provided the following implicit characterization of rate region R*: (1)

where, ω is the vector of source information rates, r is the vector of edge capacities, Γn* is the yet unknown 2n - 1-dimensional set of all entropic vectors, L,L′′ are sets of constraints on entropy vectors encoding independence of sources and the fact that outgoing messages at a node are functions of incoming messages, and con(),,projω,r() represent convex hull,closure and projection onto ω,r coordinates respectively.

Roughly speaking, R* is the projection of a set with exponentially large dimension that is Γn*. However, characterizing Γn* is itself a long-standing fundamental open question in information theory. We use following approaches to handle this problem:

• Computation: Use computational means to determine rate regions of small networks. This is done by evaluating the formula in equation 1 with the unknown set Γn* replaced by known bounds (strict subsets or supersets of Γn*) on Γn*. This is essential for providing intuition about large networks.
• Symmetry: Use symmetry of networks to gain insights into structure of rate regions and improving efficiency of algorithms for computing bounds on rate regions.
• Hierarchy: Develop a hierarchy of networks. Based on the lines of graph well quasi ordering, this approach revolves around different notions defining sub-networks (minor relations). We try answering questions like: For which classes of networks and minor relations can we determine rate regions of large networks using rate regions its subnetworks?
• Applications: We apply our computational tools to solve the data distributed storage systems. One of the earlier models on distributed storage systems is Multilevel Diversity Coding Systems (MDCS). We solved thousands of MDCS instances and investigate their sufficiency of simple codes and sufficiency of Shannon outer bound.

### I Computation to determine the rate regions of small networks

Information theorists have traditionally proved rate regions by hand. This involves a converse proof: finding a collection of conic combinations of inequalities known to hold for the problem that gives the tightest collection of implied inequalities amongst all such collections, giving an outer bound on the rate region and an achievable code construction for every point in the outer bound, thus showing it to be the exact rate region. In network coding, however, this scheme falls apart due to sheer number of inequalities one must consider: e.g. the simplest polyhedral outer bound on Γn*, the LP outer bound, has 2n-2 + n inequalities[B2]. Imagine the manual labor required to find that many conic combination coefficients to prove just 1 inequality bounding the rate region! Similarly, finding an achievability scheme for a given rate point also involves a lot of head-scratching for arbitrary networks. Proving that a small network coding instance has a certain rate region can be seen as a three step process:

• Compute an outer bound Rout using outer bound Γnout Γn*
• Compute an inner bound Rin using an outer bound Γnin Γn*
• Do they match?
• Yes: Declare that we have found the rate region
• No: Repeat step 1 and 2 with improved bounds

By improving the inner/outer bounds in step 3, we mean smaller outer bound Γnout Γnout and/or a bigger inner bound Γnin Γnin. An important restriction we make when considering the inner/outer bounds in step 1) and 2) is to consider only polyhedral bounds that are in fact polyhedral cones. Steps 1) and 2) can now be stated as an instance of evaluating following equation, (2)

where L is the intersection of all network coding constraints. This restriction allows us to use a variety of polyhedral computation algorithms for problems such as polyhedral extreme ray enumeration, redundancy removal, and projection, for which there are nice implementations out there (see cdd, lrs, chm). While most known polyhedral outer bounds on Γn* have tractable inequality representations and humongous (mostly unknown) extreme ray representations, most polyhedral inner bounds are the opposite, with tractable extreme ray representations and humongous (mostly unknown) inequality representations.

#### I.1 Computing outer bounds

Outer bound Rout on network coding rate region can be obtained using an outer bound on region of entropic vectors Γnout. If the outer bound used is Shannon outer bound of Γn, we get an outer bound RLP on the rate region. Other outer bounds on region of entropic vectors, especially those tighter than Γn, can be obtained using non-Shannon outer bounds formed by adding non-Shannon information inequalities to Γn. One of the methods of computing non-Shannon information inequalities is by using Csirmaz’s extended copy lemma [B3]. Both of these problems: computation of of Rout and computation of non-Shannon information inequalities can be solved using our software chm that performs exact polyhedral projection using rational arithmetic and floating point simplex with basis identification for exact solutions to LPs (QsOpt_ex).

#### I.2 Computing Inner bounds

The most desirable property of an inner bound Γnin on region of entropic vector is a readily available construction of a collection random variables for every point in the bound. Two such classes of inner bounds are 1)Γnq and 2)Γn,k,qspace[A1], which are inner bounds obtained from 1) matroids representable over Fq 2) n-subspace arrangements over Fq that can be obtained from size k > n representable matroids. We device algorithms to compute these classes of inner bounds which are available in extreme ray representations of the associated polyhedral cones. Of special interest are the orderly generation algorithms [B4] and those based on homomorphism principle[B5].

#### I.3 From computer aided proof to traditional human-friendly proof

Recall that an inequality in the rate region of a network is in terms of rate variables (edge capacities) and source entropies. Sometimes, the inequalities in a rate region are easy to derive. However, for many instances, it is not easy to derive by hand all of the inequalities for the rate region. While we can use computation to find rate regions, when we are done, we would also like to have a more traditional converse proof i.e. we would like to know the conic combination of information inequalities that yielded a given rate region inequality. A technique based on LP duality is discussed in [B6B7] using which we show how to use a computer to generate traditional converse proofs automatically.

The projection of the Shannon outer bound intersected with the network constraints gives an outer bound on the network coding rate region, which is in principle equivalent to a converse proof. Suppose a polyhedral cone P is determined by Ax 0. For an inequality implied by Ax 0, bTx 0, where bT has zeros on eliminated dimensions, then there exists a vector λ 0 such that ATλ = b. The vector λ gives coefficients for the weighted sum of inequalities in P.

Such a coefficient vector λ need not be unique. Hence, among the various possibilities for λ satisfying b = ATλ, we want to select one that gives the simplest proof, which means it involves the smallest number of inequalities and constraints. This vector will thus be the sparsest vector λ, having the fewest non-zero entries, i.e. the lowest l0-norm.

Because the l0-norm is a complicated non-differentiable and non-linear function, this is a difficult optimization problem to solve directly. Hence we use the typical relaxation that replaces the l0-norm with the l1-norm. Furthermore, since λ 0, we do not need the absolute values in the l1-norm, and the problem is approximated by the linear program (3)

which is equivalent to (4)

An example of the computer aided proof applied to an MDCS instance can be found in Fig. 1. Figure 1: Computer aided proof for an example of MDCS. Computer gives inequalities and coefficients used to prove the last inequality in the rate region.

### II Using of symmetry of networks to our advantage

Graph theoretic symmetry, symmetries of network codes, and symmetries of rate regions are inherently linked to each other.[A2] Network symmetry group (NSG) is the common thread between these notions of symmetry. It can be computed as a subgroup of the automorphism group of a directed cyclic graph appropriately constructed from the underlying network’s directed acyclic graph. Generalizations of Chen and Yeung’s partition symmetrical entropy functions [B8] can be obtained corresponding to any network symmetry group. Knowledge of the network symmetry group can be utilized to reduce the complexity of computing the LP outer bounds on network coding capacity as well as the complexity of polyhedral projection for computing frate regions. Symmetry can also be used for improving the efficiency of computing bounds on network coding rate region. For example, computing the LP outer bound RLP [B2] on R* is easier when the NSG is known. The algorithm that can compute RLP when NSG is known is called symCHM [A3]. As the name suggests, it is a symmetry exploiting variant of CHM. symCHM is a general polyhedral projection algorithm i.e. it is not limited to computing network coding rate regions. Here is an example comparing symCHM to CHM. Figure 2: (top) Sequence of inner bounds produced by CHM and (bottom) sequence of inner bounds produced by symCHM while projecting a 4-cube to 3 dimensions given the knowledge that projection has S3 as its permutational symmetry group

### III Hierarchy of Networks

On the lines of the the forbidden minor characterizations for combinatorial structures such as graphs and matroids, we would like to see if sufficiency of linear codes for networks also has forbidden network minor characterization, atleast for restricted classes of networks. We define embedding operations in such a manner that if a class of codes is not optimal for a small network, then all large networks that have this small network embedded in it will have the same property. Furthermore, since we can obtain a database of rate regions of small networks, we would like to extend the solvability to large networks. Thus, we defined some combination operations so that rate region of the large network can be obtained from the rate regions of small networks involved in the combination. Fig. 3: Network embedding and combination operations to preserve important properties.

A demonstration of usefulness of embedding operations can be found in [A4], where we obtained forbidden minor list for MDCS problems up to 3-level 4-encoders ,with respect to the following properties 1) Sufficiency of binary scalar codes for achieving the entire rate region 2) Sufficiency of superposition coding, as in Fig. 4.  Fig. 4: Forbidden network minors for scalar binary codes (upper) and superposition coding (lower) on MDCS problems.

As a demonstration of usefulness of combination operations is shown in Fig. 5, which shows how to solve a large network by using combination operators on some small networks and their rate regions. Fig. 5: An example of large network that is a combination of five small networks and it can be solvable by combining rate regions of the five small networks. We have further extended these operators to general hyperedge networks in [A5]. We show that by applying recursively the seemingly straightforward combination and embedding operators described above, one can reason about large networks using small networks that are not directly related to them via one-shot use of combination and/or embedding operators.

Funding for this Topic: This work supported by the National Science Foundation under CCF-1421828 and CCF-1016588.

### Relevant Publications & Presentations

[A1]   J. Apte, C. Li, and J. Walsh, “Algorithms for computing network coding rate regions via single element extensions of matroids,” in Information Theory (ISIT), 2014 IEEE International Symposium on, June 2014, pp. 2306–2310.

[A2]   J. Apte and J. M. Walsh, “Symmetry in Network Coding,” in Int. Symp. Information Theory (ISIT), June 2015, to appear.

[A3]   ——, “Exploiting Symmetry in Computing Polyhedral Bounds on Network Coding Rate Regions,” in Int. Symp. Network Coding (NetCod), June 2015, to appear.

[A4]   Congduan Li, Steven Weber, and John MacLaren Walsh, “On Multilevel Diversity Coding Systems,” IEEE Trans. Inform. Theory, submitted on July 21, 2014. Reviews received January 6, 2016. [Online]. Available: http://arxiv.org/abs/1407.5659

[A5]   ——, “On Multi-source Networks: Enumeration, Rate Region Computation, and Hierarchy,” IEEE Trans. Inform. Theory, submitted on July 21, 2015. [Online]. Available: http://arxiv.org/abs/1507.05728

[A6]   C. Li, S. Weber, and J. Walsh, “Network embedding operations preserving the insufficiency of linear network codes,” in 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2014, Oct 2014.

[A7]   C. Li, J. Apte, J. M. Walsh, and S. Weber, “A new computational approach for determining rate regions and optimal codes for coded networks,” in IEEE International Symposium on Network Coding (NetCod), Jun 2013, pp. 1 –6.

[A8]   C. Li, J. M. Walsh, and S. Weber, “Computational approaches for determining rate regions and codes using entropic vector bounds,” in 50th Annual Allerton Conference on Communication, Control and Computing, Oct 2012, pp. 1 –9.

[A9]   S. Weber, C. Li, and J. M. Walsh, “Rate region for a class of delay mitigating codes and p2p networks,” 46th Annual Conference on Information Sciences and Systems, March 2012.

### External References

[B1]   X. Yan, R. Yeung, and Z. Zhang, “An implicit characterization of the achievable rate region for acyclic multisource multisink network coding,” IEEE Transactions on Information Theory, vol. 58, no. 9, pp. 5625–5639, Sept 2012.

[B2]   R. W. Yeung, Information Theory and Network Coding. Springer, 2008.

[B3]   L. Csirmaz, “Information inequalities for four variables,” January 2013.

[B4]   B. D. McKay, “Isomorph-free exhaustive generation,” J. Algorithms, vol. 26, no. 2, pp. 306–324, Feb. 1998. [Online]. Available: http://dx.doi.org/10.1006/jagm.1997.0898

[B5]   Betten, A., Braun, M., Fripertinger, H., Kerber, A., Kohnert, A., and Wassermann, A., Error-Correcting Linear Codes: Classification by Isometry and Applications. Springer-Verlag Berlin Heidelberg, 2006, vol. 18.

[B6]   C. Tian, “Rate region of the (4, 3, 3) exact-repair regenerating codes,” in IEEE International Symposium on Information Theory (ISIT), July 2013, pp. 1426–1430.

[B7]   ——, “Characterizing the rate region of the (4,3,3) exact-repair regenerating codes,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 5, pp. 967–975, May 2014.

[B8]   Q. Chen and R. W. Yeung, “Partition-symmetrical entropy functions,” CoRR, vol. abs/1407.7405, 2014. [Online]. Available: http://arxiv.org/abs/1407.7405