Introduction to PairViz

Catherine B. Hurley and R.W. Oldford

2018-08-09

PairViz is an R package which provides orderings of objects for visualisation purposes. The problem of constructing an ordering can be regarded as traversing possibly weighted graphs. In this vignette, we focus on graphs and their traversal. Applications to data visualisation are given in accompanying vignettes.

PairViz installation

PairViz is installed from CRAN in the usual way.

install.packages("PairViz")

PairViz uses the graph data structure provided by the graph package which is located on the Bioconductor repository rather than CRAN. To install this, use

source("https://bioconductor.org/biocLite.R")
biocLite("graph")

In this vignette, we will also show plots of the graph structure, which requires package igraph from CRAN. The following utility function will be helpful:

requireNamespace("igraph")
igplot <- function(g,weights=FALSE,layout=igraph::layout_in_circle, 
                   vertex.size=60, vertex.color="lightblue",...){
    g <- igraph::graph_from_graphnel(as(g, "graphNEL"))
    op <- par(mar=c(1,1,1,1))
    if (weights){
      ew <- round(igraph::get.edge.attribute(g,"weight"),3)  
      igraph::plot.igraph(g,layout=layout,edge.label=ew,vertex.size=vertex.size,vertex.color=vertex.color,...)
    }
    else
    igraph::plot.igraph(g,layout=layout,vertex.size=vertex.size,vertex.color=vertex.color,...)
    par(op)
}

Alternatively, graphs may be plotted using package Rgraphviz which is installed from Bioconductor:

source("https://bioconductor.org/biocLite.R")
biocLite("Rgraphviz")
library(Rgraphviz)
# For a graph g use
plot(g)

Graphs

A graph is a mathematical structure made up of nodes, and edges joining those nodes. We consider only graphs where edges are undirected. In some cases, edges are assigned weights, reflecting some measure of importance.

A complete graph is a graph where there is an edge connecting every pair of nodes. Here are two such complete graphs, k4 with four nodes and k5 with five nodes.

suppressPackageStartupMessages(library(PairViz))
k4 <- mk_complete_graph(4)
k5 <- mk_complete_graph(5)
igplot(k4)
igplot(k5)

Traversing graph edges

We will first construct graph traversals that visit every edge of the graph at least once. The PairViz function eulerian() gives such traversals:

eulerian(k5)
#>  [1] "1" "2" "3" "1" "4" "2" "5" "3" "4" "5" "1"

Mathematically speaking, an Eulerian path of a graph is a path that visits every edge of a graph exactly once. An Eulerian tour of a graph is an Eulerian path that is closed, i.e. ends up at the starting node. For the graph k5, one such Eulerian tour goes from 1 ->2 -> 3 -> 1 and so on until it ends back at node 1, as given by eulerian(k5). It is well-known that a graph has an Eulerian tour if every node has an even number of edges. This condition holds for a complete graph with an odd number of nodes, such as k5.

Similarly, a graph has an Eulerian path if all but two nodes have an even number of edges. If we remove one edge, say that connecting nodes 1 and 2 from k5, these two nodes have an odd number of edges while other nodes are even.

k5a <- removeEdge("1", "2", k5)
igplot(k5a)
eulerian(k5a)
#>  [1] "1" "3" "2" "4" "1" "5" "3" "4" "5" "2"

The Eulerian for k5a starts at one of the odd nodes (here “1”) and visits all edges ending at “2”, the other odd node.

Most graphs are not Eulerian, that is they do not meet the conditions for an Eulerian path to exist. The graph k4 for instance, has four nodes and all have three edges. In this case, any path visiting all edges must visit some edges more than once. This is what eulerian(k4) does:

eulerian(k4)
#> [1] "1" "3" "2" "4" "3" "4" "1" "2"

If you look closely you will see the edge connecting nodes “3” and “4” is visited twice. An other way to think of it is that extra edges need to be added to a graph so that all but two nodes become even. For k4 only one extra edge is needed, and in the example above that extra edge connects nodes “3” and “4”.

In summary, for any graph, the function eulerian() returns a path visiting all edges at least once. If the graph is Eulerian, the returned path visits each edge exactly once. Otherwise, some edges are visited twice. For more details of this algorithm, see Hurley and Oldford (2010, 2011).

Edge traversal of a weighted graph

First construct a graph with weighted edges.

n <- LETTERS[1:5]
g <- new("graphNEL",nodes=n)
efrom <- n[c(1,1,2,2,2,4)]
eto <- n[c(2:3,3:5,5)]
ew <- c(8,9,5:7,1)
g <- addEdge(efrom, eto, g, ew) 

To plot the graph use

igplot(g, weights=TRUE,edge.label.color="black")

eulerian(g)
#> [1] "E" "D" "B" "C" "A" "B" "E"

As the graph g is weighted, the eulerian algorithm starts at the edge with the lowest weight, and then proceeds to visit every edge, preferring lower weight edges whenever there is a choice of edge to be visited next. If the eulerian() function is called with the option weighted=FALSE, then weights are ignored.

eulerian(g, weighted=FALSE)
#> [1] "A" "B" "D" "E" "B" "C" "A"

In this case, the Eulerian path visits edges going next to the first available node in nodes(g).

Edge traversal of a complete graph

For complete graphs, PairViz includes some alternative constructions of Eulerian (or nearly Eulerian) paths.

Two other functions are

Edge traversal of a complete graph via Hamiltonian decompositions

A Hamiltonian path of a graph is a path that visits every node of a graph exactly once. For a general graph, determining whether such a path exists is difficult, and in fact is an NP-hard problem. For a complete graph of n nodes, the path 1,2,…,n is a Hamiltonian path.

Consider a complete graph with an odd number of nodes such as k5.

ew <- rep(1, length(edgeNames(k5)))
s1 <- c(1,5,9,10)
ew[s1]<- 5
ec <- rep("grey40", length(ew))
ec[s1]<- "cyan"
igplot(k5, edge.width=ew, edge.color=ec)

ew[]<- 1
s2 <- c(2,6:8)
ew[s2]<- 5
ec[] <- "grey40"
ec[s2]<- "magenta"
igplot(k5, edge.width=ew, edge.color=ec)

The blue hamiltonian path visits the nodes in order 1,2,3,5,4. The pink hamiltonian path visits the nodes in order 1,3,4,2,5. For a complete graph with an odd number of nodes such as k5, it is possible to construct an Eulerian tour by concatenating together a number of Hamiltonians. This construction is known as a Hamiltonian decomposition.

ew[]<- 5
ec[s1]<- "cyan"
s3 <- 3:4
ec[s3]<- "rosybrown1"
igplot(k5, edge.width=ew, edge.color=ec)
igplot(k5, edge.width=ew, edge.color=ec)

In the figure above, the light red edge from 4 to 1 connects the two hamiltonians, and the remaining light red edge from 5 to 1 connects the last node of the pink path to the first node from the blue path to complete the tour.

In PairViz, the two hamiltonian paths above are constructed by

hpaths(5)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3    5    4
#> [2,]    1    3    4    2    5

If you want the eulerian composed of the two hamiltonians, use

hpaths(5, matrix=FALSE)
#>  [1] 1 2 3 5 4 1 3 4 2 5 1

The above constructions produce paths on complete graphs with an even number of nodes which are nearly eulerian. Look at

hpaths(6)
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    2    6    3    5    4
#> [2,]    2    3    1    4    6    5
#> [3,]    3    4    2    5    1    6
hpaths(6, matrix=FALSE)
#>  [1] 1 2 6 3 5 4 2 3 1 4 6 5 3 4 2 5 1 6

Each row of the result of hpaths produces a hamiltonian on k6 (a complete graph with 6 nodes). When the hamiltonians are concatenated, all edges on k6 are visited. With a close inspection, we see the edges connecting the rows (here 4-2 and 5-3) are visted twice. There is no need to connect the first node of row 1 to the last node of row 3, as this edge has already been visited.

Other isomorphic decompositions are obtained by supplying the first Hamiltonian as an argument to hpaths. Then all node labels are rearranged accordingly, as in the code snippet below.

hpaths(1:5)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3    4    5
#> [2,]    1    3    5    2    4

Edge traversal of a weighted complete graph via hamiltonian decompositions

Consider the complete graph k7 whose edge weights are given by

set.seed(123)
k7 <- mk_complete_graph(7)
ew <- sample(numEdges(k7),numEdges(k7)) # a vector of edgeweights

The easiest way to assign weights to edges is to use a distance matrix:

d7 <- matrix(0,7,7)
d7[lower.tri(d7)] <- ew
d7[upper.tri(d7)]<-  t(d7)[upper.tri(d7)]
d7 
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    0    7   16    8   20   18    1
#> [2,]    7    0   19   13   15    6   11
#> [3,]   16   19    0    5   21   10   17
#> [4,]    8   13    5    0   12    2    9
#> [5,]   20   15   21   12    0    4   14
#> [6,]   18    6   10    2    4    0    3
#> [7,]    1   11   17    9   14    3    0
# or using the shortcut function edge2dist from PairViz
#d7 <- as.matrix(edge2dist(ew))

Now d7 is a symmetric matrix of distances, where the values in ew specify the distances in order 1-2, 1-3, …., 1-7, 2-7,…, 3-7, …,6-7.

k7 <- mk_complete_graph(d7)
igplot(k7, weights=TRUE,edge.label.color="black", vertex.label.cex=2,vertex.size=30)


# Unfortunately, plot.igraph does not show graph edge weights  automatically, you have to 
# input them as above. You might want to check that the igraph
# matches that of ew.
igraph::E(igraph::graph_from_graphnel(k7))
#> + 21/21 edges from f428096 (vertex names):
#>  [1] 1--2 1--3 1--4 1--5 1--6 1--7 2--3 2--4 2--5 2--6 2--7 3--4 3--5 3--6
#> [15] 3--7 4--5 4--6 4--7 5--6 5--7 6--7

Like hpaths(), the function weighted_hpaths() finds Hamiltonians which when concatenated visit all edges at least once. However weighted_hpaths() uses a greedy algorithm to make the first Hamiltonian low weight with weights increasing for successive Hamiltonians.

weighted_hpaths(d7)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    1    7    6    5    4    3    2
#> [2,]    1    4    2    6    3    7    5
#> [3,]    1    6    4    7    2    5    3
# this version returns the eulerian
weighted_hpaths(d7, matrix=FALSE)
#>  [1] 1 7 6 5 4 3 2 1 4 2 6 3 7 5 1 6 4 7 2 5 3 1

We can easily calculate the edge weights and their sum for the first Hamiltonian.

o <- weighted_hpaths(d7, matrix=FALSE)
o1 <- o[1:8] # include the 8th to form the tour
d7e <- dist2edge(d7) 
# d7e is a vector giving edge weights in order (1,1)... (1,7), (2,3),.. (2,7) etc
h1weights <- path_weights(d7e, o1) # the edge weights for o1
# the same as
d7[cbind(head(o1,-1), o1[-1])]
#> [1]  1  3  4 12  5 19  7

h1weights
#> [1]  1  3  4 12  5 19  7
sum(h1weights)
#> [1] 51

The second and third Hamiltonians have edges whose sums are

o2 <- o[8:15]
sum(path_weights(d7e, o2))
#> [1] 88
o3 <- o[15:22]
sum(path_weights(d7e, o3))
#> [1] 92

The weighted_hpaths() function uses order_tsp(), a TSP solver (by default nearest insertion) to come up with the first Hamiltonian. This algorithm uses heuristics, and may not find the overall winner. As the graph has just 7 nodes, it is possible to use brute force evaluation to find the shortest hamiltonian tour.

order_best(d7,cycle=TRUE)
#> [1] 1 7 6 5 4 3 2
order_tsp(d7,cycle=TRUE)
#> [1] 1 7 6 5 4 3 2

In this example, we can confirm that order_tsp finds the shortest Hamiltonian tour. As order_best is computationally highly demanding, if you try it for graphs with more that maxexact (defaults to 9) nodes, only a sample of permutations is evaluated.

Comparisions of edge traversals on an unweighted complete graph

Consider the graph k7. At his stage, we have four algorithms for forming Eulerians which do not use weights. These are

e1 <- eseq(7)
e2 <- eseqa(7)
e3 <- eulerian(7) # same path as eulerian(k7, weighted=FALSE)
h1 <- hpaths(7, matrix=FALSE)

The plot of eseq(7) is coloured to show its recursive construction. eseq(3) is in tan, the additional edges added for eseq(5) are in blue, while those for eseq(7) are in pink.

The plot of eseqa(7) shows its construction; edges are a distance 1 (in tan), 2 (in blue) and 3 (in pink) apart repeatedly, considering the last node 7 and the first to be a distance 1 apart.

In eulerian(7), edges connecting to node 1 are in tan, edges from nodes 2 and 3 are in blue excepting those connecting node 1, while remaining edges involving nodes 4,5, 6 and 7 only are in pink. This display illustrates that the eulerian algorithm always moves to the lowest available node.

In the plot of hpaths(7) the three concatenated hamiltonians are coloured blue, pink and tan.

From these displays, if you want an eulerian visiting the low numbered nodes first, use eseq() or eulerian(). If you want an Eulerian where visits to a node are spread out, pick eseqa() or hpaths(). And of course, if the Hamiltonian property is important, then hpaths() is the best choice.

Comparisions of edge traversals on an weighted complete graph

We have two different Eulerians for an edge-weighted k7. These are

e4 <- eulerian(d7) # same path as eulerian(k7)
h2 <- weighted_hpaths(d7, matrix=FALSE)

which are plotted below. The colouring scheme used here is the same as that for the plots of eulerian(7) and hpaths(7).

Recall that the goal of our traversals of edge-weights graphs was to visit low-weight edges early in the path. To check this, we can find the edges weights for the paths using

d7e <- dist2edge(d7) 
path_weights(d7e, e4) # the edge weights for e4
#>  [1]  1  3  2  5 10  4 12  8  7  6 18 16 17  9 13 11 14 15 19 21 20
path_weights(d7e, h2) # the edge weights for h2
#>  [1]  4  2  5 17  1  7 15 12  9  3  6 19 16 20 14 11 13  8 18 10 21

Plotting these, we see that the edge weights for both e4 and h2 increase as the index increases. The path based on Hamiltonians is less successful at ordering the edge weights as it must visit a sequence of Hamiltonians. The edge weights for the first Hamiltonian are in blue, for the second in magenta and the third in tan.

References

C.B. Hurley and R.W. Oldford, Pairwise display of high dimensional information via Eulerian tours and Hamiltonian decompositions. Journal of Computational and Graphical Statistics. 19(10), pp. 861–886, 2010.

C.B. Hurley and R.W. Oldford, Eulerian tour algorithms for data visualization and the PairViz package. Computational Statistics, 26(4), pp 613–633, 2011.