Open sparsity.m in the Editor
Run in the Command Window

Sparse Matrices

This demonstration shows that reordering the rows and columns of a sparse matrix S can affect the time and storage required for a matrix operation such as factoring S into its Cholesky decomposition, S=L*L'.

Contents

Visualizing a Sparse Matrix

A SPY plot shows the nonzero elements in a matrix.

This spy plot shows a SPARSE symmetric positive definite matrix derived from a portion of the Harwell-Boeing test matrix "west0479", a matrix describing connections in a model of a diffraction column in a chemical plant.

load('west0479.mat')
A = west0479;
S = A * A' + speye(size(A));
pct = 100 / prod(size(A));

clf; spy(S), title('A Sparse Symmetric Matrix')
nz = nnz(S);
xlabel(sprintf('nonzeros=%d (%.3f%%)',nz,nz*pct));

Computing the Cholesky Factor

Now we compute the Cholesky factor L, where S=L*L'. Notice that L contains MANY more nonzero elements than the unfactored S, because the computation of the Cholesky factorization creates "fill-in" nonzeros. This slows down the algorithm and increases storage cost.

tic, L = chol(S)'; t(1) = toc;
spy(L), title('Cholesky decomposition of S')
nc(1) = nnz(L);
xlabel(sprintf('nonzeros=%d (%.2f%%)   time=%.2f sec',nc(1),nc(1)*pct,t(1)));

Reordering to Speed Up the Calculation

By reordering the rows and columns of a matrix, it may be possible to reduce the amount of fill-in created by factorization, thereby reducing time and storage cost.

We will now try three different orderings supported by MATLAB®.

Using the Reverse Cuthill-McKee

The SYMRCM command uses the reverse Cuthill-McKee reordering algorithm to move all nonzero elements closer to the diagonal, reducing the "bandwidth" of the original matrix.

p = symrcm(S);
spy(S(p,p)), title('S(p,p) after Cuthill-McKee ordering')
nz = nnz(S);
xlabel(sprintf('nonzeros=%d (%.3f%%)',nz,nz*pct));

The fill-in produced by Cholesky factorization is confined to the band, so that factorization of the reordered matrix takes less time and less storage.

tic, L = chol(S(p,p))'; t(2) = toc;
spy(L), title('chol(S(p,p)) after Cuthill-McKee ordering')
nc(2) = nnz(L);
xlabel(sprintf('nonzeros=%d (%.2f%%)   time=%.2f sec', nc(2),nc(2)*pct,t(2)));

Using Column Count

The COLPERM command uses the column count reordering algorithm to move rows and columns with higher nonzero count towards the end of the matrix.

q = colperm(S);
spy(S(q,q)), title('S(q,q) after column count ordering')
nz = nnz(S);
xlabel(sprintf('nonzeros=%d (%.3f%%)',nz,nz*pct));

For this example, the column count ordering happens to reduce the time and storage for Cholesky factorization, but this behavior cannot be expected in general.

tic, L = chol(S(q,q))'; t(3) = toc;
spy(L), title('chol(S(q,q)) after column count ordering')
nc(3) = nnz(L);
xlabel(sprintf('nonzeros=%d (%.2f%%)   time=%.2f sec',nc(3),nc(3)*pct,t(3)));

Using Minimum Degree

The SYMAMD command uses the approximate minimum degree algorithm (a powerful graph-theoretic technique) to produce large blocks of zeros in the matrix.

r = symamd(S);
spy(S(r,r)), title('S(r,r) after minimum degree ordering')
nz = nnz(S);
xlabel(sprintf('nonzeros=%d (%.3f%%)',nz,nz*pct));

The blocks of zeros produced by the minimum degree algorithm are preserved during the Cholesky factorization. This can significantly reduce time and storage costs.

tic, L = chol(S(r,r))'; t(4) = toc;
spy(L), title('chol(S(r,r)) after minimum degree ordering')
nc(4) = nnz(L);
xlabel(sprintf('nonzeros=%d (%.2f%%)   time=%.2f sec',nc(4),nc(4)*pct,t(4)));

Summarizing the Results

labels={'original','Cuthill-McKee','column count','min degree'};

subplot(2,1,1)
bar(nc*pct)
title('Nonzeros after Cholesky factorization')
ylabel('Percent');
set(gca,'xticklabel',labels)

subplot(2,1,2)
bar(t)
title('Time to complete Cholesky factorization')
ylabel('Seconds');
set(gca,'xticklabel',labels)