StatsCosmos: January 2016

There is something exciting about mining a set of interesting and useful results from a large complicated variance-covariance matrix, especially when they pertain to the global internet user population.

For example, how can one quickly identify, analyze, interpret and monitor trends embedded in a large, dynamically evolving global dataset? The follow-up question is: how can all this be done in a systematic way that is meaningful at the global, regional and local levels?

That is precisely why I find variance-covariance matrix decompositions so exciting. First, they make use of matrix algebra to make the decomposition calculations more efficient. Second, when chosen for their dimension reduction properties, they carry this efficiency over to the data analysis and interpretation. The five decompositions in this post bring these two useful properties to the analysis of the global internet user population in an elegant manner.

Variance-covariance matrix decompositions one and two: Spectral decomposition and Singular Value Decomposition

A good starting point for variance-covariance decompositions is the spectral decomposition of square matrices. A quick summary of the square-matrix Spectral Decomposition (SD) can be found in Madsen, Hansen and Winther (2004).

The method can be summarized as follows:

A real symmetric n×n matrix B has a spectral decomposition that can be expressed as:

B = UΛU^T

where U is an orthonormal matrix and Λ is a diagonal matrix. An orthonormal matrix has the property:

U^TU = UU^T = I

The columns of U are eigenvectors of matrix B and the diagonal elements of Λ are the eigenvalues of matrix B. If B is positive-definite then the eigenvalues will all be positive.

The Spectral Decomposition allows one to build a Singular Value Decomposition (SVD) of a rectangular matrix. A quick summary of the SVD for rectangular matrices can also be found in Madsen, Hansen and Winther (2004). This can be supplemented with Kittaneh and Shebrawi (2005), which gives a good treatment of the SVD, QR Decomposition and Polar Decomposition.

In singular value decomposition, a real m×q matrix D, where m ≥ q, has the decomposition:

D = UΓV^T

where U is an m×q matrix with orthonormal columns (U^TU = I), V is a q×q matrix with orthonormal columns (V^TV = I) and Γ is a q×q diagonal matrix with non-negative elements, called singular values.

In our analysis the spectral decomposition and the SVD are considered together because the SVD was applied to the square spatial time series variance-covariance matrix. Because this matrix is square, symmetric and positive semi-definite, its singular values coincide with its eigenvalues and the SD and SVD yield the same results.
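The agreement between the two decompositions can be checked numerically. The sketch below is a minimal illustration in Python with NumPy (not the R code used for the analysis); the matrix B is a small synthetic symmetric positive semi-definite matrix, not the population data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
B = A.T @ A  # synthetic symmetric positive semi-definite 4x4 matrix

# Spectral decomposition B = U Lambda U^T.
# eigh returns eigenvalues in ascending order, so reverse to descending.
eigvals, U = np.linalg.eigh(B)
eigvals, U = eigvals[::-1], U[:, ::-1]

# Singular value decomposition B = U_s Gamma V^T (singular values descending).
U_s, sing, Vt = np.linalg.svd(B)

print("U is orthonormal:     ", np.allclose(U.T @ U, np.eye(4)))
print("U Lambda U^T equals B:", np.allclose(U @ np.diag(eigvals) @ U.T, B))
print("singular values equal eigenvalues:", np.allclose(sing, eigvals))
```

The singular vectors and eigenvectors agree only up to the sign of each column, which is one reason plots built from them can differ in orientation.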

The data used in the present analysis was obtained from the Internet World Stats database which can be accessed from the Internet World Stats website. An important website for internet population information is the Internet Society website.

The Internet World Stats database has a variety of views that make time-indexed spatial population information easily accessible for analysis. The time-indexed spatial information includes aggregates for population, population internet use and, more recently, population Facebook use (in 2012). The website also provides links to the sources of the published data.

The analyses presented in this post make use of two country level tables for the time period 2008 to 2014. The first table is composed of population aggregates indexed by country and time. The second table is composed of internet user population aggregates indexed by country and time.

The first table has dimensions 246 (spatial regions/conceptual countries) by 7 (time period 2008 to 2014). The second table has dimensions 243 (spatial regions/conceptual countries) by 7 (time period 2008 to 2014). The different tables apply to the same total population, namely the global population, but differ in that some of the original countries in the Internet World Stats database spatial classification were aggregated to the resulting spatial regions or conceptual countries.

The available data allow for the decomposition(s) to be implemented on global population information and global internet user population information that accompanies the global population information.

The first step of the data analysis involves constructing the spatial time series variance-covariance matrix to be fitted to the population matrix and the internet user population matrix. The model fitted to the data was one in which the variance-covariances are separable in space and in time. The construction of a separable spatial time series variance-covariance matrix can follow the approach outlined in Hirano (2014).

The variance-covariance matrices for the population and internet user population matrices can be calculated (fitted) using the formula:

Σ = Σ_S ⊗ Σ_T

where Σ_S is the spatial variance-covariance matrix of the data, Σ_T is the temporal variance-covariance matrix of the data and ⊗ is the Kronecker product operator.

In the case of the global population data, Σ_S is a 246×246 covariance matrix of spatial regions and Σ_T is a 7×7 covariance matrix of time periods. Σ is then a (246×7)×(246×7) or (1722×1722) matrix of covariances that are separable in time and space.

In the case of the global internet user population data, Σ_S is a 243×243 covariance matrix of spatial regions and Σ_T is a 7×7 covariance matrix of time periods. Σ is then a (243×7)×(243×7) or (1701×1701) matrix of covariances that are separable in time and space.
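A minimal sketch of the separable construction, in Python with NumPy, is given below. The data matrix X is random and only stands in for the 246×7 population table, and the plain sample covariances are an assumption made for illustration; they are not necessarily the estimator described in Hirano (2014).

```python
import numpy as np

rng = np.random.default_rng(1)
n_regions, n_years = 246, 7
X = rng.normal(size=(n_regions, n_years))  # synthetic stand-in for the data table

# Rows are regions, columns are years.
sigma_s = np.cov(X)                # spatial covariance, 246 x 246
sigma_t = np.cov(X, rowvar=False)  # temporal covariance, 7 x 7

# Separable space-time covariance via the Kronecker product.
sigma = np.kron(sigma_s, sigma_t)
print(sigma_s.shape, sigma_t.shape, sigma.shape)

# Entry for (region i, year a) against (region j, year b).
i, a, j, b = 3, 2, 10, 5
entry = sigma[i * n_years + a, j * n_years + b]
print(np.isclose(entry, sigma_s[i, j] * sigma_t[a, b]))
```

Each entry of Σ is the product of one spatial and one temporal covariance, which is what separability in space and time means here.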

Global population spatial time series

The next step involves decomposing the global population spatial time series variance-covariance matrix fitted to the data (for 246 conceptual countries and time periods 2008 to 2014). The svd3dplot function from the svdvisual package in R (version 3.2.3) can be run on the fitted matrix, and it yields the following results for the SVD method.

The first two left singular vectors (U matrix) can then be used to create Cartesian plane coordinates of the variance-covariance matrix values in the left singular vector orthogonal basis space (i.e. combining the data for all the years).

The general description of the results is that the variance-covariance of the countries can be divided into three clusters. The first singular vector could pertain to the variance-covariance size and the second to a structural change affecting the variance-covariance. The small cluster on the bottom left could pertain to small countries with small variance. The small cluster on the top could pertain to countries that have a mid-size variance but whose variance values are affected by some structural change in the population. The large cluster could pertain to the remaining countries (i.e. countries with mid-size to large variance values).

The plot of the left singular vectors corresponds to the svd3dplot output of values that have extremely large SVD matrix products in the SVD one, data and data approximation visualizations in the first SVD plot (from the svd3dplot above).

The next step involved extracting the singular values from the singular value diagonal matrix Γ and calculating the variance explained by each singular value. The first singular value explains 96.99% of the variance and the second singular value 2.93%.
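The two steps above (plot coordinates from the first two left singular vectors, and the variance explained by each singular value) can be sketched as follows in Python with NumPy. The matrix is synthetic, and the share of variance is taken here as each singular value divided by their sum, which is an assumption: it is appropriate when the decomposed matrix is itself a variance-covariance matrix, whereas squared singular values would be used for an SVD of a data matrix.

```python
import numpy as np

def variance_explained(singular_values):
    """Share of total variance per singular value (covariance-matrix convention)."""
    s = np.asarray(singular_values, dtype=float)
    return s / s.sum()

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 12))
sigma = np.cov(A)  # synthetic 5x5 variance-covariance matrix

U, s, Vt = np.linalg.svd(sigma)

coords = U[:, :2]              # Cartesian coordinates from the first two left singular vectors
share = variance_explained(s)  # proportion of variance per singular value

print(coords.shape)
print(np.round(100 * share, 2))
```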

Global internet user population spatial time series

The procedures using the svd3dplot function and the svd function can be run analogously for the 243 conceptual countries of the global internet user population. The plots show features similar to those of the global population visualization.

The results for the variance explained by the first two singular values are virtually identical to the global population case, in that the first singular value explains 96.99% of the variance and the second singular value 2.93%.

Variance-covariance matrix decomposition three: QR Decomposition

The QR decomposition (or factorization) of a square matrix is a decomposition of the matrix into an orthogonal matrix and a triangular matrix. A QR decomposition of a real square matrix B can be expressed as B = QR where Q is an orthogonal matrix (Q^TQ = I) and R is an upper triangular matrix. If B is non-singular, then the factorization is unique provided the diagonal elements of R are required to be positive.
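A minimal sketch of the factorization in Python with NumPy follows. The matrix B is synthetic, and numpy.linalg.qr is used as a stand-in: it calls LAPACK without column pivoting, so it does not reproduce either option of the R qr function exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 4))
B = A.T @ A  # synthetic non-singular symmetric 4x4 matrix

Q, R = np.linalg.qr(B)

print("Q^T Q = I:          ", np.allclose(Q.T @ Q, np.eye(4)))
print("R upper triangular: ", np.allclose(R, np.triu(R)))
print("Q R = B:            ", np.allclose(Q @ R, B))

# Fix the sign ambiguity: flip columns of Q and rows of R so that diag(R) > 0.
# This assumes no diagonal element of R is exactly zero (B non-singular).
signs = np.sign(np.diag(R))
Q_pos = Q * signs            # scales column j of Q by signs[j]
R_pos = signs[:, None] * R   # scales row i of R by signs[i]
print("normalized Q R = B: ", np.allclose(Q_pos @ R_pos, B))
```

Different routines can return columns of Q with opposite signs, so plots built from the first two columns of Q may be mirrored relative to one another without the decomposition itself being different.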

Global population spatial time series

Two QR decompositions of the variance-covariance matrix can be run in R using the qr function from the R base package: one with the LAPACK option and one without it (the default, which uses LINPACK). With the LAPACK option the decomposition uses column pivoting and always reports the full rank of the original matrix.

The two decompositions can be visualized using the first two orthogonal basis vectors (i.e. columns of matrix Q), which can be used to create Cartesian plane coordinates of the variance-covariance values (i.e. combining the data for all the years). The visualization from the QR decomposition without the LAPACK option is slightly different from the SVD singular vector basis visualization.

The corresponding orthogonal basis visualization from the QR decomposition that uses the LAPACK option is very similar to the SVD visualization.

Global internet user population spatial time series

The two QR decompositions of the global internet user population variance-covariance matrix can be run analogously to those of the global population. The visualization from the QR decomposition without the LAPACK option is, as in the global population case, slightly different from the SVD singular vector basis visualization.

The orthogonal basis visualization from the decomposition that uses the LAPACK option also differs from the SVD visualization. Hence, in the global internet user population case both the LAPACK and non-LAPACK QR decompositions yield visualizations that look slightly different from the SVD visualization. The visualizations, however, seem to differ only in orientation.

Variance-covariance matrix decomposition four: Polar decomposition

Excellent material on the polar decomposition can be found in Shoemake and Duff (1992) and Higham (1986). Essentially, the polar decomposition of a square matrix, B = QS, yields an orthogonal factor Q and a symmetric positive semi-definite factor S (positive definite when B is non-singular).

The polar decomposition of the variance-covariance can be run using the R PolarDecomp function from the R geophys package.

Global population spatial time series

The first two orthogonal basis vectors (column) can be used to create Cartesian plane coordinates of the observation values (i.e. combining the data for all the years).

The plot corresponds with the property that if the matrix B being decomposed has a positive determinant then the orthogonal factor will be a pure rotation, otherwise a rotation with a reflection.

The second plot visualizes the scaling factors required to transform the orthogonal basis from the polar decomposition of the variance-covariance matrix back to the original variance-covariance matrix.

The two plots essentially provide a view of the variance-covariance matrix according to a reduced orthogonal basis and a supporting scaling measure. The scaling measure is what needs to be applied, in matrix scaling or stretch terms, to the resulting orthogonal basis representation to recover the original matrix.
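The relationship between the orthogonal factor and the scaling factor can be sketched in Python with NumPy. This is an illustration built on the SVD rather than the R PolarDecomp function, and the helper name polar_decomposition and the test matrix are hypothetical.

```python
import numpy as np

def polar_decomposition(B):
    """Right polar decomposition B = Q S computed from the SVD B = U Gamma V^T."""
    U, gamma, Vt = np.linalg.svd(B)
    Q = U @ Vt                       # orthogonal factor
    S = Vt.T @ np.diag(gamma) @ Vt   # symmetric positive semi-definite factor
    return Q, S

rng = np.random.default_rng(5)
B = rng.normal(size=(3, 3))  # synthetic non-symmetric square matrix

Q, S = polar_decomposition(B)

print("Q orthogonal:   ", np.allclose(Q.T @ Q, np.eye(3)))
print("S symmetric:    ", np.allclose(S, S.T))
print("Q S recovers B: ", np.allclose(Q @ S, B))
# det(Q) = +1 is a pure rotation, det(Q) = -1 a rotation with a reflection.
print("det(Q):", round(float(np.linalg.det(Q))), " sign of det(B):", np.sign(np.linalg.det(B)))
```

Applying S to the orthogonal factor (Q S) is the scaling or stretch step that recovers the original matrix. Note that for a symmetric positive definite input the orthogonal factor reduces to the identity and S equals the input.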

Global internet user population spatial time series

The procedure was run for the global internet user population variance-covariance matrix in an analogous manner to that of the global population.

As in the global population case, the plot corresponds with the property that if the matrix B being decomposed has a positive determinant then the orthogonal factor will be a pure rotation, otherwise a rotation with a reflection.

The scaling factor visualization yielded a more ellipsoid looking pattern than in the case of global population visualization. The pattern is, however, commensurate with the different looking orthogonal visualization of the variance-covariance matrix.

The variety of patterns in the different orthogonal bases (SVD global population, SVD global internet user population, QR global population, QR global internet user population, Polar global population and Polar global internet user population) provides empirical material for further analyses and comparisons of the decompositions.

Variance-covariance matrix decomposition five: Spectral representation of a vector stationary process

A good starting point for spectral density estimation is the set of basic definitions in Stoica and Moses (2005). These can be supplemented with material from Shumway and Stoffer (2011) and Bloomfield (2000).

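In spectral density estimation, one begins with a finite record of a signal. The aim is to determine the distribution of the signal power over frequency.

Define:

{y(t); t = 0, ±1, ±2, ...}, a deterministic discrete-time signal.

Then, if:

∑_{t=-∞}^{∞} |y(t)|² < ∞

then

Y(ω) = ∑_{t=-∞}^{∞} y(t) e^{-iωt}

exists and is called the Discrete-Time Fourier Transform (DTFT).

Using Parseval’s Equality define:

∑_{t=-∞}^{∞} |y(t)|² = (1/2π) ∫_{-π}^{π} S(ω) dω

where

S(ω) = |Y(ω)|²

is the Energy Spectral Density.

For a random signal, whose energy is not finite, it is possible to write

E{|y(t)|²} < ∞

where

E{|y(t)|²}

is the average power in y(t).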

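Average power spectral density

Average Power Spectral Density (First Definition)

φ(ω) = ∑_{k=-∞}^{∞} r(k) e^{-iωk}

where r(k) is the autocovariance sequence (ACS):

r(k) = E{y(t) y*(t-k)}

where the * operator denotes the complex conjugate of a scalar or the conjugate transpose of a vector or matrix.

Hence:

r(k) = r*(-k) and r(0) ≥ |r(k)|

and also

r(k) = (1/2π) ∫_{-π}^{π} φ(ω) e^{iωk} dω

is the inverse DTFT.

Average Power Spectral Density (Second definition)

φ(ω) = lim_{N→∞} E{ (1/N) |Y_N(ω)|² }

where

Y_N(ω) = ∑_{t=1}^{N} y(t) e^{-iωt}

is the finite DTFT of {y(t)}.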

The definitions allow for the specification of the spectral density estimation problem.

Essentially, we begin with a sample {y(1),...,y(N)} and need to find an estimate φ̂(ω) of the Average Power Spectral Density φ(ω) for ω in [-π, π].

There are two main approaches:

Nonparametric
Parametric

The two key approaches explored in the analysis are the periodogram and correlogram methods, both of which are nonparametric.

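Periodogram

φ̂_p(ω) = (1/N) |∑_{t=1}^{N} y(t) e^{-iωt}|²

Correlogram

φ̂_c(ω) = ∑_{k=-(N-1)}^{N-1} r̂(k) e^{-iωk}

If the biased autocovariance sequence (ACS) estimator

r̂(k) = (1/N) ∑_{t=k+1}^{N} y(t) y*(t-k) for k ≥ 0, with r̂(-k) = r̂*(k)

is used in the correlogram estimate

φ̂_c(ω)

then:

φ̂_p(ω) = φ̂_c(ω)

This implies that

φ̂_p(ω)

and

φ̂_c(ω)

can be analyzed simultaneously.

Both measures are asymptotically unbiased for large N:

E{φ̂_p(ω)} → φ(ω) as N → ∞

But both have a large variance (even for large N) and thus poor performance.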
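The equality of the periodogram and the correlogram (when the biased ACS estimate is used) can be verified numerically. The sketch below, in Python with NumPy, assumes a real-valued record with no mean removal; the seven data values are hypothetical and only mirror the length of the series analyzed here.

```python
import numpy as np

def periodogram(y, omegas):
    """(1/N) |sum_{t=1}^{N} y(t) exp(-i w t)|^2 at angular frequencies w."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1)
    dtft = np.exp(-1j * np.outer(omegas, t)) @ y
    return np.abs(dtft) ** 2 / n

def biased_acs(y):
    """Biased ACS estimate r(k) = (1/N) sum_{t=k+1}^{N} y(t) y(t-k), k = 0..N-1."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    return np.array([np.dot(y[k:], y[:n - k]) / n for k in range(n)])

def correlogram(y, omegas):
    """sum_{k=-(N-1)}^{N-1} r(k) exp(-i w k); for real data r(-k) = r(k)."""
    r = biased_acs(y)
    k = np.arange(1, len(r))
    return r[0] + 2.0 * np.cos(np.outer(omegas, k)) @ r[1:]

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0])  # hypothetical 7-point record
omegas = np.linspace(0.0, np.pi, 50)

print(np.allclose(periodogram(y, omegas), correlogram(y, omegas)))
```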

One of the approaches explored to cure this property has been to develop improved periodogram-based methods. These include the Blackman-Tukey method, Bartlett method, Welch method and Daniell method.

Improved Periodogram-based estimation method of Daniell

The idea is to locally average (2J+1) periodogram samples in the frequency domain, φ̂_D(ω_k) = (1/(2J+1)) ∑_{j=k-J}^{k+J} φ̂_p(ω_j), to reduce the variance by a factor of about (2J+1).

As J increases:

The bias increases because of the increased smoothing
The variance decreases because of the averaging

It is also possible to show that the Daniell periodogram estimate is approximately equal to the Blackman-Tukey periodogram estimate with a rectangular spectral window. Thus:

φ̂_D(ω) ≈ φ̂_BT(ω)

where

φ̂_BT(ω) = ∑_{k=-(M-1)}^{M-1} w(k) r̂(k) e^{-iωk}

is the Blackman-Tukey improved periodogram-based estimate, with lag window w(k). The modified Daniell method puts half weights at the end points of the Daniell averaging window.
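The Daniell and modified Daniell smoothers can be sketched as moving averages over periodogram ordinates. The Python/NumPy example below is an illustration only: the function names are hypothetical, and the periodic wrap-around at the ends assumes the ordinates cover one full period of Fourier frequencies.

```python
import numpy as np

def daniell_weights(J, modified=False):
    """Weights over 2J+1 neighbouring frequencies; modified halves the two end points."""
    w = np.ones(2 * J + 1)
    if modified and J > 0:
        w[0] = w[-1] = 0.5
    return w / w.sum()

def smooth_periodogram(p, J, modified=False):
    """Locally average periodogram ordinates, treating them as periodic in frequency."""
    p = np.asarray(p, dtype=float)
    w = daniell_weights(J, modified)
    padded = np.pad(p, J, mode="wrap")
    return np.convolve(padded, w, mode="valid")

p = np.array([0.0, 0.0, 3.0, 0.0, 0.0, 0.0])  # hypothetical ordinates with one sharp peak

print(daniell_weights(1))                 # equal weights
print(daniell_weights(1, modified=True))  # half weights at the end points
print(smooth_periodogram(p, 1))
print(smooth_periodogram(p, 1, modified=True))
```

A larger J spreads the peak over more frequencies (more bias) while averaging more ordinates (less variance), which is the trade-off described above.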

Cross-spectrum

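If {x(t)} and {y(t)} are jointly stationary and

∑_{h=-∞}^{∞} |γ_xy(h)| < ∞

then the cross-covariance function

γ_xy(h) = E[(x(t+h) - μ_x)(y(t) - μ_y)]

has the representation (inverse Fourier Transform of the cross-spectrum)

γ_xy(h) = ∫_{-1/2}^{1/2} f_xy(ω) e^{2πiωh} dω, h = 0, ±1, ±2, ...

where

f_xy(ω) = ∑_{h=-∞}^{∞} γ_xy(h) e^{-2πiωh}, -1/2 ≤ ω ≤ 1/2

is the Fourier transform of the cross-covariance function. Here ω is measured in cycles per time unit.

Then the empirical squared coherency function is defined by:

ρ̂²_xy(ω) = |f̂_xy(ω)|² / (f̂_xx(ω) f̂_yy(ω))

and the empirical phase ϕ(ω) is defined by:

ϕ(ω) = arctan( Im f̂_xy(ω) / Re f̂_xy(ω) )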

The cross-spectrum and autocovariances can be represented in matrix form as the spectral representation of a vector stationary process.

Hence starting with the p×p autocovariance function matrix:

Γ(h) = E[(x_{t+h} - μ)(x_t - μ)']

of a p-dimensional vector stationary time series x_t = (x_1t,...,x_pt)', one may use a vector of the DFTs, d(ω_j) = (d₁(ω_j),...,d_p(ω_j))', and estimate the spectral matrix by:

f̄(ω) = (1/L) ∑_{k=-m}^{m} I(ω_j + k/n)

where I(ω_j) = d(ω_j)d*(ω_j) is a p×p complex matrix, d*(ω_j) is the conjugate transpose of d(ω_j), L = 2m+1 and m is a spectral density smoothing parameter (i.e. the parameter for the weighted average).

Again, as in the univariate case, the series can be tapered before the Discrete Fourier Transform is applied and one can use weighted estimation:

f̂(ω) = ∑_{k=-m}^{m} h_k I(ω_j + k/n)

where the h_k's are smoothing weights such that

h_k = h_{-k} > 0 and ∑_{k=-m}^{m} h_k = 1

The spectral matrix estimates

f̄(ω)

and

f̂(ω)

can be used to generate estimates for the squared coherency function and the phase.
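A minimal sketch of the averaged spectral matrix estimate, with the squared coherency and phase derived from it, is given below in Python with NumPy. The function names are hypothetical, the input is synthetic, the averaging is the equal-weight version with L = 2m+1, and frequencies are treated as periodic when averaging near the ends.

```python
import numpy as np

def spectral_matrix(X, m):
    """Average I(w_j) = d(w_j) d*(w_j) over 2m+1 neighbouring Fourier frequencies.

    X has shape (n, p): n time points, p series. Returns the frequencies j/n for
    j = 1..n//2 (cycles per time unit) and an array of p x p complex matrices.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    d = np.fft.fft(X - X.mean(axis=0), axis=0) / np.sqrt(n)  # vector of DFTs
    I = np.einsum("ja,jb->jab", d, d.conj())                 # outer products d d*
    f = sum(np.roll(I, k, axis=0) for k in range(-m, m + 1)) / (2 * m + 1)
    j = np.arange(1, n // 2 + 1)
    return j / n, f[j]

def coherency_and_phase(f, a, b):
    """Squared coherency and phase between series a and b from a spectral matrix."""
    cross = f[:, a, b]
    coh = np.abs(cross) ** 2 / (f[:, a, a].real * f[:, b, b].real)
    return coh, np.angle(cross)

rng = np.random.default_rng(6)
X = rng.normal(size=(64, 2))  # two synthetic, unrelated series

freqs, f_raw = spectral_matrix(X, m=0)
coh_raw, _ = coherency_and_phase(f_raw, 0, 1)

freqs, f_smooth = spectral_matrix(X, m=3)
coh_smooth, phase_smooth = coherency_and_phase(f_smooth, 0, 1)

print("raw squared coherency is 1 everywhere:", np.allclose(coh_raw, 1.0))
print("smoothed squared coherency range:", coh_smooth.min(), coh_smooth.max())
```

With no smoothing (m = 0) the squared coherency equals 1 at every frequency by construction, even for unrelated series, so the estimate only becomes informative once neighbouring frequencies are averaged.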

Global population spatial time series

Raw periodogram

The spectral density of the data can be estimated using the spec.pgram function from the R stats package. The resulting spectrum object and its related plots can be explored graphically using the plot.spec function from the same package.

Each line corresponds to the time series of one country. The frequencies shown are between 0 and 0.5. In interpreting the data, a frequency of 1 corresponds to a series that completes one cycle per time unit, a frequency of 0.5 to one cycle every two time units, and so on. For discrete data at least two time points are required to determine a cycle, which means that the highest frequency of interest is 0.5. The 0.5 frequency is called the folding frequency and defines the highest frequency that can be seen in discretely sampled data. Frequencies higher than this appear at lower frequencies, called aliases. The periodogram is also mirrored about the folding frequency of 0.5, and so frequencies higher than 0.5 are not plotted.

It is useful to analyze the periodogram of the disaggregated data. In the next plot the periodograms of the five countries with the largest populations are shown. Essentially, a (significant) peak in the periodogram at a frequency of, say, 0.25 corresponds to a significant periodicity of 1/0.25 = 4 time periods (or years) in the series. The peaks at the other frequencies can be interpreted analogously.
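The link between a periodogram peak and a period can be illustrated with a small synthetic series. The sketch below, in Python with NumPy, uses a hypothetical eight-point series that repeats every four time periods; it is not the population data.

```python
import numpy as np

# Hypothetical annual series with an exact 4-period cycle.
y = np.cos(2 * np.pi * 0.25 * np.arange(8))
n = len(y)

freqs = np.fft.rfftfreq(n, d=1.0)                   # 0, 1/8, ..., 0.5 cycles per period
pgram = np.abs(np.fft.rfft(y - y.mean())) ** 2 / n  # raw periodogram

# Skip frequency 0 (the mean) when looking for the dominant cycle.
peak_freq = freqs[1:][np.argmax(pgram[1:])]
period = 1.0 / peak_freq

print("peak frequency:", peak_freq, "-> period of", period, "time periods")
```

Only the Fourier frequencies j/n are available, so with seven annual observations the periodogram is evaluated at 1/7, 2/7 and 3/7 cycles per year, which limits how finely periodicities can be resolved.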

Cross Periodogram

Squared coherency

The squared coherency between the country data is strong. Essentially, the squared coherency takes on values between 0 and 1, with 0 indicating no dependence and 1 indicating exact linear dependence at the frequency ω.

Phase

The phase spectra are generally difficult to interpret unless one makes the simplifying assumption of linear dependence. The assumption is reasonable in this case because the squared coherency indicates linear dependence. The interpretation is made more difficult by the context and by the length of the series, which is only seven years and therefore leads to very wide confidence intervals. The general approach to interpretation can, however, be illustrated. The approach is likely to lead to better quality or more relevant interpretations when using the smoothed results from the Daniell and modified Daniell smoothers below.

The sign of the phase at zero indicates/approximates the nature of the relationship. Basically a negative sign suggests a negative relationship and a positive sign suggests a positive relationship.

The sign of the slope of a line fitted to the phase indicates/approximates the nature of the lead/lag relationship. Essentially, a positive slope suggests that the series leads and a negative slope suggests that the series lags.

The size or absolute value of the slope indicates/approximates the size of the lead/lag. With the phase in radians and the frequency in cycles per year, the lead/lag in years is the absolute slope divided by 2π. For example, a line through the origin and -1 at 0.2 cycles has a slope of -5 and would indicate a lag/lead of 5/(2π) ≈ 0.8 years (about 9.5 months).

As an example, consider series two and series four in the plot. The value of roughly -3 (close to -π) at the origin would indicate a negative relationship. The progression of the line to 0 at 0.4 cycles would indicate a slope of 7.5 (positive). A slope of 7.5 would indicate that series two leads series four by 7.5/(2π) ≈ 1.2 years. Hence, high values of series two are associated with a decrease in series four roughly 1.2 years later, and conversely.
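The conversion from a phase slope to a lead/lag can be checked numerically. The sketch below, in Python with NumPy, assumes the phase is in radians and the frequency in cycles per time unit, so that a pure delay of d time units gives a phase line with slope of magnitude 2πd. The function name and the synthetic series are hypothetical, and the sign of the slope depends on which series is conjugated in the cross-spectrum.

```python
import numpy as np

def lag_from_phase_slope(f0, phase0, f1, phase1):
    """Signed lead/lag implied by a straight phase line through two points."""
    slope = (phase1 - phase0) / (f1 - f0)
    return slope / (2.0 * np.pi)

# The two phase lines described in the text.
print(abs(lag_from_phase_slope(0.0, 0.0, 0.2, -1.0)))   # about 0.8 time units
print(abs(lag_from_phase_slope(0.0, -3.0, 0.4, 0.0)))   # about 1.2 time units

# Synthetic check: delay a series by d time units and recover d from the phase.
rng = np.random.default_rng(4)
n, d = 64, 2
x = rng.normal(size=n)
y = np.roll(x, d)  # y(t) = x(t - d), with circular wrap-around

X, Y = np.fft.rfft(x), np.fft.rfft(y)
freqs = np.fft.rfftfreq(n)[1:-1]                   # drop frequency 0 and the folding frequency
phase = np.unwrap(np.angle(Y * np.conj(X))[1:-1])  # phase of the cross-periodogram
slope = np.polyfit(freqs, phase, 1)[0]
estimated_delay = -slope / (2.0 * np.pi)

print("estimated delay:", estimated_delay)
```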

This is, however, difficult to interpret as we are dealing with population values. It is difficult (although not impossible) to interpret a negative relationship between the population values of countries. A better approach might be to use each country's weight/proportion of the global population rather than population aggregates. The interpretation is, however, applicable to the global internet user population because negative changes to the aggregates have a natural interpretation (i.e. a user can stop using the internet).

Smoothed periodogram (Daniell method)

The plot shows that the smoothing has removed the peaks in most of the series. This is reasonable in that the series only cover seven years and a concrete interpretation of the results is risky. The best approach in the present situation is to add more data points before trying to interpret the results in a concrete manner. The next best approach is to interpret each of the raw periodogram peaks keeping in mind that the smoothed values are more accurate (i.e. have less error). This essentially means that one could interpret the raw periodogram estimates as guides or pointers to prospective significant periodicities if more data points were added.

The plot shows that all the small peaks in the raw periodograms of the five countries with the largest populations are also smoothed out.

Cross Periodogram (Daniell method)

Squared coherency (Daniell method)

The smoothed cross periodogram still has very wide confidence intervals, which indicates that an overly concrete interpretation of the results is risky. In this light, as with the periodogram, looking to identify prospective significant relationships might be a better approach until more time points are added.

Phase (Daniell method)

Smoothed periodogram (Modified Daniell method)

The plot shows that the observations about the smoothing of the periodogram peaks in the Daniell smoothed estimate also apply for the modified Daniell smoother.

Cross Periodogram (Modified Daniell method)

Squared coherency (Modified Daniell method)

Phase (Modified Daniell method)

Global internet user population spatial time series

Raw periodogram

The spectral density of the data can also be estimated analogously to the global population data.

The plots of the periodograms and the cross periodograms provide an analogous interpretation for the global internet user population, namely, more data points or a cautious interpretation with a prospective feel.

Cross Periodogram

Squared coherency

Phase

The interpretation of the phase for the global internet user population has a more natural meaning than in the case of the global population. The cross periodogram results, namely the squared coherency and the phase, are the same for series two and four as in the case of the global population. The alternative interpretation recommended for the global population (using country weights/proportions of the global population) is equally applicable to the country internet user populations.

Smoothed periodogram (Daniell method)

Cross-periodogram (Daniell method)

Squared coherency (Daniell method)

Phase (Daniell method)

Smoothed periodogram (Modified Daniell method)

Cross-periodogram (Modified Daniell method)

Squared coherency (Modified Daniell method)

Phase (Modified Daniell method)

The five matrix decompositions provide a wealth of information on the global internet user population. The visualizations of the data highlight the usefulness of the two features of efficiency in computation and dimension reduction in spatial time series variance-covariance matrix decompositions. The analyses are naturally also coherent at different levels of data aggregation. This is particularly useful when one is looking to program the procedures into a software package.

This framework is designed to receive source data updates at least once a month for the input data matrices. A more reasonable time frame for data updates, however, is a year. The framework can also be extended to generate more detailed population, social media and mobile technology estimates by incorporating high quality auxiliary information. The extended framework presently has a source for social media population and internet user technology estimation annual parameters that allows for updates three times a year.

The framework can presently allow for the generation of spatial time series forecasts and predictions over a prospective period of approximately three to five years. The forecasts module will additionally evolve to produce more accurate forecasts (over a longer time horizon) as more years of data are assimilated. The data can be assimilated back in time (using available pre-2008 data) and also prospectively (2015 onward).

The data updates, framework extensions and parameter updates will make for more informative variance-covariance matrix decomposition visualizations. This in turn will enhance the richness of information available to identify, analyze, interpret and monitor important trends in the global population and global internet user population spatial time series.

In this list we explored five spatial time series variance-covariance matrix decompositions. These were Spectral decomposition, Singular Value decomposition, QR decomposition, Polar decomposition and Vector stationary process spectral representations. In the case of each of the decompositions we were able to get an idea of the kind of information that can be generated to identify, analyze, interpret and monitor the trends in the populations.

Can you identify an analysis framework that has better performance? Alternatively, are there one or two other decompositions that you can think of that could enhance the framework? Please let us know in the comments.
