Bayesian methods for complex statistical applications

$ \newcommand{\red}{\textcolor{red}} \newcommand{\blue}{\textcolor{blue}} \newcommand{\zerob} {{\bf 0}} \newcommand{\oneb} {{\bf 1}} \newcommand{\expect} {{\mathbb{E}}} \newcommand{\pt} {\tilde{p}} \newcommand{\alphat} {\tilde{\alpha}} \newcommand{\betat} {\tilde{\beta}} \newcommand{\pb} {\bar{p}} \newcommand{\thetab} {{\boldsymbol{\theta}}} \newcommand{\alphab} {{\boldsymbol{\alpha}}} \newcommand{\kappab} {{\boldsymbol{\kappa}}} \newcommand{\sigmab} {{\boldsymbol{\sigma}}} \newcommand{\nub} {{\boldsymbol{\nub}}} \newcommand{\gammab} {{\boldsymbol{\gamma}}} \newcommand{\deltab} {{\boldsymbol{\delta}}} \newcommand{\Deltab} {{\boldsymbol{\Delta}}} \newcommand{\etab} {{\boldsymbol{\eta}}} \newcommand{\Thetab} {{\boldsymbol{\Theta}}} \newcommand{\varthetab} {{\boldsymbol{\vartheta}}} \newcommand{\Yset} {\mathcal{Y}} \newcommand{\Xset} {\mathcal{X}} \newcommand{\intd} {\textrm{d}} \newcommand{\phib} {\boldsymbol{\phi}} \newcommand{\zetab} {\boldsymbol{\zeta}} \newcommand{\psib} {\boldsymbol{\psi}} \newcommand{\Sigmawinv} {{\boldsymbol{\it \Sigma}}_w^{-1}} \newcommand{\Sigmamat} {{\bm \Sigma}} \newcommand{\Sigmamatt} {\widetilde{\boldsymbol{\Sigma}}} \newcommand{\Qmatt} {\widetilde{\textbf{Q}}} \newcommand{\muvect} {\widetilde{\boldsymbol{\mu}}} \newcommand{\Psib} {{\bm \Psi}} \newcommand{\Upsilonmat} {{\boldsymbol{\it \Upsilon}}} \newcommand{\Lambdamat} {\mathbf{\Lambda}} \newcommand{\Gammamat} {{\boldsymbol{\it \Gamma}}} \renewcommand{\Gammamat} {{\boldsymbol{\Gamma}}} \newcommand{\Pimat} {{\boldsymbol{\it \Pi}}} \newcommand{\Amat} {\textbf{A}} \newcommand{\Bmat} {\textbf{B}} \newcommand{\Dmat} {\textbf{D}} \newcommand{\Gmat} {\textbf{G}} \newcommand{\Lmat} {\textbf{L}} \newcommand{\Qmat} {\textbf{Q}} \newcommand{\Rmat} {\textbf{R}} \newcommand{\Tmat} {\textbf{T}} \newcommand{\Qt} {\widetilde{\textbf{Q}}} \newcommand{\Qtinv} {\widetilde{\textbf{Q}}^{-1}} \newcommand{\Mmat} {\textbf{M}} \newcommand{\Cmat} {\mathbf{C}} \newcommand{\Jmat} {\mathbf{J}} \newcommand{\cmat} {\textbf{c}} \newcommand{\Kmat} {\textbf{K}} \newcommand{\Zmat} {\textbf{Z}} \newcommand{\Xmat} {\textbf{X}} \newcommand{\Xvec} {\mathbf{X}} \newcommand{\Imat} {\textbf{I}} \newcommand{\Umat} {\textbf{U}} \newcommand{\Pmat} {\textbf{P}} \newcommand{\Hmat} {\textbf{H}} \newcommand{\Vmat} {\textbf{V}} \newcommand{\bvec} {\textbf{b}} \newcommand{\dvec} {\textbf{d}} \newcommand{\avec} {\textbf{a}} \newcommand{\evec} {\textbf{e}} \newcommand{\hvec} {\textbf{h}} \newcommand{\xvec} {\textbf{x}} \newcommand{\yvec} {\textbf{y}} \newcommand{\zvec} {\textbf{z}} \newcommand{\wvec} {\textbf{w}} \newcommand{\vvec} {\textbf{v}} \newcommand{\svec} {\textbf{s}} \newcommand{\uvec} {\textbf{u}} \newcommand{\gvec} {\textbf{g}} \newcommand{\fvec} {\textbf{f}} \newcommand{\rvec} {\textbf{r}} \newcommand{\muvec} {\boldsymbol{\mu}} \newcommand{\Psix} {{\boldsymbol{\it \Psi}}_{\xvec}} \newcommand{\Phimat} {{\boldsymbol{\it \Phi}}} \newcommand{\Psitheta} {{\boldsymbol{\it \Psi}}_{\varthetab}} \newcommand{\Psia} {{\boldsymbol{\it \Psi}}_{A}} \newcommand{\Psixinv} {{\boldsymbol{\it \Psi}}_{\xvec}^{-1}} \newcommand{\vvm} {\boldsymbol {\mathcal \upsilon}} \newcommand{\upsilonb} {\boldsymbol {\upsilon}} \newcommand{\betab} {\boldsymbol {\beta}} \newcommand{\omegab} {\boldsymbol {\omega}} \newcommand{\Aop}{\boldsymbol{\mathcal{A}}} \newcommand{\ICE} {\textit{ICE}} \newcommand{\GIA} {\textit{GIA}} \newcommand{\GPS} {\textit{GPS}} \newcommand{\ERS} {\textit{ERS}} \newcommand{\GR} {\textit{GR}} \newcommand{\IS} {\textit{IS}} \newcommand{\ES} {\textit{ES}} \newcommand{\zeroes}{\mathop{\textrm{zeroes}}} \newcommand{\odd}{\mathop{\textrm{odd}}} \newcommand{\even}{\mathop{\textrm{even}}} \newcommand{\ff} {\textit{ff}} \newcommand{\fm} {\textit{fm}} \newcommand{\mf} {\textit{mf}} \newcommand{\inv} {\textit{inv}} \renewcommand{\zerob}{\mathbf{0}} \renewcommand{\v}{\mathbf{v}} \renewcommand{\u}{\mathbf{u}} \newcommand{\w}{\mathbf{w}} \renewcommand{\d}{\mathrm{d}} \newcommand{\Z}{\mathbf{Z}} \newcommand{\X}{\mathbf{X}} \newcommand{\x}{\mathbf{x}} \newcommand{\Y}{\mathbf{Y}} \newcommand{\Yvec}{\mathbf{Y}} \newcommand{\Yt}{\widetilde{\mathbf{Y}}} \newcommand{\Zvec}{\mathbf{Z}} \newcommand{\epsilonb}{\mbox{\boldmath{$\varepsilon$}}} \newcommand{\epsilonb}{\boldsymbol{\varepsilon}} \newcommand{\bS}{\mathbf{S}} \newcommand{\bK}{\mathbf{K}} \newcommand{\bI}{\mathbf{I}} \newcommand{\bR}{\mathbf{R}} \newcommand{\bC}{\mathbf{C}} \newcommand{\bB}{\mathbf{B}} \newcommand{\bP}{\mathbf{P}} \newcommand{\bQ}{\mathbf{Q}} \renewcommand{\L}{\mathbf{L}} \newcommand{\E}{\mathrm{E}} \newcommand{\cov}{\mathrm{cov}} \newcommand{\var}{\mathrm{var}} \newcommand{\Dist}{\mathrm{Dist}} \renewcommand{\prec}{\mathrm{prec}} \newcommand{\tr}{\mathrm{tr}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\sgn}{\mathrm{sgn}} \newcommand{\trace}{\mathrm{tr}} \newcommand{\vect}{\mathrm{vec}} \newcommand{\Gau}{\mathrm{Gau}} \newcommand{\RR}{\mathbb{R}} \newcommand{\s}{\mathbf{s}} \newcommand{\p}{\mathbf{p}} \renewcommand{\a}{\mathbf{a}} \newcommand{\h}{\mathbf{h}} \renewcommand{\b}{\mathbf{b}} \renewcommand{\c}{\mathbf{c}} \newcommand{\z}{\mathbf{z}} \newcommand{\blambda}{\boldsymbol{\lambda}} \newcommand{\btheta}{\boldsymbol{\theta}} \newcommand{\balpha}{\boldsymbol{\alpha}} \newcommand{\bgamma}{\boldsymbol{\gamma}} \newcommand{\bbeta}{\boldsymbol{\beta}} \newcommand{\bzero}{\boldsymbol{0}} \newcommand{\bmu}{\boldsymbol{\mu}} \newcommand{\bSigma}{\bm{\Sigma}} \newcommand{\s}{\mathbf{s}} $

Motivation

We wish to develop flexible, scalable, exact and approximate, Bayesian posterior simulation methods, such as Markov chain Monte Carlo (MCMC), sequential Monte Carlo (SMC), particle Markov chain Monte Carlo (PMCMC), variational approximation (VA), and approximate Bayesian methods to solve real-world problems in areas, such as economic inequality and poverty measurement, health, economics, cognitive psychology, and finance. Note that many of the Bayesian approaches developed in this research can be useful in solving environmental problems.

Statistical Models and Applications

Economic Inequality and Poverty Measurements

Figure 1: Estimated Income Densities in Australia for the years 2001, 2006, 2010, 2014, and 2017. The units are hundreds of dollars.

In this research, we develop a range of Bayesian approaches for measuring inequality and welfare, where we make comparisons over time, over different populations and before and after the introduction of new policies such as taxes.

Statistical tests of dominance regularly appear in the economics and finance literature. They employ statistical methods to compare the distributions of two random variables (or one random variable at two points in time) in order to determine if one "dominates" another. Lander et al. (2020) develop a Bayesian approach for assessing Lorenz and stochastic dominance between two income distributions $X$ and $Y$.

Our approach begins with flexible specifications for income distributions $X$ and $Y$, such as finite or infinite mixtures of gamma densities (see Wiper et al., 2001 and Sethuraman, 1994, for further details). Figure 1 shows the estimated income densities in Australia for the years 2001, 2006, 2010, 2014, and 2017. The densities are all bimodal with a sharp peak between $10,000-$20,000 and a lower peak in the range $20,000-$40,000. Prior to 2010, there have been clear shifts to the right, but the distributions for 2010, 2014, and 2017 are similar, particularly those in 2014 and 2017. Once the income distributions have been estimated, we can then apply the Bayesian dominance approach in Lander et al. (2020) to compare pairs of income distributions.

It is important to note that income alone may not be sufficient to characterise the level of well-being of a person in a society. The need to understand welfare and inequality beyond the dimension of income has resulted in a significant increase in research on non-income well-being distributions. Gunawan et al. (2018b) proposed a Bayesian approach to compare two ordinal categorical distributions commonly occurring with data on non-income well-being distributions, such as self-reported health status, life satisfaction, education, and other variables. They show how to compute probabilities for inequality and stochastic dominance along with methodology for obtaining posterior densities for an inequality index for ordinal health data. Figure 2 shows that inequality is similar in 2005, 2008, and 2010, but greater in 2002.

Figure 2: Posterior densities of the Abul-Naga Inequality Index for self-reported health status data in Australia

The studies mentioned above are only based on independent analysis of individual well-being attributes. They do not explicitly represent a multidimensional approach as they do not allow for possible degrees of correlation between different well-being attributes. To understand the multidimensional nature of poverty and inequality, research on the joint distribution of several measures of well-being, such as income, education, health status, housing conditions, and happiness, must be conducted in a way that allows for statistical dependence of the measures. Future research will extend the framework to multivariate frameworks.

Time Series State Space Models

Figure 3: Sequential one-step ahead predictive density estimates for the equally weighted portfolio of US industry stock return from 25/10/2005 to 29/11/2005.

 

In this research, we consider Bayesian inference over parameters and the latent states in non-linear, non-Gaussian, state space models. In particular, we use a multivariate factor stochastic volatility model, which is an example of a high-dimensional state space model having both a large number of parameters and a large number of latent states. In general, joint inference over both the parameters and the unobserved latent states can be very challenging because the likelihood involves calculating an integral over a very high-dimensional latent state space.

Current approaches to estimate the multivariate stochastic volatility model use MCMC samplers (e.g. Chib et al., 2006; and Kastner et al., 2017), are neither exact nor flexible. Most of the applications of the multivariate factor stochastic volatility (SV) models are in the area of financial econometrics (Zhou et al., 2014). We now describe the multivariate SV model.

Suppose that $\boldsymbol{P}_{t}$ is an $S$-dimensional vector of daily stock prices, and define $\mathbf{y}_{t}\equiv\log\boldsymbol{P}_{t}-\log\boldsymbol{P}_{t-1}$, which is the log-return of the stocks. We model $\boldsymbol{y}_{t}$ as a multivariate factor SV model \[ \boldsymbol{y}_{t}=\boldsymbol{\beta}\boldsymbol{f}_{t}+\boldsymbol{V}_{t}^{\frac{1}{2}}\boldsymbol{\epsilon}_{t},\left(t=1,...,T\right), \] where $\boldsymbol{f}_{t}$ is an $K$-dimensional vector of latent factors (with $K\ll S$), and $\boldsymbol{\beta}$ is an $S\times K$ factor loading matrix of unknown parameters. We model the latent factors as, $\boldsymbol{f}_{t}\sim N\left(0,\boldsymbol{D}_{t}\right)$, and $\boldsymbol{\epsilon}_{t}\sim N\left(0,I\right)$. The time-varying variance matrices, $\boldsymbol{V}_{t}$ and $\boldsymbol{D}_{t}$, depend on unobserved log-volatilities $\boldsymbol{h}_{t}=\left(h_{1t},...,h_{St}\right)'$ and $\boldsymbol{\lambda}_{t}=\left(\lambda_{1t},...,\lambda_{Kt}\right)^{'}$ such that \[ \boldsymbol{V}_{t}=\textrm{diag}\left(\exp\left(h_{1t}\right),...,\exp\left(h_{St}\right)\right),\;\;\boldsymbol{D}_{t}=\textrm{diag}\left(\exp\left(\lambda_{1t}\right),...,\exp\left(\lambda_{Kt}\right)\right). \] Each of the unobserved log-volatilities $\lambda_{kt}$ and $h_{st}$ are assumed to follow independent, first-order autoregressive process with \[ h_{st}-\mu_{\epsilon s}=\phi_{\epsilon s}\left(h_{st-1}-\mu_{\epsilon s}\right)+\eta_{\epsilon st},\;\textrm{for}\;\eta_{\epsilon st}\sim N\left(0,\tau_{\epsilon s}^{2}\right);s=1,...,S, \] and \[ \lambda_{kt}=\phi_{fk}\left(\lambda_{kt-1}-\mu_{fk}\right)+\eta_{fkt},\;\textrm{for}\;\eta_{fkt}\sim N\left(0,\tau_{fk}^{2}\right),k=1,...,K. \] We denote the parameter vector for the multivariate factor SV model by \[ \theta=\left(\beta,\left\{ \left(\phi_{fk},\tau_{fk}^{2}\right):k=1,...,K\right\} ,\left\{ \left(\mu_{\epsilon s},\phi_{\epsilon s},\tau_{\epsilon s}^{2}\right):s=1,...,S\right\} \right). \] Mendes et al. (2020) and Gunawan et al. (2018a) proposed an efficient particle Markov chain Monte Carlo sampling scheme that is scalable in terms of the number of parameters, number of latent states, and number of observations. Gunawan et al. (2018c) also proposed a flexible density tempering approach to estimate a high-dimensional factor SV model. One of the advantages of the density-tempered approach is that it can provide sequential one-step ahead predictive densities of log return (see Figure 3). Once we have the one-step-ahead predictive densities of log return, we can compute some financial risk measures, such as value at risk (VaR).

Evidence Accumulation Models (EAMs)

Figure 4: Linear Ballistic Accumulator (LBA) Model.

Evidence accumulation models (EAMs) have been used to address theoretical and applied questions about human cognition, both in the general population and in clinical groups. For example, we can use the EAM to understand how ageing affects the decision making. It has been long known that in many cognitive tasks, older adults respond more slowly than the younger adults. For many decades, age-related slowing was attributed to the decrease in the rate of information processing (Salthouse, 1996). Ratcliff et al. (2004) found that age-related slowing of older adults was very often caused by increased caution rather than the decrease in the processing rate. In these cases, older adults can process the information as quickly as the younger participants, but they responded more slowly because they were more cautious. This kind of understanding could not have been reached without the use of the evidence accumulation model (EAM).

There are many EAMs, but almost all applied work has used the linear ballistic accumulator model (LBA; see Brown and Heathcote, 2008) and the diffusion decision model (DDM; see Ratcliff and McKoon, 2008). These models assume that decisions are made by accumulating evidence for each response (the speed of this accumulation is called "drift rate"). A decision is triggered as soon as the amount of evidence in favour of one response exceeds a threshold value (see Figure 4 for the LBA model).

The key inferences are drawn in two ways: either from parameter estimates or by comparing different versions of the EAMs estimated from the same data. These inferences rely on accurate parameter estimates and valid model-selection procedures, which can be difficult problems. Most modern applications of the model use hierarchical structures estimated in a Bayesian framework.

In this research, we develop a range of Bayesian methods to fit hierarchical EAMs and their extensions. Gunawan et al. (2020) propose two new methods, the particle Metropolis within Gibbs (PMwG) and the density-tempered sequential Monte Carlo (DT-SMC) approaches, for fitting the hierarchical LBA model that is more efficient than the differential evolution Markov chain Monte Carlo (DE-MCMC) sampler used for the estimation of the LBA since (Turner et al., 2013). Gunawan et al. (2019) extend the hierarchical LBA model to allow the individual-level parameters for each subject to change over blocks of the trials. Tran et al. (2020) proposed an efficient method for estimating marginal likelihood for models where the likelihood is intractable but can be estimated unbiasedly. They illustrate their method to estimate the marginal likelihood of different versions of the hierarchical LBA model.

The computational flexibility of the new methods allows the exploration of important psychological questions which have been neglected due to statistical intractability. For example, it is well known that there can be substantial sequential effects in decision-making data. Our approaches allow tractable extensions that explicitly take into account the sequential effects such as parameter evolution due to fatigue or learning.

Brown, S. and Heathcote, A. (2008). The simple complete model of choice reaction time: linear ballistic accumulation. Cognitive Psychology57, 153-178.

Chib, S., Nardari, F., and Shephard, N. (2006). Analysis of high dimensional multivariate stochastic volatility models. Journal of Econometrics134, 341-371.

Gunawan, D., Carter, C. K., and Kohn, R. (2018a). Efficiently combining pseudo marginal and particle Gibbs sampling. arXiv:1804.04359v2.

Gunawan, D., Griffiths, W. E., and Chotikapanich, D. (2018b). Bayesian inference for health inequality and welfare using qualitative data. Economics Letters162, 76-80.

Gunawan, D., Hawkins, G., Tran, M., Kohn, R., and Brown, S. (2020). New estimation approaches for the linear ballistic accumulator model. Journal of Mathematical Psychology.

Gunawan, D., Hawkins, G. E., Kohn, R., Tran, M. N., and Brown, S. D. (2019). Time-evolving psychological processes over repeated decisions. arXiv:1906.10838v1.

Gunawan, D., Kohn, R., Carter, C., and Tran, M. N. (2018c). Flexible density tempering approaches for state space models with an application to factor stochastic volatility models. arXiv:1805.00649v1.

Kastner, G., Schnatter, S. F., and Lopes, H. F. (2017). Efficient Bayesian inference for multivariate factor stochastic volatility models. Journal of Computational and Graphical Statistics, 26(4), 905-917.

Lander, D., Gunawan, D., Griffiths, W. E., and Chotikapanich, D. (2020). Bayesian assessment of Lorenz and stochastic dominance. Canadian Journal of Economics.

Mendes, E. F., Carter, C. K., Gunawan, D., and Kohn, R. (2020). A flexible particle Markov chain Monte Carlo method. Statistics and Computing, https://doi.org/10.1007/s11222-019-09916-7.

Ratcliff, R. and McKoon, G. (2008). The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation, 20, 873-922.

Ratcliff, R., Thapar, A., Gomez, P., and McKoon, G. (2004). A diffusion model analysis of the effects of aging in the lexical-decision task. Psychology and Aging19(2), 278.

Salthouse, T. A. (1996). The processing-speed theory of adult age differences in cognition. Psychological Review103(3), 403.

Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica4, 639-650.

Tran, M. N., Scharth, M., Gunawan, D., Kohn, R., Brown, S., and Hawkins, G. (2020). Robustly estimating marginal likelihood for cognitive models via importance sampling. Behavior Research Methods.

Turner, B. M., Sederberg, P. B., Brown, S. D., and Steyvers, M. (2013). A method for efficiently sampling from distributions with correlated dimensions. Psychological Methods18(3), 368-384.

Wiper, M., Insua, D. R., and Ruggeri, F. (2001). Mixtures of gamma distributions with applications. Journal of Computational and Graphical Statistics10(3), 440-454.

Zhou, X., Nakajima, J., and West, M. (2014). Bayesian forecasting and portfolio decisions using dynamic dependent sparse factor models. International Journal of Forecasting30(4), 963-980.

Authored by D. Gunawan, 2020.