To the contrary, it has been recognized that multiple measures are necessary to gain meaningful information even within a single modality. It is this profoundly multivariate nature of mental disorders that has driven researchers to, for example, conduct genome-wide association studies and acquire whole-brain neuroimaging data. When aiming to build predictive models, this complexity necessitates the use of methods suitable for high-dimensional datasets in which the number of variables may far exceed the number of samples. Generally, the so-called Curse of Dimensionality is addressed in three ways. First, unsupervised methods for dimensionality reduction – such as Principal Component Analysis – may be used. These algorithms apply more or less straightforward transformations to the input data to yield a lower-dimensional representation. They can also extract a wide range of predefined features from raw data; for example, distance measures can be extracted from raw protein sequences for classification in a fully automated fashion. Second, techniques integrating dimensionality reduction and predictive model estimation may be applied. In essence, these use penalties for model complexity, thereby enforcing simpler, often lower-dimensional models. Simply speaking, models containing more parameters must enable proportionally better predictions to be preferred over simpler models. These algorithms are at the heart of predictive analytics projects and include well-known techniques such as Support Vector Machines and Gaussian Process Classifiers as well as the numerous tree algorithms. Third, feature-engineering – i.e. all methods aiming to create useful predictors from the input data – can be used. In short, feature-engineering aims to transform the input data in a way that optimally represents the underlying problem to the predictive model.
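To make the first two strategies concrete, consider the following minimal sketch using the scikit-learn library. The data are purely simulated, and all shapes and hyperparameter values are illustrative assumptions: it combines unsupervised dimensionality reduction (Principal Component Analysis) with a complexity-penalized classifier (a linear Support Vector Machine, whose C parameter controls the penalty), tuned on the training data alone.

```python
# Minimal sketch: handling p >> n data with unsupervised dimensionality
# reduction (PCA) followed by a complexity-penalized classifier.
# Data shapes and the hyperparameter grid are illustrative assumptions.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10_000))   # n = 100 samples, p = 10,000 variables
y = rng.integers(0, 2, size=100)     # binary label, e.g. responder vs. non-responder

pipe = Pipeline([
    ("pca", PCA(n_components=50)),          # unsupervised dimensionality reduction
    ("svm", SVC(kernel="linear", C=1.0)),   # C penalizes model complexity
])

# Tune the complexity penalty by cross-validation on the training data only.
grid = GridSearchCV(pipe, {"svm__C": [0.01, 0.1, 1.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```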

An illustrative example comes from a recent study which constructed a model predicting psychosis onset in high-risk youths based on free speech samples. Whereas it would have been near impossible to build a model based on the actual recordings of participants’ speech, the team achieved high accuracy in a cross-validation framework using speech features extracted with a Latent Semantic Analysis (LSA) measure of semantic coherence and two syntactic markers of speech complexity. While these results still await fully independent replication, the approach shows that transforming the input data using domain-knowledge can greatly foster the construction of a predictive model. Demonstrating the problem-dependent nature of feature-engineering, it might have been much easier to decode, for example, participants’ gender from the actual recordings than from LSA measures, given the difference in pitch between males and females. In linking data acquisition and model algorithms, feature-engineering is not primarily a preprocessing or dimensionality-reduction technique, but a conceptually decisive step in building a predictive model. While important for all modalities, feature-engineering often plays a particularly crucial role when constructing predictive models based on physiological or biophysical data. On the one hand, these data are often especially high-dimensional, thus often requiring dimensionality-reduction. On the other hand, alternative transformations of the raw data can contain fundamentally different, non-redundant information. For example, the same fMRI raw data – i.e. measures of changes in regional blood-oxygen levels – can be processed to yield numerous, non-redundant representations. In addition, domain-knowledge regarding the choice of relevant regions-of-interest or atlas parcellations also fundamentally affects the representation of information in neuroimaging data. As different parameters can be meaningful in the context of different disorders, these examples powerfully illustrate the fundamental importance of domain-knowledge in feature-engineering. The sources of domain-knowledge needed to decide which data representations might be optimal with regard to the problem at hand may range from large-scale meta-analyses, reviews and other empirical evidence to clinical experience.
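The general idea behind such a coherence feature is easy to sketch. The toy example below is not the published pipeline but a heavily simplified stand-in: the `semantic_coherence` function and the three-sentence transcript are invented for illustration. It embeds sentences via TF-IDF and truncated SVD (a common LSA implementation) and scores coherence as the mean cosine similarity between consecutive sentences.

```python
# Sketch of an LSA-style semantic-coherence feature (not the published
# pipeline): embed sentences via TF-IDF + truncated SVD and score coherence
# as the mean cosine similarity between consecutive sentences.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def semantic_coherence(sentences, n_components=2):
    # n_components must stay below the vocabulary size of the transcript.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    lsa = TruncatedSVD(n_components=n_components).fit_transform(tfidf)
    sims = [cosine_similarity(lsa[i:i + 1], lsa[i + 1:i + 2])[0, 0]
            for i in range(len(sentences) - 1)]
    return float(np.mean(sims))

# Hypothetical transcript: lower scores would indicate less coherent speech.
transcript = [
    "I went to the store yesterday.",
    "The store was out of milk.",
    "So I bought bread instead.",
]
print(semantic_coherence(transcript))
```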

Taking the traditionally somewhat subjective “art of feature-engineering” a step further are automated feature-engineering algorithms. These are akin to other unsupervised methods for dimensionality reduction, but can learn meaningful transformations from large, unlabeled datasets. In short, these algorithms form high-level representations of more basic regularities in the data. For example, we might use large datasets of resting-state fMRI to automatically uncover regularities using unsupervised learning. These newly constructed features might then provide a lower-dimensional, more informative basis for model-building in future fMRI projects. Note that in this framework, domain-knowledge is not provided directly, but learned from independent data sources. While these techniques appear highly efficient as no expert involvement is required, discovering high-level features for the massively multivariate measures commonly needed in psychiatry will require extraordinarily large – though possibly unlabeled – datasets as well as computational power beyond the capabilities of most institutions today. Considering the developments in other areas such as speech recognition, we believe, however, that the significance of automated feature-engineering techniques can only grow in the years to come. In many ways, the theory-driven approach to Computational Psychiatry is following an at least equally promising – albeit diametrically opposite – strategy: this approach builds mechanistic models based on theory and available evidence. After a model is validated, model parameters encapsulate a theoretical, often mechanistic, understanding of the phenomena. The resulting models thus constitute highly-formalized representations of domain-knowledge, custom-tailored to the problem at hand. Unlike virtually all other approaches to feature-engineering, computational models allow researchers to test the validity of data-representations while simultaneously fully explicating domain-knowledge. While certainly more scientifically satisfying and theoretically superior to heuristic feature-engineering, constructing valid models is far from simple. Thus, we believe that this technology will gain in importance to the degree that building valid models proves feasible, further intertwining theoretical progress and Predictive Analytics.
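Returning to automated feature-engineering, its core pattern – learn representations from unlabeled data, then re-use them to encode new, labeled samples – can be sketched in a few lines. The example below is a deliberately simplified stand-in using dictionary learning on simulated data with invented shapes, not a recipe for real resting-state fMRI.

```python
# Sketch of automated feature-engineering via unsupervised representation
# learning: learn a dictionary of regularities from a large unlabeled
# dataset, then re-use it to encode a new, labeled study. All shapes are
# illustrative; real neuroimaging data would require far more samples.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(5000, 200))   # large unlabeled corpus
X_train = rng.normal(size=(80, 200))         # small labeled study
y_train = rng.integers(0, 2, size=80)

# Learn high-level components from the unlabeled data only.
dico = MiniBatchDictionaryLearning(n_components=30, batch_size=256,
                                   random_state=0).fit(X_unlabeled)

# Encode the labeled data in the learned basis and fit a simple model.
Z_train = dico.transform(X_train)
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
```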

Having discussed feature-engineering in greater detail, it is important to point out that model construction algorithms are not limited to the use of one single data-representation. To the contrary, it is a particular strength of this approach – with algorithms usually allowing for massively multivariate data and model integration – that multiple, meaningful data representations can be combined to enable valid predictions. Summarizing, the acquisition of high-dimensional data is regularly required to capture the massively multivariate nature of the processes underlying psychiatric disorders. Even on a single level of observation, we thus need to deal with the Curse of Dimensionality. To this end, model building commonly includes steps such as simple dimensionality-reduction techniques and penalizing model-complexity as part of machine learning algorithms. Most importantly, however, feature-engineering is used to create data-representations from the input data which enable machine learning algorithms to build a valid model. Feature-engineering may draw on partially or fully formalized domain-knowledge, or a combination thereof. This prominent role of domain-knowledge underlines the interdependence of classic scientific approaches – seeking mechanistic insight fostering theoretical development – and Predictive Analytics approaches in mental health. While theoretical progress and meta-analytic evidence aid the construction of optimal features, a predictive analytics approach, in turn, allows for a direct assessment of the clinical utility of group-level evidence and theoretical advances. Thus, it is evident that these two branches of research are not mutually exclusive, but complementary approaches when aiming to benefit patients.

Substantially aggravating the problem of dimensionality discussed above, mental disorders are characterized by numerous, possibly interacting biological, intrapsychic, interpersonal and socio-cultural factors. Thus, a clinically useful patient representation must probably, in many cases, be massively multi-modal, i.e. include data from multiple levels of observation – possibly spanning the range from molecules to social interaction. All these modalities might contain non-redundant, possibly interacting sources of information with regard to the clinical question. In fact, it is this peculiarity – distinguishing psychiatry from most other areas of medicine – which has hampered research in general and translational efforts in particular for decades. As applying a simple predictive modelling pipeline to a multi-level patient representation would increase the already large number of dimensions of a unimodal dataset by several orders of magnitude, it might seem that Predictive Analytics endeavors are likely to suffer from similar if not larger problems. Indeed, none of the dimensionality-reduction, regularization, or feature-engineering approaches outlined above is capable of seamlessly integrating such ultra-high-dimensional data from so profoundly different modalities. Considering the tremendous theoretical problems of understanding phenomena on one level of observation, we also cannot rely on the development of a valid theory spanning multiple levels of observation in the near future. Likewise, detailed domain-knowledge across levels of observation is extremely difficult to obtain, as empirical evidence as well as expert opinions are usually specific to one modality. Given the extreme amounts of data and the combinatorial explosion due to their potential interactions, fully automated feature-engineering approaches across levels of observation also appear unlikely in the near future.
Finally, the often qualitatively different data sources alone – including genetics, proteomics, psychometry, and neuroimaging data as well as ambulatory assessments and information from various, increasingly popular wearable sensors – would make this a Herculean task. A somewhat trivial solution would be to limit the predictive model to a single level of observation. If high-accuracy predictions can be obtained in this way – which might be considered unlikely at least for the most difficult clinical questions – such unimodal models are always preferable due to their comparatively high efficiency. Apart from the inherent multi-modal nature of mental disorders, which might render unimodal models less accurate, it is, however, exactly these efficiency considerations which necessitate that Predictive Analytics research consider multiple levels of observation: in order to identify the most efficient combination of data sources in a principled way in the absence of detailed cross-modal expert knowledge and evidence, we have to learn it from the data. To this end, a plethora of machine learning approaches – which can be broadly described as model integration techniques – have been developed.

Probably the most intuitive way to combine information from different high-dimensional sources is by voting. In this framework, a predictive model is trained for each modality and the majority vote is used as the overall model prediction. In a binary classification – if we wish to predict therapeutic response from five multivariate data sources – we first train a model for each modality. Then, we count the number of models predicting a response and the number of models predicting no response. The final prediction of therapeutic response is given by the option receiving more votes across modalities. A slightly more sophisticated approach is stacking, or stacked generalization. Here, again, a model is trained for each modality. The predictions are, however, not combined by voting, but used as input to another machine learning algorithm which constructs a final model with the unimodal predictions as features. In addition to these simple approaches, numerous other techniques – Model Averaging, Bagging, Boosting, or more sophisticated ensemble algorithms – exist, each with different strengths and weaknesses which affect the computational infrastructure needed and interact with the data structure within and across modalities. That said, most Predictive Analytics practitioners would agree that models in the field are most often constructed by evaluating a large number of approaches, i.e. by trial-and-error relying on computational power. However, it cannot be emphasized enough that this strategy must rely on the training data only. At no time and in no form may the test set – i.e. the samples later used to evaluate model performance – be used in this process. Only in this way can we guarantee a valid estimation of predictive power in practice. Note that the techniques for model combination can generally also be used to construct predictive models from unimodal multivariate datasets. Given the multi-modal nature of psychiatric disorders, however, they hold particular value for cross-modal model integration.

Importantly, the construction of models from multi-modal data does not mean that the final predictive model used in the clinic must also be multi-modal. To the contrary, by training models with multi-modal data, we not only guarantee maximum predictive power, but also gain empirical evidence regarding the utility of each modality. Analyzing the final model, we can investigate which modalities contribute substantial, non-redundant information. In an independent sample, we could then train a model based only on those modalities most important in the first model. With this iterative process, we can obtain not only the most accurate, but also the most efficient combination of modalities and variables in a principled manner. Thus, final models might only consist of very few modalities and variables, fostering their widespread use also from a health-economics point of view.
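To make the voting and stacking schemes described above concrete, the following sketch – using scikit-learn on purely simulated data, with three hypothetical modalities rather than the five of the example above, and invented shapes and model choices throughout – trains one model per modality, combines them by majority vote, and then stacks out-of-fold unimodal predictions into a meta-learner while leaving the held-out test set untouched during training.

```python
# Sketch of cross-modal model integration: one base model per (simulated)
# modality, combined first by majority voting and then by stacking.
# All names, shapes, and model choices are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
modalities = {                                  # hypothetical data sources
    "genetics": rng.normal(size=(n, 500)),
    "imaging": rng.normal(size=(n, 300)),
    "psychometry": rng.normal(size=(n, 20)),
}
y = rng.integers(0, 2, size=n)                  # therapeutic response yes/no
idx_train, idx_test = train_test_split(np.arange(n), random_state=0)

base = {"genetics": LogisticRegression(max_iter=1000),
        "imaging": SVC(),
        "psychometry": RandomForestClassifier(random_state=0)}

# Voting: train one model per modality; the majority vote is the prediction.
votes = [base[name].fit(X[idx_train], y[idx_train]).predict(X[idx_test])
         for name, X in modalities.items()]
vote_pred = (np.mean(votes, axis=0) > 0.5).astype(int)

# Stacking: out-of-fold unimodal predictions become features for a
# meta-learner; the held-out test set is never used during training.
Z_train = np.column_stack([
    cross_val_predict(base[name], X[idx_train], y[idx_train], cv=5)
    for name, X in modalities.items()])
meta = LogisticRegression().fit(Z_train, y[idx_train])

Z_test = np.column_stack([base[name].predict(X[idx_test])
                          for name, X in modalities.items()])
stack_pred = meta.predict(Z_test)
print(vote_pred[:10], stack_pred[:10])
```

With the final stacked model in hand, inspecting the meta-learner's coefficients would indicate which modalities contribute non-redundant information – the starting point for the iterative pruning of modalities described above.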