On 18 November 2022, the 11th Young Researchers Workshop (Nachwuchsworkshop) of the Zentrum für Statistik took place at Bielefeld University.
During the workshop, doctoral researchers from the units involved in the Zentrum für Statistik presented their fields of research to one another and discussed them.
Programme of the Young Researchers Workshop
Talks by the participating doctoral researchers
(in order of presentation)
Julian Wäsche
Faculty of Business Administration and Economics
Design of animal experiments investigating leukemia treatment outcome
Animal experiments are often conducted as an initial step in studying the effects of cancer treatments. Since ethical and financial considerations demand that no more animals be used than necessary, medical scientists are interested in taking measurements as efficiently as possible. This work presents simulative as well as analytical approaches for selecting measurement time points more efficiently. Based on a system of ODEs describing the temporal dynamics of the cell composition, an optimized design aims to estimate unknown parameters reliably with minimal use of resources.
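As a rough illustration of the design idea, the following sketch (not the authors' model) uses a hypothetical two-compartment ODE for leukemic and healthy cells, finite-difference sensitivities and a D-optimality criterion to compare two candidate sets of measurement times; all parameter values and dynamics are assumptions made only for this example.

```r
# Minimal sketch (not the authors' model): D-optimality for choosing measurement
# times of a hypothetical two-compartment ODE for leukemic (L) and healthy (H) cells.
library(deSolve)

dynamics <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dL <- a * L - b * L * H   # hypothetical growth/interaction terms
    dH <- c0 - d * H * L
    list(c(dL, dH))
  })
}

solve_model <- function(parms, times) {
  ode(y = c(L = 1, H = 1), times = c(0, times), func = dynamics, parms = parms)[-1, c("L", "H")]
}

# Finite-difference sensitivities of the trajectory w.r.t. the parameters
fisher_info <- function(parms, times, eps = 1e-4) {
  base <- as.vector(solve_model(parms, times))
  S <- sapply(names(parms), function(p) {
    shifted <- parms; shifted[p] <- shifted[p] + eps
    (as.vector(solve_model(shifted, times)) - base) / eps
  })
  crossprod(S)  # Fisher information up to the observation error variance
}

# Compare two candidate designs by their D-criterion (log-determinant)
parms <- c(a = 0.5, b = 0.1, c0 = 0.3, d = 0.2)
design_early  <- c(1, 2, 3, 4)
design_spread <- c(1, 4, 8, 16)
c(early  = determinant(fisher_info(parms, design_early))$modulus,
  spread = determinant(fisher_info(parms, design_spread))$modulus)
```

The design with the larger log-determinant would be preferred; an optimized design would search over candidate time points rather than comparing two fixed choices.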
Jonas Bauer
Faculty of Business Administration and Economics
Are we there yet? About performance of Bayesian mixture models
Many fields deal with data that consist of multiple subgroups moderated by a latent state variable. Bayesian mixture models are well suited to capture such multi-modality while assessing parameter uncertainty. However, exploring the posterior distribution remains the computational bottleneck, as class-specific parameters further inflate the parameter space. In this talk, we describe how to generate posterior samples from such a model, explore its sensitivity to misspecification, and propose extensions to reduce the computational burden.
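To make the sampling step concrete, here is a minimal sketch of a Gibbs sampler for a two-component Gaussian location mixture, with the component variance and mixing weight held fixed for brevity; it is purely illustrative and not the model discussed in the talk.

```r
# Minimal sketch: Gibbs sampler for a two-component Gaussian mixture with known
# component variance and known mixing weight (illustrative, not the talk's model).
set.seed(1)
y <- c(rnorm(100, -2), rnorm(100, 2))      # simulated bimodal data
n_iter <- 2000
mu <- matrix(NA, n_iter, 2); mu[1, ] <- c(-1, 1)
sigma2 <- 1; w <- 0.5                      # fixed for simplicity
mu0 <- 0; tau2 <- 10                       # normal prior on the component means

for (it in 2:n_iter) {
  # 1. sample latent class indicators given the current means
  d1 <- w * dnorm(y, mu[it - 1, 1], sqrt(sigma2))
  d2 <- (1 - w) * dnorm(y, mu[it - 1, 2], sqrt(sigma2))
  z <- rbinom(length(y), 1, d2 / (d1 + d2)) + 1
  # 2. sample each component mean from its conjugate normal full conditional
  for (k in 1:2) {
    yk <- y[z == k]; nk <- length(yk)
    post_var  <- 1 / (nk / sigma2 + 1 / tau2)
    post_mean <- post_var * (sum(yk) / sigma2 + mu0 / tau2)
    mu[it, k] <- rnorm(1, post_mean, sqrt(post_var))
  }
}
colMeans(mu[-(1:500), ])  # posterior means after burn-in (label switching ignored here)
```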
Houda Yaqine
Faculty of Business Administration and Economics
The value of monitoring information for water quality management
Taking management actions in the environmental sector involves a great deal of uncertainty. Before deciding whether active management is necessary, the information acquired through monitoring is essential for improving knowledge of the state of the ecosystem. When considering whether to invest in an expensive environmental management measure, value of information (VoI) analysis provides a useful tool for determining whether more information is needed to reduce uncertainty before committing to the costly measure. Our results in this case study suggest that the expected benefits of monitoring information exceed its cost. In addition, we find that the VoI depends strongly on the prior distributions and the management costs: it is largest when both the cost of the action and the prior probability of a good ecological state are moderate, whereas relatively high or low costs, as well as low or high prior probabilities, reduce the value of information and thus the expected value of monitoring.
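The basic VoI logic can be illustrated with a tiny two-state, two-action calculation; the payoffs and prior below are made up for illustration and are not the case-study values.

```r
# Minimal sketch: expected value of perfect information (EVPI) for a two-state,
# two-action water-management problem with made-up payoffs and prior.
p_good  <- 0.5     # prior probability of a good ecological state
cost    <- 40      # cost of the management action
benefit <- 100     # benefit of restoring a poor-state ecosystem

# Expected payoff of each action under prior uncertainty
ev_act   <- p_good * (-cost) + (1 - p_good) * (benefit - cost)
ev_wait  <- 0
ev_prior <- max(ev_act, ev_wait)

# With perfect monitoring information we act only if the state is poor
ev_perfect <- p_good * 0 + (1 - p_good) * (benefit - cost)
evpi <- ev_perfect - ev_prior
evpi   # monitoring is worth paying for if its cost stays below this value
```

With these numbers the EVPI is largest for moderate costs and a moderate prior, mirroring the qualitative pattern described in the abstract.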
Predrag Pilipovic
Faculty of Business Administration and Economics
Splitting schemes for SDE parameter estimation
Surprisingly, robust estimators for nonlinear continuous-time models based on stochastic differential equations are still lacking. Most applications continue to use the Euler-Maruyama discretization, despite many proofs of its bias. More sophisticated methods, such as the Kessler, Ozaki or MCMC approaches, lack a straightforward implementation and can be numerically unstable. We propose two efficient and easy-to-implement likelihood-based estimators based on the Lie-Trotter (LT) and Strang (S) splitting schemes. We show that S also has a mean-square convergence rate of order 1, which was already known for LT. Furthermore, we prove under the less restrictive one-sided Lipschitz assumption that both estimators are consistent and asymptotically normal. A numerical study on the 2D FitzHugh-Nagumo model and the 3D Lorenz chaotic system complements our theoretical findings.
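The splitting idea can be illustrated on a 1D toy SDE (not the 2D FitzHugh-Nagumo or 3D Lorenz systems of the talk, and not the authors' estimator): the drift is split into a linear part, whose stochastic transition is known exactly, and a nonlinear part, whose ODE flow is explicit; a Strang step composes half a nonlinear step, a full linear step and another half nonlinear step.

```r
# Minimal sketch: simulating one path of dX = (X - X^3) dt + sigma dW with a
# Strang splitting scheme (illustrative 1D toy example, not the talk's estimator).
# Drift split as A*x + N(x) with A = 1 (linear part, exact linear-SDE step)
# and N(x) = -x^3 (nonlinear part, whose ODE flow is explicit).
set.seed(1)
sigma <- 0.5; A <- 1; h <- 0.01; n <- 5000

flow_nonlinear <- function(x, t) x / sqrt(1 + 2 * t * x^2)  # exact flow of dx/dt = -x^3
step_linear <- function(x, t) {                             # exact transition of dX = A X dt + sigma dW
  x * exp(A * t) + sigma * sqrt((exp(2 * A * t) - 1) / (2 * A)) * rnorm(1)
}

x <- numeric(n); x[1] <- 1
for (i in 2:n) {
  half <- flow_nonlinear(x[i - 1], h / 2)   # Strang: half nonlinear step
  full <- step_linear(half, h)              # full linear (stochastic) step
  x[i] <- flow_nonlinear(full, h / 2)       # half nonlinear step
}
# A splitting-based (pseudo-)likelihood would then be built from the Gaussian
# transition implied by composing these maps, instead of the Euler-Maruyama one.
plot(seq_len(n) * h, x, type = "l", xlab = "time", ylab = "X")
```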
Matthieu Bulté
Faculty of Business Administration and Economics
Estimation and asymptotics of Fréchet means with non-independent data
Estimating and analyzing Fréchet means from non-independent data in general metric spaces may not appear to be the most immediately applicable problem to tackle. It is, however, an interesting one, and it does find useful applications, such as the analysis of brain scans, income distributions, or phylogenetic trees. As a specialized version of M-estimation, Fréchet mean estimation benefits from the large existing literature on M-estimation. However, M-estimation in the non-independent setting has received less attention in the literature, and the problem of Fréchet mean estimation in this setup has not been investigated, even though it would open up the possibility of answering many standard statistical time-series questions in general metric spaces.
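For readers unfamiliar with the object itself, the following sketch computes a sample Fréchet mean in a simple metric space, the circle with its geodesic distance, by minimising the sum of squared distances; the data here are i.i.d. toy angles, whereas the talk is concerned with the non-independent case.

```r
# Minimal sketch: sample Fréchet mean of angles on the circle, i.e. the minimiser
# of the mean squared geodesic distance (toy metric space, i.i.d. data).
set.seed(1)
theta <- rnorm(200, mean = pi / 4, sd = 0.3) %% (2 * pi)   # noisy angles

geo_dist <- function(a, b) {                 # geodesic distance on the circle
  d <- abs(a - b) %% (2 * pi)
  pmin(d, 2 * pi - d)
}
frechet_fn <- function(m) mean(geo_dist(theta, m)^2)

# Crude but robust: evaluate the Fréchet function on a grid, then refine with optimise()
grid <- seq(0, 2 * pi, length.out = 1000)
m0 <- grid[which.min(sapply(grid, frechet_fn))]
optimise(frechet_fn, interval = c(m0 - 0.1, m0 + 0.1))$minimum
```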
Sophie Thiesbrummel
Faculty of Business Administration and Economics
KINBIOTICS - A Decision-Support System for Antibiotic-Therapies
Patients who suffer from sepsis need to be treated as an emergency due to the high mortality rate of about 20-60 %. A suitable antibiotic therapy in the first hours after diagnosis is crucial for lowering the risk of death and reducing serious consequences for the patients. However, much of the informative data only becomes available after 48 hours at the earliest. This includes details about the pathogen responsible for the disease as well as the results of a resistogram, which provides information about which antibiotics are effective. The treating physician therefore has to decide on an antibiotic based on a paucity of data. As a result, the chosen antibiotic might not be suitable to treat the patient's sepsis and the therapy may need to be adjusted rather quickly. Our goal is to use a patient's vital signs and laboratory values to model the effectiveness of a given antibiotic. For this, we work with the freely available Medical Information Mart for Intensive Care (MIMIC) IV database, which contains health-related information on patients admitted to a tertiary academic medical center in Boston (USA). In our approach, we address the research question of modeling the effectiveness of antibiotics by using hidden Markov models (HMMs). HMMs are time series models and consist of two stochastic processes, an unobserved state process and a state-dependent process. In our setting, we use the patient's vital signs and laboratory values for the state-dependent process. Possible hidden states could reflect the patient's health condition, e.g. improvement or deterioration. These states might help to draw conclusions about the effectiveness of the antibiotics. Answering this research question could make an important contribution to better understanding the relationship between changes in vital signs, laboratory values and the effectiveness of antibiotics. In addition, we might be able to quantify and trace the decision-making process of the physicians. This could be a first step towards improving the prescription of effective antibiotic treatments for new patients, as well as eventually providing a basis for a decision support system in case of sepsis. The hope is that future sepsis emergency treatment can be provided in a quicker, more targeted and more effective manner.
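To illustrate the HMM machinery behind this idea (not the KINBIOTICS model or the MIMIC-IV data), the sketch below evaluates the likelihood of a two-state Gaussian HMM for a single simulated vital-sign-like series via the scaled forward algorithm; all parameter values are made up.

```r
# Minimal sketch: log-likelihood of a 2-state Gaussian HMM via the (scaled) forward
# algorithm, with one simulated univariate "vital sign" series (illustrative only).
set.seed(1)
n <- 200
states <- numeric(n); states[1] <- 1
Gamma <- matrix(c(0.95, 0.05, 0.10, 0.90), 2, byrow = TRUE)  # transition probabilities
for (t in 2:n) states[t] <- sample(1:2, 1, prob = Gamma[states[t - 1], ])
y <- rnorm(n, mean = c(80, 110)[states], sd = 8)             # e.g. heart rate by hidden state

hmm_loglik <- function(y, Gamma, mu, sd, delta = c(0.5, 0.5)) {
  phi <- delta * dnorm(y[1], mu, sd)
  ll <- log(sum(phi)); phi <- phi / sum(phi)
  for (t in 2:length(y)) {
    phi <- (phi %*% Gamma) * dnorm(y[t], mu, sd)
    ll <- ll + log(sum(phi)); phi <- phi / sum(phi)
  }
  ll
}
hmm_loglik(y, Gamma, mu = c(80, 110), sd = c(8, 8))
# In practice the parameters would be estimated by maximising this likelihood (or via
# EM), and the decoded states used as proxies for the patient's health condition.
```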
Hannah Marchi
Faculty of Business Administration and Economics
KINBIOTICS - A Decision-Support System for Antibiotic-Therapies
Antibiotic resistance represents a major challenge for society, health policy and the economy. Resistance sometimes emerges only a few years after the introduction of a new antibiotic (Ventola, 2015). The use of broad-spectrum antibiotics, which, as the name suggests, cover a wide range of pathogens, further enhances the spread of resistance. However, current initial therapy for sepsis usually consists of broad-spectrum antibiotics. The start of therapy within the first hours is critical for the survival probability of sepsis patients. Due to limited time and lack of information, a broad-spectrum antibiotic is a good initial choice to save patients' lives, since its wide coverage is likely to have an effect on the infection. However, the use of such broad-spectrum antibiotics greatly contributes to the spread and severity of antibiotic resistance in the long term. An important approach for reducing antibiotic resistance would thus be the targeted use of antibiotics in these cases. With the help of statistical methods, we aim to find narrow-spectrum antibiotics which are at least as suitable as the initially prescribed broad-spectrum antibiotic. We use data from the freely available MIMIC IV (Medical Information Mart for Intensive Care) database (https://mimic.mit.edu/docs/about/), which contains health-related data from patients who were admitted to the critical care units of the Beth Israel Deaconess Medical Center (Boston, MA) between 2008 and 2019. We present our approach to the question of how to find a targeted therapy at the time of sepsis diagnosis and discuss upcoming challenges. In a preceding step, we investigate how to model the effectiveness of a prescribed antibiotic. We use the outcome of this investigation to build an antibiotic-patient matrix which contains the effectiveness information per patient for each given antibiotic. This matrix is rather sparse, as each patient has received only a few of all possible antibiotics. The empty cells of the matrix would then be filled by combining similarity estimations of antibiotics and patients. In this way, we expect to predict the effectiveness of antibiotics which were not actually given to a specific patient. We investigate starting points and data requirements for the presented research question. Our final goal is to develop a clinical decision support system which recommends an effective and targeted initial antibiotic with minimal side effects.
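The matrix-completion idea can be sketched with a tiny, entirely made-up patient-by-antibiotic matrix and a simple similarity-weighted imputation (a basic collaborative-filtering step, not the project's actual method or data):

```r
# Minimal sketch: filling a sparse patient x antibiotic effectiveness matrix by a
# similarity-weighted average over patients (entries made up; 1 = effective, 0 = not).
M <- rbind(c(1, NA, 0, NA),
           c(1,  1, NA, 0),
           c(NA, 1, 0,  0),
           c(0, NA, 1,  1))
rownames(M) <- paste0("patient", 1:4)
colnames(M) <- paste0("antibiotic", LETTERS[1:4])

impute_cell <- function(M, i, j) {
  sims <- sapply(seq_len(nrow(M)), function(k) {
    shared <- !is.na(M[i, ]) & !is.na(M[k, ])
    if (k == i || !any(shared) || is.na(M[k, j])) return(NA_real_)
    1 - mean(abs(M[i, shared] - M[k, shared]))      # simple agreement-based similarity
  })
  if (all(is.na(sims))) return(NA_real_)
  weighted.mean(M[, j], w = ifelse(is.na(sims), 0, sims), na.rm = TRUE)
}

filled <- M
for (i in seq_len(nrow(M))) for (j in seq_len(ncol(M)))
  if (is.na(M[i, j])) filled[i, j] <- impute_cell(M, i, j)
round(filled, 2)
```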
Lennart Oelschläger
Faculty of Business Administration and Economics
Advances in the initialization of probit model estimation and the {ino} R package
This talk presents new initialization ideas for the numerically demanding maximum likelihood estimation of probit models in discrete choice applications. The strategies, which are based on linear OLS estimation and Gibbs sampling, aim to reduce the numerical cost while increasing the convergence rate. Furthermore, the basic initialization strategies presented at the 9th YRW (subsample, unit, alternating optimization, standardize) have been implemented in the {ino} R package (in collaboration with Marius Ötting) and can now be applied to numerical optimization problems more broadly. The talk introduces the package and demonstrates applications.
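The effect of a good starting value can be illustrated in generic base R (this does not reproduce the {ino} API or the multinomial probit models of the talk): a binary probit log-likelihood is maximised with optim() from a random start and from a crude OLS-based start, and the number of function evaluations is compared.

```r
# Minimal sketch (generic base R, not the {ino} API): comparing a random start with
# an OLS-based start when maximising a binary probit log-likelihood via optim().
set.seed(1)
n <- 500
X <- cbind(1, rnorm(n), rnorm(n))
beta_true <- c(-0.5, 1, -1)
y <- as.numeric(X %*% beta_true + rnorm(n) > 0)

negloglik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(ifelse(y == 1, pnorm(eta, log.p = TRUE), pnorm(-eta, log.p = TRUE)))
}

run <- function(start) {
  out <- optim(start, negloglik, method = "BFGS")
  c(value = out$value, evals = unname(out$counts["function"]))
}

start_random <- rnorm(3, sd = 3)
start_ols    <- coef(lm(y ~ X - 1)) / dnorm(0)  # linear-probability fit, roughly rescaled
rbind(random = run(start_random), ols = run(start_ols))
```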
Julia Dyck
Faculty of Business Administration and Economics
Bayesian model fitting and statistical testing for signal detection of adverse drug reactions
After release of a drug to the market, pharmacovigilance monitors known adverse drug reactions (ADRs) and newly detected symptoms occurring with the drug intake. This is done to keep the drug's harm profile up to date and can potentially result in adjustments of the prescription labelling or, in the extreme case, a recall of the product from the market. In recent years, interest in the use of electronic health records and longitudinal data for pharmacovigilance has increased. Data of this type and volume provide potential for the application of survival analysis tools to perform signal detection tests for suspected adverse drug reactions. Cornelius et al. proposed a signal detection test based on the Weibull shape parameter (WSP). This approach was refined by Sauzet and Cornelius, leading to the power generalized Weibull shape parameter (PgWSP) test. Both approaches rely on the data alone. We believe that the performance of the PgWSP test can be improved by incorporating existing knowledge about the ADR profile of drugs from the same family. Our goal is to construct a Bayesian version of the PgWSP test that allows for the inclusion of prior knowledge about the drug family's ADR profile.
In the talk, the main idea of the PgWSP test is explained. After that, our approach to embedding the method's components, model fitting and statistical testing, into the Bayesian framework is presented.
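To convey the shape-parameter idea, here is a minimal frequentist sketch on simulated time-to-event data with the survival package; it uses the plain Weibull rather than the power generalized Weibull, and the Bayesian version discussed in the talk is not shown. Note that survreg parameterises the Weibull via log(scale), so the Weibull shape is 1/scale.

```r
# Minimal sketch of the shape-parameter idea behind the WSP test: fit a Weibull model
# to simulated times-to-event and check whether the confidence interval for the shape
# parameter excludes 1 (constant hazard). Data and censoring time are made up.
library(survival)
set.seed(1)
time   <- rweibull(300, shape = 0.7, scale = 50)           # early-excess-risk pattern
status <- as.numeric(time < 100); time <- pmin(time, 100)  # administrative censoring

fit <- survreg(Surv(time, status) ~ 1, dist = "weibull")
log_scale_se <- sqrt(vcov(fit)["Log(scale)", "Log(scale)"])
shape_ci <- 1 / exp(log(fit$scale) + c(1.96, -1.96) * log_scale_se)
shape_ci   # a CI excluding 1 would be flagged as a potential ADR signal
```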
Alexander Stappert
Faculty of Psychology and Sports Science
Validating a scale for measuring implicit theories of ability in the domain of statistics
The use of valid scales is crucial to the measurement of psychological qualities. In this study, the validation of a scale for measuring implicit theories of ability (ITA) in the domain of statistics is presented. ITA are defined as individual beliefs about the malleability of individual competencies. The validation of the scale is achieved by first examining its factorial structure through testing different measurement models. In the next step, structural equation models are used to examine the scale's construct and criterion-related validity. For this purpose, the relationships between the ITA scale and (1) a statistical self-concept, (2) a language-related self-concept, (3) statistics anxiety, and (4) joy of learning statistics are estimated. The results suggest a two-dimensional structure of the scale and reveal differential effects regarding the relationship between the ITA components and the validation variables. Cross-sectional ($N = 749$) and longitudinal ($N = 124$) self-report data from online questionnaires of students at German universities are used to test the hypotheses. To ensure domain specificity, all constructs were measured with reference to the subject of learning statistics in the respective item wordings.
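The measurement-model comparison can be sketched in lavaan with simulated toy data; the item names ita1-ita6, the loadings and the factor correlation below are placeholders and do not correspond to the actual scale or study data.

```r
# Minimal sketch: comparing a one- vs. two-factor measurement model for six
# hypothetical ITA items in lavaan (placeholders, not the actual scale or data).
library(lavaan)

pop <- '
  entity      =~ 0.7*ita1 + 0.7*ita2 + 0.7*ita3
  incremental =~ 0.7*ita4 + 0.7*ita5 + 0.7*ita6
  entity ~~ -0.4*incremental
'
ita_data <- simulateData(pop, sample.nobs = 500)   # toy data from a two-factor population

model_1f <- 'ita =~ ita1 + ita2 + ita3 + ita4 + ita5 + ita6'
model_2f <- '
  entity      =~ ita1 + ita2 + ita3
  incremental =~ ita4 + ita5 + ita6
'
fit_1f <- cfa(model_1f, data = ita_data)
fit_2f <- cfa(model_2f, data = ita_data)
anova(fit_1f, fit_2f)                               # chi-square difference test
fitMeasures(fit_2f, c("cfi", "rmsea", "srmr"))      # global fit of the two-factor model
# Criterion validity would then be examined by relating the latent factors to, e.g.,
# statistics anxiety in a structural equation model (sem()).
```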
Rouven Michels
Faculty of Business Administration and Economics
Extending the Dixon and Coles model: an application to women’s football data
For describing the number of goals in football, the model by Dixon and Coles (1997) has had a tremendous impact. By extending the classical double Poisson model, i.e. two independent Poisson distributions, in such a way that the probabilities of the scores 0-0, 1-0, 0-1 and 1-1 can be adjusted, it has become widely regarded as the standard model for football scores. In this talk, we show that this model is a special case of a multiplicative model known as the Sarmanov family. Moreover, we extend the classical DC model in various ways. These extended models are then fitted to women's football data, as previous models have been applied to men's football only. However, the scores in women's football differ from those in men's football and thus an independent analysis is needed. We find that an extended Sarmanov model emerges as the most promising model for women's football scores.
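For reference, the Dixon-Coles score probability is the double Poisson probability multiplied by a correction factor τ for the low scores; the sketch below implements this with made-up rates λ, μ and dependence parameter ρ (it does not include the talk's Sarmanov extensions).

```r
# Minimal sketch: Dixon-Coles (1997) score probability, i.e. a double Poisson
# probability multiplied by the correction tau for the scores 0-0, 1-0, 0-1, 1-1
# (rates lambda, mu and dependence parameter rho are made up).
dc_tau <- function(x, y, lambda, mu, rho) {
  if (x == 0 && y == 0) 1 - lambda * mu * rho
  else if (x == 0 && y == 1) 1 + lambda * rho
  else if (x == 1 && y == 0) 1 + mu * rho
  else if (x == 1 && y == 1) 1 - rho
  else 1
}
dc_prob <- function(x, y, lambda, mu, rho) {
  dc_tau(x, y, lambda, mu, rho) * dpois(x, lambda) * dpois(y, mu)
}

# probability of a 1-1 draw with and without the dependence correction
c(dixon_coles    = dc_prob(1, 1, lambda = 1.4, mu = 1.1, rho = -0.1),
  double_poisson = dpois(1, 1.4) * dpois(1, 1.1))
```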
Dora Tinhof
Faculty of Psychology and Sports Science
The Multi-Method Latent State-Trait Model for Random and Fixed Situations: Performance Evaluation & Software Implementation
Repeated calls for multi-method, multi-situation, and multi-time study designs have given rise to the development of a multi-method latent state-trait model for random and fixed situations (MM-LST-RF; Hintz, Geiser & Shiffman, 2018). It allows not only for an examination of (stable) trait and (variable) state (occasion / random situation) variables but also of (fixed) situation and method effects as well as their interactions. This novel MM-LST-RF approach requires modeling numerous manifest and latent variables. On the one hand, it is therefore particularly interesting how such a complex model performs under various conditions. Extending previous findings, we conducted an extensive Monte Carlo simulation study to explore the influence of specific model and design aspects on convergence rates, model fit, estimation accuracy and power. On the other hand, the numerous variables, model-specific coefficients and necessary model restrictions impede the model's practical applicability. On that account, we developed an open-source software implementation of the MM-LST-RF model based on the R package lavaan.
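As a very rough pointer to the kind of lavaan-based specification involved, the sketch below shows a heavily simplified single-method latent state-trait skeleton with placeholder indicators; it is far from the full MM-LST-RF model (no methods, no fixed situations, no interactions) and the data object is hypothetical.

```r
# Heavily simplified sketch: a single-method latent state-trait skeleton in lavaan
# syntax (two indicators at three occasions; placeholder variable names). This only
# illustrates the trait/state decomposition and the lavaan workflow, not MM-LST-RF.
library(lavaan)

lst_model <- '
  # common trait factor
  trait =~ 1*y11 + 1*y21 + 1*y12 + 1*y22 + 1*y13 + 1*y23
  # occasion-specific (state residual) factors
  state1 =~ 1*y11 + 1*y21
  state2 =~ 1*y12 + 1*y22
  state3 =~ 1*y13 + 1*y23
  # states are uncorrelated with the trait and with each other
  trait  ~~ 0*state1 + 0*state2 + 0*state3
  state1 ~~ 0*state2 + 0*state3
  state2 ~~ 0*state3
'
# fit <- sem(lst_model, data = lst_data)   # lst_data: hypothetical wide-format data
# summary(fit, standardized = TRUE)        # variance components -> consistency/specificity
```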
Sina Mews
Faculty of Business Administration and Economics
Modelling claims data using Markov-modulated marked Poisson processes
We explore Markov-modulated marked Poisson processes (MMMPPs) as a natural framework for modelling patients' disease dynamics over time based on medical claims data. In claims data, observations do not only occur at random points in time but are also informative, i.e. driven by unobserved disease levels, as poor health conditions usually lead to more frequent interactions with the healthcare system. Therefore, we model the observation process as a Markov-modulated Poisson process, where the rate of healthcare interactions is governed by a continuous-time Markov chain. Its states serve as proxies for the patients' latent disease levels and further determine the distribution of additional data collected at each observation time, the so-called marks. Overall, MMMPPs jointly model observations and their informative time points by comprising two state-dependent processes: the observation process (corresponding to the event times) and the mark process (corresponding to event-specific information), which both depend on the underlying states. The approach is illustrated using claims data from patients diagnosed with chronic obstructive pulmonary disease (COPD) by modelling their drug use and the interval lengths between consecutive physician consultations. The results indicate that MMMPPs are able to detect distinct patterns of healthcare utilisation related to disease processes and reveal inter-individual differences in the state-switching dynamics.
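The likelihood structure of such models can be sketched for a two-state Markov-modulated Poisson process without marks, using the standard matrix-exponential form δ' [∏ᵢ exp((Q − Λ)Δᵢ) Λ] 1; the generator, rates and event times below are made up and are not the COPD claims data.

```r
# Minimal sketch: likelihood of a two-state Markov-modulated Poisson process for a
# sequence of event (consultation) times; marks omitted, parameters made up.
library(expm)

Q      <- matrix(c(-0.2, 0.2, 0.5, -0.5), 2, byrow = TRUE)  # generator of the latent CTMC
Lambda <- diag(c(0.5, 3))                                   # interaction rates per state
delta  <- c(0.5, 0.5)                                       # initial state distribution

mmpp_loglik <- function(event_times, T_end, Q, Lambda, delta) {
  gaps <- diff(c(0, event_times))
  phi <- delta; ll <- 0
  for (dt in gaps) {
    phi <- phi %*% expm((Q - Lambda) * dt) %*% Lambda
    ll <- ll + log(sum(phi)); phi <- phi / sum(phi)    # scaling to avoid underflow
  }
  # probability of no further events until the end of the observation period
  ll + log(sum(phi %*% expm((Q - Lambda) * (T_end - max(event_times)))))
}

event_times <- c(0.4, 0.9, 1.1, 4.0, 7.5, 7.8)
mmpp_loglik(event_times, T_end = 10, Q = Q, Lambda = Lambda, delta = delta)
# State-dependent marks (e.g. drug use at each consultation) would enter as an extra
# diagonal matrix of mark densities after each Lambda factor.
```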
Carlina Feldmann
Faculty of Business Administration and Economics
Modelling state switching dynamics using P-splines
Hidden Markov models (HMMs) are flexible time series models that assume underlying states determining the distributions of observable outcomes. These state-dependent distributions can be modelled quite flexibly. However, modelling the state process, i.e. the probabilities of switching between states, is usually done with transformations of rather simple linear predictors. To increase flexibility, polynomials or trigonometric functions for cyclic patterns are used, but choosing the right amount of flexibility can be difficult. We propose incorporating P-splines into the state process of HMMs, as they make use of an effective penalization strategy to determine the degree of flexibility needed to adequately fit the data. For model estimation, a weighted GAM with a (multinomial) logit link is used within an EM algorithm. Conveniently, models with P-splines can easily be fitted in R using the mgcv package.
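The weighted GAM step can be sketched in isolation: a binomial GAM with a cyclic P-spline for time of day, fitted with mgcv using fractional weights that in the actual EM algorithm would come from the E-step (here they are simply made up), for the probability of switching out of one state in a two-state model.

```r
# Minimal sketch of the weighted GAM step: modelling the probability of switching out
# of a state as a smooth cyclic function of time of day with a P-spline. The weights
# stand in for E-step state probabilities and are made up here.
library(mgcv)
set.seed(1)
n <- 500
tod <- runif(n, 0, 24)                                   # time of day
p_switch <- plogis(-2 + 1.5 * sin(2 * pi * tod / 24))    # true switching probability
switch <- rbinom(n, 1, p_switch)
w <- runif(n, 0.5, 1)                                    # stand-in for E-step weights

# R may warn about non-integer successes; this is expected with fractional weights.
m <- gam(switch ~ s(tod, bs = "cp", k = 10), family = binomial(), weights = w)
plot(m, shade = TRUE, xlab = "time of day")   # estimated smooth on the logit scale
# The smoothing parameter of the P-spline penalty is selected automatically, so the
# effective flexibility of the transition probabilities is chosen from the data.
```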
Sebastian Büscher
Faculty of Business Administration and Economics
Robust estimation of misspecified discrete choice models
Correlation between repeated choices is classically modelled by introducing random effects. This accounts for taste heterogeneity and creates correlation across the different observations for one individual. However, accounting for the autocorrelation of the error terms is tedious to implement in multinomial discrete choice models and requires the estimation of many additional parameters if no restrictive assumptions on the autocorrelation pattern are made. Explicitly modelling autocorrelated errors has hence rarely been considered. Ignoring possible autocorrelation of the errors, however, leads to a misspecification of the model, which cannot be fully compensated for by the introduction of mixed effects. Composite marginal likelihood (CML) estimation procedures, which use marginal pairwise likelihoods instead of the joint likelihood over all observations of one individual, offer a way to mitigate the effects of misspecification by tuning the power weights attached to the pairwise likelihoods. We discuss the theory behind exploiting the power weights of CML procedures to mitigate the effects of misspecification due to unaccounted autocorrelation, show some asymptotic results, and demonstrate the practical effect in a finite-sample simulation study.
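The pairwise-likelihood construction can be sketched for a binary panel probit with equicorrelated errors, a simplified stand-in for the multinomial choice models of the talk; the data are simulated, the power weight is set arbitrarily, and the misspecification analysis itself is not reproduced.

```r
# Minimal sketch: pairwise composite marginal likelihood (CML) for a binary panel
# probit with equicorrelated errors, with a power weight attached to every pairwise
# term (simulated data; kept small so the example runs in reasonable time).
library(mvtnorm)
set.seed(1)
N <- 100; T_ <- 4; beta_true <- c(0.5, -1); rho_true <- 0.4
x <- array(rnorm(N * T_ * 2), dim = c(N, T_, 2))
eps <- rmvnorm(N, sigma = (1 - rho_true) * diag(T_) + rho_true)   # equicorrelated errors
y <- 1 * (apply(x, c(1, 2), function(v) sum(v * beta_true)) + eps > 0)

pair_logcml <- function(par, weight = 1) {
  beta <- par[1:2]; rho <- tanh(par[3])                           # keep rho in (-1, 1)
  ll <- 0
  for (i in 1:N) for (t in 1:(T_ - 1)) for (s in (t + 1):T_) {
    a <- (2 * y[i, t] - 1) * sum(x[i, t, ] * beta)
    b <- (2 * y[i, s] - 1) * sum(x[i, s, ] * beta)
    r <- (2 * y[i, t] - 1) * (2 * y[i, s] - 1) * rho
    p <- as.numeric(pmvnorm(upper = c(a, b), corr = matrix(c(1, r, r, 1), 2)))
    ll <- ll + weight * log(p)                                    # power-weighted pairwise term
  }
  -ll
}
fit <- optim(c(0, 0, 0), pair_logcml, weight = 1, method = "BFGS")
c(beta = fit$par[1:2], rho = tanh(fit$par[3]))
# Down-weighting selected pairs (weight < 1) is the tuning device discussed in the
# talk for mitigating misspecification due to unaccounted autocorrelation.
```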