DiSIA Seminars
Unless otherwise indicated, seminars are held in meeting room 205 (formerly room 32).
05/02/2025 at 14:00
Double Welcome Seminar
Danilo Bolano and Alessandro Cardinali
A Life Course Study with n=1 (Mine) - Danilo Bolano
My primary research interest lies in life course studies, combining methodological development with applied research. In this welcome seminar, I will provide an overview of my academic background, highlighting key aspects of my research and professional (as well as residential) trajectories. The seminar will focus on two main facets of my academic identity: 1. The Social Statistician: my work on developing and applying statistical tools for longitudinal data analysis, with a particular emphasis on methods designed to address the complexity of life trajectories. 2. The Social Demographer: recent contributions to family demography, focusing on how demographic patterns and individual decisions shape life trajectories. I will conclude by outlining my plans for future research and how they align with the broader objectives of the department.

Efficient GMM Inference and Covariance Estimation for Costationary Processes - Alessandro Cardinali
In this paper we propose a novel estimator for the time-varying covariance of locally stationary time series. This new approach is based on costationary combinations, that is, time-varying deterministic combinations of locally stationary time series that are second-order stationary. We first review the theory of costationarity and formalize a consistent Generalized Method of Moments (GMM) estimator for the coefficient vectors. We then use this new framework to derive an efficient covariance estimator. We show that the new approach has smaller variance than alternative covariance estimators based exclusively on the evolutionary cross-periodogram, and is therefore appealing in a large number of applications. We confirm our theoretical findings through a simulation experiment, which shows that our approach improves substantially over competitors at finite sample sizes in common use. We then present a new analysis of the FTSE and S&P 500 log return series. We also analyze the DEM/USD and GBP/USD exchange rate return series and show that our new estimator compares favorably with existing approaches and is capable of highlighting certain economic shocks more clearly.
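As a side note for students, the GMM machinery mentioned above can be illustrated in a few lines. The sketch below is generic and assumes nothing from the paper: it estimates a mean and a variance from two moment conditions by minimizing a quadratic form in the sample moments; the paper's costationary moment conditions are more involved.

```python
# Minimal generic GMM sketch (illustrative only): estimate (mu, sigma^2)
# from the moment conditions E[x - mu] = 0 and E[(x - mu)^2 - sigma^2] = 0.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)  # synthetic data

def gbar(theta):
    mu, sigma2 = theta
    g1 = x - mu                   # first moment condition
    g2 = (x - mu) ** 2 - sigma2   # second moment condition
    return np.array([g1.mean(), g2.mean()])

def objective(theta, W=np.eye(2)):
    g = gbar(theta)
    return g @ W @ g  # quadratic form in the stacked sample moments

theta_hat = minimize(objective, x0=[0.0, 1.0], method="Nelder-Mead").x
print(theta_hat)  # should be close to (1.0, 4.0)
```

The same mechanics, stacking sample moments and minimizing a weighted quadratic form in them, carry over once the appropriate moment functions for the costationary coefficient vectors are plugged in.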
Contact: Raffaele Guetto and Monia Lupparelli
12/12/2024 at 14:00
Indicators and indices for measuring a complex reality
Matteo Mazziotta (ISTAT)
Measurement lies at the foundation of science. One of the most pressing needs for scholars and policy makers is to measure, through numbers, phenomena that matter for our lives within society, to monitor their evolution over time, and to analyze the relationships among them, so as to make a complex reality intelligible and to decide on the right interventions to achieve given objectives. Many socioeconomic phenomena, but also environmental, biological, and other scientific ones, are multidimensional and require, in order to be measured, statistical-mathematical techniques that simplify their reading and usability for study and analysis. The challenge facing today's 'measurement scientists' is the identification of 'objective' methodologies to quantify phenomena deemed 'hard to measure' or, wrongly, 'unmeasurable'. A composite index is a mathematical combination of a set of elementary indicators representing the different components of a multidimensional concept to be measured.
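To make the closing definition concrete, here is a small Python sketch of one common aggregation scheme: standardization of the elementary indicators followed by a mean with a variability penalty, in the spirit of the Mazziotta-Pareto Index. The data and the penalty choice are illustrative assumptions, not material from the talk.

```python
import numpy as np

# Rows: statistical units; columns: elementary indicators (positive polarity).
X = np.array([[0.8, 55.0, 12.0],
              [0.6, 70.0,  9.0],
              [0.9, 40.0, 15.0]])

# Standardize each indicator to mean 100 and standard deviation 10.
Z = 100 + 10 * (X - X.mean(axis=0)) / X.std(axis=0)

M = Z.mean(axis=1)          # unit-level mean of standardized indicators
S = Z.std(axis=1)           # unit-level standard deviation
penalty = S * (S / M)       # penalizes unbalanced indicator profiles
composite = M - penalty     # composite index (higher = better)
print(composite)
```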
Contact: Maria Veronica Dorgali
03/12/2024 at 12:00
How to communicate and promote statistics
Patrizia Collesi (Istat)
Finding one's way among official numbers and data is not always easy, nor is it always easy to grasp their usefulness, and the role of communication is fundamental in reaching the public. Among Istat's tasks is the promotion of statistical literacy and of the correct use of data. The talk presents the Institute's activities aimed at students, teachers, and the general public, and illustrates the data communication and promotion activities carried out at the European and international level (Eurostat, UNECE) together with European and international institutions.
Contact: Raffaele Guetto
26/11/2024 at 14:00
Gender disparities, stereotypes, and violence. Quantifying to inform
Chiara Landi (ISTAT), Donatella Merlini (UNIFI), Maria Cecilia Verri (UNIFI), Barbara Cagnacci (ISTAT)
A meeting exploring gender disparities, stereotypes, and violence against women. Three contributions will be presented: each addresses one of the three aspects above and frames the phenomena from a quantitative point of view. During the first talk, the results of a questionnaire administered in the days before the event to bachelor's and master's students (who are warmly invited to attend) will be presented. The meeting is organized in collaboration with Istat. Talks: Gender stereotypes: an informative overview and a discussion with the audience - Chiara Landi (ISTAT); Informatica: a feminine noun? - Donatella Merlini (UNIFI), Maria Cecilia Verri (UNIFI); A statistical focus on violence against women - Barbara Cagnacci (ISTAT)
21/11/2024 at 14:00
The European Values Study 1981-2026: from face-to-face to self-completion
Ruud Luijkx (Tilburg University)
The European Values Study (EVS) was first conducted in 1981 and then repeated in 1990, 1999, 2008, and 2017, with the aim of providing researchers with data to investigate whether European individual and social values are changing and to what degree. The EVS is traditionally carried out as a probability-based face-to-face survey that takes around one hour to complete. In recent years, large-scale population surveys such as the EVS have been challenged by decreasing response rates and increasing survey costs. In light of these challenges, six countries that participated in the last wave of the EVS tested the application of self-administered mixed modes (Denmark, Finland, Germany, Iceland, the Netherlands, and Switzerland). In this contribution, I will present the mode experiments implemented in the last EVS wave (2017) and sketch the infrastructural challenges of moving from face-to-face to self-completion. Special attention will be given to the Dutch situation and the national cooperation between ESS, ISSP, GGS and EVS. It is pivotal for data use in substantive research to make the reasoning behind design changes and country-specific implementations transparent, as well as to highlight the new research opportunities that will emerge when surveys cooperate and use probabilistic web panels.
Contact: Raffaele Guetto
30/10/2024 at 12:00
Mixture of Hidden Markov Models for labour statistics: the Italian case
Roberta Varriale (Department of Statistical Science, Sapienza University of Rome)
In recent decades, national statistical institutes in Europe have started to produce official statistics using multiple sources of information rather than a single source, usually a statistical survey. We show how latent variable models can be used to estimate employment proportions in the Italian regions using survey and administrative data, taking into account the deficiencies in the corresponding measurement process and the longitudinal structure of the data. For this purpose, we adopt a mixture of hidden Markov models (MHMM), a longitudinal extension of latent class analysis; in the model specification, we consider covariates and unobserved heterogeneity of the latent process among the units of the population.
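As a minimal illustration of the model class behind the talk, the following sketch computes the log-likelihood of a two-state hidden Markov model via the scaled forward algorithm. All parameter values are hypothetical; the MHMM of the talk additionally mixes several latent chains and includes covariates.

```python
import numpy as np

# Transition matrix, initial distribution, and emission probabilities for a
# 2-state HMM with a binary observed variable (all values hypothetical).
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])
B = np.array([[0.7, 0.3],   # P(y | latent state 0)
              [0.1, 0.9]])  # P(y | latent state 1)

def log_likelihood(y):
    """Scaled forward algorithm: log P(y_1, ..., y_T)."""
    alpha = pi * B[:, y[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in range(1, len(y)):
        alpha = (alpha @ A) * B[:, y[t]]   # predict, then weight by emission
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()               # rescale to avoid underflow
    return loglik

print(log_likelihood([0, 0, 1, 1, 1, 0]))
```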
Contact: Leonardo Grilli
11/10/2024 at 12:00
A Betoidal-Multinomial model for estimating overdispersion in non-transformed scores of ANVUR’s department ranking procedure
Marco Doretti
The Italian Agency for the Evaluation of Universities and Research Institutes (ANVUR) periodically ranks Italian academic departments by means of a standardized performance index. However, recent investigations have raised concerns about the overall validity of this index, most likely due to the presence of unmodeled intraclass correlation. In this talk, a correction is presented, based on the introduction of an ad hoc continuous random variable. The essential features of this variable are outlined. In particular, we show its similarities with the symmetric Beta distribution, for which reason we refer to it as the Betoidal distribution. A discretized version of this distribution, giving rise to what we term the Betoidal-Multinomial model, is also introduced to address the fact that only a rounded version of the ANVUR performance index is publicly released. We derive a Maximum Likelihood estimation framework for this model and apply it to data from ANVUR's 2017 ranking exercise.
Contact: Monia Lupparelli
27/09/2024 at 12:00
Covering Temporal Graphs: Complexity and Algorithms
Riccardo Dondi (Università degli Studi di Bergamo)
We consider two formulations related to the well-known vertex cover problem in the context of temporal graphs, recently introduced for the summarization of temporal networks. Both formulations ask for the definition of an activity interval for each vertex; a temporal edge is covered when at least one of its endpoints is active at the time at which the edge is defined. In the first variant, called MinTimeLineCover, we want to cover all the temporal edges, and the objective is to minimize the total length of the activity intervals. In the second variant, called MaxTimeLineCover, the vertices have activity intervals of bounded length and the goal is to cover the maximum number of temporal edges. We discuss novel contributions on the computational complexity of the two formulations. Furthermore, we present approximation and fixed-parameter algorithms, both for general temporal graphs and for some restricted classes.
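The definitions above translate directly into code. The following sketch, on a hypothetical instance, checks whether a set of activity intervals covers all temporal edges and computes the MinTimeLineCover objective, taking the length of an interval [a, b] to be b - a.

```python
# Temporal edges are (u, v, t); an assignment gives each vertex an activity
# interval (start, end), or no interval at all. An edge is covered if at
# least one endpoint is active at time t.
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 4)]
intervals = {"b": (1, 2), "a": (4, 4)}  # hypothetical solution

def is_active(v, t):
    iv = intervals.get(v)
    return iv is not None and iv[0] <= t <= iv[1]

covered = all(is_active(u, t) or is_active(v, t) for u, v, t in edges)
cost = sum(end - start for start, end in intervals.values())
print(covered, cost)  # True, 1
```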
Contact: Prof. Andrea Marino
24/09/2024 at 12:00
A Mathematical Model of Immune Responses with CD4+ T cells and Tregs
Bruno Mendes Oliveira (University of Porto)
We use a set of ordinary differential equations (ODEs) to study mathematically the effect of regulatory T cells (Tregs) in the control of immune responses by CD4+ T cells. T cells trigger an immune response in the presence of their specific antigen, while Tregs play a role in limiting auto-immune diseases thanks to their immune-suppressive ability. We have obtained explicit exact formulas relating the concentration of T cells, the concentration of Tregs, and the antigenic stimulation of T cells when the system is at an equilibrium, stable or unstable. We found a region of bistability, where two stable equilibria coexist. Taking a cross section along the antigenic-stimulation parameter, we observe a hysteresis bounded by two thresholds of antigenic stimulation of T cells. Moreover, there are values of the slope parameter of the tuning between the antigenic stimulation of T cells and that of Tregs for which an isola-center bifurcation appears and, for some other values, there is a transcritical bifurcation. Furthermore, we fitted this model to quantitative data on the infection of mice with the lymphocytic choriomeningitis virus (LCMV). We observed the proliferation of T cells and, to a lesser extent, Tregs during the immune activation phase following infection and, subsequently, during the contraction phase, a smooth transition from faster to slower death rates. Time evolutions of this model were also used to simulate the appearance of autoimmunity due either to cross-reactivity or to bystander proliferation, and to simulate the suppression of the autoimmune line of T cells after a different line of T cells responds to a pathogen infection.
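For readers less familiar with this kind of analysis, the sketch below integrates a generic two-population ODE system with scipy. The equations and parameter values are placeholders chosen for illustration, not the authors' model.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical sketch: T cells proliferate under antigenic stimulation b,
# are suppressed by Tregs R, and both populations die at constant rates.
b, k, dT = 1.0, 0.5, 0.1   # stimulation, suppression strength, T death rate
c, dR = 0.05, 0.1          # Treg growth coupling, Treg death rate

def rhs(t, y):
    T, R = y
    dTdt = b * T / (1.0 + k * R) - dT * T   # suppressed proliferation
    dRdt = c * T * R - dR * R               # Tregs expand with T cells
    return [dTdt, dRdt]

sol = solve_ivp(rhs, (0.0, 50.0), y0=[0.01, 0.01], dense_output=True)
print(sol.y[:, -1])  # final concentrations of T cells and Tregs
```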
Contact: Michela Baccini
20/09/2024 at 12:00
The evolution of urbanization. History and development of the world's cities
Silvana Salvini (DiSIA, Università di Firenze)
The seminar illustrates the urbanization process across time and space. From the history of cities in ancient civilizations to contemporary cities, specific features and commonalities are analyzed. The phases of the process distinguish developed countries (DCs) from developing countries (LDCs), outlining counter-urbanization in the former and urbanization in the latter, with towns and small cities growing in the first group and megacities above all in the second. We dwell on the largest cities, such as New York, Istanbul, and Lagos; on the richest, such as Tokyo, Paris, and London; on the most beautiful, such as Prague, Lisbon, Vienna, and Florence; and finally on disadvantaged cities, where wealthy, comfortable neighbourhoods sit next to slums, illustrating the problems of inequality. The relationship between urban concentration and the economy differs fundamentally between DCs and LDCs. In the two contexts, the size and number of cities reflect the transition from a rural economy to an industrial economy and then to the growth of the service sector, as well as prospects for quality of life. The relationship between urbanization and culture is exemplified by Paris, London (the Sixties: swinging London), and New York. Both fertility and infant mortality in LDCs are lower in urban areas, whereas in DCs the differentials are practically nonexistent: in DCs no clear relationships emerge, while in LDCs they are marked. The two shores of the Mediterranean, with their ancient cities, are emblematic in this respect, owing to culture, religion, education, and female emancipation. The future of these features is dominated by the evolution of megacities. Migration differentiates population trends across continents, and internal migration in sub-Saharan Africa has varied origins, from climate change to conflicts, creating boundless megacities. Urban planning and policies today tend to build and reshape urban centres into smart cities, with the goal of reducing the ecological footprint; the resulting phenomena are densification and verticalization. A notable focus is commuter flows in Tokyo and New York as opposed, for example, to Los Angeles. In parallel, cities change with globalization and the localization of activities: Vienna, Paris, London, and Barcelona. Future prospects are discussed with the examples of London and Paris: according to the OECD, in the second decade of the 2000s there were about 800 cities in Europe, and the question is whether a European identity exists, also dwelling on architecture and on projects for environmental sustainability in European cities. Among non-European cities of the developed world, San Francisco is cited: a city growing demographically and economically, the birthplace of social liberation movements, from gay liberation to female emancipation, up to recent years with the enormous spread of the digital revolution. Particular attention is devoted to Italian cities between growth and decline. Cities in numerical decline include Florence, where for over 20 years births have fallen short of deaths and the population is ageing, much more than in the municipalities surrounding the city, which show greater demographic vitality. Florence exemplifies what is happening in Italy at large and in other European regions.
Contact: Raffaele Guetto
27/06/2024 at 09:30
Graphical Models for Human Mobility
Adrian Dobra (University of Washington)
Human mobility, or movement across short or long distances over short or long periods of time, is an important yet under-studied phenomenon in the social and demographic sciences. Today a broad range of spatial data are available for studying human mobility, such as geolocated residential histories, high-resolution GPS trajectory data, and large-scale human-generated geospatial data sources such as mobile phone records and geolocated social media data. In this talk I will present statistical approaches, including graphical models, that take advantage of these types of geospatial data sources to measure the geometry, size and structure of activity spaces, to assess the temporal stability of human mobility patterns, and to study the complex relationship between population mobility and the risk of HIV acquisition in South Africa.
Contact: Francesco Claudio Stingo
18/06/2024 at 11:00
New Menger-like dualities in digraphs and applications to half-integral linkages
Raul Lopes
In the k-Directed Disjoint Paths (k-DDP) problem we receive a digraph D together with a set of pairs of terminal vertices (s_1, t_1), …, (s_k, t_k), and the goal is to decide whether D contains a set of pairwise vertex-disjoint paths P_1, …, P_k such that each P_i is a path from s_i to t_i. The k-DDP problem finds applications in the design of VLSI circuits, high-speed network routing, collision-free routing of agents, and elsewhere. Although the problem is hard to solve in general even for k=2 paths, polynomial-time algorithms are known for fixed k and some variations of the problem. A common relaxation allows for some degree of congestion in the vertices of the given digraph. In the context of providing algorithms for the congested version of k-DDP, we present new min-max relaxations in digraphs between the number of paths satisfying certain conditions and the minimum order of an object intersecting all such paths. Applying our tools, we improve and simplify several previous results regarding relaxations of k-DDP in particular classes of digraphs.
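As a concrete, if naive, illustration of the problem, the sketch below solves a tiny k-DDP instance by brute force; the digraph and terminal pairs are hypothetical, and the enumeration is exponential in general (the problem is hard even for k = 2).

```python
import itertools
import networkx as nx

# Enumerate all simple s_i -> t_i paths, then search for a pairwise
# vertex-disjoint combination. Only feasible for very small digraphs.
D = nx.DiGraph([(1, 2), (2, 3), (1, 4), (4, 3), (2, 5), (4, 6)])
pairs = [(1, 3), (2, 5)]

candidates = [list(nx.all_simple_paths(D, s, t)) for s, t in pairs]
for combo in itertools.product(*candidates):
    vertex_sets = [set(p) for p in combo]
    if all(a.isdisjoint(b) for a, b in itertools.combinations(vertex_sets, 2)):
        print(combo)  # e.g. ([1, 4, 3], [2, 5])
        break
```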
Contact: Prof. Andrea Marino
29/05/2024 at 12:00
Studying gambling behavior with Structural Equation Models
Kimmo Vehkalahti (Centre for Social Data Science, University of Helsinki, Finland)
In a recent paper (authored jointly with Maria Anna Donati and Caterina Primi from UniFi) we used Structural Equation Models (SEM) to study the gambling behavior of Italian high school students. We specified path models and tested indirect (serial mediation) hypotheses of how selected cognitive variables (correct knowledge of gambling and gambling-related cognitive distortions) and affective variables (positive economic perception of gambling, and expectation, enjoyment, and arousal towards gambling) are related to gambling frequency and gambling problem severity. SEMs conducted with adolescent gamblers attested two indirect effects from knowledge to problem gambling: one through gambling-related cognitive distortions and one through gambling frequency. Overall, our results confirmed that adolescent problem gambling is a complex phenomenon explained by multiple and different factors. In this talk, we will discuss the assumptions, choices, practices, and results of the SEM modeling process.
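A minimal sketch of the building block behind such mediation hypotheses: an indirect effect estimated as the product of two regression coefficients on synthetic data. Variable names echo the abstract, but the data-generating process is an assumption for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
knowledge = rng.normal(size=n)                       # synthetic predictor
distortions = -0.5 * knowledge + rng.normal(size=n)  # synthetic mediator
problem = 0.6 * distortions + 0.1 * knowledge + rng.normal(size=n)

# Path a: knowledge -> cognitive distortions
a = sm.OLS(distortions, sm.add_constant(knowledge)).fit().params[1]
# Path b: distortions -> problem gambling, controlling for knowledge
X = sm.add_constant(np.column_stack([distortions, knowledge]))
b = sm.OLS(problem, X).fit().params[1]

print("indirect effect a*b:", a * b)  # roughly -0.5 * 0.6 = -0.3
```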
Contact: Chiara Bocci
24/05/2024 at 12:00 - On-site and online seminar
Causal Modelling in Space and Time
Marco Scutari (Dalle Molle Institute)
The assumption that data are independent and identically distributed samples from a single underlying population is pervasive in statistical modelling. However, most data do not satisfy this assumption. Regression models have been extended to deal with structured data collected over time, spaces, and different populations. But what about causal network models, which are built on regression? In this talk, we will discuss how to produce causal models that can answer crucial causal questions in environmental sciences, epidemiology and other challenging domains that produce data with complex structures.
Contact: Florence Center for Data Science
10/05/2024 at 14:00 - On-site and online seminar
Does the supply network shape the firm size distribution? The Japanese case
Corrado Di Guilmi (University of Florence)
The paper investigates how the upward transmission of demand shocks in the Japanese supply network influences the growth rates of firms and, consequently, shapes their size distribution. Through an empirical analysis, an analytical decomposition of the volatility of growth rates, and numerical simulations, we obtain several original results. We find that the Japanese supply network has a bow-tie structure in which firms located in the upstream layers display larger volatility in their growth rates. As a result, Gibrat's law breaks down for upstream firms, whereas downstream firms are more likely to be located in the power-law tail of the size distribution. This pattern is determined by the amplification of demand shocks hitting downstream firms, and the magnitude of this amplification depends on the network structure and on the relative market power of downstream firms. Finally, we observe that in an almost complete network, in which there are no upstream or downstream firms, the power-law tail of the firm size distribution disappears. An important implication of our results is that aggregate demand shocks can affect the economy both directly, through the reduction in output for downstream firms, and indirectly, by shaping the firm size distribution.
Contact: Florence Center for Data Science
07/05/2024 at 12:00
Does access to regular work affect immigrants’ integration outcomes? Evidence from an Italian amnesty program
Chiara Pronzato (Università di Torino, Collegio Carlo Alberto)
Economic inclusion is often seen as a tool for the social inclusion and integration of immigrants. In this paper, we estimate the impact of regular work, within one year of arriving in Italy, on the long-term integration of immigrants, after a period of approximately 10 years. How important is it to guarantee a solid start for their integration and, therefore, for the social balance of society as a whole? To answer this question, we analyze a sample of immigrants involved in the ISTAT Social Condition and Integration of Foreign Citizens survey that took place in Italy in 2011-12. Our impact estimates are based on instrumental variables, exploiting a 2002 amnesty that improved the probability of getting a regular job depending on the time of arrival. We find beneficial effects of early engagement in regular employment on various integration indicators, including trust in institutions, language proficiency, and cultural assimilation.
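The identification strategy can be sketched as two-stage least squares on synthetic data, with a binary instrument standing in for amnesty-driven access to regular work. This illustrates the generic 2SLS mechanics, not the paper's specification; note also that standard errors from a manual second stage are invalid, so dedicated IV routines should be used in practice.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2000
z = rng.binomial(1, 0.5, n)             # instrument (e.g. amnesty eligibility)
u = rng.normal(size=n)                  # unobserved confounder
d = (0.8 * z + u + rng.normal(size=n) > 0.5).astype(float)  # early regular job
y = 1.0 * d + u + rng.normal(size=n)    # integration outcome, true effect 1.0

# First stage: predict the endogenous treatment from the instrument.
d_hat = sm.OLS(d, sm.add_constant(z)).fit().fittedvalues
# Second stage: regress the outcome on the predicted treatment.
iv = sm.OLS(y, sm.add_constant(d_hat)).fit()
print(iv.params[1])  # 2SLS point estimate (SEs here are not valid)
```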
Contact: Raffaele Guetto
03/05/2024 at 11:30
Laying the Foundations for the Design and Analysis of Experiments with Large Amounts of Ancillary Data: Part 2
Geoff Vining (Department of Statistics, Virginia Tech)
The origins of the design and analysis of experiments required the analyst to evaluate the effects of treatments applied to properly defined experimental units. Fisher's fundamental principles underlying the proper design of an experiment required randomization, replication, and local control of error. Randomization assured that each experimental unit available for the experiment has exactly the same probability of being selected for each of the possible treatments. Replication allowed the analyst to evaluate the effects of the treatments by comparing treatment means. Local control of error represented the attempt to minimize the impact of other possible sources of variation. The fact that Fisher could not directly observe the effect of the "chance causes" of the variation forced the focus on comparing treatment means within his overall framework. Modern sensor technology allows the experimenter to observe the effects of many of the chance causes that Fisher could not. However, incorporating this information requires the analyst to model the data through proper linear or non-linear models, not by comparing treatment means. The resulting implications for the proper analysis, taking into account the available ancillary variables, are fascinating, with far-reaching implications for the future of the design and analysis of experiments. Part 2 extends the theoretical foundation to "pseudo" experiments based on the examples in Part 1. It "extracts" an experimental design from one example to illustrate the analysis as if it were a properly conducted experiment. The second example illustrates how to plan a follow-up to the other example from Part 1 as a future 2^4 experiment with four center runs.
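For reference, the 2^4 design with four center runs mentioned at the end can be written down in a few lines (coded units; a sketch, not the speaker's material).

```python
import itertools
import numpy as np

# Full 2^4 factorial in coded units (-1, +1) plus four center runs (0,0,0,0).
factorial = np.array(list(itertools.product([-1, 1], repeat=4)))
center = np.zeros((4, 4))
design = np.vstack([factorial, center])
print(design.shape)  # (20, 4): 16 factorial runs + 4 center runs
```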
Contact: Rossella Berni
30/04/2024 at 10:30
Laying the Foundations for the Design and Analysis of Experiments with Large Amounts of Ancillary Data: Part 1
Geoff Vining (Department of Statistics, Virginia Tech)
The origins of the design and analysis of experiments required the analyst to evaluate the effects of treatments applied to properly defined experimental units. Fisher's fundamental principles underlying the proper design of an experiment required randomization, replication, and local control of error. Randomization assured that each experimental unit available for the experiment has exactly the same probability of being selected for each of the possible treatments. Replication allowed the analyst to evaluate the effects of the treatments by comparing treatment means. Local control of error represented the attempt to minimize the impact of other possible sources of variation. The fact that Fisher could not directly observe the effect of the "chance causes" of the variation forced the focus on comparing treatment means within his overall framework. Modern sensor technology allows the experimenter to observe the effects of many of the chance causes that Fisher could not. However, incorporating this information requires the analyst to model the data through proper linear or non-linear models, not by comparing treatment means. The resulting implications for the proper analysis, taking into account the available ancillary variables, are fascinating, with far-reaching implications for the future of the design and analysis of experiments. Part 1 lays the theoretical foundation for a modern approach to the analysis of experiments, taking full advantage of standard linear and non-linear model theory. Two real examples illustrate the concepts. In the process, it becomes quite clear why many people have recently noted serious issues with hypothesis tests in general.
Contact: Rossella Berni
17/04/2024 at 12:00
Learning Gaussian graphical models for paired data with the pdglasso
Saverio Ranciati
In this talk we present the pdglasso, an approach for statistical inference with Gaussian graphical models on paired data, that is, when there are exactly two dependent groups and the interest lies in learning the two networks together with their across-graph association structure. The modeling framework contains coloured graphical models and, more precisely, a subfamily of RCON models suited to paired data. Algorithmic implementation, relevant submodels, and maximum likelihood estimation are discussed. We also illustrate the associated R package 'pdglasso', its main contents and usage. Results on simulated and real data are discussed at the end.
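The pdglasso itself is distributed as an R package. As a generic single-group analogue, the sketch below fits a graphical lasso in Python on synthetic data; the paired-data machinery of the talk (two dependent groups with an across-graph association structure) is not implemented here.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(3)
# Synthetic 4-variable Gaussian data with mild positive correlation.
X = rng.multivariate_normal(mean=np.zeros(4),
                            cov=np.eye(4) + 0.3, size=200)

gl = GraphicalLasso(alpha=0.1).fit(X)
precision = gl.precision_  # zero entries = missing edges in the graph
print(np.round(precision, 2))
```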
Contact: Maria Francesca Marino
11/04/2024 at 11:00
Double machine learning for sample selection models
Michela Bia (LISER and University of Luxembourg)
This paper considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. We also consider dynamic confounding, meaning that covariates that jointly affect sample selection and the outcome may (at least partly) be influenced by the treatment. To control in a data-driven way for a potentially high dimensional set of pre- and/or post-treatment covariates, we adapt the double machine learning framework for treatment evaluation to sample selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are asymptotically normal and root-n consistent under specific regularity conditions concerning the machine learners and investigate their finite sample properties in a simulation study. We also apply our proposed methodology to the Job Corps data for evaluating the effect of training on hourly wages which are only observed conditional on employment. The estimator is available in the causalweight package for the statistical software R.
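A compact sketch of the generic double machine learning recipe, cross-fitted residualization followed by a final-stage regression on residuals, on synthetic data. It deliberately omits the sample-selection correction that is the paper's contribution.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(4)
n = 1000
X = rng.normal(size=(n, 5))                      # covariates
d = X[:, 0] + rng.normal(size=n)                 # treatment
y = 0.5 * d + X[:, 0] ** 2 + rng.normal(size=n)  # outcome, true effect 0.5

# Cross-fitting: nuisance models are trained on one fold and used to
# residualize the other, preventing overfitting bias.
y_res, d_res = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    my = RandomForestRegressor(random_state=0).fit(X[train], y[train])
    md = RandomForestRegressor(random_state=0).fit(X[train], d[train])
    y_res[test] = y[test] - my.predict(X[test])  # residualized outcome
    d_res[test] = d[test] - md.predict(X[test])  # residualized treatment

theta = (d_res @ y_res) / (d_res @ d_res)  # final-stage OLS on residuals
print(theta)  # close to 0.5
```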
Contact: Alessandra Mattei
28/03/2024 at 12:00 - On-site and online seminar
Interplay between Privacy and Explainable AI
Anna Monreale (Università di Pisa)
In recent years we have witnessed the diffusion of AI systems based on powerful machine learning models that find application in many critical contexts, such as medicine, financial markets, credit scoring, etc. In such contexts, it is particularly important to design Trustworthy AI systems that guarantee the interpretability of their decisional reasoning as well as privacy protection and awareness. In this talk, we will explore the possible relationships between these two relevant ethical values in Trustworthy AI. We will address research questions such as: how can explainability help privacy awareness? Can explanations jeopardize individual privacy protection?
Contact: Florence Center for Data Science
20/03/2024 at 12:00
Two tales of the information matrix test
Gabriele Fiorentini (University of Florence)
The talk is based on the results of two related notes in which we derive explicit expressions for the information matrix test of two rather popular models: the multinomial logit and the finite mixture of multivariate Gaussians.

Information matrix tests for multinomial logit models. In this paper we derive the information matrix test for multinomial logit models in which the explanatory variables are common across categories, but their effects are not. We show that the vectorised sum of the outer product of the score and the Hessian matrix coincides with the Kronecker product of the outer product of the generalised residuals minus their covariance matrix conditional on the explanatory variables times the outer product of those variables. Therefore, we can reinterpret it as a multivariate version of White's (1980) heteroskedasticity test, which agrees with Chesher's (1983) interpretation of the information matrix test as a Lagrange multiplier test for unobserved heterogeneity. Our Monte Carlo experiments confirm that using the theoretical expressions for the covariance matrices of the influence functions involved leads to substantial reductions in the size distortions of our testing procedures in finite samples relative to the outer-product-of-the-score versions, and that the parametric bootstrap practically eliminates them. We also show that the information matrix test has good power against various misspecification alternatives.

The information matrix test for Gaussian mixtures. In incomplete data models, the EM principle implies that the moments the information matrix test assesses are the expectation, given the observations, of the moments it would assess were the underlying components observed. This principle also leads to interpretable expressions for their asymptotic covariance matrix, adjusted for sampling variability in the parameter estimators under correct specification. Monte Carlo simulations for finite Gaussian mixtures indicate that the parametric bootstrap provides reliable finite sample sizes and good power against various misspecification alternatives. We confirm that 3-component Gaussian mixtures accurately describe cross-sectional distributions of per capita income in the 1960-2000 Penn World Tables.
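For orientation, both notes build on the information matrix equality: under correct specification, the Hessian and outer-product-of-the-score forms of the information agree in expectation, so the test checks the sample analogue of

```latex
\[
  \mathbb{E}\!\left[ \frac{\partial^{2}\ln f(y;\theta_{0})}{\partial\theta\,\partial\theta^{\prime}}
  + \frac{\partial\ln f(y;\theta_{0})}{\partial\theta}\,
    \frac{\partial\ln f(y;\theta_{0})}{\partial\theta^{\prime}} \right] = 0 .
\]
```

The two notes derive explicit expressions for this condition in their respective models.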
Contact: Monia Lupparelli
15/03/2024 at 11:00 - On-site and online seminar
Bayesian modelling for spatially misaligned health areal data
Silvia Liverani (Queen Mary University of London)
The objective of disease mapping is to model data aggregated at the areal level. In some contexts, however (e.g. residential histories, general practitioner catchment areas), when data arise from a variety of sources, not necessarily at the same spatial scale, it is possible to specify spatial random effects, or covariate effects, at the areal level by using a multiple membership principle. In this talk I will investigate the theoretical underpinnings of this application of the multiple membership principle to the CAR prior, in particular with regard to parameterisation, properness and identifiability, and I will present the results of an application of the multiple membership model to diabetes prevalence data in South London, together with strategic implications for public health considerations.
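For context, a standard conditional formulation of the CAR prior, together with a schematic multiple-membership link, is the following (notation is assumed here, not taken from the talk): with neighbourhood weights w_ij and membership weights m_ij,

```latex
\[
  \phi_i \mid \phi_{-i} \sim \mathcal{N}\!\left(
    \rho\,\frac{\sum_{j} w_{ij}\,\phi_j}{\sum_{j} w_{ij}},\;
    \frac{\tau^{2}}{\sum_{j} w_{ij}} \right),
  \qquad
  \eta_i = \mathbf{x}_i^{\prime}\boldsymbol{\beta} + \sum_{j} m_{ij}\,\phi_j .
\]
```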
Contact: Florence Center for Data Science
11/03/2024 at 14:00
Trajectories of loneliness in later life – Evidence from a 10-year English panel study
Giorgio Di Gessa (University College London)
Loneliness is generally defined as the discrepancy between individuals' desired and actual social interactions and emotional support. Although the prevalence of loneliness is high among older people and is projected to rise, few studies have examined longitudinal patterns of loneliness. Moreover, most studies have focused on more "objective" risk factors for loneliness, such as partnership status and frequency of contact, overlooking the quality of the relationship with, and support from, family and friends. Using data from six waves of the English Longitudinal Study of Ageing (2008/09 to 2018/19, N=4740), we used group-based trajectory modelling to identify distinctive trajectories of loneliness. Multinomial regression models were then used to examine characteristics associated with these trajectories, with a particular focus on size, support, closeness, and frequency of contact with social network members. We identified five groups of loneliness trajectories in later life, representing "stable low" (40% of the sample), "medium/low" (26%), "stable high" (11%), and "increasing" (14%) or "decreasing" (9%) levels of loneliness over time. Although there are socioeconomic and demographic differences across these trajectories of loneliness, health and relationship quality are their main drivers. Respondents with poor and deteriorating health were more likely to be classified as having "stable high" or "increasing" loneliness. Although having no social network is undoubtedly associated with higher risks of persistent loneliness, having friends and family is not enough: respondents with low-quality relationships with both friends and family were also significantly more likely to be classified as having "stable high" or "increasing" levels of loneliness.
Contact: Raffaele Guetto
27/02/2024 at 12:00
Welcome seminar: Ersilia Lucenteforte and Chiara Marzi
Ersilia Lucenteforte: Spreading evidence: a heterogeneous journey in Medical Statistics
Chiara Marzi: Interdisciplinary biomedical research: exploring Brain Complexity, Machine Learning, and Environmental Epidemiology
Ersilia Lucenteforte:
My past research has explored various aspects of medical statistics, spanning from cancer epidemiology to pharmacoepidemiology and clinical research, with a strong emphasis on Evidence-Based Medicine. In this welcome seminar, I will provide a brief overview of my past activities and discuss my recent focus on a crucial aspect of pharmacoepidemiology: the analysis of medication adherence.
Chiara Marzi:
This seminar offers a brief journey through the diverse facets of interdisciplinary biomedical research, as seen through the eyes of a young researcher. I will delve into my main research themes - past, present, and future - spanning from understanding the complexity of the brain to exploring the practical applications of machine learning in medicine, and investigating the impacts of environmental factors on health.
Contact: Raffaele Guetto
23/02/2024 at 12:00 - On-site and online seminar
Nonhomogeneous hidden semi-Markov models for environmental toroidal data
Francesco Lagona (University of Roma Tre)
A novel hidden semi-Markov model is proposed to segment bivariate time series of wind and wave directions according to a finite number of latent regimes and, simultaneously, estimate the influence of time-varying covariates on the process’ survival under each regime. The model integrates survival analysis and directional statistics by postulating a mixture of toroidal densities, whose parameters depend on the evolution of a semi-Markov chain, which is in turn modulated by time-varying covariates through a proportional hazards assumption. Parameter estimates are obtained using an EM algorithm that relies on an efficient augmentation of the latent process. Fitted on a time series of wind and wave directions recorded in the Adriatic sea, the model offers a clear-cut description of sea state dynamics in terms of latent regimes and captures the influence of time-varying weather conditions on the duration of such regimes.
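As one standard example of a toroidal density (the abstract does not specify the family used, so this is only indicative), the bivariate von Mises sine model for a pair of circular variables is

```latex
\[
  f(\theta,\psi) \propto \exp\bigl\{ \kappa_{1}\cos(\theta-\mu)
  + \kappa_{2}\cos(\psi-\nu)
  + \lambda\sin(\theta-\mu)\sin(\psi-\nu) \bigr\},
  \qquad \theta,\psi \in [0,2\pi).
\]
```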
Contact: Florence Center for Data Science
15/02/2024 at 12:00
Do Intergenerational Household Structures Reflect Differences in American Middle School Students' School Experiences and Engagement in Schoolwork?
Peter Brandon (University at Albany)
American children grow up in a variety of household structures. Across these households, resources, parenting styles, household composition, and surrounding neighborhoods can vary. Studies suggest that the intermingling of these social, economic, and demographic factors affects children's well-being and later transitions into adulthood. Thus, the households in which children find themselves are consequential and shape their future opportunities. Among the households in which American children grow up, two of the more significant types are three-generation and skipped-generation households. Our understanding of these households has expanded, but there is still much to learn, especially about the everyday experiences of children growing up in them. Among the everyday experiences worth investigating further are those related to schooling. Positive schooling experiences and a child's interest in learning are crucial to their development and identity. Preliminary findings from this study suggest that schooling experiences and engagement in schoolwork outside the classroom may differ between children in intergenerational households and their peers growing up in other households. The study speculates about interventions, focused on the home environment or at school, that might ensure children in intergenerational households are not educationally disadvantaged.
Contact: Giammarco Alderotti
12/01/2024 at 14:30 - Please register here to participate online: https://docs.google.com/forms/d/e/1FAIpQLSdkfhnDMP2j5cI32B38DC4oACXej9W7pKj2keSwVDPtybvahw/viewform?usp=pp_url
A multi-fidelity method for uncertainty quantification in engineering problems
Lorenzo Tamellini (CNR-IMATI Pavia)
Computer simulations, which are nowadays a fundamental tool in every field of science and engineering, need to be fed with parameters such as physical coefficients, initial states, geometries, etc. This information is however often plagued by uncertainty: values might be known only up to measurement errors, or be intrinsically random quantities (such as winds or rainfalls). Uncertainty Quantification (UQ) is a research field devoted to dealing efficiently with uncertainty in computations. UQ techniques typically require running simulations for several (carefully chosen) values of the uncertain input parameters (modeled as random variables/fields) and computing statistics of the outputs of the simulations (mean, variance, higher-order moments, pdf, failure probabilities) to provide decision-makers with quantitative information about the reliability of the predictions. Since each simulation run typically requires solving one or more Partial Differential Equations (PDEs), which can be a very expensive operation, these techniques can quickly become very computationally demanding. In recent years, multi-fidelity approaches have been devised to lessen the computational burden: these techniques explore the bulk of the variability of the outputs of the simulation by means of low-fidelity/low-cost solvers of the underlying PDEs, and then correct the results by running a limited number of high-fidelity/high-cost solvers. They also provide the user with a so-called "surrogate model" of the system response, which can be used to approximate the outputs of the system without actually running any further simulation. In this talk we illustrate a multi-fidelity method (the so-called multi-index stochastic collocation method) and its application to a couple of engineering problems. If time allows, we will also briefly touch on the issue of coming up with good probability distributions for the uncertain parameters, e.g. by Bayesian inversion techniques. References: 1) C. Piazzola, L. Tamellini, The Sparse Grids Matlab Kit – a Matlab Implementation of Sparse Grids for High-Dimensional Function Approximation and Uncertainty Quantification, ACM Transactions on Mathematical Software, 2023. 2) C. Piazzola, L. Tamellini, R. Pellegrini, R. Broglia, A. Serani, M. Diez, Comparing Multi-Index Stochastic Collocation and Multi-Fidelity Stochastic Radial Basis Functions for Forward Uncertainty Quantification of Ship Resistance, Engineering with Computers, 2022. 3) M. Chiappetta, C. Piazzola, L. Tamellini, A. Reali, F. Auricchio, M. Carraturo, Data-informed uncertainty quantification for laser-based powder bed fusion additive manufacturing, arXiv:2311.03823.
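The control-variate idea underlying multi-fidelity estimation fits in a few lines; the sketch below uses hypothetical models and sample sizes, and the multi-index stochastic collocation method of the talk is considerably more sophisticated.

```python
import numpy as np

# Multi-fidelity Monte Carlo: many cheap low-fidelity runs estimate E[f_lo],
# and a few paired high-fidelity runs correct the bias E[f_hi - f_lo].
rng = np.random.default_rng(5)

f_hi = lambda x: np.sin(x) + 0.1 * x**2   # "expensive" model (hypothetical)
f_lo = lambda x: np.sin(x)                # cheap surrogate (hypothetical)

x_few = rng.normal(size=50)               # budget-limited high-fidelity runs
x_many = rng.normal(size=50_000)          # abundant low-fidelity runs

estimate = f_lo(x_many).mean() + (f_hi(x_few) - f_lo(x_few)).mean()
print(estimate)  # approximates E[f_hi(X)] at a fraction of the cost
```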
Contact: Florence Center for Data Science - Prof. Anna Gottard
Last updated 15 January 2025.