DiSIA - Dipartimento di Statistica, Informatica, Applicazioni 'Giuseppe Parenti'

DiSIA seminar archive



22/12/2022 - Room 327, Centro Didattico Morgagni
Viale Morgagni, 40, Firenze

Causal inference: past, present, future

Fabrizia Mealli (Università degli Studi di Firenze)

Christmas Lecture

Documents: Xmas Lecture



21/12/2022 - The seminar will also be available online: meet.google.com/bav-zvgc-yzx

Socio-demographic cues and willingness to talk about politics: an experimental approach

Moreno Mancosu (Università di Torino)

Recently, the debate around political discussion has argued that people tend to use socio-political cues to indirectly identify the partisanship of their discussants (by relying on their lifestyle): when exposed to a lifestyle stereotype of a right-wing/left-wing person (such as a latte drinker/a pickup truck driver in the US), people tend to avoid them, as they represent an outgroup stereotype. An alternative approach (the social distance/homophily argument) states that people generally look for homophily, namely (not necessarily political) interactions with people who are similar to them in terms of socio-demographic characteristics. The present paper combines the homophily and political-discussion arguments by testing whether socio-demographic differences between people are, on their own, enough to produce higher or lower propensities to talk about politics. In other words, we ask whether people can indirectly “guess” another individual’s current-affairs views by relying solely on their socio-demographic properties. To do so, we design a vignette experiment within a CAWI survey administered via Pollstar, an opt-in community managed by academics. In the experiment, respondents (n~2,000) are asked to state how likely they would be to talk about current affairs with a person having specific characteristics. The hypothetical discussant presents randomized socio-demographic characteristics (age, gender, income, and educational level). These randomized characteristics are subsequently coupled with the respondent's own characteristics to provide measures of social distance between the respondent and the hypothetical discussant. We believe that the results of the experiment will shed light on the relationship between homophily and political behavior.

Organizer: Raffaele Guetto



02/12/2022

Giulia Cereda: Comparing different methods for the rare type match problem
Cecilia Viscardi: Approximate Bayesian computation: methodological developments and novel applications
Alberto Cassese: Long story short: 11 years of (my) research summarized in 30 minutes

Welcome seminar:
Alberto Cassese, Giulia Cereda, Cecilia Viscardi

Giulia Cereda.
Title: Comparing different methods for the rare type match problem
Abstract: A classical problem in forensic statistics is that of evaluating a match between a DNA profile found at the crime scene and a suspect’s DNA profile, in light of the two competing hypotheses (the crime stain was left by the suspect or by another person). The evaluation is based on the calculation of the likelihood ratio (LR), but the likelihood of the data under the competing hypotheses is unknown. The “rare type match problem” is the situation in which the matching DNA profile is not in the reference database, so it is difficult to assess its frequency in the population. In recent years, I have proposed and analyzed different models and methods (frequentist, Bayesian, parametric and non-parametric) to evaluate the LR for the rare type match case. They are based on quite diverse assumptions and data reductions, and deserve a framework within which to compare these contributions both theoretically, by discussing their rationales, and empirically, by assessing their performance through validation experiments and appropriate metrics. This is realized by tailoring to the rare type match problem the ECE (Empirical Cross Entropy) plots, a graphical tool based on information theory that allows one to study the accuracy of each method in terms of discrimination power and calibration.
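In symbols, the likelihood ratio referred to above is the standard forensic quantity

```latex
\mathrm{LR} \;=\; \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)},
```

where E denotes the evidence (the observed match) and H_p, H_d are the prosecution and defense hypotheses.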
Cecilia Viscardi.
Title: Approximate Bayesian computation: methodological developments and novel applications
Abstract: Approximate Bayesian computation (ABC) is a class of simulation-based methods for drawing Bayesian inference when the likelihood function is unavailable or computationally demanding to evaluate. ABC methods dispense with exact likelihood computation, as they only require the availability of a simulator model, i.e., a computer program which takes parameter values as input, performs stochastic calculations, and returns simulated data. In the simplest form, ABC algorithms draw parameter proposals from the prior distribution, run the simulator with those values as inputs, and retain the proposals for which the simulated data are sufficiently close to the observed data. Although ABC algorithms have evolved tremendously over the last 20 years, most of them still suffer from shortcomings related to i) the waste of computational resources due to the typical rejection step; ii) the inefficient exploration of the parameter space; iii) the computational cost of the simulator. During this talk, I will outline some methodological developments motivated by the above-mentioned problems, as well as possible applications in the civil engineering, epidemiological and forensic fields.
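As an illustration of the rejection scheme described above, a minimal sketch in Python on a toy problem (inferring a Normal mean; all numbers illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = rng.normal(2.0, 1.0, size=100)
s_obs = y_obs.mean()                       # observed summary statistic

def simulator(theta, n=100):
    # The "computer program": parameters in, stochastic synthetic data out.
    return rng.normal(theta, 1.0, size=n)

eps, accepted = 0.05, []                   # tolerance threshold
for _ in range(100_000):
    theta = rng.normal(0.0, 5.0)           # proposal drawn from the prior
    if abs(simulator(theta).mean() - s_obs) <= eps:
        accepted.append(theta)             # keep "close enough" proposals

post = np.array(accepted)                  # approximate posterior sample
print(post.mean(), post.std(), len(post))
```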
Alberto Cassese.
Title: Long story short: 11 years of (my) research summarized in 30 minutes
Abstract: In this welcome seminar I will give a general overview of the research projects I have worked on (and am still working on). In the first half, I will focus on my work in the field of Bayesian analysis, specifically on methods for the analysis of high-dimensional data and Bayesian nonparametrics. In the second half, I will focus on more recent work on studying two-way interactions by means of biclustering, and on the optimization of study designs in reliability and agreement studies.

Organizer: Monia Lupparelli



25/11/2022 - The seminar will also be available online: https://datascience.unifi.it/index.php/event/seminar-of-the-d2-seminar-series-florence-center-for-data-science-4/

D2 Seminar Series
Monica Bianchini: A gentle introduction to Graph Neural Networks
Giulio Bottazzi: Persistence in firm growth: inference from conditional quantile transition matrices

Doppio seminario FDS:
Monica Bianchini (Department of Information Engineering and Mathematics, University of Siena) & Giulio Bottazzi (Institute of Economics, Sant’Anna School of Advanced Studies, Pisa)

Monica Bianchini:
This talk will introduce Graph Neural Networks, which are a powerful deep learning tool for processing graphs in their entirety. Indeed, considering graphs as a whole allows one to take into account the essential sub-symbolic information contained in the relationships described by the arcs (as well as the symbolic information collected in the node labels), and also enables alternative learning frameworks based on information diffusion. Some real-world applications, in which graphs are the most natural way to represent data, will be presented, ranging from image processing to the prediction of drug side effects.
Giulio Bottazzi:
We propose a new methodology to assess the degree of persistence in firm growth, based on Conditional Quantile Transition Probability Matrices (CQTPMs) and well-known indexes of intra-distributional mobility. Improving upon previous studies, the method allows for exact statistical inference about TPM properties, at the same time controlling for spurious sources of persistence due to confounding factors such as firm size and sector, country, and time effects. We apply our methodology to study manufacturing firms in the UK and four major European economies over the period 2010-2017. The findings reveal that, although we reject the null of a fully independent firm-growth process, growth patterns display considerable turbulence and large bouncing effects. We also document that productivity, openness to trade, and business dynamism are the primary sources of firm growth persistence across sectors. Our approach is flexible and suitable for wide application in firm empirics, beyond firm growth studies, as a tool to examine persistence in other dimensions of firm performance.



15/11/2022

Connectivity Problems on Temporal Graphs

Ana Shirley Ferreira da Silva (Universidade Federal do Ceará (UFC), Brazil, and visiting professor at DISIA)

Abstract: A temporal graph is a graph that changes over time, meaning that, at each timestamp, only a subset of the edges is active. This structure models all sorts of real-life situations, from social networks to public transportation, and has also been used for contact tracing during the COVID pandemic. Despite its broad applicability, and despite having been around for more than two decades, only recently has this structure received more attention from the community. In this talk, we will discuss how to bring some connectivity concepts to the temporal context, and we will learn about the state of the art of complexity results for the related problems. Additionally, we will see various possible adaptations of Menger’s Theorem, only a few of which also hold on temporal graphs.
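To make the notion concrete, a small Python sketch of a temporal graph and of earliest-arrival reachability along time-respecting paths (edge list and timestamps invented for illustration):

```python
from collections import defaultdict

# Each edge (u, v, t) is active only at timestamp t.
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 3)]

adj = defaultdict(list)
for u, v, t in edges:
    adj[u].append((v, t))

def earliest_arrival(source, target):
    # Earliest time `target` can be reached along a path whose
    # edge timestamps never decrease (a time-respecting path).
    best = {source: 0}
    frontier = [(source, 0)]
    while frontier:
        u, t_u = frontier.pop()
        for v, t in adj[u]:
            if t >= t_u and t < best.get(v, float("inf")):
                best[v] = t
                frontier.append((v, t))
    return best.get(target)

print(earliest_arrival("a", "d"))  # 3, via a->b (t=1), b->c (t=2), c->d (t=3)
```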
Biosketch: Ana Silva is Associate Professor at the Mathematics Department of Universidade Federal do Ceará, Brazil, and is currently a Visiting Professor at the Università degli Studi di Firenze (Italy). She obtained her PhD degree in Mathematics and Computer Science from the Université de Grenoble (France) in November 2010 under the supervision of Frédéric Maffray. She was head of the Math Department at UFC from 2013 to 2015, and was a member of the Gender Committee of the Brazilian Mathematics Society from 2020 to 2021. In 2014, she received the L'Oréal/UNESCO/ABC Prize for Women in Science, and in 2021 she was elected affiliated member of the ABC (Academia Brasileira de Ciências), a position that she will occupy until December 2025. Her work concerns mainly graph problems, in particular coloring problems and convexity problems, and lately she has been interested in temporal graphs.

Organizer: Andrea Marino



11/11/2022 - The seminar will be held both on-site and online: https://datascience.unifi.it/index.php/event/seminar-of-the-d2-seminar-series-florence-center-for-data-science-3/

D2 Seminar Series
Bet: Detecting anomalies in geometric networks
Panzera: Density estimation for circular data observed with errors

Doppio seminario FDS:
Gianmarco Bet (DIMAI, UniFI) & Agnese Panzera (DISIA, UniFI)

Gianmarco Bet:
Recently there has been an increasing interest in the development of statistical techniques and algorithms that exploit the structure of large complex-network data to analyze networks more efficiently. For this talk, I will focus on detection problems. In this context, the goal is to detect the presence of some sort of anomaly in the network, and possibly even identify the nodes/edges responsible. Our work is inspired by the problem of detecting so-called botnets. Examples are fake user profiles in a social network or servers infected by a computer virus on the internet. Typically a botnet represents a potentially malicious anomaly in the network, and thus it is of great practical interest to detect its presence and, when detected, to identify the corresponding vertices. Accordingly, numerous empirical studies have analyzed botnet detection problems and techniques. However, theoretical models and algorithmic guarantees are missing so far. We introduce a simplified model for a botnet, and approach the detection problem from a statistical perspective. More precisely, under the null hypothesis we model the network as a sample from a geometric random graph, whereas under the alternative hypothesis there are a few botnet vertices that ignore the underlying geometry and simply connect to other vertices in an independent fashion. We present two statistical tests to detect the presence of these botnets, and we show that they are asymptotically powerful, i.e., they correctly distinguish the null and the alternative with probability tending to one as the number of vertices increases. We also propose a method to identify the botnet vertices. We will argue, using numerical simulations, that our tests perform well for finite networks, even when the underlying graph model is slightly perturbed. Our work is not limited in scope to botnet detection, and in fact is relevant whenever the nature of the anomaly to be detected is a change in the underlying connection criteria. Based on joint work with Kay Bogerd (TU/e), Rui Pires da Silva Castro (TU/e) and Remco van der Hofstad (TU/e).

Agnese Panzera:
Density estimation is a core tool in statistics, both for exploring data structures and as a starting task in more challenging problems. We consider nonparametric estimation of circular densities, which are periodic probability density functions having the unit circle as their support. Starting from the basic idea of kernel estimation of circular densities, we present some related methods for the case where data are observed with errors.
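As a concrete illustration of the error-free starting point (the measurement-error extensions discussed in the talk go beyond this sketch), a von Mises kernel density estimate on the circle in Python:

```python
import numpy as np
from scipy.special import i0

def vonmises_kde(grid, data, kappa):
    # Circular KDE: a von Mises kernel replaces the Gaussian kernel, and
    # the concentration kappa plays the role of an inverse bandwidth.
    diff = grid[:, None] - data[None, :]
    return np.exp(kappa * np.cos(diff)).mean(axis=1) / (2 * np.pi * i0(kappa))

rng = np.random.default_rng(1)
data = rng.vonmises(np.pi / 2, 4.0, size=200)   # angles in (-pi, pi]
grid = np.linspace(-np.pi, np.pi, 256)
dens = vonmises_kde(grid, data, kappa=20.0)
print(dens.mean() * 2 * np.pi)   # ~1: the estimate integrates to one
```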



10/11/2022

Finding the needle by modelling the haystack: pulmonary embolism in an emergency patient with cardiorespiratory manifestations

Davide Luciani (IRCCS Istituto di Ricerche Farmacologiche Mario Negri, Milano)

Abstract. A Bayesian Network (BN) was developed to perform a diagnosis covering 129 acute cardiopulmonary disorders in patients admitted to emergency departments, given an observable domain of 235 clinical, laboratory and imaging manifestations. Once the network was given a causal structure, the BN inferences could be deemed aligned with medical reasoning framed in hundreds of related pathophysiological and pathogenic events. The structure was specified in advance by experts in pneumology, cardiology and coagulation disorders, while 1,417 model parameters were estimated, via Markov chain Monte Carlo, from data of 282 records collected at the main hospital of Bergamo. The BN structure was refined until the precision of diagnostic inferences improved, as long as the medical literature supported each enforced structural change. Diagnostic performance was assessed by looking at the precision of predictions concerning six diagnoses, given testing findings collected from 284 records in six hospitals not including the hospital of Bergamo. Thanks to its large-size domain, the model addresses rare disorders even in patients complaining of generic symptoms. However, the size and the complexity of the model raised serious methodological challenges: to what extent was causal knowledge useful to exploit data as noisy, but as rich in medical information, as clinical records? Was the BN causal structure faithful to the process underlying the generation of the sampled data? The main lessons learned from answering these questions are introduced from an interdisciplinary perspective, at the intersection of knowledge engineering, evidence-based medicine, and Bayesian statistics.
Biosketch. Davide Luciani (1966) received his medical degree from the University of Bologna in 1995. After a few years of medical practice, he devoted his medical background to research, in particular to the formalization of clinical judgement and decision making. In this regard, he benefited from collaborations with several academic experts in Statistics and Computer Science, as well as from the supervision of Phil Dawid at University College London and Finn V. Jensen at the Department of Computer Science in Aalborg. Since 2005, he has been responsible for the Unit of Clinical Knowledge Engineering at the Mario Negri Institute in Milan, where he has worked with real medical data to develop probabilistic expert systems based on graphical models, Bayesian Markov chain Monte Carlo estimation techniques and knowledge acquisition methods.
Webinar address. Password: dL101122

Organizer: Alessandro Magrini



03/11/2022

Sera: Extended two-stage designs for the evaluation of the short-term health effects of environmental hazards.
Severi: New approaches to the study of individual susceptibility, lifestyle and the environment and their role in human health.

Welcome seminar: Francesco Sera & Gianluca Severi

Francesco Sera.
Title: Extended two-stage designs for the evaluation of the short-term health effects of environmental hazards.
Abstract: The two-stage design has become a standard tool in environmental epidemiology to model short-term effects with multi-location data, giving valuable information for preventive public health strategies. In the seminar, I illustrate multiple design extensions of the classical two-stage method. These are based on improvements of the standard two-stage meta-analytic models along the lines of linear mixed-effects models, allowing location-specific estimates to be pooled through flexible fixed- and random-effects structures. This permits the analysis of associations characterised by combinations of multivariate outcomes, hierarchical geographical structures, repeated measures, and/or longitudinal settings. The design extensions will be illustrated with examples using data collected by the Multi-Country Multi-City research network.
Biosketch: Francesco Sera is a Research Fellow at the University of Florence. Francesco is a statistician and epidemiologist who has worked on several epidemiological projects, with more than 180 publications. His current research interests focus on the short-term health effects of environmental exposures such as temperature and air pollution, and on related methodological aspects, such as time-series models and the pooling of results from multi-centre studies. Working with colleagues from the Multi-Country Multi-City (MCC) Collaborative Research Network, he has contributed to increasing the evidence on the health impact of environmental exposures, with papers published in high-impact journals.

Gianluca Severi.
Title: New approaches to the study of individual susceptibility, lifestyle and the environment and their role in human health.
Abstract: The term exposome has been coined to describe the multiple, often interacting dimensions of our behaviours, as well as the environmental and socio-economic context in which we live. The concept of the human exposome may help build more realistic models to answer key questions such as how diet, physical activity and environmental exposures affect our health, but the implementation of the "exposome approach" poses several challenges. In this seminar I will discuss some of these challenges using examples of research I conduct with my team on the human exposome and its influence on health and disease, focusing in particular on chronic diseases such as cancer. In particular, I will draw examples from studies nested within prospective cohorts such as the Melbourne Collaborative Cohort Study, the familial E3N-E4N cohort, EPIC and Constances, in which we use concepts such as exposome, biological fingerprint and molecular signature to better characterize risk or protective behaviours, quantify environmental exposures, explore pathological mechanisms and improve risk prediction.
Biosketch: I am an Associate Professor of biostatistics and epidemiology at the University of Florence and a Research Director at Inserm, where I lead the "Exposome and Heredity" group (CESP U1018). After an initial career as a biostatistician at the European Institute of Oncology in Milan, I completed a PhD in cancer studies at the University of Birmingham and pursued a career as a molecular epidemiologist working mainly on cancer. After almost 10 years in Melbourne, Australia, as Deputy Director of the Cancer Epidemiology Centre of the Cancer Council Victoria, in 2013 I moved back to Europe to take up the role of Director of the Italian Institute for Genomic Medicine in Turin (aka HuGeF) and to further its development, before moving to my current research and teaching positions. My main research interest is the use of innovative tools to study the exposome and its related biological fingerprints (e.g. epigenetic marks), and to identify the key physiological systems and health outcomes affected by the exposome.

Organizer: Andrea Marino



20/10/2022

Hierarchical normalized finite point process: predictive structure and clustering

Raffaele Argiento (Università degli Studi di Bergamo)

Almost surely discrete random probability measures have received close attention in the Bayesian nonparametric community. They have been used to model populations of individuals or latent parameters (in the mixture model setting) composed of an unknown number of species with unknown proportions. In this framework, data are usually assumed to be exchangeable. However, the latter assumption is not appropriate when data are divided into multiple groups which may share the same species. In this case, partial exchangeability accommodates the dependence across populations.

Organizer: Francesco Claudio Stingo



14/10/2022 - The seminar will be held in Aula 205 (ex 32) (DISIA – Viale Morgagni 59)
The seminar will also be available online.
https://datascience.unifi.it/index.php/event/seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
Durastanti: Spherical Poisson Waves
Viscardi: Likelihood-free Transport Monte Carlo - Joint with Dr Dennis Prangle (University of Bristol)

Doppio seminario FDS:
Claudio Durastanti (Department of Basic and Applied Sciences for Engineering (SBAI), Sapienza University of Rome) & Cecilia Viscardi (Department of Statistics, Computer Science, Applications “G. Parenti”, University of Florence)

Claudio Durastanti:
During this talk, we will discuss a model of Poisson random waves defined on the sphere, in order to study quantitative central limit theorems when both the rate of the Poisson process (that is, the expected number of observations sampled at a fixed time) and the energy (i.e., frequency) of the waves (eigenfunctions) diverge to infinity. We consider finite-dimensional distributions, harmonic coefficients and convergence in law in functional spaces, and we carefully investigate the interplay between the rates of divergence of the eigenvalues and of the Poisson governing measures.

Cecilia Viscardi:
Approximate Bayesian computation (ABC) is a class of methods for drawing inference when the likelihood function is unavailable or computationally demanding to evaluate. Importance sampling and other algorithms using sequential importance sampling steps are state-of-the-art methods in ABC. Most of them draw samples from tempered approximate posterior distributions defined by considering a decreasing sequence of ABC tolerance thresholds. Their efficiency is sensitive to the choice of an adequate proposal distribution and/or forward kernel function. We present a novel ABC method that addresses this problem by combining importance sampling steps with optimization procedures. We resort to Normalising Flows (NFs) to optimize proposal distributions over a family of densities so as to transport particles drawn at each step towards the next tempered target. The combination of sampling and optimization steps thus allows the tempered distributions to approach the target posterior efficiently. Finally, we show the performance of our method on examples that are common benchmarks for likelihood-free inference.



07/10/2022 - https://datascience.unifi.it/index.php/event/seminar-of-the-special-guest-seminar-series-kosuke-imai/

Statistical Inference for Heterogeneous Treatment Effects and Individualized Treatment Rules Discovered by Generic Machine Learning in Randomized Experiments

Kosuke Imai (Harvard University)

Researchers are increasingly turning to machine learning (ML) algorithms to estimate heterogeneous treatment effects (HET) and develop individualized treatment rules (ITR) using randomized experiments. Despite their promise, ML algorithms may fail to accurately ascertain HET or produce efficacious ITR in practical settings with many covariates and small sample sizes. In addition, the quantification of estimation uncertainty remains a challenge. We develop a general approach to statistical inference for estimating HET and evaluating ITR discovered by a generic ML algorithm. We utilize Neyman's repeated sampling framework, which is based solely on the randomization of treatment assignment and the random sampling of units. Unlike some existing methods, the proposed methodology requires neither modeling assumptions, asymptotic approximation, nor resampling methods. We extend our analytical framework to a common setting in which the same experimental data are used both to train ML algorithms and to evaluate HET/ITR. In this case, our statistical inference incorporates the additional uncertainty due to the random splits of data used for cross-fitting.



03/10/2022

Social background inequality in academic track enrolment: How the role of individual competencies, teachers’ assessments and family decisions varies across Italian provinces

Moris Triventi and Emanuele Fedeli (Università degli Studi di Trento)

We aim to understand the main sources of social background inequalities in academic track enrolment in Italy, and whether their relative importance varies across provinces. Italy is a well-suited case study since it is characterized by low educational attainment rates, high levels of educational inequality and strong geographical divides in school outcomes. We distinguish between three main general channels through which social inequalities in educational transitions are reproduced, the so-called ‘primary’, ‘secondary’, and ‘tertiary effects’ (Boudon 1974; Esser 2016). These refer respectively to the role of individual competencies, teachers’ assessments and family decisions. We compiled a student population panel dataset from the Invalsi-SNV, following 1.344 million students from five cohorts (2013-2017) from the 8th grade of lower secondary school (untracked) to the 10th grade of upper secondary education (tracked). We use binomial logistic regression models to measure social background inequality and the KHB method to decompose it into the three channels (Karlson et al. 2012). We find that families’ choices, irrespective of students’ abilities and teachers’ evaluations, are the prevalent source of reproduction of inequalities in academic track enrolment, followed by tertiary and then primary effects. Interestingly, we find more geographical heterogeneity in the channels by which educational inequalities are reproduced than in the total inequality by social background, a novel finding in the literature. With this work we complement the cross-national literature and provide new evidence that heterogeneity across contexts concerns not only the level of social disparities but also how inequalities are (re)produced.

Organizer: Valentina Tocchioni



29/09/2022

The ITA.LI survey: instructions for use. From data collection to data analysis.

Mario Lucchini and Carlotta Piazzoni (Università degli Studi di Milano-Bicocca)

This seminar presents the principles and procedures underlying the research design and sampling design of Italian Lives (ITA.LI), a longitudinal panel survey that aims to collect basic information on the current conditions of Italian individuals and households, and to study social change with reference to residential mobility, educational careers, working careers, forms of family cohabitation, and marriage. These phenomena can be studied from life-course data, i.e., by reconstructing individual trajectories in their entirety. Information collected over time, in the form of episodes (spells) and repeated occasions, makes it possible to move from a static picture to a dynamic representation of these phenomena, and to apply analytic strategies with high investigative power. The richness of panel data, together with the techniques used for life-course analysis, makes it possible to describe individual trajectories of change within specific spatial and temporal contexts; to approximate estimates of causal effects by controlling for unobserved heterogeneity; to assess the time that elapses before an event occurs and the factors responsible for it; to disentangle cohort effects from age effects; to test interdependencies between events belonging to different domains; and to study in depth the interactions and mutual influences between the life courses of members of the same household.

Organizer: Raffaele Guetto



29/06/2022 - Details: https://datascience.unifi.it/index.php/event/seminar-of-the-special-guest-seminar-series-iavor-bojinov/

Design & Analysis of Dynamic Panel Experiments

Iavor Bojinov (Harvard Business School)

Over the past few years, firms have begun to transition away from static single-intervention A/B testing to dynamic experiments, where customers’ treatments can change over time within the same experiment. This talk will present the design-based foundations for analyzing such dynamic (or sequential) experiments, starting with the extreme case of running an experiment on a single unit, known as a time-series experiment. Next, motivated by my work to understand whether humans or algorithms are better at executing large financial trades, I will lay out a framework for designing and analyzing switchback experiments, a special case of time-series experiments. Then, I will explain how to extend this framework to multiple units and what happens when these units are subject to population interference (the setting where one unit’s treatment can impact another’s outcomes). Finally, if time allows, I will conclude with a brief discussion of an empirical study that leveraged over 1,000 experiments conducted at LinkedIn to quantify the additional benefits of adopting dynamic experimentation.



20/06/2022 - Details: https://datascience.unifi.it/index.php/event/disia-seminar-exploring-the-educational-paradox-on-preterm-births-in-colombia/

Exploring the Educational Paradox on Preterm Births in Colombia

Harold Mera León (Universitat Pompeu Fabra, Barcelona)

Why might mothers with higher education be more prone to preterm births? Preterm birth (PTB) is widely recognized as a primary cause of birth and early-childhood losses. We build on Bronfenbrenner's bioecological approach and assess the effect of a mother's education level on the odds of PTB. Combining Bronfenbrenner's framework with empirical population observations, we analyze data from the National Health Statistics Surveys (NHSS), the National Centre of Historical Memory (NCHM), the 2012 Poverty Mission, and the Information System of the Victims Unit. We fit a logistic model to explore the paradoxical relation between mothers with higher education and the odds of PTB (Mera, 2021) by estimating the moderation effect of higher education on regional violence. We argue that during 2002, pregnant women who could complete university level before labor were more prone to preterm birth (under 38 weeks of gestation) due to the high levels of unemployment and violence. However, when considering the interaction between regional violence and a mother's education level, the odds of PTB increase when mothers cannot reach university level, and the effect of violence on the dyad is reduced for mothers who could complete university. Hence, even though a pregnant woman with university-level education living in regions with high levels of violence and unemployment is more likely to experience stress, her education level operates as a shielding factor, moderating the harmful effect of violence, though only in specific cases and in regions where unemployment is not that high.

Organizer: Leonardo Grilli



17/06/2022 - Details: https://datascience.unifi.it/index.php/event/seminar-of-the-special-guest-seminar-series-florence-center-for-data-science/

Hypothesis tests with a repeatedly singular information matrix

Dante Amengual (CEMFI, Madrid)

We study score-type tests in likelihood contexts in which the nullity of the information matrix under the null is larger than one, thereby generalizing earlier results in the literature. Examples include multivariate skew-normal distributions, Hermite expansions of Gaussian copulas, purely non-linear predictive regressions, multiplicative seasonal time series models, and multivariate regression models with selectivity. Our proposal, which involves higher-order derivatives, is asymptotically equivalent to the likelihood ratio but only requires estimation under the null. We conduct extensive Monte Carlo exercises that study the finite sample size and power properties of our proposal and compare it to alternative approaches.



27/05/2022 - To participate: https://datascience.unifi.it/index.php/event/fds-seminar-pedro-j-gutierrez-diez/

Analysis of the epigenetic changes in the breast after pregnancy

Pedro J. Gutiérrez Diez (Department of Economic Theory / Mathematical Research Institute (IMUVa) - University of Valladolid)

A first full-term pregnancy (FFTP) at an early age confers long-term protection against breast cancer, making it a guide for research on cancer prevention. The correct design of strategies based on this protective effect of pregnancy requires the characterization of its genomic consequences. In this respect, the published literature suggests that pregnancy causes a specific transcriptomic profile controlling chromatin remodeling after pregnancy, thereby implying multiple and complex changes in gene expression. In this research we analyze, from several perspectives, the modifications in gene expression after FFTP, concluding that, independently of the changes in gene expression at the individual level usually considered, there are significant changes in gene-gene interactions and in gene-cluster behavior.



13/05/2022 - The seminar will be online: https://datascience.unifi.it/index.php/event/17th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
Papadogeorgou: Unmeasured spatial confounding
Antonelli: Heterogeneous causal effects of neighborhood policing in New York City with staggered adoption of the policy

Doppio seminario FDS:
Georgia Papadogeorgou (Department of Statistics, University of Florida) & Joseph Antonelli (Department of Statistics, University of Florida)

Georgia Papadogeorgou:
Spatial confounding has different interpretations in the spatial and causal inference literatures. I will begin this talk by clarifying these two interpretations. Then, seeing spatial confounding through the causal inference lens, I discuss two approaches to account for unmeasured variables that are spatially structured when we are interested in estimating causal effects. The first approach is based on the propensity score. We introduce the distance-adjusted propensity scores (DAPS), which combine the spatial distance and the propensity score difference of treated and control units in a single quantity. Treated units are then matched to control units if their corresponding DAPS is low. We show that this approach is consistent, and we propose a way to choose how much matching weight should be given to unmeasured spatial variables. In the second approach, we aim to bridge the spatial and causal inference literatures by estimating causal effects in the presence of unmeasured spatial variables using outcome modeling tools that are popular in spatial statistics. Motivated by the bias term of commonly used estimators in spatial statistics, we propose an affine estimator that addresses this deficiency. I will discuss how estimation of causal parameters in the presence of unmeasured spatial confounding can only be achieved under an untestable set of assumptions. We provide one such set of assumptions, which describes how the exposure and outcome of interest relate to the unmeasured variables.
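A schematic sketch of the DAPS idea in Python (synthetic data and an arbitrary weight w; the method presented in the talk chooses the weight in a principled way and uses a full matching protocol):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical inputs: estimated propensity scores and spatial coordinates.
ps_t, ps_c = rng.uniform(0, 1, 20), rng.uniform(0, 1, 200)   # treated/controls
xy_t, xy_c = rng.uniform(0, 10, (20, 2)), rng.uniform(0, 10, (200, 2))

w = 0.5  # weight on propensity-score distance vs. spatial distance

dist = np.linalg.norm(xy_t[:, None, :] - xy_c[None, :, :], axis=2)
dist /= dist.max()                                  # rescale to [0, 1]
daps = w * np.abs(ps_t[:, None] - ps_c[None, :]) + (1 - w) * dist

matches = daps.argmin(axis=1)   # match each treated unit to its lowest-DAPS control
print(matches[:5])
```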

Joseph Antonelli:
In New York City, neighborhood policing was adopted at the police precinct level over the years 2015-2018, and it is of interest to both (1) evaluate the impact of the policy, and (2) understand what types of communities are most impacted by the policy, raising questions of heterogeneous treatment effects. We develop novel statistical approaches that are robust to unmeasured confounding bias to study the causal effect of policies implemented at the community level. We find that neighborhood policing decreases discretionary arrests in certain areas of the city, but has little effect on crime or racial disparities in arrest rates.



22/04/2022 - The seminar will be online: https://datascience.unifi.it/index.php/event/16th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
Barucci: Exploring Egyptian Hieroglyphs with Convolutional Neural Networks
Mattei: Selecting Subpopulations for Causal Inference in Regression Discontinuity Designs (Joint work with Laura Forastiere and Fabrizia Mealli)

Doppio seminario FDS:
Andrea Barucci (IFAC-CNR Institute of Applied Physics) & Alessandra Mattei (DiSIA, UniFI)

Andrea Barucci:
Deep Learning is expanding into every domain of knowledge, allowing specialists to build tools supporting their work in fields apparently unrelated to information technology. In this study, we exploit this opportunity by focusing on Egyptian hieroglyphic texts and inscriptions. We investigate the ability of several convolutional neural networks (CNNs) to segment glyphs and classify images of ancient Egyptian hieroglyphs derived from various image datasets. Three well-known CNN architectures (ResNet-50, Inception-v3, and Xception) were considered for classification and trained on the supplied pictures. Furthermore, we constructed a specifically devoted CNN, termed Glyphnet, by changing the architecture of one of the prior networks and customizing its complexity to our classification goal. The proposed Glyphnet outperformed the others in terms of performance, ease of training, and computational savings, as judged by established measures. Hieroglyph segmentation was addressed in parallel, using a deep neural network architecture known as Mask R-CNN. This work shows how the task of identifying ancient Egyptian hieroglyphs can be supported by the Deep Learning paradigm, laying the foundation for novel information tools for automatic document recognition, classification and, most importantly, language translation.

Alessandra Mattei:
The Brazilian Bolsa Família program is a conditional cash transfer program aimed at reducing short-term poverty through direct cash transfers and at fighting long-term poverty by increasing human capital among poor Brazilian people. Eligibility for Bolsa Família benefits depends on a type of cutoff formula, which classifies the Bolsa Família study as a regression discontinuity (RD) design. Extracting causal information from RD studies is challenging. Following Li, Mattei and Mealli (2015) and Branson and Mealli (2019), we formally describe the Bolsa Família RD design as a local randomized experiment within the potential outcome approach. Under this framework, causal inference concerns Brazilian families belonging to some subpopulation where a local overlap assumption, a local SUTVA and a local ignorability assumption hold. We first discuss the potential advantages of this framework, in settings where the assumptions are judged plausible, over local regression methods based on continuity assumptions, namely: a) it generates treatment effects for subpopulation members rather than local average treatment effects for those at the cutoff only, making the results more easily generalizable; b) it avoids modeling assumptions on the relationship between the running variable and the outcome; c) it allows the treatment assignment mechanism to be random rather than deterministic as in typical RD analyses, so that finite population inference can be used; d) it easily accommodates discrete running variables. A critical issue of the approach is how to choose subpopulations for which we can draw valid causal inference. We propose to use a Bayesian model-based finite mixture approach to clustering to classify observations, on the basis of the observed data, into subpopulations where the RD assumptions hold and do not hold. This approach has important advantages: a) it accounts for the uncertainty about subpopulation membership, which is typically neglected; b) it does not impose any constraint on the shape of the subpopulation (bandwidth); c) it can be used as a design phase of any analysis; d) it is scalable to high-dimensional settings; e) it can account for rare outcomes. We apply the framework to assess the causal effects of the Bolsa Família program on leprosy incidence in 2009, which is a rare outcome, using information on a large sample of Brazilian families who registered in the Single Registry for the first time in 2007-2008.



08/04/2022 - The seminar will be online: https://datascience.unifi.it/index.php/event/15th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
Schoen: Clustering for Optimization, Optimization for Clustering
Panunzi and Gregori: Towards action concepts identification through unsupervised and semi-supervised clustering on a multimodal cross-linguistic ontology

Doppio seminario FDS:
Fabio Schoen (Department of Information Engineering, UniFI) & Alessandro Panunzi and Lorenzo Gregori (Department of Humanities, UniFI)

Fabio Schoen:
In this talk I will present two fundamental problems in data science: the global optimization problem (i.e., how to find globally optimal solutions to a mathematical programming problem) and the problem of clustering multi-dimensional data (i.e., how to efficiently group data according to similarity). The aim of this talk is to present the connections between these fundamental problems and to show how each of them can be used to improve the performance of the other. For global optimization problems, the idea of clustering dates back to the 1980s, when researchers used clustering techniques to recognize the regions of attraction of local optima in the search for the global one. For reasons that I will explain during the seminar, those approaches were abandoned; however, we have shown that, provided some modifications are introduced, they might prove very interesting for modern global optimization. On the other side, clustering high-dimensional data is clearly an optimization problem, as we would like to group points so that a measure of similarity within groups is maximized. Recent computational approaches have been developed in which classical clustering techniques, such as K-means, are used as local optimization tools which, when embedded in a higher-level global optimization strategy, can produce significantly better clusters (a toy illustration of this idea follows below).
This talk is partly based on research done in collaboration with Dr. Luca Tigli and Dr. Pierluigi Mansueto.
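As announced above, a toy illustration of K-means used as a local optimizer inside the simplest possible higher-level global strategy, multi-start with the best within-cluster sum of squares kept (a sketch on synthetic data, not the approach developed in the talk):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=8, random_state=0)

# Each K-means run is a *local* optimization of the within-cluster sum of
# squares; restarting from many random initializations and keeping the best
# solution is the crudest global strategy one can wrap around it.
best = None
for seed in range(50):
    km = KMeans(n_clusters=8, n_init=1, random_state=seed).fit(X)
    if best is None or km.inertia_ < best.inertia_:
        best = km

print(best.inertia_)  # objective value of the best local solution found
```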

Alessandro Panunzi e Lorenzo Gregori:
This work presents the steps performed on the IMAGACT ontology of action to identify cognitively consistent action concepts through machine learning methods. IMAGACT contains a set of 1,010 actions, represented by video scenes and enriched with linguistic data in 14 languages. Each scene is linked to the full set of verbs, in every language, that can be used to refer to the depicted action. Starting from these data, an automatic clustering of the scenes has been performed, using the linked lexical items as a feature set, following the idea that similar actions can be referred to by a similar group of verbs. In order to evaluate the clusters, a wide set of surveys was set up, and action-similarity judgements from human raters were collected. These data have been analyzed together with automatic clustering metrics to evaluate the clustering and to tune the algorithm. The presentation will also focus on the similarity-evaluation issues that emerge from a task involving human perception and cognitive processing.



08/04/2022 - In person upon reservation at https://tinyurl.com/mrx654pn, or online at https://tinyurl.com/yc8a78zm

A new programming interface for Gaussian process regression

Giacomo Petrillo (Dipartimento di Statistica, Informatica, Applicazioni, Università degli Studi di Firenze)

A Gaussian process is a multivariate Normal distribution over a space of functions. Gaussian processes are commonly used as a prior in a Bayesian setting to infer an unknown function without specifying a finitely parameterized model. (In non-Bayesian contexts, this is known as kriging.) This technique is very flexible, yet it allows one to provide strong prior information, when available, that would be difficult to encode in a model, such as the degree of smoothness of the function or its periodicity. From the point of view of non-statisticians or applied statisticians, Gaussian processes are used through a pre-written program, much like most statistical methods. I will present a Python module designed for this task, which introduces a new kind of interface to define the structure of the problem and manipulate the information, focused on maintaining a high degree of flexibility while keeping the user code as short and readable as possible. I will show how the program improves on existing implementations, and then continue with some ideas for its future development, trying to fill in what is missing in other programs.
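For readers unfamiliar with the computation behind such interfaces, a minimal numpy sketch of GP regression with a squared-exponential kernel (textbook formulas on toy data; this is not the interface of the module presented in the talk):

```python
import numpy as np

def rbf(x1, x2, scale=1.0):
    # Squared-exponential kernel: encodes the prior smoothness of the function.
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / scale**2)

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)                     # training inputs
y = np.sin(x) + rng.normal(0, 0.1, x.size)    # noisy observations
xs = np.linspace(0, 5, 100)                   # prediction grid

K = rbf(x, x) + 0.1**2 * np.eye(x.size)       # prior covariance + noise
Ks = rbf(xs, x)

mean = Ks @ np.linalg.solve(K, y)                    # posterior mean
cov = rbf(xs, xs) - Ks @ np.linalg.solve(K, Ks.T)    # posterior covariance
sd = np.sqrt(np.clip(np.diag(cov), 0, None))         # pointwise uncertainty

print(np.max(np.abs(mean - np.sin(xs))))      # close to the true function
```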

Documents: Poster

Organizer: Fabio Corradi



06/04/2022 - In person upon reservation at https://tinyurl.com/mrx654pn, or online at https://tinyurl.com/5f7tuh5u

Blockchain: what it is and why it matters

Laura Ricci & Damiano di Francesco Maesa (Dipartimento di Informatica, Università di Pisa)

A blockchain protocol is employed to implement a tamper-free distributed ledger that stores transactions created by the nodes of a P2P network and agreed upon through a distributed consensus algorithm, avoiding the need for a central authority. Blockchain technology has great potential to radically change our socio-economic systems by guaranteeing secure transactions between untrusted entities, reducing their cost, and simplifying many processes. This technology is being exploited in many areas, such as IoT, social networking, health care and electronic voting. This talk will introduce the basic principles of this new, disruptive technology, highlighting a set of "killer applications". We will also show its innovative potential for research, presenting results on the study of blockchain transaction graphs to characterize users' behaviors.

Documents: Poster

Slides: Introduction   Seminar

Organizer: Andrea Marino



25/03/2022 - The seminar will be online: https://datascience.unifi.it/index.php/event/14th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
Liseo: ABCC: Approximate Bayesian Conditional Copulae (with Clara Grazian and Luciana Dalla Valle)
De Vito: Understanding Neural Networks with Reproducing Kernel Banach Spaces

Doppio seminario FDS:
Brunero Liseo (Department of Methods and Models for Economics, Territory, and Finance, Sapienza University of Rome) & Ernesto De Vito (Department of Mathematics, University of Genova)

Brunero Liseo:
Copula models are flexible tools for representing complex structures of dependence among multivariate random variables. According to Sklar’s theorem (Sklar, 1959), any d-dimensional absolutely continuous density can be uniquely represented as the product of the marginal distributions and a copula function that captures the dependence structure among the vector components. In real data applications, the interest of the analysis often lies in specific functionals of the dependence, which quantify aspects of it in a few numerical values. A broad literature exists on such functionals; however, extensions to include covariates are still limited. This is mainly due to the lack of unbiased estimators of the copula function, especially when one does not have enough information to select the copula model. Recent advances in computational methodologies and algorithms have allowed inference in the presence of complicated likelihood functions, especially in the Bayesian approach, whose methods, despite being computationally intensive, allow us to better evaluate the uncertainty of the estimates. In this work, we present several Bayesian methods to approximate the posterior distribution of functionals of the dependence, using nonparametric models which avoid the selection of the copula function. These methods are compared in simulation studies and in two realistic applications, from civil engineering and astrophysics.
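In symbols, the Sklar representation invoked above states that, for an absolutely continuous d-dimensional density,

```latex
f(x_1,\dots,x_d) \;=\; c\bigl(F_1(x_1),\dots,F_d(x_d)\bigr)\,\prod_{j=1}^{d} f_j(x_j),
```

where the F_j and f_j are the marginal distribution functions and densities, and c is the (unique) copula density capturing the dependence among the components.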

Ernesto De Vito:
Characterizing the function spaces corresponding to neural networks can provide a way to understand their properties. This talk is devoted to showing how the theory of reproducing kernel Banach spaces can be used to characterize the function spaces corresponding to neural networks. In particular, I will show a representer theorem for a class of reproducing kernel Banach spaces, which includes one-hidden-layer neural networks of possibly infinite width. Furthermore, I will prove that, for a suitable class of ReLU activation functions, the norm in the corresponding reproducing kernel Banach space can be characterized in terms of the inverse Radon transform of a bounded real measure. The talk is based on joint work with F. Bartolucci, L. Rosasco and S. Vigogna.



11/03/2022 - The seminar will be online: https://datascience.unifi.it/index.php/event/13th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
Paths and flows for centrality measures in networks

Daniela Bubboloni (Department of Mathematics and Computer Science “Ulisse Dini”, University of Florence)

Consider the number of paths that must pass through a subset X of vertices of a capacitated network N in a maximum sequence of arc-disjoint paths connecting two vertices y and z. Consider then the difference between the maximum flow value from y to z in N and the maximum flow value from y to z in the network obtained from N by setting to zero the capacities of all the arcs incident to X. When X is a singleton, those quantities are involved in defining and computing the flow betweenness centrality, and they are commonly identified without any rigorous proof justifying the identification. That surprising gap in the literature is the starting point of our research. On the basis of a deep analysis of the interplay between paths and flows, we prove that, when X is a singleton, those quantities coincide. On the other hand, when X has at least two elements, those quantities may differ. By means of the considered quantities, two conceptually different group centrality measures, based respectively on paths and flows, can be naturally defined. Such group centrality measures both extend the flow betweenness centrality to groups of vertices and satisfy a desirable form of monotonicity.
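The flow-based quantity compared above can be computed directly; a small networkx sketch on an invented capacitated network:

```python
import networkx as nx

N = nx.DiGraph()
for u, v in [("y", "a"), ("y", "b"), ("a", "x1"), ("b", "x2"),
             ("x1", "z"), ("x2", "z")]:
    N.add_edge(u, v, capacity=2)

def flow_through(N, X, y, z):
    # Max-flow value from y to z, minus the value after the capacities
    # of all arcs incident to the group X are set to zero.
    total = nx.maximum_flow_value(N, y, z)
    M = N.copy()
    for u, v in M.edges():
        if u in X or v in X:
            M[u][v]["capacity"] = 0
    return total - nx.maximum_flow_value(M, y, z)

print(flow_through(N, {"x1"}, "y", "z"))        # singleton X -> 2
print(flow_through(N, {"x1", "x2"}, "y", "z"))  # group X -> 4
```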



25/02/2022 - The seminar will be online: https://datascience.unifi.it/index.php/event/12th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
(MP) Making a housing market agent-based model learnable
(FM) Combining counterfactual outcomes and ARIMA models for policy evaluation

Doppio seminario FDS:
Marco Pangallo (Sant’Anna School of Advanced Studies) & Fiammetta Menchetti (DiSIA, UniFI)

Marco Pangallo:
Agent-Based Models (ABMs) are used in several fields to study the evolution of complex systems based on micro-level assumptions. Often, some of their micro-level variables cannot be observed in empirical data. These latent variables make it difficult to initialize an ABM in order to use it to track and forecast empirical time series. In this paper, we propose a protocol to learn the latent variables of an ABM. We show how a complex ABM can be reduced to a probabilistic model, characterized by a computationally tractable likelihood. This reduction can be abstracted into two general design principles: balance of stochasticity and data availability, and replacement of unobservable discrete choices with differentiable approximations. We showcase our protocol by applying it to an ABM of the housing market, in which agents with different incomes bid higher prices to live in high-income neighborhoods. We show that the obtained model preserves the general behavior of the ABM, and at the same time it allows the estimation of latent variables through the optimization of its likelihood. In synthetic experiments, we show that we can learn the latent variables with good accuracy, and that our estimates make out-of-sample forecasting more precise compared to alternative benchmarks. Our protocol can be seen as an alternative to black-box data assimilation methods, forcing the modeler to lay bare the assumptions of the model, think about the inferential process, and identify potential identification problems.

Fiammetta Menchetti:
The Rubin Causal Model (RCM) is a framework that allows one to define the causal effect of an intervention as a contrast of potential outcomes. In recent years, several methods have been developed under the RCM to estimate causal effects in time-series settings. None of these makes use of ARIMA models, which are instead very common in the econometrics literature. We propose a novel approach, Causal-ARIMA (C-ARIMA), to define and estimate the causal effect of an intervention in observational time-series settings under the RCM. We first formalize the assumptions enabling the definition, the estimation and the attribution of the effect to the intervention. In the empirical application, we use C-ARIMA to assess the causal effect of a permanent price reduction on supermarket sales. The Causal-ARIMA R package provides an implementation of our proposed approach.
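The core idea, forecasting the counterfactual from a model fitted to the pre-intervention period only, can be sketched in a few lines of Python (synthetic data; this illustrates the general approach, not the package's API):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = 50 + 0.1 * np.cumsum(rng.normal(0, 1, 150)) + rng.normal(0, 1, 150)
t0 = 120
y[t0:] += 5.0                    # a hypothetical permanent price-cut effect

# Fit an ARIMA model on the pre-intervention period only, then forecast
# the counterfactual (what sales would have been absent the intervention).
fit = ARIMA(y[:t0], order=(1, 0, 0)).fit()
counterfactual = fit.forecast(steps=len(y) - t0)

effect = y[t0:] - counterfactual # pointwise causal-effect estimates
print(effect.mean())             # average effect over the post period
```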



11/02/2022 - The seminar will be online: https://datascience.unifi.it/index.php/event/11th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
(FC) Learning the two parameters of the Poisson-Dirichlet distribution with a forensic application
(MB) Combining and comparing regional epidemic dynamics in Italy: Bayesian meta-analysis of compartmental models and model assessment via Global Sensitivity Analysis

Doppio seminario FDS:
Fabio Corradi (DiSIA, UniFI) & Michela Baccini (DiSIA, UniFI)

Fabio Corradi:
This contribution is motivated by the rare type match problem, a relevant forensic issue, in which difficulties arise in evaluating the likelihood ratio comparing the defense and prosecution hypotheses, since the specific matching characteristic from the suspect and the crime scene is not in the reference database. A recently proposed solution approximates the likelihood ratio by plugging in the maximum likelihood estimates of the parameters of a Poisson-Dirichlet distribution, a Bayesian nonparametric prior for probability mass functions exhibiting power-law behavior in an infinite-dimensional space. We instead consider how to learn the parameters of a Poisson-Dirichlet distribution, and we propose two sampling schemes: Markov chain Monte Carlo and Approximate Bayesian Computation. We demonstrate that the previously employed plug-in solution produces a systematic bias that Bayesian inference avoids entirely. Finally, we evaluate the method using a database of Y-chromosome haplotypes.
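A minimal Python sketch of the simulator such sampling schemes rely on: draws from the two-parameter (Poisson-Dirichlet/Pitman-Yor) Chinese restaurant process, whose number of distinct types is a natural summary statistic for ABC (parameter values illustrative):

```python
import numpy as np

def sample_pd_partition(n, sigma, theta, rng):
    # Sequential (Chinese restaurant) construction: observation i+1 joins an
    # existing type j with probability (n_j - sigma) / (i + theta), or founds
    # a new type with probability (theta + k * sigma) / (i + theta).
    counts = [1]
    for i in range(1, n):
        k = len(counts)
        probs = np.array([c - sigma for c in counts] + [theta + k * sigma])
        j = rng.choice(k + 1, p=probs / (i + theta))
        if j == k:
            counts.append(1)      # a previously unseen type
        else:
            counts[j] += 1
    return counts

rng = np.random.default_rng(0)
counts = sample_pd_partition(1000, sigma=0.5, theta=10.0, rng=rng)
print(len(counts))                # number of distinct types observed
```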

Michela Baccini:
During autumn 2020, Italy faced a second important SARS-CoV-2 epidemic wave. We explored the time pattern of the instantaneous reproductive number R0(t) and estimated the prevalence of infections by region from August to December by calibrating SIRD models on COVID-19-related deaths, fixing the Infection Fatality Rate (IFR) and the infection duration at values from the literature. A Global Sensitivity Analysis (GSA) was performed on the regional SIRD models. Then, we used Bayesian meta-analysis and meta-regression to combine and compare the regional results and investigate their heterogeneity. The meta-analytic R0(t) curves were similar in the Northern and Central regions, while a less peaked curve was estimated for the South. The maximum R0(t) ranged from 2.61 (North) to 2.15 (South), with an increase following school reopening and a decline at the end of October. Average temperature, urbanization, characteristics of family medicine and of the health care system, economic dynamism, and use of public transport could partly explain the regional heterogeneity. The GSA indicated the robustness of the regional R0(t) curves to different assumptions on the IFR. The infectious period turned out to have a key role in determining the model results, but without compromising between-region comparisons.
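For concreteness, a toy forward simulation of a SIRD model of the kind calibrated in the talk, with the IFR and infectious period fixed externally; all parameter values below are placeholders rather than the regional estimates.

```python
# Toy SIRD model: deaths are the observable series that calibration would
# match to data; IFR and infectious period are fixed from the literature.
import numpy as np
from scipy.integrate import solve_ivp

def sird(t, y, beta, gamma, ifr):
    S, I, R, D = y
    N = S + I + R + D
    new_inf = beta * S * I / N
    return [-new_inf,
            new_inf - gamma * I,
            gamma * I * (1 - ifr),   # recoveries
            gamma * I * ifr]         # deaths

N0, ifr, infectious_days = 1e6, 0.01, 10.0
gamma = 1.0 / infectious_days
beta = 2.5 * gamma                   # implies R0 = beta / gamma = 2.5
sol = solve_ivp(sird, (0, 120), [N0 - 100, 100, 0, 0],
                args=(beta, gamma, ifr), t_eval=np.arange(121))
daily_deaths = np.diff(sol.y[3])     # series to be matched to observed deaths
print(f"peak daily deaths: {daily_deaths.max():.0f}, "
      f"cumulative: {sol.y[3, -1]:,.0f}")
```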

Torna alla lista dei seminari archiviati


28/01/2022

D2 Seminar Series
(LS) Predicting Multiple Future Trajectories for Safe Self-Driving Cars
(FC) The use of neural networks for the solution of Partial Differential Equations

Doppio seminario FDS:
Lorenzo Seidenari (Dept. Information Engineering, UniFI) & Francesco Calabrò (Dept. Mathematics and Applications “Renato Caccioppoli”, University of Naples “Federico II”)

Lorenzo Seidenari
Autonomous navigating agents are becoming a reality. Pedestrians and drivers are expected to safely navigate complex urban environments along with several non-cooperating agents, and autonomous vehicles will soon replicate this capability. Agents must learn a representation of the world and must make decisions ensuring safety for themselves and others. Apart from sensing objects and knowing and abiding by traffic regulations, a driving agent must plan a safe path. This requires predicting the motion patterns of observed agents sufficiently far into the future. Moreover, with the rise of autonomous cars, a lot of attention is also drawn by the explainability of machine learning models for self-driving cars. In this talk, I will go over our recent contributions in the field of self-driving systems. I will present our recent works on multimodal trajectory prediction exploiting a novel use of memory-augmented neural networks. Finally, we will look at simple explainable models for driving and trajectory prediction.

Francesco Calabrò
In this talk, we present the construction of a Physics-Informed method for the solution of stationary Partial Differential Equations. Our method relies on the construction of a Feedforward Neural Network (FNN) with a single hidden layer and randomly generated sigmoidal transfer functions, the so-called Extreme Learning Machines (ELM). We use ELM random projection networks as the discrete space in which to seek the solution of PDEs. The free parameters (N external weights) are fixed by imposing exactness at M (possibly randomly located) points via collocation. In order to obtain accurate solutions, we make the collocation system underdetermined (N > M). For linear PDEs, the weights are computed by a one-step least-squares solution of the linear system. The least-squares solution is capable of automatically selecting the important features, i.e. the functions in the space that are most influential for the solution. This leads to a one-shot automatic method, and there is no need for adaptive procedures or parameter tuning as required when learning with other FNN-based methods. We present results for elliptic benchmark problems, both in the linear case [1] and for the solution and construction of bifurcation diagrams of nonlinear problems [2] (a toy sketch of the collocation step is given after the references below). The results are obtained in collaboration with Gianluca Fabiani and Costantinos Siettos.
[1] Calabrò, F., Fabiani, G., & Siettos, C. (2021). Extreme learning machine collocation for the numerical solution of elliptic PDEs with sharp gradients. Computer Methods in Applied Mechanics and Engineering, 387, 114188.
[2] Fabiani, G., Calabrò, F., Russo, L., & Siettos, C. (2021). Numerical solution and bifurcation analysis of nonlinear partial differential equations with extreme learning machines. Journal of Scientific Computing, 89, 44.
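A toy numerical sketch of the ELM collocation step, assuming a 1D Poisson problem -u''(x) = f(x) on [0, 1] with homogeneous boundary conditions and the manufactured solution u(x) = sin(pi x); the feature counts and random weight scales are arbitrary demo choices, not those of [1].

```python
# ELM collocation toy: random frozen sigmoidal hidden layer, N features >
# M collocation points, one-shot (minimum-norm) least-squares solution.
import numpy as np

rng = np.random.default_rng(3)
N, M = 200, 60
w, b = rng.normal(0, 10, N), rng.normal(0, 10, N)   # random hidden weights

sig = lambda z: 1.0 / (1.0 + np.exp(-z))
def features(x):
    # Sigmoidal features and their analytic second derivatives in x.
    s = sig(np.outer(x, w) + b)
    return s, (w**2) * s * (1 - s) * (1 - 2 * s)

x = np.linspace(0, 1, M)
f = np.pi**2 * np.sin(np.pi * x)                    # RHS for u = sin(pi x)
s, s_xx = features(x)

A = np.vstack([-s_xx, s[[0]], s[[-1]]])             # PDE rows + boundary rows
rhs = np.concatenate([f, [0.0, 0.0]])
c, *_ = np.linalg.lstsq(A, rhs, rcond=None)         # one-shot least squares

print(f"max error: {np.abs(s @ c - np.sin(np.pi * x)).max():.2e}")
```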

Torna alla lista dei seminari archiviati


16/12/2021 - The participation on site is restricted and you need to register here https://labdisia.disia.unifi.it/reserve205/. The Lecture will be available also online. Please register here to participate https://us02web.zoom.us/webinar/register/WN_0QB9L_b3RlS9Zp5Rix30_g

Bayesian Models for Microbiome Data with Variable Selection

Marina Vannucci (Rice University, Houston (USA))

I will describe Bayesian models developed for understanding how the microbiome varies within a population of interest. I will focus on integrative analyses, where the goal is to combine microbiome data with other available information (e.g. dietary patterns) to identify significant associations between taxa and a set of predictors. For this, I will describe a general class of hierarchical Dirichlet-Multinomial (DM) regression models which use spike-and-slab priors for the selection of the significant associations. I will also describe a joint model that efficiently embeds DM regression models and compositional regression frameworks, in order to investigate how the microbiome may affect the relation between dietary factors and phenotypic responses, such as body mass index. I will discuss advantages and limitations of the proposed methods with respect to current standard approaches used in the microbiome community, and will present results on the analysis of real datasets. If time allows, I will also briefly present extensions of the model to mediation analysis.

Torna alla lista dei seminari archiviati


10/12/2021 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/9th-seminar-of-the-d2-seminar-series-florence-center-for-data-science

D2 Seminar Series
(RG) Italy’s lowest-low fertility in times of uncertainty
(AM) Approximating the Neighborhood Function of (Temporal) Graphs

Doppio seminario FDS:
Raffaele Guetto (DiSIA) & Andrea Marino (DiSIA)

Raffaele Guetto:
The generalized and relatively homogeneous fertility decline across European countries in the aftermath of the Great Recession poses serious challenges to our knowledge of contemporary low fertility patterns. The rise of economic uncertainty has often been identified, in the sociological and demographic literature, as the main cause of this state of affairs. The forces of uncertainty have been traditionally operationalized through objective indicators of individuals’ actual and past labour market situations. However, this presentation argues that the role of uncertainty needs to be conceptualized and operationalized taking into account that people use works of imagination, producing their own narrative of the future, also influenced by the media. To outline such an approach, I review contemporary drivers of Italy’s lowest-low fertility, placing special emphasis on the role of uncertainty fueled by labour market deregulations and – more recently – the Covid-19 pandemic. I discuss the effects of the objective (labour-market related) and subjective (individuals’ perceptions, including future outlooks) sides of uncertainty on fertility, based on a set of recent empirical findings obtained through a variety of data and methods. In doing so, I highlight the potential contribution of so-called “big data” and techniques of media content analysis and Natural Language Processing for the analysis of the effects of media-conveyed narratives of the economy.

Andrea Marino:
The average distance in graphs (like, for instance, the Facebook friendship network and the Internet Movie Database collaboration network), often referred to as degrees of separation, has been largely investigated. However, if the number of nodes is very large (millions or billions), computing this measure entails prohibitive time and space costs, as it requires computing for each node the so-called neighbourhood function, i.e. for each vertex v and for each h, how many nodes are within distance h from v. Temporal graphs are a special kind of graph whose edges have temporal labels specifying their occurring times, in the same way as the connections of a city's public transportation network are available only at scheduled times. Here, paths make sense only if they correspond to sequences of edges with strictly increasing labels. A possible notion of distance between two nodes in a temporal network is the earliest arrival time of the temporal paths connecting the two nodes. In this case, the temporal neighbourhood function is defined as the number of nodes reachable from a given one in a given time interval, and it is also expensive to compute. We introduce probabilistic counting in order to approximate the size of sets, and we show how both the plain and the temporal neighbourhood functions can be approximated by plugging this technique into a simple dynamic programming algorithm.
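A small sketch of the dynamic programming the abstract refers to, using exact Python sets on a toy graph; in the approach described in the talk, the sets would be replaced by probabilistic counters (approximate set-size estimators), which is what makes the computation feasible at scale.

```python
# Exact dynamic programming for the neighbourhood function on a toy graph:
# reach[v] holds the nodes within h hops of v; each round extends h by one.
# At scale, reach[v] would be a probabilistic counter rather than a set.
def neighbourhood_function(adj, h_max):
    reach = {v: {v} for v in adj}                   # within 0 hops
    nf = {0: sum(len(s) for s in reach.values())}
    for h in range(1, h_max + 1):
        reach = {v: reach[v].union(*(reach[u] for u in adj[v])) for v in adj}
        nf[h] = sum(len(s) for s in reach.values()) # pairs within distance h
    return nf

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}        # a 4-node path graph
print(neighbourhood_function(adj, 3))               # {0: 4, 1: 10, 2: 14, 3: 16}
```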

Torna alla lista dei seminari archiviati


26/11/2021 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/8th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
(GR) State Space Model to Detect Cycles in Heterogeneous Agents Models (joint work with Filippo Gusella)
(MB) High quality video experience using deep neural networks

Doppio seminario FDS:
Giorgio Ricchiuti (DISEI) & Marco Bertini (DINFO)

Giorgio Ricchiuti:
We propose an empirical test to detect possible endogenous cycles within Heterogeneous Agent Models (HAMs). We consider a 2-type HAM within a standard small-scale dynamic asset pricing framework. On the one hand, fundamentalists base their expectations on the deviation of the fundamental value from the market price, expecting a convergence between them. On the other hand, chartists, subject to self-fulfilling moods, consider the level of past prices and relate it to the fundamental value, acting as contrarians. These pricing strategies, by their nature, cannot be directly observed, but they drive the response of the observed data. For this reason, we treat the agents' beliefs as unobserved state components from which, through a state space model formulation, the heterogeneity of fundamentalist-chartist trader cycles can be mathematically derived and empirically tested. The model is estimated using the S&P500 index for the period 1990-2020 at different time scales, specifically daily, monthly, and quarterly.

Marco Bertini:
Lossy image and video compression algorithms are the enabling technology for a large variety of multimedia applications, reducing the bandwidth required for image transmission and video streaming. However, lossy image and video compression codecs decrease the perceived visual quality, eliminate higher frequency details and in certain cases add noise or small image structures. There are two main drawbacks of this phenomenon. First, images and videos appear much less pleasant to the human eye, reducing the quality of experience. Second, computer vision algorithms such as object detectors may be hindered and their performance reduced. Removing such artefacts means recovering the original image from a perturbed version of it. This means that one ideally should invert the compression process through a complicated non-linear image transformation. In this talk, I’ll present our most recent works based on the GAN framework that allows us to produce images with photorealistic details from highly compressed inputs.

Torna alla lista dei seminari archiviati


25/11/2021 - To participate in person (max 20 people), book at https://labdisia.disia.unifi.it/reserve205/. To participate remotely: meet.google.com/aio-dhut-nah

Mortality and morbidity drivers of the global distribution of health

Inaki Permanyer (Centre d’Estudis Demogràfics, Barcelona)

Increasing life expectancy (LE) and reducing its variability across countries (the so-called "International Health Inequality", IHI) are progressively prominent goals in global development agendas. Yet, LE is composed of two components: the number of years individuals are expected to live in "good" and in "less-than-good" health. While the first component ("Health-adjusted life expectancy", HALE) is normatively desirable, the second one ("Unhealthy life expectancy", UHLE, or LE - HALE) is highly controversial because of the high personal, social, and economic costs often associated with the presence of disease or disability, an issue that can muddy the waters when interpreting global health dynamics and calls for new conceptual approaches. Here we document how the evolution of HALE and UHLE between 1990 and 2019 has shaped (i) the trends and composition of LE at the country, regional, and global levels, and (ii) the levels and trends in IHI. Our findings indicate that UHLE has tended to grow at a faster rate than HALE, thus leading to an expansion of morbidity in 75% of world countries. IHI increased until 2000 and has declined since then, a trend that is mostly determined by the evolution of HALE across countries. While still the smaller component, UHLE is a non-negligible and increasingly relevant factor determining the levels of international health inequality. These findings and ideas are useful for understanding the role that the healthy and unhealthy components of LE are playing in contemporary health dynamics and for the elaboration of policies aimed at tackling health inequalities both across and within countries.

Torna alla lista dei seminari archiviati


24/11/2021 - To participate in person, in Room 205 (ex 32), max 20 people, book at https://labdisia.disia.unifi.it/reserve205/. To participate remotely: meet.google.com/bas-vwed-bcv

Innovation on methods for the evaluation of the short-term health effects of environmental hazards.

Francesco Sera (DiSIA) e Dominic Royé (University of Santiago de Compostela)

Numerous studies have evaluated the short-term effects on health of environmental exposures (e.g. pollutants, temperature), providing valuable information for preventive public health strategies. In this seminar we'll discuss some advances in the methodology and in the definition of the exposures and the outcomes that allow a better description of the epidemiological association.
The two-stage design has become a standard tool in environmental epidemiology to model short-term effects with multi-location data. We'll illustrate multiple design extensions of the classical two-stage method. These are based on improvements of the standard two-stage meta-analytic models along the lines of linear mixed-effects models, allowing location-specific estimates to be pooled through flexible fixed- and random-effects structures. This permits the analysis of associations characterised by combinations of multivariate outcomes, hierarchical geographical structures, repeated measures, and/or longitudinal settings. The design extensions will be illustrated with examples using data collected by the Multi-country Multi-city research network.
In terms of new exposures, we defined new indices that emphasize the importance of hot nights, which may prevent necessary nocturnal rest. Recently we used hot-night duration and excess to predict daily cause-specific mortality in summer in multiple cities across Southern Europe. We found positive but generally non-linear associations between the relative risk of cause-specific mortality and the duration and excess of hot nights.
Another aspect is that most studies use daily counts of deaths or hospitalisations as health outcomes, although these sit at the top of the health impact pyramid and reflect only the limited proportion of patients with the most severe cases. In a recent study we evaluated the relationship between short-term exposure to the daily mean temperature and medication prescribed for the respiratory system in five Spanish cities.
The proposed innovations in the methodology and in the definition of the exposures and the outcomes provide new evidence on the short-term health effects of environmental hazards that could improve decision-making for preventive public health strategies.

Torna alla lista dei seminari archiviati


12/11/2021 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/7th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
(LB) Recent advances in bibliometric indexes and their implementation
(VB) Fisher’s Noncentral Hypergeometric Distribution for the Size Estimation of Unemployed Graduates in Italy (joint work with Brunero Liseo, University Sapienza, Roma)

Doppio seminario FDS:
Luigi Brugnano (DIMAI) & Veronica Ballerini (DiSIA)

Luigi Brugnano:
Bibliometric indexes are nowadays very commonly used for assessing scientific production, research groups, journals, etc. It must be stressed that such indexes cannot substitute for an assessment of the merit of specific research; nonetheless, they can provide a gross evaluation of its impact on the scientific community. That said, the currently used indexes often have drawbacks and/or vary considerably across different subjects of investigation. For this reason, in [1] an alternative index has been proposed, based on an idea akin to that of Google's PageRank. Its actual implementation in the Scopus database has recently been carried out [2]. In this talk, the basic facts and results of this approach will be recalled. [1] P. Amodio, L. Brugnano. Recent advances in bibliometric indexes and the PaperRank problem. Journal of Computational and Applied Mathematics 267 (2014) 182-194. http://doi.org/10.1016/j.cam.2014.02.018. [2] P. Amodio, L. Brugnano, F. Scarselli. Implementation of the PaperRank and AuthorRank indices in the Scopus database. Journal of Informetrics 15 (2021) 101206. https://doi.org/10.1016/j.joi.2021.101206.
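For intuition, a generic PageRank-style power iteration on a toy citation graph; this is only meant to convey the kind of recursive scoring behind PaperRank [1], whose precise definition differs.

```python
# Generic damped power iteration on a toy citation graph, PageRank-style.
import numpy as np

C = np.array([[0, 1, 1, 0],      # paper 0 cites papers 1 and 2, etc.
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 1, 0]], dtype=float)
out = C.sum(axis=1, keepdims=True)
P = C / np.where(out == 0, 1, out)          # row-stochastic transition matrix
d, n = 0.85, C.shape[0]

r = np.full(n, 1 / n)
for _ in range(100):
    r = (1 - d) / n + d * (P.T @ r)         # damped score propagation
print("scores:", r.round(4))
```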

Veronica Ballerini:
Quantifying unemployment among those who have never been employed is often difficult. The lack of an administrative data flow attributable to such individuals makes them an elusive population; hence, one must rely on surveys. However, individuals' response rates to questions on their occupation may differ according to their employment status, implying a not-at-random missing data generation mechanism. We exploit the underused Fisher's noncentral hypergeometric distribution (FNCH) to solve such a biased urn experiment. FNCH has been underemployed in the statistical literature mainly because of the computational burden of its probability mass function. Indeed, as the number of draws and the number of different categories in the population increase, any method involving the evaluation of the likelihood is practically unfeasible. First, we present a methodology that allows the approximation of the posterior distribution of the population size via MCMC and ABC methods. Then, we apply this methodology to the case of unemployed graduates in Italy, exploiting information from different data sources.
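A small numeric illustration of the biased-urn view, assuming a recent SciPy (which ships Fisher's noncentral hypergeometric as scipy.stats.nchypergeom_fisher); the population counts and odds value are invented for the example.

```python
# Biased-urn illustration: among M graduates, n are unemployed; a survey of
# size N in which unemployed individuals respond with different odds follows
# Fisher's noncentral hypergeometric distribution.
from scipy.stats import hypergeom, nchypergeom_fisher

M, n, N = 1000, 300, 200       # population, unemployed, respondents
odds = 0.5                     # unemployed respond half as readily

biased = nchypergeom_fisher(M, n, N, odds)
fair = hypergeom(M, n, N)      # what an unbiased survey would give
print(f"E[# unemployed among respondents], biased urn: {biased.mean():.1f}")
print(f"E[# unemployed among respondents], fair urn:   {fair.mean():.1f}")
```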

Torna alla lista dei seminari archiviati


29/10/2021 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/6th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
(GI) Performance-based research funding: Evidence from the largest natural experiment worldwide
(MF) Consensus-based optimization

Doppio seminario FDS: Giulia Iori (Department of Economics of the City University of London) & Massimo Fornasier (Department of Mathematics of the Technical University of Munich)

Giulia Iori:
The Research Excellence Framework (REF) is the main UK government policy on public research of the last 30 years. The primary aim of this policy is to promote and reward research excellence through competition for scarce research resources. Surprisingly, and despite the severe criticisms, little has been done to systematically evaluate its effects. In this paper, we evaluate the impact of the REF 2014. We exploit a large database that contains all publications in Economics, Business, Management, and Finance available in Scopus since 2001. We use a synthetic control method to compare the performance of each UK university with counterfactual units, similar in terms of past research, constructed using data for US universities. We find a significant positive increase, relative to the control group, in the number of published papers and in the proportion of papers published in highly ranked journals within the Economics/Econometrics area and the Business, Management and Finance area. Both Russell and non-Russell Group universities benefited from the REF, with the Russell Group universities experiencing an overall significant increase in the number of publications and the number of publications in top journals, and the non-Russell Group experiencing a significant increase in the proportion of publications in top journals in all areas. Interestingly, the non-Russell Group experienced a comparatively stronger increase in the proportion of top publications in Economics/Econometrics, while the Russell Group experienced a comparatively stronger increase in the proportion of top publications in Business, Management and Finance. However, we see an insignificant effect when we focus on per-author output measures, indicating that growth in output was mostly achieved by an increase in the number of research-active academics rather than an overall increase in research productivity.

Massimo Fornasier:
Consensus-based optimization (CBO) is a multi-agent metaheuristic derivative-free optimization method that can globally minimize nonconvex nonsmooth functions and is amenable to theoretical analysis. In fact, optimizing agents (particles) move on the optimization domain driven by a drift towards an instantaneous consensus point, which is computed as a convex combination of particle locations, weighted by the cost function according to Laplace's principle, and which represents an approximation to a global minimizer. The dynamics are further perturbed by a random vector field, whose variance is a function of the distance of the particles to the consensus point, in order to favor exploration. Based on an experimentally supported intuition that CBO always performs a gradient descent of the squared Euclidean distance to the global minimizer, we show a novel technique for proving the global convergence to the global minimizer in mean-field law for a rich class of objective functions. We further present formulations of CBO over compact hypersurfaces. We conclude the talk with a few numerical experiments, which show that CBO scales well with the dimension and is extremely versatile.
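A bare-bones sketch of the CBO iteration just described: particles drift toward a Laplace-weighted consensus point and are perturbed by noise proportional to their distance from it. The Rastrigin test function, the isotropic noise, and all constants are illustrative choices, not those of the talk.

```python
# Minimal isotropic CBO loop on the 2D Rastrigin function (minimum at 0).
import numpy as np

def rastrigin(x):
    return 10 * x.shape[1] + (x**2 - 10 * np.cos(2 * np.pi * x)).sum(axis=1)

rng = np.random.default_rng(4)
X = rng.uniform(-5, 5, size=(200, 2))                # 200 particles
lam, sigma, dt, alpha = 1.0, 0.7, 0.05, 50.0

for _ in range(500):
    f = rastrigin(X)
    w = np.exp(-alpha * (f - f.min()))               # Laplace-principle weights
    m = (w[:, None] * X).sum(axis=0) / w.sum()       # consensus point
    dist = np.linalg.norm(X - m, axis=1, keepdims=True)
    X += -lam * (X - m) * dt \
         + sigma * dist * np.sqrt(dt) * rng.normal(size=X.shape)
print("consensus point (approximate minimizer):", m.round(3))
```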

Torna alla lista dei seminari archiviati


28/10/2021 - To participate in person (max 20 people), book at https://labdisia.disia.unifi.it/reserve205/. To participate remotely: https://meet.google.com/atb-qtxf-nvb

Uncertainty across the “Contact Line”: armed conflict, COVID-19, and perceptions of fertility decline in Eastern Ukraine

Brienna Perelli-Harris (University of Southampton)

While economic uncertainty has been a key explanation for very low fertility throughout Europe, few studies have conducted in-depth investigations into social and political sources of uncertainty. Here we study Ukraine, which has recently experienced fertility rates around 1.3. In 2014, armed conflict wracked Ukraine's eastern regions, producing around 1.7 million internally displaced persons (IDPs). To better understand how continuing crises shape Ukrainians' perceptions of childbearing, we analyse 16 online focus groups conducted in July 2021. We compare IDPs with locals, and residents of the Government-Controlled with those of the non-Government-Controlled Areas. The discussions reveal how Covid and political instability have intensified uncertainties, discouraging couples from having more than one child. Participants mentioned poverty, insecurity, and uncertain relationships as reasons to curtail childbearing. Some blamed the government or delved into conspiracy theories. Nonetheless, others were optimistic and planned to have more children, indicating mixed reactions to uncertainty.

Torna alla lista dei seminari archiviati


15/10/2021 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/5th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
(LB) Endogenous and Exogenous Volatility in the Foreign Exchange Market (with G. Cifarelli)
(CM) Artificial Intelligence in Neuroimaging

Doppio seminario FDS: Leonardo Bargigli (DISEI) & Chiara Marzi (IFAC – CNR)

Leonardo Bargigli:
We study two sources of heteroscedasticity in high-frequency financial data. The first, endogenous, source is the behaviour of boundedly rational market participants. The second, exogenous, source is the flow of market-relevant information. We estimate the impact of the two sources jointly by means of a Markov switching (MS) SVAR model. Following the original intuition of Rigobon (2003), we achieve identification for all coefficients by assuming that the structural errors of the MS-SVAR model follow a GARCH-DCC process. Using transaction data from the EUR/USD interdealer market in 2016, we first detect three regimes of endogenous volatility. We then show that both kinds of volatility matter for the transmission of shocks, and that exogenous information is channelled to the market mostly through price variations. This suggests that, on the FX market, liquidity providers are better informed than liquidity takers, who act mostly as feedback traders. The latter are able to profit from trade because, unlike noise traders, they respond immediately to informative price shocks.

Chiara Marzi:
Life sciences data coupled with Artificial Intelligence (AI) techniques can help researchers accurately pinpoint novel biomarkers. AI can propose new indices as potential biomarkers while simultaneously aiding in searching for hidden patterns among “well-established” indices. In this webinar, we will take a brief journey through some applications of Machine Learning in neuroimaging. In the first part of the webinar, we will talk about the not so easy “marriage” between AI and clinical data, focusing on Big Data from Radiology Imaging and related issues. In the second part, we will see how we can transfer mathematical, physical, and statistical ideas to the Neuroimaging domain and how AI can help this transfer. An example is the study of the structural complexity of the brain starting from MRI images, using fractal analysis. The Fractal Dimension (FD) can be considered a measure of morphological changes due to healthy ageing and/or the onset of neurological diseases. The use of ML techniques can promote the candidature of FD as a biomarker for many neurological diseases.
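As a toy illustration of the FD index mentioned at the end of the abstract, a minimal box-counting estimate of the fractal dimension of a binary image (here a synthetic random pattern rather than a brain MRI; image size and scales are demo choices).

```python
# Box-counting estimate of fractal dimension for a binary image: count the
# boxes of side s that contain structure, then regress log N(s) on log(1/s).
import numpy as np

rng = np.random.default_rng(5)
img = rng.random((256, 256)) < 0.1                 # toy binary "structure"

sizes, counts = [2, 4, 8, 16, 32], []
for s in sizes:
    blocks = img.reshape(256 // s, s, 256 // s, s).any(axis=(1, 3))
    counts.append(blocks.sum())                    # occupied boxes at scale s

slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
print(f"estimated box-counting dimension: {slope:.2f}")
```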

Torna alla lista dei seminari archiviati


04/10/2021

Business Excellence 5.0

Gabriele Arcidiacono (DIIE - Università Guglielmo Marconi, Roma)

The webinar offers a privileged vantage point on the theme of Business Excellence: all-round excellence, also known as Global Excellence, which makes it possible to achieve results and benefits that are sustainable over time. This approach operates in the technical domain of processes (Process Excellence), in the proper understanding of interpersonal behaviours and dynamics (Human Excellence), and in the most appropriate use of information and digital technology (Digital Excellence). The synergy of these three fields of action makes it possible to structure a Continuous Improvement path that encompasses processes, people, and digitalization, and leads to the achievement of increasingly challenging results.

Documenti: Scheda   

Video: video   

Torna alla lista dei seminari archiviati


30/09/2021 - In-person participation (Room 205 - ex 32) is possible for a maximum of 15 people; contact raffaele.guetto@unifi.it. To participate remotely: https://meet.google.com/hfw-mdvm-ity

Ethnicity and Governing Party Support in Africa

Carlos G. Rivero (Valencia University & Centre for International and Comparative Politics, Stellenbosch University)

Historically, ethnicity has been considered to play a fundamental role in voting behaviour in Africa. However, researchers on the issue have reached contradictory conclusions. The most recent research concludes that the African voter is more rational than expected, and overall ethnicity seems to be less influential than theory used to suggest. Against this background, this paper analyses the vote for the governing party in Africa and presents evidence that the method and data set used have an important influence upon the final result. The research takes the form of a quantitative analysis making extensive use of survey data from 2005 to 2019. Results indicate that ethnicity, although not exclusively, is still an explanatory factor. In short, the African vote is rationally ethnic.

Torna alla lista dei seminari archiviati


23/09/2021 - In-person participation (Room 205 - ex 32) is possible for a maximum of 15 people; contact raffaele.guetto@unifi.it. To participate remotely: https://meet.google.com/ezd-dhpt-uvh

Incentivizing family savings as a way to fight educational inequality: Preliminary results from an RCT in Italy

Loris Vergolini (University of Bologna & FBK-IRVAPP)

The paper deals with the programme WILL "Educare al Futuro", a policy experimentation aimed at implementing Children's Savings Accounts and at evaluating their effects on a set of school-related outcomes. The programme, launched in 2019 in four Italian cities (Cagliari, Florence, Teramo and Torino), targets 9th-grade students from low-income families, offering them the opportunity to regularly save small amounts of money in a bank account that provides a generous multiplier (x4 if the money is spent on documented school expenses). In addition to the savings account, the beneficiaries are entitled to access financial education courses, educational support, and a guidance programme. The impacts of the programme will be assessed using a randomized controlled trial by 2024. In this paper we present a set of preliminary findings emerging from the first follow-up survey carried out in Spring 2021. We found that families in the treatment group show an increase both in general savings and in savings for school activities, and that this does not come at the expense of other spending. Moreover, the treated families used the savings accounts to buy ICT equipment (a computer, a tablet, or an internet connection) to allow their children to attend online learning during the Covid-19 pandemic. Our preliminary results also suggest that WILL has no effects on parental involvement, parental aspirations and expectations, or on children's school performance and socio-emotional well-being. The lack of effects on these dimensions is discussed in terms of the theory of change and of implementation issues due to the pandemic.

Torna alla lista dei seminari archiviati


22/09/2021

Reliability for components and systems (RELIA)

Marcantonio Catelani (Dipartimento di Ingegneria dell'informazione)

Starting from examples of the "critical issues" and "points of attention" that characterize the topic, the aim of the meeting is to offer some food for thought and discussion, beginning with simple basic concepts of reliability, both methodological and experimental.
The proposed seminar, "Reliability for components and systems", is a first event that will be followed by further in-depth sessions, useful, in the speakers' opinion, for understanding the broader context of RAMS (Reliability, Availability, Maintainability and Safety) requirements and performance, typical of several industrial scenarios.

Documenti: Scheda   

Video: video   

Torna alla lista dei seminari archiviati


15/07/2021

Design of experiments and robust design (DoE-for-RD)

Rossella Berni (Dipartimento di Statistica, Informatica, Applicazioni "Giuseppe Parenti" - UniFI)

The aim of the meeting is to offer some food for thought and discussion, starting from simple basic concepts of statistics and experimental design for robust design.
The proposed webinar, "Design of experiments and robust design", is a first event that will be followed by a second, more in-depth session, useful for clarifying the skills and tools required for robust process optimization, which is widely used in technological and industrial settings, though sometimes in a not entirely complete and up-to-date way.

Documenti: Scheda   Locandina

Video: video   

Torna alla lista dei seminari archiviati


08/07/2021 - Link to join the seminar on Meet: meet.google.com/psr-ggem-mhv

Dimensionality reduction and ranking extraction algorithms for multidimensional systems of ordinal indicators and partially ordered data

Marco Fattore (Università degli Studi di Milano-Bicocca)

The problem of constructing synthetic indices and rankings from multidimensional systems of ordinal indicators is increasingly widespread in socio-economic statistics and in support of evaluation processes and multi-criteria decision/policy making. Nevertheless, the methodological apparatus for the analysis of multidimensional ordinal data is still limited and largely borrowed from the statistical analysis of quantitative variables. The aim of the seminar is to show how it is instead possible to set up a genuine "ordinal data analysis" by importing the correct mathematical structures into statistical methodology, starting from the Theory of Order Relations, a branch of discrete mathematics devoted to the properties of partially ordered and quasi-ordered sets. In particular, starting from the problem of measuring multidimensional poverty, well-being, or sustainability, the seminar addresses dimensionality reduction and ranking extraction for systems of ordinal and partially ordered data, introducing the most recent methodological developments, illustrating both the algorithms already available and those under development, and discussing their strengths and weaknesses, with particular attention to computational aspects. The seminar concludes with an illustration of the theoretical and applied research lines currently under development, providing a map of the solved and still open problems in the multidimensional ordinal analysis of socio-economic data.

Torna alla lista dei seminari archiviati


02/07/2021 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/4th-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
(AG) Circular data, conditional independence & graphical models
(CC) Penalized hyperbolic-polynomial splines

Doppio seminario FDS: Anna Gottard (DiSIA) & Costanza Conti (DIEF)

Anna Gottard:
Circular variables, arising in several contexts and fields, are characterized by periodicity. Models for studying the dependence/independence structure of circular variables are under-explored. We will discuss three multivariate circular distributions, the von Mises, the Wrapped Normal and the Inverse Stereographic distributions, focusing on their properties concerning conditional independence. For each of these distributions, we examine the main properties related to conditional independence and introduce suitable classes of graphical models. The usefulness of the proposal is shown by modelling the conditional independence among dihedral angles characterizing the three-dimensional structure of some proteins.

Costanza Conti:
The advent of P-splines, introduced by Eilers and Marx (see [4]), has led to important developments in data regression through splines. With the aim of generalizing polynomial P-splines, in [1] we recently defined a penalized regression spline model, called HP-splines, in which polynomial B-spline functions are replaced by hyperbolic-polynomial bell-shaped basis functions. HP-splines are defined as the solution of a minimum problem characterized by a discrete penalty term. They inherit the advantages of P-splines, such as the separation of the spline knots from the data, thus avoiding overfitting and the consequent oscillations at the edges. HP-splines are particularly interesting in applications that require the analysis and forecasting of data with exponential trends. Indeed, the starting idea was the definition of a polynomial-exponential smoothing spline model to be used in the framework of Laplace transform inversion. We present some recent results on the existence, uniqueness, and reproduction properties of HP-splines, also with the aim of extending their use to data analysis.
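As background, a minimal classical P-spline fit (polynomial B-spline basis plus a discrete second-order difference penalty); HP-splines would swap in hyperbolic-polynomial basis functions. The knot count, penalty weight, and simulated exponential-trend data are demo assumptions.

```python
# Classical P-spline fit: B-spline design matrix + difference penalty.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(6)
x = np.linspace(0, 1, 200)
y = np.exp(2 * x) + rng.normal(0, 0.3, x.size)     # exponential-trend data

k, n_base = 3, 20                                   # cubic basis, 20 functions
t = np.r_[[0] * k, np.linspace(0, 1, n_base - k + 1), [1] * k]  # clamped knots
B = np.column_stack([BSpline(t, np.eye(n_base)[j], k)(x) for j in range(n_base)])

D = np.diff(np.eye(n_base), n=2, axis=0)            # second-order differences
lam = 1.0                                           # smoothing parameter
coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
print(f"residual std: {np.std(y - B @ coef):.3f}")
```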

Torna alla lista dei seminari archiviati


18/06/2021 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/3rd-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
(ER) From algebra to biology: what the math of ensemble averaging methods can tell us
(GC) Big data from space. Recent Advances in Remote Sensing Technologies

Doppio seminario FDS:
Enrico Ravera & Gherardo Chirici (University of Florence)

Enrico Ravera:
Our work aims at a quantitative comparison of different methods for reconstructing conformational ensembles of biological macromolecules integrating molecular simulations and experimental data. This field has evolved over the years reflecting the evolution of computational power and sampling schemes, and a plethora of different methods have been proposed. These methods can vary extensively in terms of how the prior information from the simulation is used to reproduce the experimental data, but can be coarsely attributed to two categories: Maximum Entropy or Maximum Parsimony. In any case, the problem is severely underdetermined and therefore additional information needs to be provided on the basis of the chemical knowledge about the system under investigation. Maximum entropy looks for the minimal perturbation of the prior distribution, whereas Maximum Parsimony looks for the smallest possible ensemble that can explain in full the experimental data. On these grounds, one can expect radically different solutions in the reconstruction, but surprises are still possible – and can be justified by a rigorous geometrical description of the different methods.

Gherardo Chirici:
Since the 1970s, remote sensing technologies for terrestrial observation have generated a constant flow of data from different platforms, in different formats and with different purposes. From these, through successive processing steps, spatial information is generated to support the monitoring and planning of Earth resources. Such information is indispensable in various sectors: from urban planning to geology, from agriculture to forest monitoring and, more generally, in any type of environmental monitoring. For this reason, remotely sensed information is recognized as a typical example of big data ante litteram. Today, new cloud computing technologies (such as Google Earth Engine) make it possible to tackle the complex problem of managing and processing big data from remote sensing with new strategies that have revolutionized the way these data sources are used. From experiments on small areas, we have moved to the possibility of operationally processing vast multidimensional and multitemporal datasets on a global scale. The increased availability of information from space is exemplified by the numerous services offered by the European Copernicus programme. The presentation, starting from a brief introduction to remote sensing techniques, illustrates some examples of applications developed within the geoLAB - Geomatics Laboratory of the Department of Agriculture, Food, Environment and Forestry (DAGRI) and the UNIFI COPERNICUS Research Unit.

Torna alla lista dei seminari archiviati


10/06/2021 - Link to join the seminar on Meet: meet.google.com/bmh-anmz-jms

Same-Sex Marriage, Relationship Dynamics and Well-being among Sexual Minorities in the UK

Diederik Boertien (Centre d’Estudis Demogràfics, Barcelona)

Changing laws and attitudes have reduced the obstacles sexual minorities face in accessing parts of family life, including marriage and parenthood. The question arises to what extent these changes have affected the relationship dynamics of sexual minorities and which obstacles to achieving family-related outcomes remain. In this presentation, a closer look is taken at the impact of the legalization of same-sex marriage in the UK (2014). To what extent did legalizing same-sex marriage have an impact on the relationships of sexual minorities? Beyond the uptake of marriage, are there any changes in union formation, separation, and partnering? What has the impact been on relationship satisfaction among same-sex couples? Data from Understanding Society are used, which allow for a longitudinal study of the relationship dynamics of individuals in same-sex couples, and which also include information on sexual identity, allowing us to study how the relationship dynamics of single individuals identifying as a sexual minority changed over time.

Torna alla lista dei seminari archiviati


04/06/2021 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/2nd-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
(CB) Scattered data: surface reconstruction and fault detection
(LB) Game-based education promotes sustainable water use

Doppio seminario FDS:
Cesare Bracco & Leonardo Boncinelli (University of Florence)

Cesare Bracco:
We will consider two aspects concerning scattered data approximation. The first is reconstructing a (parametrized) surface from a set of scattered points: the lack of structure in the data requires approximation methods that automatically adapt to the distribution and shape of the data themselves. We will discuss an effective approach to this issue based on hierarchical spline spaces, which can be locally refined, and therefore naturally lead to adaptive algorithms. The reconstruction problem contains another interesting problem: detecting the discontinuities the surface may have in order to reproduce them. Finding the discontinuity curves, usually called faults (or gradient faults when gradient discontinuities are considered), is actually an important issue in itself, with several applications, for example in image processing and geophysics. I will present a method to determine which points in the scattered data set lie close to a (gradient) fault, based on indicators obtained by using numerical differentiation formulas.

Leonardo Boncinelli:
In this study, we estimate the impact of a game-based educational program aimed at promoting sustainable water usage among 2nd-4th grade students and their families living in the municipality of Lucca, Italy. To this purpose, we exploited unique data from a quasi-experiment involving about two thousand students, one thousand participating (the treatment group), and one thousand not participating (the control group) in the program. Data were collected by means of a survey that we specifically designed and implemented for collecting students’ self-reported behaviours. Our estimates indicate that the program has been successful: the students in the program reported an increase in efficient water usage and an increase in the frequency of discussions with their parents about water usage; moreover, positive effects were still observed after six months. Our findings suggest that game-based educational programs can be an effective instrument to promote sustainable water consumption behaviors in children and their parents.

Torna alla lista dei seminari archiviati


21/05/2021 - The Seminar will be on-line: https://datascience.unifi.it/index.php/event/1st-seminar-of-the-d2-seminar-series-florence-center-for-data-science/

D2 Seminar Series
(FM) Assessing causality under interference
(AB) Lifelong Learning at the end of the (new) Early Years

Doppio seminario FDS:
Fabrizia Mealli & Andrew Bagdanov (University of Florence)

Fabrizia Mealli:
Causal inference from non-experimental data is challenging; it is even more challenging when units are connected through a network. Interference issues may arise, in that potential outcomes of a unit depend on its treatment as well as on the treatments of other units, such as their neighbours in the network. In addition, the typical unconfoundedness assumption must be extended—say, to include the treatment of neighbours, and individual and neighbourhood covariates—to guarantee identification and valid inference. These issues will be discussed, new estimands introduced to define treatment and interference effects and the bias of a naive estimator that wrongly assumes away interference will be shown. A covariate-adjustment method leading to valid estimates of treatment and interference effects in observational studies on networks will be introduced and applied to a problem of assessing the effect of air quality regulations (installation of scrubbers on power plants) on health in the USA.

Andrew Bagdanov:
Lifelong learning, also often referred to as continual or incremental learning, refers to the training of artificially intelligent systems able to continuously learn to address new tasks from new data while preserving the knowledge learned from previously encountered ones. Lifelong learning is currently enjoying a sort of renaissance due to renewed interest from the Deep Learning community. In this seminar, I will introduce the overall framework of continual learning, discuss the fundamental role played by the stability-plasticity dilemma in understanding catastrophic forgetting in lifelong learning systems, and present a broad panorama of recent results in class-incremental learning. I will conclude the discussion with a look at current trends, open problems, and low-hanging opportunities in this area.

Torna alla lista dei seminari archiviati


20/05/2021

Modelling international migration flows in Europe by integrating multiple data sources

Emanuele Del Fava (MPIDR, Germany)

Although understanding the drivers of migration is critical to enacting effective policies, theoretical advances in the study of migration processes have been limited by the lack of data on flows of migrants, or by the fragmented nature of these flows. In this work, we build on existing Bayesian modeling strategies to develop a statistical framework for integrating different types of data on migration flows. We offer estimates, as well as associated measures of uncertainty, for inflows and outflows among 31 European countries, combining administrative and household survey data from 2002 to 2018. Methodologically, we demonstrate the added value of survey data when administrative data are lacking or of poor quality. Substantively, we document the historical impact of the EU enlargement and of the free movement of workers in Europe on migration flows.

Video: Recorded seminar   

Torna alla lista dei seminari archiviati


29/04/2021 - Link to join the seminar on Meet: meet.google.com/bmh-anmz-jms

In-work poverty in Europe. Trends and determinants in longitudinal perspective

Stefani Scherer (Università degli Studi di Trento)

Employment remains among the most important factors protecting individuals and their families from economic poverty. However, recent years have witnessed an alarming increase in in-work poverty (IWP) in some contexts, that is, being poor notwithstanding employment. This paper investigates trends and determinants of in-work poverty in Europe, using EU-SILC data for the period 2004-2015. The contribution is threefold. First, we provide an analysis of the risk and the persistence of IWP for different social groups and household employment patterns, and discuss potential implications for social stratification dynamics. Second, we look into the (causal) dynamics of poverty and test for the presence of genuine state dependence (GSD) and the role of unobserved heterogeneity in shaping the accumulation of economic disadvantages over time. Third, we adopt a comparative perspective across countries (and time periods), analysing how different institutional features affect exposure to and dynamics of economic disadvantage. We show that one income is often no longer enough to keep families out of poverty, thus confirming the importance of a second earner and the need for at least one non-low-pay source of income. This is particularly true for Europe's South and for the less privileged social groups. We find no evidence of GSD, which has important implications for combating poverty through activation policies rather than transfers. Finally, notwithstanding different levels of exposure to and "stickiness" of IWP, the drivers turn out to be quite similar across European countries.

Torna alla lista dei seminari archiviati


01/04/2021 - Link to join the seminar on Meet: meet.google.com/bmh-anmz-jms

The intersection of partnership and fertility histories among immigrants and their descendants in the United Kingdom: A multistate approach

Julia Mikolai (University of St. Andrews, UK)

We study the interrelationship between partnership and fertility histories of immigrants and their descendants in the UK using data from the UK Household Longitudinal Study. Previous studies have either focused on immigrants’ fertility or their partnership experiences. However, no studies have analysed the intersection of these two interrelated life domains among immigrants and their descendants. Using multistate event history models, we analyse the outcomes of 1) unpartnered women (cohabitation, marriage, or childbirth), 2) cohabiting women (marriage, separation, or childbirth), and 3) married women (separation or childbirth). Our innovative modelling strategy allows us to jointly analyse repeated partnership and fertility transitions and to incorporate duration in the models. We also study how the interrelationship between these processes has changed across birth cohorts. We find distinct patterns by migrant groups. Among unpartnered native and European/Western women, cohabitation is the most likely outcome followed by marriage and childbirth. Unpartnered women from other origins show different patterns. South Asians tend to marry and have low risks of childbirth and cohabitation. Caribbean women are most likely to have a child, followed by cohabitation and marriage. Among cohabiting native and European/Western women, marriage is the most likely outcome followed by separation and childbirth. We find few significant differences between the outcomes of cohabiting women from all other origin groups. Married women in all migrant origin groups are most likely to have a child and much less likely to separate. Separation risks are the lowest among South Asian and the highest among Caribbean women. We find the largest change across birth cohorts among native and European/Western women whereas the same patterns persist among women from all other migrant groups.

Torna alla lista dei seminari archiviati


25/03/2021 - Seminar via Webex

The later life consequences of natural disasters: earthquakes, tsunami, and radioactive fall-out

Marco Cozzani (European University Institute (EUI))

TBA

Torna alla lista dei seminari archiviati


24/03/2021 - Seminar via Webex

The early life consequences of natural disasters: earthquakes and hurricanes

Marco Cozzani (European University Institute (EUI))

TBA

Torna alla lista dei seminari archiviati


11/03/2021 - Seminar via Webex

Air pollution and human health

Risto Conte Keivabu (European University Institute (EUI))

TBA

Torna alla lista dei seminari archiviati


10/03/2021 - Seminar via Webex

Climate change and temperature extremes: why does it matter for humans and especially for the poor?

Risto Conte Keivabu (European University Institute (EUI))

TBA

Torna alla lista dei seminari archiviati


25/02/2021

Female-breadwinner families

Agnese Vitali (Università degli Studi di Trento)

In this seminar I will reflect on the drivers leading couples into female breadwinning, and on (some of) the consequences that female breadwinning can have for couples.
After reviewing theoretical explanations based on the gender revolution, the reversal of the gender gap in education and macro-economic change, I will present results based on data from the Luxembourg Income Study database showing how partners’ relative incomes have changed across four decades for a selection of developed countries.
The seminar will reflect on the importance of distinguishing dual-earner couples in which the woman is the main earner from 'pure' female-breadwinner couples, as these are characterized by essentially different drivers and socio-economic backgrounds. When moving to the study of the consequences of female breadwinning, distinguishing between the two subgroups also appears important: the "female-breadwinning" penalty frequently found in previous studies on a series of outcomes, such as wellbeing, the risk of union dissolution, and time spent on domestic and care work, appears reduced for couples with women as main earners compared to 'pure' female-breadwinner couples.

Video: Recorded seminar   

Torna alla lista dei seminari archiviati


29/05/2020 - The seminar will be held online.

Sustainability, Climate Change and Hazardous events: Statistical Measures, Challenges and Innovations

Angela Ferruzza (ISTAT, Roma)

The frameworks for the Sustainable Development Goals, climate change, and hazardous events will be presented, considering their interactions. Challenges and innovations in the process of building capacity to improve the related statistical measures will be discussed.

Referente: prof.ssa Alessandra Petrucci

Torna alla lista dei seminari archiviati


05/05/2020 - The seminar will be held online.

Statistics for sustainable development

Gianluigi Bovini (Fondazione Unipolis, ASviS)

TBA

Torna alla lista dei seminari archiviati


19/02/2020

From logit to linear regression and back

Giovanni Maria Marchetti (DiSIA)

For binary variables, I show necessary and sufficient conditions for the parameter equivalence of linear- and logit-regression coefficients, discussing consequences for some Ising models.

Torna alla lista dei seminari archiviati


18/02/2020

On Spatial Lag Models estimated using crowdsourcing, web-scraping or other unconventionally collected Big Data

Giuseppe Arbia (Catholic University of the Sacred Heart Milan – Rome)

The Big Data revolution is challenging the state-of-the-art statistical and econometric techniques, not only for the computational burden connected with the high volume and speed with which data are generated, but even more for the variety of sources through which data are collected (Arbia, 2019). This paper concentrates specifically on this last aspect. Common examples of non-traditional Big Data sources are crowdsourcing (data voluntarily collected by individuals) and web scraping (data extracted from websites and reshaped into structured datasets). A characteristic common to these unconventional data collections is the lack of any precise statistical sample design, a situation described in statistics as "convenience sampling". As is well known, under these conditions no probabilistic inference is possible. To overcome this problem, Arbia et al. (2018) proposed the use of a special form of post-stratification (termed "post-sampling"), with which data are manipulated prior to their use in an inferential context. In this paper we generalize this approach, using the same idea to estimate a Spatial Lag Econometric Model (SLM). We start by showing, through a Monte Carlo study, that with data collected without a proper design, parameter estimates can be biased. Secondly, we propose a post-sampling strategy to tackle this problem. We show that the proposed strategy indeed achieves a bias reduction, but at the price of a concomitant increase in the variance of the estimators. We thus suggest an operational MSE-correction strategy. The paper also contains a formal derivation of the increase in variance implied by the post-sampling procedure and concludes with an empirical application of the method to the estimation of a hedonic price model for the city of Milan using web-scraped data.
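A toy sketch of the post-stratification ("post-sampling") reweighting idea on which the paper builds: a convenience sample that oversamples one stratum is reweighted toward known population shares before estimation. The strata, shares, and prices are invented for the example.

```python
# Post-stratification of a convenience sample toward known population shares.
import numpy as np

rng = np.random.default_rng(7)
pop_share = {"center": 0.3, "suburb": 0.7}       # known population shares
# Web-scraped convenience sample oversamples the city center:
strata = rng.choice(["center", "suburb"], size=1000, p=[0.7, 0.3])
price = np.where(strata == "center", 5000, 2500) + rng.normal(0, 300, 1000)

naive = price.mean()
w = np.array([pop_share[s] / (strata == s).mean() for s in strata])
weighted = (w * price).sum() / w.sum()
# True population mean is 0.3 * 5000 + 0.7 * 2500 = 3250.
print(f"naive mean: {naive:.0f}   post-stratified mean: {weighted:.0f}")
```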

Torna alla lista dei seminari archiviati


20/01/2020

Statistical scalability of approximate likelihood inference

Helen Ogden (Mathematical Sciences - University of Southampton)

In cases where it is not possible to evaluate the likelihood function exactly, an alternative is to find a numerical approximation to the likelihood, and then to use this approximate likelihood in place of the true likelihood to do inference about the model parameters. Approximate likelihoods are typically designed to be computationally scalable, but the statistical properties of these methods are often not well understood: fitting the model may be fast, but is the resulting inference any good? I will describe conditions which ensure that approximate likelihood inference retains good statistical properties, and discuss the statistical scalability of inference with an approximate likelihood, in terms of how the cost of conducting statistically valid inference scales as the amount of data increases. I will demonstrate the implications of these results for a particular family of approximations to the likelihood used for inference on an Ising model, and for Laplace approximations to the likelihood used for inference in mixed-effects models.
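As a concrete instance of the kind of approximation at issue, the following Python sketch (my toy example, not the speaker's code) compares a Laplace approximation of one cluster's marginal likelihood in a random-intercept logistic model against a near-exact Gauss-Hermite benchmark:

    import numpy as np
    from scipy.optimize import minimize_scalar

    # One cluster of a random-intercept logistic model: responses y_j,
    # linear predictor beta0 + u with u ~ N(0, sigma2); the cluster's
    # marginal likelihood integrates u out and has no closed form.
    y = np.array([1, 1, 0, 1, 0, 1, 1])
    beta0, sigma2 = 0.3, 1.5

    def loglik(u):
        eta = beta0 + u
        return np.sum(y * eta - np.log1p(np.exp(eta)))

    def h(u):   # log integrand: log-likelihood plus log N(0, sigma2) density
        return loglik(u) - 0.5 * (np.log(2 * np.pi * sigma2) + u ** 2 / sigma2)

    # Laplace approximation: second-order expansion of h around its mode.
    u_hat = minimize_scalar(lambda u: -h(u)).x
    p_hat = 1.0 / (1.0 + np.exp(-(beta0 + u_hat)))
    h2 = -y.size * p_hat * (1.0 - p_hat) - 1.0 / sigma2   # h''(u_hat), negative
    log_laplace = h(u_hat) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(-h2)

    # High-order Gauss-Hermite quadrature as a near-exact benchmark.
    x, w = np.polynomial.hermite.hermgauss(50)
    vals = np.exp([loglik(np.sqrt(2 * sigma2) * xk) for xk in x])
    log_quad = np.log(np.dot(w, vals) / np.sqrt(np.pi))

    print(f"Laplace {log_laplace:.4f} vs quadrature {log_quad:.4f}")

The statistical-scalability question is then how the error of the cheap approximation behaves as the number of clusters, and hence the amount of data, grows.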



15/01/2020

Modern Modeling: Guidelines for Best Practice and Useful Innovations

Todd D. Little (Texas Tech University)

In this talk, I will highlight an approach to modeling interventions that puts a premium on avoiding Type II error. It relies on latent variable structural equation modeling using multiple groups and other innovations in SEM. These innovations, along with modern principled approaches to modeling, allow accurate evaluation of intervention effects. I will highlight the advantages as well as discuss potential pitfalls.



10/12/2019

Multinomial Probit: Probability Simulators and Identification

Alessandro Palandri

The paper introduces four unbiased probability simulators which produce continuous (simulated) log-likelihood functions with almost everywhere continuous derivatives. Identification conditions are derived which show that, when an intercept is present in every latent utility equation, every element of the shocks' variance-covariance matrix must be fixed in order to attain exact identification. Monte Carlo simulations are used to illustrate the unbiasedness and relative efficiency of the proposed simulators and to confirm the predicted under- and exact-identification of representative parameterizations. A previous study on the relation between movie attendance and violent crimes is reconsidered in the multinomial probit framework.
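The paper's four simulators are not reproduced here, but the classic GHK simulator gives a feel for what a smooth, unbiased probability simulator for the multinomial probit looks like; a minimal numpy/scipy sketch (illustrative values throughout):

    import numpy as np
    from scipy.stats import norm

    def ghk(a, Sigma, R=10_000, seed=0):
        # GHK simulator for P(Z < a), Z ~ N(0, Sigma): sample each coordinate
        # from a truncated normal given the previous ones (via the Cholesky
        # factor) and average the products of truncation probabilities.
        # The estimate is unbiased and smooth in the parameters.
        rng = np.random.default_rng(seed)
        L = np.linalg.cholesky(Sigma)
        m = len(a)
        U = rng.random((R, m))
        e = np.zeros((R, m))
        w = np.ones(R)
        for j in range(m):
            ub = (a[j] - e[:, :j] @ L[j, :j]) / L[j, j]   # truncation point
            pj = norm.cdf(ub)
            w *= pj                                       # prob. of staying below
            e[:, j] = norm.ppf(U[:, j] * pj)              # truncated-normal draw
        return w.mean()

    # A 3-alternative probit choice probability reduces to an orthant
    # probability of the two utility differences; toy covariance below.
    Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
    print(ghk(np.array([0.2, -0.1]), Sigma))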



03/12/2019

Learning the two parameters of the Poisson Dirichlet distribution with forensic applications

Giulia Cereda (University of Leiden, Netherlands)

The "rare type match problem" is the situation in which the suspect's DNA profile, matching the DNA profile of the crime stain, is not in the database of reference. The evaluation of this match in the light of the two competing hypotheses (the crime stain has been left by the suspect or by another person) is based on the calculation of the likelihood ratio and depends on the population proportions of the DNA profiles, that are unknown. We propose a Bayesian nonparametric method that uses a two-parameter Poisson Dirichlet distribution as a prior over the ranked population proportions and discards the information about the names of the different DNA profiles. The Poisson Dirichlet seems to be appropriate for data coming from European Y-STR DNA profiles, but the likelihood ratio turned out to depend on the posterior of the two parameters, treated as random variables, given observed data. Inference on the Poisson Dirichlet parameters has not received much attention in Bayesian nonparametric literature. In the previous work, we have explored the use of MLE estimators for the two parameters of Poisson Dirichlet distribution, and brutally plug them into the LR evaluation. Now, we are investigating the approximation of the posterior of the Poisson Dirichlet parameters via MCMC and ABC methods. The three approaches are discussed and compared, both in controlled situations where the true parameters are known. Applications to real caseworks are also proposed.



02/12/2019

Machine learning methods for estimating the employment status in Italy

Roberta Varriale (ISTAT)

In recent decades, National Statistical Institutes have focused on producing official statistics by exploiting multiple sources of information (multi-source statistics) rather than a single source, usually a statistical survey. The growing importance of producing multi-source statistics in official statistics has led to increasing investments in research activities in this sector. In this context, one of the research projects addressed by the Italian National Statistical Institute (Istat) concerned the study of methodologies for producing estimates of employment rates in Italy through the use of multiple sources of information: survey data and administrative sources. The data come from the Labour Force (LF) survey conducted by Istat and from several administrative sources that Istat regularly acquires from external bodies. The "quantity" of information is very different: the administrative sources cover about 25 million individuals, while the LF survey refers to an extremely limited number (about 330,000) of individuals. The two measures disagree on employment status for about 6% of the units in the LF survey. One proposed approach uses a Hidden Markov (HM) model to take into account the deficiencies in the measurement process of both survey and administrative sources. The model describes a measurement process as a function of a time-varying latent state (in this case the employment category), whose dynamics are described by a Markov chain defined over a discrete set of states. At present, the implementation phase for the production of employment statistics through the use of HM models is coming to an end at Istat. The present work describes the use of machine learning methods to predict individual employment status. This approach is based on the application of decision tree and random forest models, predictive models commonly used to classify instances in large amounts of data. The results obtained will be described, together with their usefulness in this application context. The models were implemented in the R software.
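To give a flavour of the classification step (the talk used R; this equivalent sketch uses Python's scikit-learn, and every feature, target and threshold is a hypothetical stand-in for the linked survey-administrative data):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(42)
    n = 5000

    # Hypothetical administrative signals per individual.
    X = np.column_stack([
        rng.integers(0, 13, n),     # months with payroll contributions
        rng.integers(0, 2, n),      # self-employment register flag
        rng.integers(18, 75, n),    # age
    ])

    # Hypothetical target: LFS employment status (1 = employed), with ~6%
    # noise mimicking survey/administrative disagreement.
    y = (X[:, 0] > 3) | (X[:, 1] == 1)
    y = np.where(rng.random(n) < 0.06, ~y, y).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print(classification_report(y_te, rf.predict(X_te)))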



28/11/2019

China's Age of Abundance: An Application of the National Transfer Accounts (NTA) Method

WANG Feng (University of California, Irvine, USA, and Fudan University, China)

In the past 15 years or so, the National Transfer Accounts (NTA) approach has been extended to over 90 countries in the world as an analytical tool to examine intergenerational economic transfers along with population changes. Using China as an example, this lecture briefly introduces the methodology and the data sources required, and examines changes in the following four areas: 1) changing income and consumption age profiles along with rapid economic change; 2) aggregate life-cycle deficits (surpluses) and their projections; 3) inequalities in public transfers over time; and 4) projected fiscal burdens due to population aging and welfare program expansion. The talk serves as an overview of China's historical transformations as seen from the NTA-based analytical approach, and as a plan to explore areas for future studies and global comparisons.



14/11/2019

Composite likelihood inference for simultaneous clustering and dimensionality reduction of mixed-type longitudinal data

Monia Ranalli (Sapienza Università di Roma)

This talk introduces a multivariate hidden Markov model (HMM) for mixed-type (continuous and ordinal) variables. As some of the considered variables may not contribute to the clustering structure, a hidden Markov-based model is built such that discriminative and noise dimensions can be recognized. The variables are considered to be linear combinations of two independent sets of latent factors: one contains the information about the cluster structure and follows an HMM, while the other contains noise dimensions distributed as a multivariate normal (and does not change over time). The resulting model is parsimonious, but its computational burden may be cumbersome. To overcome these computational issues, a composite likelihood approach is introduced to estimate the model parameters. The model is applied to a real dataset derived from the first five waves of the Chinese Longitudinal Healthy Longevity Survey. The model is able to identify the discriminative variables and to parsimoniously capture a cluster structure that changes over time.
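The core of any composite likelihood approach is replacing an intractable joint density with a sum of tractable low-dimensional pieces. A minimal Python sketch of a pairwise log-likelihood for a multivariate normal (a generic illustration, far simpler than the mixed-type HMM of the talk):

    import numpy as np
    from itertools import combinations
    from scipy.stats import multivariate_normal

    def pairwise_loglik(data, mu, Sigma):
        # Composite (pairwise) log-likelihood: sum bivariate normal
        # log-densities over all variable pairs instead of evaluating
        # the full p-dimensional joint density.
        ll = 0.0
        for i, j in combinations(range(data.shape[1]), 2):
            sub = Sigma[np.ix_([i, j], [i, j])]
            ll += multivariate_normal(mu[[i, j]], sub).logpdf(data[:, [i, j]]).sum()
        return ll

    rng = np.random.default_rng(0)
    mu, Sigma = np.zeros(4), np.eye(4)
    data = rng.multivariate_normal(mu, Sigma, size=500)
    print(pairwise_loglik(data, mu, Sigma))

Maximizing such a surrogate trades some statistical efficiency for a computation that grows with the number of pairs rather than with the full joint dimension.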



07/11/2019

Automating the Process of Information Quality Assessment

Davide Ceolin (CWI and Vrije Universiteit Amsterdam)

Assessing the quality of online information is a challenging yet crucial task to help users deal with information overload. Experts of diverse backgrounds are often good examples of reliable assessors, but they are a costly and limited resource. In this talk, I will present some advances on scaling up the assessment process, involving the use of crowdsourcing and the definition of trust-related metrics. I will evaluate these methods on a corpus of online documents regarding the vaccination debate.



04/11/2019

Living with Intelligent Machines

Nello Cristianini (University of Bristol)

Nello Cristianini is Professor of Artificial Intelligence (AI) at the University of Bristol. He is the co-author of two widely known books in machine learning, An Introduction to Support Vector Machines and Kernel Methods for Pattern Analysis, as well as a book in bioinformatics, Introduction to Computational Genomics. He is also a recipient of the Royal Society Wolfson Research Merit Award and a current holder of a European Research Council Advanced Grant. He currently works on the social and ethical implications of AI.



28/10/2019

Current Challenges in the Analysis of Brain Signals

Hernando Ombao (KAUST)

Advances in imaging technology have given neuroscientists unprecedented access to examine various facets of how the brain "works". Brain activity is complex. A full understanding of brain activity requires careful study of its multi-scale spatial-temporal organization (from neurons to regions of interest; and from transient events to long-term temporal dynamics). In this talk, the focus will be on modeling connectivity between the many channels of a network of electroencephalogram (EEG) recordings. There are many challenges to analyzing brain connectivity. First, there is no unique measure that can completely characterize dependence between EEG channels in a network. Second, brain data are massive: recordings across many locations and over long recording times. Third, the data have a complex structure with non-stationary properties that evolve over space and time. These challenges present big opportunities for data scientists to develop new tools and models addressing current research in the neuroscience community.



22/10/2019

Gender Bias in Academic Promotions, Myth or Reality? Evidence from a Factorial Survey Experiment

Nevena Kulic (European University Institute)

Despite the relatively equal share of women and men at the PhD level, women's representation decreases dramatically at higher stages of the academic career. In this article, we study whether and how demand-side processes contribute to this decrease. We examine, in particular, bias occurring due to differences in the evaluation of the network ties of men and women. The following questions are asked: (1) Is there a gender bias in hiring/promotion due to different evaluation of the networks of men and women in academia? (2) If yes, what is the role of the strength of ties therein? (3) Does the status of a candidate's tie matter in hiring/promotion and, if yes, is the effect different for men and women? A factorial survey experiment on the hiring/promotion of full professors was conducted in Italy in three broad disciplines: humanities (area 11), economics and statistics (area 13), and social and political sciences (area 14). In this experiment, respondents evaluate a randomly assigned set of candidate profiles, which vary randomly in gender and type of network ties. We hypothesized that women would be evaluated more positively than men if they rely on inter-connected contacts in closed networks and borrow social capital from a high-status tie. On the contrary, we expected that men would be evaluated better than women when relying on open external networks, and would be less penalized than women for not relying on high-status contacts. Results indicate that our hypotheses are partly confirmed: gender bias is found in the evaluation of external ties in competence evaluation, with women tending to be evaluated less well when their ties are external (weak), and the bias originates in the sample of male respondents. There is, however, no confirmation of a gendered effect of the status of ties. Our article has important policy relevance and helps unravel the demand-side mechanisms of gender inequality in academia. A better understanding of these processes would indicate whether and how academic institutions could adjust their procedures in order to bring more parity and prevent the loss of human capital due to the lower presence of highly qualified women.



23/09/2019

Identification of good educational practices in schools with high added value using Data Mining Techniques

Fernando Martínez Abad (University of Salamanca)

The main objective of this research project, funded by the BBVA Foundation, was the identification of factors associated with performance in schools with high added value, for the production of a catalogue of good educational practices and its dissemination to the educational community. Based on the results of the Spanish sample from the PISA 2015 assessment, this objective was pursued in three stages: 1) Application of hierarchical linear models for identifying schools with high and low effectiveness: to quantify the effectiveness of each school, we isolated the average performance of schools not attributable to the effect of contextual variables in order to obtain an average residual. We then selected and characterized schools whose residual average performance is systematically maintained at higher (high effectiveness) or lower (low effectiveness) levels; 2) Application of data mining techniques for identifying non-contextual factors associated with high and low effectiveness: we applied decision trees, which generate classifications based on the schools' scores on the explanatory variables, taking the given criterion (schools considered to have high or low effectiveness) as a reference; 3) Design of a catalogue of good educational practices: this stage was developed with the aim of disseminating the results to the different groups that may be interested in the project.
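Stage 1 can be sketched in a few lines. This Python/statsmodels toy (hypothetical PISA-like variable names and values; the project used its own specification) fits a hierarchical linear model and reads school added value off the context-adjusted random intercepts:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n_sch, n_stu = 100, 30
    school = np.repeat(np.arange(n_sch), n_stu)
    added_value = rng.normal(0, 10, n_sch)        # true school effect (unknown)
    escs = rng.normal(size=n_sch * n_stu)         # socio-economic context index
    score = 500 + 25 * escs + added_value[school] + rng.normal(0, 50, n_sch * n_stu)
    df = pd.DataFrame({"score": score, "escs": escs, "school": school})

    # Hierarchical linear model: school random intercepts are the residual,
    # context-adjusted performance used to flag high/low added value.
    fit = smf.mixedlm("score ~ escs", df, groups="school").fit()
    ranef = pd.Series({g: re.iloc[0] for g, re in fit.random_effects.items()})
    print("candidate high added-value schools:", list(ranef.nlargest(5).index))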



27/06/2019

Dimensionality reduction via the identification of the data's intrinsic dimensions

Antonietta Mira (Università della Svizzera Italiana)

Even if they are defined on a space with a large dimension, data points usually lie on a hypersurface, or manifold, with a much smaller intrinsic dimension (ID). The recent TWO-NN method (Facco et al., 2017, Scientific Reports) allows estimating the ID when all points lie on a single manifold. TWO-NN makes only a fairly weak assumption: that the density of points is approximately constant in a small neighborhood around each point. Under this hypothesis, the ratio of the distances of a point from its first and second neighbour follows a Pareto distribution that depends parametrically only on the ID, allowing for an immediate estimation of the latter. We extend the TWO-NN model to the case in which the data lie on several manifolds with different IDs. While the idea behind the extension is simple (the Pareto distribution is just replaced by a mixture of K Pareto distributions), a non-trivial Bayesian scheme is required for correctly estimating the model and assigning each point to the correct manifold. Applying this method, which we dub Hidalgo (heterogeneous intrinsic dimension algorithm), we uncover a surprising ID variability in several real-world datasets such as fMRI, protein folding, financial, gene expression and basketball data. The Hidalgo model obtains remarkable results, but its limitation consists in fixing a priori the number of components in the mixture. To adopt a fully Bayesian approach, a possible extension would be the specification of a prior distribution for the parameter K. Instead, with even greater flexibility, we let K go to infinity, using a Bayesian nonparametric approach and modeling the data as an infinite mixture of Pareto distributions. At the same time, this approach takes into account the uncertainty about the number of mixture components.
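The single-manifold TWO-NN estimator itself fits in a few lines; a minimal Python sketch (mine, directly following the Pareto argument above: with mu_i = r2/r1 distributed as Pareto with shape equal to the ID d, the maximum likelihood estimate is N divided by the sum of log mu_i):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def two_nn_id(X):
        # Distances to the two nearest neighbours (column 0 is the point itself).
        dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
        mu = dist[:, 2] / dist[:, 1]          # ratio second/first neighbour
        return len(X) / np.sum(np.log(mu))    # Pareto MLE of the intrinsic dim.

    # A 3-dimensional cloud embedded linearly in 10 ambient dimensions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 3)) @ rng.normal(size=(3, 10))
    print(two_nn_id(X))                       # close to 3

Hidalgo replaces the single Pareto law with a mixture of K Pareto distributions, and the nonparametric extension discussed in the talk lets K grow unboundedly.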



26/06/2019

B. Arpino: Challenges of causal inference in demographic observational studies
R. Guetto: The growth of mixed unions in Italy: a marker of immigrant integration and societal openness?

Welcome seminar: Bruno Arpino & Raffaele Guetto

B. Arpino: I will start the seminar by summarising my research interests and academic trajectory. Then I will present the following work-in-progress study, which combines my main methodological and applied research interests:
Demographers are often confronted with the goal of establishing a causal link between demographic events (e.g., fertility, union formation and dissolution) and socio-economic, health and other types of outcomes. Since experiments are commonly not a feasible strategy, demographic research often relies on observational studies. Not being able to manipulate the treatment assignment, demographers have to deal with several issues, such as omitted variable bias and reverse causality. The aims of this paper are to review the methods commonly used by demographers to estimate causal effects in observational studies and to discuss strengths and limitations of these methods. Motivated by the estimation of the causal effect of grandparental childcare on health and using simulations mimicking the Survey of Health and Retirement in Europe (SHARE), I will compare propensity score matching, fixed effects and instrumental variables regression. The goal of these simulations is to highlight the consequences of violations of assumptions underlying each method depending also on different types of data available to the researcher.
I will conclude the seminar with my plans for future research in the short run.

R. Guetto: I will start the seminar by presenting my academic background, my research interests and my plans for future research. I will then present an overview of the results of the research on native-immigrant unions and immigrant socioeconomic integration that I have carried out over the last years.
Mixed unions, i.e. unions between natives and immigrants, which in Italy primarily involve Italian men partnered with foreign women, are commonly understood as the height of immigrant integration and societal openness. However, consistent with the status exchange theory, such unions are more likely when less-educated, older native men marry better educated, younger immigrant women, especially when the latter originate from non-Western countries. I highlight the existence of a multiplicity of factors underlying such mating patterns. From the standpoint of the foreign partner, I discuss the relevance of immigrants’ economic circumstances and provide causal evidence on the role played by the possibility of obtaining Italian/EU citizenship through marriage. The analysis of the Italian partner’s perspective points to the increasing crowding out of low-educated men on the native marriage market. Cultural factors also need to be considered, as foreign women are usually more compliant with traditional gender roles. Overall, the results reject a simplistic interpretation of the growth of mixed unions as an indicator of increased immigrant integration and societal openness.



14/06/2019

A causal inference approach to evaluate the health impacts of air quality regulations: The health benefits of the 1990 Clean Air Act Amendments

Rachel Nethery (Harvard University)

In evaluating the health effects of previously implemented air quality regulations, the US Environmental Protection Agency first predicts what air pollution levels would have been in a given year under the counterfactual scenario of no regulation and then inserts these predictions into a health impact function to predict the corresponding counterfactual number of various types of health events (e.g., death, cardiovascular events). These predictions are then compared to the number of health events predicted for the observed pollutant levels under the regulations. This procedure is generally carried out for each pollutant separately. In this paper, we develop a causal inference framework to estimate the number of health events prevented by air quality regulations via the resulting changes in exposure to multiple pollutants simultaneously. We introduce a causal estimand called the Total Events Avoided (TEA), and we propose both a matching method and a Bayesian machine learning method for estimation. In simulations, we find that both the matching and machine learning methods perform favorably in comparison to standard parametric approaches, and we evaluate the impacts of tuning parameter specifications. To our knowledge, this is the first attempt to perform causal inference in the presence of multiple continuous exposures. We apply these methods to investigate the health impacts of the 1990 Clean Air Act Amendments (CAAA). In particular, we seek to answer the question "How many mortality events, cardiovascular hospitalizations, and dementia-related hospitalizations were avoided in the Medicare population in 2001 thanks to CAAA-attributable changes in pollution exposures in the year 2000?". For each zipcode in the US, we have obtained (1) pollutant exposure levels with the CAAA in place in 2000, (2) the observed count of each health event in the Medicare population in 2001, and (3) estimated counterfactual pollutant exposure levels under a no-CAAA scenario in 2000. Without relying on modeling assumptions, our matching and machine learning methods use confounder-adjusted relationships between observed pollution exposures and health outcomes to inform estimation of the number of health events that would have occurred in the same population under the counterfactual, no-CAAA pollution exposure levels. The TEA is computed as the difference between the estimated no-CAAA counterfactual event count and the observed event count. This approach could be used to analyze any regulation, any set of pollutants, and any health outcome for which data are available. This framework improves on the current regulatory evaluation protocol in the following ways: (1) the causal inference approach clarifies the question under study, the statistical quantity being estimated, and the assumptions of the methods; (2) statistical models and resulting estimates are built on real health outcome data; (3) the results do not rely on dubious parametric assumptions; and (4) all pollutants are evaluated concurrently so that any synergistic effects are accounted for.
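The core logic of the matching estimator can be caricatured in a few lines of Python (a deliberately simplified, confounding-free toy with made-up numbers, not the paper's adjusted and tuned procedure): each unit's counterfactual exposure vector is matched to the unit with the closest observed exposure, whose observed event count then serves as the predicted counterfactual count.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    n = 3000   # hypothetical zipcodes

    expo_obs = rng.normal(8.0, 1.0, (n, 2))              # exposures under CAAA
    expo_cf = expo_obs + rng.normal(1.5, 0.3, (n, 2))    # no-CAAA air is dirtier
    events = rng.poisson(np.exp(0.1 * expo_obs.sum(axis=1)))  # observed counts

    # Borrow the observed event count of the unit whose observed exposure
    # is closest to each unit's counterfactual exposure.
    nn = NearestNeighbors(n_neighbors=1).fit(expo_obs)
    _, idx = nn.kneighbors(expo_cf)
    events_cf = events[idx[:, 0]]

    tea = events_cf.sum() - events.sum()                 # Total Events Avoided
    print("estimated TEA:", tea)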



07/06/2019

Sustainable Development Goals, Climate Change and Hazardous Events: statistical measures, challenges and innovations

Angela Ferruzza (ISTAT)

The frameworks for the Sustainable Development Goals, climate change and hazardous events will be presented, considering their interactions. Challenges and innovations in the process of building statistical capacity to improve these measures will be considered. The 2019 SDGs Istat National Report will be presented.



06/06/2019

Causal inference methods in environmental studies: challenges and opportunities

Francesca Dominici (Harvard University)

What if I told you I had evidence of a serious threat to American national security: a terrorist attack in which a jumbo jet will be hijacked and crashed every 12 days. Thousands will continue to die unless we act now. This is the question before us today, but the threat doesn't come from terrorists. The threat comes from climate change and air pollution. We have developed an artificial neural network model that uses on-the-ground air-monitoring data and satellite-based measurements to estimate daily pollution levels across the continental U.S., breaking the country up into 1-square-kilometer zones. We have paired that information with health data contained in Medicare claims records from the last 12 years, and for 97% of the population ages 65 or older. We have developed statistical methods and computationally efficient algorithms for the analysis of over 460 million health records. Our research shows that short- and long-term exposure to air pollution is killing thousands of senior citizens each year. This data science platform is telling us that federal limits on the nation's most widespread air pollutants are not stringent enough. This type of data is the sign of a new era for the role of data science in public health, and also for the associated methodological challenges. For example, with enormous amounts of data, the threat of unmeasured confounding bias is amplified, and causality is even harder to assess with observational studies. These and other challenges will be discussed.



03/06/2019

Efficient Data Processing in High Performance Big Data Platforms

Nicola Tonellotto (ISTI CNR)

Abstract: Web search engines are among the largest and most heavily used big data platforms today. With the ever-growing amount of data produced daily, all Web search companies must rely on distributed data storage and processing mechanisms hosted on computer clusters composed of thousands of processors and provided by large data centers. This distributed infrastructure makes it possible to execute a vast amount of complex data processing that provides effective insights for users in near real time, with sub-second response times. Moreover, the most recent scientific advances in big data analytics exploit machine learning and artificial intelligence solutions. These solutions are particularly computationally expensive, and their energy consumption has a great impact on overall energy consumption at a global scale. In this seminar we will discuss some recent investigations into (i) novel efficient algorithmic solutions to improve the usage of hardware resources (e.g., reducing response times and increasing throughput) when deploying complex machine-learned models to process large data collections, and (ii) online management of computational load to reduce energy consumption by automatically switching among the available CPU frequencies to adapt to external operational conditions.
Short bio: Dr. Nicola Tonellotto (http://hpc.isti.cnr.it/~khast/) is a researcher within the High Performance Computing Lab at the Information Science and Technologies Institute of the National Research Council of Italy. His main research interests include high performance big data platforms and information retrieval, focusing on efficiency aspects of query processing and resource management. Nicola has co-authored more than 60 papers on these topics in peer-reviewed international journals and conferences. He lectures on Computer Architectures for BSc students and Distributed Enabling Platforms for MSc students at the University of Pisa. He was co-recipient of the ACM SIGIR 2015 Best Paper Award for the paper entitled "QuickScorer: a Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees".



30/05/2019

Alternatives to Aging Alone?: "Kinlessness" and the Potential Importance of Friends

Christine Mair (University of Maryland)

Increasing numbers of older adults cross-nationally are without children or partners in later life and may have greater reliance on non-kin (e.g., friends), although these patterns likely vary by country context. This paper hypothesizes that those without traditional kin and who live in countries with a stronger emphasis on friendship will have more friends. While these hypothesized patterns are consistent with interdisciplinary literatures, they have not been tested empirically and remain overlooked in current narratives on "aging alone." This study combines individual-level data from the Survey of Health, Ageing, and Retirement in Europe (SHARE, Wave 6) with aggregate nation-level data to estimate multilevel negative binomial models exploring number of friends among those aged 50+ across 17 countries. Those who lack kin report more friends, particularly in countries with a higher percentage of people who believe that friends are "very important" in life. This paper challenges dominating assumptions about "aging alone" that rely on lack of family as an indicator of "alone." Future studies should investigate how friendship is correlated with lack of kin, particularly in wealthier nations. Previous research may have overestimated risk in wealthier nations, but underestimated risk in less wealthy nations and/or more family-centered nations.



28/05/2019

S. Bacci: Developments in the context of Item Response Theory models
A. Magrini: Linear Markovian models with distributed-lags to assess the economic impact of investments

Welcome seminar: Silvia Bacci & Alessandro Magrini

S. Bacci: Latent variable models are a wide family of statistical models based on the use of unobservable (i.e., latent) variables for multiple aims, such as measuring unobservable traits, accounting for measurement errors, and representing the unobserved heterogeneity that arises with complex data structures (e.g., multilevel and longitudinal data). The class of latent variable models includes the Item Response Theory (IRT) models, which are adopted to measure latent traits when individual responses to a set of categorical items are available. In such a context, the multidimensional Latent Class IRT (LC-IRT) models extend traditional IRT models to allow for multidimensionality (i.e., the presence of multiple latent traits) and discreteness of the latent traits, which allows us to cluster individuals into unobserved homogeneous groups (latent classes). The general formulation of this class of models is illustrated, with details concerning model specification, estimation, and model selection. Furthermore, some useful extensions to deal with real data are discussed: (i) the introduction of individual covariates, (ii) a model formulation that accounts for multilevel data structures, and (iii) a model formulation that deals with missing item responses. The estimation of the models at issue may be accomplished through two specific R packages. Finally, some applications in the educational setting are illustrated.

A. Magrini: Linear regression with temporally delayed covariates (distributed-lag linear regression) is a standard approach to assess the impact of investments on economic outcomes through time. Typically, constraints on the lag shapes are required to meet domain knowledge: for instance, the effect of an investment may be small at first, then reach a peak before diminishing to zero after some time lags. Polynomial lag shapes with endpoint constraints are typically exploited to represent such a feature, as in the sketch below. A deeper analysis is directed towards the decomposition of the 'overall' impact into intermediate contributions and considers multiple outcomes. Linear Markovian models are proposed for this task, where a set of distributed-lag linear regressions is recursively defined according to causal assumptions of the problem domain. The theory and the software are presented, several real-world applications are illustrated, and open issues are discussed.
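One standard way to encode such a hump-shaped lag pattern (notation mine, not necessarily the talk's) is a low-order polynomial (Almon) lag with a right-endpoint constraint:

\begin{eqnarray*}
y_t=\alpha+\sum_{l=0}^{L}\beta_l\,x_{t-l}+\varepsilon_t,
\qquad
\beta_l=\sum_{j=0}^{q}\theta_j\,l^{\,j},
\qquad
\beta_L=0,
\end{eqnarray*}

which forces the effect to die out after $L$ lags while leaving only the few polynomial coefficients $\theta_j$ (minus one absorbed by the endpoint constraint) to estimate.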



23/05/2019

Couples' transition to parenthood in Finland: A tale of two recessions

Chiara Comolli (University of Lausanne)

The question of how fluctuations in the business cycle and fertility are linked resurfaced in the aftermath of the Great Recession of 2008-09, when birth rates started declining in many countries. Finland, although affected to a much lesser extent than other regions of Europe, is no exception to this decline. However, previous macro-level research on the much stronger recession in Finland in the 1990s shows that, contrary to other developed countries, the typical pro-cyclical behavior of fertility in relation to the business cycle was absent. The objective of this paper is to test how a typical feature of both recessions at the individual level, labor market uncertainty, is linked to childbearing risk in Finland. In particular, I focus on the transition to first birth and on the explicit comparison between the 1990s and the 2000s. I use Finnish population registers (1988-2013) and adopt a dyadic couple perspective to assess the association between each partner's employment status and the transition to parenthood. Finally, I investigate how, differently in the two periods, the latter relationship changes depending on aggregate labor market conditions, to test whether there was a change over time from counter- to pro-cyclicality of fertility in Finland.



13/05/2019

Simple Structure Detection Through Bayesian Exploratory Multidimensional IRT Models

Lara Fontanella (Università di Chieti-Pescara)

In modern validity theory, a major concern is the construct validity of a test, which is commonly assessed through confirmatory or exploratory factor analysis. In the framework of Bayesian exploratory Multidimensional Item Response Theory (MIRT) models, we discuss two methods aimed at investigating the underlying structure of a test, in order to verify whether the latent model adheres to a chosen simple factorial structure. This purpose is achieved without imposing hard constraints on the discrimination parameter matrix to address the rotational indeterminacy. The first approach prescribes a two-step procedure: the parameter estimates are obtained through an unconstrained MCMC sampler, and the simple structure is then inspected with a post-processing step based on the Consensus Simple Target Rotation technique. In the second approach, both rotational invariance and simple structure retrieval are addressed within the MCMC sampling scheme, by introducing a sparsity-inducing prior on the discrimination parameters. Through simulation as well as real-world studies, we demonstrate that the proposed methods are able to correctly infer the underlying sparse structure and to retrieve interpretable solutions.



08/05/2019

Incorporating information and assumptions to estimate validity of case identification in healthcare databases and to address bias from measurement error in observational studies: a research plan to su

Rosa Gini (Agenzia Regionale di Sanità della Toscana)

Observational studies based on healthcare databases have become increasingly common. While methods to address confounding have been tailored to the nature of these data, methods to address bias from measurement error are less developed. However, it is acknowledged that study variables are not measured exactly in databases, since data are collected for purposes other than research, and the variables are measured based on recorded information only. Estimating indices of the validity of a measurement, such as sensitivity, specificity, and positive and negative predictive value, is commonly recommended but rarely accomplished. In this talk we introduce a methodology for measuring variables, called the component strategy, that has the potential to address this problem when multiple sources of data are available. We illustrate the examples of a chronic disease (type 2 diabetes), an acute disease (acute myocardial infarction) and an infectious disease (pertussis) measured in multiple European databases, and we describe the effect of different measurements. We introduce some formulas that allow one to analytically obtain some validity indices from other validity indices and observed frequencies. We sketch a research plan based on this strategy that aims at understanding when partial information on validity can be exploited to provide a full picture of measurement error, at quantifying the dependence on assumptions, at measuring the associated uncertainty, and at addressing the bias produced by quantified measurement error. We introduce the ConcePTION project, a 5-year project funded by the Innovative Medicines Initiative, which aims at building an ecosystem for better monitoring the safety of medicine use in pregnancy and breastfeeding, based on procedures and tools to transform existing data into actionable evidence, resulting in better and timely information. The aim of the talk is to discuss the research plan and foster a collaboration that may, in particular, benefit the ConcePTION project.



04/04/2019

Statistical Learning with High-dimensional Structurally Dependent Data

Tapabrata Maiti (Michigan State University)

The rapid development of information technology is making it possible to collect massive amounts of high-dimensional, multimodal data in diverse fields of science and engineering. New statistical learning and data mining methods have been developed accordingly to solve challenging problems arising from these complex systems. In this talk, we will discuss a specific type of statistical learning, namely the problem of feature selection and classification when the features are high-dimensional and structured, specifically spatio-temporal in nature. Various machine learning techniques are suitable for this type of problem, although the underlying statistical theory is not well established. We will discuss some recently developed techniques in the context of specific examples arising in neuroimaging studies.



18/03/2019

Analysis and automatic detection of hate speech: from pre-teen Cyberbullying on WhatsApp to Islamophobic discourse on Twitter

Rachele Sprugnoli

The widespread use of social media yields a huge number of interactions on the Web. Unfortunately, social media messages are often written to attack specific groups of users based on their religion, ethnicity or social status. Due to the massive rise of hateful, abusive and offensive messages, platforms such as Twitter and Facebook have been searching for solutions to tackle hate speech. As a consequence, the amount of research targeting the detection of hate speech, abusive language and cyberbullying has also increased. In this talk we will present two projects in the field of hate speech analysis and detection: CREEP, which aims at identifying and preventing the possible negative impacts of cyberbullying on young people, and HateMeter, whose goal is to increase the efficiency and effectiveness of NGOs in preventing and tackling Islamophobia at the EU level. In particular, we will describe the language resources and technologies under development in the two projects, and we will show two demos based on Natural Language Processing tools.



05/03/2019

Using marginal models for structure learning

Sung-Ho Kim (Dept of Mathematical Sciences, KAIST)

Structure learning for Bayesian networks is usually carried out in a heuristic mode, searching for an optimal model while avoiding an explosive computational burden. A structural error occurring at one point of structure learning may deteriorate the subsequent learning. In the talk, a remedial approach to this error-propagating process will be introduced, based on marginal model structures. The remedy is made by fixing local errors in the structure with reference to the marginal structures; in this sense, the remedy is called a marginally corrective procedure. A new score function is also introduced for the procedure, which consists of two components: the likelihood function of a model and a discrepancy measure on marginal structures. The marginally corrective procedure compares favorably with one of the most popular algorithms in experiments with benchmark data sets.



26/02/2019

Evidence of bias in randomized clinical trials of hepatitis C interferon therapies

Massimo Attanasio (Università degli Studi di Palermo)

Introduction: Bias may occur in randomized clinical trials in favor of the new experimental treatment because of unblinded assessment of subjective endpoints or wish bias. Using results from published trials, we analyzed and compared the treatment effect of hepatitis C antiviral interferon therapies labeled experimental or control. Methods: Meta-regression of trials enrolling naive hepatitis C virus patients who underwent four therapies including interferon alone or plus ribavirin over the past years. The outcome measure was the sustained response evaluated by transaminases and/or hepatitis C virus-RNA serum load. Data on the outcome across therapies were collected according to the assigned arm (experimental or control) and to other trial- and patient-level characteristics. Results: The overall difference in efficacy between the same treatment labeled experimental or control had a mean of +11.9% (p < 0.0001). The unadjusted difference favored the experimental therapies of group IFN-1 (+6%) and group IFN-3 (+10%), while there was no difference for group IFN-2 because of success rates from large multinational trials. In a meta-regression model with trial-specific random effects including several trial- and patient-level variables, treatment and arm type remained significant (p < 0.0001 and p = 0.0009 respectively), in addition to drug-schedule-related variables. Conclusion: Our study indicates that the same treatment is more effective when labeled "experimental" than when labeled "control" in a setting of trials using an objective endpoint, even after adjusting for patient and study-level characteristics. We discuss several factors related to the design and conduct of hepatitis C trials as potential explanations of the bias toward the experimental treatment.



14/02/2019

A new aspect of Riordan arrays - II

Gi-Sang Cheon (Department of Mathematics, Sungkyunkwan University (Korea))

Let $(S,\star)$ be a semigroup. The semigroup algebra ${\mathbb K}[S]$ of $S$ over a field ${\mathbb K}$ is the set of all linear combinations of finitely many elements of $S$ with coefficients in ${\mathbb K}$: \begin{eqnarray*} {\mathbb K}[S]=\left\{\sum_{\alpha\in S}c_\alpha \alpha \,\middle|\, c_\alpha\in {\mathbb K}\right\}. \end{eqnarray*} The ring of formal power series ${\mathbb K}[[t]]$ over a field ${\mathbb K}$, together with convolution, is an example of a semigroup. This suggests that Riordan arrays over ${\mathbb K}[[t]]$ can be generalized to a semigroup algebra ${\mathbb K}[S]$. Furthermore, using the fact that a lattice is both a partially ordered set and a semigroup, the notion of {\it semi-Riordan arrays} over a semigroup algebra will be introduced in connection with lattices and posets. We will then see that a Riordan array is the semi-Riordan array over the semigroup algebra ${\mathbb K}[S]$ where $S=\{0,1,2,\ldots\}$, with usual addition, is a totally ordered semigroup.



14/02/2019

A new aspect of Riordan arrays - I

Gi-Sang Cheon (Department of Mathematics, Sungkyunkwan University (Korea))

Let $R[[t]]$ be the ring of formal power series over a ring $R$. A Riordan array $(g,f)$ is an infinite lower triangular matrix constructed out of two functions $g,f\in R[[t]]$ with $f(0) = 0$ in such a way that its $k$th column generating function is $gf^k$ for $k\ge0$. The set of all invertible Riordan arrays forms a group called the {\it Riordan group}. In many contexts we see that the Riordan arrays are used as a machine to generate new approaches in combinatorics and its applications. Throughout this talk we will see the Riordan group and Riordan arrays from the several different points of view, e.g. group theory, combinatorics, graph theory, matrix theory, topology and Lie theory. In addition, we will see how Riordan arrays have been generalized and where they have been applied.
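A standard example (textbook material, not specific to the talk) makes the definition concrete: taking $g=1/(1-t)$ and $f=t/(1-t)$, the $k$th column generating function is $gf^k=t^k/(1-t)^{k+1}=\sum_{n\ge k}\binom{n}{k}t^n$, so the resulting Riordan array is Pascal's triangle:

\begin{eqnarray*}
\left(\frac{1}{1-t},\,\frac{t}{1-t}\right)=
\begin{pmatrix}
1 & & & \\
1 & 1 & & \\
1 & 2 & 1 & \\
1 & 3 & 3 & 1 \\
\vdots & & & \ddots
\end{pmatrix}.
\end{eqnarray*}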



13/02/2019

Statistica, nuovo empirismo e società nell’era dei Big Data

Giuseppe Arbia (Univ. Cattolica Sacro Cuore Roma, Univ. Svizzera Italiana, College of William & Mary di Williamsburg)

Recent decades have seen a formidable explosion in the collection of data and in their dissemination and use in all sectors of human society. This phenomenon is mainly due to the increased ability to collect and store information automatically through extremely diverse sources such as sensors of various kinds, satellites, mobile phones, the internet, drones and many others. This is the phenomenon known as the "big data revolution".
The aim of the seminar is to describe, in terms accessible also to non-specialists, the big data phenomenon and its possible repercussions on everyday life. The consequences for Statistics, the art of understanding reality and making decisions on the basis of empirical-observational data, will also be discussed. Presentation of the book "Statistica, nuovo empirismo e società nell'era dei Big Data" by Giuseppe Arbia, Edizioni Nuova Cultura (2018).



01/02/2019

Cattuto: High-resolution social networks: measurement, modeling and applications
Paolotti: It takes a village - how collaborations in data science for social good can make a difference

Double seminar: Ciro Cattuto & Daniela Paolotti (ISI Foundation)

Ciro Cattuto: Digital technologies provide the opportunity to quantify specific human behaviors with unprecedented levels of detail and scale. Personal electronic devices and wearable sensors, in particular, can be used to map the network structure of human close-range interactions in a variety of settings relevant for research in computational social science, epidemiology and public health. This talk will review the experience of the SocioPatterns collaboration (www.sociopatterns.org), an international effort aimed at measuring and studying high-resolution human and animal social networks using wearable proximity sensors. I will discuss technology requirements and measurement experiences in diverse environments such as schools, hospitals and households, including recent work in low-resource rural settings in Africa. I will discuss the complex features found in empirical temporal networks and show how methods from network science and machine learning can be used to detect structures and to understand the role they play for dynamical processes, such as epidemics, occurring over the network. I will close with an overview of future research directions and applications.

Paolotti: The unprecedented opportunities provided by data science in all areas of human knowledge become even more evident when applied to the fields of social innovation, international development and humanitarian aid. Using social media data to study malnutrition and obesity in children in developing countries, using mobile phone digital traces to understand women's mobility for safety and security, harvesting search engine queries to study suicide among young people in India: these are only a few examples of how data science can be exploited to address many social problems and support global agencies and policymakers in implementing better and more impactful policies and interventions. Nevertheless, data scientists alone cannot be successful in this complex effort. Greater access to data, more collaboration between public and private sector entities, and an increased ability to analyze datasets are needed to tackle society's greatest challenges. In this talk, we will cover examples of how actors from different entities can join forces around data and knowledge to create public value with an impact on global societal issues, and set the path to accelerate the harnessing of data science for social good.



24/01/2019

Conjugate Bayes for probit regression via unified skew-normals

Daniele Durante (Department of Decision Sciences, Bocconi University)

Regression models for dichotomous data are ubiquitous in statistics. Besides being useful for inference on binary responses, such methods are also fundamental building-blocks in more complex classification strategies covering, for example, Bayesian additive regression trees (BART). Within the Bayesian framework, inference proceeds by updating the priors for the coefficients, typically set to be Gaussians, with the likelihood induced by probit or logit regressions for the binary responses. In this updating, the apparent absence of a tractable posterior has motivated a variety of computational methods, including Markov Chain Monte Carlo (MCMC) routines and algorithms which approximate the posterior. Despite being routinely implemented, current MCMC strategies face mixing or time-inefficiency issues in large p and small n studies, whereas approximate routines fail to capture the skewness typically observed in the posterior. In this seminar, I will prove that the posterior distribution for the probit coefficients has a unified skew-normal kernel, under Gaussian priors. Such a novel result allows efficient Bayesian inference for a wide class of applications, especially in large p and small-to-moderate n studies where state-of-the-art computational methods face notable issues. These advances are outlined in a genetic study, and further motivate the development of a wider class of conjugate priors for probit models along with methods to obtain independent and identically distributed samples from the unified skew-normal posterior. Finally, these results are also generalized to improve classification via BARTs.
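For contrast with the closed-form unified skew-normal result, the following Python sketch (assuming Gaussian N(0, tau2 I) priors; all settings hypothetical) implements the classical Albert-Chib data-augmentation Gibbs sampler whose mixing and time-inefficiency issues in large p studies motivate the talk:

    import numpy as np
    from scipy.stats import truncnorm

    def probit_gibbs(X, y, n_iter=2000, tau2=10.0, seed=0):
        # Albert-Chib sampler: alternate truncated-normal latent utilities
        # and a Gaussian draw for the coefficients.
        rng = np.random.default_rng(seed)
        n, p = X.shape
        V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)   # conditional covariance
        L = np.linalg.cholesky(V)
        beta, draws = np.zeros(p), []
        for _ in range(n_iter):
            mu = X @ beta
            lo = np.where(y == 1, -mu, -np.inf)         # z_i > 0 iff y_i = 1
            hi = np.where(y == 1, np.inf, -mu)
            z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
            beta = V @ (X.T @ z) + L @ rng.normal(size=p)
            draws.append(beta)
        return np.array(draws)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 2))
    y = (X @ np.array([1.0, -1.0]) + rng.normal(size=100) > 0).astype(int)
    print(probit_gibbs(X, y, n_iter=500).mean(axis=0))  # posterior mean estimate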



22/01/2019

Heterogeneity in dynamics of risk accumulation: the case of unemployment

Raffaele Grotti (European University Institute)

The paper aims to contribute to the study of socioeconomic risks. It studies different mechanisms that account for the stickiness of the unemployment condition and, relatedly, for the longitudinal accumulation of unemployment experiences. In particular, the paper disentangles two mechanisms, 'genuine state dependence' dynamics and unobserved characteristics at the individual level, and accounts both for their relative weight and for their possible interplay in shaping the accumulation of unemployment risks over time. Dynamics of accumulation are investigated, showing their distribution among different workforce segments defined both in terms of observable and unobservable characteristics. This is done by applying correlated dynamic random-effects probit models and providing statistics for protracted unemployment exposure. The analysis makes use of EU-SILC data from 2003 to 2015 for four European countries (DK, FR, IT and UK). Empirical results indicate that both unobserved heterogeneity and genuine state dependence are relevant and partly independent factors in explaining the reiteration and accumulation of unemployment and long-term unemployment risks over time. The analysis shows how the weight of these two components varies at both macro and micro level, according to different labour market and institutional settings and depending on individual endowments. Finally, the paper discusses how the distinction between unobserved heterogeneity and genuine state dependence, and the evaluation of their possible interplay, can provide useful insights with respect to theories of cumulative advantages and to an efficient design of policy measures aimed at contrasting the accumulation of occupational penalties over time.
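The identification problem at the heart of the paper can be illustrated with a toy simulation (mine; all parameter values arbitrary): both genuine state dependence and unobserved heterogeneity make unemployment today more likely given unemployment yesterday, so raw persistence alone cannot distinguish the two mechanisms.

    import numpy as np

    rng = np.random.default_rng(1)

    def simulate(n, T, gamma, sd_alpha):
        # y_it = 1[ gamma*y_i,t-1 + alpha_i + eps_it > 1 ]:
        # gamma is genuine state dependence, alpha_i unobserved heterogeneity.
        alpha = rng.normal(0, sd_alpha, size=n)
        y = np.zeros((n, T), dtype=int)
        for t in range(1, T):
            y[:, t] = (gamma * y[:, t - 1] + alpha + rng.normal(size=n) > 1.0)
        return y

    for gamma, sd_alpha, label in [(1.5, 0.0, "state dependence only"),
                                   (0.0, 1.5, "heterogeneity only")]:
        y = simulate(20_000, 10, gamma, sd_alpha)
        p_cond = y[:, 1:][y[:, :-1] == 1].mean()   # P(unemployed | unemployed before)
        p_marg = y[:, 1:].mean()                   # P(unemployed)
        print(f"{label}: P(U|U) = {p_cond:.2f} vs P(U) = {p_marg:.2f}")

Both scenarios show P(U|U) well above P(U), which is why correlated dynamic random-effects probit models are needed to apportion the observed persistence between the two sources.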



16/01/2019

Lupparelli: On log-mean linear regression graph models
Bocci: Statistical modelling of spatial data: some results and developments

Welcome Seminars: Monia Lupparelli & Chiara Bocci

Lupparelli: This talk aims to illustrate the log-mean linear parameterization for multivariate Bernoulli distributions, which represents the counterpart for marginal modelling of the well-established log-linear parameterization. In fact, the log-mean transformation is defined by the same log-linear mapping applied to the space of the mean parameter (the marginal probability vector) rather than to the simplex (the space of the joint probability vector). The class of log-mean linear models, under suitable zero constraints, corresponds to the class of discrete bi-directed graph models, just as the class of log-linear models is used to specify discrete undirected graph models. Moreover, the log-mean linear transformation provides a novel link function in multivariate regression settings with discrete response variables. The resulting class of log-mean linear regression models is used for modelling regression graphs via a sequence of marginal regressions where the coefficients are linear functions of log-relative-risk parameters. The class of models will be illustrated in two different contexts: (i) assessing the effect of HIV infection on multimorbidity, and (ii) deriving the relationship between marginal and conditional relative risk parameters in regression settings with multiple intermediate variables.

Bocci: TBA



19/12/2018

Flexible and Sparse Bayesian Model-Based Clustering

Bettina Grün (Johannes Kepler University, Linz, Austria)

Finite mixtures of multivariate normal distributions constitute a standard tool for clustering multivariate observations. However, selecting a suitable number of clusters, identifying cluster-relevant variables, and accounting for non-normal cluster shapes are still challenging issues in applications. Within a Bayesian framework, we indicate how suitable prior choices can help to solve these issues. We achieve this by considering only prior distributions that are conditionally conjugate or can be reformulated as hierarchical priors, thus allowing for simple estimation using MCMC methods with data augmentation.
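A variational cousin of this idea ships with scikit-learn and gives a feel for how a sparsity-inducing prior on the mixture weights lets the data choose the effective number of clusters (an illustration only; the talk's fully Bayesian MCMC treatment differs):

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(0)
    # Three well-separated clusters in 2D, fitted with too many components.
    X = np.vstack([rng.normal(m, 0.5, size=(200, 2))
                   for m in ((0, 0), (4, 0), (2, 3))])

    bgm = BayesianGaussianMixture(
        n_components=10,                                 # generous upper bound
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=0.1,                  # favours few clusters
        random_state=0,
    ).fit(X)

    print(np.round(bgm.weights_, 2))   # mass concentrates on ~3 components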



18/12/2018

Bayesian Structure Learning in Multi-layered Genomic Networks

Min Jin Ha (UT MD Anderson Cancer Center)

Integrative network modeling of data arising from multiple genomic platforms provides insight into the holistic picture of the interactive system, as well as the flow of information across many disease domains. The basic data structure consists of a sequence of hierarchically ordered datasets for each individual subject, which facilitates integration of diverse inputs, such as genomic, transcriptomic, and proteomic data. A primary analytical task in such contexts is to model the layered architecture of networks where the vertices can be naturally partitioned into ordered layers, dictated by multiple platforms, and exhibit both undirected and directed relationships. We propose a multi-layered Gaussian graphical model (mlGGM) to investigate conditional independence structures in such multi-level genomic networks. We use a Bayesian node-wise selection (BANS) framework that coherently accounts for the multiple types of dependencies in mlGGM, and using variable selection strategies, allows for flexible modeling, sparsity, and incorporation of edge-specific prior knowledge. Through simulated data generated under various scenarios, we demonstrate that BANS outperforms other existing multivariate regression-based methodologies. We apply our method to estimate integrative genomic networks for key signaling pathways across multiple cancer types, find commonalities and differences in their multi-layered network structures, and show translational utilities of these integrative networks.

Torna alla lista dei seminari archiviati


04/12/2018

"Exit this way": Persistent Gender & Race Differences in Pathways Out of In-Work Poverty in the US

Emanuela Struffolino (WZB - Berlin Social Science Center)

We analyze differences by gender and race in long-term pathways out of in-work poverty. Such differences are understood as "pathway gaps", analogous to the gender and racial income gaps studied in labor-market economics and sociology. We combine data from three high-quality data sources (NLSY79, NLSY97, PSID) and apply sequence analysis multistate models to 1) empirically identify pathways out of in-work poverty, 2) estimate the associations of gender and race with each distinct pathway, and 3) attempt to account for these gender and race differences. We identify five different pathways out of in-work poverty. While men and non-Hispanic whites are most likely to experience successful long-term transitions out of poverty within the labor market, women and African Americans are more likely to exit in-work poverty only temporarily, commonly by exiting the labor market. These "pathway gaps" persist even after controlling for selection into in-work poverty, educational attainment, and family demographic behavior.

Torna alla lista dei seminari archiviati


21/11/2018

Estimating Causal Effects On Social Networks

Laura Forastiere (Yale Institute for Network Science - Yale University)

In most real-world systems units are interconnected and can be represented as networks consisting of nodes and edges. For instance, in social systems individuals can have social ties, family or financial relationships. In settings where some units are exposed to a treatment and its effects spill over to connected units, estimating both the direct effect of the treatment and spillover effects presents several challenges. First, assumptions are required on the way and the extent to which spillover effects occur along the observed network. Second, in observational studies, where the treatment assignment is not under the control of the investigator, confounding and homophily are potential threats to the identification and estimation of causal effects on networks. Here, we make two structural assumptions: i) neighborhood interference, which assumes interference to operate only through a function of the immediate neighbors' treatments; ii) unconfoundedness of the individual and neighborhood treatment, which rules out the presence of unmeasured confounding variables, including those driving homophily. Under these assumptions we develop a new covariate-adjustment estimator for treatment and spillover effects in observational studies on networks. Estimation is based on a generalized propensity score that balances individual and neighborhood covariates across units under different levels of individual treatment and of exposure to neighbors' treatment. Adjustment for the propensity score is performed using a penalized spline regression. Inference capitalizes on a three-step Bayesian procedure which allows taking into account the uncertainty in the propensity score estimation and avoiding model feedback. Finally, correlation of interacting units is taken into account using a community detection algorithm and incorporating random effects in the outcome model. All these sources of variability, including variability of treatment assignment, are accounted for in the posterior distribution of finite-sample causal estimands. This is joint work with Edo Airoldi, Albert Wu and Fabrizia Mealli.
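The exposure construction under neighborhood interference can be sketched in a few lines (a toy example of mine, not the authors' code): with adjacency matrix A and treatment vector z, one common choice of exposure to neighbors' treatment is the fraction of treated immediate neighbours, which the generalized propensity score then balances together with the individual treatment.

import numpy as np

# Hypothetical undirected network and binary treatment vector.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
z = np.array([1, 0, 1, 0])

deg = A.sum(axis=1)
g = A @ z / np.maximum(deg, 1)   # fraction of treated neighbours per unit
print(g)  # unit 0 has neighbours {1, 2}, one of them treated -> 0.5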

Torna alla lista dei seminari archiviati


11/10/2018

Time-varying survivor average causal effects with semicompeting risks

Leah Comment (Department of Biostatistics, Harvard T.H. Chan School of Public Health)

In semicompeting risks problems, non-terminal time-to-event outcomes such as time to hospital readmission are subject to truncation by death. These settings are often modeled with parametric illness-death models, but evaluating causal treatment effects with hazard models is problematic due to the evolution of incompatible risk sets over time. To combat this problem, we introduce two new causal estimands: the time-varying survivor average causal effect (TV-SACE) and the restricted mean survivor average causal effect (RM-SACE). These principal stratum causal effects are defined among units that would survive regardless of assigned treatment. We adopt a Bayesian estimation procedure that is anchored to parameterization of illness-death models for both treatment arms but maintains causal interpretability. We outline a frailty specification that can accommodate within-person correlation between non-terminal and terminal event times, and we discuss potential avenues for adding model flexibility. This research is joint work with Fabrizia Mealli, Corwin Zigler, and Sebastien Haneuse.
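In stylized potential-outcome notation (my rendering of the abstract's description, not necessarily the authors' exact definition), write T_R(z) and T_D(z) for the potential non-terminal and terminal event times under treatment z. The time-varying contrast among units that would survive to t under either arm can then be written, in LaTeX notation, as

\mathrm{TV\text{-}SACE}(t) = \Pr\{T_R(1) \le t \mid T_D(1) > t,\ T_D(0) > t\} - \Pr\{T_R(0) \le t \mid T_D(1) > t,\ T_D(0) > t\},

with the RM-SACE replacing the probability contrast by restricted means of the non-terminal event time within the same principal stratum.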

Torna alla lista dei seminari archiviati


01/10/2018

Estimation of Multivariate Factor Stochastic Volatility Models by Efficient Method of Moments

Christian Muecher (University of Konstanz)

We introduce a frequentist procedure to estimate multivariate factor stochastic volatility models. The estimation is done in two steps. First, the factor loadings, idiosyncratic variances and unconditional factor variances are estimated by approximating the dynamic factor model with a static one. Second, we apply the Efficient Method of Moments with GARCH(1,1) as an auxiliary model to estimate the stochastic volatility parameters governing the dynamic latent factors and idiosyncratic noises. Based on various simulations, we show that our procedure outperforms existing approaches in terms of accuracy and efficiency and has clear computational advantages over existing Bayesian methods.
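The first, static-approximation step can be sketched as a principal-components estimate (synthetic data and hypothetical dimensions of my choosing; the EMM step with the GARCH(1,1) auxiliary model does not fit in a few lines):

import numpy as np

rng = np.random.default_rng(0)
T, N, k = 500, 10, 2                    # sample size, series, factors (hypothetical)
r = rng.standard_normal((T, N))         # placeholder returns; use real data here

S = np.cov(r, rowvar=False)             # sample covariance
vals, vecs = np.linalg.eigh(S)
order = np.argsort(vals)[::-1]
L = vecs[:, order[:k]] * np.sqrt(vals[order[:k]])   # loadings (static approximation)
psi = np.diag(S) - (L ** 2).sum(axis=1)             # idiosyncratic variances
f = r @ np.linalg.pinv(L).T                         # cross-sectional factor estimates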

Torna alla lista dei seminari archiviati


09/07/2018

Graph Algorithms for Data Analysis

Andrea Marino (Università di Pisa)

Real-world data can often be modelled as networks that represent relationships among real-world entities. In this talk we will discuss efficient algorithmic tools for the analysis of big real-world networks. We will overview some algorithms for the analysis of huge graphs, focused on data gathering, degrees-of-separation computation, centrality measures, community discovery, and novel similarity measures among entities.
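The degrees-of-separation computations mentioned above rest on breadth-first search; a minimal sketch (huge graphs require sampling or clever heuristics on top of this):

from collections import deque

def degrees_of_separation(adj, source):
    # BFS: hop distance from source to every reachable node
    # of an unweighted graph given as adjacency lists.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}   # toy graph
print(degrees_of_separation(adj, 0))           # {0: 0, 1: 1, 2: 1, 3: 2}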

Torna alla lista dei seminari archiviati


18/06/2018

Data-driven transformations in small area estimation: An application with the R-package emdi

Timo Schmid (Freie Universität Berlin)

Small area models typically depend on the validity of model assumptions. For example, a commonly used version of the Empirical Best Predictor relies on the Gaussian assumptions of the error terms of the linear mixed regression model, a feature rarely observed in applications with real data. The present paper proposes to tackle the potential lack of validity of the model assumptions by using data-driven scaled transformations as opposed to ad-hoc chosen transformations. Different types of transformations are explored, the estimation of the transformation parameters is studied in detail under the linear mixed regression model, and transformations are used in small area prediction of linear and non-linear parameters. Mean squared error estimation that accounts for the uncertainty due to the estimation of the transformation parameters is explored using bootstrap approaches. The proposed methods are illustrated using real survey and census data for estimating income deprivation parameters for municipalities in Mexico with the R-package emdi. The package enables the estimation of regionally disaggregated indicators using small area estimation methods and includes tools for (a) customized parallel computing, (b) model diagnostic analyses, (c) creating high-quality maps and (d) exporting the results to Excel and OpenDocument spreadsheets. Simulation studies and the results from the application show that using carefully selected, data-driven transformations can improve small area estimation.
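emdi itself is an R package; as a language-neutral illustration of the data-driven idea (synthetic data of my choosing), the sketch below estimates a Box-Cox transformation parameter by maximum likelihood instead of fixing, say, a log transformation a priori:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
income = rng.lognormal(mean=2.0, sigma=0.8, size=1000)  # skewed synthetic data

# Data-driven transformation: the Box-Cox parameter is estimated from the
# data by maximum likelihood rather than chosen ad hoc.
transformed, lam = stats.boxcox(income)
print(lam)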

Torna alla lista dei seminari archiviati


14/06/2018

A brief history of linear quantile mixed models and recent developments in nonlinear and additive regression

Marco Geraci (University of South Carolina)

What follows is a story that began about sixteen years ago in Viale Morgagni (with some of the events taking place in a cottage of the Montalve's estate). In this talk, I will retrace the steps that led me to develop linear quantile mixed models (LQMMs). These models have found application in public health, preventive medicine, virology, genetics, anesthesiology, immunology, ophthalmology, orthodontics, cardiology, pharmacology, biochemistry, biology, marine biology, environmental, climate and marine sciences, psychology, criminology, gerontology, economics and finance, linguistics and lexicography. Supported by a grant from the National Institute of Child Health and Human Development, I recently extended LQMMs to nonlinear and additive regression. I will present models, estimation algorithms and software, along with a few applications.

Torna alla lista dei seminari archiviati


18/05/2018

Transcompiling and Analysing Firewalls

Letterio Galletta (IMT Lucca)

Configuring and maintaining a firewall configuration is notoriously hard. On the one hand, network administrators have to know in detail the policy meaning, as well as the internals of the firewall systems and of their languages. On the other hand, policies are written in low-level, platform-specific languages where firewall rules are inspected and enforced along non-trivial control flow paths. Further difficulties arise from Network Address Translation (NAT), an indispensable mechanism in IPv4 networking for performing port redirection and translation of addresses. In this talk, we present a transcompilation pipeline that helps system administrators reason about policies, port a configuration from one system to another, and perform refactoring, e.g., removing useless or redundant rules. Our pipeline and its correctness are based on IFCL, a generic configuration language equipped with a formal semantics. Relying on this language, we decompile a real firewall configuration into an abstract specification, which exposes the meaning of the configuration and enables us to carry out analysis and recompilation.
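As a toy illustration of one refactoring the pipeline supports, namely removing useless rules (my sketch; IFCL's actual semantics also covers NAT and control-flow paths), a later rule is redundant when an earlier one already matches a superset of its packets:

def shadowed(rule, earlier):
    # A rule can never fire if an earlier rule matches a superset of its
    # packets: same protocol and a port interval that contains the rule's.
    return (rule["proto"] == earlier["proto"]
            and earlier["ports"][0] <= rule["ports"][0]
            and rule["ports"][1] <= earlier["ports"][1])

r1 = {"proto": "tcp", "ports": (0, 1023), "action": "drop"}
r2 = {"proto": "tcp", "ports": (22, 22), "action": "accept"}
print(shadowed(r2, r1))   # True: r2 is dead code after r1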

Torna alla lista dei seminari archiviati


18/05/2018

Empirical Bayes Estimation of Species Distribution with Overdispersed Data

Fabio Divino (Dipartimento di Bioscienze e Territorio, Università del Molise)

The estimation of species distributions is fundamental in the assessment of biodiversity and in the monitoring of environmental conditions. In this work we present preliminary results in Bayesian inference of multivariate discrete distributions with applications in biomonitoring. In particular, we consider the problem of estimating species distributions when the collected data are affected by overdispersion. Ecologists often have to deal with data which exhibit a variability that differs from what they expect on the basis of the model assumed to be valid. The phenomenon is known as overdispersion if the observed variability exceeds the expected variability, or underdispersion if it is lower than expected. Such differences between observed and expected variation in the data can be interpreted as failures of some of the basic hypotheses of the model. The problem is very common when dealing with count data, for which the variability is directly connected with the magnitude of the phenomenon. Overdispersion is more common than underdispersion and can originate from several causes. Among them, the most important and relevant in ecology is overdispersion by heterogeneity of the population with respect to the assumed model. An interesting approach to account for overdispersion by heterogeneity is the method based on compound models. The idea of compound models originated in the 1920s and concerns the possibility of mixing a model of interest with a mixing probability measure. The mixture resulting from the integration generates a new model that accommodates larger variation than the reference model. A typical application of this approach is the well-known Gamma-Poisson compound model, which generates the Negative Binomial model. In this work we present the use of a double compound model, a multivariate Poisson distribution combined with Gamma and Dirichlet models, in order to account for the presence of overdispersion. Some simulation results will compare the Gamma-Dirichlet-Poisson model with the reference Multinomial model. Further, some applications in the biomonitoring of aquatic environments are presented. Acknowledgement: this work is part of joint research with Salme Karkkainen, Johanna Arje and Antti Penttinen (University of Jyväskylä) and Kristian Meissner (SYKE, Finland).
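A minimal simulation of the Gamma-Poisson mechanism (parameters of my choosing) shows the variance inflation that defines overdispersion:

import numpy as np

rng = np.random.default_rng(42)

# Gamma-Poisson compound model: draw a Gamma rate per unit, then a Poisson count.
shape, scale, n = 2.0, 3.0, 100_000
lam = rng.gamma(shape, scale, size=n)
counts = rng.poisson(lam)

# The resulting counts are Negative Binomial and overdispersed: variance > mean.
print(counts.mean())   # close to shape * scale = 6
print(counts.var())    # close to mean * (1 + scale) = 24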

Torna alla lista dei seminari archiviati


11/05/2018

From sentiment to superdiversity on Twitter

Alina Sirbu (Dip. Informatica, Universita' di Pisa)

Superdiversity refers to large cultural differences in a population due to immigration. In this talk we introduce a superdiversity index based on Twitter data and lexicon-based sentiment analysis, using ideas from epidemic spreading and opinion dynamics models. We show how our index correlates with official immigration statistics available from the European Commission's Joint Research Centre, and we compare it with various other measures computed from the same Twitter data. We argue that our index has predictive power in regions where exact data on immigration are not available, paving the way for a nowcasting model of immigration.

Torna alla lista dei seminari archiviati


18/04/2018

Convolution Autoregressive Processes and Excess Volatility

Umberto Cherubini (University of Bologna)

We discuss the economic model and the econometric properties of the Convolution Autoregressive Process of order 1 (C-AR(1)), with focus on the simplest Gaussian case. This is a first-order autoregressive process in which the innovations are dependent on the lagged value of the process. We show that the model may be generated by the presence of extrapolative bias in expectations. Extrapolative expectations bring about excess volatility and excess persistence in the dynamics of the variable. While excess volatility cannot be identified if one only observes the time series of the variable, identification can be achieved if expectations of the variable are observed in the forward markets. We show that the model is well suited to generating the excess variance of long-maturity prices documented by the Giglio-Kelly variance ratio test. We finally discuss possible extensions of the model beyond the Gaussian case, both by changing the specification of the expectations model and by setting a nonlinear data generating process for the fundamental process.

Torna alla lista dei seminari archiviati


16/03/2018

Heterogeneous federated data center for research: from design to operation

Sergio Rabellino (University of Torino)

Bringing a large datacenter from design to operation is complex anywhere in the world. In Italy it is a challenge that starts with a game of snakes and ladders against bureaucracy and colleagues to shape the European tenders, the governing board, the price list, the access policy, and eventually all the software needed to operate the datacenter. The talk reviews all these aspects through the experience of designing the University of Torino Competency Center in Scientific Computing, serving over 20 departments with its 1M€ OCCAM platform, and of writing the HPC4AI project charter, recently funded with 4.5M€ by the Piedmont Region and serving over 10 departments in two universities (University and Technical University of Turin).

Torna alla lista dei seminari archiviati


16/03/2018

Designing a heterogeneous federated data center for research

Marco Aldinucci (Computer Science Department, University of Torino)

The advance of high-speed networks and virtualization techniques makes it possible to leverage economies of scale and consolidate all the servers of a large organisation into a few powerful, energy-efficient data centers, which also work synergically with the public cloud. In this respect, research organisations such as universities exhibit distinguishing features of both a technical and a sociological nature. Firstly, there hardly exists a compute workload and resource usage pattern that is dominant across many different disciplines and departments. This makes the design of the datacenter and of the access policy delicate enough to raise open research questions. Secondly, scientists are generally inclined to complain about anything short of total freedom to do what they urgently want to do, including behaviours that are strange, greedy and dangerous for the data and the system themselves.

Torna alla lista dei seminari archiviati


22/02/2018

Job Instability and Fertility during the Economic Recession: EU Countries

Isabella Giorgetti (Dep. of Economics and Social Sciences, Università Politecnica delle Marche)

The trends of decline in the total fertility rate (TFR) varied widely across EU countries. Exploiting individual data from the longitudinal EU-SILC dataset (2005-2013), this study investigates the cross-country effect of job instability on the couple's choice of having one (more) child. I build a job instability measure for both partners from the first-order lag of their own economic activity status in the labour market (holding a temporary or permanent contract, or being unemployed). In order to account for unobserved heterogeneity and the potential presence of endogeneity, I estimate, under sequential moment restrictions, a Two-Stage Least Squares (2SLS) model in first differences. I then group European countries according to six different welfare regimes and estimate the heterogeneous effects of labour market instability on childbearing in a comparative framework. The principal result is that the cross-country average effect of job instability on couples' fertility decisions is not statistically significant, owing to huge country-specific fixed effects. Distinguishing between welfare regimes, institutional settings and active social policies reveals varying family fertility behaviour. In low-fertility countries, however, it is confirmed that the impact of parents' successful labour market integration might be ambiguous, possibly due to the scarcity of child care options and/or cultural norms.
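For readers unfamiliar with the estimator, the core 2SLS step can be sketched on synthetic, already first-differenced data (a toy model of my construction; the study's panel structure and sequential moment restrictions are not reproduced):

import numpy as np

rng = np.random.default_rng(0)
n = 5_000
z = rng.standard_normal(n)                 # instrument
u = rng.standard_normal(n)                 # unobserved confounder
x = 0.8 * z + u + rng.standard_normal(n)   # endogenous regressor
y = 0.5 * x + u + rng.standard_normal(n)   # outcome; true effect is 0.5

Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])

# Stage 1: project the endogenous regressor on the instrument.
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
# Stage 2: regress the outcome on the fitted values.
beta = np.linalg.lstsq(Xhat, y, rcond=None)[0]
print(beta[1])   # close to 0.5, unlike naive OLS of y on x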

Torna alla lista dei seminari archiviati


15/02/2018

Data Science and Our Environment

Francesca Dominici (Harvard T.H. Chan School of Public Health)

What if I told you I had evidence of a serious threat to American national security: a terrorist attack in which a jumbo jet will be hijacked and crashed every 12 days. Thousands will continue to die unless we act now. This is the question before us today, but the threat doesn't come from terrorists. The threat comes from climate change and air pollution. We have developed an artificial neural network model that uses on-the-ground air-monitoring data and satellite-based measurements to estimate daily pollution levels across the continental U.S., breaking the country up into 1-square-kilometer zones. We have paired that information with health data contained in Medicare claims records from the last 12 years, covering 97% of the population aged 65 or older. We have developed statistical methods and computationally efficient algorithms for the analysis of over 460 million health records. Our research shows that short- and long-term exposure to air pollution is killing thousands of senior citizens each year. This data science platform is telling us that federal limits on the nation's most widespread air pollutants are not stringent enough. This type of data is the sign of a new era for the role of data science in public health, and also for the associated methodological challenges. For example, with enormous amounts of data, the threat of unmeasured confounding bias is amplified, and causality is even harder to assess with observational studies. These and other challenges will be discussed.

Torna alla lista dei seminari archiviati


17/01/2018

Presentation seminar: main research lines and results obtained

Francesca Giambona (DiSIA)

The seminar will present the main research lines and the main topics analysed over my scientific career. In particular, it will describe the main statistical methodologies used to analyse the phenomena under study and present the most important empirical results obtained. Finally, the research themes currently under investigation will be briefly introduced.

Torna alla lista dei seminari archiviati


17/01/2018

Latent variable modelling for dependent observations: some developments

M. Francesca Marino (DiSIA)

When dealing with dependent data, standard statistical methods cannot be directly used for the analysis, as they may produce biased results and lead to misleading inferential conclusions. In this framework, latent variables are frequently used as a tool for capturing dependence and describing association. During the seminar, some developments in the context of latent variable modelling will be presented. Three main areas of research will be covered, namely quantile regression, social network analysis, and small area estimation. The main results will be presented, and some hints on current research and future developments will be given.

Torna alla lista dei seminari archiviati


12/01/2018

SAM Based Analysis of the Impact of VAT Rate Cut on the Economic System of China after the Tax Reform

Ma Kewei (Shanxi University of Finance and Economics)

In May 2016 China completed the tax reform, moving from a bifurcated system based on business tax (BT) and Value Added Tax (VAT) to a system based entirely on VAT. The effects of this reform are still under analysis, regarding both the alleviation of the tax burden on industry and service sectors and the economic improvements. Currently, there seems to be common agreement that, except for some minor cases worth further investigation, these effects are positive in both respects. It is interesting to analyze the impact that a reduction of the VAT rate on some key industries would have under a VAT-based regime. To our knowledge, no analysis in this direction has been performed so far. In this paper, we try to fill this gap and analyze the effect of VAT rate cuts for selected industries on the whole economic system, using an impact multiplier model based on a purposely elaborated Social Accounting Matrix (SAM) 2015 for China, which also allows us to conduct a preliminary analysis of the structure of China's economic system. (Joint work with Guido Ferrari and Zichuan Mi)
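The abstract does not spell out the multiplier model; the standard fixed-price SAM multiplier analysis it alludes to works as in this toy sketch (hypothetical coefficients of my choosing):

import numpy as np

# Toy coefficient matrix A of the endogenous SAM accounts (column shares).
A = np.array([[0.20, 0.10, 0.05],
              [0.30, 0.25, 0.10],
              [0.10, 0.30, 0.15]])

# Accounting multiplier matrix M = (I - A)^{-1}.
M = np.linalg.inv(np.eye(3) - A)

# Total (direct + indirect) effect of an exogenous injection d,
# e.g. a demand change following a VAT rate cut in one industry.
d = np.array([1.0, 0.0, 0.0])
print(M @ d)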

Torna alla lista dei seminari archiviati


29/11/2017

Probabilistic Distance Algorithm: some recent developments in data clustering

Francesco Palumbo (Università di Napoli Federico II)

Distance-based clustering methods, which are not model-based, optimize a global criterion based on the distances among clusters. The most widely known distance-based method is k-means clustering (MacQueen, 1967), and several extensions of it have recently been proposed (Vichi and Kiers, 2001; Rocci et al., 2011; Timmerman et al., 2013). These extensions overcome issues arising from the correlation between variables. However, k-means clustering, and extensions thereof, can fail when clusters do not have a spherical shape and/or some extreme points are present, which tend to affect the group means. Iyigun (2007) and Ben-Israel and Iyigun (2008) propose a non-hierarchical distance-based clustering method, called probabilistic distance (PD) clustering, that overcomes these issues. Tortora et al. (2016) propose a factor version of the method to deal with high-dimensional data. In PD-clustering, the number of clusters K is assumed to be known a priori, and a wide review on how to choose K can be found in the literature. Given some random centres, the probability of any point belonging to a cluster is assumed to be inversely proportional to the distance from the centre of that cluster (Iyigun, 2007). The aim of the seminar is to illustrate the PD algorithm and some recent applications in the data clustering framework, in both the model-based (Rainey et al., 2017) and the non-model-based perspective.
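The basic iteration can be sketched as follows (my reading of the principle stated above, with the Weiszfeld-type centre update of Ben-Israel and Iyigun, 2008; the published algorithm may differ in details):

import numpy as np

def pd_clustering(X, K, iters=50, eps=1e-9):
    # PD-clustering principle: p_ik * d_ik is constant over k, so
    # memberships are inversely proportional to distances from the centres.
    rng = np.random.default_rng(0)
    C = X[rng.choice(len(X), K, replace=False)]   # random initial centres
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + eps
        p = (1.0 / d) / (1.0 / d).sum(axis=1, keepdims=True)
        u = p ** 2 / d                            # centre-update weights
        C = (u[:, :, None] * X[:, None, :]).sum(axis=0) / u.sum(axis=0)[:, None]
    return C, p

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centres, memberships = pd_clustering(X, K=2)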

Torna alla lista dei seminari archiviati


15/11/2017

Bayesian Multilevel Latent Class Models for the Multiple Imputation of Nested Categorical Data

Davide Vidotto (Tilburg University)

Multiple imputation of multilevel data (i.e., data collected from different groups) requires not only taking correlations among variables into account, but also considering possible dependencies between units coming from the same group. While a number of imputation models have been proposed in the literature for continuous data, existing methods for multilevel categorical data, such as the JOMO imputation method, still have limitations. For instance, JOMO only considers pairwise relationships between variables, and uses default priors that can affect the quality of the imputations in the case of small sample sizes. With the present work, we propose using Multilevel Latent Class models to perform multiple imputation of missing multilevel categorical data. The model is flexible enough to retrieve the original (complex) associations of the variables at hand while respecting the data hierarchy. The model is implemented in a Bayesian framework and estimated via Gibbs sampling, a natural choice for multiple imputation applications. After formally introducing the model, we will show the results of a simulation study in which model performance is assessed and compared with the listwise deletion and JOMO methods. Results indicate that the Bayesian Multilevel Latent Class model is able to recover unbiased and efficient parameter estimates of the analysis model considered in our study.

Torna alla lista dei seminari archiviati


11/07/2017

Resilient and Secure Cyber Physical Systems: Matching the Present and the Future!

Andrea Bondavalli (DiMAI)

A Cyber-Physical System is a system in which computational elements interact closely with physical entities through sensors and actuators, thereby controlling individual, organisational or mechanical processes by means of information and communication technologies (computers, software and networks). Such systems are typically automated, intelligent and collaborative, and many of them require high levels of resilience and security to ensure their survival in the presence of random faults, deliberate attacks and, in general, unforeseen critical events.
The seminar will be organised in two parts:
- The first part will present the main characteristics of Cyber-Physical Systems, focusing in particular on the aspects related to their resilience and security, and will illustrate concrete examples of CPS in several application domains, from the Internet of Things (IoT) to Systems of Systems and Industry 4.0.
- The second part will present the new curriculum in Resilient and Secure Cyber Physical Systems of the Master's Degree in Computer Science, clarifying its distinctive features, learning objectives and career opportunities.

Torna alla lista dei seminari archiviati


21/06/2017

Stronger Instruments and Refined Covariate Balance in an Observational Study of the Effectiveness of Prompt Admission to the ICU

Luke Keele (Georgetown University)

Instrumental Variable (IV) methods, subject to appropriate identification assumptions, allow for consistent estimation of causal effects in observational data in the presence of unobserved confounding. Near-far matching has been proposed as one analytic method to improve inference by strengthening the effect of the instrument on the exposure and balancing observable characteristics between groups of subjects with low and high values of the instrument. However, in settings with hierarchical data (e.g. patients nested within hospitals), or where several covariate interactions must be balanced, conventional near-far matching algorithms may fail to achieve the requisite covariate balance. We develop a new matching algorithm that combines near-far matching with refined covariate balance, to balance large numbers of nominal covariates while also strengthening the IV. This extension of near-far matching is motivated by a UK case study that aims to identify the causal effect of prompt admission to the Intensive Care Unit on 7-day and 28-day mortality.

Torna alla lista dei seminari archiviati


16/06/2017

Assessing the Efficacy of Intrapartum Antibiotic Prophylaxis for Prevention of Early-Onset Group B Streptococcal Disease through Propensity Score Design

Elizabeth R. Zell (Centers for Disease Control and Prevention)

Observational data can assist in answering hard questions about disease prevention. Early-onset neonatal group B streptococcal disease (EOGBS) can be prevented by intrapartum antibiotic prophylaxis (IAP). Clinical trials demonstrated the efficacy of beta-lactam agents for a narrow population of women. Questions remain about the effectiveness of antibiotic durations of less than 4 hours and about agents appropriate for penicillin-allergic women. We applied propensity score design methods to sample survey data on EOGBS cases and over 7000 non-cases from ten US states in 2003-2004, to match infants exposed to IAP with infants who were not exposed. Antibiotic efficacy was estimated for different antibiotic classes and durations before delivery. Our analysis supports the recommendation that beta-lactam intrapartum prophylaxis of at least four hours before delivery remain the primary treatment; durations of less than four hours and clindamycin prophylaxis are not as effective in preventing EOGBS.

Torna alla lista dei seminari archiviati


15/06/2017

Spatial Chaining of Price Indexes to Improve International Comparisons of Prices and Real Incomes

D.S. Prasada Rao (School of Economics, The University of Queensland, Brisbane, Australia)

The International Comparisons Program (ICP) compares the purchasing power of currencies and real income of almost all countries in the world. An ICP multilateral comparison uses as building blocks bilateral comparisons between all possible pairs of countries. These are then combined to obtain the overall global comparison. One problem with this approach is that some of the bilateral comparisons are typically of lower quality, and their inclusion therefore undermines the integrity of the multilateral comparison. Formulating multilateral comparisons as a graph theory problem, we show how quality can be improved by replacing bilateral comparisons with their shortest path spatially chained equivalents. We consider a number of different ways in which this can be done, and illustrate these methods using data from the 2011 round of ICP. We then propose criteria for comparing the performance of competing multilateral methods, and using these criteria demonstrate how spatial chaining improves the quality of the overall global comparison.
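The chaining mechanics can be illustrated with a toy graph (hypothetical numbers; the paper's quality measure and weighting are more refined): countries are nodes, bilateral comparisons are edges, and a low-quality direct comparison is replaced by the product of bilateral indexes along the shortest, i.e. highest-quality, path.

import networkx as nx

G = nx.Graph()
G.add_edge("A", "B", ppp=1.20, quality=0.1)
G.add_edge("B", "C", ppp=0.90, quality=0.2)
G.add_edge("A", "C", ppp=1.60, quality=0.9)   # low-quality direct comparison

path = nx.shortest_path(G, "A", "C", weight="quality")
chained = 1.0
for a, b in zip(path, path[1:]):
    chained *= G[a][b]["ppp"]
print(path, chained)   # ['A', 'B', 'C'], 1.20 * 0.90 = 1.08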

Torna alla lista dei seminari archiviati


15/06/2017

Netflix and Deep Learning

P. Crescenzi (Università di Firenze)

A popular-science seminar on some "hot" topics in computer science:
• How, out of tens of thousands of films, someone can suggest what we should watch, and get it right: the magic of recommender systems.
• How the behaviour of neurons in the human brain has been studied in order to simulate it with neural networks, and how these networks, once trained, can behave intelligently: the magic of artificial intelligence.

Torna alla lista dei seminari archiviati


09/06/2017

Exact P-values for Network Interference

Guido Imbens (Stanford)

We study the calculation of exact p-values for a large class of non-sharp null hypotheses about treatment effects in a setting with data from experiments involving members of a single connected network. The class includes null hypotheses that limit the effect of one unit's treatment status on another according to the distance between units; for example, the hypothesis might specify that the treatment status of immediate neighbors has no effect, or that units more than two edges away have no effect. We also consider hypotheses concerning the validity of sparsification of a network (for example based on the strength of ties) and hypotheses restricting heterogeneity in peer effects (so that, for example, only the number or fraction treated among neighboring units matters). Our general approach is to define an artificial experiment, such that the null hypothesis that was not sharp for the original experiment is sharp for the artificial experiment, and such that the randomization analysis for the artificial experiment is validated by the design of the original experiment. (with Susan Athey and Dean Eckles)
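As background (a generic sketch of mine, not the paper's construction), a randomization p-value is exact under a sharp null because the null fixes all potential outcomes, so the statistic's distribution is induced entirely by the assignment mechanism; the paper's contribution is to design artificial experiments under which non-sharp interference hypotheses become sharp.

import numpy as np

def sharp_null_pvalue(y, z, stat, draws=10_000, seed=0):
    # Re-randomize the assignment with outcomes held fixed (sharp null).
    rng = np.random.default_rng(seed)
    observed = stat(y, z)
    sims = np.array([stat(y, rng.permutation(z)) for _ in range(draws)])
    return (sims >= observed).mean()

y = np.array([3.1, 2.4, 4.0, 1.9, 3.5, 2.2])
z = np.array([1, 0, 1, 0, 1, 0])
diff_in_means = lambda y, z: y[z == 1].mean() - y[z == 0].mean()
print(sharp_null_pvalue(y, z, diff_in_means))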

Torna alla lista dei seminari archiviati


27/04/2017

Introducing AWMA and its activities to promote mathematics among African women

Kifle Yirgalem Tsegaye (Department of Mathematics, Addis Ababa University, Ethiopia)

Archeological findings and their interpretations persuade us to think that Africa is not only the cradle of humankind but, along with ancient Mesopotamia and others, also a birthplace of mathematical and technological ideas. It is natural to ask: what happened to civilization, mathematics and technological sciences in contemporary Africa? A million-euro question, though this talk is about what African women in mathematics are doing to change, or at least to improve, their realities. Just like women around the world, African women have long been denied access to important components of development like education, in particular mathematics and technological sciences, under impositions and restrictions of the kind "these are strictly meant for men". Many had no choice but to believe it and to collaborate in making only "good wives" or "attractive women" of themselves. Things are changing in this regard, but not strongly enough to liberate African women from the stereotypical depiction of themselves and let them cope in the world of science. It is believed that role models in every profession are important to convince youngsters that it is possible for them to be whoever they want. Having nation-wide, continent-wide or world-wide strong networks of women in the fields of mathematics and technological sciences enables us to destroy such stereotypes and open the gate wide enough for our girls to engage with all the sciences, most of all mathematics and the technological sciences. The talk is a brief introduction to: - the African Women in Mathematics Association (AWMA) and its activities; - Ethiopian girls in the fields of Science, Mathematics, Engineering and Technology.

Torna alla lista dei seminari archiviati


29/03/2017

The challenges of extreme risks: theoretical models and financial-actuarial tools

Marcello Galeotti (University of Florence)

1. A dynamic definition of economic risk. Risk measures: VaR and Expected Shortfall. The problem of computing risk measures.
2. Extreme value theory. Light- and heavy-tailed distributions. Hazard rate. Fluctuations of sums and maxima. The mean excess. The generalized Pareto distribution.
3. Innovative financial instruments for the management of environmental risks. Project options and catastrophe bonds. Interactive dynamics: the possibility of "virtuous" outcomes and of sub-optimal equilibria. A case study: flood risk of the Arno river in the area and city of Florence.
4. An evolutionary model for healthcare risks. Defensive medicine, health insurance, legal action. An evolutionary game model. The role of insurance premiums. Asymptotic behaviours depending on the premium calculation principles.
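As a small illustration of the first topic (a sketch of mine on synthetic heavy-tailed losses, not material from the talk), the two risk measures can be estimated empirically:

import numpy as np

rng = np.random.default_rng(7)
losses = rng.standard_t(df=4, size=100_000)   # heavy-tailed synthetic losses

alpha = 0.99
var = np.quantile(losses, alpha)        # Value-at-Risk at level alpha
es = losses[losses > var].mean()        # Expected Shortfall: mean loss beyond VaR
print(var, es)                          # ES exceeds VaR for heavy tails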

Torna alla lista dei seminari archiviati


23/03/2017

Endogenous Significance Levels in Finance and Economics

Alessandro Palandri (DISiA - University of Florence)

This paper argues that rational agents who do not know the model's parameters and have to estimate them will treat the significance level as a choice variable. Calculating the costs associated with Type I and Type II errors, rational agents will choose significance levels that maximize expected utility. The misalignment of investigators' standard statistical significance levels with those that are optimal for agents has profound implications for empirical tests of the model itself. Specifically, empirical studies could reject models on the grounds of statistically significant misprices when in fact the models are true: the misprices are not significant for the agents, so there is no expected profitable intervention and no force to bring the system back to equilibrium.

Torna alla lista dei seminari archiviati


23/03/2017

Bayesian methods in Biostatistics

Francesco Stingo (University of Florence)

In this talk I will review some Bayesian methods for bio-medical applications that I have developed in recent years. These methods can be classified into 4 research areas: 1) graphical models for complex biological networks, 2) hierarchical models for data integration (integromics and imaging genetics), 3) power-prior approaches for personalized medicine, 4) change-point models for cancer early detection.

Torna alla lista dei seminari archiviati


09/03/2017

Validity of case-finding algorithms for diseases in database and multi-database studies

Rosa Gini (Agenzia Regionale di Sanità della Toscana, Firenze)

In database studies in pharmacoepidemiology and health services research, variables that identify a disease are derived from existing data sources by means of data processing. The 'true' variables should be conceptualized as unobserved quantities, and the study variables entering the actual analysis as measurements, resulting from case-finding algorithms (CFAs) applied to the original data. The validity of a CFA is the difference between its result and the true variable. The science of estimating the validity of CFAs is in development. Stemming from the methodology of validation of diagnostic algorithms, it nevertheless has specific hurdles and opportunities. In this talk we will introduce the concept of component CFA. We will show the results from a large validation study of CFAs for type 2 diabetes, hypertension, and ischaemic heart disease in Italian administrative databases, using primary care medical records as a gold standard, and propose more research to generalize and apply its results. We will then show how the component analysis may support the estimation of the validity of CFAs in multi-database, multi-national studies.
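Agreement between a CFA and a gold standard is typically summarized with the usual 2x2-table metrics; a minimal sketch with hypothetical counts:

# Hypothetical confusion counts of a CFA against primary care records.
tp, fp, fn, tn = 420, 60, 80, 9440

sensitivity = tp / (tp + fn)   # P(CFA positive | truly diseased)
specificity = tn / (tn + fp)   # P(CFA negative | truly disease-free)
ppv = tp / (tp + fp)           # P(truly diseased | CFA positive)
print(sensitivity, specificity, ppv)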

Torna alla lista dei seminari archiviati


09/02/2017

Introduction to Riordan graphs

Gi-Sang Cheon (Department of Mathematics, Sungkyunkwan University (Korea))

There are many reasons to define new classes of graphs. More generally, considering the n*n symmetric Riordan matrix modulo 2, we define the Riordan graph RG(n) of order n. In this talk, we study some basic properties of Riordan graphs such as the number of edges, the degree sequence, the matching number, the clique number, the independence number and so on. Moreover, several examples of Riordan graphs are given; they include Pascal graphs, Catalan graphs, Fibonacci graphs and many others.

Torna alla lista dei seminari archiviati


09/02/2017

Pascal Graphs and related open problems

Gi-Sang Cheon (Department of Mathematics, Sungkyunkwan University (Korea))

In 1983, Deo and Quinn introduced Pascal graphs in their search for a class of graphs with certain desired properties to be used as computer networks. One of the desired properties is that the design be simple and recursive, so that when a new node is added, the entire network does not have to be reconfigured. Another property is that one central vertex be adjacent to all others. The third requirement is that there exist several paths between each pair of vertices (for reliability) and that some of these paths be of short length (to reduce communication delays). Finally, the graphs should have good cohesion and connectivity. A Pascal matrix PM(n) of order n is defined to be an n*n symmetric binary matrix where the main diagonal entries are all 0's and the lower triangular part of the matrix consists of the first n-1 rows of Pascal's triangle modulo 2. The graph corresponding to the adjacency matrix PM(n) is called the Pascal graph of order n.
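That definition translates directly into code; a small sketch (0-indexed) that builds PM(n) and exhibits the stated design property that the first vertex is adjacent to all others:

from math import comb

def pascal_adjacency(n):
    # Zero diagonal; lower triangle filled row by row with the first n-1
    # rows of Pascal's triangle modulo 2; then symmetrized.
    A = [[0] * n for _ in range(n)]
    for i in range(1, n):              # row i gets Pascal's triangle row i-1
        for j in range(i):
            A[i][j] = A[j][i] = comb(i - 1, j) % 2
    return A

for row in pascal_adjacency(5):
    print(row)
# The first vertex is adjacent to all others since C(i-1, 0) = 1 for all i >= 1.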

Torna alla lista dei seminari archiviati


26/01/2017

Uncertainty in Propensity Score Estimation: Bayesian Methods for Variable Selection and Model-Averaged Causal Effects.

Corwin M. Zigler (Harvard University)

Causal inference with observational data frequently relies on the notion of the propensity score (PS) to adjust treatment comparisons for observed confounding factors. As comparative effectiveness research in the era of "big data" increasingly relies on large and complex collections of administrative resources, researchers are frequently confronted with decisions regarding which of a high-dimensional covariate set to include in the PS model in order to satisfy the assumptions necessary for estimating average causal effects. Typically, simple or ad-hoc methods are employed to arrive at a single PS model, without acknowledging the uncertainty associated with the model selection. We propose Bayesian methods for PS variable selection and model averaging that 1) select relevant variables from a set of candidate variables to include in the PS model and 2) estimate causal treatment effects as weighted averages of estimates under different PS models. The associated weight for each PS model reflects the data-driven support for that model’s ability to adjust for the necessary variables. We illustrate features of our proposed approaches with a simulation study, and ultimately use our methods to compare the effectiveness of treatments for brain tumors among Medicare beneficiaries.

Torna alla lista dei seminari archiviati


26/01/2017

Bayesian Effect Estimation Accounting for Adjustment Uncertainty

Corwin M. Zigler (Harvard University)

Model-based estimation of the effect of an exposure on an outcome is generally sensitive to the choice of which confounding factors are included in the model. We propose a new approach, which we call Bayesian adjustment for confounding (BAC), to estimate the effect of an exposure of interest on the outcome, while accounting for the uncertainty in the choice of confounders. Our approach is based on specifying two models: (1) the outcome as a function of the exposure and the potential confounders (the outcome model); and (2) the exposure as a function of the potential confounders (the exposure model). We consider Bayesian variable selection on both models and link the two by introducing a dependence parameter denoting the prior odds of including a predictor in the outcome model, given that the same predictor is in the exposure model. In the absence of dependence, BAC reduces to traditional Bayesian model averaging (BMA). In simulation studies, we show that BAC with dependence can estimate the exposure effect with smaller bias than traditional BMA, and improved coverage. We compare BAC with other methods, including traditional BMA, in a time series data set of hospital admissions, air pollution levels, and weather variables in Nassau, NY for the period 1999–2005. Using each approach, we estimate the short-term effects of PM2.5 on emergency admissions for cardiovascular diseases, accounting for confounding. This application illustrates the potentially significant pitfalls of misusing variable selection methods in the context of adjustment uncertainty.
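In one possible notation (mine, not necessarily the authors'), let \alpha_j^X and \alpha_j^Y indicate inclusion of predictor j in the exposure and outcome models; the dependence parameter \omega then enters as the prior odds, in LaTeX notation,

\frac{\Pr(\alpha_j^Y = 1 \mid \alpha_j^X = 1)}{\Pr(\alpha_j^Y = 0 \mid \alpha_j^X = 1)} = \omega,

so that \omega = 1 makes the two selections a priori independent and recovers traditional BMA.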

Torna alla lista dei seminari archiviati


14/11/2016

La prova statistica nel processo penale

Benito Vittorio Frosini (Università Cattolica del Sacro Cuore)

The content of the seminar on statistical evidence in criminal trials is an introduction to the problems usually encountered in the treatment (and in the literature) of so-called Forensic Statistics, with particular regard to criminal trials; in fact, entirely analogous problems arise in civil trials too, although they receive less attention in the specialist literature. The rules to follow in evaluating the evidence brought into a trial, and in particular scientific and statistical evidence, are laid down in practically all legal systems; a fairly clear and analytical exposition is contained in the Federal Rules of Evidence of the United States, which rightly make no distinction between civil and criminal proceedings. The generally recommended approach, in the presence of an event E (the trial evidence) that may be produced by several causes, to which (prior) probabilities can be attached, is to use Bayes' formula, which transforms the likelihoods (of the type P(E|C)) into the posterior probabilities of the possible causes (of the type P(C|E)). The correctness of this approach implies an entirely negative assessment of the so-called "fallacy of the transposed conditional", which consists in using probabilities of the type P(E|C), when they are very small, in place of the probabilities P(C|E), as has unfortunately happened in several trials. The problem of the use of so-called "naked statistics", and more generally of "naked statistical inference", is also discussed: it consists in using, in the particular case discussed in a trial, statistics derived from a more or less large population that contains the case under discussion. Relevant examples in this discussion, considered analytically, are the trials concerning Alfred Dreyfus (in France), Sacco and Vanzetti (in the United States), Chedzey (Australia), and Sally Clark (in the United Kingdom). Some types of (civil and criminal) trials are also examined in which the Bayesian approach is practically absent from the literature, while so-called significance tests (Fisher-Neyman-Pearson) are commonly employed.
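In symbols, with competing causes C_1, ..., C_k and trial evidence E, Bayes' formula reads, in LaTeX notation,

\Pr(C_i \mid E) = \frac{\Pr(E \mid C_i)\,\Pr(C_i)}{\sum_{j=1}^{k} \Pr(E \mid C_j)\,\Pr(C_j)},

and the fallacy of the transposed conditional consists precisely in reading a small \Pr(E \mid C) as if it were \Pr(C \mid E).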

Torna alla lista dei seminari archiviati


10/10/2016

A Practical Framework for Specification, Analysis and Enforcement of Attribute-based Access Control Policies

Andrea Margheri (University of Southampton)

Access control systems are widely used means for the protection of computing systems. They may take several forms, use different technologies and involve varying degrees of complexity. In this seminar, we present a fully-implemented Java-based framework for the specification, analysis and enforcement of attribute-based access control policies. The framework rests on FACPL, a formal language with a compact, yet expressive, syntax that permits expressing real-world access control policies. The analysis functionalities exploit a state-of-the-art SMT-based approach and support the automatic verifications of many properties of interest. We introduce FACPL and its supporting functionalities by means of a real-world case study from an e-Health application domain.

Torna alla lista dei seminari archiviati


29/09/2016

Reasoning about the trade-off between security and performance

Boris Köpf (IMDEA Madrid)

Today's software systems employ a wide variety of techniques for minimizing the use of resources such as time, memory, and energy. While these techniques are indispensable for achieving competitive performance, they can pose a serious threat to security: By reducing the resource consumption on average (but not in the worst case), they introduce variations that can be exploited by adversaries for recovering private information about users, or even cryptographic keys. In this talk I will give examples of attacks against a number of performance-enhancing features of software and hardware, and I will present ongoing work on techniques for quantifying the resulting threat and for choosing the most cost-effective defense.

Torna alla lista dei seminari archiviati


13/07/2016

Sensitivity Analysis for Differences-in-Differences

Luke Keele (Dept. of Political Science Penn State University)

In the estimation of causal effects with observational data, applied analysts often use the differences-in-differences (DID) method. The method is widely used since the before-and-after comparison of a treated and a control group that it requires is a common situation in the social sciences. Researchers use this method since it protects against a specific form of unobserved confounding. Here, we develop a set of tools to allow analysts to better utilize the method of DID. First, we articulate the hypothetical experiment that DID seeks to replicate. Next, we develop a form of matching that allows for covariate adjustment under the DID identification strategy and is consistent with the hypothetical experiment. We also develop a set of confirmatory tests that should hold if DID is a valid identification strategy. Finally, we adapt a well-known method of sensitivity analysis for hidden confounding to the DID method. We develop these sensitivity analysis methods for both binary and continuous outcomes. We show that DID is, in a very real sense, more sensitive to hidden confounders than a standard selection-on-observables identification strategy. We then apply our methods to three different empirical examples from the social sciences.
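For reference, the canonical two-group, two-period DID estimator contrasts before-after changes across groups and identifies the treatment effect under the parallel-trends assumption, in LaTeX notation:

\hat{\tau}_{\mathrm{DID}} = \left(\bar{Y}_{T,\mathrm{post}} - \bar{Y}_{T,\mathrm{pre}}\right) - \left(\bar{Y}_{C,\mathrm{post}} - \bar{Y}_{C,\mathrm{pre}}\right).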

Torna alla lista dei seminari archiviati


28/06/2016

Recent collaborative research between the MSE of Wuhan University and the SEEM of City University of Hong Kong on remanufacturing systems and healthcare management

Xian-Kia Wang & Kwai-Sang Chin (Wuhan University, China & City University of Hong Kong)

(a) Competitive Strategy in Remanufacturing and the Effects of Government Subsidy. This presentation reports the research ideas and initial results of a national research project in China on effective remanufacturing systems and their integrated management. We consider a single-period model comprising an original equipment manufacturer (OEM), who produces only new products, and a remanufacturer, who collects used products from consumers and produces remanufactured products. The OEM and the remanufacturer compete in the product market. We examine the effects of government subsidy as a means to promote remanufacturing activity, with subsidies directed to the remanufacturer and to consumers respectively. It is found that government subsidy to either the remanufacturer or the consumer increases remanufacturing activity, while the subsidy to the remanufacturer yields better results. This is because the subsidy to the remanufacturer results in lower prices of remanufactured products, thus leading to higher consumer surplus and social welfare. (b) Emergency Healthcare Operations Management. This presentation reports a partial outcome of a research project entitled "Delivering 21st Century Healthcare in Hong Kong—Building a Quality-and-Efficiency Driven System", focusing on the Emergency Department (ED) of a hospital. The ED is a pivotal component of the social healthcare system, particularly in a highly dense city like Hong Kong. ED management in Hong Kong has long been challenged by high patient demand, manpower shortage, and unbalanced ED staff utilization. These problems, if not properly addressed, create a negative impact on the patient experience as well as on staff morale. With the support of the ED of a major public hospital in Hong Kong, we have studied the patient arrival pattern and developed a forecasting and simulation model for ED managers to evaluate current ED performance and to assess alternative ED operations for the ever-changing ED environment and patient demand.

Torna alla lista dei seminari archiviati


22/06/2016

Sensitivity analysis with unmeasured confounding

Peng Ding (Dept. of Statistics, University of California Berkeley)

- Part I: Cornfield’s inequalities and extensions to unmeasured confounding in observational studies
- Part II: Cornfield’s inequalities and Rosenbaum’s design sensitivity in causal inference with observational studies

Torna alla lista dei seminari archiviati


07/06/2016

Randomization inference for treatment effect variation

Peng Ding (Dept. of Statistics, University of California Berkeley)

- Part I: Decomposing treatment effect variation in randomized experiments with and without noncompliance
- Part II: Randomization-based tests for idiosyncratic treatment effect variation

Torna alla lista dei seminari archiviati


18/05/2016

Regional poverty measurement in a border region

Ralf Münnich (Universität Trier)

Statistical output from National Statistical Institutes is generally produced using the national data available. For regional policies, however, not only information on the subregion itself may be important, but also information on its surroundings. In general, this is handled using spatial modelling. In the case of regions at national borders, the situation becomes more difficult since neighbouring information from other countries is generally not available. The presentation focuses on regional poverty measurement using the at-risk-of-poverty rate (ARPR) in the federal states Rhineland-Palatinate and Saarland at the LAU1 level. Based on an area-level small area model, several approaches to handle the border statistics problem will be presented and discussed.
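The indicator itself is simple to compute from microdata; a sketch using the standard EU definition (share of people below 60% of the national median equivalised income), on synthetic incomes of my choosing:

import numpy as np

def arpr(income, threshold_share=0.6):
    # At-risk-of-poverty rate: share below 60% of the median income.
    threshold = threshold_share * np.median(income)
    return np.mean(income < threshold)

rng = np.random.default_rng(3)
income = rng.lognormal(mean=10, sigma=0.5, size=10_000)  # synthetic incomes
print(arpr(income))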

Torna alla lista dei seminari archiviati


18/05/2016

A multilevel Heckman model to investigate financial assets among old people in Europe

Omar Paccagnella (Department of Statistical Sciences, University of Padua)

Heckman sample selection models are often applied when the variable of interest is observed or recorded only if a (selection) condition applies. This may occur because of unit or item (survey) non-response, or because a specific product or condition is owned by a subsample of units and unobservable components affecting the inclusion in the subsample are correlated with unobservable factors influencing the variable of interest. The latter case is well represented by the amount invested in financial and/or real assets: it may be observed and studied only if the unit has included that product in its own portfolio. Household portfolios have been widely investigated, particularly among old people. Indeed, as earnings of older people typically reflect pension income, consumption in later life can be supported by spending down financial or real assets. Therefore, wealth is a key measure of individual socio-economic status in an ageing society. On the one hand, this topic may be addressed to investigate the ownership patterns of financial or real assets and/or the amounts invested in them. On the other hand, the interest may be focused on cross-country comparisons of the ownership/amount of the assets. This paper aims at shedding more light on the behaviour of households across Europe through a joint study of their ownership patterns and the amounts invested in some financial assets (i.e. savings for short-term or long-term investments) among the old population. To this aim, in order to take into account the hierarchical nature of the data (households nested within countries) and the features of the variable of interest (the amount invested in a financial product is observed only if the household owns that asset), a multilevel Heckman model is the suggested methodological solution for the analysis. The model is applied to data from SHARE (Survey of Health, Ageing and Retirement in Europe), an international survey on ageing collecting detailed information on the socio-economic status of the European old population.
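The selection logic can be sketched with the classical single-level two-step estimator on synthetic data (my toy illustration; the talk's model is multilevel and handles country nesting, which this sketch omits):

import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5_000
x = rng.standard_normal(n)
e = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n)

own = (0.5 + 0.8 * x + e[:, 0] > 0)     # selection: the household owns the asset
amount = 1.0 + 0.6 * x + e[:, 1]        # amount, observed only if owned

# Step 1: probit for ownership, then the inverse Mills ratio.
X = sm.add_constant(x)
probit = sm.Probit(own.astype(float), X).fit(disp=0)
xb = X @ probit.params
mills = norm.pdf(xb) / norm.cdf(xb)

# Step 2: OLS on the selected sample with the Mills ratio as extra regressor.
X2 = sm.add_constant(np.column_stack([x[own], mills[own]]))
ols = sm.OLS(amount[own], X2).fit()
print(ols.params)   # slope on x close to 0.6 once selection is corrected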

Torna alla lista dei seminari archiviati


26/04/2016

Women’s mid-life ageing in low and middle income countries

Tiziana Leone (London School of Economics)

To date there is little evidence on the health needs and ageing process of women aged 45-65 in Low and Middle Income Countries (LMICs), in particular when compared to other age groups as well as to men. This is particularly significant given that this is the period when menopause occurs and the cumulative effects of multiple births or birth injuries can cause health problems across women's life course. The aim of the project is to analyse inequalities across socio-economic groups in the ageing process of women between the ages of 45 and 65 and compare it to that of men, within and between countries. Using data from wave 1 of SAGE in 4 countries (Ghana, Mexico, Russia and India) as well as longitudinal data from the Indonesian Longitudinal Family Survey, this study looks at the age pattern of physical and mental decline, considering objective measures of health such as grip strength, cognitive functions and walking speed as well as chronic diseases. Ultimately the study aims to understand the policy implications of the pace of ageing in LMICs. Results show a clear pattern of deterioration of health in the middle age group, significantly different from men, who show a more linear pattern. This pattern is particularly prominent for physical rather than mental health. The study highlights the importance of shedding more light on this field, as women have health care needs beyond their reproductive life which are potentially neglected.



14/04/2016

On Latent Change Model Choice in Longitudinal Studies

Tenko Raykov (Michigan State University)

This talk is concerned with two helpful aids in the process of choosing between models of change in repeated-measure investigations in the behavioral and social sciences: (i) interval estimates of the proportions of explained variance in longitudinally followed variables, and (ii) individual case residuals associated with these variables. The discussed method allows one to obtain confidence intervals for the R-squared indices of repeatedly administered measures, as well as subject-specific discrepancies between model predictions and raw data on the observed variables. In addition to facilitating the evaluation of local model fit, the approach is useful for differentiating between plausible models stipulating different patterns of change over time. This feature becomes particularly helpful in empirical situations characterized by (very) large samples and high statistical power, which are becoming increasingly frequent in complex sample design studies in the behavioral, health, and social sciences. The approach is similarly applicable in cross-sectional investigations, as well as with general structural equation models, and extends the set of means available to substantive researchers and methodologists for model fit evaluation beyond the traditionally used overall goodness-of-fit indices. The discussed method is illustrated using data from a nationally representative study of older adults.



06/04/2016

Equivalence relations for ordinary differential equations

Mirco Tribastone (IMT Lucca)

Ordinary differential equations (ODEs) are the primary means of modeling systems in a wide range of natural and engineering sciences. When the inherent complexity of the system under consideration is high, the number of equations required seriously limits our ability to perform effective analyses. This has motivated a large body of research, across many disciplines, into abstraction techniques that provide smaller ODE systems preserving the original dynamics in some appropriate sense. In this talk I will review some recent results that look at this problem from a computer-science perspective. I will present behavioral equivalences for nonlinear ODEs based on the established notion of bisimulation. These induce a quotienting of the set of variables of the original system such that, in a self-consistent reduced ODE system, each macro-variable represents the dynamics of an equivalence class. For a rather general class of nonlinear ODEs, computing the largest equivalences can be done using a symbolic partition-refinement algorithm that exploits an encoding into a satisfiability (modulo theories) problem. For ODEs with polynomial derivatives of degree at most two, covering the important cases of affine systems and chemical reaction networks, the partition refinement is based on Paige and Tarjan's seminal solution to the coarsest refinement problem. This gives an efficient algorithm that runs in O(m n log n) time, where m is the number of monomials and n is the number of ODE variables. I will present numerical evidence using ERODE (http://sysma.imtlucca.it/tools/erode/), an ongoing project that implements such reduction algorithms, showing that our bisimulations can be effective in realistic models from chemistry, systems biology, electrical engineering, and structural mechanics. This is joint work with Luca Cardelli, Max Tschaikowski, and Andrea Vandin.
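As a toy illustration of the kind of reduction involved — a hand-picked linear lumping on a made-up system, not the SMT-based partition refinement of the talk — the two variables below play exchangeable roles, so their sum obeys a self-consistent reduced equation:

import numpy as np
from scipy.integrate import solve_ivp

def full(t, x):
    x1, x2, x3 = x
    return [-2.0 * x1, x1 - x2, x1 - x3]       # x2 and x3 are exchangeable

def reduced(t, v):
    x1, y = v                                   # y aggregates the class {x2, x3}
    return [-2.0 * x1, 2.0 * x1 - y]

t = np.linspace(0.0, 5.0, 50)
sol_f = solve_ivp(full, (0, 5), [1.0, 0.3, 0.7], t_eval=t)
sol_r = solve_ivp(reduced, (0, 5), [1.0, 1.0], t_eval=t)

# The macro-variable reproduces x2 + x3; the reduction is exact up to solver error.
print(np.max(np.abs(sol_f.y[1] + sol_f.y[2] - sol_r.y[1])))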



25/02/2016

The role of active monitoring in reducing the execution times of public works. A Regression Discontinuity approach

Marco Mariani (IRPET)

The economic literature on the procurement of public works usually pays great attention to the stages of auctions and contract formation, and much less to the later stages of contract enforcement, where public buyers are expected to combat delays and cost escalations by monitoring executors. The opportunity to appraise the role of monitoring is offered to us by an Italian regional case study, where a local law obliges the regional government to assist smaller buyers in performing tighter-than-usual monitoring of projects that surpass a certain threshold in terms of financial size and receive co-financing from the regional government above a certain level. In order to estimate the causal effect of this higher level of monitoring on the time-to-completion of public works, we resort to a sharp regression discontinuity approach, made unusual by the presence of multiple forcing variables and transposed into a discrete-time survival analysis setting that allows for non-proportional hazards. Cross-validation procedures for bandwidth selection, as well as the usual robustness and sensitivity checks of RDDs, are extended to a setting with multiple forcing variables. Results of local estimations show that tighter-than-usual monitoring speeds up either those projects that would not have lasted long anyway or, more interestingly, very persistent projects. (Joint work with G.F. Gori and P. Lattarulo, IRPET)
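A minimal sharp-RD sketch with a single forcing variable and simulated data conveys the baseline idea (the talk's multiple forcing variables and discrete-time survival outcome are not reproduced here, and the bandwidth below is simply assumed rather than cross-validated):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, cutoff, h = 2000, 0.0, 0.25              # h: assumed bandwidth
r = rng.uniform(-1, 1, n)                   # forcing variable (e.g. project size)
d = (r >= cutoff).astype(float)             # tighter monitoring above the cutoff
y = 1.0 + 0.8 * r - 0.4 * d + rng.normal(0, 0.3, n)   # time-to-completion proxy

w = np.abs(r - cutoff) <= h                 # rectangular kernel window
X = sm.add_constant(np.column_stack([d[w], r[w] - cutoff, d[w] * (r[w] - cutoff)]))
fit = sm.OLS(y[w], X).fit()                 # local linear fit on each side
print(fit.params[1])                        # RD effect at the cutoff, approx -0.4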



16/12/2015

The role of stage at diagnosis in colorectal cancer racial/ethnic survival disparities: a causal inference perspective

Linda Valeri (Harvard Medical School)

Disparities in colorectal cancer survival and stage at diagnosis between White and Black patients are widely documented, whereby Black patients are more likely to be diagnosed at an advanced stage and have a poorer prognosis. Interest lies in understanding the importance of stage at diagnosis in explaining survival disparities. To this aim we propose to quantify the extent to which racial/ethnic survival disparities would be reduced if disparities in stage at diagnosis were eliminated. In particular, we develop a causal inference approach to assess the impact of a hypothetical shift in the distribution of stage at diagnosis in the Black population to match that of the White population. We further develop sensitivity analysis techniques to assess the robustness of our results to violation of the no-unmeasured-confounding assumption and to selection bias due to stage at diagnosis being missing not at random. Our results support the hypothesis that elimination of disparities in stage at diagnosis would contribute to the reduction of racial survival disparities in colorectal cancer. Important heterogeneities across patients' characteristics were observed, and our approach easily accommodates these features. This work illustrates how a causal inference perspective aids in identifying and formalizing relevant hypotheses in health disparities research that can inform policy decisions.



07/10/2015

Some Perspectives about Generalized Linear Modeling

Alan Agresti (University of Florida)

This talk discusses several topics pertaining to generalized linear modeling. With focus on categorical data, the topics include (1) bias in using ordinary linear models with ordinal categorical response data, (2) interpreting effects with nonlinear link functions, (3) cautions in using Wald inference (tests and confidence intervals) when effects are large or near the boundary of the parameter space, (4) the behavior and choice of residuals for GLMs, (5) an improved way to use the generalized estimating equations (GEE) method for marginal modeling of a multinomial response, and (6) modeling nonnegative zero-inflated responses. I will present few new research results, but these topics got my attention while I was writing the book 'Foundations of Linear and Generalized Linear Models', recently published by Wiley.



27/07/2015

Optimal Tests of Treatment Effects for the Overall Population and Two Subpopulations in Randomized Trials, using Sparse Linear Programming

Michael Rosenblum (Johns Hopkins Bloomberg School of Public Health)

We propose new, optimal methods for analyzing randomized trials, when it is suspected that treatment effects may differ in two predefined subpopulations. Such subpopulations could be defined by a biomarker or risk factor measured at baseline. The goal is to simultaneously learn which subpopulations benefit from an experimental treatment, while providing strong control of the familywise Type I error rate. We formalize this as a multiple testing problem and show it is computationally infeasible to solve using existing techniques. Our solution involves a novel approach, in which we first transform the original multiple testing problem into a large, sparse linear program. We then solve this problem using advanced optimization techniques. This general method can solve a variety of multiple testing problems and decision theory problems related to optimal trial design, for which no solution was previously available. In particular, we construct new multiple testing procedures that satisfy minimax and Bayes optimality criteria. For a given optimality criterion, our new approach yields the optimal tradeoff between power to detect an effect in the overall population versus power to detect effects in subpopulations. We demonstrate our approach in examples motivated by two randomized trials of new treatments for HIV.



29/06/2015

Probabilistic Model Checking

Arpit Sharma (DiSIA)

Model checking is an automated verification method guaranteeing that a mathematical model of a system satisfies a formally described property. It can be used to assess both qualitative and quantitative properties of complex software and hardware systems. This seminar presents the basic idea behind the verification of discrete-time Markov chains (DTMCs) against linear temporal logic (LTL) specifications. We will also discuss some challenging directions for future research.
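One core computation behind such verification can be sketched numerically: the probability of eventually reaching a target state in a DTMC is obtained by solving a linear system over the transient states. The four-state chain below is made up; LTL model checking reduces richer properties to reachability computations of this kind on a product chain.

import numpy as np

# Toy 4-state chain: state 3 is the target, state 2 is absorbing and "bad".
P = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.2, 0.5, 0.1, 0.2],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

transient = [0, 1]                           # states where the outcome is open
Q = P[np.ix_(transient, transient)]          # transitions among transient states
b = P[np.ix_(transient, [3])].sum(axis=1)    # one-step jumps into the target

# x = Qx + b  <=>  (I - Q) x = b
x = np.linalg.solve(np.eye(len(transient)) - Q, b)
print(x)    # reachability probabilities from states 0 and 1 (about 0.58 and 0.63)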



28/05/2015

Modelling systemic risk in financial markets

Andrea Ugolini (DiSIA)

Three recent crises — the dot-com bubble and the subprime and European sovereign debt crises — have revealed the complex dynamics underpinning the global financial system and how rapidly risk propagates across markets. Investors, regulators and researchers are thus keen to develop accurate measures of risk transmission between assets and markets. From the perspective of investors seeking efficient portfolio diversification, quantifying the risk of contagion is essential, given their ongoing interest in changes in market linkages. From the regulators' point of view, assessing risk spillovers is important for focusing attention on the maintenance and development of new financial regulatory and institutional rules such as circuit breakers, transaction taxes and short-sale rules. Indeed, recognizing important shortcomings in financial supervision, the European Commission and the European Central Bank (ECB) created the European Systemic Risk Board (ESRB) at the end of 2010 with the goal of monitoring the European financial system at the macro-prudential level and of preventing and mitigating any propagation of risk within the financial system. The literature contains many definitions of systemic risk. De Bandt and Hartmann (2000) defined it "as the risk of experiencing systemic events in the strong sense", where "strong sense" signifies the spread of news about an institution that adversely affects one or more healthy institutions in a sequential manner. Billio et al. (2012) explained that "systemic risk can be realized as a series of correlated defaults among financial institutions, occurring over a short time span and triggering a withdrawal of liquidity and widespread loss of confidence in the financial system as a whole". This brief list of definitions points to the intricacy of the topic and the challenge faced by investors, regulators and researchers in attempting to measure the complexity and dynamics of systemic risk. Using the conditional Value-at-Risk (CoVaR) systemic risk measure (Adrian and Brunnermeier, 2011; Girardi and Ergün, 2013), we quantify systemic risk as the impact of the risky situation of a particular financial institution, market or system on the value-at-risk (VaR) of other financial institutions, markets or systems. We introduce a novel copula and vine copula approach to computing the CoVaR value, given that copulas are flexible models of joint distributions and are particularly useful for characterizing the tail behaviour that provides such crucial information for the CoVaR.
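A crude empirical plug-in version of CoVaR on simulated Gaussian returns shows what the measure captures; the talk's copula and vine-copula machinery replaces this simple conditioning, and all numbers below are hypothetical.

import numpy as np

rng = np.random.default_rng(2)
n, rho, q = 100_000, 0.6, 0.05
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, n).T   # institution, system

var_x = np.quantile(x, q)                 # VaR of the institution at level q
stressed = x <= var_x                     # institution at or beyond its VaR
covar = np.quantile(y[stressed], q)       # CoVaR: system VaR given that distress
print(var_x, covar)                       # CoVaR sits well below the plain VaR of y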



29/04/2015

Role of scientists to defend the integrity of science and public health policy

Kathleen Ruff (International Joint Policy Committee of the Societies of Epidemiology, IJPC-SE)

When the scientific evidence is distorted so as to serve vested interests and to endanger public health, what responsibility do scientists and scientific organisations bear? Those most at risk of harm from hazardous substances are frequently those segments of the population who lack economic and political influence and thus have little ability to demand the necessary health protections. Scientists have the credibility to demand evidence-based public health policy that protects the right to health of all citizens.



29/04/2015

La governance del rischio e le procedure di VIS in Italia

Giancarlo Sturloni (International School for Advanced Studies, SISSA, Trieste)

Health Impact Assessment (HIA) is an important tool to defend local communities from health and environmental risks. Nevertheless, to keep its promise HIA should be carried out within an open, well-informed, inclusive and dialogical approach. Risk governance could help HIA by supporting dialogue among stakeholders and open access to scientific information, so as to promote participatory decision-making processes.



13/02/2015

Official statistics, social media and big data

Stefano Iacus (Università degli Studi di Milano)

In this seminar we present some recent developments in the analysis of big data from social media. We also try to show the possible interactions between the analysis of social media data and traditional survey techniques, highlighting the limits and the promise of some experiments. As examples, we show two recent applications: #iHappy, the Twitter-happiness index (http://www.blogsvoices.unimi.it), and #WNI, the Wired Next Index (http://index.wired.it), an indicator that attempts to capture the country's capacity for recovery. In particular, the #WNI is a synthesis of official statistics on the economy and well-being and of three "social" indicators of personal, economic and political confidence, and it appears to be a good nowcasting tool for what it purports to measure.



11/02/2015

Quantile regression coefficients modeling

Paolo Frumento (Karolinska Institutet - Unit of Biostatistics)

Estimating conditional quantiles is of interest in many research areas, and quantile regression is foremost among the methods used. The coefficients of a quantile regression model depend on the order of the quantile being estimated. We present an approach in which the regression coefficients are modeled as parametric functions of the order of the quantile. This approach may have advantages in terms of parsimony and efficiency, and may expand the potential of statistical modeling. We describe a possible application of this method and its implementation in the qrcm R package.
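A rough Python sketch of the idea (not the qrcm estimation algorithm, which fits the coefficient functions directly): estimate quantile regressions on a grid of orders p and then describe the slope path beta(p) with a two-parameter curve, which is exact for the heteroscedastic normal model simulated below.

import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 2000
x = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * x + (0.5 + 1.5 * x) * rng.normal(size=n)  # true slope path: 2 + 1.5*z_p

grid = np.linspace(0.1, 0.9, 9)
X = sm.add_constant(x)
slopes = np.array([QuantReg(y, X).fit(q=p).params[1] for p in grid])

# Model the slope path parametrically as beta(p) = a + b * Phi^{-1}(p).
Z = sm.add_constant(norm.ppf(grid))
a, b = sm.OLS(slopes, Z).fit().params
print(a, b)                                   # approximately 2.0 and 1.5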



20/01/2015

Theory and Applications of Proper Scoring Rules

Monica Musio (Universita' degli Studi di Cagliari)

Suppose you have to quote a probability distribution Q for an unknown quantity X. When the value x of X is eventually observed, you will be penalised by an amount S(x;Q). The function S is a proper scoring rule if, when you believe X has distribution P, your expected penalty S(P;Q) is minimised by the honest quote Q = P. Proper scoring rules can be used, as above, to motivate you to assess your true uncertainty honestly, as well as to measure the quality of your past probability forecasts in the light of the actual outcomes. They also have many other statistical applications. We will discuss some characterisations, properties and specialisations of proper scoring rules, and describe some of their uses, including robust estimation and Bayesian model selection.
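For a binary X this definition can be checked in one line for the familiar Brier (quadratic) score: quoting q for P(X = 1) while believing p,

S(x;q) = (x-q)^2, \qquad
S(p;q) = \mathbb{E}_{X \sim p}\, S(X;q) = p(1-q)^2 + (1-p)\,q^2, \qquad
\frac{\partial S(p;q)}{\partial q} = 2(q-p),

so the expected penalty is uniquely minimised at the honest quote q = p (the second derivative is 2 > 0).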



09/12/2014

Methods for Scalable and Robust High Dimensional Graphical Model Selection

Bala Rajaratnam (Stanford University)

Learning high dimensional graphical models is a topic of contemporary interest. A popular approach is to use L1 regularization methods to induce sparsity in the inverse covariance estimator, leading to sparse partial covariance/correlation graphs. Such approaches can be grouped into two classes: (1) regularized likelihood methods and (2) regularized regression-based, or pseudo-likelihood, methods. Regression based methods have the distinct advantage that they do not explicitly assume Gaussianity. One gap in the area is that none of the popular methods proposed for solving regression based objective functions have provable convergence guarantees. Hence it is not clear if resulting estimators actually yield correct partial correlation/partial covariance graphs. To this end, we propose a new regression based graphical model selection method that is both tractable and has provable convergence guarantees. In addition we also demonstrate that our approach yields estimators that have good large sample properties. The methodology is illustrated on both real and simulated data. We also present a novel unifying framework that places various pseudo-likelihood graphical model selection methods as special cases of a more general formulation, leading to important insights. (Joint work with S. Oh and K. Khare)
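The flavor of the regression-based approach can be conveyed by a minimal neighborhood-selection sketch in the spirit of Meinshausen and Bühlmann — one lasso regression per node, keeping an edge when a coefficient survives. This is illustrative only and is not the talk's estimator with convergence guarantees; the chain graph and all constants are made up.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p = 500, 10
Omega = np.eye(p)                              # sparse inverse covariance: chain 0-1-...-9
for j in range(p - 1):
    Omega[j, j + 1] = Omega[j + 1, j] = 0.4
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega), n)

edges = set()
for j in range(p):                             # lasso of node j on all the others
    others = [k for k in range(p) if k != j]
    fit = LassoCV(cv=5).fit(X[:, others], X[:, j])
    for k, coef in zip(others, fit.coef_):
        if abs(coef) > 1e-6:
            edges.add(tuple(sorted((j, k))))   # "OR" rule for keeping an edge
print(sorted(edges))                           # mostly the chain pairs {j, j+1}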



09/12/2014

Synergy, suppression and immorality

Joe Whittaker (Lancaster University)

Give a background on suppression in regression that begins with Horst (1941), and has generated a plethora of different types. Introduce forward differences of the entropy and define synergy in terms of explained information. Characterise and generalise suppression in terms of synergy. Specialise this result to correlation matrices. Relate this to immorality via conditional synergy. Give an empirical example and some small examples from graphical models. Make some concluding remarks.



03/12/2014

Threshold free estimation of functional antibody titers for a group B Streptococcus opsonophagocytic killing assay

Luca Moraschini (Università Milano Bicocca and Novartis Vaccines Siena)

Opsonophagocytic killing assays (OPKA) are routinely used for the quantification of bactericidal antibodies against Gram-positive bacteria in clinical trial samples. The OPKA readout, the titer, is traditionally estimated from non-linear dose-response regressions as the highest serum dilution yielding a predefined threshold level of bacterial killing. These titers therefore depend on a specific killing threshold value and on a specific dose-response model. This talk describes a novel OPKA titer definition, the threshold-free titer, which preserves biological interpretability while not depending on any killing threshold. First, a model-free version of this titer is briefly presented and shown to be more precise than the traditional threshold-based titers on simulated and experimental group B Streptococcus (GBS) OPKA data. Second, a model-based threshold-free titer is introduced to automatically take into account the potential saturation of the OPKA killing curve. The posterior distributions of threshold-based and threshold-free titers are derived for each analysed sample using importance sampling embedded within a Markov chain Monte Carlo sampler of the coefficients of a 4PL logistic dose-response model. The posterior precision of threshold-free titers is again shown to be higher than that of threshold-based titers. The biological interpretability and operational characteristics demonstrated here indicate that threshold-free titers can substantially improve the routine analysis of OPKA experimental and clinical data.
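For contrast, the traditional threshold-based titer can be sketched in a few lines: fit a 4PL curve and invert it at a 50% killing threshold. The data and starting values below are toy numbers; the threshold-free titers and their Bayesian posterior computation are what the talk replaces this with.

import numpy as np
from scipy.optimize import curve_fit

def fourpl(logdil, bottom, top, slope, logmid):
    # Decreasing 4PL: killing falls as the serum is diluted.
    return bottom + (top - bottom) / (1.0 + np.exp(slope * (logdil - logmid)))

logdil = np.log10([10, 30, 90, 270, 810, 2430, 7290, 21870])
kill = np.array([0.95, 0.93, 0.85, 0.65, 0.40, 0.20, 0.08, 0.05])  # toy killing fractions

params, _ = curve_fit(fourpl, logdil, kill, p0=[0.0, 1.0, 2.0, 3.0])
bottom, top, slope, logmid = params

# Dilution at which the fitted curve crosses a 50% killing threshold.
thr = 0.5
log_titer = logmid + np.log((top - bottom) / (thr - bottom) - 1.0) / slope
print(10 ** log_titer)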



06/10/2014

A partial survey on model selection

Yosef Rinott (Hebrew University & LUISS)

I will discuss some model selection methods, old and relatively new, such as AIC, BIC, Lasso and more. I will explain their goals and 'philosophy', and some of their properties. Time permitting I will discuss my recent work in this area.



22/09/2014

Nonparametric prediction of internet advertising conversion rates

Charles Taylor (University of Leeds)

In 2013 Google revenue exceeded 55 billion dollars, most of which came from advertising, and most of that from AdWords. Depending on the search term, companies bid to have their site placed at the top of the list. Google then receives the bid price if the user clicks on an Ad link, but the company only benefits if it makes an (extra) sale. Using data on conversion examples, this talk will explore a nonparametric approach to predicting success when the data are in the form of short text strings. Different distance measures will be examined, as will a method for selecting a smoothing parameter.
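A minimal sketch of such a predictor: edit (Levenshtein) distance between query strings plus a kernel-weighted average of observed conversions. The training strings, outcomes and smoothing parameter h are all made up, and the talk's distance measures and bandwidth selection are more refined.

import numpy as np

def levenshtein(a, b):
    # Classic one-row dynamic-programming edit distance.
    d = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                   prev + (ca != cb))
    return int(d[len(b)])

train = [("cheap flights", 1), ("cheap flight", 1),
         ("hotel deals", 0), ("flight status", 0)]

def predict(query, h=3.0):
    # Nadaraya-Watson estimate with a Gaussian kernel on edit distance.
    dist = np.array([levenshtein(query, s) for s, _ in train], float)
    w = np.exp(-0.5 * (dist / h) ** 2)
    y = np.array([c for _, c in train], float)
    return float(w @ y / w.sum())

print(predict("cheap fligts"))   # close to 1: nearest the converting examples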



19/09/2014

Validating protein structure using kernel density estimates

Charles Taylor (University of Leeds)

Measuring the quality of determined protein structures is a very important problem in bioinformatics. Kernel density estimation is a well-known nonparametric method which is often used for exploratory data analysis. Recent advances, which have extended previous linear methods to multi-dimensional circular data, give a sound basis for the analysis of conformational angles of protein backbones, which lie on the torus. By using an energy test, which is based on interpoint distances, we initially investigate the dependence of the angles on the amino acid type. Then by computing tail probabilities which are based on amino-acid conditional density estimates, a method is proposed which permits inference on a test set of data. This can be used, for example, to validate protein structures, choose between possible protein predictions and highlight unusual residue angles.
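A minimal sketch of a density estimate for angle pairs on the torus, using a product of von Mises kernels (a common choice in this setting); the concentration kappa plays the role of an assumed smoothing parameter, and the toy angles below stand in for (phi, psi) backbone dihedrals.

import numpy as np
from scipy.special import i0

def vm_kde(phi, psi, data, kappa=20.0):
    # data: (n, 2) array of angle pairs in radians; product von Mises kernel.
    k = np.exp(kappa * np.cos(phi - data[:, 0])) * \
        np.exp(kappa * np.cos(psi - data[:, 1]))
    norm = (2.0 * np.pi * i0(kappa)) ** 2      # von Mises normalising constant, squared
    return np.mean(k) / norm

rng = np.random.default_rng(4)
angles = np.column_stack([rng.vonmises(-1.0, 4.0, 500),
                          rng.vonmises(2.4, 4.0, 500)])   # toy (phi, psi) sample

# High density near the simulated mode, low density far from it.
print(vm_kde(-1.0, 2.4, angles), vm_kde(1.5, -2.0, angles))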



24/07/2014

A Spatially Nonstationary Fay-Herriot Model for Small Area Estimation

Hukum Chandra (Indian Agricultural Statistics Research Institute, New Delhi, India)

Small area estimates based on the widely-used area-level model proposed in Fay and Herriot (1979) assume that the area level direct estimates are spatially uncorrelated. In many cases, however, the underlying individual level data are spatially correlated. We propose an extension to the Fay-Herriot model that accounts for the presence of spatial nonstationarity in the area level data. We refer to the predictor based on this extended model as the nonstationary empirical best linear unbiased predictor (NSEBLUP). We also develop two different estimators for the mean squared error of the NSEBLUP. The first estimator uses approximations similar to those in Opsomer et al. (2008). The second estimator is based on the parametric bootstrapping approach of Gonzalez-Manteiga et al. (2008) and Molina et al. (2009). Results from model-based and design-based simulation studies using spatially nonstationary data indicate that the NSEBLUP compares favourably with alternative area-level predictors that ignore this spatial nonstationarity. In addition, both proposed methods of mean squared error estimation for the NSEBLUP seem to perform adequately.
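For reference, the standard Fay-Herriot area-level model that the talk extends can be written as

\hat{\theta}_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i + e_i,
\qquad u_i \sim N(0, \sigma_u^2), \qquad e_i \sim N(0, \psi_i),

with \psi_i the known sampling variances of the direct estimates. Loosely speaking, spatial nonstationarity enters by letting the regression relationship vary over space rather than holding a single \boldsymbol{\beta} fixed for all areas.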



21/07/2014

Finite sample properties of estimators of dose-response functions based on the generalised propensity score

Michela Bia (CEPS/INSTEAD, Luxembourg)

In this seminar we propose three semiparametric estimators of the dose-response function based on kernel and spline techniques. In many observational studies the treatment is neither binary nor categorical; in such cases one may be interested in estimating the dose-response function in a setting with a continuous treatment. This approach relies strongly on the unconfoundedness assumption, which requires that the potential outcomes be independent of the treatment conditional on a set of covariates. In this context the generalized propensity score can be used to estimate dose-response functions (DRF) and marginal treatment effect functions. We evaluate the performance of the proposed estimators using Monte Carlo simulation methods. We also apply our approach to the problem of evaluating a job training program for disadvantaged youth in the United States (the Job Corps program). In this regard, we provide new evidence on the intervention's effectiveness by uncovering heterogeneities in the effects of Job Corps training across different lengths of exposure.
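A minimal Hirano-Imbens-style sketch on simulated data conveys the mechanics (the talk's kernel and spline estimators are not reproduced, and all functional forms and constants below are hypothetical): estimate a normal model for the treatment, evaluate the GPS, and average predictions from an outcome model in (t, GPS).

import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 4000
x = rng.normal(size=n)                        # confounder
t = 0.8 * x + rng.normal(size=n)              # continuous treatment dose
y = 1.0 + 0.5 * t + 1.0 * x + rng.normal(size=n)

# Stage 1: model T | X as normal; the GPS is the conditional density at T.
m1 = sm.OLS(t, sm.add_constant(x)).fit()
sigma = np.sqrt(m1.scale)
gps = norm.pdf(t, loc=m1.fittedvalues, scale=sigma)

# Stage 2: flexible outcome model in (t, gps).
D = sm.add_constant(np.column_stack([t, t**2, gps, gps**2, t * gps]))
m2 = sm.OLS(y, D).fit()

# Dose-response at dose d: average prediction with the GPS evaluated at d.
for d in (-1.0, 0.0, 1.0):
    g = norm.pdf(d, loc=m1.fittedvalues, scale=sigma)
    Dd = sm.add_constant(np.column_stack([np.full(n, d), np.full(n, d)**2,
                                          g, g**2, d * g]))
    print(d, m2.predict(Dd).mean())           # close to the true DRF 1 + 0.5*d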



13/05/2014

Multidimensional Inequality Measures on Finite Partial Orders

Marco Fattore (Università di Milano-Bicocca)

Multidimensional inequality indexes with ordinal variables are an open issue in socio-economic statistics. The issue is relevant, particularly in connection with material deprivation, well-being and quality-of-life studies, that often involve ordinal variables. While the axiomatic theory of multidimensional inequality indexes is well-developed for the cardinal case, it is almost unexplored when multidimensional ordinal information is involved. The few attempts that can be found in the literature do not attain fully satisfactory results yet. In this seminar, we outline a completely new approach to inequality measurement, based on partial order theory. We first show how a general theory of inequality measures can be developed for finite partial orders. We then restrict the approach to product orders and show how this leads to a theory of multidimensional ordinal inequality measures. In addition, we discuss how the important problem of attribute decomposition may be addressed drawing on particular decompositions of Hasse diagrams of product orders.



06/05/2014

BIG DATA: economics, professions, modes of visualization

Silvano Cacciari (Università di Firenze)

Big data are nothing more than data sets collected in ways so complex that they differ greatly from traditional data. The point is that we are not talking only about software and hardware, but about a raw material estimated to be worth 8 percent of European GDP in the 2020s. A true, and yet another, informational, economic, technological and cognitive revolution, whose real consequences still need to be studied. The seminar sets out to address three themes whose outlines continually intersect in the big data phenomenon. - The economics of big data: how business and macroeconomic models are changing. - The professions of big data: interdisciplinary training and professionalisation, affinities and differences. - Modes of visualizing big data: visual anthropology and decision-making in economics and in education. The outlines of this seminar are preparatory to teaching, research and project design.



09/04/2014

On Enhancing Plausibility of MAR in Incomplete Data Analyses via Evaluating Response-Auxiliary Variable Correlations

Tenko Raykov (Michigan State University)

A procedure for evaluating candidate auxiliary variable correlations with response variables in incomplete data sets is outlined. The method provides point and interval estimates of the outcome-residual correlations with potentially useful auxiliaries, and of the bivariate correlations of outcome(s) with the latter variables. Auxiliary variables found in this way can enhance considerably the plausibility of the popular missing at random (MAR) assumption if included in ensuing maximum likelihood analyses, or can alternatively be incorporated in imputation models for subsequent multiple imputation analyses. The approach can be particularly helpful in empirical settings where violations of the MAR assumption are suspected, as is the case in many longitudinal studies, and is illustrated with data from cognitive aging research.



29/01/2014

Penalising model component complexity: A principled practical approach to constructing priors

Håvard Rue (Department of Mathematical Sciences, Norwegian University of Science and Technology)

Selecting appropriate prior distributions for the parameters in a statistical model is the Achilles heel of Bayesian statistics. Although the prior distribution should ideally encode the user's prior knowledge about the parameters, this level of knowledge transfer seems to be unattainable in practice; often standard priors are used without much thought and with the implicit hope that the obtained results are not too prior-sensitive. Despite the development of so-called objective priors, which are only available (due to mathematical issues) for a few selected and highly restricted model classes, the applied statistician has in practice few guidelines to follow when choosing priors. An easy way out of this dilemma is to re-use the prior choices of others, with an appropriate reference, to avoid further questions about the issue. In a Bayesian software system like R-INLA, where models are built by adding up various model components to construct a linear predictor, we face a real and practical challenge: default priors must be set for the parameters of a large number of model components and are supposed to work well in a large number of scenarios and situations. None of the (limited) guidelines in the literature seem able to approach such a task. In this work we introduce a new concept for constructing prior distributions that makes use of the natural nested structure inherent to many model components. This nested structure defines a model component as a flexible extension of a base model, which allows us to define proper priors that penalise the complexity induced relative to the natural base model. Based on this observation, we can compute the prior distribution after the input of a user-defined (weak) scale parameter for that model component. These priors are invariant to reparameterisations, have a natural connection to Jeffreys priors, are designed to support Occam's razor, and seem to have excellent robustness properties, all of which are highly desirable and allow us to use this approach to define default prior distributions. We will illustrate the approach on a series of examples using the R-INLA package for fast approximate Bayesian inference in the class of latent Gaussian models. The Student-t case will be discussed in detail as the simplest non-trivial example. We will then discuss other cases, such as classical unstructured random effect models, spline smoothing and disease mapping. This is joint work with Thiago G. Martins, Daniel P. Simpson, Andrea Riebler (NTNU) and Sigrunn H. Sørbye (Univ. of Tromsø).
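In schematic form (a condensed restatement, with notation chosen here): the complexity of a component f(·|ξ) is measured by its distance from the base model, and a memoryless exponential prior on that distance gives constant-rate penalisation,

d(\xi) = \sqrt{2\,\mathrm{KLD}\!\left(f(\cdot \mid \xi)\,\Vert\,f(\cdot \mid 0)\right)},
\qquad \pi(d) = \lambda e^{-\lambda d},

with \lambda calibrated from a weak user statement of the form P\{Q(\xi) > U\} = \alpha about an interpretable scale Q(\xi).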



20/01/2014

Providers profiling via Bayesian nonparametrics: a model based approach for assessing performances in public health

Francesca Ieva (Dipartimento di Matematica "Federigo Enriques", Università degli studi)

In this seminar, an application of Bayesian nonparametric modeling of the random effects in a hierarchical generalized linear mixed model will be discussed. The goal of the modeling strategy is twofold: first, to perform model-based clustering of the grouping factor (e.g., the hospital of admission of patients, who are the statistical units in the application of interest); second, to predict the outcome at the patient level, properly accounting for the grouping factor effect. This approach is very general and may be applied in different contexts, such as school performance evaluation, among others. In the case of interest, a Bayesian semiparametric mixed effects model is presented for the analysis of strongly unbalanced binary survival data coming from a clinical survey on STEMI (ST-segment Elevation Myocardial Infarction), where statistical units (i.e., patients) are grouped by hospital of admission. The idea is to exploit the flexibility and potential of such models to carry out model-based clustering of the random effects in order to profile hospitals according to their effect on patient outcomes. In line with the twofold aim mentioned above, we provide methods for clustering providers with a similar effect on patient outcomes, enabling the evaluation of providers' performances with respect to clinical gold standards, and for making predictions of outcomes at the patient level. For the first issue, the flexibility of Bayesian nonparametric models (Dirichlet process, Dependent Dirichlet process) suits the overdispersed nature of the data well. The in-built clustering they provide is exploited to identify groups among the random effect estimates. This is achieved by minimizing suitable loss functions on misclassification errors through linear integer programming, assigning different weights to the wrong association of hospitals belonging to different clusters as well as to the missed association of hospitals whose effects are similar within the random partition. Concerning the second issue, we propose a new method for classifying patients, starting from Bayesian posterior credibility interval estimates of survival probabilities. In terms of predictive power, it provides better results than other criteria usually adopted in the literature, which are based on pointwise estimates. The methods have been applied to a real dataset from the STEMI Archive, a clinical survey on patients affected by STEMI and admitted to hospitals in Regione Lombardia.



21/11/2013

Amenable mortality in the European Union: toward better indicators for the effectiveness of health systems (AMIEHS)

Rasmus Hoffmann (Erasmus Medical Center Rotterdam)

Objectives: There is renewed interest in health system indicators. The indicator "avoidable/amenable mortality" is based on the concept that deaths from certain causes should not occur in the presence of timely and effective healthcare. In the AMIEHS project, we introduce a new approach to the selection of indicators of amenable mortality by analyzing the association between innovations in medical care and mortality trends. Although the contribution of medical care to health has been studied extensively in clinical settings, much less is known about its contribution to population health. We examine how innovations in the management of four circulatory disorders, five cancers, HIV, peptic ulcer and renal failure have influenced trends in cause-specific mortality at the population level. We also created an interactive online mortality atlas presenting trends in mortality for 45 possible amenable causes of death in 30 European countries over the period 2001-2009. Methods: Based on predefined selection criteria and a broad review of the literature on the effectiveness of medical interventions, a first set of 14 potential indicators of amenable mortality (causes of death) was selected. The timing of the introduction of medical innovations was established through reviews and questionnaires sent to national experts from seven participating European countries. The preselected indicators were then validated by a Delphi procedure. We combined data on the timing of these innovations and cause-specific mortality trends (1970-2005) from seven European countries. We used joinpoint models based on linear spline regression to identify associations between the introduction of innovations and favourable changes in mortality. Results: For most conditions, the Delphi panel could not reach consensus on the role of current mortality levels as measures of the effectiveness of healthcare. For both ischaemic heart disease and cerebrovascular disease, the timing of medical innovations was associated with improved mortality, suggesting that innovation has impacted positively on mortality at the population level. For hypertension and heart failure, such associations could not be identified. For none of the five specific cancers could sufficient evidence of an association be found. The strongest association was found between the introduction of antiretroviral treatment and HIV mortality, and no association could be found for renal failure and peptic ulcer. Conclusions: AMIEHS offers a rigorous new approach to the concept of amenable mortality that includes empirical validation. Only validated indicators can be successfully used to assess the quality of healthcare systems in international comparisons. Although improvements in cause-specific mortality coincide with the introduction of some innovations, this is not invariably true. This is likely to reflect the incremental effects of many interventions, the time taken for them to be adopted fully, contemporaneous changes in the incidence of disease or risk factors, and the diffusion and improved quality of interventions. Improvements in healthcare probably lowered mortality from many of the conditions that we studied, but occurred in a much more diffuse way than we assumed in the study design.
Given the gaps in knowledge, between-country differences in levels of mortality from amenable conditions should not be used for routine surveillance of healthcare performance. The timing and pace of mortality decline from amenable conditions may provide better indicators of healthcare performance.



12/09/2013

Local regression for circular data

Agnese Panzera (DiSIA)

Local regression for circular data is a theme that has been developed only recently. We present methodological advances focusing on the kernel method when the predictor and/or the response are circular. A number of applications are possible, from nonparametric estimation of trend in circular time series to quantile estimation for circular distributions.



09/07/2013

An Integrative Bayesian Modeling Approach to Imaging Genetics

Francesco Stingo (University of Texas, MD Anderson Cancer Center)

In this paper we present a Bayesian hierarchical modeling approach for imaging genetics, where the interest lies in linking brain connectivity across multiple individuals to their genetic information. We have available data from a functional magnetic resonance imaging (fMRI) study on schizophrenia. Our goals are to identify brain regions of interest (ROIs) with discriminating activation patterns between schizophrenic patients and healthy controls, and to relate the ROIs' activations to available genetic information from single nucleotide polymorphisms (SNPs) on the subjects. For this task we develop a hierarchical mixture model with several innovative characteristics: it incorporates the selection of ROIs that discriminate the subjects into separate groups; it allows the mixture components to depend on selected covariates; and it includes prior models that capture structural dependencies among the ROIs. Applied to the schizophrenia data set, the model leads to the simultaneous selection of a set of discriminatory ROIs and the relevant SNPs, together with the reconstruction of the correlation structure of the selected regions. To the best of our knowledge, our work represents the first attempt at a rigorous modeling strategy for imaging genetics data that incorporates all such features. This is joint work with Michele Guindani (MD Anderson Cancer Center), Marina Vannucci (Rice University) and Vince D. Calhoun (University of New Mexico).



07/06/2013

Modeling Multivariate, Overdispersed Binomial Data with Additive and Multiplicative Random Effects

Emanuele Del Fava (I-BioStat, Hasselt University, Diepenbeek, Belgium & DONDENA Centre for Research on Social Dynamics)

When modeling multivariate binomial data, it often occurs that it is necessary to take into consideration both clustering and overdispersion, the former arising from the dependence between data, and the latter due to the additional variability in the data not prescribed by the distribution. If interest lies in accommodating both phenomena at the same time, we can use separate sets of random effects that capture the within-cluster association and the extra variability. In particular, the random effects for overdispersion can be included in the model either additively or multiplicatively. For this purpose, we propose a series of Bayesian hierarchical models that deal simultaneously with both phenomena. The proposed models are applied to bivariate repeated prevalence data for hepatitis C virus (HCV) and human immunodeficiency virus (HIV) infection in injecting drug users in Italy from 1998 to 2007.
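Schematically, with notation chosen here (a stylized version, not necessarily the talk's exact parameterisation), the two ways an overdispersion effect can sit alongside a cluster effect b_i in a binomial model are

Y_{ij} \mid \pi_{ij} \sim \mathrm{Bin}(n_{ij}, \pi_{ij}), \qquad
\text{additive: } \operatorname{logit}(\pi_{ij}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + \varepsilon_{ij}, \qquad
\text{multiplicative: } \pi_{ij} = \theta_{ij}\,\kappa_{ij}, \quad
\operatorname{logit}(\kappa_{ij}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i,

where b_i captures the within-cluster association and \varepsilon_{ij} (e.g. normal) or \theta_{ij} (e.g. beta) captures the extra variability.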



27/05/2013

Model-free prediction intervals for regression and autoregression

Dimitris Politis (University of California)

The bootstrap is an invaluable tool for prediction intervals without having to assume normality of the data. However, even when all other model assumptions are correctly specified, bootstrap prediction intervals for regression (and autoregression) are well known to be plagued by undercoverage; this is true even in the simplest case of linear regression. Furthermore, it can be the case that model assumptions are violated, in which case any model-based inference will be invalid. In this talk, the problem of statistical prediction is revisited with a view that goes beyond the typical parametric/nonparametric dilemmas, in order to reach a fully model-free environment for predictive inference, i.e., point predictors and predictive intervals. The 'Model-Free (MF) Prediction Principle' of Politis (2007) is based on the notion of transforming a given set-up into one that is easier to work with, namely i.i.d. or Gaussian. The two important applications are regression and autoregression, whether or not an additive parametric/nonparametric model is applicable.



09/05/2013

Explorations on Wellbeing from Chinese Background

Zhanjung Xing (Shandong University)

Since the mid-1980s, researchers from mainland China have explored quality of life (QOL) at the macro level. Entering the new century, and especially since 2006, QOL research related to public policy in China has entered a new stage, and well-being indices have gradually become a focus of attention for academia, the media and policy makers. This report will give a brief introduction to QOL research related to public policy in mainland China, and provide a detailed introduction to some research on wellbeing conducted by the Centre of QOL and Public Policy at Shandong University. Some problems in this research field will also be discussed.



08/04/2013

Dynamic Factorial Analysis

Isabella Corazziari (ISTAT)

Dynamic Factorial Analysis, first proposed in the 1970s by Coppi and Zannella, aims to analyse three-way data arrays in which one of the three ways is intrinsically ordered; for example, the array can be units x variables x times. First, the four models proposed in the article by Coppi and Zannella will be described, followed by some methodological developments (Corazziari 1999, PhD thesis), further improved in later work by Coppi, Blanco and Corazziari. Some considerations on the best strategies of analysis for improving the interpretative power of the method will be addressed (Coppi, Blanco, Corazziari). A brief demonstration of the software implemented in xlisp by Corazziari will be given, both in the basic version and in the improved-strategy one.



25/02/2013

A Hierarchical Bayesian Modeling Approach for Inferring and Identifying Relevant Copy Numbers

Alberto Cassese (Rice University)

Statistical models have been successfully developed in recent years for the analysis of single-platform high-throughput data, but only a few methods integrate data from different platforms. In this paper we present a Bayesian hierarchical modeling approach for genetical genomic data. We look, in particular, at array CGH data, measuring DNA copy number changes, and at DNA microarrays, measuring gene expression as mRNA abundance. Our interest lies in finding sets of CGH probes that possibly affect the expression of one or more genes, and in inferring their copy number states. Our proposed modeling strategy starts with the formulation of a hierarchical model, integrating gene expression levels with genetic data, that includes measurement errors and mixture priors for variable selection. We couple this model with a hidden Markov model on the genetic covariates. Our approach uses prior distributions that cleverly incorporate dependencies among the selected covariates. It also incorporates stochastic search variable selection techniques within an inferential scheme that makes it possible to select associations among genomic and genetic variables while simultaneously inferring the copy number states. We show the performance of our proposed model on simulated data, and we also analyze a publicly available data set.



28/01/2013

Gini's Mean Difference offers a response to Leamer's critique

Shlomo Yitzhaki (Dept of Economics, Hebrew University)

Gini's mean difference has decomposition properties that nest the decomposition of the variance as a special case. By using it, one may reveal the implicit assumptions imposed on the data when the variance is used. I argue that some of those implicit assumptions can be traced to the causes of Leamer's critique. By requiring the econometrician to report whether those assumptions are violated by the data, we may be able to offer a response to Leamer's critique. This would reduce the scope for supplying "empirical proofs", which in turn may increase trust in econometric research.
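The kinship of the two measures is easy to see numerically (a minimal sketch with simulated skewed data): Gini's mean difference is the mean absolute difference between two independent draws, while the variance is half the mean squared difference.

import numpy as np

rng = np.random.default_rng(6)
x = rng.lognormal(0.0, 1.0, 2000)            # a skewed sample

diff = x[:, None] - x[None, :]               # all pairwise differences
gmd = np.abs(diff).mean()                    # Gini's mean difference
half_msd = 0.5 * (diff ** 2).mean()          # half the mean squared difference
print(gmd, half_msd, x.var())                # the last two agree exactly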
