DiSIA - Dipartimento di Statistica, Informatica, Applicazioni 'Giuseppe Parenti'

Seminars of the Department of Statistics, the predecessor of DiSIA

Abstracts


13/12/2012

From the blackboard to the trading floor: how to construct and assess trading strategies using multivariate dynamic probabilistic models

Fabio Rigat (University of Warwick)

Algorithmic trading is now the main operational mode of large hedge funds, to the extent that the interaction among different semi-automatic trading strategies substantially affects the volatility of most equity markets. The exact form of the predictive models implemented by trading algorithms and the decision functions mapping predictions into trading actions are mostly unknown to the general public, for obvious strategic reasons. This talk illustrates the selection of multivariate dynamic models and score functions maximizing the cumulative returns of stock portfolios when the identity of the set of tradable stocks is fixed in advance. The main original contributions of this work consist of: 1. the formulation of different scoring functions based on one-step ahead point predictions of stock prices and on higher order moments of the one-step ahead predictive distribution; 2. the assessment of different trading strategies based on the null distribution of the cumulative portfolio returns and 3. the assessment of prediction-correction methods to improve the predictive accuracy of standard multivariate dynamic models. Having introduced the predictive models and score functions, a practical example will be illustrated in detail using weekly stock prices of seven major international pharmaceutical companies.
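As a toy illustration of how one-step-ahead predictions can be mapped into trading actions and how a null distribution for cumulative returns can be built, the following sketch may help; it is a minimal illustration written for this page, not the speaker's models or code, and the scoring rule and the permutation null are simplifying assumptions.

```python
# Minimal sketch (not the speaker's code): a score based on one-step-ahead
# predictive means and variances, a long-only weighting rule, and a
# permutation null distribution for the cumulative portfolio return.
import numpy as np

rng = np.random.default_rng(0)

def trade_weights(pred_mean, pred_var, prices):
    """Score each stock by its standardised predicted gain, then turn
    non-negative scores into portfolio weights (long-only for simplicity)."""
    score = (pred_mean - prices) / np.sqrt(pred_var)
    w = np.maximum(score, 0.0)
    return w / w.sum() if w.sum() > 0 else np.zeros_like(w)

def cumulative_return(weights, realised_returns):
    """Cumulative portfolio return given weights[t, j] and returns[t, j]."""
    return np.prod(1.0 + np.sum(weights * realised_returns, axis=1)) - 1.0

def null_distribution(weights, realised_returns, n_perm=1000):
    """Permutation null: shuffle the time ordering of returns, breaking any
    predictive link between the scores and the subsequent returns."""
    return np.array([cumulative_return(weights, rng.permutation(realised_returns))
                     for _ in range(n_perm)])
```

The observed cumulative return can then be compared with the quantiles of the permutation distribution, in the spirit of assessing a strategy against the null of no predictive content.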



26/11/2012

Missing Data Analysis: Basic Concepts and Some Theory
Issues in Missing Data Problems: Sensitivity to Assumptions

Donald Rubin (Harvard)



15/11/2012

Conditional independence models for contingency tables: singularities and context specific restrictions

Antonio Forcina (University of Perugia)

There are independence models, even very simple ones, for which, because of their underlying algebraic structure, the usual asymptotic approximations do not hold; in particular, the asymptotic distribution of the likelihood ratio is a mixture of chi-square variables. In the talk, after an elementary introduction to the subject and the discussion of a few examples, we outline an approach which could, intuitively, remove singularities by restricting certain conditional independence statements to hold only for a subset of the possible configurations of the conditioning variables. Without entering into the technical details required for a proof, we introduce certain conceptual tools which might be of interest in their own right and be applied in other contexts, in particular: the mixed parameterisation of a discrete distribution, which combines marginal probabilities and log-linear parameters into a smooth mapping; an algorithm for the reconstruction of a joint distribution when an interaction defined in a previous marginal has to be constrained again; and some basic results from fixed-point theory in numerical analysis which may be used to establish whether, under suitable conditions, an algorithm converges to a unique solution irrespective of the starting point.



25/10/2012

Propensity Score Weighting with Multilevel Data

Fan Li (Duke University)

Propensity score methods are being increasingly used as a less parametric alternative to traditional regression to balance observed differences across groups in both descriptive and causal comparisons. Data collected in many disciplines often have an analytically relevant multilevel (clustered) structure. The propensity score, however, has been developed and used primarily with unstructured data. We present and compare several propensity-score-weighted estimators for clustered data, including marginal, cluster-weighted and doubly robust estimators. Using both analytical derivations and Monte Carlo simulations, we illustrate the bias arising when the usual assumptions of propensity score analysis do not hold for multilevel data. We show that exploiting the multilevel structure, either parametrically or nonparametrically, in at least one stage of the propensity score analysis can greatly reduce these biases. These methods are applied to a study of racial disparities in breast cancer screening among beneficiaries in Medicare health plans. (This is joint work with Alan Zaslavsky and Mary Beth Landrum.)
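For readers unfamiliar with the weighting estimators being compared, a generic inverse-probability-weighting (IPW) estimator of the average treatment effect can be written as below; this is a background sketch, not necessarily the exact estimators of the paper, and the idea of exploiting the multilevel structure corresponds, for instance, to letting the propensity score model include cluster effects.

\[
\hat\tau_{\mathrm{IPW}}
= \frac{\sum_i T_i Y_i / \hat e(X_i)}{\sum_i T_i / \hat e(X_i)}
- \frac{\sum_i (1-T_i) Y_i / \{1-\hat e(X_i)\}}{\sum_i (1-T_i) / \{1-\hat e(X_i)\}},
\qquad
\hat e(X_i) = \widehat{\Pr}(T_i = 1 \mid X_i, \text{cluster}_i),
\]

where the cluster-specific component of \hat e can be modelled parametrically (random or fixed cluster effects) or nonparametrically.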



11/10/2012

Structural Hierarchical Approach to Longitudinal Modeling of Health Environmental Effects: Methodology and Applications

Michael Friger (Ben-Gurion University of the Negev, Israel)

We discuss a possible approach to modeling environmental health effects based on the time-series technique. In doing so, we try to use a systems analysis of the problem under consideration, in other words to exploit a-priori knowledge of the subject matter. We take the meaning of environmental health in a broad sense: e.g. effects of seasonality, meteorological factors, pollution, the socio-cultural environment and the like. In the presentation, we will show several applications obtained with this approach. We will also discuss the methodology of modeling and estimating the health effects of climate change.



24/09/2012

Turing and the unknown

Alberto Gandolfi (Università di Firenze)

While working on breaking the Enigma code, Turing devised a method for estimating the probability of the species not observed in a random sample. The apparently insurmountable difficulty of describing objects that have never been observed is overcome with a brilliantly simple solution (once one has seen it!). The mathematical details were worked out and published only later by Good, at the time a collaborator of Turing's, and they still underlie methods used in many fields, including genetics, linguistics, statistics, computer science and the social sciences. List of the seminars on Turing: http://turing.dsi.unifi.it



20/09/2012

A Stata package for the application of nonparametric estimators of dose-response functions

Michela Bia (CEPS/INSTEAD)

In this paper we propose three semiparametric estimators of the dose-response function based on kernel and spline techniques. In many observational studies treatment may not be binary or categorical. In such cases, one may be interested in estimating the dose-response function in a setting with a continuous treatment. This approach strongly relies on the unconfoundedness assumption, which requires that the potential outcomes be independent of the treatment conditional on a set of covariates. In this context the generalized propensity score can be used to estimate dose-response functions (DRF) and marginal treatment effect functions. We present a set of Stata programs which estimate the propensity score when the treatment is a continuous variable, test the balancing property of the generalized propensity score, and semiparametrically estimate the dose-response function. We illustrate these programs using a data set collected by Imbens, Rubin and Sacerdote (2001).
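As background, the generalized propensity score (GPS) approach of Hirano and Imbens, on which programs of this kind are usually built, proceeds in two steps; the display below is a schematic summary, not the package's exact syntax or estimators.

\[
\hat r(t, x) = \hat f_{T \mid X}(t \mid x),
\qquad
\hat\mu(t) = \frac{1}{N} \sum_{i=1}^{N} \widehat{E}\big[\,Y \mid T = t,\ R = \hat r(t, X_i)\,\big],
\]

where the inner regression of Y on (T, R) can be fitted parametrically or, as in the kernel and spline estimators mentioned above, semiparametrically.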



08/05/2012

Constraints on Marginalised DAGs

Robin Evans (University of Cambridge)

Models defined by the global Markov property for directed acyclic graphs (DAGs) possess many nice properties, including a simple factorisation, graphical criteria for determining independences (d-separation/moralisation), and computational tractability. As is well known, however, DAG models are not closed under the operation of marginalisation, so marginalised DAGs (mDAGs) describe a much larger and less well understood class of models. mDAG models can be represented by hyper-graphs with directed and bidirected edges, under a modest extension of Richardson's ADMGs. We describe the natural Markov property for mDAGs, which imposes observable conditional independences, 'dormant' independences (Verma constraints) and inequality constraints, and provide some results towards its characterisation. In particular we describe the Markov equivalence classes of mDAG models over three observed variables. We also give a constructive proof that when two observed variables are not joined by any edge (directed or bidirected), then this always induces some constraint on the observed joint distribution.



21/03/2012

Automated interviews on clinical case reports to elicit directed acyclic graphs

Davide Luciani (Istituto di Ricerche Farmacologiche 'Mario Negri')

Setting up clinical reports within hospital information systems makes it possible to record a variety of clinical presentations. Directed acyclic graphs (DAGs) offer a useful way of representing causal relations in clinical problem domains and are at the core of many probabilistic models described in the medical literature, like Bayesian networks. However, medical practitioners are not usually trained to elicit DAG features. Part of the difficulty lies in the application of the concept of direct causality before selecting all the causal variables of interest for a specific patient. We designed an automated interview to tutor medical doctors in the development of DAGs to represent their understanding of clinical reports. Medical notions were analyzed to find patterns in medical reasoning that can be followed by algorithms supporting the elicitation of causal DAGs. Clinical relevance was defined to help formulate only relevant questions by driving an expert's attention towards variables causally related to nodes already inserted in the graph. Key procedural features of the proposed interview are described by four algorithms. The automated interview comprises questions on medical notions, phrased in medical terms. The first elicitation session produces questions concerning the patient's chief complaints and the outcomes related to diseases serving as diagnostic hypotheses, their observable manifestations and risk factors. The second session focuses on questions that refine the initial causal paths by considering syndromes, dysfunctions, pathogenic anomalies, biases and effect modifiers. Revision and testing of the subjectively elicited DAG is performed by matching the collected answers with the evidence included in accepted sources of biomedical knowledge. (Joint with F. Stefanini)



15/03/2012

Multivariate Regression Chain Graph Models: an application to the study of determinants of fertility and housing conditions

Ilaria Vannini (Università di Firenze)

Over the last two decades variables related to housing conditions have received increasing attention as determinants of fertility behaviour. Indeed, a complex structure of association between fertility and housing has been found. This work contributes to moving beyond existing research findings by using the theory of graphical models, in particular multivariate regression chain graph models. These are marginal models, i.e. focused on the distribution of clustered joint responses whose mutual dependence is treated as a nuisance, after controlling for a set of covariates. An ad-hoc parametrization, based on multivariate logistic regression, and a fitting procedure specified in terms of individual data, which allow us to deal with both continuous and categorical explanatory variables, have been proposed. This approach allows the response of each member of a couple to be modelled separately, leading to important findings about the relationship between fertility and housing from women's and men's personal points of view. Moreover, it has been shown that multivariate regression chain graph models are feasible even when data with a complex structure and many variables are involved. We present one of the first examples of a socio-economic application of this kind of model.



15/03/2012

Statistical methods and applications for seismology

Marta Gallucci (Università di Firenze)

Among the fields in which statistical applications have so far had limited diffusion but great potential, we certainly find geophysics and, in particular, seismology. In the study of seismic events there are several sources of uncertainty: first of all, the development of the theory of plate tectonics as a unifying framework for several geological phenomena is quite recent, and some aspects still remain without a commonly accepted explanation; moreover, the localization of earthquakes is obtained by means of an estimation procedure which interpolates data available at nearby recording stations; finally, the study of the distribution of future events raises the great problem of earthquake prediction. In this talk, we present two different studies: the first one is a real-data application of a statistical test for structural change to models for seismic activity; the second one is a simulation study in which the relationships between the ETAS model parameters and the inter-event times are explored by means of an emulator.



22/02/2012

A General Framework for modelling ordinal data: Statistical foundations, inferential issues and extensions

D. Piccolo and M. Iannario (University of Naples Federico II)

This contribution discusses some statistical issues of a mixture distribution introduced for the analysis of ordinal data generated by ratings and evaluations (CUB models). The rationale for this framework stems from the interpretation of the respondent's final choice as the output of a complex psychological process whose main components are the personal feeling towards the item and an inherent uncertainty surrounding the choice. Thus, on the basis of experimental data and statistical motivations, the response distribution is modelled as the convex combination of a shifted Binomial and a discrete Uniform random variable, whose parameters may be consistently estimated and validated by maximum likelihood inference. In addition, subjects' covariates are introduced in order to assess how the characteristics of the respondents may affect the rating. The seminar focuses on several extensions which concern objects' covariates, multiple items, hierarchical structure, shelter effects and overdispersion. The approach has been successfully applied in several fields, such as sensometrics and marketing. Finally, some empirical evidence is reported to support the usefulness of this framework.
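In its basic form (up to the parameterisation of the feeling parameter), the CUB distribution for a rating R on m ordered categories can be written as

\[
\Pr(R = r) = \pi \binom{m-1}{r-1} (1-\xi)^{r-1} \xi^{\,m-r} + (1-\pi)\,\frac{1}{m},
\qquad r = 1, \dots, m,
\]

where the shifted Binomial component captures feeling (through \xi) and the discrete Uniform component captures uncertainty (through 1-\pi); covariates are typically introduced by linking \pi and \xi to respondent characteristics via logit links.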



21/12/2011

Economic and societal costs of being NEETs

Massimiliano Mascherini (Eurofound)

The traditional indicators of labour market participation are frequently criticised for their limited relevance to young people, as many of them are students and hence classified as being out of the labour force. Also for this reason, EU policy makers have recently started to focus their attention on the NEETs. The acronym NEET stands for those who are Not in Employment, Education or Training, typically aged between 15 and 24, who regardless of their educational level are disengaged from both work and education and are at a higher risk of labour market and social exclusion. Being NEET is a waste of young people's potential, but it also has adverse consequences for society and the economy. In fact, spending periods of time as a NEET may lead to a wide range of social disadvantages, such as disaffection, insecure and underpaid employment, youth offending, and mental and physical health problems. Each of these outcomes has a cost attached to it and, as such, being NEET is not just a problem for the individual but also for our societies and our economies as a whole. The aim of this seminar is to provide an estimate of the economic and societal costs of being NEET and to highlight the importance of strengthening governments' and social partners' efforts to re-engage NEETs in the labour market.



28/11/2011

Statistics and causality: a case of parallel convergences

Bruno Chiandotto (Università di Firenze)

The seminar outlines the development of the two main approaches to statistical inference, the classical and the Bayesian one. For both approaches, their logical coherence and their success in applications will be highlighted, albeit very briefly, focusing on some characteristic aspects that allow them to be interpreted as special cases of statistical decision theory. A similar path will be followed in illustrating the concept of causality and causal inference, devoting particular attention to so-called causal decision theory. Finally, it will be argued that the decision-theoretic approach contains elements capable, on the one hand, of justifying some recent developments in causal analysis and, on the other, of identifying the point of convergence between statistical inference and causal inference. This point is identified in a "subjective-Bayesian causal theory of statistical decisions".



22/09/2011

Markov equivalences for loopless mixed graphs

Kayvan Sadeghi (Oxford University)

In this talk we describe a class of graphs with three types of edges, called loopless mixed graphs (LMGs). The class of LMGs contains almost all known classes of graphs used in the literature of graphical Markov models as its subclasses. We discuss motivations behind using LMGs, and define a unifying interpretation of independence structure for LMGs. We also propose four problems regarding Markov equivalences for LMGs and tackle some of them.



19/07/2011

Introducing Bayes Factors

Leonhard Held (University of Zurich)

Statistical inference is traditionally taught exclusively from a frequentist perspective. If Bayesian approaches are discussed, then only Bayesian parameter estimation is described, perhaps showing the formal equivalence of a Bayesian reference analysis and the frequentist approach. However, the Bayesian approach to hypothesis testing and model selection is intrinsically different from a classical approach and offers key insights into the nature of statistical evidence. In this talk I will give an elementary introduction to Bayesian model selection with Bayes factors. I will then summarize important results on the relationship between P-values and Bayes factors. A universal finding is that the evidence against a simple null hypothesis is far weaker than the P-value might suggest. If time permits, I will also describe more recent work on Bayesian model selection in generalized additive models using hyper-g priors.
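For reference, the Bayes factor compares the marginal likelihoods of two hypotheses and converts prior odds into posterior odds:

\[
\mathrm{BF}_{01} = \frac{p(y \mid H_0)}{p(y \mid H_1)},
\qquad
\frac{\Pr(H_0 \mid y)}{\Pr(H_1 \mid y)} = \mathrm{BF}_{01} \times \frac{\Pr(H_0)}{\Pr(H_1)} .
\]

One well-known calibration result of the kind alluded to above is the bound \mathrm{BF}_{01} \ge -e\, p \log p for P-values p < 1/e, which already shows that a small P-value overstates the evidence against H_0.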



30/06/2011

The Quality of Surveys

Jelke Bethlehem (University of Amsterdam)

Survey research is a type of research in which data are collected by asking questions of a sample of persons from a population. On the basis of the collected data, conclusions are drawn about the population as a whole. The question is whether this is always a scientifically sound research method. The presentation starts with a historical overview of how surveys became a reliable research instrument. Nowadays there are two important issues that may affect the quality of survey results. One of these is the phenomenon of nonresponse. It often causes estimates to be biased. Many countries suffer from increasing nonresponse rates, and this makes the problem more serious. The nonresponse problem is discussed, along with some techniques that may help to reduce it. However, success is not guaranteed. The fast development of the Internet has led to a new form of survey: the web survey. Almost anybody can run a web survey, and there are many examples of badly designed web surveys. They suffer from methodological problems such as under-coverage, self-selection and measurement problems. The effects of under-coverage and self-selection on the quality of survey outcomes are discussed in more detail.



28/06/2011

Point pattern modeling for degraded presence-only data over large regions

Alan Gelfand (Duke University)

Explaining species distribution using local environmental features is a long-standing ecological problem. Often, the available data are collected as a set of presence locations only, thus precluding the possibility of a presence-absence analysis. We propose that it is natural to view presence-only data for a region as a point pattern over that region and to use local environmental features to explain the intensity driving this point pattern. This suggests hierarchical modeling, treating the presence data as a realization of a spatial point process whose intensity is governed by environmental covariates. Spatial dependence in the intensity surface is modeled with random effects involving a zero-mean Gaussian process. Highly variable and typically sparse sampling effort, as well as land transformation, degrades the point pattern, so we augment the model to capture these effects. The Cape Floristic Region (CFR) in South Africa provides a rich set of such species data. The potential, i.e. nondegraded, presence surfaces over the entire area are of interest from a conservation and policy perspective. Our model assumes grid-cell homogeneity of the intensity process, where the region is divided into 37,000 grid cells. To work with a Gaussian process over a very large number of cells we use a predictive process approximation. Bias correction by adding a heteroscedastic error component is implemented. The model was run for a number of different species. Model selection was investigated with regard to the choice of environmental covariates. A comparison is also made with the now popular Maxent approach, although the latter is much more limited with regard to inference. In fact, inference such as the investigation of species richness follows immediately from our modeling framework.
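In schematic terms, the hierarchical model described above is of log-Gaussian Cox process type; the display below is a stylised summary, not the authors' exact specification.

\[
\log \lambda(s) = x(s)^{\top}\beta + w(s), \qquad w(\cdot) \sim \mathrm{GP}\big(0, C_\theta\big),
\]

with grid-cell homogeneity meaning that each cell A_j carries a constant intensity \lambda_j = |A_j|\exp(x_j^{\top}\beta + w_j), and with sampling effort and land transformation entering as a multiplicative thinning of \lambda; the predictive process replaces w by its projection onto a much smaller set of knots to keep computation over the 37,000 cells feasible.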



21/06/2011

Recommended tests for association in 2x2 tables

Stian Lydersen (Norwegian University of Science and Technology)

Testing for association in 2x2 contingency tables is one of the most common tasks in applied statistics. A well-established approach is to use Pearson's chi-squared test in large samples and Fisher's exact test in small samples. Both tests have drawbacks which are not widely known. The true significance level of Pearson's chi-squared test is often larger than the nominal significance level. The true significance level of Fisher's exact test is unnecessarily low, resulting in poor power. Better tests are available. Unconditional exact tests produce accurate P-values, have high power, and are available in commercial and free software packages. Another approach, called the mid-p, gives about the same results as an unconditional test. The traditional Fisher's exact test should practically never be used. (Joint work with Morten Fagerland and Petter Laake.)
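A small illustration of the tests being compared, on a hypothetical 2x2 table; this is illustrative code written for this page (the mid-p shown is the one-sided version), not the authors' recommended software.

```python
# Compare Pearson chi-squared, Fisher's exact test and a one-sided mid-p
# value on a hypothetical 2x2 table.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact, hypergeom

table = np.array([[7, 3],
                  [2, 8]])               # hypothetical counts

chi2, p_pearson, _, _ = chi2_contingency(table, correction=False)
_, p_fisher = fisher_exact(table, alternative="greater")

# Conditioning on the margins, the count in cell (0, 0) is hypergeometric;
# the one-sided mid-p halves the probability of the observed table.
x = table[0, 0]
M = table.sum()                           # grand total
n = table[0].sum()                        # first row total
N = table[:, 0].sum()                     # first column total
p_midp = hypergeom.sf(x, M, n, N) + 0.5 * hypergeom.pmf(x, M, n, N)

print(p_pearson, p_fisher, p_midp)
```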



16/06/2011

Bayesian wavelet-based curve classification via discriminant analysis with Markov random tree priors

Francesco Stingo (Rice University)

Discriminant analysis is an effective tool for the classification of experimental units into groups. When the number of variables is much larger than the number of observations it is necessary to include a dimension reduction procedure in the inferential process. Here we present a typical example from chemometrics that deals with the classification of different types of food into species via near-infrared spectroscopy. We take a nonparametric approach by modeling the functional predictors via wavelet transforms and then apply discriminant analysis in the wavelet domain. We consider a Bayesian conjugate normal discriminant model, either linear or quadratic, that avoids independence assumptions among the wavelet coefficients. We introduce latent binary indicators for the selection of the discriminatory wavelet coefficients and propose prior formulations that use Markov random tree (MRT) priors to map scale-location connections among wavelet coefficients. We conduct posterior inference via MCMC methods, show the performance on our case study on food authenticity, and compare the results to several other procedures.
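A much simpler, non-Bayesian analogue of this pipeline (wavelet transform of each curve, then discriminant analysis on the coefficients) can be sketched as follows; the data are simulated and the classifier is ordinary LDA rather than the Bayesian model with MRT priors described above.

```python
# Simplified analogue of the wavelet + discriminant analysis pipeline,
# on simulated "spectra" (illustrative only).
import numpy as np
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
n, length = 60, 256
labels = rng.integers(0, 2, n)                # two hypothetical food types
t = np.linspace(0, 1, length)
curves = np.array([np.sin(2 * np.pi * (3 + lab) * t) + rng.normal(0, 0.3, length)
                   for lab in labels])

def wavelet_features(x):
    """Concatenate all wavelet coefficients of one curve into a feature vector."""
    return np.concatenate(pywt.wavedec(x, "db4", level=4))

X = np.array([wavelet_features(x) for x in curves])
clf = LinearDiscriminantAnalysis().fit(X, labels)
print(clf.score(X, labels))                   # in-sample accuracy
```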



19/05/2011

Inequality, Health, and Mortality

Prasada Rao (School of Economics, University of Queensland, Australia)

This paper proposes a new method to measure health inequalities that are caused by conditions amenable to policy intervention. The method is built on a technique that can separate avoidable and unavoidable mortality risks, using world mortality data compiled by the World Health Organization for the year 2000. The new method is applied to data from 191 countries. It is found that controlling for unavoidable mortality risks leads to a lower estimate of health inequality than otherwise, especially for developed countries. Furthermore, although countries with a higher life expectancy at birth tend to have lower health inequality, there are significant variations in health inequalities across countries with the same life expectancy. The results therefore support the WHO's plea for using health inequality as a distinct parameter from the average level of health in assessing the performance of health systems.



19/05/2011

Measuring the Size and Structure of the World Economy: The International Comparison Program

Prasada Rao (School of Economics, University of Queensland, Australia)

The International Comparison Program (ICP) has become the world's largest international statistical activity, including 100 countries in 2005 plus the 46 additional countries in the Eurostat-OECD program. The ICP is a global statistical initiative that supports inter-country comparisons of Gross Domestic Product and its components using Purchasing Power Parities as a currency converter. The foundation of the ICP is the comparison of national prices of a well-defined basket of goods and services under the conceptual framework of the System of National Accounts. While the ICP shares a common language and conceptual framework with national statistical systems for measuring the Consumer Price Index and the national accounts, it faces unique challenges in providing a statistical methodology that can be carried out in practice by countries differing in size, culture, the diversity of goods and services available to their population, and statistical capabilities.



12/05/2011

The SAM as a tool for policy analysis: an application to the case of Tuscany

Stefano Rosignoli (Irpet)

A Social Accounting Matrix (SAM) is a data matrix representing the flows occurring among the different agents (production branches and institutional sectors) of an economic system over a period of time (usually a year). SAMs originated within traditional economic theory as an extension of input-output models; they have long been used for the analysis of developing economies and have recently regained interest for the study of developed economies as well, thanks to the greater availability, reliability and standardisation of national and regional accounts data. SAMs provide the information base for a wide range of multisectoral models, often developed within alternative theoretical frameworks (linear models, general equilibrium models, micro-macro simulation models): the flexibility of their structure allows the calibration of models for analysing specific parts of the economy while remaining within a complete and coherent macroeconomic framework, for example to study the macroeconomic impact of particular sectoral policies, or to analyse the geographical differentiation of the impacts (multi-regional models, rural-urban disaggregation of the economy). Despite the theoretical and practical difficulties of building SAMs at the regional level, this territorial scale is particularly well suited to SAM-based analysis and simulation, all the more so if we consider the growing importance of economic policies in a federalist setting. The seminar illustrates the structure of the SAM of the Tuscan economy, the guidelines followed by IRPET in building it, and its potential in terms of applications. (Joint work with Benedetto Rocchi.)
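To fix ideas, multiplier analysis on a SAM works like the following toy example; the numbers are invented and the model is a bare-bones accounting-multiplier calculation, not IRPET's Tuscan SAM.

```python
# Toy SAM multiplier analysis: with average expenditure propensities S among
# the endogenous accounts, an exogenous injection d propagates through the
# accounting multiplier matrix M = (I - S)^(-1).
import numpy as np

S = np.array([[0.10, 0.30, 0.20],      # hypothetical propensities among
              [0.25, 0.05, 0.30],      # three endogenous accounts
              [0.30, 0.20, 0.10]])
d = np.array([100.0, 0.0, 0.0])        # exogenous injection into account 1

M = np.linalg.inv(np.eye(3) - S)       # accounting multiplier matrix
x = M @ d                              # resulting endogenous incomes
print(M)
print(x)
```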



05/05/2011

An MCMC data augmentation algorithm for contaminated case-control analyses

Fabio Divino (Università del Molise)

In many applied fields there is great interest in parametric regression models for non-random sampling designs. A very common situation concerns so-called presence-only data, in which one wants to study the relationship between a set of informative covariates (X) and a binary response variable (Y) that is known only when Y=1. A sampling scheme often adopted in this setting considers two distinct groups of observations. The first group is a random sample of presences (Y=1), while the second is a random sample drawn from the whole reference population P, for which only the explanatory variables are observed and not the response Y. This scheme, widely used in retrospective form, is known in the literature as the contaminated case-control design (Hsieh et al., 1985), since the sample observations corresponding to the controls (Y=0) are contaminated by a censoring mechanism. In ecology, for instance, it is very often difficult to obtain a sample of absences for animal or plant populations, so inference for regression models must rely on only partial information about the response variable. Several solutions to this problem have been proposed in the literature. Recently, Ward et al. (2009) proposed an EM-type algorithm for presence-only data based on the likelihood function. In its most efficient formulation, however, this approach requires prior knowledge of the marginal prevalence of Y, an assumption that is very hard to accept in practice. In this work we present an MCMC algorithm that can be used, especially in a Bayesian framework, to estimate a linear logistic model. The main contribution is the introduction of a stochastic approximation of the correction factor of the contaminated case-control model. Through a data-augmentation step, it then becomes possible to estimate the regression parameters jointly with the marginal prevalence of Y. The results obtained in simulations of several experimental situations are very encouraging, especially as regards the predictive accuracy for the prevalence parameter. The precision and efficiency of the regression estimates are also good but, as is to be expected, remain partly conditional on how informative the covariates X are. This is joint work with Natalia Golini (Università di Roma), Giovanna Jona Lasinio (Università di Roma) and Antti Penttinen (University of Jyvaskyla).
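In schematic terms, and only as a generic sketch of the data-augmentation idea rather than the authors' exact algorithm, the contaminated controls are a random draw from the population, so at each MCMC iteration their unobserved labels can be imputed from the current logistic model,

\[
z_i \mid x_i, \beta \sim \mathrm{Bernoulli}\!\left(\frac{\exp(x_i^{\top}\beta)}{1 + \exp(x_i^{\top}\beta)}\right),
\]

and the marginal prevalence of Y can then be updated from the completed data, while the presence-only sample contributes through a case-control-corrected likelihood term.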



07/04/2011

The state of the art in indicators construction

Filomena Maggino (Università di Firenze)

Measuring social phenomena requires a multidimensional and integrated approach aimed at describing them through a complex, multifaceted and compound methodology. The aim of the seminar is to unravel some important methodological aspects and issues that should be considered in developing indicators aimed at measuring social phenomena from a quantitative perspective. The purpose is to focus on, examine closely and investigate (i) the conceptual issues in defining and developing indicators, and (ii) the methodological and operational issues in managing the complexity of the resulting observations, integrating different aspects of reality.



27/01/2011

Weighting the Elementary Price Relatives in Time Consumer Price Index (CPI) Construction

Beatriz Larraz (University of Castilla-La Mancha, Spain)

The importance of inflation/deflation in current economic systems calls for it to be measured as accurately as possible. The Consumer Price Index (CPI) has been adopted as the tool for measuring inflation/deflation. This is a Laspeyres-type index which suffers, inter alia, from the well-known drawbacks of having a fixed-base weighting system, and therefore of failing to capture the substitution effect and quality changes. Moreover, in actual CPI estimation, the elementary price relatives are not weighted at the first aggregation level, computed both as an arithmetic and as a geometric mean, because the lack of suitable information makes it impossible to obtain a reliable weighting structure. This implies the assumption that quotations of the same item in different outlets have the same importance, whereas this is not the case, since spatial location undoubtedly has an influence on prices. As a consequence, a bias is introduced from the very first step of the inflation/deflation estimation procedure. Since prices are collected at geographical locations, i.e. they are geo-referenced data, it is possible to resort to the outlets' coordinates to set up a weighting system, provided these coordinates are surveyed along with the prices. This paper suggests a new approach to the construction of the above arithmetic and geometric means based on kriging methodology. In particular, it proposes to weight the elementary price relatives by taking into account the spatial correlation that these prices exhibit. It is shown that the resulting weighted geometric and arithmetic means of the elementary price relatives are better estimators than the simple geometric and arithmetic means. (Joint work with Guido Ferrari, University of Florence, Italy & Renmin University of China, PR China)
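In symbols, the first aggregation step combines the elementary price relatives p_i^t / p_i^0 of the n quotations of an item as

\[
\bar r_G = \prod_{i=1}^{n} \left(\frac{p_i^{t}}{p_i^{0}}\right)^{w_i},
\qquad
\bar r_A = \sum_{i=1}^{n} w_i \,\frac{p_i^{t}}{p_i^{0}},
\qquad \sum_{i=1}^{n} w_i = 1;
\]

current practice corresponds to w_i = 1/n, whereas the proposal sketched above derives the w_i from the spatial correlation of the geo-referenced quotations (kriging-type weights).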



16/12/2010

The scientific collaboration network(s) of Italian academic statisticians

Susanna Zaccarin (Università di Trieste)

Scientific collaboration is a complex phenomenon characterized by an intense form of interaction among scientists that improves the diffusion of knowledge in the community. In this contribution the network of scientific collaboration among Italian statisticians will be analyzed along the lines discussed for other disciplines. Relationships among Italian statisticians will be derived from information on their research products (scientific publications). To this aim, several data sources can be considered (from international bibliographic databases such as the ISI Web of Science or Scopus to national archives). Each data source has benefits and limitations related both to the coverage of the population of statisticians and to the inclusion of different kinds of publications (papers in international journals, books, working papers). The potential of integrated information to study the patterns of interaction in a research community will also be exploited.
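As an illustration of how such a co-authorship network can be assembled from publication records, the following sketch uses invented data; the actual analysis relies on the bibliographic sources mentioned above.

```python
# Build a weighted co-authorship graph from (hypothetical) publication records.
import itertools
import networkx as nx

publications = [
    {"authors": ["Rossi", "Bianchi", "Verdi"]},
    {"authors": ["Rossi", "Neri"]},
    {"authors": ["Bianchi", "Neri"]},
]

G = nx.Graph()
for pub in publications:
    for a, b in itertools.combinations(sorted(set(pub["authors"])), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1        # count joint papers
        else:
            G.add_edge(a, b, weight=1)

print(nx.number_connected_components(G))
print(sorted(G.degree(weight="weight")))  # weighted degree per author
```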



18/11/2010

Merger Simulation in a Two-Sided Market: The Case of the Dutch Daily Newspapers

Lapo Filistrucchi (Università di Firenze and TILEC, Tilburg University)

We develop a structural econometric framework that allows us to simulate the effects of mergers among two-sided platforms selling differentiated products. We apply the proposed methodology to the Dutch newspaper industry. Our structural model encompasses demands for differentiated products on both sides of the market and profit maximization by competing oligopolistic publishers who choose subscription and advertising prices, while taking the interactions between the two sides of the market into account. We measure the sign and size of the indirect network effects between the two sides of the market and simulate the effects of a hypothetical merger on prices and welfare. (Joint paper with Tobias J. Klein and Thomas Michielsen.)



04/11/2010

Scale parameters integration for marginal likelihood estimation in conditionally linear state space models

Gabriele Fiorentini (Università di Firenze)

We show that the calculation of the marginal likelihood in the popular class of conditionally linear state space models can be improved by preliminary integration of the scale parameters. We explain how to integrate scale parameters out of the likelihood function by Kalman filtering and Gaussian quadrature. We find that this preliminary integration improves the accuracy of four marginal likelihood estimators, namely the Laplace method, Chib's estimator, reciprocal importance sampling, and bridge sampling. For some simple but empirically relevant model specifications, such as the local level and local linear models, the marginal likelihood can be obtained directly without any posterior sampling. Some examples illustrate the practical implementation of our method and its simplicity. Finally, two applications illustrate the gain in accuracy achieved. (Joint work with Christophe Planas and Alessandro Rossi)
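As a concrete instance of the class referred to above, the local level model is

\[
y_t = \mu_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2_\varepsilon),
\qquad
\mu_t = \mu_{t-1} + \eta_t, \quad \eta_t \sim N(0, \sigma^2_\eta);
\]

conditionally on the signal-to-noise ratio q = \sigma^2_\eta / \sigma^2_\varepsilon the model is linear and Gaussian, so a common scale can be integrated out analytically along the Kalman filter recursions and the remaining low-dimensional integral handled by Gaussian quadrature. This is a schematic illustration of the idea, not necessarily the paper's exact parameterisation.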



25/10/2010

International Migration and Demography: a Two-Way Interaction

Philippe Fargues (European University Institute)

Are there links between the demographic transition and international migration, i.e. between one of the most massive changes to affect humanity in modern times and one of the most significant dimensions of connectivity between peoples? While many empirical studies have highlighted the reciprocal implications between demographic growth and migration, theory is rather silent: international migration theory does not put much emphasis on demography and demographic theory simply ignores international migration. While the model of demographic transition postulates a functional link between birth and death rates, there is no equivalent, cohesive model that would posit a relationship between migration, fertility and mortality. The presentation explores whether demographic change and migration can be intrinsically linked. It deals with only one facet of the demographic transition, which is the shift from high to low birth rates and its sociological correlate, the gradual substitution of a dominant pattern of large families with one of small ones. The other facet, which is the decline of mortality, the increase in longevity and the subsequent changes in the generational composition of the family, will not be tackled here, even though one may assume that this facet is also linked with migration. Part I, based on published works by Fargues (2006), focuses on the impact of international migration on the demographic transition and, more precisely, on birth control and the transition from high to low fertility rates amongst migrants in host countries and non-migrants in source countries. It argues that, because migrants remit ideas to their home countries and because most recent migration has been from high to low birth-rate countries, international migration has contributed to spreading values and practices that produce low birth rates in origin countries. International migration has, therefore, led to a smaller world population than the one that would have been observed in a zero migration scenario. Part II is entirely new and tackles the symmetrical influence of demographic change on international migration. It shows that declining birth rates in origin countries generate a new profile of the migrant. While migrants of earlier times had started to build a family before migrating, new migrants typically leave no wives or children in the home country, as a result of relatively unchanged age patterns of migration while marriage takes place later in the life cycle and fewer children are procreated. The conclusion suggests that this fundamental change may produce a critical shift in the economy of migration. Until recently migrants from the developing world were motivated by an altruistic drive to feed and educate their families at home. Remittances were the main, if not the only, reason for emigration. Today, young migrants’ goal is more likely to be self-accomplishment. Unlike their predecessors, the primary objective of typical migrants is no longer to improve the family’s standing at home for the mere reason that there is no longer such a family, but to increase opportunities for themselves. Remittances shift from an altruistic to a selfish use and migrants have an increasing propensity to accumulate not only financial capital, but also individual human capital through education and experience.



28/09/2010

Theoretical Foundations for the Analysis of Fertility: Gender Equity

Peter McDonald (Australian Demographic and Social Research Institute and President of IUSSP)

In 2000, I published the following: "Very low fertility as observed in many advanced countries today is the result of incoherence in the levels of gender equity inherent in social and economic institutions. Institutions which deal with women as individuals are more advanced in terms of gender equity than institutions which deal with women as mothers or members of families. There has been considerable advance in gender equity in the institutions of education and market employment. On the other hand, the male breadwinner model often remains paramount in the family itself, in services provision, in tax-transfer systems and in industrial relations. This leaves women with stark choices between children and employment, which, in turn, leads to some women having fewer children than they would like to have, and very low fertility." In this paper, I revisit this work to provide further specification of the theory. I argue that low fertility is the result of reactions of some women to perceived inequity in the social and economic context in which they live. Not being able to have a political impact to change work and family policy, women react by not having children. This is a manifestation of Nancy Fraser's concept of parity of participation. I argue that the theory can only be tested across social contexts or across time, and I provide directions for how the theory might be tested.



27/09/2010

Demographic Effects on GDP per Capita: A Cross-National Study

Peter McDonald (Australian Demographic and Social Research Institute and President of IUSSP)

In the context of low fertility, there is concern about a future decline in the size of the labour force and its potential economic impacts. This can be addressed through increases in fertility, but demographers have also turned their attention to 'replacement migration', using migration to make up for the deficit in labour supply. Recently demographers have been asking questions like: what level of migration would it take to maintain a constant number of people at age 30 in the future? Why should the aim be to have a constant number of people at age 30? This seems an unsophisticated approach. It is not difficult to construct models that provide much more useful results for alternative demographic futures than mere replacement of the population at age 30. Much more meaningfully, we can model future labour supply, future GDP or future GDP per capita. This paper uses the computer software MoDEM2 to model these more useful outcomes under varying scenarios for seven countries: Italy, Spain, Germany, Austria, France, Sweden and Japan. The results show that many countries facing rapid ageing of their populations could alleviate the impact on GDP per capita by increasing participation and, ideally, increasing their low rates of labour productivity. For other countries where participation is already relatively high, demographic approaches (higher fertility or higher migration) would be required, but their impacts are slow. Almost all countries face a major fall in the rate of growth of GDP per capita in the next decade as a result of the retirement from the labour force of the baby-boom generation.



16/09/2010

Healthcare utilization, socioeconomic factors and child health in India

Alok Bhargava (University of Houston)

This paper modelled the proximate determinants of height, weight and hemoglobin concentration of over 25,000 Indian children using data from the National Family Health Survey-3. The effects of healthcare services utilization, food consumption patterns and maternal health status on child health were investigated in a multidisciplinary framework. The results from models for birth weight and size showed that antenatal care, birth intervals, and maternal education, food consumption patterns and nutritional status were significant predictors. Second, models for children’s heights and weight showed beneficial effects of child vaccinations against DPT, polio, and measles, and negative effects of not utilizing government health facilities. Methodological issues such as potential endogeneity of birth variables and appropriateness of combining height and weight as the Body Mass Index were tackled. Third, models for children’s hemoglobin concentration indicated beneficial effects of food consumption patterns, treatment against intestinal parasites and maternal BMI. Finally, models were estimated for maternal weight and hemoglobin concentration. Overall, the results provide policy insights for improving maternal and child health in India.



13/05/2010

Estimating finite mixture models: theoretical and computational issues

Paolo Frumento (Università di Pisa)

We investigate the likelihood surface of finite mixture models; stationary points are classified into three main categories, i.e. local maxima, spurious maxima, and saddle points. Due to their presence, in some settings obtaining a reliable maximum likelihood estimate (MLE) is a difficult task. We exploit a genetic algorithm for the search of the true MLE; flexible software is developed and presented, and some simulation results are discussed.
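The kind of difficulty described above can be seen even in a two-component Gaussian mixture; the sketch below (written for this page, not the speaker's software) evaluates the mixture log-likelihood and uses a crude multi-start search, the simplest relative of the genetic-algorithm idea.

```python
# Two-component Gaussian mixture: negative log-likelihood and a multi-start
# search, illustrating why a single local optimisation may stop at a local
# or spurious maximum.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 150), rng.normal(3, 0.5, 50)])

def neg_loglik(theta, y):
    p = 1 / (1 + np.exp(-theta[0]))               # mixing weight in (0, 1)
    m1, m2 = theta[1], theta[2]
    s1, s2 = np.exp(theta[3]), np.exp(theta[4])   # positive scales
    dens = p * norm.pdf(y, m1, s1) + (1 - p) * norm.pdf(y, m2, s2)
    return -np.sum(np.log(dens + 1e-300))

best = None
for _ in range(20):                               # multiple random starts
    fit = minimize(neg_loglik, rng.normal(0, 1, 5), args=(y,), method="Nelder-Mead")
    if best is None or fit.fun < best.fun:
        best = fit

print(best.x, -best.fun)
```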



09/04/2010

Semiparametric methods for causal inference: the effect of debit cards on household spending

Andrea Mercatanti

Motivated by recent findings in the field of consumer science, this paper proposes an application of power series methods in order to evaluate the effect of debit cards (i.e. Bancomat) on household consumption. The basic assumption is unconfoundedness, which requires that the adoption of debit cards be independent of the potential consumption levels given a set of pre-treatment variables. This offers the advantage of avoiding the introduction of assumptions regarding the link between observable and unobservable quantities, and it also improves precision relative to the other main methodological options. The analysis finds positive effects on monthly household spending.



29/01/2010

The survey on Italian household budgets

Andrea Neri (Banca d'Italia)

Having information on households' standard of living and on their economic and financial behaviour is crucial, both because these are highly relevant topics in the political, economic and social debate, and because such information provides the essential basis for designing policy actions and verifying their effectiveness over time. For more than 40 years the Banca d'Italia has conducted a survey on the income and wealth of Italian households, publishing its main results and making the microdata available for research purposes. The seminar describes the main features of this survey and how it can be used. The main complications involved in using a complex survey for research purposes will also be discussed.



14/01/2010

Tools for the assessment of competencies

Stefania Mignani (Università di Bologna)



17/12/2009

Propensity score reweighting in analysis of the gender pay gap

Philippe Van Kerm (CEPS/INSTEAD)

This paper revisits the estimation of Oaxaca-Blinder decompositions of wage differentials using weighted least squares in order to achieve greater robustness against model misspecification when the distributions of covariates are highly imbalanced across the groups being compared. Monte Carlo simulations and an empirical application to gender wage differentials in the Socio-Economic Panel 'Liewen zu Lëtzebuerg' (PSELL) show that WLS estimates of the wage differential are much more accurate than OLS ones.
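For reference, the Oaxaca-Blinder decomposition of a mean wage gap between groups M and F is

\[
\bar y_M - \bar y_F
= (\bar x_M - \bar x_F)^{\top}\hat\beta_M
+ \bar x_F^{\top}(\hat\beta_M - \hat\beta_F),
\]

the first term being the part explained by covariates and the second the unexplained (coefficient) part; the proposal amounts to estimating the \hat\beta's by weighted least squares, with weights derived from a propensity-score-type reweighting, so that the decomposition is less sensitive to misspecification when the covariate distributions differ sharply.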



03/12/2009

Evaluation of probabilistic systems for indirect identification based on nuclear DNA evidence

Fabio Corradi (Università di Firenze)

The chance of success of an indirect identification based on STR loci varies as a function of many factors, among them the distance in the family pedigree between the persons requesting the identification and the member being sought, as well as the allele frequencies corresponding to the genotypes of the family members who provide genetic evidence. Although it is commonly acknowledged that there may be great variability from case to case, the identification assessment is generally carried out using a number and quality of loci that depend on the availability and cost of kits widely available on the market. The aim of this work is to propose an ex-ante probabilistic evaluation of an identification request, using the genetic data of the relatives measured with a kit chosen by the user. On the basis of the results it becomes possible to judge whether it is worth proceeding with the identification by acquiring the genetic evidence of the candidate, or whether the identification question should be reconsidered using different evidence.
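In this setting the weight of the evidence is usually summarised by a likelihood ratio; the display below is the standard generic form, not a formula specific to this talk.

\[
LR = \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)},
\]

where E is the genetic evidence at the chosen STR loci, H_1 states that the candidate is the sought family member and H_2 that he or she is unrelated; the ex-ante evaluation proposed here amounts to studying the distribution of such a quantity, as implied by the relatives' genotypes and the kit's allele frequencies, before the candidate is actually typed.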



03/12/2009

Marginal Effect Using The Conditional Distribution Dynamics Approach

Angela Parenti (Università di Pisa)

The paper describes a two-stage procedure for estimating the conditional distribution dynamics. The proposed methodology is evaluated by a Monte Carlo study on the growth of income across a large sample of countries. The first stage consists in estimating the growth regression by parametric and/or semiparametric methods; the estimate is then used to calculate the counterfactual distribution and the marginal impact related to a specific explanatory variable. In the second stage the impact of the selected variable on the distribution dynamics is analyzed by estimating the counterfactual stochastic kernel and the conditional distribution of the marginal impact. The methodology also provides diagnostics for detecting potential distributional effects of omitted variables in the growth regression.
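In schematic terms, the stochastic kernel of the second stage is the conditional density

\[
\hat g_{\tau}(y \mid x) = \frac{\hat f(x, y)}{\hat f(x)},
\]

a kernel-density estimate of the distribution of (relative) income at time t+\tau given its value x at time t; the counterfactual kernel is obtained by replacing x with the value adjusted by the marginal impact of the chosen explanatory variable estimated in the first stage. This is a generic summary of the approach, not the paper's exact notation.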



26/11/2009

An M-quantile Random Effects Model for Hierarchical and Repeated Measures Data

Nicola Salvati (Università di Pisa)

Quantile analysis of hierarchical and repeated measures data has recently attracted some interest. The most recent attempt at extending the quantile regression model to a random-effects quantile model is described in Geraci and Bottai (2007). In this work we propose an alternative approach for accounting for the hierarchical structure of the data when modelling the quantiles of f(y|X). In particular, we extend the M-quantile model to an M-quantile random effects model. The proposed model allows for outlier-robust estimation of both fixed and random effects. In addition, by modelling M-quantiles instead of the ordinary quantiles, we gain algorithmic stability.
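For readers unfamiliar with M-quantiles, the q-th M-quantile regression plane solves an estimating equation of the following form (a standard formulation, not necessarily the paper's exact notation):

\[
\sum_{i=1}^{n} \psi_q\!\left(\frac{y_i - x_i^{\top}\beta_q}{s}\right) x_i = 0,
\qquad
\psi_q(u) = 2\,\psi(u)\,\big\{ q\,\mathbb{1}(u > 0) + (1-q)\,\mathbb{1}(u \le 0) \big\},
\]

where \psi is a bounded influence function (e.g. Huber's) and s a robust scale estimate; the extension described above adds random effects to the linear predictor x_i^{\top}\beta_q.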



17/11/2009

The teaching of statistics in schools: reading and interpreting quantitative information

Maria Gabriella Ottaviani (Sapienza, Univ. di Roma)



05/11/2009

The transition out of the parental home among second generation migrants in Spain: A cross-classified Multilevel Analysis

Bruno Arpino (Università Bocconi)

This paper presents a model for the analysis of the determinants of leaving the parental home among second-generation migrants in Spain, simultaneously taking into account their place of origin and their province of residence. Using a cross-classified multilevel analysis it is shown that the variation across origin groups is much larger than that due to the province of residence. However, the variance at the province level is not negligible. It is also found that migrants are extremely heterogeneous with respect to their origin, but geographical clustering is evident. Finally, we find that almost all migrant groups show a higher probability of leaving home than natives. We also plan to use multiple membership models to take into account the effect of the previous province of residence.
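A minimal version of the model described above, written only to fix notation, is a logit with crossed random effects:

\[
\operatorname{logit}\Pr(y_i = 1) = x_i^{\top}\beta + u_{o(i)} + v_{p(i)},
\qquad u_o \sim N(0, \sigma^2_u), \quad v_p \sim N(0, \sigma^2_v),
\]

where o(i) indexes the origin group and p(i) the province of residence of individual i; the two effects are crossed rather than nested, and comparing \sigma^2_u with \sigma^2_v quantifies the statement that variation across origin groups dominates variation across provinces.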



08/10/2009

Combining Duration and Intensity of Poverty: Proposal of a New Index of Longitudinal Poverty

Daria Mendola (Università di Palermo)

Traditional measures of poverty persistence, such as the "poverty rate" or the "persistent-risk-of-poverty rate", do not devote sufficient attention to the sequence of poverty spells. In particular, they are insufficient in highlighting the different effects associated with occasional, single spells of poverty and with consecutive years of poverty. Here, we propose a new index which measures the severity of poverty, taking into account the way poverty and non-poverty spells follow one another along individual life courses. The index is normalised and increases with the number of consecutive years in poverty along the sequence, while it decreases when the distance between two years of poverty increases. All the years spent in poverty contribute to the measurement of persistence in poverty, but with a decreasing contribution as the distance between two years of poverty becomes longer. A weighted version of the index is also proposed, explicitly taking into account the distance of poor people from the poverty line. Both indexes are supported by a conceptual framework and characterised via properties and axioms. They are validated according to content, construct and criterion validity assessments and tested on a sample of young European adults participating in the European Community Household Panel survey.

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


30/09/2009

Improved Regression Calibration

Anders Skrondal (Norwegian Inst. of Public Health)

The joint likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form and maximum likelihood estimation is hence taxing. A popular alternative is regression calibration which is computationally efficient at the cost of potentially inconsistent parameter estimation. We propose an improved regression calibration approach, based on an approximate decomposed form of the joint likelihood, which is both consistent and computationally convenient. It produces point estimates and estimated standard errors which are practically identical to those obtained by maximum likelihood. Simulations suggest that improved regression calibration, which is easy to implement in standard software, works well in a range of situations.
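For context, and hedged as a generic textbook version rather than the improved estimator of the talk, standard regression calibration replaces the error-prone covariate with an estimate of E[X | W] obtained from a validation subsample, and then fits the outcome model as if that estimate were the true covariate. Variable names and data below are made up.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, n_val = 2000, 400
x = rng.normal(size=n)                                   # true covariate
w = x + rng.normal(scale=0.8, size=n)                    # error-prone measurement
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.0 * x))))  # binary outcome

# Step 1: calibration model E[X | W], fitted on a validation subsample where X is known
val = np.arange(n_val)
calib = sm.OLS(x[val], sm.add_constant(w[val])).fit()
x_hat = calib.predict(sm.add_constant(w))                # imputed covariate for all units

# Step 2: outcome model with the calibrated covariate plugged in
naive = sm.Logit(y, sm.add_constant(w)).fit(disp=0)      # ignores measurement error
rc = sm.Logit(y, sm.add_constant(x_hat)).fit(disp=0)     # standard regression calibration
print("naive slope:", naive.params[1], " calibrated slope:", rc.params[1])
```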

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


29/09/2009

Factor Scores as Proxies for Latent Variables in Structural Equation Modelling (SEM)

Anders Skrondal (Norwegian Inst. of Public Health)

Structural equation models with latent variables are sometimes estimated using an intuitive approach where factor scores are plugged in for latent variables. Ordinary regression analysis is then performed with the factor scores simply treated as observed variables. Not surprisingly, we show that this approach in general produces inconsistent estimates of the parameters of main scientific interest. Rather remarkably, consistent estimates for all parameters can however be obtained if the factor scoring methods are judiciously chosen.
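A hedged way to see why the naive plug-in is generally inconsistent (our notation, not the speaker's): when a factor score for an explanatory latent variable behaves like the true value plus uncorrelated estimation noise, regressing on it reproduces the classical errors-in-variables attenuation,

```latex
\hat{\xi}_i = \xi_i + e_i, \quad e_i \perp \xi_i
\;\;\Longrightarrow\;\;
\operatorname{plim}\,\hat{\beta}_{\text{plug-in}}
= \beta \,\frac{\operatorname{Var}(\xi)}{\operatorname{Var}(\xi) + \operatorname{Var}(e)} \neq \beta .
```

The point of the talk is that a judicious choice of scoring method can make this bias cancel, so that the plug-in estimates of the structural parameters are nonetheless consistent.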

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


07/09/2009

Direct and Indirect Effects: An Unhelpful Distinction?

Donald B. Rubin (Harvard University)

The terminology of direct and indirect causal effects is relatively common in causal conversation as well as in some more formal language. In the context of real statistical problems, however, I do not think that this terminology is helpful to clear thinking; rather, it leads to confused thinking. This presentation will discuss several real examples where this point arises, as well as one which illustrates that even Sir Ronald Fisher was vulnerable to such confusion.

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


18/06/2009

Mixture Priors for Bayesian Variable Selection

Marina Vannucci (Rice University)

In this talk I will review Bayesian methods for variable selection that use spike and slab priors. Specific interest will be towards high-dimensional data. Linear and nonlinear models will be considered, with continuous, categorical and survival responses. Applications will be to genomics data from DNA microarray studies. The analysis of the high-dimensional data generated by such studies often challenges standard statistical methods. Models and algorithms are quite flexible and allow us to incorporate additional information, such as data substructure and/or knowledge on gene functions and on relationships among genes.
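In its simplest form (a generic formulation; the models in the talk may differ in detail), a spike-and-slab prior places a point mass at zero and a diffuse "slab" on each regression coefficient, with a latent inclusion indicator:

```latex
\beta_j \mid \gamma_j \;\sim\; \gamma_j\, N(0, \tau^2) \;+\; (1-\gamma_j)\,\delta_0,
\qquad \gamma_j \sim \mathrm{Bernoulli}(w), \quad j = 1, \dots, p,
```

so that the posterior distribution of the vector of indicators quantifies which covariates (for example, genes) should enter the model.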

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


11/06/2009

Bayesian CAR Models for Syndromic Surveillance on Multiple Data Streams: Theory and Practice

Gauri Datta (Georgia University)

Syndromic surveillance has, so far, considered only simple models for Bayesian inference. This lecture details the methodology for a serious, scalable solution to the problem of combining symptom data from a network of U.S. hospitals for early detection of disease outbreaks. The approach requires high-end Bayesian modeling and significant computation, but the strategy described here appears to be feasible and offers attractive advantages over the methods that are currently used in this area. The method is illustrated by application to ten quarters' worth of data on opioid drug abuse surveillance from 636 reporting centers, and then compared to two other syndromic surveillance methods using simulation to create a known signal in the drug abuse database.
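For reference, a standard conditionally autoregressive (CAR) prior, not necessarily the exact specification used in the talk, models the spatial effect of reporting centre i given its neighbours as

```latex
\phi_i \mid \phi_{-i} \;\sim\; N\!\left(\frac{\rho}{w_{i+}} \sum_{j \ne i} w_{ij}\,\phi_j,\; \frac{\sigma^2}{w_{i+}}\right),
\qquad w_{i+} = \sum_{j \ne i} w_{ij},
```

where w_{ij} = 1 if areas i and j are neighbours and 0 otherwise, and ρ controls the strength of spatial dependence across the surveillance network.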

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


04/06/2009

A segmented regression model for event history data: an application to the fertility patterns in Italy

Massimo Attanasio (Università di Palermo)

We propose a segmented discrete-time model for the analysis of event history data in demographic research. Through a unified regression framework, the model provides estimates of the effects of explanatory variables and, at the same time, flexibly accommodates non-proportional differences via segmented relationships. Its main appeal lies in the ready availability of parameters, changepoints and slopes, which may provide meaningful and intuitive information on the topic. Furthermore, specific linear constraints on the slopes may be set to investigate particular patterns. We investigate the intervals from cohabitation to first childbirth and from first to second childbirth, using individual data for Italian women from the Second National Survey on Fertility. The model provides insights into the dramatic decrease in fertility experienced in Italy, in that it detects a 'common' tendency to delay the onset of childbearing among the more recent cohorts and a 'specific' postponement strictly depending on educational level and age at cohabitation.
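A hedged sketch of what a segmented discrete-time hazard can look like (our generic notation; the paper's parameterisation may differ): the linear predictor at duration t contains broken-line terms in t with changepoints ψ_k,

```latex
\operatorname{logit}\, h_i(t) \;=\; \alpha \;+\; \mathbf{x}_i'\boldsymbol{\beta}
\;+\; \delta_0\, t \;+\; \sum_{k=1}^{K} \delta_k\,(t - \psi_k)_+ ,
\qquad (u)_+ = \max(u, 0),
```

so that the slope of the duration effect changes by δ_k at each changepoint, and linear constraints on the δ_k can encode specific postponement patterns.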

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


27/05/2009

Interdependencies between fertility and women's labour supply in Europe. How can a multi-process hazard model help us model this relationship?

Anna Matysiak (Warsaw School of Economics)

The paper discusses the state of current research on the interdependencies between fertility and women's labour supply in Europe. It outlines a theoretical model of decision-making with respect to childbearing and women's economic activity, and formulates conditions that should be met for a proper assessment of the time conflict between the two activities. Against this theoretical background, studies on the association between fertility and women's labour supply are critically evaluated. It is then discussed how a multi-process hazard model can help eliminate some of the shortcomings of the current research. The model is estimated for Poland and its findings are discussed within the Polish socio-economic context. The paper concludes with suggestions for further research on the interdependencies between fertility and women's labour supply.
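In broad strokes (a typical formulation assumed here, not taken from the paper), a multi-process hazard model specifies simultaneous hazard equations for births and for employment transitions, with correlated woman-specific random effects absorbing the unobserved factors that drive both processes:

```latex
\ln h^{B}_{i}(t) = \mathbf{x}^{B}_{i}(t)'\boldsymbol{\beta}^{B} + \varepsilon^{B}_{i}, \qquad
\ln h^{E}_{i}(t) = \mathbf{x}^{E}_{i}(t)'\boldsymbol{\beta}^{E} + \varepsilon^{E}_{i}, \qquad
(\varepsilon^{B}_{i}, \varepsilon^{E}_{i}) \sim N\!\left(\mathbf{0},
\begin{pmatrix} \sigma^2_B & \rho\,\sigma_B\sigma_E \\ \rho\,\sigma_B\sigma_E & \sigma^2_E \end{pmatrix}\right),
```

where a non-zero ρ signals selection on unobservables into employment and childbearing, which single-process models ignore.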

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


14/05/2009

Pension Issues in Japan: How Can We Cope with the Declining Population?

Noriyuki Takayama

After a brief sketch of Japanese demography and its impact on the financing of social security, I turn to explaining the Japanese social security pension program and summarize Japan's major pension problems. I then examine the 2004 pension reform and use the balance sheet approach to analyze its economic implications. I also discuss future policy options on pensions. Financial sustainability of social security pensions is often not attained even when the income statement shows a surplus. The balance sheet approach is an indispensable tool for understanding the long-run financial sustainability of social security pensions and for evaluating the varying financial impacts of different reform alternatives. When it comes to social security pensions, the most important question is whether or not they are worth buying. Contributions need to be linked much more directly with old-age pension benefits, while an element of social adequacy has to be incorporated in a separate tier of pension benefits financed by sources other than contributions. It is also shown that a shift to a consumption-based tax to finance the basic pension in Japan would induce smoother increases in pension burdens across different cohorts.

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


07/05/2009

Another look into the effect of premarital cohabitation on duration of marriage: an approach based on matching

Stefano Mazzuco (Università di Padova)

The paper proposes an alternative approach to studying the effect of premarital cohabitation on the subsequent duration of marriage, on the basis of a strong ignorability assumption. The approach is propensity score matching and consists of computing survival functions conditional on a function of observed variables (the propensity score), thus eliminating any selection driven by these variables. In this way, it is possible to identify a time-varying effect of cohabitation without making any assumption about either its shape or the functional form of covariate effects. The output of the matching method is the difference between the survival functions of treated and untreated individuals at each time point. Results show that the effect of cohabitation on the duration of marriage is indeed time-varying, being close to zero for the first 2–3 years and rising considerably in the following years.
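A minimal sketch of the general recipe described here (propensity score estimation, matching, then comparison of survival curves), with made-up variable names and simulated data, and no claim to reproduce the paper's estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def km_curve(time, event, grid):
    """Kaplan-Meier survival estimate S(t) evaluated on a time grid."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    surv = []
    for u in grid:
        s = 1.0
        for t in np.unique(time[(event == 1) & (time <= u)]):
            at_risk = np.sum(time >= t)
            deaths = np.sum((time == t) & (event == 1))
            s *= 1 - deaths / at_risk
        surv.append(s)
    return np.array(surv)

rng = np.random.default_rng(2)
n = 2000
age = rng.normal(25, 4, n)                                  # hypothetical covariates
educ = rng.integers(0, 3, n)
cohab = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.1 * age + 0.3 * educ))))
duration = rng.exponential(15 / (1 + 0.3 * cohab), n).round(1) + 0.1
dissolved = rng.binomial(1, 0.4, n)

X = np.column_stack([age, educ])

# 1. Propensity score: probability of cohabiting given observed covariates
ps = LogisticRegression(max_iter=1000).fit(X, cohab).predict_proba(X)[:, 1]

# 2. Nearest-neighbour matching on the propensity score (1:1, with replacement)
treated = np.flatnonzero(cohab == 1)
control = np.flatnonzero(cohab == 0)
matched = control[np.abs(ps[control][None, :] - ps[treated][:, None]).argmin(axis=1)]

# 3. Difference between survival curves of matched treated and controls over time
grid = np.linspace(0, 20, 41)
effect = km_curve(duration[treated], dissolved[treated], grid) \
         - km_curve(duration[matched], dissolved[matched], grid)
print(np.round(effect[::10], 3))        # a time-varying "effect" on marriage survival
```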

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


09/04/2009

Conjoint analysis and response surface methodology: searching the full optimal profile by status‐quo and optimization

Rossella Berni (Università di Firenze)

Standard conjoint analysis is a multi-attribute quantitative method used to study how a consumer/user evaluates a new product/service. In this seminar we present a proposal for a modified conjoint analysis (CA) aimed at evaluating a new or revised product/service through a generic consumer or user. The proposal is based on the application of Response Surface Methodology (RSM) to determine the best preference for a sample of respondents, by evaluating both the quantitative judgements about the full profiles and the judgements about the current situation, or status quo; in addition, baseline variables of the respondents are considered. The optimal solution for the new or revised product/service is therefore obtained by computing the optimal hypothetical solution through the experienced status quo. The estimated model is subsequently optimized in order to determine the best preference on the basis of the factors involved in the experimental design and the judgements collected. Our proposal is applied to second- and third-year University students; the aim is the evaluation of an interdisciplinary degree course of the University of Florence, identifying the best degree course configuration according to the factors considered and the students' judgements.
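For readers unfamiliar with RSM, its core ingredient (stated generically; the seminar's model also incorporates status-quo judgements and baseline variables) is a second-order response surface in the design factors, which is then optimised:

```latex
\hat{y}(\mathbf{x}) \;=\; \hat{\beta}_0 \;+\; \sum_{i=1}^{k}\hat{\beta}_i x_i
\;+\; \sum_{i=1}^{k}\hat{\beta}_{ii} x_i^{2}
\;+\; \sum_{i<j}\hat{\beta}_{ij} x_i x_j ,
```

where y is the respondent's preference judgement, the x_i are the attribute levels of a profile, and the optimum of the fitted surface identifies the best profile.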

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


09/04/2009

Design and analysis of teaching experiments for course quality in the academic setting

S. Barone

The never-ending debate on quality involves every institution devoted to higher education. It is undeniable that, with massive and fast globalisation causing student flows in all directions, and under the constraint of limited economic resources devoted to higher education, the performance and quality of any academic system must be constantly monitored and improved. Thus, whoever researches and teaches the scientific foundations of quality has at the same time the right and the duty to offer an opinion and, first of all, to assure the quality of the processes for which he or she is responsible. In the recent past the authors started focusing on teaching, a key aspect of the transmission of knowledge to an audience that constantly experiences a rapidly changing technological environment. We recently proposed a methodology named TESF (Teaching Experiments and Student Feedback), aimed at designing, monitoring and continuously improving (according to Deming's cycle) the quality of a university course. The TESF methodology is based on the concurrent adoption of Design of Experiments and the SERVQUAL model. The experiments are essentially "teaching experiments" performed by the teacher according to a predefined plan; the teacher is therefore the designer, the experimenter and part of the experimental unit. The other part of the experimental unit is a predefined sample of students attending the course (the student evaluators sample), whose feedback is carefully studied. A preliminary application of TESF has shown that one need not be an experienced statistician to apply the methodology, and the description of the model is kept at the most general level; indeed, although initially conceived for the academic environment, the methodology could easily be applied to any educational context. On the other hand, expert statisticians can be stimulated by this approach, so that a scientific discussion can be opened and further substantial improvements can be gained. This seminar aims to give an overview of TESF, emphasising the statistical aspects involved and the delicate experimental and measurement issues. An interesting upgrade concerning the analysis of the data (student feedback) will be mentioned. Results from the application of the methodology in three consecutive editions of a Statistics course at the University of Palermo will be presented.

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


01/04/2009

Clustering of Curves

Silvia Liverani (Univ. of Warwick)

An increasing number of microarray experiments produce time series of expression levels for many genes. Some recent clustering algorithms respect the time ordering of the data and are, importantly, extremely fast. The aim is to cluster and classify the expression profiles in order to identify genes potentially involved in, and regulated by, the circadian clock. In this presentation we report a new development associated with this methodology. The partition space is intelligently searched placing most effort in refining the partition where genes are likely to be of most scientific interest. This utility based Bayesian search algorithm can be shown both theoretically and practically to outperform the greedy search algorithm which does not use contextual information to guide the search.

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


25/03/2009

Forecasting a macroeconomic aggregate which includes components with common trends

Antoni Espasa (Univ. Carlos III de Madrid)

The empirical literature on forecasting an aggregate by forecasting its disaggregates has usually used low disaggregation levels. The components at the highest level of the breakdown of a macroeconomic aggregate (full disaggregation) often show features which are shared by a significant proportion of them. In this paper the disaggregation level is treated as a statistical problem that can be approached by estimating common trends. This information on common trends makes it possible to define a disaggregation scheme by singling out, from the full disaggregation, all the components, say m, which share a common trend (these components define the set B) and grouping the remaining components into a sub-aggregate R. The m components of the set B are identified by a testing procedure carried out over all possible pairs of components from the full disaggregation. This approach yields a parsimonious breakdown of the data into m+1 components. A forecasting strategy is proposed which consists of forecasting each of the components in B, taking into account the common trend they share through a cointegration mechanism between each component and the common trend, forecasting the sub-aggregate R independently, and then aggregating all these forecasts. This strategy is applied to forecasting euro area and US inflation, where 37% and 43% of the CPI weight, respectively, share a common trend. It is shown that the strategy significantly improves the forecasting accuracy of the corresponding aggregate for all horizons from 1 to 12; this improvement increases with the length of the horizon in the euro area case and is constant across horizons for US inflation. Additionally, we argue that the out-of-sample accuracy gains implied by our procedure increase with the number of basic components which are fully cointegrated. In this approach it is important to work with the full disaggregation, because official breakdowns consist of sub-aggregates which include components from both sets B and R, so the cointegration relationships between the official sub-aggregates can be unstable. Finally, it is also shown that this procedure performs better than one based on dynamic factor analysis. (Joint work with Iván Mayo)
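A minimal sketch of the grouping idea, assuming the pairwise tests are Engle-Granger cointegration tests (a simplification of the paper's procedure; component names and data are made up):

```python
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(3)
T, m_total = 300, 6
trend = rng.normal(size=T).cumsum()                        # one common stochastic trend
prices = {f"c{i}": trend + rng.normal(scale=2, size=T)     # components sharing the trend
          for i in range(4)}
prices.update({f"c{i}": rng.normal(size=T).cumsum()        # independent random walks
               for i in range(4, m_total)})

# Pairwise cointegration tests over all pairs of components
names = list(prices)
shares_trend = set()
for a in range(len(names)):
    for b in range(a + 1, len(names)):
        _, pval, _ = coint(prices[names[a]], prices[names[b]])
        if pval < 0.05:                                    # reject "no cointegration"
            shares_trend.update([names[a], names[b]])

B = sorted(shares_trend)                                   # components sharing a common trend
R = sorted(set(names) - shares_trend)                      # remaining sub-aggregate
print("B:", B, " R:", R)
# Each series in B would then be forecast with an error-correction term towards the
# common trend, R would be forecast separately, and the forecasts summed.
```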

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


12/03/2009

Bayesian hierarchical model for the prediction of football results

G. Baio

The problem of modelling football data has become increasingly popular in the last few years and many different models have been proposed with the aim of estimating the characteristics that bring a team to lose or win a game, or to predict the score of a particular match. We propose a Bayesian hierarchical model to address both these aims and test its predictive strength on data about the Italian Serie A 1991-1992 championship. To overcome the issue of overshrinkage produced by the Bayesian hierarchical model, we specify a more complex mixture model that results in better fit to the observed data. We test its performance using an example about the Italian Serie A 2007-2008 championship. (jointly with M. Blangiardo)
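For orientation, a basic hierarchical structure along the lines of the published Baio-Blangiardo formulation (the mixture extension discussed in the talk is richer) models the goals scored by the home and away teams in game g as conditionally independent Poisson counts:

```latex
y_{g1} \sim \mathrm{Poisson}(\theta_{g1}), \quad y_{g2} \sim \mathrm{Poisson}(\theta_{g2}),
\qquad
\log\theta_{g1} = \mathrm{home} + \mathrm{att}_{h(g)} + \mathrm{def}_{a(g)}, \quad
\log\theta_{g2} = \mathrm{att}_{a(g)} + \mathrm{def}_{h(g)},
```

with team-specific attack and defence effects drawn from common prior distributions; it is this pooling across teams that produces the shrinkage (and, in the basic model, the overshrinkage) mentioned in the abstract.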

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


12/03/2009

A Bayesian Calibration Model for Combining Different Preprocessing Methods in Affymetrix Chips

M. Blangiardo

In gene expression studies a key role is played by the so-called “pre-processing”, a series of steps designed to extract the signal and account for the sources of variability due to the technology used rather than to biological differences between the RNA samples. Many studies have shown how this choice can affect the results of subsequent analyses carried out to measure the influence of biological contrasts on differential expression. At the moment there is no commonly agreed gold-standard method, and each researcher has the responsibility of choosing one pre-processing method, incurring the risk of the false positive and false negative features associated with that choice. We propose a Bayesian model that combines several pre-processing methods to assess the “true” unknown differential expression between two conditions, and show how to estimate the posterior distribution of the differential expression values of interest. The model is tested both on simulated data and on a spike-in data set, and its biological interest is demonstrated through a real example on publicly available data.

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


26/02/2009

The role of women and the use of water resources in the wadi Laou valley (Morocco): a case study from the WADI project

Lucia Fanini (Università di Firenze)

The Euro-Mediterranean WADI project (2006-2008) aimed at constructing scenarios for the sustainability of human use of ecological resources. The variety and complexity of the Mediterranean environment were approached through study sites, so as to start from a local basis and arrive at scenarios that include the contribution of real communities from the outset. Water-related conflicts were the theme common to the selected study sites, with problems concerning quality and management choices rather than the quantity of this resource. In rural areas the problem is particularly relevant, since part of the population depends closely on environmental goods for its survival and is consequently more exposed than others to the effects of management decisions. This study was carried out in an area that was historically extremely isolated and is now in rapid transition, particularly as regards the pressure towards tourism development and the growth of infrastructure. The wadi Laou valley is rich in springs and hosts a traditional irrigation system (saquia), but the management of water resources is passing from the traditional village authorities to public bodies. In this context, gender analysis is needed to obtain information on the female part of the population, which, although it may own land, is not represented at any level (not even that of the traditional village authorities) yet carries out a large share of the domestic and agricultural work. To take into account the SEAGA methodologies (proposed by FAO for gender analysis at the field-community level), but above all to accommodate the real situation, the choice of the sample and the survey methodology were non-probabilistic, involving inhabitants of rural and urban areas of the valley. Fifty-two families were involved in the analysis; in each, the most influential man and the most influential woman were interviewed at the same time. The questionnaire contained four general sections: socio-economic data on the family, access to basic services, division of labour (domestic and outside the home), and perception of problems and representation at the decision-making level. The data collected were analysed descriptively by means of cluster analysis and non-metric multidimensional scaling, considering socio-economic conditions and access to basic services at the family level, and the perception of problems and the possibility of being represented in power groups at the gender level. The results highlighted the dynamism of the context as regards access to services (increasing schooling and connection to the water supply network), as well as the emergence of specific problems (the lack of a sewerage network even in the most recent urban areas, and the payment now required for water use, which was previously free), with the main difference lying between the only "historical" urban area and the rest of the valley. As for the perception of problems, differences related to the area of residence emerged only within the female part of the sample, while the male part turned out to be little differentiated.
Applied to the wadi Laou valley case study, this type of analysis, besides feeding into the formulation of future scenarios as the project intended, raised some questions about the practical application of methodologies for integrating gender into studies, and highlighted the need for considerable prior knowledge of the context in order to devise effective sampling strategies.

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


12/02/2009

Matching for Causal Inference Without Balance Checking (joint with G. King and G. Porro)

Stefano Iacus

We address a major discrepancy in matching methods for causal inference in observational data. Since these data are typically plentiful, the goal of matching is to reduce bias and only secondarily to keep variance low. However, most matching methods seem designed for the opposite problem, guaranteeing sample size ex ante but limiting bias by controlling for covariates through reductions in the imbalance between treated and control groups only ex post and only sometimes. (The resulting practical difficulty may explain why many published applications do not check whether imbalance was reduced and so may not even be decreasing bias.) We introduce a new class of “Monotonic Imbalance Bounding” (MIB) matching methods that enables one to choose a fixed level of maximum imbalance, or to reduce maximum imbalance for one variable without changing it for the others. We then discuss a specific MIB method called “Coarsened Exact Matching” (CEM) which, unlike most existing approaches, also explicitly bounds through ex ante user choice both the degree of model dependence and the causal effect estimation error, eliminates the need for a separate procedure to restrict data to common support, meets the congruence principle, is approximately invariant to measurement error, works well with modern methods of imputation for missing data, is computationally efficient even with massive data sets, and is easy to understand and use. This method can improve causal inferences in a wide range of applications, and may be preferred for simplicity of use even when it is possible to design superior methods for particular problems. We also make available open source software for R and Stata which implements all our suggestions.
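A minimal sketch of the CEM idea under our own coarsening choices and made-up data (the R and Stata software referred to at the end of the abstract handles this far more carefully):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 5000
df = pd.DataFrame({
    "age":   rng.normal(40, 12, n),
    "educ":  rng.integers(8, 21, n),
    "treat": rng.binomial(1, 0.3, n),
})

# 1. Coarsen each covariate into substantively meaningful bins (the user's ex-ante choice)
df["age_bin"]  = pd.cut(df["age"],  bins=[0, 25, 35, 45, 55, np.inf])
df["educ_bin"] = pd.cut(df["educ"], bins=[0, 8, 13, 18, np.inf])

# 2. Exact matching on the coarsened signature: keep only strata containing
#    both treated and control units
strata = df.groupby(["age_bin", "educ_bin"], observed=True)["treat"]
keep = strata.transform(lambda t: t.nunique() == 2)
matched = df[keep].copy()

# 3. Weights: controls in each stratum are reweighted to the stratum's
#    treated/control ratio, relative to the overall ratio
m_t, m_c = (matched["treat"] == 1).sum(), (matched["treat"] == 0).sum()
g = matched.groupby(["age_bin", "educ_bin"], observed=True)["treat"]
n_t = g.transform("sum")
n_c = g.transform("size") - n_t
matched["w"] = np.where(matched["treat"] == 1, 1.0, (n_t / n_c) * (m_c / m_t))
print(matched["w"].describe())
```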

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


29/01/2009

Size, Innovation and Internationalization: A Survival Analysis of Italian Firms

Margherita Velucchi (Università di Firenze)

Firms’ survival is often seen as crucial for economic growth and competitiveness. This paper focuses on the business demography of Italian firms, using an original database obtained by matching and merging three firm-level datasets (ICE-Reprint, Capitalia, AIDA) and working on their intersection. This database allows us to simultaneously consider the effects of size, technology, trade, foreign direct investments and innovation on firms’ survival probability. We use a semiparametric Cox model to show that size and technological level positively affect the likelihood of survival. Internationalized firms show a higher failure risk: on average, competition is stronger in international markets, forcing firms to be more efficient. However, large internationalized firms are more likely to survive. To be successful and survive, an Italian internationalized firm should be high-tech, large and innovative. (joint work with G. Giovannetti and G. Ricchiuti)
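As a hedged illustration of the kind of model used (hypothetical column names and simulated data; the paper's covariate set is richer), a semiparametric Cox regression for firm survival can be fitted with the lifelines package:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
n = 1000
firms = pd.DataFrame({
    "years_observed": rng.exponential(8, n).round(1) + 0.1,   # follow-up time
    "exited":         rng.binomial(1, 0.35, n),               # 1 = firm failed
    "log_employees":  rng.normal(3, 1, n),                    # size proxy
    "high_tech":      rng.binomial(1, 0.3, n),
    "exporter":       rng.binomial(1, 0.4, n),
})

cph = CoxPHFitter()
cph.fit(firms, duration_col="years_observed", event_col="exited")
cph.print_summary()   # hazard ratios for size, technology and internationalization
```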

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


18/12/2008

A new approach to the measure of concentration: ABC (Area, Barycentre and Concentration)

Gustavo De Santis

Gini's index of concentration may be viewed from a different, and simpler, angle by considering where the barycentre falls in an ordered, but not cumulated, distribution of the quantitative variable possessed (on the y axis) among its owners (on the x axis, with the poorest on the left and the richest on the right). The abscissa of the barycentre, rescaled between its minimum and maximum, provides a measure of concentration that coincides with Gini's G. Several empirical applications and a few theoretical considerations show that the ABC approach performs at least as well as, and sometimes better than, the traditional version of Gini's index.
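A quick numerical check of the idea (our own code and normalisation choices): order the individual amounts, compute the abscissa of the barycentre of the ordered bars, and rescale it between its minimum (perfect equality) and maximum (everything owned by the richest). Under this normalisation the result agrees with the usual ordered-values Gini formula up to the small-sample factor n/(n-1).

```python
import numpy as np

def gini(y):
    """Classical Gini index from the ordered-values formula."""
    y = np.sort(np.asarray(y, float))
    n, i = len(y), np.arange(1, len(y) + 1)
    return 2 * np.sum(i * y) / (n * y.sum()) - (n + 1) / n

def abc(y):
    """Barycentre-based concentration: abscissa of the barycentre of the ordered
    (non-cumulated) distribution, rescaled between equality and maximal concentration."""
    y = np.sort(np.asarray(y, float))
    n, i = len(y), np.arange(1, len(y) + 1)
    b = np.sum(i * y) / y.sum()            # abscissa of the barycentre
    b_min, b_max = (n + 1) / 2, n          # equal shares vs. all to the richest
    return (b - b_min) / (b_max - b_min)

y = np.random.default_rng(6).lognormal(0, 1, 10_000)
print(gini(y), abc(y))                      # nearly identical for large n
```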

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


04/12/2008

The non-rejection rate for structural learning of gene transcriptional networks from E.coli microarray data

Alberto Roverato

Structural learning of transcriptional regulatory networks in silico from microarray data is an important and challenging problem in bioinformatics. Several solutions to this problem have been proposed, both within a statistical approach and within a machine learning approach. Statistical approaches typically rely on graphical models: they assume that the available data constitute a random sample from a multivariate distribution and aim at identifying a network where missing edges are interpreted as conditional independence relationships between genes. Machine learning approaches also usually describe their results by means of a network, but the primary aim of such procedures is the identification of (some) transcriptional regulatory interactions with high confidence. Empirical evidence shows that, to achieve this, it is convenient to apply such procedures to a compendium of microarray experiments. Within a statistical approach, Castelo and Roverato (2006) proposed a procedure to learn a Gaussian graphical model from data. Here we show how this procedure can be extended to a meta-analysis context where the available data come from a compendium of different microarray experiments. This is joint work with Robert Castelo (Pompeu Fabra University, Spain).

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


20/11/2008

A new Data Mining approach for impact evaluation dealing with selection bias

Furio Camillo

This paper presents an algorithmic approach to dealing with the selection bias problem. Selection bias may be the most vexing problem in program evaluation, or in any line of research that attempts to assert causality. The main problem of causal inference is essentially one of missing data: in order to know whether some variable causes change in another variable for some unit, it would be necessary to observe that unit in both its treated and untreated states, which is never possible. In other words, the missing data are the counterfactual outcomes, defined as what would have happened in the absence of the intervention and vice versa. Researchers have taken various approaches to resolving the missing counterfactual problem; the most widely applied is the potential outcomes approach, pioneered principally by Rubin (1974; 1978), which attempts to address such selection bias via the propensity score (PS) method. The PS is computed as a function of a set of covariates potentially related to the selection process. In the literature, the PS is commonly used operationally as a one-dimensional summary: in a classical binning procedure, for example, treated and control units with similar propensity scores are compared, and when the balancing property holds an unbiased estimate of the treatment effect can be obtained. However, it is not clear which fitting criterion should guide the choice of the best model for the PS: when some fitting criterion is maximized, so that a good model is found, the balancing property cannot be tested. Aiming to eliminate the PS tautology (Ho et al., 2007), our strategy explicitly excludes any analysis that requires access to outcome data or to a model for the selection mechanism. The fundamental belief underlying this paper is that any researcher influence affects the results, so that multiple solutions arise simply by virtue of the researcher's choice of model; large variation in estimates across choices of control variables, functional forms and other modelling assumptions cannot ensure objective results. In brief, the underlying paradigm is that the problem at hand should define the approach. Taking an automatic algorithmic approach and capitalizing on the known treatment-associated variance in the X matrix (no outcome in sight), we propose a data transformation that allows unbiased treatment effects to be estimated. The approach involves the construction of a multidimensional de-conditioned space in which the bias associated with treatment assignment has been eliminated. The missing counterfactual can then be computed, given that treated and control units now behave as if they arose from the same population: no difference due to selection into treatment remains. The proposed approach does not call for modelling the data on the basis of some underlying theory or assumption about the selection process; instead, it calls for using the existing variability within the data and letting the data speak. Specifically, it is a two-stage procedure: first, the original pre-treatment variables are transformed using a specific eigen-decomposition to derive a factorial de-conditioned space; then the counterfactual is computed using the de-conditioned variables obtained in the previous stage as input.

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA


06/11/2008

Living Standards and Fertility in Indonesia: A Bayesian Analysis

Alessandra Mattei

We investigate the relationship between living standards and fertility, using a three-wave panel dataset from Indonesia that provides information on women's fertility histories and on the level of consumption expenditure in the households to which they belong. We adopt a Bayesian approach to estimation and exploit the dynamically recursive structure implied by gestation lags to identify causal effects of living standards on fertility and vice versa.

Torna alla lista dei seminari del Dipartimento di Statistica precedente al DiSIA