Report: Kyoto University School of Public Health International Lecture: Data Harmonization in Cohort Studies
On the afternoon of July 18, 2023, an international lecture on “Data Harmonization in Cohort Studies” was held at the School of Public Health, Graduate School of Medicine, Kyoto University (KUSPH). About 30 students and faculty members from KUSPH took part in the lecture, which was held in a hybrid format, on-site and via Zoom. Prof. Albert Sanchez-Niubo of the University of Barcelona was invited as the guest speaker. The purpose of the lecture was to introduce methods of data harmonization in cohort studies, illustrated with research examples.
First, Prof. Sanchez-Niubo briefly introduced the reasons for data harmonization and recent research projects that use it, and explained the difference between harmonization and integration. The motivation for harmonization is that the rapid development of computing and communications in recent years has encouraged new projects that combine large datasets from different cohort studies. Combining datasets offers statistical advantages, such as a larger sample size and greater statistical power, improved generalizability and reproducibility of results, and greater heterogeneity of effects owing to more diversity among participants. Harmonization therefore makes it possible to pose more ambitious research questions.
The harmonization methodology presented in this seminar is the result of the work conducted as part of a European project called SYNCHROS, funded by the Horizon 2020 research program. The overall objective of the project was to coordinate and support synchronizing cohorts and population surveys in Europe and worldwide.
Next, he described the concrete methods of harmonization. When a research project plans to combine data from different cohorts, it should follow four stages. The first stage is to establish the strategy, for which he distinguished between prospective, retrospective ex-ante and retrospective ex-post strategies. Prospective harmonization is typical of multi-centre studies: all studies share the same study design, survey and metadata. In retrospective ex-ante harmonization, the studies were conducted independently but share similar standard collection tools and operating procedures. In retrospective ex-post harmonization, by contrast, the studies were not designed to be comparable and use different standard collection tools and operating procedures, so meticulous data processing is required to achieve homogeneity. Regardless of the strategy, he stressed the importance of designing a DataSchema, a list of the variables to be included in the final dataset, derived from the project's research questions. The second stage is the harmonization process, in which the DataSchema is evaluated across cohorts. There are five types of data processing: 1. Algorithmic transformation, 2. Simple calibration model, 3. Standardization model, 4. Latent variable model, and 5. Multiple imputation model. The third stage is to determine the infrastructure in which the harmonized data are organized. There are two systems: the Centralized Data System and the Federated Data System. The final stage is the analysis of the harmonized data, for which there are three approaches. Meta-analysis combines the results of multiple studies that address the same variable. Pooled analysis is carried out at the individual level after the data have been pooled. Federated analysis is a centralized analysis in which the individual-level data remain on their local servers. DataSHIELD can be used for federated analysis.
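As a rough illustration of the second stage, the sketch below applies two of the five processing types, algorithmic transformation and a simple form of standardization, to two small made-up cohorts and then pools them for an individual-level analysis. Everything in it is an assumption made for illustration: the cohort records, the variable names, the coding rules and the use of within-cohort z-scores are hypothetical and are not taken from the lecture or from the SYNCHROS or ATHLOS projects.

```python
from statistics import mean, pstdev

# Hypothetical DataSchema: target variables and their harmonized definitions.
DATA_SCHEMA = {
    "sex": "0 = male, 1 = female",
    "height_cm": "height in centimetres",
    "score_z": "symptom score, z-standardized within cohort",
}

# Made-up raw records; each cohort uses its own coding, units and scale.
cohort_a = [
    {"id": "A1", "gender": "M", "height_m": 1.70, "score": 10},
    {"id": "A2", "gender": "F", "height_m": 1.62, "score": 14},
    {"id": "A3", "gender": "F", "height_m": 1.55, "score": 12},
]
cohort_b = [
    {"id": "B1", "sex_code": 1, "height_cm": 181.0, "score": 40},
    {"id": "B2", "sex_code": 2, "height_cm": 158.0, "score": 60},
    {"id": "B3", "sex_code": 1, "height_cm": 175.5, "score": 50},
]


def transform_a(rec):
    """Algorithmic transformation for cohort A: recode sex, convert m to cm."""
    return {
        "id": rec["id"],
        "cohort": "A",
        "sex": {"M": 0, "F": 1}[rec["gender"]],
        "height_cm": round(rec["height_m"] * 100, 1),
        "score": rec["score"],
    }


def transform_b(rec):
    """Algorithmic transformation for cohort B: recode sex (1/2 to 0/1)."""
    return {
        "id": rec["id"],
        "cohort": "B",
        "sex": rec["sex_code"] - 1,
        "height_cm": rec["height_cm"],
        "score": rec["score"],
    }


def standardize(records, key="score"):
    """Replace a cohort-specific score with its within-cohort z-score."""
    values = [r[key] for r in records]
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    out = []
    for r in records:
        rec = {k: v for k, v in r.items() if k != key}
        rec[key + "_z"] = (r[key] - mu) / sd
        out.append(rec)
    return out


# Harmonize each cohort separately, then pool at the individual level.
harmonized_a = standardize([transform_a(r) for r in cohort_a])
harmonized_b = standardize([transform_b(r) for r in cohort_b])
pooled = harmonized_a + harmonized_b

for row in pooled:
    print(row)
```

In this sketch each pooled record keeps a cohort label, so a later analysis could still adjust for or stratify by study. Whether a within-cohort z-score is an acceptable harmonization for a given scale is a substantive decision that the code does not settle.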
In the second half of the presentation, he presented the ATHLOS study. The ATHLOS Project is a five-year project funded by the European Union’s Horizon 2020 program. Its main task was to create a harmonized measure of healthy ageing in order to identify the trajectories and determinants of healthy ageing. To create this measure, data from 18 international longitudinal studies were harmonized. The ATHLOS study used a mixture of ex-ante and ex-post retrospective harmonization. He explained each stage of its data harmonization in detail.
Finally, he gave three pieces of advice on data harmonization. The first is to understand the input data: what was collected, how it was collected, and the quality of the study-specific data. The second is to ensure rigour in the systematic harmonization process and in quality control. Because the studies are heterogeneous, forcing too much harmonization can produce heavily biased variables; some variables should therefore not be harmonized across all studies. The third is to ensure proper documentation. Detailed harmonization reports should be produced to support reproducibility and long-term use.
After the lecture, students asked questions about harmonizing depression scales and about how the data can be used.
At the end of the lecture, Professor Kondo, Professor of Social Epidemiology at KUSPH, thanked Prof. Sanchez-Niubo for teaching us how to harmonize data from cohort studies. It is good news that many useful resources for data harmonization are available.