COVID-19++: A Citation-Aware Covid-19 Dataset for the Analysis of Research Dynamics

Published in 2021 IEEE International Conference on Big Data (Big Data), 2021

Contribution to the workshop on Big Data Analysis for COVID-19 at IEEE Big Data.

Code available here

Research conducted in the context of the Q-AKTIV project.

Abstract: COVID-19 research datasets are crucial for analyzing research dynamics. Most collections of COVID-19 research items do not include cited works and do not have annotations from a controlled vocabulary. Starting with ZB MED KE data on COVID-19, which comprises CORD-19, we assemble a new dataset that includes cited work and MeSH annotations for all records. Furthermore, we conduct experiments on the analysis of research dynamics, in which we investigate predicting links in a co-annotation graph created on the basis of the new dataset. Surprisingly, we find that simple heuristic methods are better at predicting future links than more sophisticated approaches such as graph neural networks.
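To make the link prediction task from the abstract concrete, here is a minimal sketch of a heuristic baseline on a co-annotation graph. It is an illustration only: the abstract does not say which heuristics were evaluated, so the common-neighbors score is an assumption, and the function names and toy records are hypothetical and not taken from the dataset or the released code.

```python
from collections import defaultdict
from itertools import combinations


def build_coannotation_graph(records):
    """Map each term to the set of terms it co-occurs with in any record."""
    neighbors = defaultdict(set)
    for terms in records:
        # Two terms are linked if they annotate the same record.
        for a, b in combinations(sorted(set(terms)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def common_neighbor_scores(neighbors):
    """Score every currently unlinked pair by its number of shared neighbors."""
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already linked, nothing to predict
        shared = len(neighbors[a] & neighbors[b])
        if shared:
            scores[(a, b)] = shared
    return scores


# Hypothetical toy records: each is the list of terms annotating one paper.
records = [
    ["COVID-19", "Pneumonia"],
    ["COVID-19", "Vaccines"],
    ["Pneumonia", "Antiviral Agents"],
    ["Vaccines", "Antiviral Agents"],
]

graph = build_coannotation_graph(records)
scores = common_neighbor_scores(graph)
# Highest score first; ties are broken alphabetically by pair.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

In this toy graph, the two unlinked pairs each share two neighbors, so both are ranked as candidate future links. A temporal evaluation would build the graph from records up to some year and check the ranked pairs against links that appear later.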

@inproceedings{galke2021covid19pp,
    author={Galke, Lukas
        and Seidlmayer, Eva
        and Lüdemann, Gavin
        and Langnickel, Lisa
        and Melnychuk, Tetyana
        and Förstner, Konrad U.
        and Tochtermann, Klaus
        and Schultz, Carsten},
    booktitle={2021 IEEE International Conference on Big Data (Big Data)},  
    title={COVID-19++: A Citation-Aware Covid-19 Dataset for the Analysis of Research Dynamics},
    year={2021},
    pages={4350--4355},
    doi={10.1109/BigData52589.2021.9671730}}