General Issues

Understanding Broad Data Quality Concerns in EM-DAT

Three types of data quality issues can be considered:

Types of Data Quality Issues

1. Disaster events that are missing in EM-DAT.
2. Disaster events that exist but that have missing values, e.g., for the impact variables.
3. Disaster events that are well documented but with attributes that are inaccurate or differ from other sources.

A cross-comparison of EM-DAT with a local database and/or a disaster-specific database can help identify Issue 1 (e.g., Koç & Thieken, 2018¹; Lin et al., 2021²). For an account of missing values for existing events (Issue 2), we refer to Jones et al., 2022³ and the section on Accounting Biases. Issue 3 is partially related to the data collection sources, protocols, or reporting systems generally used by different databases.
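As a rough illustration of such a cross-comparison, the sketch below flags events that appear in a local database but have no counterpart in an EM-DAT extract, and computes the share of missing values for one impact variable. Everything in it is an assumption made for illustration: the field names (`iso`, `type`, `start`, `deaths`), the country code `AAA`, the invented records, and the three-day matching tolerance are hypothetical and are not taken from EM-DAT or from the cited studies.

```python
from datetime import date

# Invented, simplified records. The field names are illustrative only and
# are not the actual EM-DAT column names; "AAA" is a made-up country code.
emdat_events = [
    {"iso": "AAA", "type": "Flood", "start": date(2009, 9, 8), "deaths": 31},
    {"iso": "AAA", "type": "Flood", "start": date(2012, 7, 4), "deaths": None},
]
local_events = [
    {"iso": "AAA", "type": "Flood", "start": date(2009, 9, 7)},
    {"iso": "AAA", "type": "Flood", "start": date(2010, 8, 26)},
    {"iso": "AAA", "type": "Flood", "start": date(2012, 7, 4)},
]


def is_match(a, b, tolerance_days=3):
    """Treat two records as the same event if country and type agree and
    their start dates differ by at most `tolerance_days` (an assumption)."""
    return (
        a["iso"] == b["iso"]
        and a["type"] == b["type"]
        and abs((a["start"] - b["start"]).days) <= tolerance_days
    )


def unmatched(reference, candidate, tolerance_days=3):
    """Return the records of `reference` that have no match in `candidate`."""
    return [
        r for r in reference
        if not any(is_match(r, c, tolerance_days) for c in candidate)
    ]


def missing_share(events, field):
    """Fraction of records whose `field` is absent or None (Issue 2)."""
    if not events:
        return 0.0
    return sum(e.get(field) is None for e in events) / len(events)


# Issue 1: local events with no EM-DAT counterpart.
missing_in_emdat = unmatched(local_events, emdat_events)
print([e["start"].isoformat() for e in missing_in_emdat])

# Issue 2: share of EM-DAT records without a death count.
print(missing_share(emdat_events, "deaths"))
```

In practice, the matching rule is the delicate part: because sources may disagree on start dates and disaster types, a strict match can overstate the number of missing events, so the tolerance and the matching keys should be chosen with the databases at hand in mind.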

Data quality issues within EM-DAT are related to the data collection protocols from dedicated sources. EM-DAT’s completeness reflects the coverage of its sources. Since source reporting has improved over the years, EM-DAT data coverage has improved significantly over the last 30 to 40 years. Nevertheless, gaps and quality issues remain. EM-DAT protocols are meant to guide the way information is monitored and collected from sources. However, no universally applied protocol ensures that different sources report disaster impact and losses using the same guidelines to define, for instance:

the beginning and end of disaster events.
the geographical footprint of a disaster.
impact variables such as deaths (in particular, when computed based on excess mortality), affected people, or economic costs.
the disaster type selected by the sources.

Some references illustrate the issues and challenges related to collecting and maintaining a disaster database, e.g., Guha-Sapir & Misson 1992⁴, Kron et al. 2012⁵, and Wirtz et al. 2014⁶.

To some extent, EM-DAT owes its popularity to its simplicity: it reports disaster events as rows in an Excel table. However, this simplicity comes at the cost of conceptual limitations in dealing with complex and compound events and situations. In such cases, as exemplified in the box below, EM-DAT will probably report the disaster in the same way as the source that presented it. The EM-DAT database manager can only choose to retain some reported figures over others (see Daily Encoding). No model is involved in correcting differences in reporting protocols because this task goes beyond the information monitoring conducted at the CRED by the EM-DAT team.

Fictive Example of Disaster Complexity

If a source reports a heatwave with a certain number of deaths, EM-DAT is likely to record it as such.
If the same heatwave hits a neighboring country, whose institutions have different reporting protocols, EM-DAT will also report the disaster entry based on the source’s numbers.
Since the protocols are different, this will create a systemic bias in EM-DAT.
The event durations may be misaligned, and some sources may have accounted for co-occurring effects, such as droughts, wildfires, and air pollution, when estimating the loss statistics (e.g., deaths, affected people, or costs).
In some cases, the main type could even be different. In databases other than EM-DAT, the event and the numbers may also have another representation.

Biases that result from differences in impact reporting systems are referred to by Gall et al. 2009⁷ as systemic biases. Some studies point to systemic biases by highlighting that EM-DAT does not correlate well with other databases (e.g., Moriyama et al., 2018⁸; Panwar & Sen, 2020⁹). In their article, Gall et al. 2009⁷ cover four other types of biases: time, hazard-related, spatial, and accounting biases. These are illustrated in the next sections.
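One way to quantify how well two databases agree on a set of matched events is a rank correlation of their reported figures. The sketch below computes Spearman's rank correlation with the standard library only. It is not the method used in the cited studies; the two lists of death counts are invented for illustration, and the function names are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented death counts for five events matched across two databases.
deaths_db_a = [10, 250, 40, 3, 1200]
deaths_db_b = [12, 90, 400, 5, 1500]

rho = spearman(deaths_db_a, deaths_db_b)
print(round(rho, 3))
```

A rank-based measure is used here because impact figures are heavily skewed, so a few very large events would otherwise dominate the comparison. A low value does not by itself show which database is closer to the truth; it only indicates that the reporting systems disagree.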

1. Koç, G., and Thieken, A. H. “The Relevance of Flood Hazards and Impacts in Turkey: What Can Be Learned from Different Disaster Loss Databases?” Natural Hazards 91, No. 1, 375–408 (2018). https://doi.org/10.1007/s11069-017-3134-6.
2. Lin, Y. C., Khan, F., Jenkins, S. F., and Lallemant, D. “Filling the Disaster Data Gap: Lessons from Cataloging Singapore’s Past Disasters.” Int. J. Disaster Risk Sci. 12, 188–204 (2021). https://doi.org/10.1007/s13753-021-00331-z.
3. Jones, R. L., Guha-Sapir, D., and Tubeuf, S. “Human and economic impacts of natural disasters: can we trust the global data?” Sci Data 9, 572 (2022). https://doi.org/10.1038/s41597-022-01667-x.
4. Guha-Sapir, D., and Misson, C. “The Development of a Database on Disasters.” Disasters 16, 74–80 (1992). https://doi.org/10.1111/j.1467-7717.1992.tb00378.x.
5. Kron, W., Steuer, M., Löw, P., and Wirtz, A. “How to Deal Properly with a Natural Catastrophe Database – Analysis of Flood Losses.” Natural Hazards and Earth System Sciences 12, No. 3, 535–550 (2012). https://doi.org/10.5194/nhess-12-535-2012.
6. Wirtz, A., Kron, W., Löw, P., and Steuer, M. “The Need for Data: Natural Disasters and the Challenges of Database Management.” Natural Hazards 70, No. 1, 135–157 (2014). https://doi.org/10.1007/s11069-012-0312-4.
7. Gall, M., Borden, K. A., and Cutter, S. L. “When Do Losses Count?: Six Fallacies of Natural Hazards Loss Data.” Bulletin of the American Meteorological Society 90, No. 6, 799–810 (2009). https://doi.org/10.1175/2008BAMS2721.1.
8. Moriyama, K., Sasaki, D., and Ono, Y. “Comparison of Global Databases for Disaster Loss and Damage Data.” Journal of Disaster Research 13, No. 6, 1007–1014 (2018). https://doi.org/10.20965/jdr.2018.p1007.
9. Panwar, V., and Sen, S. “Disaster Damage Records of EM-DAT and DesInventar: A Systematic Comparison.” Economics of Disasters and Climate Change 4, No. 2, 295–317 (2020). https://doi.org/10.1007/s41885-019-00052-0.