Known Issues and Limitations
What Difficulties May Be Encountered While Interpreting EM-DAT Data?
EM-DAT is the only comprehensive, free-access disaster loss database with effective global coverage. However, its limitations stem from the restricted number of sources and from how effectively disasters are reported worldwide. This can lead to biases in the data over which CRED may have limited control, and that could be overlooked in the literature. Nevertheless, EM-DAT remains a key resource for understanding disaster events and impacts. No current impact database is completely accurate. The United Nations emphasizes the importance of global improvements in documenting disasters in global agendas such as the Sendai Framework for Disaster Risk Reduction (SFDRR).
Understanding the limitations of a dataset such as EM-DAT is of paramount importance for those who wish to use the data adequately and mitigate its weaknesses for the following purposes: disaster risk management, emergency planning, scientific research, and raising public awareness.
1 - General Issues
Understanding Broad Data Quality Concerns in EM-DAT
Three types of data quality issues can be considered:
Types of Data Quality Issues
- Disaster events that are missing in EM-DAT.
- Disaster events that exist but that have missing values, e.g., for the impact variables.
- Disaster events that are well documented but with attributes that are inaccurate or differ from other sources.
A cross-comparison of EM-DAT with a local database and/or a disaster-specific database can help identify Issue 1, missing events (e.g., Koç & Thieken, 2018; Lin et al., 2021). For an account of missing values for existing events (Issue 2), we refer to Jones et al. 2021 and the section on Accounting Biases. Issue 3, inaccurate or diverging attributes, is partially related to the data collection sources, protocols, or reporting systems generally used by different databases.
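As a rough illustration of such a cross-comparison, the sketch below flags events from a local database that have no counterpart in EM-DAT. It assumes, for simplicity, that events can be matched on an exact (country code, year, disaster type) key; the field names `iso`, `year`, and `type` and the sample records are hypothetical, and published comparisons typically need looser matching on dates and locations.

```python
def missing_from_emdat(emdat_events, local_events):
    """Return local events with no EM-DAT record for the same country, year, and type."""
    # Build the set of keys present in EM-DAT (exact matching is an assumption).
    emdat_keys = {(e["iso"], e["year"], e["type"]) for e in emdat_events}
    return [
        e for e in local_events
        if (e["iso"], e["year"], e["type"]) not in emdat_keys
    ]


# Made-up illustrative records, not real EM-DAT data.
emdat = [{"iso": "AAA", "year": 2010, "type": "Flood"}]
local = [
    {"iso": "AAA", "year": 2010, "type": "Flood"},
    {"iso": "AAA", "year": 2012, "type": "Storm"},
]

print(missing_from_emdat(emdat, local))
```

An exact key is deliberately strict: it will also flag events that exist in both databases but are classified or dated differently, which is itself a symptom of Issue 3.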
Data quality issues within EM-DAT are related to the data collection protocols from dedicated sources. EM-DAT’s completeness reflects the coverage of its sources. Since source reporting has improved over the years, EM-DAT data coverage has improved significantly over the last 30 to 40 years. Nevertheless, gaps and quality issues remain. EM-DAT protocols are meant to guide the way information is monitored and collected from sources. However, no universally applied protocol ensures that different sources report disaster impact and losses using the same guidelines to define, for instance:
- the beginning and end of disaster events.
- the geographical footprint of a disaster.
- impact variables such as deaths (in particular, when computed based on excess mortality), affected people, or economic costs.
- the disaster type selected by the sources.
Some references illustrate the issues and challenges related to collecting and maintaining a disaster database, e.g., Guha-Sapir & Misson 1992, Kron et al. 2012, and Wirtz et al. 2014.
To some extent, EM-DAT owes its popularity to its simplicity. It reports disaster events as rows in an Excel table. However, this simplicity comes at the cost of conceptual limitations in dealing with complex and compound events and situations. In such cases, as exemplified in the box below, EM-DAT will probably report the disaster in the same way as the source which presented it. The EM-DAT database manager can only choose to select some numbers over some others (see Daily Encoding). However, no model is involved in correcting differences in reporting protocols because this task goes beyond the information monitoring conducted at the CRED by the EM-DAT team.
Fictive Example of Disaster Complexity
- If a source reports a heatwave with a certain number of deaths, EM-DAT is likely to record it as such.
- If the same heatwave hits a neighboring country, whose institutions have different reporting protocols, EM-DAT will also report the disaster entry based on the source’s numbers.
- Since the protocols are different, this will create a systemic bias in EM-DAT.
- The event duration may be misaligned; some may have accounted for co-occurring effects, such as droughts, wildfires, and air pollution, in the estimation of the loss statistics (e.g., deaths, affected people, or costs).
- In some cases, the main type could even be different. In databases other than EM-DAT, the event and the numbers may also have another representation.
Biases that result from differences in the impact reporting systems were generally referred to by Gall et al. 2009 as systemic biases. Some studies point to systemic biases by highlighting that EM-DAT does not correlate well with other databases (e.g., Moriyama et al., 2018; Panwar & Sen, 2020). In their article, Gall et al. 2009 cover five other types of biases: time, hazard-related, threshold, accounting, and geographic biases. These are illustrated in the next sections.
2 - Specific Biases
Understanding Particular Data Quality Concerns in EM-DAT
Time Bias
Time biases result from unequal reporting quality and coverage over time. The figure below shows the occurrence of disasters in EM-DAT, with a significant increase that starts in the 1960s. This increase coincides with the creation of OFDA. In 1973, OFDA started compiling disaster data, and the CRED was created. In 1988, the CRED took over the disaster database and created EM-DAT. In the meantime, communication technologies have improved, with the first personal computers and satellites appearing in the 1970s and the advent of the World Wide Web in the 1990s (see also History of EM-DAT).
Technologies and initiatives can be considered responsible for the dominant trend observed. Therefore, it is challenging to infer insight into the actual drivers of disasters such as climate change, population growth, or disaster risk management. Accordingly, excluding pre-2000 data from trend analyses based on EM-DAT is strongly recommended. From September 2023 onward, the CRED refers to pre-2000 data as Historic data in the EM-DAT Public Table.
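A minimal sketch of this recommendation is given below: records are restricted to the year 2000 onward before yearly counts are computed. The column name `Start Year` and the sample records are assumptions made for illustration; check the actual column names in the file you downloaded.

```python
from collections import Counter


def recent_events(records, first_year=2000):
    """Keep only records whose start year is first_year or later."""
    return [r for r in records if r["Start Year"] >= first_year]


def yearly_counts(records):
    """Count events per start year, sorted chronologically."""
    counts = Counter(r["Start Year"] for r in records)
    return dict(sorted(counts.items()))


# Made-up illustrative records, not real EM-DAT data.
sample_records = [
    {"Start Year": 1985},
    {"Start Year": 2000},
    {"Start Year": 2005},
    {"Start Year": 2005},
]

print(yearly_counts(recent_events(sample_records)))
```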
Hazard-Related Biases
Hazard-related biases result from unequal reporting quality and coverage for different hazard types. For example, in EM-DAT, data related to biological hazards (e.g., epidemics) and extreme temperature hazards (e.g., heat waves) are less well covered, and the coverage is of lower quality. Some hazard-related biases are illustrated in the Accounting Biases and Geographic Biases sections.
Threshold Biases
Threshold biases result from unequal reporting quality and coverage for disasters of different magnitudes. High-impact disasters attract more attention, resulting in better media coverage and reporting. This could lead to threshold biases in EM-DAT. The EM-DAT entry criteria introduce a kind of threshold bias, as shown in the figure below regarding disaster mortality, while some studies have shown locally that small disasters may have a high cumulative impact. Regarding disasters that fit EM-DAT’s entry criteria, it is fair to assume that disasters close to the entry criteria are more likely to be missing than high-impact disasters. However, as shown in the figure below, the cumulative mortality associated with low-mortality events remains small compared with the cumulative impact of higher-mortality events.
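To examine how mortality is distributed across event sizes in a given extract, deaths can be summed within mortality classes. The sketch below is one simple way to do this; the class boundaries (10, 100, 1000) and the sample death counts are hypothetical choices for illustration, not values taken from EM-DAT.

```python
from bisect import bisect_right


def deaths_by_class(death_counts, edges=(10, 100, 1000)):
    """Sum deaths per mortality class.

    With the default edges, the classes are:
    fewer than 10, 10 to 99, 100 to 999, and 1000 or more deaths per event.
    """
    totals = [0] * (len(edges) + 1)
    for deaths in death_counts:
        # bisect_right places a value equal to an edge in the upper class.
        totals[bisect_right(edges, deaths)] += deaths
    return totals


# Made-up per-event death counts, not real EM-DAT data.
sample_deaths = [3, 12, 45, 150, 2500]

print(deaths_by_class(sample_deaths))  # [3, 57, 150, 2500]
```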
Accounting Biases
Accounting biases result from unequal reporting quality and coverage for different impact variables. For instance, in EM-DAT, the economic losses are, on average, less frequently reported than the human impact variables, and the reporting rate may also depend on the hazard type (see the figure below). Furthermore, insured damages are naturally more often reported than uninsured damages, which creates a geographic bias where there is a lack of insurance coverage, as in Africa. For droughts, EM-DAT fails to capture the associated mortality because it is overlooked as an indirect impact, as evidenced by the UNDRR 2021 Global Assessment Report on Droughts.
Moreover, the fact that an impact is reported in EM-DAT does not mean that it is free of accounting bias. In general, direct impacts are often reported by EM-DAT sources, while indirect impact estimates are less available. For instance, direct deaths for a flood event correspond to the number of fatalities occurring during the event, while indirect deaths result from disease outbreaks due to deteriorated sanitary conditions. Yet, indirect mortality is sometimes more important than direct mortality.
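Before relying on an impact variable, it can help to check how often it is actually filled in. The sketch below computes the share of records with a non-missing value for a given variable. The column names `Total Deaths` and `Total Damage` and the sample records are assumptions for illustration, and missing values are assumed to be represented as `None`.

```python
def reporting_rate(records, variable):
    """Return the fraction of records with a non-missing value for `variable`."""
    if not records:
        return 0.0
    reported = sum(1 for r in records if r.get(variable) is not None)
    return reported / len(records)


# Made-up illustrative records, not real EM-DAT data.
sample_impacts = [
    {"Total Deaths": 12, "Total Damage": None},
    {"Total Deaths": 40, "Total Damage": 5000},
    {"Total Deaths": None, "Total Damage": None},
    {"Total Deaths": 7, "Total Damage": None},
]

for name in ("Total Deaths", "Total Damage"):
    print(name, reporting_rate(sample_impacts, name))
```

A high reporting rate only shows that a value exists. It says nothing about whether that value includes indirect impacts.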
Geographic Biases
Geographic biases result from unequal reporting quality and coverage across space. In general, EM-DAT has relatively poorer coverage for Sub-Saharan Africa regarding both the occurrence of events and the accounting of impact variables. Any disaster type may be subject to geographic biases in EM-DAT, as there may be discrepancies between reporting systems from one country to another (see General Issues).
This issue is particularly pronounced regarding heat waves, as shown in the figure below.
Heat waves are often overlooked in Sub-Saharan Africa. About 52% of heatwave events in EM-DAT occurred in nine countries: Japan, India, Pakistan, the USA, followed by Western European countries (France, Belgium, United Kingdom, Spain, and Germany).
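A concentration figure of this kind can be recomputed for any hazard type by counting events per country and summing the largest counts. The sketch below assumes one country label per event; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def top_country_share(countries, n):
    """Return the share of events that occurred in the n most frequent countries."""
    if not countries:
        return 0.0
    counts = Counter(countries)
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total / len(countries)


# Made-up country labels, one per event, not real EM-DAT data.
sample_countries = ["A", "A", "A", "B", "B", "C", "D", "E"]

print(top_country_share(sample_countries, 2))  # 0.625
```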
References