Encoding, Quality Control, and Validation Procedure

How Is EM-DAT Data Encoded and Controlled?

Source identification and collection can be facilitated and partially automated thanks to online services offered by specific sources (e.g., email alert systems, news feeds, or APIs). However, data collection and encoding are always supervised manually by the database manager, who controls the selection of sources, the classification of the event, its spatiotemporal delimitation, and the identification of the impact figures. Data encoding and validation in EM-DAT is a three-step process:

1. Daily Encoding
2. Quality Control and Annual Validation
3. Thematic Reviews

In addition, Automated Procedures and Constraints prevent or flag abnormal values and data formats in the database.

Daily Encoding

The database manager checks daily information using the preferred source list (see EM-DAT Sources). Whenever a new disaster is identified based on a source, it is added to the database. At this first stage, the event is not made public. It becomes public when an entry criterion is met and confirmed by at least two sources. The figures remain subject to change. Any publication or modification made to a public disaster entry becomes visible to the user after the weekly update routine. This routine is usually executed on Mondays but may be triggered by the manager if deemed necessary, e.g., in the case of faulty figures or typos.
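As a rough illustration of this publication rule, the Python sketch below models an entry that becomes public only once an entry criterion is met and at least two sources confirm it. The class, its fields, and the sample identifier are hypothetical assumptions for illustration, not part of any actual EM-DAT software.

```python
from dataclasses import dataclass, field

@dataclass
class DisasterEntry:
    """Hypothetical model of an EM-DAT entry during daily encoding."""
    dis_no: str
    entry_criterion_met: bool = False        # whether an entry criterion is met
    sources: list = field(default_factory=list)
    public: bool = False

def review_entry(entry: DisasterEntry) -> None:
    # An entry is made public only when an entry criterion is met
    # and the event is confirmed by at least two sources.
    if entry.entry_criterion_met and len(entry.sources) >= 2:
        entry.public = True

entry = DisasterEntry("0000-0000-XYZ",       # placeholder identifier
                      entry_criterion_met=True,
                      sources=["OCHA report"])
review_entry(entry)
assert not entry.public                      # only one source so far
entry.sources.append("news agency")
review_entry(entry)
assert entry.public                          # criterion met, two sources confirm
```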

The published impact variables may be selected from one or more sources; an event can therefore be validated from several sources of information. For example, the human impact can come from an OCHA report and the economic data from a reinsurance report, depending on each source's specific expertise. If the figures differ between the sources, the database manager decides which ones to attribute to the disaster. The choice depends on several elements: the figure itself, the area and period to which it refers, the chronology of the sources, and their degree of reliability. Because this task is complex and case-dependent, there is no pre-determined rule for selecting figures, and the database manager makes the final choice. Some examples of general, though not systematic, decision rules are illustrated in the box below.

The rules here are only informal, and the database manager may disregard them. For example, suppose an official report mentions 200 deaths and a news article says, “regional authorities estimate the death toll now stands at 243 dead and missing”. Although this is a newspaper article, the precision of the statement suggests that this figure is more reliable than the one in the official report.
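Although no fixed algorithm exists for this choice, the informal preference for more reliable and more recent figures could be sketched as follows. The fields, the reliability score, and the dates are purely illustrative assumptions, not EM-DAT rules.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ReportedFigure:
    """One impact figure reported by one source (illustrative only)."""
    value: int
    source: str
    report_date: date
    reliability: int  # subjective score assigned by the manager (assumption)

def prefer_figure(a: ReportedFigure, b: ReportedFigure) -> ReportedFigure:
    # Informal heuristic, not an EM-DAT rule: prefer the figure judged
    # more reliable; break ties with the more recent report.
    if a.reliability != b.reliability:
        return a if a.reliability > b.reliability else b
    return a if a.report_date >= b.report_date else b

official = ReportedFigure(200, "official report", date(2024, 3, 1), 3)
news = ReportedFigure(243, "news article", date(2024, 3, 4), 4)
print(prefer_figure(official, news).value)  # 243, as in the example above
```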

Quality Control and Annual Validation

Quality control and annual validation is a systematic check of all the entries starting in a specific year. It typically takes place at the beginning of the following year. During this validation, all the disasters that took place in the previous year are reviewed to consolidate the data, identify possible additional sources, and modify the published figures accordingly. In addition, the georeferencing, i.e., the more precise attribution of the disaster to GAUL level 1 or 2 zones, is finalized during the annual validation period (see GAUL Index and Admin Levels).
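As a hypothetical sketch of this step, the snippet below collects the previous year's entries whose GAUL georeferencing is still missing and therefore must be finalized during annual validation. The record structure is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DisasterRecord:
    """Hypothetical record structure, for illustration only."""
    dis_no: str
    year: int
    gaul_admin_units: Optional[List[str]] = None  # GAUL level 1/2 zone codes

def annual_review_queue(records: List[DisasterRecord],
                        year: int) -> List[DisasterRecord]:
    # During annual validation, last year's entries without finalized
    # georeferencing are queued for review.
    return [r for r in records if r.year == year and not r.gaul_admin_units]
```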

Thematic Reviews

The CRED periodically conducts thematic reviews of disasters to mitigate the database’s weaknesses (see Known Issues). This task involves systematically checking entries for a type of disaster over a given period or region. The revision can be a data analysis for further quality control, a systematic review of the scientific literature, or a comparison with other existing databases.
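A thematic comparison with another database could, for instance, start from a simple per-year count of entries for one disaster type, as in the minimal sketch below. The dictionary-based record structure is an assumption, not EM-DAT's actual schema.

```python
def compare_with_external(emdat, external, disaster_type, years):
    """Count entries per year in two datasets for one disaster type.

    Illustrative sketch: both inputs are lists of dicts with
    'type' and 'year' keys (hypothetical structure).
    """
    def counts(entries):
        table = {y: 0 for y in years}
        for e in entries:
            if e["type"] == disaster_type and e["year"] in table:
                table[e["year"]] += 1
        return table
    # Years where the two counts diverge are candidates for manual review.
    return {"EM-DAT": counts(emdat), "external": counts(external)}
```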

From September 2023 onward, these substantial database content updates are announced on the EM-DAT website and in the documentation release notes for tracking purposes (see Introduction). The Entry Date and Last Update columns have also been introduced in the EM-DAT Public Table.
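For example, a user could rely on these columns to track changes between downloads. The pandas sketch below assumes a local export of the EM-DAT Public Table; the file name is a placeholder.

```python
import pandas as pd

# Hypothetical local export of the EM-DAT Public Table (file name is a placeholder).
df = pd.read_excel("emdat_public_table.xlsx")

# The Last Update column records when an entry was last modified, so users
# can isolate entries revised since a given date, e.g., September 2023.
recent = df[pd.to_datetime(df["Last Update"]) >= "2023-09-01"]
print(len(recent), "entries updated since September 2023")
```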

Automated Procedures and Constraints

In 2023, the constraints of the EM-DAT database were strengthened. These constraints define the domain of values that can be encoded and, if correctly set, prevent encoding errors in a value or its format. In addition, automated procedures check for anomalies that are likely to be errors, i.e., cases without enough certainty to justify a constraint but likely enough to be flagged to the database manager for verification.

Currently, constraints and automated routines check the consistency of date and time fields, latitude and longitude values, and hazard magnitude values. These procedures are implemented incrementally, each time a preventable error is discovered. Hence, by detecting and reporting issues, users may contribute to developing these routines and improving EM-DAT data quality.
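The two-tier logic, hard constraints that reject a value outright versus anomaly checks that flag it for verification, might look like the following sketch. The field names and thresholds are illustrative assumptions, not EM-DAT's actual rules.

```python
from datetime import date

def check_entry(entry: dict) -> list:
    """Return a list of warnings; raise on hard constraint violations.

    Illustrative only: field names and thresholds are assumptions.
    """
    warnings = []

    # Hard constraints: values outside these domains cannot be encoded.
    lat, lon = entry["latitude"], entry["longitude"]
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"invalid coordinates: {lat}, {lon}")
    if entry["start_date"] > entry["end_date"]:
        raise ValueError("start date is after end date")

    # Soft anomaly checks: likely errors, flagged to the database
    # manager for verification rather than rejected outright.
    if entry.get("magnitude") is not None and entry["magnitude"] > 9.5:
        warnings.append("earthquake magnitude above 9.5 is suspicious")
    if entry["end_date"] > date.today():
        warnings.append("end date lies in the future")

    return warnings
```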

Issue Reporting

The CRED encourages any external initiative that can help improve EM-DAT's data quality. Any problem with the data content (such as errors or missing data) can be reported to CRED by email using the contact address (see Send Us an Email). Regarding missing entries, be aware that the CRED only publishes disasters for which reliable figures are available and corroborated by at least two sources (see Daily Encoding). The CRED reserves the right to modify the data according to subsequent notifications. For issues concerning specific disaster events, the problem should be explained and reported together with a list of the related Dis No. values (see Column Description).

For less specific issues, users who have analyzed the quality of EM-DAT are encouraged to share their reports or scientific studies with the EM-DAT team to improve the database (see Contributing).