ECONOMICS WORKING PAPER SERIES
EXTENDING THE CEPII GRAVITY DATA SET

Tamara Gurevich
Peter Herman
Serge Shikher
Ricky Ubee
Working Paper 2017–01–A
500 E Street SW
Washington, DC 20436
January 2017

Tamara Gurevich, U.S. International Trade Commission
Tamara.Gurevich@usitc.gov
Peter Herman, U.S. International Trade Commission
Peter.Herman@usitc.gov
Serge Shikher, U.S. International Trade Commission
Serge.Shikher@usitc.gov
Ricky Ubee, U.S. International Trade Commission
Ravinder.Ubee@usitc.gov

## 1 Introduction

In some cases, such as comlang_ethno or conflict, treating a variable as time-invariant is debatable, since the linguistic composition of countries, as well as their conflict status, may change over time. Here, we have elected to use the 2006 values present in the original CEPII database. In most cases, we have done so because the original database treats these variables as time-invariant. In other cases, such as conflict, it is unclear what criteria were followed in constructing the original variable, which makes a consistent extension difficult. Additionally, we have chosen not to introduce countries that gained independence after 2006, in order to preserve the balanced panel of 224 countries in the original database.
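As a minimal sketch of the carry-forward step described above, assume the panel is held in a Python dictionary keyed by a hypothetical (iso_o, iso_d, year) tuple. The function copies each pair's 2006 values of the held-fixed variables into the extension years and creates rows only for pairs already present in 2006, so no new countries or pairs enter the panel. The names `carry_forward`, `FROZEN_VARS`, and the record layout are illustrative assumptions, not part of the CEPII or USITC files, and `FROZEN_VARS` lists only the two variables named in the text.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical layout: (iso_o, iso_d, year) -> {variable name: value}
Key = Tuple[str, str, int]
Row = Dict[str, float]

# Illustrative subset of variables held at their 2006 values (assumption).
FROZEN_VARS = ("comlang_ethno", "conflict")
BASE_YEAR = 2006


def carry_forward(panel: Dict[Key, Row], new_years: Iterable[int]) -> Dict[Key, Row]:
    """Copy BASE_YEAR values of FROZEN_VARS into each year in new_years.

    Only pairs observed in BASE_YEAR receive new rows, so the set of
    countries and pairs in the panel is left unchanged.
    """
    new_years = list(new_years)  # allow repeated iteration
    extended = {key: dict(row) for key, row in panel.items()}
    for (iso_o, iso_d, year), row in panel.items():
        if year != BASE_YEAR:
            continue
        for t in new_years:
            target = extended.setdefault((iso_o, iso_d, t), {})
            for var in FROZEN_VARS:
                if var in row:
                    target[var] = row[var]
    return extended


panel = {("USA", "CAN", 2006): {"comlang_ethno": 1.0, "conflict": 0.0}}
extended = carry_forward(panel, range(2007, 2016))
print(len(extended))  # 10 rows: 2006 plus 2007-2015
print(extended[("USA", "CAN", 2015)])
```

Restricting new rows to pairs observed in the base year is one simple way to keep the country set fixed; a real extension would also need the time-varying variables filled from their own sources.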

## 2 Methodology for Updated Variables

### 2.3 RTAs and Currency Unions

We update the CEPII variables rta and comcur using a data set created by Egger and Larch (2008) based on the list of trade agreements provided by the WTO. However, the WTO list of trade agreements and common currencies appears to be inconsistent with the original CEPII definitions of rta and comcur, as is evident from the sharp increase in the number of RTAs and currency unions between 2006 and 2007. For example, in 2006, the final year of the original CEPII database, 2,734 country-pairs are in an RTA. The Egger and Larch (2008) data, used here to extend the CEPII data set, show 6,470 country-pairs in an RTA in the same year, more than twice as many, suggesting substantial disagreement between the two data sources about what qualifies as an RTA.
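The comparison above amounts to counting flagged country-pairs in a year that both sources cover. A minimal sketch, assuming each source is a dictionary keyed by a hypothetical (iso_o, iso_d, year) tuple with a 0/1 RTA indicator; the function names and toy data are illustrative, not drawn from either data set.

```python
from typing import Dict, Tuple

# Hypothetical layout: (iso_o, iso_d, year) -> 0/1 RTA indicator
Key = Tuple[str, str, int]


def pairs_in_rta(rta: Dict[Key, int], year: int) -> int:
    """Count country-pairs flagged as being in an RTA in the given year."""
    return sum(1 for (_, _, y), flag in rta.items() if y == year and flag == 1)


def overlap_ratio(cepii: Dict[Key, int], larch: Dict[Key, int], year: int) -> float:
    """Ratio of WTO/Larch to CEPII RTA counts in a year both sources cover."""
    base = pairs_in_rta(cepii, year)
    if base == 0:
        raise ValueError(f"CEPII reports no RTA pairs in {year}")
    return pairs_in_rta(larch, year) / base


# Toy data for illustration only; not drawn from either source.
cepii = {("A", "B", 2006): 1, ("A", "C", 2006): 0, ("B", "C", 2006): 1}
larch = {("A", "B", 2006): 1, ("A", "C", 2006): 1, ("B", "C", 2006): 1}
print(overlap_ratio(cepii, larch, 2006))  # 1.5
```

Applied to the 2006 counts reported in the text, the same ratio is 6,470 / 2,734 ≈ 2.37; a ratio far from 1 in an overlap year signals that the two sources define the indicator differently.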
In light of this discrepancy, we create two additional variables, rta_larch and comcur_larch, which use the WTO/Larch data for all years rather than only for 2007–2015. The rta and comcur variables, by contrast, take the values provided by CEPII for 1948–2006 and the values provided by WTO/Larch for 2007–2015.
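The two constructions can be sketched as follows, again assuming a hypothetical dictionary layout keyed by (iso_o, iso_d, year). `build_columns` is an illustrative name; it returns the spliced series (CEPII through 2006, WTO/Larch afterwards, as for rta and comcur) alongside the all-years WTO/Larch series (as for rta_larch and comcur_larch). The same function would apply unchanged to the common-currency indicator.

```python
from typing import Dict, Tuple

# Hypothetical layout: (iso_o, iso_d, year) -> 0/1 indicator (rta or comcur)
Key = Tuple[str, str, int]
LAST_CEPII_YEAR = 2006


def build_columns(
    cepii: Dict[Key, int], larch: Dict[Key, int]
) -> Tuple[Dict[Key, int], Dict[Key, int]]:
    """Return (spliced, larch_all_years) for one indicator.

    spliced:         CEPII values through LAST_CEPII_YEAR, WTO/Larch afterwards
                     (the construction of rta and comcur).
    larch_all_years: WTO/Larch values in every year
                     (the construction of rta_larch and comcur_larch).
    """
    spliced = {k: v for k, v in cepii.items() if k[2] <= LAST_CEPII_YEAR}
    spliced.update({k: v for k, v in larch.items() if k[2] > LAST_CEPII_YEAR})
    larch_all_years = dict(larch)
    return spliced, larch_all_years


# Toy data for illustration only.
cepii = {("A", "B", 2005): 0, ("A", "B", 2006): 0}
larch = {("A", "B", 2005): 1, ("A", "B", 2006): 1, ("A", "B", 2007): 1}
rta, rta_larch = build_columns(cepii, larch)
print(rta)        # 2005 and 2006 from CEPII (0), 2007 from Larch (1)
print(rta_larch)  # every year from Larch (1)
```

In the toy data the spliced series jumps from 0 to 1 between 2006 and 2007 purely because the source changes, which is the kind of break that the all-years WTO/Larch variant avoids.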