Tamara Gurevich
Peter Herman
Serge Shikher
Ricky Ubee
Working Paper 2017–01–A
500 E Street SW
Washington, DC 20436
January 2017
Special thanks to Nabil Abbyad and Grace Kenneally for research assistance with this working paper.
Office of Economics working papers are the result of ongoing professional research of USITC Staff and are solely meant to represent the opinions and professional research of individual authors. These papers are not meant to represent in any way the views of the U.S. International Trade Commission or any of its individual Commissioners. Working papers are circulated to promote the active exchange of ideas between USITC Staff and recognized experts outside the USITC and to promote professional development of Office Staff by encouraging outside professional critique of staff research.

Extending the CEPII Gravity Data Set
Tamara Gurevich, Peter Herman, Serge Shikher, and Ricky Ubee
Office of Economics Working Paper 2017–01–A
This research note describes in detail the process and data sources used in the CEPII gravity data set update for the years 2007 to 2015. This data update preserves the nomenclature and the structure of the original CEPII data for easier integration into ongoing studies.
Tamara Gurevich, U.S. International Trade Commission
Peter Herman, U.S. International Trade Commission
Serge Shikher, U.S. International Trade Commission
Ricky Ubee, U.S. International Trade Commission


The gravity data set made available by CEPII (Head et al., 2010) has become a mainstay in trade research.
However, the data set has not been updated since 2006. This research note describes the process and data sources used in constructing an update that spans the years 2007–2015. We provide two data files: one containing only the updated variables for the years 2007–2015, and another containing all years of the original CEPII data together with the updated years, spanning the entire period 1948–2015.
This update preserves the nomenclature and structure of the original CEPII data set. The extension is constructed by first creating a balanced panel of bilateral relations between the 224 countries present in the original CEPII database for the years 2007–2015. We then separate the CEPII variables into time-varying and time-invariant groups, consistent with the original CEPII definitions. We treat all but a handful of variables as time-invariant and assign their 2006 values from the original CEPII data set to all subsequent years. The variables that are expected to vary significantly over time (GDP, GDP per capita, population, RTAs, common currency, and GATT/WTO membership) are updated with the appropriate data for each year, subject to data availability. For details on time-varying and time-invariant variables, see tables 2 and 3 in the data appendix. The updates to time-varying variables are discussed in greater detail in the following section.
In some cases, such as comlang_ethno or conflict, the assumption of time invariance is debatable, since the linguistic composition of countries, as well as their conflict status, may change over time. Here, we have elected to use the 2006 values present in the original CEPII database. In most cases, we have done so because the original database treats these variables as time-invariant. In other cases, such as the conflict variable, it is unclear what criteria were used to construct the original variable, making a consistent extension difficult. Additionally, we have chosen not to introduce countries that gained independence after 2006, in order to preserve the balanced panel of 224 countries in the original database.
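The panel construction and carry-forward step described above can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name extend_time_invariant, the column names iso_o and iso_d, the placeholder country codes, and the exclusion of self-pairs are all assumptions made for illustration.

```python
from itertools import permutations

def extend_time_invariant(base_rows, countries, years):
    """Build a balanced origin-destination-year panel, copying base-year values into each year.

    base_rows maps (iso_o, iso_d) to a dict of time-invariant values taken from 2006.
    Self-pairs (origin == destination) are excluded here, which is an assumption.
    """
    panel = []
    for year in years:
        for iso_o, iso_d in permutations(countries, 2):
            row = {"iso_o": iso_o, "iso_d": iso_d, "year": year}
            # Pairs missing from the base year simply get no extra columns.
            row.update(base_rows.get((iso_o, iso_d), {}))
            panel.append(row)
    return panel

# Toy example with three placeholder country codes and made-up values.
countries = ["AAA", "BBB", "CCC"]
base_2006 = {pair: {"contig": 0, "comlang_ethno": 1} for pair in permutations(countries, 2)}
panel = extend_time_invariant(base_2006, countries, range(2007, 2016))
print(len(panel))  # 3 * 2 ordered pairs * 9 years = 54
```

Under the same no-self-pair assumption, 224 countries over nine years would give 224 × 223 × 9 = 449,568 rows; the actual row count of the released files may differ if the original data set treats self-pairs differently.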

Methodology for Updated Variables

2.1 GDP and GDP per capita

We use data from the World Bank's World Development Indicators (WDI) database to update GDP and GDP per capita of both trading partners.
The CEPII variables we are updating are gdp_o, gdp_d, gdpcap_o, and gdpcap_d. The values in these variables are measured in current U.S. dollars, consistent with the original CEPII data.
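As a rough sketch of how country-year GDP figures map onto the four bilateral variables, the Python snippet below attaches origin and destination values to each panel row. The function attach_gdp, the lookup-table layout, and the toy numbers are hypothetical; only the variable names gdp_o, gdp_d, gdpcap_o, and gdpcap_d come from the text. Missing country-years are left as None, which is one possible way of handling gaps in data availability.

```python
def attach_gdp(panel, gdp, gdpcap):
    """Add gdp_o, gdp_d, gdpcap_o, gdpcap_d to each row from country-year lookups.

    gdp and gdpcap map (iso, year) to a value in current U.S. dollars.
    Country-years absent from a lookup are recorded as None.
    """
    for row in panel:
        for side in ("o", "d"):
            key = (row["iso_" + side], row["year"])
            row["gdp_" + side] = gdp.get(key)
            row["gdpcap_" + side] = gdpcap.get(key)
    return panel

# Made-up values for two placeholder countries.
gdp = {("AAA", 2007): 1.5e9, ("BBB", 2007): 4.0e8}
gdpcap = {("AAA", 2007): 3000.0}  # BBB deliberately missing
gdp_rows = attach_gdp([{"iso_o": "AAA", "iso_d": "BBB", "year": 2007}], gdp, gdpcap)
print(gdp_rows[0])
```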

2.2 Population

Population data come from the WDI database described above. We update the variables pop_o and pop_d for each year in the period 2007–2015.

2.3 RTAs and Currency Unions

We update the CEPII variables rta and comcur using a data set created by Egger and Larch (2008) based on the list of trade agreements provided by the WTO.
However, the WTO list of trade agreements and common currencies appears to be inconsistent with the original CEPII definitions of rta and comcur, as is evident from a rather large increase in the number of RTAs and common currency unions between the years 2006 and 2007. For example, the final year of the original CEPII database, 2006, exhibits 2,734 country-pairs in an RTA. The data collected by Egger and Larch (2008), used here to extend the CEPII data set, show 6,470 country-pairs in an RTA in 2006, suggesting a significant disagreement between the two data sources as to what qualifies as an RTA.
In light of this discrepancy, we create two additional variables, rta_larch and comcur_larch, which use the WTO/Larch data for all years rather than only for 2007–2015. The rta and comcur variables, by contrast, take the values provided by CEPII for the years 1948–2006 and the values provided by WTO/Larch for the years 2007–2015.
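The difference between the spliced variable and its Larch-only counterpart can be sketched in Python as follows. The functions splice and pairs_in_rta, the dictionary layout, and the toy country pairs are assumptions for illustration; they are not taken from the authors' code or from either source data set.

```python
def splice(cepii, larch, last_cepii_year=2006):
    """Return {(iso_o, iso_d, year): {"rta": ..., "rta_larch": ...}}.

    rta uses the CEPII value through last_cepii_year and the Larch value afterwards;
    rta_larch uses the Larch value in every year.
    """
    out = {}
    for key in sorted(set(cepii) | set(larch)):
        year = key[2]
        spliced_value = cepii.get(key) if year <= last_cepii_year else larch.get(key)
        out[key] = {"rta": spliced_value, "rta_larch": larch.get(key)}
    return out

def pairs_in_rta(rows, column, year):
    """Count country-pairs flagged as being in an RTA in a given year."""
    return sum(1 for (_, _, y), row in rows.items() if y == year and row[column] == 1)

# Toy data: the two sources disagree about the pair AAA-BBB in 2006.
cepii = {("AAA", "BBB", 2006): 0, ("AAA", "CCC", 2006): 1}
larch = {("AAA", "BBB", 2006): 1, ("AAA", "CCC", 2006): 1,
         ("AAA", "BBB", 2007): 1, ("AAA", "CCC", 2007): 1}
spliced = splice(cepii, larch)
print(pairs_in_rta(spliced, "rta", 2006), pairs_in_rta(spliced, "rta", 2007))
print(pairs_in_rta(spliced, "rta_larch", 2006), pairs_in_rta(spliced, "rta_larch", 2007))
```

In this toy case the spliced rta count jumps between 2006 and 2007 purely because the source changes, while rta_larch is stable across the seam. The same pattern would apply to comcur and comcur_larch.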

2.4 WTO Membership

We update the variables gatt_o and gatt_d for countries that joined the WTO after 2006 using the WTO’s membership timeline.
For each of these countries, the gatt variables take the value of one in every year greater than or equal to the year in which the country joined the WTO. For the full list of countries and accession dates see table 1 in the Data Appendix below.
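The accession rule can be written as a short Python sketch. The function gatt_flag and the dictionary names are hypothetical, and the three accession years shown (Viet Nam 2007, Ukraine 2008, Russia 2012) are included only as illustrative entries; the complete list should be taken from table 1.

```python
def gatt_flag(iso, year, member_2006, accession_year):
    """Return 1 if the country is a GATT/WTO member in the given year, else 0.

    member_2006 holds the 2006 membership value carried over from the original data;
    accession_year holds the year of WTO accession for countries joining after 2006.
    """
    if member_2006.get(iso, 0) == 1:
        return 1  # already a member in 2006, so a member in all later years
    joined = accession_year.get(iso)
    return 1 if joined is not None and year >= joined else 0

# Illustrative entries only; see table 1 for the full list of accessions.
accession_year = {"VNM": 2007, "UKR": 2008, "RUS": 2012}
member_2006 = {"USA": 1, "VNM": 0, "UKR": 0, "RUS": 0}

rus_flags = {y: gatt_flag("RUS", y, member_2006, accession_year) for y in range(2007, 2016)}
print(rus_flags)
```

The same function would be applied to the origin country to fill gatt_o and to the destination country to fill gatt_d.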


This research note provides documentation for our updated CEPII data set that extends the set of gravity variables from 2006 to 2015. We attempt to do so in a way that preserves consistency with the original CEPII data set and readily permits its usage in gravity research, past and present. In particular, we update variables for GDP, GDP per capita, population, and WTO membership. Time-invariant variables are defined so as to be consistent with the original CEPII data.
We face a greater challenge when updating the rta and comcur variables. CEPII does not provide documentation describing the origins of these two variables, and they appear inconsistent with the WTO list of accepted trade agreements and common currency unions. To address this issue, we use the WTO data for the years 2007–2015 and provide two additional variables that are WTO-consistent in all years, one for the trade agreements and one for the common currency unions.
We expect that this updated CEPII data will prove helpful to researchers at USITC and to trade researchers and gravity modelers outside the USITC.

Data Appendix

On request, the authors can provide a data archive that consists of several source files, the code to combine the data, and two final data sets.
The data were generated using the R code “Cepii Extender v1.R”, which requires the following files as inputs:
The GDP and GDP per capita variables gdp_o, gdp_d, gdpcap_o, and gdpcap_d come from the World Bank's WDI data set. They are all measured in current U.S. dollars and can be found in the file “GDPs_ _CurrentUSdollars.csv”.
The population variables pop_o and pop_d were updated using figures from the World Bank's WDI. The file “WDI_population.dta” contains these data.
The variables rta and comcur were updated using data made available by Mario Larch.
Larch's data are in the file “rta_20160215_stata12.dta”.
WTO Membership was determined based on the list provided by the WTO.
The new entrants were:
Table 1: New Entrants
Table 2: CEPII Gravity Time-Invariant Variables
Table 3: CEPII Gravity Time-Varying Variables


Egger, Peter H., and Larch, Mario, "Interdependent Preferential Trade Agreement Memberships: An Empirical Analysis", Journal of International Economics 76, 2 (2008), pp. 384–399.
Head, Keith, Mayer, Thierry, and Ries, John, "The erosion of colonial trade linkages after independence", Journal of International Economics 81, 1 (2010), pp. 1–14.