Frequently Asked Questions (FAQs) about Data


How to reduce the size of the ITPD-E?

You can remove the long string variables to reduce the size of the database in memory and on disk. For example, the complete database in Stata's .dta file format is about 6.8 GB. If the long string variables (country names, industry names) are removed, the resulting .dta file can be as small as 1.1 GB. Countries and industries are still identified by their country and industry codes.
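As a rough illustration of the idea, the sketch below copies a comma-delimited file while omitting the long string columns. It uses only the Python standard library. The name columns (exporter_name, importer_name, industry_descr), the industry_id and trade columns, and the sample row are hypothetical placeholders, so check the actual ITPD-E header before adapting it.

```python
import csv
import io

# Hypothetical names for the long string variables; the real ITPD-E
# column names may differ.
LONG_STRING_COLUMNS = {"exporter_name", "importer_name", "industry_descr"}


def drop_columns(src, dst, columns_to_drop):
    """Copy CSV rows from src to dst, omitting the named columns."""
    reader = csv.DictReader(src)
    kept = [name for name in reader.fieldnames if name not in columns_to_drop]
    writer = csv.DictWriter(
        dst, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # dropped columns are ignored by the writer
    return kept


# A tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "exporter_iso3,exporter_name,importer_iso3,importer_name,"
    "year,industry_id,industry_descr,trade\n"
    'ROU,Romania,COD,"Congo, Dem. Rep. of the",2010,1,Wheat,0.5\n'
)
slimmed = io.StringIO()
kept_columns = drop_columns(sample, slimmed, LONG_STRING_COLUMNS)
```

With real data, src and dst would be open file handles rather than in-memory buffers. The codes stay in the output, so the names can be re-attached later from a separate lookup file.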

How to merge the ITPD-E with DGD?

The steps are (1) read in DGD for 1993-2004, (2) append DGD for 2005-2016, (3) rename all instances of ROM to ROU in DGD, (4) merge ITPD-E with DGD on exporter_iso3==iso3_o, importer_iso3==iso3_d, and year. 
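The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' procedure. It assumes that both files are comma-delimited, that the rename applies to the iso3_o and iso3_d columns, and that only matched rows are kept (an inner join). The distance and trade columns and all sample values are made up.

```python
import csv
import io


def read_rows(handle):
    """Read a comma-delimited file into a list of dictionaries."""
    return list(csv.DictReader(handle))


def merge_itpde_with_dgd(itpde_rows, dgd_parts):
    # Steps 1-2: read in the DGD parts and append them into one list.
    dgd_rows = [dict(row) for part in dgd_parts for row in part]

    # Step 3: rename ROM to ROU in the DGD country-code columns.
    for row in dgd_rows:
        for column in ("iso3_o", "iso3_d"):
            if row[column] == "ROM":
                row[column] = "ROU"

    # Step 4: merge on exporter_iso3==iso3_o, importer_iso3==iso3_d, year.
    dgd_index = {(r["iso3_o"], r["iso3_d"], r["year"]): r for r in dgd_rows}
    merged = []
    for row in itpde_rows:
        key = (row["exporter_iso3"], row["importer_iso3"], row["year"])
        match = dgd_index.get(key)
        if match is not None:  # assumption: keep matched rows only
            merged.append({**match, **row})
    return merged


# Made-up sample data standing in for the real files.
dgd_1993_2004 = read_rows(io.StringIO("iso3_o,iso3_d,year,distance\nROM,USA,2000,7000\n"))
dgd_2005_2016 = read_rows(io.StringIO("iso3_o,iso3_d,year,distance\nROU,USA,2010,7000\n"))
itpde = read_rows(io.StringIO(
    "exporter_iso3,importer_iso3,year,trade\nROU,USA,2000,1.0\nROU,USA,2010,2.0\n"
))

merged = merge_itpde_with_dgd(itpde, [dgd_1993_2004, dgd_2005_2016])
```

Without the rename in step 3, the 2000 row in this sample would fail to match, because the sample DGD rows use ROM for that year while the ITPD-E rows use ROU.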



How many countries and years does the dataset cover?

The dataset provides annual records from 1948 through 2016 and covers a total of 285 countries and territories. The number of countries in any given year varies, however, because of historical changes such as the merging or separation of countries. For example, Slovakia does not appear in the data until the year in which it gained independence.

Does it include trade data?

No. The dataset was designed to be a companion for trade data but does not include any of that information itself. While constructing the Dynamic Gravity dataset, we worked to ensure that it would match easily to standard sources of bilateral trade data, namely Comtrade data. In most cases, the iso3_o and iso3_d codes should match without issue to bilateral trade data (see below for the exceptions).

Will the dataset be updated in the future?

We intend to update the dataset regularly with recently reported data. Updates for some variables are contingent on the availability of third-party sources.

How do I keep track of the dataset versions?

  • The initial release of the dataset is version 1.0. The documentation associated with that release is documentation version 1.00.
  • Minor updates to the documentation associated with the dataset version 1.0 will receive version numbers 1.01, 1.02, etc.
  • Minor updates to the dataset version 1.0 will receive version numbers 1.1, 1.2, etc. Documentation associated with these updates to the dataset will be numbered version 1.10, 1.11 ... 1.20, 1.21, etc.
  • Major updates to the dataset will be released as version 2.0, 3.0, etc., with the associated documentation numbered version 2.00, 2.01, etc., followed by version 3.00, 3.01, etc.
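Read together, the rules above imply that the first digit after the decimal point of a documentation version identifies the dataset version it belongs to. The sketch below encodes that reading. It is an inference from the list rather than an official rule, it assumes no more than nine minor dataset updates per major version, and the function name is hypothetical.

```python
def dataset_version_for(doc_version):
    """Return the dataset version that a documentation version refers to.

    Example: documentation 1.21 belongs to dataset 1.2.
    """
    major, _, minor = doc_version.partition(".")
    if not (major.isdigit() and minor.isdigit() and len(minor) == 2):
        raise ValueError(f"expected a version like '1.02', got {doc_version!r}")
    # The first decimal digit is the dataset's minor version; the second
    # digit counts documentation-only revisions.
    return f"{major}.{minor[0]}"


examples = {v: dataset_version_for(v) for v in ("1.00", "1.02", "1.21", "2.01")}
```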

Can I use the dataset with Stata?

Yes. The dataset is stored in comma-delimited format. Stata's import delimited command allows Stata to read comma-delimited files with ease.

How do I merge your data with Comtrade or WITS trade data?

In most cases, the iso3_o and iso3_d codes should match without issue to bilateral trade data. The combination of iso3_o, iso3_d, and year uniquely identifies an observation in the dataset. However, the table below lists the countries we are aware of whose letter codes differ among the Dynamic Gravity, Comtrade, and WITS datasets.

Country name                        Dynamic Gravity                   Comtrade  WITS
Congo, Dem. Rep. of the             ZAR (1971-1997), COD (1998-2016)
East Timor                          TLS                               TLS       TMP
Montenegro                          MNE                               MNE       MNT
Neutral Zone                        NTZ                               n/a       NZE
Pacific Islands                     PCI                               PCI       PCE
Romania                             ROM (1948-2001), ROU (2002-2016)
Sikkim                              SKM                               n/a       SIK
Taiwan                              TWN                               490       OAS
U.S. Miscellaneous Pacific Islands  PUS                               n/a       USP
Vietnam, South                      VNM                               VNM       SVR
Yemen, South                        YMD                               YMD       YDR
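One way to apply the table before merging is a small lookup that converts Comtrade or WITS codes into Dynamic Gravity codes. The sketch below includes only the rows for which the table lists a code in every relevant column. The Democratic Republic of the Congo and Romania are left out because their codes depend on the year. The function and dictionary names are hypothetical.

```python
# WITS code -> Dynamic Gravity code, taken from the table above.
WITS_TO_DYNAMIC_GRAVITY = {
    "TMP": "TLS",  # East Timor
    "MNT": "MNE",  # Montenegro
    "NZE": "NTZ",  # Neutral Zone
    "PCE": "PCI",  # Pacific Islands
    "SIK": "SKM",  # Sikkim
    "OAS": "TWN",  # Taiwan
    "USP": "PUS",  # U.S. Miscellaneous Pacific Islands
    "SVR": "VNM",  # Vietnam, South
    "YDR": "YMD",  # Yemen, South
}

# Comtrade code -> Dynamic Gravity code; only Taiwan differs among the
# rows where Comtrade has an entry.
COMTRADE_TO_DYNAMIC_GRAVITY = {
    "490": "TWN",  # Taiwan
}


def to_dynamic_gravity(code, source):
    """Translate a code from 'wits' or 'comtrade'; pass others through."""
    tables = {
        "wits": WITS_TO_DYNAMIC_GRAVITY,
        "comtrade": COMTRADE_TO_DYNAMIC_GRAVITY,
    }
    return tables[source].get(code, code)


converted = [to_dynamic_gravity(c, "wits") for c in ("TMP", "USA", "OAS")]
```

Codes that do not appear in the lookup are returned unchanged, which matches the statement that most codes match without issue.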

How do I merge this dataset with other datasets?

The Dynamic Gravity dataset uses ISO alpha-3 codes to identify countries. A list of country names, ISO alpha-3 codes, and years of existence of each country in the dataset can be downloaded here. Note that many international organizations, such as the International Monetary Fund, the World Bank, and the United Nations, use their own country codes, which may not always match those in the Dynamic Gravity dataset.

What is the difference between the identifiers iso3 and dynamic_code?

Both codes identify countries within the dataset, and both iso3 and dynamic_code (combined with year) uniquely identify records. The iso3 variables use the standard International Organization for Standardization (ISO) alpha-3 codes and are likely the most useful identifier for most purposes, such as merging with Comtrade data. The dynamic_code variables refine the ISO codes slightly by tracking cases in which a country undergoes a significant change in its geography and composition but retains the same ISO code. In these cases (such as Pakistan's split into Pakistan and Bangladesh), the dynamic_code is altered to reflect the change (post-split Pakistan is given the dynamic_code "PAK.X").
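Before merging on either identifier, it can be worth confirming that the chosen key really is unique in the file you loaded. The sketch below is one simple way to do that. The helper name and the sample rows are made up for illustration, and the same check should work with the dynamic_code variables in place of the iso3 ones.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up rows; the last one deliberately repeats a key.
rows = [
    {"iso3_o": "PAK", "iso3_d": "USA", "year": 1990},
    {"iso3_o": "PAK", "iso3_d": "USA", "year": 1991},
    {"iso3_o": "PAK", "iso3_d": "USA", "year": 1991},
]
duplicates = duplicate_keys(rows, ("iso3_o", "iso3_d", "year"))
```

An empty result means the key is unique. Any keys returned point to rows that would multiply in a merge.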

What countries should be given special consideration when merging with other data?

Democratic Republic of the Congo: The Democratic Republic of the Congo was known as Zaire between 1971 and 1997 and used the ISO alpha-3 code "ZAR" during that period. In 1998, the country took its current name and its ISO alpha-3 code changed to "COD". When combining with other data, check which code(s) the other dataset uses to ensure that the Democratic Republic of the Congo matches.

Romania: Romania has used two different ISO alpha-3 codes over time: "ROM" and "ROU". The first code, "ROM", was used until 2001. Beginning in 2002, the code "ROU" was used instead. The Dynamic Gravity dataset follows this standard, using "ROM" for 1948-2001 and "ROU" for 2002-present. When combining with other data, check which code(s) the other dataset uses to ensure that Romania matches.
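If the other dataset uses a single code for either country across all years, one option is to convert its codes to the year-dependent ones used in the Dynamic Gravity dataset before merging. The sketch below uses the year ranges stated in this FAQ. The function name is hypothetical, and years outside the listed ranges are deliberately left unchanged because no code is given for them here.

```python
# (codes that refer to the same country,
#  [(first_year, last_year, code used in Dynamic Gravity), ...])
YEAR_DEPENDENT_CODES = [
    ({"ZAR", "COD"}, [(1971, 1997, "ZAR"), (1998, 2016, "COD")]),
    ({"ROM", "ROU"}, [(1948, 2001, "ROM"), (2002, 2016, "ROU")]),
]


def dynamic_gravity_code(code, year):
    """Return the code Dynamic Gravity uses for this country in this year."""
    for aliases, periods in YEAR_DEPENDENT_CODES:
        if code in aliases:
            for first_year, last_year, dg_code in periods:
                if first_year <= year <= last_year:
                    return dg_code
    # Other countries, or years outside the listed ranges: leave as-is.
    return code


checks = [
    dynamic_gravity_code("COD", 1980),
    dynamic_gravity_code("ZAR", 2005),
    dynamic_gravity_code("ROU", 1995),
    dynamic_gravity_code("ROM", 2010),
]
```

Applying this to the exporter and importer columns of the other dataset, row by row, should let both countries match in every year covered by the ranges above.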

How does the Dynamic Gravity Dataset differ from the one provided by CEPII?

The Dynamic Gravity dataset seeks to make several improvements over the commonly used dataset made available by CEPII. Specifically, we have aimed to follow closely the ways in which countries and borders changed between 1948 and 2016, to provide more information about countries and their relationships, and to offer greater transparency about the sources and methods underlying the construction of the variables. For more details, please see the paper, which discusses the differences and similarities between the two datasets in more depth.