Alternative Data in Emerging Markets Asia

Feb 20, 2022


Emerging Markets Asia, and China in particular, has become a hotbed of activity for alternative data. At Eagle Alpha, we are seeing rapidly rising demand for data from the region. Initially, this demand came from international managers looking to better understand dynamics on the ground, but increasingly local managers are showing interest in alternative data as well. The supply side of the market is waking up to this increased thirst for data, and the data market in EM Asia is growing in size and sophistication.

Interest in China datasets is now almost on a par with other regions, based on analysis of activity on Eagle Alpha’s platform. By contrast, EM Asia datasets received 23% fewer views than the “average” dataset on the platform, indicating that the wider region still trails other regions in terms of investor interest.

However, challenges persist when working with data in China and EM Asia, including data quality issues and connecting with vendors. But first, we’ll explore the availability of data in EM Asia.

Data Availability

Figure 1 illustrates the most common dataset categories, taken from Eagle Alpha’s portal for sourcing alternative datasets. Web crawling data providers dominate the alternative data space in China. This includes raw web-scraped data, but also derived datasets such as pricing and sentiment. From our conversations with data buyers, a large number of vendors provide similar datasets in China (for example, eCommerce data), with price, quality, coverage, and delivery time being the variables that differentiate them.

Figure 1: Most common dataset categories (Source: Eagle Alpha)

Outside of China, the relatively small number of vendors in the emerging market Asian region makes further dissection of the data tricky. ESG-focused data dominates our analysis due to a large number of listings by a single vendor in India. We also see a variety of review & ratings datasets across this region. In South-East Asia, B2C datasets are common due to the prevalence of mobile devices and the young demographics. Mobility data is also widely available in these regions and is generally considered to be high quality.

In China, local vendors have a reputation for higher-quality data than international data vendors. Local vendors have access to larger panels, better local language processing resources, and larger teams on the ground.

Connecting with Vendors

Connecting with vendors is another crucial part of the data sourcing process. Many international buyers report having trouble connecting with vendors in China. Data providers may be slow to respond to emails, so phoning them directly is often best. The Chinese messaging app WeChat is also often more effective than email for reaching data vendors.

Data providers outside of China have even less experience speaking with investors and may not understand what the buyside does or the legal & compliance aspects surrounding the industry.

Many data buyers are understandably concluding that, in order to connect with the data market in EM Asia, they must either have a presence on the ground or work with aggregators such as Eagle Alpha.

Data Quality

The structure of data from EM Asia can be hard to deal with. Alternative data in the region is often not structured for an investment application, and the frequency can also be uncertain, with updates happening sporadically. This would not suit quant funds that rely on timely data delivery. Some local practitioners acknowledge that the data available in China is lower quality than in the US, but they still consider it good relative to the rest of EM Asia, and it is improving.
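
As a rough illustration of how a buyer might check update regularity before relying on a feed, the sketch below measures the gaps between deliveries and flags unusually long ones. The delivery dates, the function names, and the 1.5x-median tolerance are all assumptions made for this example; they do not come from any vendor or from Eagle Alpha's platform.

```python
from datetime import date
from statistics import median


def delivery_gaps(delivery_dates):
    """Return the gaps, in days, between consecutive deliveries."""
    ordered = sorted(delivery_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def late_deliveries(delivery_dates, tolerance=1.5):
    """Return (delivery_date, gap_days) pairs whose gap exceeds
    tolerance x the median gap. The tolerance is an arbitrary choice."""
    ordered = sorted(delivery_dates)
    gaps = delivery_gaps(ordered)
    if not gaps:
        return []
    threshold = tolerance * median(gaps)
    return [(day, gap) for day, gap in zip(ordered[1:], gaps) if gap > threshold]


# Hypothetical delivery log: nominally weekly, with one three-week gap.
deliveries = [
    date(2022, 1, 3),
    date(2022, 1, 10),
    date(2022, 1, 17),
    date(2022, 2, 7),
    date(2022, 2, 14),
]
late = late_deliveries(deliveries)
for day, gap in late:
    print(f"{day}: arrived {gap} days after the previous delivery")
```

A check like this only describes past behaviour; whether a given level of irregularity is acceptable depends on the strategy consuming the data.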

South-East Asian data is considered even lower quality than Chinese data, as well as being more sparse. In both markets, some vendors collect data with no defined use case, and that data can be poorly structured with minimal pre-processing.

Figure 2 displays the average history of China and EM Asia datasets as 11.4 and 11.3 years respectively. Global datasets have an average history of 12.2 years and rest-of-world datasets averaged 11.6 years. It is a surprise to many to see that there is a critical mass of datasets in both regions with greater than 10 years of history.

Figure 2: Average Dataset History (Source: Eagle Alpha)

Figure 3 shows that emerging market Asian datasets are least likely to be tagged to tickers, while global datasets are most likely to be tagged. For China in particular, firmographic data is poor, which makes tagging datasets to tickers challenging because many companies have similar or identical names.
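
To make the name-collision problem concrete, here is a minimal sketch of name-based ticker tagging that refuses to guess when a normalized name maps to more than one ticker. The company names, the tickers, and the suffix list are invented for illustration, and exact matching on a normalized name is just one simple approach, not a description of how any vendor or Eagle Alpha tags datasets.

```python
from collections import defaultdict

# Assumed list of legal suffixes to ignore; a real list would be longer.
SUFFIXES = {"co", "ltd", "limited", "inc", "corp", "group", "holdings"}


def normalize(name):
    """Lower-case, replace punctuation with spaces, drop legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)


def build_index(reference):
    """Map each normalized company name to the set of tickers sharing it."""
    index = defaultdict(set)
    for name, ticker in reference:
        index[normalize(name)].add(ticker)
    return index


def tag(name, index):
    """Return (ticker, status); a ticker is assigned only if it is unique."""
    tickers = index.get(normalize(name), set())
    if len(tickers) == 1:
        return next(iter(tickers)), "matched"
    if tickers:
        return None, "ambiguous"
    return None, "unmatched"


# Hypothetical reference data: two distinct issuers with near-identical names.
reference = [
    ("Harbor Foods Co., Ltd.", "AAA"),
    ("Harbor Foods Group", "BBB"),
    ("Lotus Mobility Inc.", "CCC"),
]
index = build_index(reference)

for raw_name in ["Lotus Mobility", "Harbor Foods Ltd", "Unknown Retailer"]:
    print(raw_name, "->", tag(raw_name, index))
```

Records that come back as "ambiguous" would need extra firmographic fields (for example, a registration number or address) to resolve, which is exactly the information the paragraph above notes is often poor.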

Figure 3: Dataset Ticker Tagging (Source: Eagle Alpha)


Emerging markets such as China, India, and the ASEAN region are a key consideration for investors, and this growing interest presents opportunities for data providers of all shapes and sizes, collecting data from a range of different sources.

On a positive note, this growth has uncovered a lot of untapped areas for alpha, and large asset managers are willing to spend resources building a presence in Asia. This includes not only speaking with data vendors and obtaining data, but also hiring talent for on-the-ground expansion. As a result, both the availability and the quality of the datasets sourced have increased.

Conversely, because EM Asia is early in its adoption of global standards and quality controls, alternative data from the region also presents challenges for asset managers entering the space. A lot of resources have to be employed to clean and map the data for use, and a lot of time has to be spent extracting value from it.

Data availability and quality are improving thanks to the acceleration in technology and the growth in the number of data vendors. Additionally, many companies in South-East Asia now wish to monetize their exhaust data.