Alternative Data: The Source of ESG Truth

Mar 28, 2022


ESG is the hottest topic in the data world right now. Regulators and asset owners are putting increasing pressure on funds to invest in a more environmentally friendly and socially responsible way. This creates a great opportunity for funds that offer ESG solutions, and we have seen massive growth in the AUM directed to funds claiming to incorporate ESG factors into their investment process.

The rise in ESG investing presents some challenges for investors though. The most common challenge we hear about is gaps in the data available, particularly relative to the expectations of regulators and clients.

A closely aligned issue is investor mistrust of company-reported measures, seeing them as unreliable, misleading, or incomplete. Said another way, we are seeing a rise in investor attention to company greenwashing. Clients are interested in techniques for identifying whether a company is overstating or simply lying about its ESG credentials.

For most of the market, the major ESG rating agencies are the go-to source for ESG data and insights. While the ratings from these agencies are useful, they have weaknesses too. The rating agencies rely heavily on company-disclosed information, can have opaque methodologies, and generally focus more on company policies than on company actions.

For all these reasons, alternative datasets are increasingly seen as the source of ESG truth. Alternative data is sourced externally to the rated entity, so it helps overcome many of the weaknesses highlighted above.

[On March 15th, 2022, we hosted a webinar exploring Alternative Data: The Source of ESG Truth featuring BlackRock and UBS Asset Management. You can now download a copy of the white paper we published on the topic here.]

The Gaps in ESG Data

We categorize ESG data gaps into two broad buckets:

  1. Coverage gaps – where the data is available for some entities and not others
  2. Granularity gaps – where the level of granularity required isn’t available to any great extent

Coverage Gaps

The more sophisticated funds are using machine learning (ML) to address coverage gaps. The feasibility and accuracy of this approach clearly depend on the value being modeled. For example, ML models are quite good at predicting the rating a major agency might apply to a company in developed markets because there are usually many similar companies with which to train the model. By contrast, the models are weaker at predicting CO2 emissions for a company in a developing market, where there is less data with which to train a model.
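To make the idea concrete, here is a minimal sketch of one simple gap-filling technique: estimating a missing rating as the average rating of the most similar covered companies (k-nearest neighbours). It is not the method any particular fund uses. The function name `knn_impute`, the two features, and all of the numbers are hypothetical, and we assume the features have already been scaled to comparable ranges.

```python
from math import dist

def knn_impute(target, peers, k=3):
    """Estimate a missing value as the mean over the k most similar peers.

    target: tuple of numeric features for the company with the gap.
    peers:  list of (features, known_value) pairs for covered companies.
    """
    if not peers:
        raise ValueError("no covered peers to learn from")
    # Rank covered companies by Euclidean distance in feature space.
    ranked = sorted(peers, key=lambda p: dist(target, p[0]))
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical, pre-scaled features: (size, carbon-intensity proxy).
covered = [
    ((0.20, 0.10), 70.0),
    ((0.25, 0.15), 68.0),
    ((0.30, 0.12), 72.0),
    ((0.90, 0.80), 35.0),
]
estimate = knn_impute((0.24, 0.12), covered, k=3)
```

The sketch also shows why sparse markets are harder. When few covered companies resemble the target, the "nearest" peers can be far away in feature space and the estimate becomes correspondingly less reliable.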

Other coverage gaps that garner many questions concern ESG data for markets outside the English-speaking developed markets and for asset classes other than public equities.

Significant time is then committed to identifying data vendors that address these gaps. Examples include vendors that provide detailed ESG data in markets such as China, Japan, Korea, and India, and vendors that cover asset classes such as private companies, real estate and infrastructure, and sovereigns.

Natural Language Processing (NLP) solutions are particularly popular for addressing coverage gaps as they provide a scalable way to assess the ESG credentials of an investment based on publicly available data. See figures 2, 3, and 6 later for examples of how NLP can be applied to ESG questions.

Granularity Gaps

Granularity gaps pose a greater challenge than coverage gaps: the data is generally not available for any company, so the ability to use gap-filling ML models is severely restricted. Within ESG, the social pillar is rife with granularity gaps.

SASB outlines social sub-topics including customer privacy, data security, labor practices, employee engagement, diversity & inclusion, and employee health & safety. Many of these are poorly reported on by companies, if at all.

Figure 1: Female workforce participation at Adobe and Oracle (Source: Revelio Labs)

Employment data is a powerful category for analyzing many human capital considerations. For example, online employee profile data can be used to understand diversity and potentially discriminatory practices. Figure 1 shows an example of this type of analysis, which clearly shows a disparity between Oracle and Adobe in the promotion of women to senior manager levels.

AI & NLP techniques are also popular for addressing the gaps in social data. The example in figure 2 shows negative key passages related to product safety and quality issues from SEC 10-K filings for companies in the Russell 3000. The AI model surfaced several issues related to product safety, quality control, quality assurance, and customer product safety that could indicate a hidden trend among certain industries or companies. Product safety issues were most apparent in the consumer goods and manufacturing industries.

Figure 2: Product safety and quality issues from SEC 10-K filings (Source: Accern)
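As a rough illustration of the passage screening behind figure 2, the sketch below flags sentences that mention a quality-related topic together with a negative term. This is a deliberately simple keyword stand-in, not the vendor's AI model. The term lists, the function name `flag_passages`, and the sample text are all hypothetical.

```python
import re

# Hypothetical lexicons; a production model would learn these signals.
TOPIC_TERMS = ("product safety", "quality control", "quality assurance")
NEGATIVE_TERMS = ("failure", "defect", "injury", "violation", "recall")

def flag_passages(text):
    """Return sentences that mention both a topic term and a negative term."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        lowered = sentence.lower()
        on_topic = any(term in lowered for term in TOPIC_TERMS)
        negative = any(term in lowered for term in NEGATIVE_TERMS)
        if on_topic and negative:
            flagged.append(sentence)
    return flagged

sample = (
    "Revenue grew during the year. "
    "A quality control failure led to a recall of one product line. "
    "We maintain a quality assurance program."
)
flagged = flag_passages(sample)
```

Counting flagged passages by company or industry is then one way to look for the kind of concentration described above, although a keyword screen will miss context that a trained language model can capture.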

Another AI & NLP example, shown in figure 3, involves Wirecard. The analysis detected and flagged a significant number of fraud discussions and complaints on the web six months before these mentions had a material impact on the company’s stock price. This generated a strong quantitative signal. The company’s involvement in a scheme to artificially inflate profits was later more widely discovered and ultimately led to the company becoming insolvent.

Figure 3: ESG risks over time (Source: SESAMm)
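One simple way to turn mention counts into an alert of this kind is to compare each period with its own recent history. The sketch below raises an alert when a count sits well above the trailing average, measured in standard deviations. It illustrates the general technique only and is not the method behind figure 3. The function name `first_alert`, the window, the thresholds, and the weekly counts are hypothetical.

```python
from statistics import mean, stdev

def first_alert(counts, window=4, threshold=3.0, min_sigma=1.0):
    """Return the index of the first period whose count is an outlier
    relative to the preceding `window` periods, or None if there is none."""
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the spread so a flat history does not trigger on tiny moves.
        sigma = max(stdev(history), min_sigma)
        if (counts[i] - mu) / sigma > threshold:
            return i
    return None

# Hypothetical weekly counts of fraud-related mentions for one company.
weekly_mentions = [3, 4, 2, 3, 4, 3, 15, 18, 22]
alert_index = first_alert(weekly_mentions)
```

With these made-up counts the alert fires at the first week of the jump (index 6), because the preceding four weeks were stable. The choice of window and threshold trades off early detection against false alarms.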

In the environmental area, scope 3 emissions data is one example of where alternative data can plug an important gap in data availability. Figure 4 below shows the five highest-emitting companies based on scope 3 emissions. In terms of absolute emissions, the company with the largest calculated downstream GHG emissions is Gazprom, the Russian gas giant. Indeed, with 3,574 million tCO2e, its Scope 3 emissions account for nearly 30% of the sample’s Scope 3 emissions.

Figure 4: Scope 3 emissions in billion tCO2e (Source: Carbon4Finance)


There is some overlap between the topics of data gaps and greenwashing. Some companies may engage in a form of greenwashing by selectively releasing information and deciding not to release other, more damaging, information. In other cases, a company’s statements may simply not match the reality of its ESG credentials. Given the reliance of mainstream ESG rating agencies on company-reported information, company greenwashing can result in company ESG ratings that are out of step with reality.

Employment data again proves valuable in identifying potential greenwashing. Tracking the previous and subsequent roles of sustainability leads at a company can indicate whether the company is genuine in its sustainability efforts. One analysis showed that corporate strategists and communication specialists are the most common previous roles for Chief Green Officers (CGOs), and a very small percentage of CGOs actually held science-based positions previously.

Figure 5: Green Job Postings compared to ESG Ratings (Source: LinkUp)

Another case study compared the ESG scores of the oil majors from a major ESG rating provider with their levels of sustainability recruitment (figure 5). A gap between the two could indicate companies whose actions in environmental areas diverge from their policies and stated targets.

AI and NLP again provide a powerful approach to exploring potential greenwashing. In figure 6 we present SDG scoring for a single company. One set of scores is based only on self-reported data, whereas the second is based on alternative data from public sources. The difference is stark, with some SDG scores shifting from strong positive to strong negative depending on which source is used.

Figure 6: SDG Scores from Self-Reported Data and Alternative Data (Source: GlobalAI)
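A comparison like the one in figure 6 can be reduced to a simple check: for each goal, how far apart are the two scores, and do they disagree in sign? The sketch below does this for two sets of scores. The scoring scale, the goal labels, the numbers, and the function name `score_divergence` are hypothetical and are not taken from the figure.

```python
def score_divergence(self_reported, alternative, min_gap=1.0):
    """Compare per-goal scores from two sources.

    Returns (goal, self_score, alt_score, sign_flip) tuples for goals whose
    absolute gap is at least `min_gap`, largest gap first.
    """
    rows = []
    # Only goals scored by both sources can be compared.
    for goal in sorted(self_reported.keys() & alternative.keys()):
        s, a = self_reported[goal], alternative[goal]
        if abs(s - a) >= min_gap:
            # A negative product means one score is positive, the other negative.
            rows.append((goal, s, a, s * a < 0))
    rows.sort(key=lambda r: abs(r[1] - r[2]), reverse=True)
    return rows

# Hypothetical scores on a -3 (strong negative) to +3 (strong positive) scale.
self_reported = {"SDG 7": 2.5, "SDG 12": 1.0, "SDG 13": 2.0}
alternative = {"SDG 7": 2.0, "SDG 12": -0.5, "SDG 13": -2.5}
divergent = score_divergence(self_reported, alternative)
```

Goals that come back with a sign flip are the ones where the company's own narrative and the external evidence point in opposite directions, which makes them natural candidates for closer review.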

Patent data is another area that shows great promise for identifying potential greenwashing. Patents can provide an indication of investment in green technologies, and this can be compared with a company’s claims about its green investments and future plans. Any mismatch could indicate greenwashing.


The ability of alternative data to close gaps in ESG data and spot potential company greenwashing is at an early stage but is rapidly gathering momentum. In this article, we have only skimmed the surface of what’s possible.
