## Methodology

#### Introduction

This dataset was developed to address the lack of complete, cross-cutting information about green innovation and the absence of a global mapping of how green technologies evolve. On the technology side, we provide an overview of their development, knowledge base and life cycle. We also report countries' contributions to green innovation and an assessment of their profiles. We use patent information to locate where green technologies are developed, and we measure the intensity of this activity by counting patent families (a patent family is a set of patents that cover a similar invention and share a priority date).
We detail below the methodology used to build this dataset.

#### Patent identification

To obtain information on patents, we use the PATSTAT 2016a database, published by the European Patent Office. Patents are considered environment-related according to the ENV-TECH classification developed by the Organisation for Economic Co-operation and Development (OECD). The ENV-TECH classification, based on the International Patent Classification (IPC) and the Cooperative Patent Classification (CPC), features eight environmental families, grouped into four areas:

1. Environmental management
2. Water-related adaptation technologies
3. Biodiversity protection and ecosystem health (mentioned but not available yet)
4. Climate Change Mitigation Technologies (CCMT)

ENV-TECH is a hierarchical classification: the families cited above are the first level of aggregation (1 digit). Each family is divided into groups (2 digits) and sub-groups (3 digits). Not all groups are divided into sub-groups, and three sub-groups include a lower level (4.6.1, 8.2.5 and 9.1.2).

#### Time frame and data aggregation

The PATSTAT 2016a release contains patent information from the 18th century to 2016. However, patenting activity in environment-related technologies only took off in the 1970s, which is why we chose 1970 as the start year of the dataset. For the end year, we had to account for the constant backlog of applications that have been filed but not yet examined and, for this reason, not yet added to the database. The delay between filing and inclusion in PATSTAT varies with the patent office that received the application; at the US Patent and Trademark Office, for example, it can reach almost 40 months. The trade-off between having recent data and accounting for this backlog led us to set 2010 as the end year of the dataset.
In addition, as explained above, ENV-TECH is a hierarchical classification with four levels (from 1 to 4 digits). For green technologies to be comparable with one another, we need to use the same level for all of them: a 2-digit ENV-TECH class cannot be compared with a 4-digit one, the latter being much more specific. 4-digit classes exist in only a few cases, so they cannot serve as our definition of green technologies. Conversely, 1-digit categories are too broad to capture fine-grained technologies. We therefore use the ENV-TECH 2-digit level as our level of analysis.
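As an illustration of the aggregation to the 2-digit level, here is a minimal Python sketch. It assumes that ENV-TECH codes are stored as dotted strings (as in `4.6.1` above) and that each patent family comes with a list of such codes. The `tagged` sample data and the function name `to_two_digit` are hypothetical, and dropping families tagged only at the 1-digit level is an assumption of this sketch, not a documented rule of the dataset.

```python
from collections import Counter
from typing import Optional


def to_two_digit(code: str) -> Optional[str]:
    """Truncate a dotted ENV-TECH code to its 2-digit group.

    '4.6.1' -> '4.6', '4.6' -> '4.6'. A 1-digit code such as '4'
    carries no group information, so None is returned.
    """
    parts = code.strip().split(".")
    if len(parts) < 2:
        return None
    return ".".join(parts[:2])


# Hypothetical (family_id, ENV-TECH code) pairs, for illustration only.
tagged = [
    ("F1", "4.6.1"),
    ("F1", "4.6"),    # same family, same group: must be counted once
    ("F2", "4.1.2"),
    ("F3", "1.2"),
    ("F4", "4"),      # 1-digit only: dropped in this sketch
]

# Deduplicate so that each family counts once per 2-digit group.
pairs = {
    (family, to_two_digit(code))
    for family, code in tagged
    if to_two_digit(code) is not None
}
families_per_group = Counter(group for _, group in pairs)
```

With the sample data, `families_per_group` holds one family each for the groups `4.6`, `4.1` and `1.2`.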

#### Geolocalisation of patents

Our goal of developing a worldwide analysis called for accurate geographical localisation of inventive activities. To this end, we used information on inventors' addresses to geocode each patent family at city level. Inventor location information from PATSTAT was processed using GeoNames and the Google Maps API.
First, we relied on the Institut Francilien Recherche Innovation Société (IFRIS) version of PATSTAT, in which IFRIS recovers missing addresses by combining several external patent sources (REGPAT, national patent databases, etc.). Second, we geolocalised patent families by identifying the postal codes within the address string and searching for them in GeoNames. Third, for patent families whose postal code is missing, or for which geographical coordinates could not be determined, we identified the city name in the address using the city table of the GeoNames database and manually checked the results. Fourth, for the remaining addresses without geographical coordinates, we used the Google Maps API. Finally, we propagated inventors' coordinates within patent families when inventors appear several times.
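The cascade of lookups described above can be sketched as a chain of fallbacks, where each step is tried only if the previous one found no coordinates. This is a minimal Python sketch, not the code used to build the dataset: the two dictionaries stand in for the GeoNames postal code and city tables, `by_external_geocoder` is a placeholder for a web geocoding service, the five-digit postal code pattern is a simplification, and all names and coordinates are hypothetical.

```python
import re
from typing import Optional, Tuple

Coord = Tuple[float, float]

# Hypothetical stand-ins for the GeoNames postal code and city tables.
POSTAL_CODES = {"46022": (39.48, -0.34)}
CITIES = {"valencia": (39.47, -0.38)}


def by_postal_code(address: str) -> Optional[Coord]:
    # Assumption: postal codes are 5-digit tokens; real formats vary by country.
    for token in re.findall(r"\b\d{5}\b", address):
        if token in POSTAL_CODES:
            return POSTAL_CODES[token]
    return None


def by_city_name(address: str) -> Optional[Coord]:
    lowered = address.lower()
    for city, coord in CITIES.items():
        if city in lowered:
            return coord
    return None


def by_external_geocoder(address: str) -> Optional[Coord]:
    # Placeholder for a call to a web geocoding service; always fails here.
    return None


STEPS = (
    ("postal_code", by_postal_code),
    ("city_name", by_city_name),
    ("external_geocoder", by_external_geocoder),
)


def geocode(address: str):
    """Return (coordinates, name of the step that succeeded), or (None, None)."""
    for name, step in STEPS:
        coord = step(address)
        if coord is not None:
            return coord, name
    return None, None


def propagate(records):
    """Copy known coordinates to other occurrences of the same inventor
    within the same patent family."""
    known = {
        (r["family_id"], r["inventor"]): r["coord"]
        for r in records
        if r["coord"] is not None
    }
    return [
        dict(r, coord=r["coord"] if r["coord"] is not None
             else known.get((r["family_id"], r["inventor"])))
        for r in records
    ]
```

Recording which step produced each coordinate makes it easier to target manual checks, for instance at matches obtained from city names only.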
The precise geolocalisation of inventors allows us to work at different levels of geographical aggregation: countries, regions, urban or metropolitan areas, and cities.

#### Measuring the life cycle of technologies

Our methodology for measuring the life cycle of technologies builds on Abernathy and Utterback (1978) and Vona and Consoli (2015), who define different stages of technologies that co-evolve with the know-how needed to implement, use and adapt them.
In the early stage, knowledge exploration and experimentation are under way: different designs appear and compete with one another, and activity is highly localised in only a few places. As the technology develops, designs start to be standardised, inferior variants disappear and dominant designs begin to diffuse. As the technology approaches maturity, a few dominant designs prevail, with a high level of standardisation and a wider degree of geographical diffusion.
Therefore, we construct our measure of the life cycle along two axes: inventive activity (proxied by patenting activity) and geographical ubiquity. These two dimensions define four quadrants, each corresponding to a stage: Emergence, Development, Diffusion and Maturity.

The ubiquity indicator captures the extent to which innovative activities are geographically spread relative to countries' specialisation in green technologies. Following Balland and Rigby (2017), the geographical scope of inventions is calculated using the Revealed Technological Advantage (RTA) for each green technology, country and time period as follows:

$RTA_{cjt} = \frac{Patents_{cjt} / \sum_{j}{Patents_{cjt}}}{\sum_{c}{Patents_{cjt}} / \sum_{cj}{Patents_{cjt}}}$

The RTA measures the intensity of the contribution of each country $c$ to the development of technology $j$ at time $t$. That is, it compares the share of technology $j$ in the country's patenting (numerator) with the share of the same technology in worldwide patenting (denominator). The ubiquity of each technology is given by the number of countries whose RTA in that green technology exceeds 1 at time $t$:

$UBIQUITY_{jt} = \sum_{c}M_{cjt}$

where $M_{cjt} = 1$ if $RTA_{cjt} > 1$, and $M_{cjt} = 0$ otherwise. Therefore, the higher the number of countries specialised in the development of a particular technology, the higher the UBIQUITY of that technology. In other words, the indicator is a proxy for the diffusion of innovative activities. The advantage of this measure over other potential patent indicators of diffusion (such as citations or family size) is that it captures specialisation patterns in specific technologies relative to their global counterparts.
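To make the two formulas concrete, the following Python sketch computes the RTA and the ubiquity for a single time period. The input layout (a dictionary keyed by country and technology), the function names and the sample counts are assumptions made for illustration; they are not part of the dataset.

```python
from collections import defaultdict


def rta(counts):
    """Revealed Technological Advantage for one time period.

    counts: {(country, technology): number of patent families}
    Returns {(country, technology): RTA value}.
    """
    by_country = defaultdict(float)
    by_tech = defaultdict(float)
    total = 0.0
    for (country, tech), n in counts.items():
        by_country[country] += n
        by_tech[tech] += n
        total += n

    values = {}
    for (country, tech), n in counts.items():
        if by_country[country] == 0 or by_tech[tech] == 0:
            continue  # RTA is undefined without any patenting
        country_share = n / by_country[country]   # numerator
        world_share = by_tech[tech] / total       # denominator
        values[(country, tech)] = country_share / world_share
    return values


def ubiquity(rta_values):
    """Number of countries with RTA > 1 for each technology."""
    result = defaultdict(int)
    for (_, tech), value in rta_values.items():
        result[tech] += 1 if value > 1 else 0
    return dict(result)


# Hypothetical patent family counts for three countries and two technologies.
counts = {
    ("A", "solar"): 30, ("A", "wind"): 10,
    ("B", "solar"): 10, ("B", "wind"): 30,
    ("C", "solar"): 20, ("C", "wind"): 0,
}
values = rta(counts)
ubiq = ubiquity(values)
```

In this example solar accounts for 60% of all families. Country A devotes 75% of its families to solar, so its RTA in solar is 0.75 / 0.60 = 1.25. Countries A and C are specialised in solar and only B in wind, giving a ubiquity of 2 for solar and 1 for wind.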
The second indicator is the number of patent families in green technologies worldwide. It is a proxy for patenting intensity and indicates how far a technology has developed. Finally, we average both patenting intensity and the ubiquity indicator over 10-year periods. This smooths the trends in both indicators and lets us assign each technology a life cycle stage for each period.
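The averaging and stage assignment can be sketched in Python as follows. Two points are assumptions of this sketch and are not specified in the text above: the cut-off separating "low" from "high" values (here the median across technologies) and the mapping of quadrants to stages (low/low as Emergence, high patenting with low ubiquity as Development, low patenting with high ubiquity as Diffusion, high/high as Maturity). All names and sample values are hypothetical.

```python
from statistics import mean, median


def period_average(series, start, end):
    """Mean of yearly values for start <= year <= end."""
    return mean(v for year, v in series.items() if start <= year <= end)


def stage(patenting, ubiquity, patenting_cut, ubiquity_cut):
    """Assign a life cycle stage from the two averaged indicators.

    Assumption: the quadrant-to-stage mapping below is illustrative.
    """
    high_patenting = patenting > patenting_cut
    high_ubiquity = ubiquity > ubiquity_cut
    if not high_patenting and not high_ubiquity:
        return "Emergence"
    if high_patenting and not high_ubiquity:
        return "Development"
    if not high_patenting and high_ubiquity:
        return "Diffusion"
    return "Maturity"


# Hypothetical yearly family counts for one technology.
yearly = {2001: 10, 2002: 20, 2003: 30}
avg = period_average(yearly, 2001, 2010)

# Hypothetical 10-year averages: technology -> (patenting, ubiquity).
averages = {
    "T1": (100, 5),
    "T2": (400, 6),
    "T3": (120, 20),
    "T4": (500, 25),
}

# Assumption: the median across technologies separates low from high.
patenting_cut = median(p for p, _ in averages.values())
ubiquity_cut = median(u for _, u in averages.values())

stages = {
    tech: stage(p, u, patenting_cut, ubiquity_cut)
    for tech, (p, u) in averages.items()
}
```

With these sample values the cut-offs are 260 families and a ubiquity of 13, so the four technologies fall into the four different quadrants.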

## Other resources

Please find below links to access relevant resources that have inspired our venture and that may be of interest for anyone working with data on technology and innovation.

##### OEC: The Observatory of Economic Complexity

An online resource for international trade data and economic complexity indicators.

https://atlas.media.mit.edu

##### Skillscape

The Skillscape is a project from the Scalable Cooperation group at the MIT Media Lab to help society understand how Artificial Intelligence and robotic automation can impact human labor.

http://skillscape.mit.edu

##### Technological diversification of European regions

Using patent data, this website shows the areas that may be worth prioritising for each European region based on its skills, capacities and technologies.

http://joancrespo.wixsite.com/techdiv

##### Data USA

The most comprehensive visualization of U.S. public data. It provides an open, easy-to-use platform that turns data into knowledge.

https://datausa.io/

##### HistPat

HistPat provides the geography of historical patents granted by the United States Patent and Trademark Office (USPTO) from 1790 to 1975.

https://dataverse.harvard.edu/dataverse/HistPat

##### Green Growth Knowledge Platform

The GGKP is a global community of organisations and experts committed to collaboratively generating, managing and sharing green growth knowledge and data to mobilise a sustainable future.

http://www.greengrowthknowledge.org/data-explorer