0. Introduction

We use two data sets, one collected from LinkedIn and one from GlassDoor.

The LinkedIn data contain job offers from around the world, while the GlassDoor data contain only jobs from the US. Restricted to job offers that include the term SEO (or seo), the LinkedIn data contain 5,856 offers overall, 984 offers from English-speaking countries (USA, Canada, UK, Ireland, Australia, South Africa), and 862 from the USA and the UK (links starting with www.linkedin.com).
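The two filters described above (offers mentioning SEO in either case, and offers whose link starts with www.linkedin.com) can be sketched as follows. This is only an illustration, not the code used for the analysis: the record layout (dicts with "title", "description" and "link"), the sample offers, the whole-word match on "seo", and the function names are all assumptions.

```python
import re

# Assumption: "SEO (or seo)" is read as a case-insensitive whole-word match.
SEO_PATTERN = re.compile(r"\bseo\b", re.IGNORECASE)


def mentions_seo(offer):
    """Return True if the title or description contains the term SEO."""
    text = f"{offer.get('title', '')} {offer.get('description', '')}"
    return bool(SEO_PATTERN.search(text))


def is_main_domain(offer):
    """Return True if the offer link starts with www.linkedin.com."""
    link = re.sub(r"^https?://", "", offer.get("link", ""))
    return link.startswith("www.linkedin.com")


# Hypothetical sample records, for illustration only.
offers = [
    {"title": "SEO Manager", "description": "Lead our search strategy.",
     "link": "https://www.linkedin.com/jobs/view/1"},
    {"title": "Content Writer", "description": "Experience with seo tools.",
     "link": "https://de.linkedin.com/jobs/view/2"},
    {"title": "Data Analyst", "description": "SQL and dashboards.",
     "link": "https://www.linkedin.com/jobs/view/3"},
]

seo_offers = [o for o in offers if mentions_seo(o)]
main_domain_offers = [o for o in seo_offers if is_main_domain(o)]
```

Applying the term filter first and the link filter second mirrors the order in which the counts are reported in the text.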

We merged both data sets and kept as many variables as possible, manually creating new variables for each data set (GlassDoor: seniority and employment type; LinkedIn: sector) based on text matching of job titles and descriptions. We also removed as many duplicated entries as possible by matching job title, employer, and job location. The final worldwide data set contains 7,051 observations.
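A minimal sketch of the two steps in this paragraph, duplicate removal by matching job title, employer and job location, and deriving a variable by text matching on the job title, might look like this. The field names, the normalization (lower-casing and collapsing whitespace), the seniority keywords and labels, and the sample records are hypothetical. The actual matching rules are not specified here.

```python
import re

KEY_FIELDS = ("title", "employer", "location")  # assumed field names

# Hypothetical keyword-to-label mapping for the derived seniority variable.
SENIORITY_KEYWORDS = {"senior": "Senior", "junior": "Junior", "intern": "Internship"}


def normalize(value):
    """Lower-case a value and collapse whitespace so near-identical strings match."""
    return " ".join(str(value or "").lower().split())


def dedup_key(offer):
    """Build the matching key from job title, employer and job location."""
    return tuple(normalize(offer.get(field)) for field in KEY_FIELDS)


def merge_and_deduplicate(*sources):
    """Concatenate the sources, keeping the first offer seen for each key."""
    seen = set()
    merged = []
    for source in sources:
        for offer in source:
            key = dedup_key(offer)
            if key not in seen:
                seen.add(key)
                merged.append(offer)
    return merged


def infer_seniority(title):
    """Return a seniority label if a keyword occurs as a whole word, else None."""
    lowered = title.lower()
    for keyword, label in SENIORITY_KEYWORDS.items():
        if re.search(rf"\b{re.escape(keyword)}\b", lowered):
            return label
    return None


# Hypothetical sample records, for illustration only.
linkedin = [
    {"title": "SEO Manager", "employer": "Acme", "location": "London"},
    {"title": "Senior SEO Analyst", "employer": "Beta", "location": "Austin, TX"},
]
glassdoor = [
    {"title": "seo  manager", "employer": "ACME", "location": "london"},
    {"title": "Junior SEO Specialist", "employer": "Gamma", "location": "Denver, CO"},
]

merged = merge_and_deduplicate(linkedin, glassdoor)
for offer in merged:
    offer["seniority"] = infer_seniority(offer["title"])
```

Exact matching on normalized strings is the simplest choice. It misses duplicates whose titles or locations are worded differently, which is consistent with removing "as many as possible" rather than all of them.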

Because the job offers were collected from all over the world, the worldwide data set includes many non-English terms. We therefore also merged the GlassDoor data with the English subset of the LinkedIn data, again keeping as many variables as possible and manually creating new variables for both data sets. The final “All English” data set contains 2,569 observations.

The GlassDoor data are cleaner than the LinkedIn data with regard to job titles and descriptions. Consequently, some plots work better with the GlassDoor data alone, so for now we provide both versions (the merged “All English” data set and the GlassDoor data set).

In addition, the GlassDoor data contain information that is missing from the LinkedIn data, such as estimated salary range, rating, employer, industry, and size (number of employees).

1. Job Title