We use two data sets: job offers from LinkedIn and job offers from GlassDoor.
The LinkedIn data contain global job offers, while the GlassDoor data contain only jobs from the US. Restricted to job offers containing the term SEO (or seo), the LinkedIn data comprise 5,856 offers overall, 984 offers from English-speaking countries (USA, Canada, UK, Ireland, Australia, South Africa), and 862 from the USA and the UK (links starting with www.linkedin.com).
We merged both data sets and kept as many variables as possible, manually creating new variables for both data sets (GlassDoor: seniority and employment type; LinkedIn: sector) based on text matching of job titles and descriptions. We also removed as many duplicated entries as possible by matching job title, employer, and job location. The final worldwide data set contains 7,051 observations.
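The duplicate removal described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the field names (`title`, `employer`, `location`, `source`), the toy records, and the normalization rule (lower-casing and collapsing whitespace) are all assumptions.

```python
def normalize(value):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(str(value).lower().split())


def drop_duplicates(offers):
    """Keep the first offer seen for each (title, employer, location) key."""
    seen = set()
    unique = []
    for offer in offers:
        key = (
            normalize(offer["title"]),
            normalize(offer["employer"]),
            normalize(offer["location"]),
        )
        if key not in seen:
            seen.add(key)
            unique.append(offer)
    return unique


# Hypothetical records standing in for rows of the merged data set.
offers = [
    {"title": "SEO Specialist", "employer": "Acme", "location": "Austin, TX", "source": "linkedin"},
    {"title": "seo  specialist", "employer": "ACME", "location": "Austin, TX", "source": "glassdoor"},
    {"title": "SEO Manager", "employer": "Acme", "location": "Austin, TX", "source": "glassdoor"},
]
unique_offers = drop_duplicates(offers)
```

Exact matching on normalized keys only catches near-identical entries; offers that spell the employer or location differently would survive, which is consistent with removing "as many as possible" rather than all duplicates.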
Because the job offers were collected from all over the world, many non-English terms are included. We therefore also merged the GlassDoor data with the English subset of the LinkedIn data, again keeping as many variables as possible by manually creating new variables for both data sets. The final “All English” data set contains 2,569 observations.
The GlassDoor data are cleaner with regard to job titles and descriptions than the LinkedIn data. Consequently, some plots work better with the GlassDoor data alone, so for now we provide both versions (the merged “All English” data set and the GlassDoor data set).
Also, the GlassDoor data contain information that is missing from the LinkedIn data, such as estimated salary range, rating, employer, industry, and size (number of employees).
We analyzed the data on job titles using text mining techniques. In a first step, we tokenized the job titles into single words and visualized their frequency. Stop words and words that appeared fewer than 7 times were removed to make the graph easier to read.
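A minimal sketch of this first step, assuming a simple letters-only tokenizer and a tiny illustrative stop word list (the actual stop word list is not stated here, and the function names are hypothetical):

```python
import re
from collections import Counter

# Illustrative stop words only; a real analysis would use a fuller list.
STOP_WORDS = {"a", "and", "for", "in", "of", "the", "to", "with"}


def tokenize(title):
    """Split a job title into lower-case words made of letters."""
    return re.findall(r"[a-z]+", title.lower())


def word_frequencies(titles, min_count=7, stop_words=STOP_WORDS):
    """Count words across titles, dropping stop words and rare words."""
    counts = Counter(
        word
        for title in titles
        for word in tokenize(title)
        if word not in stop_words
    )
    # Keep only words that appear at least min_count times.
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical job titles for demonstration.
titles = ["SEO Manager"] * 7 + ["Head of SEO and Content"] * 2
frequencies = word_frequencies(titles)
```

With these toy titles, only "seo" and "manager" reach the threshold of 7; "head" and "content" are dropped as rare, and "of" and "and" as stop words.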
In a second step, we analyzed sequences of words in the job title. The sorted bar plot shows the most popular consecutive sequences of words (5 or more occurrences), colored by category.
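Counting consecutive word sequences can be sketched in the same way. The example below uses two-word sequences and keeps those with 5 or more occurrences; the sequence length, the tokenizer, and the choice not to remove stop words before building sequences are assumptions for illustration.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def common_sequences(titles, n=2, min_count=5):
    """Count word sequences across titles and keep the frequent ones, sorted."""
    counts = Counter()
    for title in titles:
        tokens = re.findall(r"[a-z]+", title.lower())
        counts.update(ngrams(tokens, n))
    return [(seq, c) for seq, c in counts.most_common() if c >= min_count]


# Hypothetical job titles for demonstration.
titles = (
    ["Senior SEO Manager"] * 5
    + ["SEO Manager"] * 2
    + ["SEO Content Writer"] * 3
)
sequences = common_sequences(titles)
```

The result is already sorted by frequency, which matches the ordering needed for a sorted bar plot.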
We manually classified positions as technical or non-technical, removing all words that are not specific to either category:
Technical: analy|special|engine|develop|technic|optimi
Non-technical: manage|direct|writ|consult|coordinat|edito|market|sale|social|strateg|supervis
The modified stacked bar plot shows the number of words found per job category and, as a second stacked bar next to it, the most common words per category (with labels for words that occurred at least 20 times). The height of each stack also indicates the count; the width is arbitrary.
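The pattern-based classification into technical and non-technical words can be sketched as below. The split of the stems into two patterns (after "optimi"), the assignment of the first group to the technical category, and the rule that the technical pattern takes precedence when both match are assumptions; the function name is hypothetical.

```python
import re

# Assumed grouping of the word stems into the two categories.
TECHNICAL = re.compile(r"analy|special|engine|develop|technic|optimi")
NON_TECHNICAL = re.compile(
    r"manage|direct|writ|consult|coordinat|edito|market|sale|social|strateg|supervis"
)


def classify_word(word):
    """Return 'technical', 'non-technical', or None if no stem matches."""
    word = word.lower()
    if TECHNICAL.search(word):  # assumption: technical wins if both match
        return "technical"
    if NON_TECHNICAL.search(word):
        return "non-technical"
    return None  # word is not specific to either category and is dropped


# Hypothetical words taken from job titles.
words = ["Analyst", "Engineer", "Manager", "Marketing", "Senior", "Optimization"]
labels = {word: classify_word(word) for word in words}
```

Matching on stems rather than whole words lets one pattern cover variants such as "analyst", "analytics", and "analysis" without listing each form.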
Note: For now we focused on job offers from the US in terms of cities, states, and counties. This decision was based on two reasons: first, we believe US data is of most interest; second, we center all our analyses on job offers from English-speaking countries only, so a world map would be inconsistent with them. Of course, if you think it is valuable, we can also create a world map.