AusDask

1. Intro

Objectives

As a graduate student seeking more career opportunities in Data Science, I wanted to know the most commonly required skills and tools for Data occupations, including Data Analyst, Data Engineer, and Data Scientist, at different seniority levels in Australia in 2022.

I therefore performed this analysis to answer those questions, and I would like to share with others the most rewarding and in-demand skills in the Data industry.

I decided to passively gather raw data by scraping job postings on LinkedIn, then cleaned the data to suit my needs and performed my analysis on the clean dataset.

In this report, I will mainly focus on the analysis section. The web scraping code (using Selenium), the data cleansing and analysis notebook, and the clean dataset can be found below.

I also built a dashboard in Tableau Public, which you can view through the link below as well.

Links

2. Questions and Hypotheses

Questions

1. What are the most in-demand skills in general?

2. Are those in-demand skills correlated with each other in general?

3. What are the most crucial skills for every occupation?

4. Are there differences in the frequency of required skills for different experience levels?

5. Do the job locations influence the required skills?

Hypotheses

1. The most in-demand skill in the Data industry, in general, should be SQL, followed by Python/R and Tableau/Power BI.

2. Programming languages (SQL, Python, and R) should correlate highly with data visualisation tools, but skills within the same category should not correlate highly with each other.

3. SQL should be the most critical skill in all Data occupations, followed by one programming language like Python/R and one data visualisation tool like Tableau/Power BI. For Data Analysts in particular, Excel should be an essential skill as well, while Data Engineer roles should require knowledge of cloud services like Azure or AWS.

4. Yes, entry-level jobs may show high frequencies only for skills like SQL, Python/R, Excel, and Tableau/Power BI, while higher-level jobs may also require cloud services like Azure or Snowflake.

5. No, job location will not influence the required skills, or vice versa.

3. Dataset

I had difficulty using natural language processing to extract a list of skills and tools from the job descriptions.

Given my limited experience, I ended up manually compiling a list of skills and technologies that are widely used in Data Science.

The dataset contains 31 skill and technology attributes in total. Each holds a binary value (0 or 1), where 1 means the job requires that skill.
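
One simple way to derive such binary attributes is keyword matching against the curated list. The sketch below is only an illustration under that assumption: `SKILLS` is a shortened, hypothetical subset of the 31 entries, and `flag_skills` is a made-up helper name, not code taken from the project notebook.

```python
import re

# Hypothetical subset of the manually curated list (the real dataset has 31 entries).
SKILLS = ["SQL", "Python", "R", "Excel", "Tableau", "AWS", "Azure"]


def flag_skills(description, skills=SKILLS):
    """Return {skill: 0 or 1}; 1 means the skill appears as a standalone token."""
    flags = {}
    for skill in skills:
        # Lookarounds reject matches embedded in longer words, so "Excel" is not
        # found in "excellent" and "R" is not found in "Spark".
        pattern = r"(?<![A-Za-z0-9])" + re.escape(skill) + r"(?![A-Za-z0-9])"
        flags[skill] = int(re.search(pattern, description, re.IGNORECASE) is not None)
    return flags


posting = "Strong SQL and Python skills; experience with AWS or Azure preferred."
print(flag_skills(posting))
```

Single-letter names such as R remain prone to false positives (for example "R&D"), so matches for very short names would likely need a manual check.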

Here is a word cloud showing how frequently each of the included skills appears across all job descriptions in the dataset.

4. Data Analysis

Most In-Demand Skills in General

Question 1: What are the most in-demand skills in general?

First, let's look at the overall skill requirements in Australia.

The most in-demand skill is SQL, followed by Python; both are programming languages.

Other common requirements are cloud services like AWS and Azure, and Spark, a data processing engine.

Data visualisation tools (Tableau and Power BI) and spreadsheet skills (Excel) are also required in a portion of data jobs.
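
With binary skill columns, the frequency behind a ranking like this is just the share of postings where the flag is 1. Here is a minimal sketch using only the standard library; `skill_share` and the toy rows are illustrative assumptions, not the notebook's code or the real data.

```python
def skill_share(rows, skills):
    """Share of postings that require each skill, sorted from most to least common."""
    if not rows:
        raise ValueError("rows must contain at least one posting")
    total = len(rows)
    shares = {skill: sum(row[skill] for row in rows) / total for skill in skills}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Made-up toy rows for illustration; these are not values from the real dataset.
toy_rows = [
    {"SQL": 1, "Python": 1, "Tableau": 0},
    {"SQL": 1, "Python": 0, "Tableau": 1},
    {"SQL": 1, "Python": 1, "Tableau": 0},
    {"SQL": 0, "Python": 0, "Tableau": 0},
]
for skill, share in skill_share(toy_rows, ["Python", "Tableau", "SQL"]).items():
    print(f"{skill}: {share:.0%}")
```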

In-Demand Skills Correlation

Question 2: Are those in-demand skills correlated with each other in general?

Next, I examine whether the high-demand skills are correlated. I include only those skills that appear in more than 5% of all data jobs in the bar chart from Question 1.
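
The filtering and correlation step could be sketched as follows. This assumes the Pearson coefficient, which on 0/1 columns is the same as the phi coefficient; the report does not state which measure was used, and the function names and toy rows are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def correlate_common_skills(rows, skills, min_share=0.05):
    """Pairwise correlation between skills required by more than min_share of postings."""
    total = len(rows)
    columns = {skill: [row[skill] for row in rows] for skill in skills}
    common = [skill for skill in skills if sum(columns[skill]) / total > min_share]
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(common, 2)}


# Made-up toy rows; Scala never appears, so the 5% filter drops it.
toy_rows = [
    {"SQL": 1, "Python": 1, "Scala": 0},
    {"SQL": 1, "Python": 1, "Scala": 0},
    {"SQL": 0, "Python": 0, "Scala": 0},
    {"SQL": 1, "Python": 0, "Scala": 0},
]
for pair, r in correlate_common_skills(toy_rows, ["SQL", "Python", "Scala"]).items():
    print(pair, round(r, 3))
```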

Programming languages correlate noticeably more strongly with each other than with data visualisation tools, most likely because employers tend to list one or more programming languages together in their job descriptions.

The same pattern holds for cloud services: AWS, Azure, and Snowflake are highly correlated.

Spark, a large-scale data processing tool used in Data Engineering and Data Science, is highly correlated with the programming languages Scala, Java, and Python, and is also strongly associated with AWS.

Data visualisation tools like Tableau and Power BI, while correlated with one another to some extent, are not commonly required alongside Python; Tableau instead tends to appear with SQL and R.

In-Demand Skills by Occupation

Question 3: What are the most crucial skills for every occupation?

Let's continue by looking in more detail at the required skills for each Data occupation: Data Analyst, Data Engineer, and Data Scientist.

SQL is in high demand across all occupations, appearing in more than 65% of the job postings for each one.

For Data Analysts, Python, Excel, Tableau and R are the next most required skills.

Data Engineers should possess good knowledge of cloud services (AWS, Azure, Snowflake) and programming languages (Python, Scala, Java) used in the processing pipeline.

Most Data Scientist jobs require candidates to use two programming languages, Python and R.
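
Per-occupation figures like these come from grouping the postings and averaging each binary skill column within the group. Below is a rough sketch of that grouping; the `occupation` key, the `share_by_group` helper, and the toy rows are assumptions for illustration only.

```python
from collections import defaultdict


def share_by_group(rows, group_key, skills):
    """For each group, the share of its postings that require each skill."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: {skill: sum(r[skill] for r in members) / len(members) for skill in skills}
        for group, members in groups.items()
    }


# Made-up toy rows; these are not values from the real dataset.
toy_rows = [
    {"occupation": "Data Analyst", "SQL": 1, "Excel": 1},
    {"occupation": "Data Analyst", "SQL": 1, "Excel": 0},
    {"occupation": "Data Engineer", "SQL": 1, "Excel": 0},
    {"occupation": "Data Engineer", "SQL": 0, "Excel": 0},
]
for occupation, shares in share_by_group(toy_rows, "occupation", ["SQL", "Excel"]).items():
    print(occupation, shares)
```

The same helper would work for seniority level or state by passing a different `group_key`, provided the rows carry such a field.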

In-Demand Skills by Level

Question 4: Are there differences in the frequency of required skills for different experience levels?

Let's look at the distribution of in-demand skills across seniority levels in the Data industry: Entry level, Associate, and Mid-Senior level.

SQL and Python are in equally high demand at all seniority levels.

At Mid-Senior level positions, it is more common to expect candidates to know services like AWS, Azure, Spark, and Snowflake, whereas these are required far less often at the Entry and Associate levels.

Tableau is also more in demand at the Associate and Mid-Senior levels.

In-Demand Skills by State

Question 5: Do the job locations influence the required skills?

I will end the analysis section with the frequency of in-demand skills across Australian states. I include only six states in this analysis; their shares of data jobs are shown in the following pie chart.

New South Wales and Victoria account for approximately 80% of data jobs in Australia, and these jobs are most likely based in Sydney and Melbourne, Australia's two largest cities.
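
The state proportions behind a pie chart like this are simple counts divided by the total number of postings. A small sketch, assuming each posting carries a state label; `state_shares` and the toy labels are hypothetical and do not reproduce the real figures.

```python
from collections import Counter


def state_shares(states):
    """Share of postings per state, ordered from the largest share to the smallest."""
    counts = Counter(states)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.most_common()}


# Made-up toy labels for illustration only.
toy_states = ["NSW"] * 4 + ["VIC"] * 3 + ["QLD"] * 2 + ["WA"]
shares = state_shares(toy_states)
top_two = sum(list(shares.values())[:2])
print(shares)
print(f"Top two states combined: {top_two:.0%}")
```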

Now let's look at the skill requirements in each state and see whether job location influences the required skills.

There is no clear pattern showing that job location influences the skills required of candidates.

New South Wales and Victoria have nearly the same distribution of high-demand skills and services.

In Western Australia, Azure is the most required skill. Based on the earlier parts of this analysis, this suggests that a high proportion of data jobs in this state are Data Engineer roles.

5. Conclusion

To wrap up, I will summarise the skills that graduate students in the same boat as me should focus on early in their careers.

SQL is a must for every occupation, and it should be the priority skill for every Data student to learn.

Python is the programming language to pick up in general, as it is used across many occupations and tasks. If you are pursuing a Data Scientist career, you can learn R as well.

If you want to be a Data Analyst, it is good to work on your Excel and Tableau skills.

Finally, for Data Engineer jobs, it is highly rewarding to know cloud services like AWS, Azure, and Snowflake, as well as the Spark engine for data processing.


Thanks for reading.

Contact Me

Feel free to reach out to me via any of the social media links below, or send me an email by clicking the mail icon below.