What to Learn to Become a Data Scientist in 2021

When I started learning data science a few years ago most job ads requested a PhD, or at the very least a masters, in maths, statistics or a similar subject as an essential requirement.

Over the last couple of years, things have evolved. Machine learning libraries now abstract away much of the complexity behind the algorithms, and there is a growing realisation that practically applying machine learning to solve business problems requires a set of skills that are not usually acquired through academic study alone. As a result, companies are now hiring data scientists based on their ability to perform applied data science rather than research.

Applied data science that delivers value to a business in the fastest possible time requires a very practical skillset. Additionally, as more companies migrate their data and machine learning solutions to the cloud, it is becoming paramount for data scientists to understand the tools and technologies involved.

I also believe that the days of a data scientist working solely on data modelling, using data pulled together by data engineers, and then handing the model over to a team of software engineers to put into production are largely behind us, particularly outside of tech giants such as Amazon, Facebook and Google. In most other companies, either the resource isn’t available in those teams or the priorities are not aligned at the right time.

In order to deliver maximum value to a business, a data scientist needs to be able to work across the full model development life cycle. That means having at least a working knowledge of developing data pipelines, performing data analysis, machine learning, maths, statistics, data engineering, cloud computing and software engineering. As we move into 2021, the generalist data scientist is therefore the preferred hire for most businesses.

This article doesn’t cover absolutely everything you need to be a data scientist in 2021. Instead, it covers the key skills, both new and old, that have become the most essential for every successful data scientist to have in the near future.

There are still some cases where data scientists may use R, but generally speaking, if you are doing applied data science these days, Python is going to be the most valuable programming language to learn.

Python 3 has now firmly become the default version of the language for most applications: Python 2 reached its end of life on 1st January 2020 and the majority of libraries have dropped support for it. If you are learning Python for data science now, it is important to choose a course that works with Python 3.

You will need a good understanding of the basic syntax of the language and how to write functions, loops and modules. You should also be familiar with both object-oriented and functional programming in Python, and be able to develop, execute and debug programs.
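To make those building blocks concrete, here is a minimal sketch that puts a function, a loop, a small class and a functional-style expression side by side. The `Customer` class, the `mean` helper and the order values are made up purely for illustration.

```python
from dataclasses import dataclass
from functools import reduce


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0.0
    for value in values:  # a plain loop over the input
        total += value
    return total / len(values)


@dataclass
class Customer:
    """Object-oriented style: data and behaviour live together."""
    name: str
    orders: list

    def average_order(self):
        return mean(self.orders)


# Hypothetical data, invented for this example.
customers = [
    Customer("Ana", [20.0, 40.0]),
    Customer("Ben", [10.0, 10.0, 40.0]),
]

# Functional style: derive new values with filter/reduce instead of mutating state.
big_spenders = list(filter(lambda c: c.average_order() > 25, customers))
total_revenue = reduce(lambda acc, c: acc + sum(c.orders), customers, 0.0)

print([c.name for c in big_spenders])
print(total_revenue)
```

Saving this as a file and importing `mean` or `Customer` from another script is all it takes to turn it into a module.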

Pandas is still the number one Python library for data manipulation, processing and analysis. In 2021 this is still one of the most vital skills to have as a data scientist.

Data is at the very heart of any data science project and Pandas is the tool that will enable you to extract, clean, process and derive insights from it. Most machine learning libraries also accept Pandas DataFrames as standard input these days.
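As a rough illustration of that extract, clean, process and summarise flow, the sketch below uses a tiny hand-made DataFrame. The column names and values are assumptions invented for the example; in practice the data would come from a file or a database.

```python
import pandas as pd

# Made-up records standing in for data extracted from a file or database.
raw = pd.DataFrame(
    {
        "customer": ["ana", "ben", "ana", "cara", "ben"],
        "amount": ["20.5", "10", None, "7.25", "30"],
        "country": ["UK", "uk", "UK", "FR", "UK"],
    }
)

clean = (
    raw.dropna(subset=["amount"])  # drop rows with no amount recorded
    .assign(
        amount=lambda df: pd.to_numeric(df["amount"]),  # strings to numbers
        country=lambda df: df["country"].str.upper(),   # normalise casing
    )
)

# Derive a simple insight: total spend per customer, largest first.
spend = clean.groupby("customer")["amount"].sum().sort_values(ascending=False)
print(spend)
```

Chaining `dropna`, `assign` and `groupby` like this keeps each cleaning step visible and leaves the original `raw` DataFrame untouched.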

SQL has been around since the 1970s but it remains one of the most vital and sought-after skills for data scientists. The vast majority of businesses use relational databases as their analytical data stores, and as a data scientist, SQL is the tool that will deliver this data to you.
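For a feel of what this looks like in practice, here is a small sketch that runs a typical analytical query (a join with an aggregate) from Python. It uses the standard library `sqlite3` module with an in-memory database so that it runs anywhere; the `customers` and `orders` tables are hypothetical, and a real analytical store would be a different database reached through its own driver.

```python
import sqlite3

# In-memory database so the example needs no external server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 40.0), (3, 2, 15.0);
    """
)

# Join the two tables and aggregate spend per customer.
query = """
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```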

NoSQL (“not only SQL”) databases don’t store data as relational tables. Instead, data is stored as key-value pairs, wide columns or graphs. Examples of NoSQL databases include Google Cloud Bigtable and Amazon DynamoDB.

As the volume of data collected by companies increases and unstructured data becomes more regularly used in machine learning models, organisations are turning to NoSQL databases, either as a complement to or as an alternative to the traditional data warehouse. This trend is likely to continue into 2021, and as a data scientist it is important to gain at least a basic understanding of how to interact with data in this form.
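To give a feel for how these storage models differ from relational tables, the sketch below mimics the three shapes with plain Python structures. This is only an analogy built on dictionaries and lists, not the API of Bigtable, DynamoDB or any other real database, and all keys and values are invented.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value_store = {
    "user:42": json.dumps({"name": "Ana", "plan": "pro"}),
}

# Wide-column: each row key maps to a sparse set of columns,
# so rows need not share the same columns.
wide_column_store = {
    "user:42": {"profile:name": "Ana", "events:2020-12-01": "login"},
    "user:43": {"profile:name": "Ben"},
}

# Graph: nodes plus edges that describe relationships between them.
graph_store = {
    "nodes": {"ana": {"type": "user"}, "ben": {"type": "user"}},
    "edges": [("ana", "follows", "ben")],
}

user = json.loads(key_value_store["user:42"])
columns = sorted(wide_column_store["user:42"])
followed_by_ana = [
    dst for src, rel, dst in graph_store["edges"]
    if src == "ana" and rel == "follows"
]
print(user["plan"], columns, followed_by_ana)
```

In each case the access pattern is a lookup by key or a walk along relationships, rather than a join across tables.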

The use of cloud in other areas of a business usually goes hand in hand with cloud-based solutions for data storage, analytics and machine learning. The major cloud providers, such as Google Cloud Platform, Amazon Web Services and Microsoft Azure, are developing tooling for training, deploying and serving machine learning models at a rapid pace.

As a data scientist working in 2021 and beyond, it is very likely that you will be working with data housed in a cloud-based database such as Google BigQuery and developing cloud-based machine learning models. Experience and skills in this area are likely to be in high demand.

I am noticing Airflow being mentioned more and more often as a desirable skill for data scientists in job adverts. As mentioned at the beginning of this article, I believe it will become more important for data scientists to be able to build and manage their own data pipelines for analytics and machine learning. The growing popularity of Airflow is likely to continue, at least in the short term, and as it is an open source tool, it is definitely something that every budding data scientist should learn.
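Airflow describes a pipeline as a set of tasks with dependencies between them and then runs the tasks in an order that respects those dependencies. The sketch below illustrates only that core idea using the standard library `graphlib` module (Python 3.9+). It is not Airflow code and does not use the Airflow API; the `extract`, `transform` and `load` tasks are hypothetical placeholders.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

results = {}


def extract():
    results["raw"] = [3, 1, 2]  # stand-in for pulling data from a source


def transform():
    results["clean"] = sorted(results["raw"])


def load():
    results["loaded"] = len(results["clean"])  # stand-in for writing to a store


tasks = {"extract": extract, "transform": transform, "load": load}

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {"transform": {"extract"}, "load": {"transform"}}

# Resolve an execution order that respects the dependencies, then run it.
run_order = list(TopologicalSorter(dependencies).static_order())
for name in run_order:
    tasks[name]()

print(run_order)
```

A real orchestrator adds scheduling, retries, logging and monitoring on top of this ordering, which is where much of its value lies.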

Data science code is traditionally messy, not always well tested and often fails to follow style conventions. This is fine for initial data exploration and quick analysis, but when it comes to putting machine learning models into production, a data scientist needs a good understanding of software engineering principles.
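As a small illustration of the difference, here is a sketch of a typical preprocessing step written the way production code usually expects: a named function with type hints, a docstring, explicit error handling and unit tests. The `min_max_scale` function is a made-up example, not taken from any particular library.

```python
import unittest


def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values linearly to the range [0, 1].

    Raises ValueError for empty input or when all values are equal,
    since the scale would be undefined in both cases.
    """
    if not values:
        raise ValueError("values must not be empty")
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("values must not all be equal")
    span = highest - lowest
    return [(v - lowest) / span for v in values]


class MinMaxScaleTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(min_max_scale([2.0, 4.0, 6.0]), [0.0, 0.5, 1.0])

    def test_rejects_constant_input(self):
        with self.assertRaises(ValueError):
            min_max_scale([1.0, 1.0])
```

Saved to a file, the tests can be run with `python -m unittest` followed by the file name. The same logic typed into a notebook cell would work, but it would be much harder to reuse, review or trust.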

If you are planning to work as a data scientist it is likely that you will either be putting models into production yourself or at least be involved heavily in the process. It is therefore essential to cover the following skills in any learning that you undertake.

In this article, I wanted to highlight some of the key trends emerging in terms of the skills required for data scientists. These insights have been gleaned from reviewing current data science job adverts, my own experience working as a data scientist and reading articles covering future trends in the field.

This is not meant to be an exhaustive list; there are certainly many more skills and experiences needed to become a successful data scientist. However, in this post I wanted to cover some of the most important skills that are very likely to be required in the coming year.

For a more comprehensive list of skills that you should learn, if you are studying to be a data scientist, I wrote a series of articles giving a complete roadmap for learning. They are linked below.
