What makes a Data Scientist or Data Engineer?

Knowing about the latest and greatest data science tools, packages and libraries is a prerequisite to being a great data scientist or data engineer – but that’s not all.

Technical Skills (tools, packages, libraries) are a prerequisite, but not the MOST critical factor

As the market for experienced Data Scientists and Data Engineers continues to heat up, it is impossible to find a job description without mention of very specific toolsets (e.g. PySpark, HIVE, PIG, SAS, RStudio… the list could take up the rest of this post). To a good Data Scientist, being aware of these commonly used packages, tools and libraries is definitely a prerequisite. However, consider how often the nuances of one group’s systems and codebases lead to small changes in the way tools or library packages function – these are details you will have to learn on the job.

Great Data Scientists care about the problem & enjoy the hunt in mounds of data for practical and statistical significance.

Data Science Teams are presumably considered successful when their activities provide data to support strategic decision-making, reduce cost-to-buy, or make better recommendations for the next show to binge. Establishing value-add KPIs for your data science team is often overlooked, and where it is done the approach varies widely, but it is critical for measurable success. So while knowledge of and experience with technical skills are a big prerequisite to landing the job, they are not what makes a great data scientist.

Great data scientists have to be comfortable with nebulous research questions and incomplete datasets. Most of the day-to-day work of data professionals is spent figuring out how to even approach these elusive questions, and what data would be necessary. It is non-linear, research-based investigative work on half-baked questions about your community or services. Great data scientists are motivated by curiosity, rely on deductive reasoning skills, and search for recognizable patterns or trends in data that are worth additional investigation – much like graduate school research scientists. They ask: how can novel sets of disjointed data be curated, qualified, and quantified to provide insights that have a real-world impact?

“Workin’ on mysteries without any clues” – Bob Seger, “Night Moves” (1976)

Great data scientists need to be comfortable with, and capable of, working on mysteries without any clues (to quote Bob Seger’s “Night Moves”). Usually, they bring elements from each previous project to the current one, cross-pollinating methods & technology to solve problems, break the status quo & advance the domain knowledge of the current application. These skills are not something that can easily be learned in a bootcamp or YouTube video series.

Helping your Data Scientists Succeed = Give them hard problems

Top Tip – The company should provide the data team with hard problems to solve right off the bat. Challenge them & give them enough to sink their teeth into. Great Data Science teams will recognize when the company’s first principles or baseline assumptions depart from, or are no longer supported by, current data. They will take widely held beliefs across the company and attempt to validate them with data. Great data scientists often “think they may be wrong” and seek confirmation or negation of the thought using data-driven methods and data that has not been selectively chosen to support it.
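
As a rough illustration of checking a widely held belief against data, here is a minimal Python sketch of a permutation test. Everything in it is an assumption made for illustration: the belief being tested ("weekend orders are larger than weekday orders"), the function name `permutation_test`, and the made-up numbers. It reports the observed difference (a stand-in for practical significance) alongside a p-value (statistical significance), since a great data scientist looks at both.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value). Hypothetical helper written
    for this post, not part of any library.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the belief is wrong, group membership
        # should not matter and shuffled differences will look similar.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up order values, purely for illustration.
    weekend_orders = [52, 48, 61, 57, 49, 66, 58, 54]
    weekday_orders = [50, 47, 55, 51, 53, 49, 56, 52]

    difference, p_value = permutation_test(weekend_orders, weekday_orders)
    print(f"Observed difference in means: {difference:.2f}")
    print(f"Permutation p-value: {p_value:.3f}")
```

A small p-value with a tiny difference may not be worth acting on, and a large difference with a large p-value may just be noise, which is why both numbers are printed rather than a single verdict.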

April 20, 2022 jcachat