3 Steps to Enabling Citizen Data Scientists in Your Org

Many companies seek to democratize their data as they embrace modern data analytics. Their goal: to accelerate data-driven decision-making. They want to give analysts, subject matter experts, and other business users access to data across the enterprise. And they want to equip them with advanced tools for exploring that data without help from IT and the true data scientists in the organization.


Necessity is a factor in this thinking:

IT teams are time-strapped, period. The less time IT teams need to spend fulfilling data requests from the business, the better. Also, the reality is that even the most well-staffed teams of data scientists can only work so fast.

Time is of the essence. Companies aiming to build a team of citizen data scientists recognize that removing barriers to data access is key. This will shorten the path to powerful insights that will help the business identify risks and opportunities.

Data keeps growing. There’s so much data—structured and unstructured—already waiting to be analyzed and put into action. It keeps piling up, too, across disparate silos, data warehouses, and lakes.

Data scientists are few and far between. Data scientists are hard to find, expensive to hire, and tough to retain. Companies with expert talent in-house prefer to have them focused on using their data to drive innovation and solve critical business problems.

Frankly, it’s hard to see a solution to getting data insights swiftly, and at scale, other than by devoting more people to the task. As two data experts recently shared in a Harvard Business Review article, “We sometimes ask companies, ‘Which would you rather have: a newly minted Ph.D. data scientist or 20 people who can conduct basic analyses in their current jobs?’ Almost all opt for the latter.”

With that as the goal, how can you set yourself up to deliver data downstream to those users?

Step 1: Identify and Enable Your “Power Users”

Who are the citizen data scientists in your organization? Typically, “they reside in a line of business such as finance, sales, marketing, or human resources (HR). They possess a deep domain knowledge of the business challenges their department faces.”¹ They’re also data-savvy, though not necessarily technical, and curious by nature with a passion for problem-solving.

Potential citizen data scientists are also keen to learn how to work with analytics tools powered by artificial intelligence (AI) and machine learning (ML). With the right training, these individuals can rise to become what Gartner calls the “power users” of data in an organization. They’ll be capable of performing “both simple and moderately sophisticated analytical tasks that would previously have required more expertise.”²

Organizations often find that it’s easier to train business users to work with modern analytics tools than it is to turn their data scientists into business subject matter experts. Business users can move faster to ask the right questions of data. They can also dice, splice, and correlate it in ways that data experts outside of their department or function might not think of.

Step 2: Empower Citizen Data Scientists to Make Breakthroughs

Employees capable of learning how to perform data analysis should be working hands-on with data relevant to their core work. Consider, for example, researchers and clinicians in the healthcare and life sciences industry. They can turn ideas into actionable therapies and effective vaccines to improve patients’ lives. AI and ML tools can help accelerate their research and development (R&D) by letting them mine vast data stores and run simulations.

That’s one powerful example of how citizen data scientists can deliver value when they’re able to dig into, experiment with, and collaborate on data like never before.

But again, citizen data scientists don’t have to be steeped in R&D or trained analysts. They can be anyone working in any part of your organization, from an HR leader to a procurement manager to an applications developer. And with platforms that help them to be self-sufficient when possible, domain experts can get support from data scientists as needed—for example, on more complex modeling scenarios.

Tip: Your vision for a team of citizen data scientists won’t be fully realized if your business lacks IT infrastructure that’s robust enough to support modern data analytics. That includes the data storage platforms AI- and ML-powered tools require to fulfill any data scientist’s request for real-time data analysis.

Step 3: Remove Complexity in Your Infrastructure to Expand Data Access

Harnessing data so it can be used effectively by all the data scientists in your organization requires an excellent data management strategy. It should cover everything from data governance and data protection to the underlying infrastructure that will drive optimal application performance.

The underlying infrastructure is the real key. Making data valuable to downstream users can be a monumental task. It requires merging information sets and consolidating different data types—from raw telemetry data to highly structured CRM data.
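To make that merging step concrete, here is a minimal sketch in Python of combining raw telemetry with structured CRM records. Everything in it is an assumption made for illustration: the sample data, the field names (`customer_id`, `latency_ms`, `tier`), and the helper `merge_telemetry_with_crm` are hypothetical and do not come from any particular product or dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: raw telemetry events and structured CRM records.
telemetry = [
    {"customer_id": "c1", "latency_ms": 120},
    {"customer_id": "c1", "latency_ms": 80},
    {"customer_id": "c2", "latency_ms": 300},
]
crm = [
    {"customer_id": "c1", "name": "Acme", "tier": "gold"},
    {"customer_id": "c2", "name": "Globex", "tier": "silver"},
    {"customer_id": "c3", "name": "Initech", "tier": "bronze"},
]


def merge_telemetry_with_crm(telemetry, crm):
    """Aggregate raw events per customer, then left-join them onto CRM records."""
    # Group the raw latency samples by customer.
    latencies = defaultdict(list)
    for event in telemetry:
        latencies[event["customer_id"]].append(event["latency_ms"])

    # Keep every CRM record, even when no telemetry exists for it.
    merged = []
    for record in crm:
        samples = latencies.get(record["customer_id"], [])
        merged.append({
            **record,
            "event_count": len(samples),
            "avg_latency_ms": mean(samples) if samples else None,
        })
    return merged


rows = merge_telemetry_with_crm(telemetry, crm)
for row in rows:
    print(row)
```

The sketch keeps customers that have no telemetry rather than dropping them, so a business user can see the gaps in the data as well as the matches. At enterprise scale, this kind of join would typically run in an analytics platform rather than in a standalone script.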

As you seek to expand data access to more users across your organization, some of the technical aspects you’ll need to weigh include:

  • Scaling multiple workloads efficiently: A modern data storage solution like FlashBlade® disaggregates compute from storage. It provides a modern platform for hosting multiple analytics applications. It will also support a large number of users and can easily scale with data growth.
  • Consolidating fast file and object storage: Modern data analytics applications need to efficiently process vast amounts of structured and unstructured data. Slow, complex legacy storage systems don’t support that efficiency. Unified fast file and object (UFFO) storage does, by consolidating data silos and accelerating discovery, insight, and innovation. And, as noted in a recent post on the value of UFFO, a lack of intuitive, reliable, and fast data storage and IT infrastructure unnecessarily bogs down anyone in the organization working with data.
  • Keeping modern data storage modern: Shifting to a modern data analytics platform is an essential step toward supporting the work of your data scientists. But after that, you’ll need to keep your platform up to date. That doesn’t mean you have to endure expensive upgrades or disruptive downtime, though. A subscription model like Pure Evergreen™ lets you refresh your storage seamlessly and cost-effectively.

Simplify Infrastructures Now to Avoid Losing Ground Later

Experts predict that the pandemic will motivate many companies to invest more in data science initiatives this year. They see these investments as a “big bet” to help them increase agility and gain an edge on their competitors.³ And research from Gartner suggests that the events of 2020 have made investments in data science and ML platforms more practical for businesses, too.⁴

So, it’s becoming more important, but also potentially easier, for businesses to provide data modeling and analysis capabilities to more users in the organization. Those users, in turn, can make the breakthroughs that drive business forward.

But, take note: Without IT infrastructure to support a Modern Data Experience™, your citizen data scientists aren’t likely to get far with their projects—no matter what amazing tools they have. And neither will your expert data scientists. There’s also a good chance your business could end up falling behind in an increasingly data-driven world.

Get more value from your data and accelerate time to insights with business data analytics solutions from Pure Storage®.