
How to practice mapping your data flows

We at TetraScience often use the analogy of an interconnected highway system to describe the movement of data in today’s lab. There is immense complexity in how data is produced and consumed by instruments, software, and people.

Worse still, labs often work with little understanding of how and where their data move. This instrument’s data are moved to one location. That software’s data are moved to another location, as well as here and there. And all of the CRO data are dumped into the same place…

It’s not the lab’s fault. Instrument, software, and automation technologies have expanded rapidly over the last decade, and for many organizations, collaborating with external partners became a prudent business choice. Suddenly, labs had to grapple with a huge ecosystem of tools and vendors, most of which produce very different data formats. The result is a growing data management problem.

An effective way to get a handle on this expanding, increasingly complex data superhighway is to draw a Data Flow Map.

Mapping data flows helps labs:

  1. Make decisions about tools such as an ELN and informatics software
  2. Identify security risks and improvements
  3. Identify opportunities for advanced analysis
  4. Save money, time, and effort

Step 1: Get draw.io (or a similar tool), or break out the whiteboard.

A tool like draw.io will help you quickly create an interconnected diagram with boxes, arrows, icons, and more. This is primarily a visual exercise, so feel free to use a whiteboard instead (just remember to take a picture).

Step 2: Write down all the Data Sources & Data Targets you can think of.

These will often include:

  • Instruments (plate reader, mass balance, chromatography)
  • Applications (ELN, registry)
  • Open information (public)
  • Outside vendors (CRO, academic labs)

Begin by boxing your Sources and Targets. You can be as general or specific as you like. For instance, you could write “chromatography instruments”, or you could list “AKTA” or “Waters HPLC”. In some cases, it may make sense to label the associated software (after all, that’s what generates the data), like “UNICORN”, “Empower”, or “ChemStation”.
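If you’d like to keep the inventory in a script as well as on the diagram, one record per box is enough. The Python sketch below is a hypothetical structure, not a TetraScience format: the `Box` class, its fields, and the example entries are illustrative assumptions. It mixes a general label (“Plate reader”) with specific ones that also record the software generating the data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """One Source or Target on the map (hypothetical structure)."""
    name: str                       # label on the diagram, general or specific
    kind: str                       # "instrument", "application", "public", or "vendor"
    software: Optional[str] = None  # software that generates the data, if worth labeling


# Illustrative inventory: one general label, the rest specific.
boxes = [
    Box("Plate reader", "instrument"),
    Box("Waters HPLC", "instrument", software="Empower"),
    Box("AKTA", "instrument", software="UNICORN"),
    Box("ELN", "application"),
    Box("Registry", "application"),
    Box("Public databases", "public"),
    Box("CRO", "vendor"),
]

# Group box names by kind to confirm each category from the list is covered.
by_kind: dict[str, list[str]] = {}
for box in boxes:
    by_kind.setdefault(box.kind, []).append(box.name)

for kind, names in by_kind.items():
    print(f"{kind}: {', '.join(names)}")
```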

Step 3: Add all the groups of people (or individuals) who use data.

These people might produce, consume, or analyze data, be responsible for QC, or even revise or add to data.
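Because one group often plays several of these roles at once, it can help to record roles as combinable flags. A minimal Python sketch using the standard library’s `enum.Flag`; the group names and their role assignments are hypothetical:

```python
from enum import Flag, auto


class Role(Flag):
    """Ways a person or group can touch data, per Step 3."""
    PRODUCE = auto()
    CONSUME = auto()
    ANALYZE = auto()
    QC = auto()
    REVISE = auto()  # revise or add to existing data


# Hypothetical groups and the roles each one plays.
people = {
    "Assay scientists": Role.PRODUCE | Role.ANALYZE,
    "QC team": Role.CONSUME | Role.QC,
    "Data science": Role.CONSUME | Role.ANALYZE,
    "Project lead": Role.CONSUME | Role.REVISE,
}


def who(role: Role) -> list[str]:
    """Return the groups that play the given role, in insertion order."""
    return [name for name, roles in people.items() if role in roles]


print(who(Role.ANALYZE))  # ['Assay scientists', 'Data science']
print(who(Role.QC))       # ['QC team']
```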

Step 4: Begin by connecting one Source to every Target that Source moves data to. Repeat.

Start with a single box. Think of all the places its data go, and draw an arrow to each one. Repeat with all the boxes.
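Working through the boxes one at a time amounts to building an adjacency list: each Source maps to the set of Targets it sends data to. The Python sketch below assumes hypothetical box names (“File share” and “LIMS” are illustrative additions) and flags boxes that have no outgoing arrows yet, which are either still unvisited or genuinely pure Targets.

```python
from collections import defaultdict

# Each Source -> the set of Targets it moves data to.
flows: dict[str, set[str]] = defaultdict(set)


def connect(source: str, *targets: str) -> None:
    """Record one arrow from `source` to each of `targets`."""
    flows[source].update(targets)


# Hypothetical boxes from Step 2.
boxes = ["Waters HPLC", "AKTA", "Plate reader", "File share", "ELN", "LIMS"]

connect("Waters HPLC", "File share", "ELN")
connect("AKTA", "File share")
connect("File share", "ELN", "LIMS")

# Boxes with no outgoing arrows yet: either not visited, or pure Targets.
# (A membership test does not create keys in a defaultdict.)
no_outgoing = [b for b in boxes if b not in flows]
print(no_outgoing)  # ['Plate reader', 'ELN', 'LIMS']
```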

By now you might be getting a messy picture. Move around your Sources & Targets to give them some space and stretch out your arrows.
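If untangling the diagram by hand becomes tedious, an automatic layout tool can do the spacing for you. As one alternative to draw.io (a swapped-in technique, not part of the steps above), the hypothetical `to_dot` helper below writes your arrows in the Graphviz DOT language, which Graphviz’s `dot` command can lay out and render.

```python
def quote(label: str) -> str:
    """Wrap a label in double quotes for DOT, escaping any embedded double quotes."""
    return '"' + label.replace('"', '\\"') + '"'


def to_dot(flows: dict[str, list[str]]) -> str:
    """Write Source -> Target arrows as a Graphviz DOT digraph."""
    # rankdir=LR lays the graph out left to right; every node is drawn as a box.
    lines = ["digraph data_flows {", "  rankdir=LR;", "  node [shape=box];"]
    for source, targets in flows.items():
        for target in targets:
            lines.append(f"  {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical arrows from Step 4.
flows = {
    "Waters HPLC": ["File share", "ELN"],
    "AKTA": ["File share"],
}

print(to_dot(flows))
```

Save the printed text as `map.dot` and render it with `dot -Tsvg map.dot -o map.svg`. The left-to-right layout tends to place Sources on the left and Targets on the right, though cycles in your map will make it less tidy.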

Step 5: Connect the People that consume data from any Source.

Often many people touch a single Source or Target. If they’re consuming data, draw your arrow from the Source to the Person. Consumption could include analyzing data or searching for it.
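Recording each consumption arrow along with its purpose makes it easy to ask later who depends on a given Source, and which Sources are touched by the most people. A minimal Python sketch with hypothetical people and Sources:

```python
from collections import Counter

# Hypothetical consumption arrows drawn in Step 5: (Source, Person, purpose).
consumes = [
    ("Waters HPLC", "QC team", "analysis"),
    ("ELN", "Project lead", "search"),
    ("ELN", "Data science", "analysis"),
    ("Registry", "Data science", "search"),
]


def consumers_of(source: str) -> list[str]:
    """Everyone with an arrow drawn from `source` to them, alphabetically."""
    return sorted({person for src, person, _purpose in consumes if src == source})


print(consumers_of("ELN"))  # ['Data science', 'Project lead']

# Count arrows leaving each Source to see which ones many people touch.
touches = Counter(src for src, _person, _purpose in consumes)
print(touches.most_common(1))  # [('ELN', 2)]
```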

Step 6: Connect the People that produce data to any Target.

If someone enters data into an ELN, for instance, draw a line. If someone carries a flash drive to a colleague’s computer for analysis, draw a line.
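Labeling each production arrow with how the data gets there (typed in, carried on a flash drive) pays off later when you look for places to automate. A Python sketch that groups writers by Target; the `Arrow` structure, people, Targets, and labels are all hypothetical:

```python
from typing import NamedTuple


class Arrow(NamedTuple):
    """A person putting data into a Target (hypothetical structure)."""
    person: str
    target: str
    how: str  # how the data gets there, e.g. "manual entry" or "flash drive"


# Hypothetical production arrows drawn in Step 6.
produces = [
    Arrow("Assay scientists", "ELN", "manual entry"),
    Arrow("Assay scientists", "Analysis workstation", "flash drive"),
    Arrow("QC team", "ELN", "manual entry"),
]

# Collect who writes into each Target, and how.
writers: dict[str, list[str]] = {}
for arrow in produces:
    writers.setdefault(arrow.target, []).append(f"{arrow.person} ({arrow.how})")

for target, entries in writers.items():
    print(f"{target}: {'; '.join(entries)}")
```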

By now you’ve created quite a few boxes and lines. Although things get messy quickly, your map should be a great starting point for understanding your data flows.

With this map, you can begin to understand where your data is moving and who or what is consuming it. If you’re considering a new tool, you should now have a good idea of what data it will need to ingest or produce. If you’re deciding who should gain access to certain data, you should now see who does and doesn’t have access today. Are some of your applications untouched? Perhaps it’s time to evaluate whether you still need that software.
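Once the arrows are written down, questions like “who has access today?” become graph questions: data from a Source reaches everyone downstream of it, directly or through intermediate boxes. The Python sketch below answers that with a breadth-first search and also lists applications that no arrow touches. The map and names are hypothetical, and treating reachability as access is a simplifying assumption, since real access also depends on permissions the map may not show.

```python
from collections import deque

# Hypothetical map: each node -> the nodes it sends data to (all arrows from Steps 4-6).
edges = {
    "Waters HPLC": ["File share"],
    "File share": ["ELN", "Data science"],
    "ELN": ["Project lead"],
    "Registry": [],
}
people = {"Data science", "Project lead", "QC team"}
applications = {"ELN", "Registry"}


def downstream(start: str) -> set[str]:
    """Every box or person that data from `start` can reach, via breadth-first search."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Who sees Waters HPLC data today, directly or indirectly, and who doesn't?
reached = downstream("Waters HPLC")
has_access = sorted(reached & people)
no_access = sorted(people - reached)
print("access:", has_access)     # access: ['Data science', 'Project lead']
print("no access:", no_access)   # no access: ['QC team']

# Applications that no arrow enters or leaves: candidates for "do we need this?"
touched = {n for n, outs in edges.items() if outs} | {t for outs in edges.values() for t in outs}
untouched = sorted(applications - touched)
print("untouched:", untouched)   # untouched: ['Registry']
```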

If you're interested in professional data flow mapping services, check out our data strategy workshop.

A few quick tips for finding value in your data map:

Look for gaps
These could be obvious or subtle gaps in where data flows. If a group should really have access to a dashboard, why don’t they? Should your data warehouse feed into your ELN?
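One way to hunt for gaps systematically is to write down the arrows you believe should exist and compare them with the arrows actually on the map. A minimal Python sketch using set difference; every flow listed is a hypothetical example echoing the questions above:

```python
# Hypothetical arrows currently on the map, as (from, to) pairs.
actual = {
    ("Waters HPLC", "ELN"),
    ("Data warehouse", "Dashboard"),
    ("Dashboard", "Project lead"),
}

# Arrows the team believes should exist.
expected = {
    ("Waters HPLC", "ELN"),
    ("Dashboard", "Project lead"),
    ("Dashboard", "QC team"),
    ("Data warehouse", "ELN"),
}

# Expected but missing: each one is a gap worth a conversation.
gaps = sorted(expected - actual)
for source, target in gaps:
    print(f"gap: {source} -> {target}")
```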

Look for opportunities to consolidate
Labs often use several different technologies to accomplish the same task. If you’re using Egnyte, Dropbox, and email to accept CRO data, why not consolidate? The same goes for using multiple cloud storage providers, such as AWS and Azure. You may even save some money.
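If you tag each tool on the map with the job it does, consolidation candidates are simply jobs served by more than one tool. A Python sketch; the purpose labels are hypothetical and the tool names come from the examples above:

```python
from collections import defaultdict

# Hypothetical (tool, purpose) pairs read off the map.
channels = [
    ("Egnyte", "accept CRO data"),
    ("Dropbox", "accept CRO data"),
    ("Email", "accept CRO data"),
    ("AWS", "cloud storage"),
    ("Azure", "cloud storage"),
    ("ELN", "record experiments"),
]

tools_by_purpose: dict[str, list[str]] = defaultdict(list)
for tool, purpose in channels:
    tools_by_purpose[purpose].append(tool)

# Any purpose served by more than one tool is a consolidation candidate.
candidates = {p: tools for p, tools in tools_by_purpose.items() if len(tools) > 1}
for purpose, tools in candidates.items():
    print(f"{purpose}: {', '.join(tools)}")
```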

Look for places to automate
We so often rely on people to find or transform data. Look for data flows you can automate, eliminating both the manual effort and the human error that comes with it.
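To decide which manual flows to automate first, a rough weekly time estimate per flow is often enough. The Python sketch below ranks hypothetical manual steps by frequency times duration; the flows and numbers are placeholders to replace with your own estimates, not measurements.

```python
from typing import NamedTuple


class ManualStep(NamedTuple):
    """A flow on the map that a person carries out by hand."""
    flow: str
    times_per_week: int
    minutes_each: float


# Placeholder estimates for illustration only; substitute your own.
steps = [
    ManualStep("HPLC export -> ELN (copy/paste)", 20, 10),
    ManualStep("Plate reader -> workstation (flash drive)", 15, 5),
    ManualStep("CRO email -> file share (download and rename)", 4, 15),
]


def weekly_hours(step: ManualStep) -> float:
    """Rough hours of human effort one manual flow costs per week."""
    return step.times_per_week * step.minutes_each / 60


# Highest weekly effort first: the most promising places to automate.
ranked = sorted(steps, key=weekly_hours, reverse=True)
for step in ranked:
    print(f"{weekly_hours(step):5.2f} h/week  {step.flow}")
```

Frequency matters as much as duration here: every manual hop is also a chance for a transcription error, so a short task done often can outrank a long one done rarely.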

Look for opportunities to combine streams for analysis
Do you have multiple chromatography systems? Imagine visualizing batch runs across all these systems at once.
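Combining streams usually means mapping each system’s export onto one shared schema first. The Python sketch below assumes two hypothetical chromatography exports with made-up field names and values; real exports differ by vendor and software version, so treat the field mappings as placeholders.

```python
# Hypothetical run records exported by two different chromatography systems.
system_a_runs = [
    {"SampleName": "B-001", "RunDate": "2024-03-01", "MainPeakArea": 1520.4},
    {"SampleName": "B-002", "RunDate": "2024-03-02", "MainPeakArea": 1498.9},
]
system_b_runs = [
    {"sample_id": "B-003", "acquired": "2024-03-02", "peak_area": 1533.0},
]

# One mapping per system from its (assumed) field names to a shared schema.
FIELD_MAPS = {
    "system_a": {"SampleName": "sample", "RunDate": "date", "MainPeakArea": "peak_area"},
    "system_b": {"sample_id": "sample", "acquired": "date", "peak_area": "peak_area"},
}


def normalize(runs: list[dict], system: str) -> list[dict]:
    """Rename each record's fields to the shared schema and tag where it came from."""
    mapping = FIELD_MAPS[system]
    normalized = []
    for run in runs:
        record = {mapping[field]: value for field, value in run.items()}
        record["system"] = system
        normalized.append(record)
    return normalized


# One combined, date-ordered table across both systems, ready to plot.
combined = normalize(system_a_runs, "system_a") + normalize(system_b_runs, "system_b")
combined.sort(key=lambda r: (r["date"], r["sample"]))

for run in combined:
    print(run["date"], run["sample"], run["peak_area"], run["system"])
```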

We start many of our customer engagements with a data workshop. Contact us for more information.