Enhancing the scientific data engineering experience

February 7, 2024
Sean Johnston

Scientific data engineers face obstacles every day—fragmented data ecosystems, proprietary data formats, data silos, lack of ontology, manual data aggregation, and more. These challenges must be navigated before the value-adding data analytics work even starts. But there’s a solution that offers a smooth path forward: the Tetra Scientific Data and AI Cloud™. It enables you to efficiently replatform, transform, and update data. Let’s take a tour through the latest improvements for scientific data engineers in the release of Tetra Data Platform v3.6.

Tetra Data Platform v3.6 enhances the scientific data engineering experience in three ways:

Low-code solutions

Tetra Data Platform v3.6 introduces a low-code way to create and update protocols, which hold the business logic for data pipelines. Previously, modifying a protocol usually meant using developer tools to build and deploy changes via the Tetra Data Platform API. Now you can write and edit a protocol’s logic as a YAML script directly in the pipeline’s user interface, reducing reliance on external development tools.

Example of a YAML script protocol within a Tetra Data Pipeline.

Bulk file processing

Bulk file processing lets you reprocess large numbers of files with fine control over which files to include (e.g., filtering by date range or process status). You can name reprocessing jobs and monitor their progress in a new dashboard. As the scientific use cases for your data grow, you can re-examine large historical datasets or enrich them with more context to enhance their utility. Because data can be updated efficiently at scale across an enterprise, users can focus on more impactful work.
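To make the filtering idea concrete, here is a minimal Python sketch of selecting files for a reprocessing job by date range and process status. It is an illustration only and does not use the Tetra Data Platform API: `FileRecord`, `select_for_reprocessing`, the file paths, and the status values are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for a file's metadata; not a TDP type."""
    path: str
    uploaded_on: date
    status: str  # illustrative values such as "processed" or "failed"


def select_for_reprocessing(
    files: Iterable[FileRecord],
    start: date,
    end: date,
    statuses: Optional[set[str]] = None,
) -> list[FileRecord]:
    """Keep files uploaded within [start, end] whose status is in `statuses`.

    Passing statuses=None accepts every status.
    """
    return [
        f for f in files
        if start <= f.uploaded_on <= end
        and (statuses is None or f.status in statuses)
    ]


# Made-up sample records for the illustration.
files = [
    FileRecord("plate_reader/run_001.csv", date(2023, 11, 2), "processed"),
    FileRecord("plate_reader/run_002.csv", date(2023, 12, 15), "failed"),
    FileRecord("hplc/batch_17.raw", date(2024, 1, 9), "failed"),
]

# Select only failed files uploaded between December 2023 and January 2024.
job = select_for_reprocessing(
    files, start=date(2023, 12, 1), end=date(2024, 1, 31), statuses={"failed"}
)
print([f.path for f in job])
```

In the platform itself, this selection is made through the user interface rather than in code; the sketch only shows the kind of include/exclude logic that such filters express.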

Bulk pipeline process jobs can be created on the Pipeline File Processing page of the Tetra Data Platform.

Entity relationship diagrams

Another powerful tool introduced with Tetra Data Platform v3.6 is the entity relationship diagram (ERD). These diagrams visually represent the JSON Intermediate Data Schema (IDS) structure as tables. Users can intuitively explore the ERD by zooming and panning through the structure and relationships across tables.
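As a rough illustration of how a nested JSON document can be viewed as related tables, the Python sketch below splits a JSON object into a parent table and child tables linked by a `parent_id` column. This is an assumption-laden toy, not the mapping the Tetra Data Platform actually applies to IDS documents: the function `to_tables`, the table naming scheme, and the sample document are all hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]


def to_tables(doc: dict[str, Any], root: str = "root") -> dict[str, list[Row]]:
    """Split a nested JSON object into flat, related tables.

    Scalars become columns on the current row, nested objects become
    prefixed columns, and each list of objects becomes a child table
    whose rows point back at their parent through `parent_id`.
    """
    tables: dict[str, list[Row]] = {}

    def visit(obj: dict[str, Any], table: str, parent_id: Optional[int]) -> None:
        rows = tables.setdefault(table, [])
        row: Row = {"id": len(rows), "parent_id": parent_id}
        rows.append(row)
        fill(obj, row, table, prefix="")

    def fill(obj: dict[str, Any], row: Row, table: str, prefix: str) -> None:
        for key, value in obj.items():
            name = prefix + key
            if isinstance(value, dict):
                # Nested object: flatten into prefixed columns on the same row.
                fill(value, row, table, prefix=name + "_")
            elif isinstance(value, list) and value and all(
                isinstance(v, dict) for v in value
            ):
                # List of objects: one child-table row per element.
                for child in value:
                    visit(child, f"{table}_{name}", row["id"])
            else:
                row[name] = value

    visit(doc, root, None)
    return tables


# Made-up document for the illustration.
doc = {
    "sample": {"id": "S-1"},
    "results": [
        {"well": "A1", "absorbance": 0.42},
        {"well": "A2", "absorbance": 0.57},
    ],
}

tables = to_tables(doc, root="run")
for name, rows in tables.items():
    print(name, rows)
```

Here the `run_results` rows carry `parent_id` 0, which is the kind of table-to-table relationship an entity relationship diagram draws as a connecting line.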

The ERD also includes a search function for locating and selecting columns of interest. With a single click, you can generate an SQL query for the selected fields, making it faster to find the specific data you need in enterprise-scale datasets. This lets you focus on data utility rather than troubleshooting SQL syntax errors, so you are better equipped to build robust data solutions and derive insights from your data.
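The idea of turning a column selection into a query can be sketched in a few lines of Python. This is a simplified illustration, not the platform's query generator: `build_select`, `quote_ident`, and the table and column names are hypothetical, and the SQL dialect details (double-quoted identifiers, `LIMIT`) are assumptions.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table: str, columns: list[str], limit: int = 100) -> str:
    """Build a SELECT statement for the chosen columns of one table."""
    if not columns:
        raise ValueError("select at least one column")
    cols = ", ".join(quote_ident(c) for c in columns)
    return f"SELECT {cols} FROM {quote_ident(table)} LIMIT {int(limit)}"


# Hypothetical table and column names, as if picked from a diagram.
query = build_select("plate_reader_results", ["sample_id", "well", "absorbance"])
print(query)
```

Generating the statement from a selection, rather than typing it by hand, is what removes the opportunity for misspelled column names and misplaced commas.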

Example of an ERD used to view the relational data structures of the IDS, search for terms, select fields, and generate SQL queries from selections.
ERD SQL queries can be reviewed, modified, copied, or run directly on the Tetra Data Platform.

Summary

Tetra Data Platform v3.6 offers a host of new features for scientific data engineers, helping them build a simple, robust, and adaptable data ecosystem for their stakeholders. These features streamline scientific data replatforming and engineering while supplying downstream analytics and AI applications with engineered scientific data.

If you’d like to see how our scientific data engineering improvements can help your team, contact one of our experts today.

If you’re interested in other new features released with Tetra Data Platform v3.6, you can read about our self-service pipelines, pluggable connectors, file journey, and multi-byte character support.

Want expert training on these new features?

TetraU provides in-depth training for the Tetra Scientific Data and AI Cloud, including our new TDP v3.6 features. Check out TetraU for workshops and training content developed by the data and life science experts at TetraScience.