
What’s Up with My qPCR Experiments?

A case study on using the TetraScience Data Integration Platform and REST API to build experimental control charts for qPCR experiments. Tracking important parameters across multiple experiments over time gives scientists a window into the consistency and reliability of their results.

Authors:
Evan Anderson - Delivery Engineer
Spin Wang - CEO & Co-Founder


Everyone conducts experiments with a specific goal in mind. Maybe you need to know the quantity of a given material in your sample, or verify a lack of impurities. Lab instruments often produce far more than the single result you’re seeking in the moment, such as methodological and run-specific metadata. However, the majority of this data is stuck in vendor-specific formats and remains siloed: it just sits on the lab computer, and after several days or weeks it is forgotten.

Here, we’ll show you how you can use the TetraScience Data Integration Platform to automatically collect, centralize, harmonize and prepare your data for rigorous analysis.

We’ll be using qPCR as a case study for this process. qPCR, or quantitative polymerase chain reaction, is an experiment used to quantify the amount of DNA or RNA present in a sample. In bioprocessing, it is often used to quantify the amount of contaminant DNA in cell culture samples.

Measurement of DNA quantity depends on calculation of a standard curve: samples of known quantities are measured, and their cycle threshold (Ct) values are plotted against quantity to generate the curve, from which experimental DNA quantities can be calculated. Deviations in the parameters of this standard curve – y-intercept, slope, and R² – reflect potential shifts in reagents, or errors in dilution of the standard reference DNA. Tracking these parameters across multiple experiments over time gives scientists a window into the consistency and reliability of their results.
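As a concrete illustration, the standard curve relates Ct linearly to the log of input quantity, so an unknown sample’s quantity can be back-calculated from its observed Ct. The slope and intercept values below are made up for illustration only:

```python
# Standard curve: Ct = slope * log10(quantity) + intercept
# Illustrative values only -- a typical qPCR slope is near -3.32,
# which corresponds to ~100% amplification efficiency.
slope = -3.32
intercept = 21.0

def quantity_from_ct(ct, slope, intercept):
    """Back-calculate DNA quantity from an observed Ct value."""
    return 10 ** ((ct - intercept) / slope)

# A sample whose Ct equals the y-intercept has quantity 1 (log10 = 0);
# higher Ct means less starting DNA.
print(quantity_from_ct(21.0, slope, intercept))  # 1.0
```

This is also why drift in the y-intercept or slope matters: every calculated quantity passes through these two parameters.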

Here, we’ll show you this control charting from end-to-end: from uploading your files to seeing your results.

Image: Generalized schematic of the path from RAW Excel reports to the TetraScience Data Lake to a Jupyter (IPython) notebook.
work-flow-1

These data are pushed to the TetraScience Data Lake. An upload of a RAW qPCR file triggers the RAW-to-IDS pipeline. The IDS, or Intermediate Data Schema, is a standardized data format used to harmonize data across experiments, and it is queryable via Elasticsearch and SQL. Inside this schema, we can see results for each of the samples in the experiment, as well as the parameters of the standard curve.

Image: (Left) Schematic of qPCR schema on the TetraScience Data Integration platform. (Center) Expanded schematic of the samples section of the IDS. (Right) Excerpt of the samples section of a qPCR report in IDS-JSON format.
ids-result
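To make the shape of the harmonized data concrete, here is a minimal sketch of what an IDS-JSON document might look like. The field names are illustrative placeholders, not the actual TetraScience schema:

```python
# Hypothetical excerpt of a harmonized qPCR IDS document.
# Field names are illustrative; the real IDS schema may differ.
ids_doc = {
    "standard_curve": {"slope": -3.34, "y_intercept": 21.2, "r_squared": 0.998},
    "samples": [
        {"name": "sample-01", "ct": 24.1, "quantity_pg_per_ml": 0.13},
        {"name": "sample-02", "ct": 22.7, "quantity_pg_per_ml": 0.35},
    ],
}

# Because every RAW file is harmonized to the same schema, the same
# two lines pull the standard-curve parameters from any experiment.
curve = ids_doc["standard_curve"]
print(curve["slope"], curve["y_intercept"], curve["r_squared"])
```

The point of harmonization is exactly this: downstream code never needs to know which vendor instrument produced the RAW file.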

Now that all of this harmonized data is in the Data Lake, we can easily access it with data science tools like interactive Python notebooks via the TetraScience Web API. A single query can access all of the 80+ qPCR files uploaded to the Data Lake, and the results can be aggregated into an easy-to-use pandas DataFrame in Python.

Image: An Elasticsearch query for all qPCR IDS files produced by pipelineId in the last hour, executed with the Python requests package.
ES-query
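A sketch of that aggregation step. The endpoint URL, headers, and response shape shown in the comments are hypothetical placeholders, not the real TetraScience API; the flattening logic below is the part that carries over regardless:

```python
import pandas as pd

# The query itself would be sent with the requests package, e.g.:
#   resp = requests.post("https://api.example-tetrascience-host/searchEql",
#                        headers={"auth-token": TOKEN},  # placeholder auth
#                        json=es_query)
#   hits = [h["_source"] for h in resp.json()["hits"]["hits"]]
# URL, headers, and payload above are assumptions for illustration.

def curves_to_dataframe(hits):
    """Flatten a list of IDS documents into one row per standard curve."""
    rows = []
    for doc in hits:
        curve = doc["standard_curve"]
        rows.append({
            "file_id": doc["file_id"],
            "slope": curve["slope"],
            "y_intercept": curve["y_intercept"],
            "r_squared": curve["r_squared"],
        })
    return pd.DataFrame(rows)

# Two mock documents stand in for the 80+ real query hits.
hits = [
    {"file_id": "a1", "standard_curve":
        {"slope": -3.30, "y_intercept": 21.1, "r_squared": 0.999}},
    {"file_id": "b2", "standard_curve":
        {"slope": -3.40, "y_intercept": 21.6, "r_squared": 0.997}},
]
df = curves_to_dataframe(hits)
print(df.shape)  # (2, 4)
```

One row per experiment, one column per curve parameter, is exactly the shape the control-charting steps below need.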

With this table of data, we can start to ask interesting questions about our reference values. For instance, which of these parameters varies the most? It looks like the y-intercept – which corresponds to the expected Ct for a standard quantity of 1 pg/mL DNA – has an interesting distribution.

Image: Using the seaborn function sns.pairplot(), we can quickly look at relationships between reference variables and their distributions.
pairplot

Let's dig deeper! We can plot the y-intercept over time. The mean is the black dashed line, and two standard deviations above and below the mean are grey dashed lines. Clearly, the three data points outside the 2-standard-deviation lines warrant investigation – perhaps standards were improperly diluted, or old reagents led to unexpected standard curve performance. In the future, this window can alert scientists when their experimental references fall outside of expectation.

Image: y-intercept plotted as a function of time, using matplotlib
control-chart
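The control chart itself is a few lines of matplotlib. This sketch again uses synthetic y-intercept values in place of the real query results:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import numpy as np
import matplotlib.pyplot as plt

# Synthetic y-intercept series standing in for the Data Lake results.
rng = np.random.default_rng(1)
y = rng.normal(21.0, 0.4, 80)
x = np.arange(len(y))

mean, sd = y.mean(), y.std()
upper, lower = mean + 2 * sd, mean - 2 * sd

fig, ax = plt.subplots()
ax.plot(x, y, "o")
ax.axhline(mean, color="black", linestyle="--")   # mean
ax.axhline(upper, color="grey", linestyle="--")   # +2 SD
ax.axhline(lower, color="grey", linestyle="--")   # -2 SD
ax.set_xlabel("experiment index")
ax.set_ylabel("standard curve y-intercept")

# Points outside the 2-SD band are the ones to investigate.
outliers = (y > upper) | (y < lower)
fig.savefig("control-chart.png")
```

The same `outliers` mask could drive an automated alert instead of a plot, which is the "inform scientists" step described above.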

Let us know what you think! If you have questions, or if you’re curious about digging deeper into your experimental pipelines, please visit us at www.tetrascience.com or contact us at https://www.tetrascience.com/contact-us