Data Sharing - USC Viterbi | Prospective Students

Posted: January 16, 2017, 3:07pm

I’ve written before about how science has trended toward openness over the past decade. One of the big pushes behind open science is the sharing of useful data, especially in medicine. This push comes alongside the move toward open-access academic journals, more discussion of science on blogs and social media, and a trend toward bigger (and more) collaborations across institutions, states, and countries. Today I’ll write about a website I recently stumbled upon that aims to create a common hub for storing and sharing data from clinical trials.

One of the challenges with medical data is patient-identifying information. We don’t want our data roaming around the internet with unlimited access for any scientist, especially if that information can be traced back to a specific individual. It’s easy to see how such data might end up in the wrong hands and be used as collateral, for blackmail, or in any other nightmare scenario. So health data is generally coded with a unique patient identifier that can’t be traced back to the original patient. This identifier is just a label (think: hospital_00001, or something similar). The people who code the data have access to the originals and must go through the proper channels for patient confidentiality and ethics, governed by an Institutional Review Board (IRB). Getting IRB approval to access identifiable data is a huge (though worthwhile) hassle, but access is much easier once the data has been coded to remove patient-identifying information.
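To make the coding step concrete, here is a minimal Python sketch of this kind of pseudonymization. The record layout (`PatientRecord`, `CodedRecord`), the helper `code_records`, and the sample patients are all hypothetical, made up for illustration; this is not how any particular hospital or IRB actually does it. Identifying fields are dropped and replaced by a `hospital_00001`-style code, and the lookup table from codes back to medical record numbers is returned separately so it can stay with the people who have IRB approval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical raw record: name and MRN identify the patient directly.
    name: str
    medical_record_number: str
    age: int
    tumor_size_mm: float


@dataclass(frozen=True)
class CodedRecord:
    # Hypothetical shareable record: identifying fields replaced by a code.
    patient_code: str
    age: int
    tumor_size_mm: float


def code_records(records, prefix="hospital"):
    """Replace identifying fields with codes like 'hospital_00001'.

    Returns (coded_records, key). The key maps each code back to the
    original medical record number and would stay with IRB-approved
    staff; only coded_records would be shared.
    """
    mrn_to_code = {}
    coded = []
    for rec in records:
        # A patient seen more than once keeps the same code.
        if rec.medical_record_number not in mrn_to_code:
            mrn_to_code[rec.medical_record_number] = f"{prefix}_{len(mrn_to_code) + 1:05d}"
        code = mrn_to_code[rec.medical_record_number]
        coded.append(CodedRecord(code, rec.age, rec.tumor_size_mm))
    key = {code: mrn for mrn, code in mrn_to_code.items()}
    return coded, key


# Hypothetical example data, not real patients.
records = [
    PatientRecord("Jane Doe", "MRN-4821", 57, 23.5),
    PatientRecord("John Roe", "MRN-1177", 64, 41.0),
    PatientRecord("Jane Doe", "MRN-4821", 57, 19.0),  # follow-up visit
]
shared, key = code_records(records)
for row in shared:
    print(row)
```

Swapping names for codes is only part of real de-identification: dates, ages, zip codes, and rare diagnoses can also point back to a person, which is part of why the IRB process exists.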

Once the data is safe, people like me can apply to join one of many data portals. One of these portals, Project Data Sphere, grew out of the CEO Roundtable on Cancer, which was formed in 2001 as a brainchild of President George H.W. Bush. Project Data Sphere was inspired by the CEO Roundtable’s Life Sciences Consortium task force and its vision of making historical data more accessible to researchers.


This framework is perfect for the type of research I do. How did I find it? I was reading an academic paper in the journal The Lancet Oncology, and a footnote said that the data described in the paper could be accessed through Project Data Sphere. I applied for an account with the portal, and ten minutes later I had the data from the journal article on my computer to explore on my own (and hopefully to inform some of my own cancer models soon!).

Published on July 26th, 2017. Last updated on January 20th, 2021.