SOCR DataSifter

Statistical Obfuscation of Sensitive Big Data enabling Advanced Information Aggregation, Sharing, and Analytics

The DataSifter may provide the critical infrastructure at the right time to enable secure transdisciplinary team-based interrogation of Big Data including sensitive information. This unique functionality is necessary in many active R&D-intense organizations and high-tech companies!

See the DataSifter Technical Documentation

DataSifter Overview

There are no practical, scientifically reliable, and effective mechanisms to share real clinical data containing clearly identifiable personal health information (PHI) without either compromising the value of the data (by excessively scrambling/encoding the information) or introducing a substantial risk of re-identification of individuals (by various stratification techniques).

The DataSifter represents a novel method and a computational protocol for on-the-fly de-identification of sensitive information, e.g., structured Clinical/Epic/PHI data. This approach provides complete administrative control over the risk of data identification when sharing large cohort-based clinical data. At the extremes, a data governor may specify that either synthetic data or completely identifiable data is generated and shared with the data requester. This decision may be based on criteria determined by the data governor, such as access level and research needs. For instance, to stimulate innovative pilot studies, the data office may dial up the level of protection (which may naturally devalue the information content in the data). On the other hand, for more established and trusted investigators, the data governors may provide a more egalitarian dataset that balances preservation of information content (data-energy or analytical-value) and security (protection of sensitive information).
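The idea of a dial running from fully identifiable to fully synthetic data can be illustrated with a toy sketch. This is not the DataSifter algorithm, which is not described on this page; the function name obfuscate, the level parameter, and the blending rule are all hypothetical and chosen only to show how one setting can trade information content against protection for a single numeric column.

```python
import random


def obfuscate(values, level, seed=0):
    """Toy illustration only (hypothetical, NOT the DataSifter method).

    level = 0.0 returns the original values unchanged (fully identifiable).
    level = 1.0 returns values drawn from a normal distribution fitted to
    the column (fully synthetic). Values in between blend the two.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    result = []
    for v in values:
        synthetic = rng.gauss(mean, sd)  # draw from the fitted distribution
        result.append((1 - level) * v + level * synthetic)
    return result


# Example: systolic blood pressure readings (made-up numbers)
readings = [120.0, 135.0, 110.0, 142.0]
print(obfuscate(readings, 0.0))  # original values
print(obfuscate(readings, 0.5))  # partially obfuscated
print(obfuscate(readings, 1.0))  # synthetic draws only
```

In this sketch, a higher level moves each shared value further from the true record, which lowers re-identification risk at the cost of analytical value, mirroring the trade-off described above.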

In a nutshell, in response to requests from researchers interested in examining specific healthcare, biomedical, or translational characteristics of multivariate clinical data, the DataSifter allows data governors, such as healthcare systems, to filter, export, package, and share sensitive clinical and medical data for large population cohorts.

About DataSifter

The "Sensitive Data Sharing" Problem

Confidential data can’t be shared without violating HIPAA and legal rules designed to protect each patient’s identity.

However, this also hinders health and biomedical scientists from using the data to study patients’ diseases or conditions, which could lead to new scientific or health-related breakthroughs.

Health systems struggle with honoring patient privacy, while advancing scientific learnings that could improve patient care, treatments, and lives.

While biomedical and healthcare applications provide powerful examples of the ability of the DataSifter to enable cooperation between data owners and skilled data analysts, many other industries can benefit significantly from data sharing. Examples include Census data, CMS data, fin-tech data, IRS/taxation data, market economic data, business transaction information, etc.

Detailed documentation

DataSifter provides detailed user-friendly documentation enabling users to customize the functionality, extend the application scope, and easily implement new ideas.

DataSifter Video

DataSifter Solution

Allows sensitive data sharing, while protecting confidential patient data to further health discoveries.

DataSifter is vital for any health institution which requires access to aggregated sensitive patient data to:

  • Expand their scientific research and clinical understanding of patients’ diseases and conditions; and
  • Advance their knowledge of clinical treatment, drug effectiveness, and so much more.
Learn more about DataSifter at or by contacting SOCR’s Director, Prof. Ivo Dinov.

The Power of DataSifter

Protecting confidentiality while advancing health science

DataSifter is a new HIPAA-compliant technique for statistical encryption of sensitive information. Its proprietary algorithmic process makes it possible to share aggregated sensitive health data (e.g., patient clinical or electronic health records) with those who currently lack authorized access, without revealing or compromising patients’ confidentiality or privacy.


Sharing and Protecting Sensitive Patient Data for Health Research & Drive for New Scientific Discoveries.
