Data and diagnosis

The rise of big data within healthcare has led to an explosion of applications across multiple disciplines. One area that has received a great deal of attention is the role of big data in improving diagnosis for patients. This article explores that role, presents some examples and discusses the moral considerations of its use.

Big data in healthcare

Used in the context of healthcare, big data refers to the vast quantities of health data that may be collected from either individuals or populations and used to treat conditions, cut costs or help prevent epidemics.1 Big data has changed the way data are managed, analyzed and leveraged in every industry, and healthcare is no exception.

As we live longer, the way we manage our health has also changed. Healthcare professionals want to know more about our medical history and to pick up potential warning signs of more serious illnesses as early as possible, which relies on the collection of relevant data.1 In the US, healthcare expenses now total 17.6% of GDP, an estimated $600 billion more than would be anticipated for a country of its size.1 There is more incentive than ever for healthcare professionals to share data so that the resulting evidence pool can be used to optimize diagnoses.1 But how exactly can data be used to do this?

Why use big data to support diagnosis?

There are several ways in which digital health solutions aim to improve healthcare for patients; one is to improve the accuracy with which healthcare professionals diagnose certain conditions. In the US, diagnostic errors are estimated to account for 5–15% of all diagnoses, and for some diseases the figure may be as high as 97%.2 The World Health Organization (WHO) lists over 70,000 diseases, so healthcare professionals face a daunting list of possibilities to identify and rule out when making diagnostic decisions.2

To help healthcare professionals navigate this multitude of options, machine learning models and algorithms may be trained on large data sets, compiled from previous patient cases and available clinical data, to serve as a bank of objective clinical decision-making support. These computational models may be able to sift through vast quantities of information far more quickly than a human and may prove to be a valuable means of improving the accuracy of differential diagnosis.2
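To make the idea of a "bank" of previous cases more concrete, the Python sketch below ranks candidate diagnoses for a new patient by comparing their findings with a handful of past cases, using a simple similarity-weighted nearest-neighbour vote. It is a toy illustration under stated assumptions: the case list, the finding names and the functions `jaccard` and `rank_diagnoses` are hypothetical, and real diagnostic tools are trained on far larger, richer data sets with more sophisticated models.

```python
from collections import Counter

# Hypothetical toy case bank: each past case pairs a set of recorded findings
# with the diagnosis that was eventually confirmed. Illustrative only.
PAST_CASES: list[tuple[set[str], str]] = [
    ({"cough", "fever", "fatigue"}, "influenza"),
    ({"cough", "fever", "shortness_of_breath"}, "pneumonia"),
    ({"cough", "wheeze", "shortness_of_breath"}, "asthma"),
    ({"wheeze", "shortness_of_breath", "chest_tightness"}, "asthma"),
    ({"fever", "fatigue", "sore_throat"}, "influenza"),
    ({"cough", "sputum", "shortness_of_breath"}, "copd_exacerbation"),
]


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of findings: 0.0 = nothing shared, 1.0 = identical."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_diagnoses(findings: set[str], k: int = 3) -> list[tuple[str, float]]:
    """Rank candidate diagnoses using the k past cases most similar to `findings`.

    Each neighbour votes for its confirmed diagnosis with a weight equal to its
    similarity; weights are normalized so the returned scores sum to about 1.
    """
    # Score every past case, then keep the k most similar
    # (sorted() is stable, so ties keep their case-bank order).
    scored = sorted(
        ((jaccard(findings, case), dx) for case, dx in PAST_CASES),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    votes: Counter[str] = Counter()
    for similarity, dx in scored:
        votes[dx] += similarity

    total = sum(votes.values()) or 1.0  # avoid dividing by zero if nothing matches
    return [(dx, round(weight / total, 2)) for dx, weight in votes.most_common()]
```

For example, `rank_diagnoses({"cough", "wheeze", "shortness_of_breath"})` returns `[("asthma", 0.75), ("pneumonia", 0.25)]`. The output is a ranked differential for a clinician to review, not a diagnosis; nearest-neighbour voting is used here only because it is easy to follow.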

As noted by Dr Markus Wenzel, “you do not search for methods to replicate and artificially create intelligence. Instead, you look for ways to teach computers to solve a cognitive task as quickly as possible and do at least as well as a human being”.3 Put simply, data used in this way should help to free up healthcare professionals’ time for other responsibilities and tasks.3

Effective data mining and analysis, aided by these data sets and the predictive algorithms trained on them, may also enable a level of ‘in silico’ research (conducted by means of computer simulation) and real-time diagnostic recommendations based on trends and insights that can only be drawn from large data sets.4 In this way, it is hoped that the collective experience of diagnosing and treating many patients can be used to support individualized patient care.4

Examples of data use for diagnosis and beyond

Diagnoses made through the visual inspection of clinical images may benefit most from the support of large data sets. Indeed, the areas in which data appear in a consistent format are the ones best suited to computational models, so radiology, pathology and dermatology have been the chief focus of predictive algorithms to date.3 Things become more challenging when a model must combine data from multiple sources.3

One example comes from ophthalmologists diagnosing diabetic retinopathy. The estimated error rate for diagnosis of this condition is 49%,5 meaning that ophthalmologists are roughly as likely to misdiagnose their patients as to diagnose them accurately. Large data sets have been used successfully to improve diagnostic accuracy by using an algorithm to make predictions based on previous cases. Interestingly, this data-driven method was significantly better at diagnosing diabetic retinopathy than interpretive diagnosis by ophthalmologists, but was less accurate than diagnosis by a retinal specialist, suggesting that the combination of clinical knowledge and predictive algorithms may be more powerful than either in isolation.5 Additionally, predictive models trained to analyze imaging scans, such as those developed by Zebra Medical Vision (Shefayim, Israel), may serve as an additional support tool for radiologists when making diagnoses based on clinical images.6

Another example of using data for diagnosis comes from Buoy Health (Boston, Massachusetts), a data-driven chatbot (an AI-driven computer program that simulates human conversation) that listens to patients’ symptoms and offers advice on potential care.6 In the field of respiratory care, big data and predictive models may have a role to play in the evaluation of lung function. Fluidda is a company specializing in functional respiratory imaging that has developed a tool to assess lung images and help pulmonologists visualize lung structure and function. In doing so, Fluidda aims to optimize diagnosis and the monitoring of disease progression.7,8

In the future, the hope is that these applications can be extended to support not only diagnosis but also prognosis. One ongoing project is looking at how an app can be used to monitor the motor and non-motor symptoms that accompany Parkinson’s disease. Data collected remotely, such as voice or speech patterns, could serve as indicators of early-stage Parkinson’s. Detecting subtle changes in this way may support monitoring by healthcare professionals and give them a clearer picture of the disease’s life cycle and progression.3

A moral challenge

An important consideration is the moral implications of decision-making or diagnosis ‘ownership’ that may arise from using big data and predictive analytics. For example, should a doctor rely on a computer and its data to make a diagnosis that goes beyond their own assessment? They may be inclined to take greater risks, on the premise that the computer will be accountable for any misdiagnoses or medical complications.9 On the other hand, doctors may choose to ignore well-founded, evidence-based recommendations from big data sets because they believe they know better. Healthcare professionals may also be at risk of becoming detached from or indifferent to their healthcare decisions if they do not feel like an integral part of the decision-making process.9

Outsourcing important decision-making tasks may blur the lines of accountability, and this must be addressed to ensure that healthcare professionals are able to work in harmony with predictive analytics and other digital health solutions. That being said, the computational power of data-driven diagnosis holds considerable promise for patient care.

  1. Lebied M. 12 examples of big data analytics in healthcare that can save people. 2018. Available at: [Accessed August 2020].
  2. Brown T. Pros and Cons of artificial intelligence in healthcare. 2018. Available at: [Accessed August 2020].
  3. Medica magazine. Diagnosing disease with big data. 2018. Available at: [Accessed August 2020].
  4. Dilsizian SE and Siegel EL. Current Cardiology Reports 2014; 16: 441.
  5. Ives J. Study shows how AI can improve physician’s diagnostic accuracy. Available at: [Accessed August 2020].
  6. Daley S. Surgical robots, new medicines and better care: 32 examples of AI in healthcare. 2020. Available at: [Accessed August 2020].
  7. Sanyal S. 4 ways in which AI is revolutionizing respiratory care. 2018. Available at: [Accessed August 2020].
  8. Available at: [Accessed August 2020].
  9. Deloitte Insights. Predictive analytics in healthcare: emerging value and risks. 2019. Available at: [Accessed August 2020].

November 2020 RESP-42184