Big data: it’s what everyone in health and tech circles seems to be talking about at the moment. Yet as early as 2014, according to the Gartner hype cycle (a visual representation of the maturity and expectations of new technology innovations), business in general was already past the peak of expectation and moving rapidly towards disillusionment with the whole idea, before it had truly made a mark.1

More recently in 2016, big data as a term was no longer included as one to watch, while other related concepts such as artificial intelligence and machine learning were taking precedence.2

Still, tech entrepreneurs, healthcare providers and patients are all being wooed with the potential for improved diagnosis and personalized treatment promised by improved data collection and analysis. But contrast this with scare stories in the news about pervasive data collection and analysis and it is clear there is still a lot to learn and even more that needs to be communicated on the topic.

Does healthcare as an industry—or even individual healthcare professionals—really understand what big data is and how it can be used to improve the lives of patients? And if so, is that message reaching those who need to hear it?

While most of the chatter is around the possibilities and, to a lesser extent, the concerns, practical implementation is the real challenge. Data must be analyzed and presented in a format that is valuable and useful to healthcare professionals in their interactions with patients.

Interventions that overburden providers or patients with time-consuming data collection and analysis offer little benefit and can take up time and resources better used elsewhere. At present, the infrastructure needed for data collection, integration, analysis, and transfer isn’t where we need it to be. But the first steps are being taken, and we can see a way forward to achieving some of the grander promises of the big data evangelists.

To this end we first need to identify how we use the small data that are currently available to us. Electronic health records (EHRs) and other systems collect and store large amounts of data, alongside the personal healthcare and well-being apps that patients increasingly use in their day-to-day lives. But most of what is collected remains siloed, beyond true integration and the analysis available through big data tools.

We can’t move into the big data arena until we understand where implementation of new initiatives will have the most impact and, most importantly, how best to use the data that are already available to us.

Throughout this paper we want to look at how we understand big data in the healthcare industry and the steps we can take to begin integrating it into care in a way that produces tangible benefits for healthcare professionals, patients, and providers.

“It’s not just how much data is produced, it’s also how many varying types of data we have now, both structured & unstructured, arriving in real time.” – Christina M. Busmalis, Director, Global Life Sciences, IBM Watson Health

_what does “big data” really mean?

At its simplest, big data refers to large volumes of data gathered from disparate sources such as social media, internet-enabled devices (including smartphones and tablets), machine data, and video and voice recordings, along with other structured and unstructured data.3 But the term can be slightly misleading, as it encompasses more than just scale, and it can be difficult to find a simple, concise definition that is easily understood by different audiences. IBM defines the concept as follows:

“Big Data is a way of harvesting raw data from multiple, disparate data sources, storing the data for use by analytics programs, and using the raw data to derive value (meaning) from the data in a whole new way.”4

Definitions like this reflect the idea that big data is generally considered together with the methods used to analyze it and the tools used to collect and store it.

Most commonly, big data is defined and discussed in terms of the “four Vs”:

  • volume – the quantity of generated and stored data
  • variety – the type and nature of the data
  • velocity – the speed at which the data is generated and processed
  • veracity – the quality of the data captured.3,5

So at its heart, “big” data describes large volumes of data (at scales where processing and analysis require computer systems), gathered from a wide range of sources, which can be generated and processed incredibly quickly and which are inseparable from the technology used to analyze them.

“Sometimes it’s an arbitrary distinction; a lot of new algorithmic approaches can be applied to data in general, either big or small.” – Lena Granovsky PhD, Director, Analytics and Big Data, Teva Pharmaceuticals

To complicate the matter, within healthcare systems there is a slightly different emphasis in interpretation. Here, big data is commonly understood as referring to “electronic health data sets so large and complex that they are difficult (or impossible) to manage with traditional software or hardware”.6

Within the field, big data can often be thought of as a completely different entity to small data. It’s viewed as separate from what has come before, rather than an expansion and integration of data that we often collect already. While the main change is in the scale of data sets, our understanding of big data needs to include the integration of data from different sources which can then be assessed as a whole.

It’s clear that, despite leaders such as Ernst & Young and IBM defining “big data” consistently, the concept is still not widely and clearly understood by those outside of the technology arena and the term is often misused by the public.

In healthcare, alternative terms linked to “big data” such as e-health, digital health and e-medicine further blur the lines between the data and the technology used to collect it.5 Where understanding of a term is incomplete or inconsistent, it can be difficult to implement interventions based on the concept and concerns are more likely to be raised. So to realize the potential of big data, we first need to improve understanding of the idea and articulate the benefits it is likely to deliver.

_what can “big data” do for us?

Data has always been a driving force in healthcare, from patient records to clinical trial outcomes and epidemiology, but both the quantity and variety of data collected are continually increasing. And with the proliferation of wearable health monitors, at-home genetic testing and other technological advances, the sources of data available for collection and analysis are becoming much more diverse.7 At the same time, patients are becoming more aware of the data being collected from them and feel increasingly empowered to take greater ownership of both their health and their personal data.7

This leads us to a position where patients are happy to provide data and for it to be collected and analyzed, as long as they are confident that they will receive a tangible benefit from doing so. For instance, personal health monitors such as Fitbit allow patients to track everything from the number of steps they take each day to their sleep patterns and let them share these data in a variety of ways, including online forums and leader boards. It is up to healthcare providers and the pharmaceutical and tech industries to work together to ensure that big data initiatives lead to real improvements in care that justify this trust.

“Distilling big data into personal insights and delivering that with a human touch.” – Max Luthy, Director of Trends and Insights, TrendWatching

To date, prominent examples of big data interventions in the healthcare space have involved increasing efficiency or reducing costs within healthcare systems and clinical research. For example, a partnership between AstraZeneca and HealthCore focused on conducting real-world studies designed to identify how to treat disease both effectively and economically,8 while an agreement with PatientsLikeMe in 2015 gave AstraZeneca access to a massive source of patient-reported outcomes data.9 Both of these partnerships combine traditional clinical trial data with alternative sources to determine which types of therapy are most valued and to guide R&D investment decisions.

At the interface between small and big data, new approaches to the design and conduct of clinical trials, to recruitment and retention, and to drug development and repurposing are all being trialed. Data analysis is seen as key to solving some of the existing challenges and reducing the costs of currently expensive and ineffective processes. Alongside this, as mentioned in our previous whitepaper,10 new data analysis techniques and algorithms offer the potential to identify and provide evidence for new outcome measures, both within clinical trials and as proxies for disease progression and outcomes in wider care. Within Teva, for example, work is being done with alternative measures for pain that avoid some of the subjectivity of patient recall questionnaires and existing scales.11

“Anyone coming up with big data solutions needs to look beyond their own industry.” – Max Luthy, Director of Trends and Insights, TrendWatching

In the public realm, big data is often associated with crowdsourcing, a technique seeing increasing use in public health initiatives. For instance, one device and mobile app aims to use crowdsourcing to monitor and visualize air pollution across the world in real time, an initiative which could have real implications for those living with respiratory disease.12

One of the main challenges to increased use of big data is the lack of transferable data and integrated systems. Data are siloed within EHRs, individual apps, pharma trial databases, and other sources, sometimes even within a single computer system at an individual healthcare site. Pharma and tech companies alike are used to the idea of owning the data they collect, and resistance to the concept of open data can be substantial in an industry that is reluctant to change.

But this problem is increasingly recognized, and tentative steps are being taken to improve the situation, especially within electronic health records and the data they are designed to capture.13

“Applying machine learning & algorithms to health should give us the ability to uncover hidden signals, better estimate & predict disease progression, optimize treatment & better fit the right medicine to the right person.” – Lena Granovsky PhD, Director, Analytics and Big Data, Teva Pharmaceuticals

When it comes to healthcare, the main areas of focus for big data in the future are predictive modeling, supporting clinical decision making, and disease and risk monitoring.14

Developments in data collection and analysis techniques are providing new opportunities to change the way we think about care, from increasingly open data sets fueling drug discovery and clinical trials, to online diagnostic tools and improved prediction and prevention.7

But we are a long way from achieving the full promise of technology and data analysis and many still have doubts about the value and validity of some of these new technologies.

_recognizing the challenges

While optimism about the future is encouraging, any implementation of big data techniques needs to balance the excitement and enthusiasm for disruption that is common in the tech world against the risks and challenges, both general and specific to healthcare.

Data privacy and the ethics surrounding collection, retention, and analysis of data are increasingly in the public eye. With the recent Cambridge Analytica scandal, Mark Zuckerberg’s appearance before Congress, and new laws governing data protection in both the US and EU, there is a temptation to shy away from big data analysis. But perhaps we should see these developments as positive: people are becoming more aware of widespread data collection and analysis, but they are also becoming more invested in knowing how their data is being used and the benefits that can accrue in return.

Where they see a societal or personal good as a potential result, people are often willing to provide data for analysis. In healthcare, this results in patients who are more willing to be actively involved in generating data and who take greater interest in what happens to those data, not only in making sure they are used responsibly but also in seeing that the benefits for themselves (and others) are apparent. This in turn may even make it easier for the industry to engage individuals who have previously been passive and uninterested in what happened to their data, and to involve them more in their overall health management.

There are understandable concerns regarding the use of big data within healthcare: some consider the ethics of risk prediction related to healthcare and insurance coverage,15 while others focus on the pure logistics of handling and managing the volumes of data required. On the first point, when does risk prediction begin to affect a person’s ability to obtain reasonably priced insurance?

As insurance companies gain access to greater volumes of population data and more powerful predictive analysis tools, some are worried that risk-sharing, the traditional method on which insurance is based, will become less and less relevant. Instead, companies will more precisely target those at higher risk with increasingly higher premiums.15

While this is a challenge predominantly for policymakers and lawmakers, other concerns are more mundane: many, if not most, healthcare systems simply aren’t set up to handle the volumes and immediacy of data required by big data interventions, and the tools for analysis are still being developed. Even Google, a name that many would immediately associate with data processing and optimized algorithms, has struggled when it comes to big data in healthcare. The company’s much-hyped ‘Flu Trends’ campaign did not deliver on its stated goal of predicting flu outbreaks through the analysis of region-specific search queries.16 While the initiative didn’t have the hoped-for effect, it did point the way towards a method of integrating search query data with more traditional sources to improve outbreak predictions.17

“In healthcare we’ve been looking at large data sets for years, big data is about being able to more quickly draw insights from that information.” – Christina M. Busmalis, Director, Global Life Sciences, IBM Watson Health

Two things are constantly in short supply within healthcare systems across the globe: funding and time. While big data promises to solve both of these problems through increased efficiency, achievements to date have been limited. Given the example of EHRs, perhaps healthcare professionals are justified in their concerns, particularly when it comes to burdens on their time. Where the collection and collation of patient data should have led to efficiencies and streamlining of care, it has instead often led to frustrating consultations, fragmented systems and healthcare professionals who feel that they spend all of their time in front of a computer screen rather than speaking to their patients.18,19 With outcomes like this, who can blame many healthcare professionals for becoming wary of the often-outsized promises made by big data enthusiasts?

Alongside worries about increased administrative burden is the potential for a lack of focus on the patient-physician interaction, removing the “human” factor. Patients want to feel that they are being treated as individuals, not as a set of categories, and healthcare professionals need to know that their professional expertise is valued, that they are not being replaced by a set of treatment algorithms following a digital patient profile. The key is to reach a situation of augmented healthcare and supported decision-making.

No one wants to feel like their doctor is being replaced by big data.20 Do we need to reframe how we talk about big data interventions in healthcare? Perhaps we need to move from emphasizing the potential future benefits to highlighting the possibilities offered by trialing and testing new options. There have been failures of big data and interventions that have yet to fulfil their promise, but we don’t write off all clinical trials or new therapies in an area because one widely anticipated therapy fails to meet its outcomes in phase 3. We need to learn to be similarly pragmatic about big data initiatives. As consumers, we often expect more from data than we do from human interventions, when actually we are reliant on the data we have and the limits imposed by them.

Realistically, for the average patient or healthcare professional, the grand potential of big data is a long way off. So for now, it makes sense to refocus our attention on how we collect and handle “small” data to make the most of the systems we already have before we try to jump fully into the world of big data.

“We need to get to the stage where we can spend more time on the analysis and less on the preparation and collection of data.” – Christina M. Busmalis, Director, Global Life Sciences, IBM Watson Health

_making the most of small data

Small data is something that all of us (healthcare professionals, healthcare systems, patients, and consumers alike) are used to dealing with every day. In the world of healthcare, it’s the patient’s history, the efficacy measures from a clinical trial, the changes in FEV1 from one visit to the next: everything we use to make decisions day to day.

At its heart, small data is frequently about an individual and the changes they experience over time. Because of its scale, small data is used to find specific insights and answer predetermined questions. If big data is intimidating, small data may be more familiar and more palatable. But there are still plenty of questions to answer about how and when we use small data in healthcare, including in respiratory care:

  • Are we handling the small data we do collect optimally?
  • Is it being used and interpreted in the right way to answer the questions we have?
  • What other small data should we be collecting to improve care?
  • How can new technologies aid us in collecting useful data?
  • Would longitudinal data show a different picture from “snapshots”, and how can they be collected in an efficient manner?
  • Are we currently asking the right questions when it comes to analyzing the data we do collect?

Small data has many of the characteristics and potential benefits of big data, just on a smaller, often individual, scale. Importantly, small data can be collected from many of the same sources as big data and can still provide insights. For some, small data means the opportunity to gain insights at the level of a single patient and then to generalize them to wider populations.

“Physicians and providers want access to as much data as possible to help them make the right decisions [for patients].” – Jeffrey Dunn, Vice President, Clinical Strategy and Programs and Industry Relations, Magellan Rx Management

While data from a single device or time point may be limited, providing an incomplete picture of a person, trend or demographic, it can still provide complementary insights to guide care. In an earlier whitepaper issue we looked at some of the potential opportunities for collecting and integrating new sources of healthcare data through collaboration between the healthcare and tech industries.21

Yet even with small data, there are challenges to collecting and analyzing data in ways that produce meaningful insights and actionable reports.

For instance, while the use of personal health monitors is increasing, cost considerations are still limiting the uptake of new technologies that could supplement the data we already collect. In addition, systems are not well set up to collect most types of unstructured data, leading to potential gaps in records and analysis.

While there are an increasing number of healthcare apps looking to at least partially fill these gaps,21 data are still generally compartmentalized and fragmented rather than integrated and shared across systems.

“It’s not about big data vs small data, it’s about mining the data we have and using it more effectively to make it meaningful.” – Lena Granovsky PhD, Director, Analytics and Big Data, Teva Pharmaceuticals

Within respiratory care as an example, multiple companies are investigating the use of electronic inhalers, reminder systems and mobile and web-based apps to improve adherence to medication in asthma and COPD.22 The test of these technologies will be how well they can integrate adherence and usage data with patient-reported outcomes and combine environmental and other data to empower users, encourage behavior change, inform care and improve outcomes at the individual level.

From this baseline, we can then think about scaling up such technologies to better inform care across populations, using big data to provide small insights.

At its center, the key to making the most of data, big or small, is making sure that the questions being asked are valid and that the data being collected will answer those questions, are actionable, and can lead to meaningful changes in behavior or treatment.

“It’s about integrating, not just the clinical data about a person, but also all of the other data about their life to create a personalized healthcare plan.” – Christina M. Busmalis, Director, Global Life Sciences, IBM Watson Health

_making the move from small to big data

It’s impossible to talk seriously about the long-term potential of widespread big data integration without examining the short-term steps we need to take to make implementation possible. Beyond making optimal use of the small data we already collect and analyze, we now need to look at how best to expand our systems so that they are ready to handle big data.

Looking too far forward brings out the cynicism in many people, giving those resistant to the concept free rein and running the risk of overpromising and not meeting expectations. The leap forward to reach the ideal of the future misses out the intermediate steps, so it is easier for those resistant to change to say that things simply cannot be done. To set more realistic goals, we need to know what the incremental steps are and how to implement those changes.

So, what changes can we start making now so that healthcare systems will be ready for big data in the future?

“Big data in healthcare is a legitimate goal. At the moment we’re still in the discussion phase rather than the implementation phase.” – Jeffrey Dunn, Vice President, Clinical Strategy and Programs and Industry Relations, Magellan Rx Management

Further education around big data, what it is and what it means for healthcare can help both healthcare providers and the general public become more open to the concept. We have seen from the consumer space that, as long as people know that they will receive something of benefit in return, they are generally happy for personal data to be collected. As new laws come into place and transparency around what data is being collected and how it will be used increases, we can only expect this to improve. At the other end of the scale, big data interventions need to be designed and implemented so that they provide useful guidance and insights to healthcare professionals and providers. Overloading systems and individuals with complex or extensive readouts that take more time to understand and implement than they save is of no use, and systems that demand time and effort to input data inefficiently will simply be ignored.

Most important for implementation and growth is making sure that the data we collect is accessible and transferable. Companies are stuck in a mindset based on proprietary software and formats, unwilling or unable to share data for analysis. While this is beginning to change, more still needs to be done. Open, accessible data in consistently readable formats needs to become the norm so that data from multiple sources can be integrated and form the most complete picture possible to inform care.

Alongside this, to alleviate privacy concerns and, increasingly, to comply with laws such as the General Data Protection Regulation (GDPR) in the EU and equivalents in the US, it is essential to specify why data are being collected. Informed consent is as important in healthcare technology and data collection as it is in a clinical trial, and people must be made aware in advance if their data will be shared and analyzed. When asked, patients react positively to the idea of sharing anonymized health data, but do still raise concerns about lack of transparency and awareness of how data will be used.23 This is an area in which transparency and honesty can go a long way in improving trust.

“In the US, payers are going to be crucial [to big data] as they simply have access to more data than anyone else.” – Jeffrey Dunn, Vice President, Clinical Strategy and Programs and Industry Relations, Magellan Rx Management

Finally, we need to understand the limitations of data and how we complement it with human insight and decision-making. For the foreseeable future, data, big or small, cannot tell us the questions we need to ask; they can only point to situations where change is needed and help guide those changes. Big data is not a replacement for healthcare professionals; it is a tool that can support them in treating their patients. Producing interventions and solutions that will achieve this will require continued collaboration and innovation between healthcare, tech and pharma, looking beyond any single industry, starting with small data and working forwards.


While big data analytics hold huge promise for the future of healthcare, the first step is to re-evaluate how we use and integrate the small data that we already have access to. From here, we can build the structures needed to manage larger data sets from more disparate sources and the means of integrating, accessing, and sharing data across systems. But this will take a change in mindset, moving beyond ownership of small, discrete data sets and towards shared access for analysis by different parties.


1. Gartner’s 2014 Hype Cycle for Emerging Technologies Maps the Journey to Digital Business. Gartner.com 2014. Available at: https://www.gartner.com/doc/2816917/gartners-hype-cycle-special-report. Accessed: July 2018.
2. Top Trends in the Gartner Hype Cycle for Emerging Technologies, 2017. Gartner.com 2017. Available at: https://www.gartner.com/smarterwithgartner/top-trends-in-the-gartner-hype-cycle-for-emergingtechnologies-2017/. Accessed: July 2018.
3. Big Data: Changing the way businesses compete and operate by EY. Available at: http://www.ey.com/Publication/vwLUAssets/EY_-_Big_data:_changing_the_way_businesses_operate/%24FILE/EY-Insightson-GRC-Big-data.pdf. Accessed: July 2018.
4. What is big data? More than volume, velocity and variety… Available at: https://developer.ibm.com/dwblog/2017/what-is-big-data-insight/. Accessed: July 2018.
5. European Commission. Study on Big Data in Public Health, Telemedicine and Healthcare. Final Report. 2016.
6. Raghupathi W and Raghupathi V. Health Information Science and Systems 2014, 2:3.
7. Stanford Medicine 2017 Health Trends Report. Harnessing the Power of Data in Health. Available at: https://med.stanford.edu/content/dam/sm/sm-news/documents/StanfordMedicineHealthTrendsWhitePaper2017.pdf. Accessed: July 2018.
8. Businesswire 2011. AstraZeneca and HealthCore Announce Real-World Evidence Data Collaboration in the U.S. Available at: https://www.businesswire.com/news/home/20110202006102/en/AstraZeneca-HealthCore-Announce-Real-World-Evidence-Data-Collaboration. Accessed: July 2018.
9. AstraZeneca 2015. AstraZeneca and PatientsLikeMe announce global research collaboration. Available at: https://www.astrazeneca.com/media-centre/press-releases/2015/astrazeneca-patientslikeme-research-oncology-diabeteslupus-respiratory-disease-13042015.http. Accessed: July 2018.
10. Teva Pharmaceuticals 2018. Redefining value in healthcare. Available at: http://respiratorycarev2.com/whitepaper/. Accessed: July 2018.
11. Teva data on file.
12. Buddhika D. Medium 2017. Air Pollution Monitoring Through Crowdsourcing. Available at: https://medium.com/codeinsights/air-pollution-monitoring-throughcrowdsourcing-5701f0549340 1/. Accessed: July 2018.
13. Kaiser Permanente 2010. Kaiser Permanente Completes Electronic Health Record Implementation. Available at: https://share.kaiserpermanente.org/article/kaiser-permanente-completes-electronic-healthrecord-implementation/. Accessed: July 2018.
14. Lee CH, Yoon H-J. Medical Big Data: Promises And Challenges. Kidney Res Clin Pract 2017; 36:3–11.
15. Bloomberg 2017. Big Data Is Coming to Take Your Health Insurance. Available at: https://www.bloomberg.com/view/articles/2017-08-04/big-data-is-coming-totake-your-health-insurance. Accessed: July 2018.
16. Wired 2015. What We Can Learn From the Epic Failure of Google Flu Trends. Available at: https://www.wired.com/2015/10/can-learn-epic-failuregoogle-flu-trends/. Accessed: July 2018.
17. Davidson MW, Haim DA, Radin JM. Using Networks to Combine “Big Data” and Traditional Surveillance to Improve Influenza Predictions. Sci. Rep. 2015;5:8154; DOI:10.1038/srep08154.
18. Sinsky C, et al. Allocation of Physician Time in Ambulatory Practice: A Time and Motion Study in 4 Specialties. Ann Intern Med. 2016;165(11):753-760.
19. Arndt BG, et al. Tethered to the EHR: primary care physician workload assessment using EHR event log data and time-motion observations. Ann Fam Med 2017;15:419-426.
20. Forbes 2017. Will Big Data Replace Your Doctor? Available at: https://www.forbes.com/sites/peterlipson/2017/01/16/will-big-data-replace-yourdoctor/#1a0ffb1f2f72. Accessed: July 2018.
21. Teva Pharmaceuticals 2017. Technology and Healthcare: a call for collaboration. Available at: http://respiratorycarev2.com/whitepaper/. Accessed: July 2018.
22. Hui CY, et al. The Use Of Mobile Applications To Support Self-Management For People With Asthma: a systematic review of controlled studies to identify features associated with clinical effectiveness and adherence. Journal of the American Medical Informatics Association 2017;24(3):619–632.
23. Spencer K et al. Patient Perspectives On Sharing Anonymised Personal Health Data Using A Digital System For Dynamic Consent And Research Feedback: a qualitative study. Journal of Medical Internet Research 2016;18(4):e66. ISSN 1438-8871. 

September 2018 RESP-41796
