Big data will have a growing role in nursing scholarship requiring parallel growth in data competencies and literacies to benefit nursing practice, education, and policy. This article offers background information about big data and bibliometric analysis, and describes a study that analyzed published research focused on big data in nursing to evaluate trends and understand the current evidence on this topic. Scopus was used as the primary database to retrieve published articles. The study results and subsequent citation analysis of 64 articles from 2014 to 2020 determined the types and extent of literature on big data in nursing, which authors and journals publish big data research, and the origins of the studies that were reviewed. The majority of the articles included offered descriptive or review type scholarly contributions with a clinical practice focus and/or application. Our discussion suggests future directions for research, content about nurse involvement in big data important for nursing curricula, implications for policy action based on evidence, and our assessment of the state of the science. This analysis demonstrated strong interest in big data approaches in nursing literature but little application of these approaches was identified in the represented articles.
Key Words: big data, informatics, nursing, technology, bibliometric analysis
Big data will have a growing role in nursing scholarship, requiring parallel growth in data competencies and literacies to benefit nursing practice, education, and policy. To focus on innovations and approaches surrounding big data in nursing, this study examined the current evidence in scholarly publications written about the topic. The Scopus interdisciplinary database (Elsevier BV, Amsterdam, Netherlands) was searched for articles with the loose phrase “big data” and “nurs*”. The interdisciplinary nature of the Scopus database allowed the collection of nursing-specific as well as nursing-related articles using the phrase “big data.” A bibliometric analysis of the retrieved publications was conducted. This analysis determined the types and extent of articles pertaining to big data presented in nursing literature, the study origins, and details about subsequent citations.
Background
Recent technological advances have revolutionized the ways in which patient and healthcare data can be collected, stored, and analyzed. With increased digital and computing technologies at hand, the capability and capacity to collect and analyze data have increased dramatically. These advancements have resulted in massive quantities of data, one of the attributes of “big data,” that cannot be assessed or explored in the absence of other technologies and innovations. Big data has changed how we investigate and evaluate information and how we interpret meaning (Warren, 2017). Big data is relevant to nurses due to the potential implications for practice, education, research, and policy.
Big data, often referred to as data science or data analytics, offers an approach to explore broad topics using computational methods to identify patterns and trends. The term “big data” describes a volume of data that exceeds human comprehension and is unmanageable by standard computing systems. Big data is becoming more widespread in nursing and healthcare settings. Data scientists are constantly developing knowledge and specific methods to manage these data.
In addition to volume, data are characterized by four additional aspects: 1) velocity, 2) variety, 3) veracity, and 4) value. Volume refers to the amount of data being collected and stored. Velocity describes the rate at which data are generated or collected. Variety considers that data are coming from multiple places/points simultaneously. Variety also recognizes that the retrieved data are in different forms, such as unstructured and structured. Veracity refers to the completeness or uncertainty of the data. Last, value refers to the purpose of collecting the data and should always be considered with any data-related project (Brennan & Bakken, 2015; Topaz & Pruinelli, 2017).
Changes in policy, such as the Health Information Technology for Economic and Clinical Health (HITECH) Act (2009), have helped to promote and expand the use of electronic health records (EHRs) in clinical settings. EHRs have become a valuable data source for many nurses in practice and research. Use of such existing data offers many possibilities to clinicians and researchers, including but not limited to improving quality outcomes, safety, and predictive modeling (Hardy & Bourne, 2017).
Big data driven research typically starts with a question, usually a research question (e.g., a PICOT question) or a hypothesis. These are research tools familiar to nurses and nurse scientists. Methods used to address big data research questions can be helpful in either primary or secondary data analyses. After a research question is identified, data are collected. It is not uncommon for big data to be located in different sources in multiple formats, and with varying degrees of interoperability. Once retrieved, data must be cleaned and an assessment of data integrity completed before data exploration can begin (Brennan & Bakken, 2015).
Brennan and Bakken (2015) identified four elements common to big data inquiries:
1) Highly disseminated data with original data owner in control of dataset;
2) Security procedures related to the data;
3) Involved stakeholders (investigators and collaborators) that reveal their methods and results; and
4) Accelerating insights that can be reused with data generated from many strategies.
There are several examples of categories of big data in healthcare. For example, social media and web-based content is a form of big data. These types of data can be extracted from sources such as Twitter, Facebook, LinkedIn, and other social media platforms. Machine to machine examples of big data include uploads or sensor data. Billing or claims data are considered big transaction type data. Examples of biometric sources of big data may include vital signs, radiology records, fingerprints, and genetic information. Human-generated sources of big data may be in the form of electronic health records, physician notes, or paper documents. All of the sources of data listed above may be found in a structured, semi-structured, or unstructured form (Institute for Health Technology Transformation, 2013).
Bibliometric Analysis
Limited information exists about the expanding subject of big data in nursing. Therefore, we conducted a bibliometric analysis to learn more about the landscape of the published literature in nursing pertaining to this topic. Bibliometrics can be used for a variety of reasons, including assembling information about a field of study or a specific topic; exploring new developments in a field; assessing research productivity; and gaining information about the uptake of information globally. Bibliometrics offers a quantitative analysis of bibliographic information and an approach to examine trends in established or emergent bodies of literature.
Bibliometric analysis differs from other review types commonly seen in the nursing literature such as systematic and integrative reviews. Bibliometric approaches are rooted in library sciences and were originally designed to inform librarians’ selections for texts and resources collections (Gingras, 2016). This approach illuminates how scientific and evidence-based information is generated and communicated through and among disciplines. Critical appraisal and syntheses of the literature included in this dataset, which may be included in other standard reviews, are not included in this article. For example, Carter-Templeton, Frazier, Wu, & Wyatt (2018) explored robotics in nursing research literature using bibliometric techniques. Nicoll et al. (2018) used bibliometric techniques to analyze publication patterns of 81 articles found in a virtual journal.
Methods
The initial step in our bibliometric analysis was assembling a thorough list (Kokol, Blažun, Vošner, & Saranto, 2014) of eligible articles to create the dataset. The four steps of the analysis procedure that we followed were:
Step 1: The project team consulted with a medical librarian for the literature search. Key concepts related to the study were identified: “Nursing” and “Big Data.” The search for the final terms (“big data” and “nurs*”) was limited to the keywords that the respective authors had selected and associated with their articles. By limiting the search to keywords within Scopus, the authors were able to identify publications in which big data was a central focus rather than a secondary or tertiary focus mentioned in passing in the abstracts or titles of publications.
Titles and abstracts were not searched for these terms. Scopus was used for the search as it indexes more journals than Web of Science. In addition, it has the strongest quarterly increase in the number of papers, citations, and H-index, which is a metric used to gauge productivity of an author and is calculated based on the number of publications and the citation counts for each of those works (Harzing & Alakangas, 2016). Scopus also reports higher citation levels and average numbers of papers across multiple disciplines (including humanities, social science, engineering, sciences, and life sciences) and consequently was more appropriate for searching nursing specific as well as nursing related literature.
Each article was reviewed separately. Metadata, including type of article, first author discipline, first author country of origin, first author state of origin, citations by year, subsequent types of publications, discipline of first author of subsequent citations, and diffusion of knowledge globally, were collected and exported to Excel (Microsoft, Inc., Redmond, WA, USA) for analysis. Additionally, we reviewed and analyzed impact factors of journals, which is a metric used to report the average frequency of citations in a given year, in which subsequent citations appeared. Separate spreadsheets were used to collect information for each article; these were later merged for further evaluation and assessment. A hand search of references in the collected articles was used to identify relevant articles which had not been tagged with the term “big data.” The search was limited through 2020, and yielded 64 articles.
Step 2: The project team compiled a comprehensive list of articles meeting inclusion criteria by reviewing titles and abstracts from the search results. All articles that were retrieved were retained for analysis.
Step 3: The project team conducted citation and content analysis using Microsoft Excel (Microsoft, Inc., Redmond, WA, USA) to coordinate sorting and descriptive data.
Step 4: Within the bibliometric analysis, the following measures were used: number of articles per year; article focus; most prolific journals; and most cited papers. From each article, common elements of metadata (bibliographic data) were extracted. The extracted data were recorded in Excel. Additional analysis was performed by members of the team to assess the primary focus of the article (clinical practice [n = 49; 76.6%], education [n = 7; 11%], professional development [n = 3; 4.7%], or leadership/administration [n = 5; 7.8%]). Team members further classified articles as research reports, descriptive articles/reviews, QI/case studies, or statements as reported in Table 1 (Nicoll et al., 2018).
Table 1. Target Article Classification by Major Focus
| Major Focus | Research n (%) | Descriptive Article/Review n (%) | QI/Case Study n (%) | Statement n (%) | 
| Clinical Practice | 12 (18.8%) | 28 (43.8%) | 3 (4.7%) | 6 (9.4%) | 
| Education | 2 (3.1%) | 4 (6.3%) | 0 (0%) | 1 (1.6%) | 
| Professional Development | 1 (1.6%) | 1 (1.6%) | 0 (0%) | 1 (1.6%) | 
| Leadership/Administration | 2 (3.1%) | 1 (1.6%) | 0 (0%) | 2 (3.1%) | 
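The row totals and percentages reported for each major focus can be recomputed from the cell counts in Table 1. The following is a minimal sketch of that arithmetic, not the Excel workflow the team used; the counts are transcribed from Table 1, while the names `table_1`, `share`, and `row_totals` are introduced here for illustration only.

```python
# Counts transcribed from Table 1 (rows: major focus; columns: article type).
table_1 = {
    "Clinical Practice": {
        "Research": 12, "Descriptive Article/Review": 28,
        "QI/Case Study": 3, "Statement": 6,
    },
    "Education": {
        "Research": 2, "Descriptive Article/Review": 4,
        "QI/Case Study": 0, "Statement": 1,
    },
    "Professional Development": {
        "Research": 1, "Descriptive Article/Review": 1,
        "QI/Case Study": 0, "Statement": 1,
    },
    "Leadership/Administration": {
        "Research": 2, "Descriptive Article/Review": 1,
        "QI/Case Study": 0, "Statement": 2,
    },
}


def share(count, denominator):
    """Return count as a percentage of denominator, rounded to one decimal."""
    return round(100 * count / denominator, 1)


# Total number of articles across every cell of the table.
total = sum(sum(row.values()) for row in table_1.values())

# Number of articles per major focus (row totals).
row_totals = {focus: sum(row.values()) for focus, row in table_1.items()}

for focus, n in row_totals.items():
    print(f"{focus}: n = {n} ({share(n, total)}%)")
```

Recomputing shares from raw counts in this way is a quick check that the cells, row totals, and overall dataset size agree with one another.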
We assessed the number of citation counts found in the data retrieved for each article included in the dataset. To determine depth of impact of the articles, the second phase of this research explored subsequent citations for each article. Subsequent citations were followed also using Scopus. By researching subsequent citations, we could increase our understanding about how this literature has influenced other researchers and disciplines.
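Citation counts such as these also underlie the H-index described in Step 1. As a minimal sketch of how that metric is conventionally calculated (the largest number h such that h publications each have at least h citations), the example below uses invented citation counts; the function name `h_index` and the sample values are illustrative assumptions, not study data or a Scopus feature.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's five papers (not study data).
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

Because the calculation depends only on the ranked citation counts, the value reported for the same author can differ between databases that index different sets of publications.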
Results
Sixty-four articles were included in the final dataset. A descriptive bibliometric analysis of retrieved articles was performed to evaluate the developments and trends of publications related to big data in nursing. Articles in the dataset were published between 2014 and 2020, with the largest number (n = 17) of articles published in 2019. Based on Scopus categories, publication types in the dataset consisted of 45 articles (70.3%), 3 conference papers (4.7%), and 6 reviews (9.4%). Notes, letters, and short surveys (n = 6, 9.4%) were also present in the dataset.
Figure 1. Number of Articles Published by Year

Figure 2. Types of Documents Published

The main disciplines represented were nursing (n = 55, 86%), medicine (n = 4, 6%), and engineering (n = 2, 3%). A nursing scholar was the first author in the majority of articles. First authors were from a range of countries that included: Australia, Canada, China, Italy, Poland, Slovenia, Sweden, Taiwan, the United Kingdom, and the United States (U.S.). The majority of first authors were U.S. scholars (n = 49, 77%). Publications in the dataset represented 35 unique journals. Nursing Outlook (n = 7, 11%); CIN: Computers, Informatics, Nursing (n = 6, 9%); Nursing Administration Quarterly (n = 6, 9%); and Western Journal of Nursing Research (n = 6, 9%) had the greatest number of publications in this dataset. Therefore, we assumed that these journals demonstrated the most interest in big data topics.
Figure 3. First Author Discipline

Figure 4. First Author (Geographic) Origin

To analyze the foci, including the topics discussed in relationship to big data, a co-occurrence analysis of terms in publication titles was performed using VOSviewer (Leiden University, Leiden, Netherlands), a software program used to illustrate bibliometric networks and connections. These networks and connections may include authors, journals, topics, or citations. A visualization based on the terms was generated. A number of concepts associated with big data have been explored in the represented publications. Figure 5 illustrates the connections between big data and related terms or concepts. Text and circle size represent the number of times terms were used by authors in publication titles. The colored clusters illustrate closely related terms based on the VOSviewer algorithm (van Eck & Waltman, 2010).
Figure 5. Co-occurring Terms in Titles
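The idea behind a co-occurrence analysis of title terms can be sketched as simple pair counting: two terms co-occur when they appear in the same title. The example below is an illustrative simplification and does not reproduce the VOSviewer term extraction, normalization, or clustering; the titles, the stop list, and the function names are hypothetical and were invented for this sketch.

```python
from collections import Counter
from itertools import combinations

# Hypothetical titles for illustration only; not titles from the dataset.
titles = [
    "Big data and nursing informatics",
    "Nursing informatics competencies for big data",
    "Big data in nursing education",
]

# Small illustrative stop list; a real analysis would need a fuller one.
STOP_WORDS = {"a", "and", "for", "in", "of", "the"}


def title_terms(title):
    """Lowercase a title and return its set of non-stop-word terms."""
    return {word for word in title.lower().split() if word not in STOP_WORDS}


def cooccurrence_counts(all_titles):
    """Count how many titles contain each unordered pair of terms."""
    pair_counts = Counter()
    for title in all_titles:
        # Sorting makes each pair appear in one canonical (alphabetical) order.
        terms = sorted(title_terms(title))
        pair_counts.update(combinations(terms, 2))
    return pair_counts


pairs = cooccurrence_counts(titles)
for (term_a, term_b), count in pairs.most_common(3):
    print(term_a, term_b, count)
```

In a network visualization, counts of this kind would typically determine how strongly two terms are linked, while the number of titles containing a term would determine its size.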

Subsequent Citation Analysis
From the 64 articles included in this sample, few (n = 9, 14%) had zero subsequent citations. The remaining articles were cited at least one time and one was cited 100 times since publication (Brennan & Bakken, 2015). During this phase of the study, we identified each original source article’s persistence (the rate of subsequent citations over time); reach (defined as geographic distribution by country and state [if originating in the United States]); and dissemination (represented by the discipline of the first author of each subsequent citation) (Nicoll et al., 2018). A total of 562 subsequent citations had referenced articles from the original source article dataset. However, some references had incomplete or missing data elements. Subsequent citations began in 2015 with 5 and continued through 2021, with 100 subsequent citations already recorded by May 2021 (see Figure 6).
Figure 6. Subsequent Citations By Year

The majority of the subsequent citations had first authors in nursing (n = 300; 57%), followed by medicine (n = 77, 15%). These data demonstrate that many within nursing have consumed and referred to scholarly literature in the discipline (See Figure 7).
Figure 7. First Author Discipline of Subsequent Citations

Dissemination-related analysis revealed that more than 50% of subsequent publications were first authored by someone within the United States (n = 292; 52%) as shown in Figure 8. Additional analysis (see Figure 9) revealed that, among these subsequent citations with first authors from the United States, the largest share came from Minnesota (n = 36; 12.3%).
Figure 8. First Author Origin in Subsequent Citations

Figure 9. First Author Origin of Subsequent Citations within United States

Discussion
Overall, it is evident that big data is being discussed among nurse scholars in multiple realms in nursing such as education, practice, administration, and professional development. However, the majority of scholarly works in nursing on big data have focused on the practice setting with a number from workgroups articulating their stance or statement on the topic. There is emphasis on describing the concept of big data and offering advice to others about how it might be used to maximize patient care. Limited research or descriptions of applications of big data have been shared through written scholarly works in nursing.
The results of this analysis indicate a recognition of the value of big data among nurse scholars to achieve the core missions of nursing. However, the lack of application of big data and computational approaches in the articles represented indicates a need for increased emphasis on data competencies. Executing big data analytics frequently requires a multidisciplinary skill set that includes statistics, programming, and informatics (Murphy, Goossen, & Weber, 2017). Integration of these competencies into nursing research is essential to grow this aspect of nursing scholarship.
Competencies may include a range of specific approaches and skills. For example, text mining may be used; this is the process of converting unstructured text to meaningful structured text through natural language processing, which helps machines read text. This is a common approach to collecting and wrangling large corpora of text data such as patient records or practitioner notations. Patterns and trends may be more readily identified using text mining and natural language processing. This approach may also be used to create machine readable text files for transcribed interviews or oral recordings. Natural language processing may be valuable in identifying patterns in large volumes of non-standardized text data via automated algorithms.
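To make the idea of converting unstructured text into structured data concrete, the sketch below tokenizes free-text notes and turns each note into a row of term counts. It is a deliberately minimal illustration that uses only the Python standard library rather than a dedicated natural language processing toolkit; the notes are invented (no real patient data), and the function names are assumptions introduced for this example.

```python
import re
from collections import Counter

# Hypothetical free-text notes invented for illustration; no real patient data.
notes = [
    "Patient reports pain in left knee; pain worse at night.",
    "No pain reported. Patient ambulating with walker.",
]


def tokenize(text):
    """Split free text into lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(documents):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return counts


def to_structured_rows(documents, vocabulary):
    """Return one row per document holding a count for each vocabulary term."""
    rows = []
    for document in documents:
        counts = Counter(tokenize(document))
        rows.append({term: counts[term] for term in vocabulary})
    return rows


frequencies = term_frequencies(notes)
rows = to_structured_rows(notes, ["pain", "walker"])
print(frequencies.most_common(2))
print(rows)
```

Note that this simple tokenizer treats “reports” and “reported” as different terms and counts “pain” in “No pain reported” the same as any other mention; handling word forms and negation is part of what fuller natural language processing pipelines are meant to address.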
Artificial intelligence may assist in clinical decision-making through automated image recognition, medication tracking, or patient biometrics. Nurse educators will need to respond as they work to prepare nurses for the future. It is essential to equip nursing students with knowledge and skills related to big data (Barton, 2016) and to prepare data scientists ready for our workforce (Warren, Clancy, Delaney, & Weaver, 2017).
In addition to technical competencies, nurse scholars will also need data literacy and ethical knowledge. For example, big data collection and analytic approaches may provide the technical pathways for patient outcomes via biometric trackers; yet, it is equally important for scholars to consider the risks to patient or participant data privacy and sovereignty. Text mining social media data is another example to consider ethically. Further, scholars should be literate in emerging legislation and case law around big data collection and analysis in healthcare (for example, Dinerstein v. Google, 2020). Literacies to critically evaluate big data output in practice settings are equally important.
Both technical competency and critical literacies will also aid nursing scholarship in interdisciplinary research settings and goals. While some nurse scholars may have access to and collaborate with data scientists, others may find themselves in the role of data scientist and analyst. To ensure rigorous application of big data approaches, nurses need to be competent either as self-sustaining data users or as big data research collaborators. Table 2 highlights implications of big data in nursing with regard to practice, research, education, and policy.
Table 2. Implications of Big Data in Nursing
| Practice | 
 | 
| Research | 
 | 
| Education | 
 | 
| Policy | 
 | 
Within policy, big data can be applied to patient outcomes by accessing and analyzing local, national, or global data for comprehensive understandings of a given intervention. Using data visualization would also increase the intuitive communication of evidence in policy-based decision-making.
Conclusion
This analysis examined 64 articles where respective authors selected “big data” and a variation of nursing as keywords to describe the focus of the articles. The articles demonstrated interest and exploration of big data concepts among nursing scholars across several countries and topical aspects. Yet, a lack of application of big data approaches also demonstrated the need to grow technical and critical literacies to facilitate the integration of big data into nursing practice, research, education, and policy. It is our hope this analysis has demonstrated that the nursing scholar community is ready to take the next steps to operationalize big data approaches to enhance the mission of nursing and quality patient outcomes.
Authors
Heather Carter-Templeton, PhD, RN-BC, FAAN
Email: heather.cartertempleton@hsc.wvu.edu
ORCID ID: https://orcid.org/0000-0002-8086-2094
Heather Carter-Templeton is Chairperson of the Adult Health Department and an Associate Professor at West Virginia University School of Nursing. Dr. Carter-Templeton has published and presented nationally and internationally regarding her research interest areas, specifically addressing informatics, information literacy, and evidence-based practice. She is currently conducting research in the areas of information literacy and healthcare information technology. In addition to her faculty responsibilities, Dr. Carter-Templeton is involved in a number of professional organizations and serves as the Editor of the Alliance for Nursing Informatics (ANI) Connections section of CIN: Computers, Informatics, Nursing journal.
Leslie H. Nicoll, PhD, MBA, RN, FAAN
Email: Leslie@medesk.com
ORCID ID: https://orcid.org/0000-0003-2149-7856
Leslie H. Nicoll is the President and Owner of Maine Desk, LLC. Since 1995 she has been the Editor-in-Chief of CIN: Computers, Informatics, Nursing. She is also the Editor-in-Chief of Nurse Author & Editor, a role she has held since 2014. She is the author of many peer reviewed articles, book chapters, and books, including the third edition of The Editor's Handbook, published in 2019 by Lippincott. Dr. Nicoll lives just outside Portland, Maine with her husband and four rescue pets and two adult children who live close by.
Jordan Wrigley, MSLS, MA
Email: jordan.wrigley@colorado.edu
ORCID ID: https://orcid.org/0000-0003-0176-5980
Jordan Wrigley, MSLS, MA is a data scientist and librarian with the Center for Research Data and Digital Scholarship (CRDDS) at the University of Colorado – Boulder. Ms. Wrigley's research focuses on data-based approaches to evidence and knowledge synthesis, bibliometrics, and public/environmental health. She has taught extensively on research data literacies, data visualization, and data ethics including for the Medical Library Association (MLA) and national data-focused organization. She is a contributing editor focused on data in information and library sciences for The Librarian Parlor. Lastly, she is a consultant with international medical device and environmental impact firms.
Tami H. Wyatt, PhD, RN, CNE, CHSE, ANEF, FAAN
Email: twyatt@utk.edu
ORCID ID: https://orcid.org/0000-0002-9491-8626
Dr. Tami H. Wyatt is the endowed Torchbearer Professor, Associate Dean of Research in the University of Tennessee, College of Nursing and the Co-Director of the Health, Innovation and Technology Simulation (HITS) Lab at UT. Dr. Wyatt, a Robert Wood Johnson Executive Nurse Fellow alumna, is a nurse scientist and entrepreneur who works closely with engineers, graphic designers, and design thinking strategists to create innovative solutions in healthcare. Dr. Wyatt has published research about information technology systems in healthcare, robotics, and mHealth applications.
References
Barton, A. J. (2016). Big data. The Journal of Nursing Education, 55(3), 123–124. https://doi.org/10.3928/01484834-20160216-01
Brennan, P. F., & Bakken, S. (2015). Nursing needs big data and big data needs nursing. Journal of Nursing Scholarship, 47(5), 477–484. https://doi.org/10.1111/jnu.12159
Carter-Templeton, H., Frazier, R. M., Wu, L., & Wyatt, T. H. (2018). Robotics in nursing: A bibliometric analysis. Journal of Nursing Scholarship, 50(6), 582–589. https://doi.org/10.1111/jnu.12399
US District Court Northern Illinois Eastern Division. (2020, September 4). Matt Dinerstein v. Google, LLC. No. 1:2019cv04311 Document 85 (N.D. Ill. 2020). Retrieved from: https://law.justia.com/cases/federal/district-courts/illinois/ilndce/1:2019cv04311/366172/85/
Gingras, Y. (2016). Bibliometrics and research evaluation: Uses and abuses. MIT Press.
Hardy, L. R., & Bourne, P. E. (2017). Data science: Transformation of research and scholarship. In C. W. Delaney, C. A. Weaver, J. J. Warren, T. R. Clancy, & R. L. Simpson (Eds.), Big-data enabled nursing (pp. 183–209). Springer.
Harzing, A. W., & Alakangas, S. (2016). Google Scholar, Scopus, and the Web of Science: A longitudinal and cross-disciplinary comparison. Scientometrics, 106, 787–804. https://doi.org/10.1007/s11192-015-1798-9
Institute for Health Technology Transformation. (2013). Transforming health care through big data: Strategies for leveraging big data in the health care industry. Retrieved from: http://c4fd63cb482ce6861463-bc6183f1c18e748a49b87a25911a0555.r93.cf2.rackcdn.com/iHT2_BigData_2013.pdf
Kokol, P., Blažun, H., Vošner, J., & Saranto, K. (2014). Nursing informatics competencies: Bibliometric analysis. Studies in Health Technology and Informatics, 201, 342–348. Retrieved from: https://pubmed.ncbi.nlm.nih.gov/24943565/
Murphy, J., Goossen, W., & Weber, P. (2017). Forecasting informatics competencies for nurses in the future of connected health: Proceedings of the Nursing Informatics post-conference 2016. IOS Press. Retrieved from: https://ebooks.iospress.nl/volume/forecasting-informatics-competencies-for-nurses-in-the-future-of-connected-health-proceedings-of-the-nursing-informatics-post-conference-2016
Nicoll, L. H., Carter-Templeton, H., Oermann, M. H., Ashton, K. S., Edie, A. H., & Conklin, J. L. (2018). A bibliometric analysis of 81 articles that represent excellence in nursing publication. Journal of Advanced Nursing, 74(12), 2894–2903. https://doi.org/10.1111/jan.13835
Topaz, M., & Pruinelli, L. (2017). Big data and nursing: Implications for the future. Studies in Health Technology and Informatics, 232, 165–171. https://doi.org/10.3233/978-1-61499-738-2-165
U.S. Congress (2009). Health Information Technology for Economic and Clinical Health Act (HITECH), 42 USC § 201. Retrieved from: https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/understanding/coveredentities/hitechact.pdf
van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84, 523–538. https://doi.org/10.1007/s11192-009-0146-3
Warren, J. J. (2017). A big data primer. In C. W. Delaney, C. A. Weaver, J. J. Warren, T. R. Clancy, & R. L. Simpson (Eds.), Big-data enabled nursing (pp. 33–39). Springer.
Warren, J. J., Clancy, T. R., Delaney, C. W., & Weaver, C. A. (2017). Big-data enabled nursing: Future possibilities. In C. W. Delaney, C. A. Weaver, J. J. Warren, T. R. Clancy, & R. L. Simpson (Eds.), Big-data enabled nursing (pp. 441–463). Springer.