Speech recognition for medical documentation: an analysis of time, cost efficiency and acceptance in a clinical setting
Abstract
Background/Aims
Medical documentation is an important and unavoidable part of a health professional's working day. However, the time required for medical documentation is often viewed negatively, particularly by clinicians with heavy workloads. Digital speech recognition has become more prevalent and is being used to optimise working time. This study evaluated the time and cost savings associated with speech recognition technology, and its potential for improving healthcare processes.
Methods
Clinicians were directly observed while completing medical documentation. A total of 313 samples were collected, of which 163 used speech recognition and 150 used typing methods. The time taken to complete the medical form, the error rate and error correction time were recorded. A survey was also completed by 31 clinicians to gauge their level of acceptance of speech recognition software for medical documentation. Two-sample t-tests and Mann–Whitney U tests were performed to determine statistical trends and significance.
Results
On average, completing the medical form took 5.11 minutes with speech recognition software, compared to 8.9 minutes with typing, representing a significant time saving. The error rate was also found to be lower for speech recognition software. However, 55% of clinicians surveyed stated that they would prefer to type their notes rather than use speech recognition software, and perceived the error rate of this software to be higher than that of typing.
Conclusions
The results showed that there are both temporal and financial advantages of speech recognition technology over text input for medical documentation. However, this technology had low levels of acceptance among staff, which could have implications for the uptake of this method.
Introduction
Medical documentation has several important functions within the medical treatment process. Maintaining complete, accurate and timely documentation of treatment steps is essential for patient safety. Incomplete documentation could lead to duplicate medical examinations being performed and/or a reduced quality of treatment for the patient. This is particularly pertinent to inpatient hospital care, where incorrect or incomplete documentation can potentially hinder the treatment process and, in healthcare systems that rely on reimbursement coding, could have significant economic impacts for medical institutions, such as a potential loss of revenue (Cheng et al, 2009; Zafirah et al, 2018).
However, as the time pressures of everyday clinical practice increase, completing timely and accurate medical documentation is becoming more difficult for clinicians and can be seen as a burden (Clynch and Kellett, 2015). While the digitisation of patient records has led to a reduction in the time spent on medical documentation (Clynch and Kellett, 2015), clinicians may still spend 3–4 hours or up to 65% of their working day creating medical reports (Cheng et al, 2009; Oxentenko et al, 2010; Neri et al, 2015). To address this, some hospitals use a transcription service whereby dictated notes are transcribed by a typist. Although this method saves time, it can delay the completion of medical documentation and adds another step to the process, as well as additional personnel costs.
One alternative to transcription services is digital speech recognition technology, which has developed significantly over the last 15 years as an instrument for optimising working time and resources (Koivikko et al, 2008). In a healthcare context, speech recognition involves the use of speech input or dictation software as an alternative to typing medical notes. This allows spoken words to be converted directly into continuous text in an electronic medical record or word processing program.
Research into this method has been promising, with several studies showing that using speech recognition software for medical documentation can lead to both time and cost savings (Rosenthal et al, 1998; Chapman et al, 2000; Vorbeck et al, 2000; Zick and Olsen, 2001; Callaway et al, 2002; Henricks et al, 2002; Parente et al, 2004; Koivikko et al, 2008; Krishnaraj et al, 2010; Prevedello et al, 2014). However, most of these studies have focused on radiology and many have limitations, such as small sample sizes. There is also a paucity of evidence regarding the error rates of speech recognition software and the extent to which clinicians find this method of medical documentation acceptable. For example, Rosenthal et al (1998) and Koivikko et al (2008) found that speech recognition software significantly decreased the time taken to complete medical reports and resulted in substantial savings in transcription costs in radiology departments. However, neither of these studies evaluated the software's error or accuracy rates. Chapman et al (2000), Callaway et al (2002) and Trumm et al (2008) achieved similar results, but their use of retrospective data meant that the scope of their analysis was limited. The results of Krishnaraj et al (2010) and Prevedello et al (2014) supported the idea that speech recognition software leads to time and cost savings in the completion of medical documentation, but neither of these studies reported an exact sample size.
However, some disadvantages of speech recognition software have been noted. For example, Zick and Olsen (2001) found that, although speech recognition was faster, it was less accurate than using a transcription service. Similarly, Vorbeck et al (2000) found that the mean error rate of speech recognition software was higher than when medical notes were typed, although the authors noted that the difference varied considerably depending on the typist, speaker and the extent of the stored vocabulary provided by the speech recognition software. Issenman and Jaffer (2004) found that speech recognition was 66% less efficient than using a transcription service and, when the licensing fee was included, cost considerably more annually. However, these studies are all over 15 years old and technology has developed significantly in this time. One of the most recent studies investigating speech recognition software in healthcare used a controlled observational design and found that all recording methods took a similar amount of time, with dictated notes using speech recognition software taking slightly less time than typed notes (Blackley et al, 2020). Dictated notes also had fewer errors and a higher mean quality score, with a greater amount of information included. However, it should be noted that only 10 participants were involved in this study, so its findings may not be generalisable.
In terms of clinician acceptance of speech recognition software, a survey by Goss et al (2019) found that 78.8% of clinicians were satisfied with this method of note taking and 77.2% believed that it increased efficiency. Meanwhile, Vogel et al (2015) found that speech recognition software not only saved clinicians' time, but also resulted in a better mood when compared to typing. However, clinicians' attitudes towards this method of note taking require further investigation.
The aim of the present study was to evaluate the impact of speech recognition on the completion of medical documentation, focusing on time and cost savings, as well as error rate and acceptance among clinicians.
Methods
This study used a prospective, non-randomised design to compare data from speech recognition and standard typing methods of medical documentation. Data were collected by direct observation of clinicians in the nephrology, haematology and emergency departments of Robert Bosch Hospital in Stuttgart, Germany. Clinicians were observed in their normal work environment while they completed their reports following patient consultations. This allowed the researchers to stop the clock if the clinician was interrupted by an external circumstance during the process, such as a phone call or page.
As speech recognition technology can take some time to get used to, only clinicians who habitually used either this method or typing for medical note taking were selected. Only doctors were observed in this study. In total, 15 clinicians participated over a study period of 6 months, producing 313 samples, of which 163 used speech recognition software and the remaining 150 used typing.
The clinicians using speech recognition technology all used Indicda easySpeak (DFC Systems, Aschheim, Germany) and Dragon Naturally Speaking (Nuance Communications, Massachusetts, United States of America) software. To allow for variation between the different sections of the medical form used to document patient information, each of the major fields on the form was observed separately and the number of lines entered into each field of the report was counted. The average time taken to complete a line, recorded in seconds, was then calculated. Line lengths on the form were the same regardless of which recording method clinicians used. The average time taken per line was then multiplied by the average number of lines per field on the form to give the total time taken per report.
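As a worked sketch of this calculation, the per-field averages later reported in Table 2 can be combined exactly as described. The Python below is our illustration, not the authors' analysis code; the field names and function name are our own.

```python
# Average number of lines entered per form field (from the study's Table 2)
avg_lines_per_field = {
    "anamnesis": 6.2,
    "physical_examination": 5.1,
    "procedure": 5.1,
    "clinical_summary": 26.4,
}

# Average seconds per line, by input method and field (from the study's Table 2)
time_per_line = {
    "speech": {"anamnesis": 7.2, "physical_examination": 5.6,
               "procedure": 5.7, "clinical_summary": 7.1},
    "typing": {"anamnesis": 13.8, "physical_examination": 9.5,
               "procedure": 9.7, "clinical_summary": 13.2},
}

def total_report_time(method: str) -> float:
    """Estimated seconds to complete the whole form: time per line x lines per field, summed."""
    return sum(time_per_line[method][field] * lines
               for field, lines in avg_lines_per_field.items())

print(round(total_report_time("speech"), 2))  # → 289.71 seconds
print(round(total_report_time("typing"), 2))  # → 531.96 seconds
```

The two totals reproduce the per-field sums reported in the Results.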
The number of errors per field on the form and the correction time for these errors were also recorded using direct observation. All errors per completed form field were noted and divided by the number of lines entered to determine the error rate per line. The correction time per error was measured in seconds and the average calculated.
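A minimal sketch of the two per-field measures defined above (Python; the function names are ours and the example values are illustrative, not observations from the study):

```python
def error_rate_per_line(errors: int, lines: int) -> float:
    """Errors noted in a completed form field, divided by the lines entered."""
    return errors / lines

def mean_correction_time(seconds_per_error: list[float]) -> float:
    """Average correction time per error, in seconds."""
    return sum(seconds_per_error) / len(seconds_per_error)

# Illustrative values only
print(error_rate_per_line(3, 20))        # → 0.15 errors per line
print(mean_correction_time([4.0, 6.0]))  # → 5.0 seconds
```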
To measure the level of acceptance of speech recognition technology in the hospital, an anonymised online survey was sent to all clinicians who worked on inpatient wards. The survey asked respondents about the method of medical documentation they typically used, whether they wanted to see an increased use of speech recognition software and what they estimated the error rate of this software to be compared to typing methods. The latter question was measured on a 5-point Likert scale, with 1 indicating significantly fewer errors and 5 indicating significantly more errors. The survey also asked respondents to estimate how much time they spent on medical documentation each day and how much (if any) time they believed they could save by using speech recognition technology.
Statistical analysis was performed using a two-sample t-test or Mann–Whitney U test, as required, to compare the results of speech recognition software and typing for medical documentation.
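The paper does not state how it chose between the two tests for a given comparison. A common convention, sketched below in Python with scipy as an illustration (this is an assumption on our part, not the authors' analysis code), is to apply the t-test when both samples pass a normality check and the Mann–Whitney U test otherwise:

```python
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    """Compare two independent samples; return the test used and its two-sided P value.

    Test selection via Shapiro-Wilk normality checks is a common heuristic,
    assumed here because the paper does not state its criterion.
    """
    normal = (stats.shapiro(a).pvalue > alpha and
              stats.shapiro(b).pvalue > alpha)
    if normal:
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "mann-whitney", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
```

The function would be called with the two groups' per-line times (or per-line error counts) as the samples.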
Results
Time
The results of the two-sample t-test showed significant differences between the sample groups. On average, clinicians using speech recognition software completed documentation at a rate of 6.8 seconds per line, significantly faster than the typing rate of 11.6 seconds per line (P<0.01) (Table 1). The average number of lines per field was 3.6, with the average total number of lines on a form being 42.8. The average time taken to complete the form, broken down by field, using both methods is shown in Table 2. Across all fields, speech recognition was faster than typing, taking an average of 5.11 minutes to complete the whole form, compared to 8.9 minutes when typing. This represents a 43% time saving for speech recognition software compared to typing when completing medical documentation.
Table 1. Time taken (seconds) per line

| | Speech recognition software (n=163) | Typing (n=150) |
|---|---|---|
| Mean (standard deviation) | 6.8 (2.5) | 11.6 (3.4) |
| Median | 6.0 | 12.1 |
| Minimum | 2.3 | 4.0 |
| Maximum | 14.6 | 20.0 |
Table 2. Average time taken to complete the form, by field

| Form field | Time per line, SR software (seconds) | Time per line, typing (seconds) | Average number of lines | Time per field, SR software (seconds) | Time per field, typing (seconds) |
|---|---|---|---|---|---|
| Anamnesis | 7.2 | 13.8 | 6.2 | 44.6 | 85.56 |
| Physical examination | 5.6 | 9.5 | 5.1 | 28.6 | 48.45 |
| Procedure | 5.7 | 9.7 | 5.1 | 29.1 | 49.47 |
| Clinical summary | 7.1 | 13.2 | 26.4 | 187.4 | 348.48 |
| Total | | | 42.8 | 289.7 (5.11 minutes) | 531.96 (8.9 minutes) |
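As a quick arithmetic check (illustrative Python, not part of the study), the 43% efficiency figure follows directly from the two reported mean completion times:

```python
# Relative time saving of speech recognition over typing,
# computed from the reported mean completion times (minutes)
speech_minutes, typing_minutes = 5.11, 8.9
saving = (typing_minutes - speech_minutes) / typing_minutes
print(f"{saving:.0%}")  # → 43%
```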
Error rate
Speech recognition software was shown to be more accurate than typing, with an average of 0.15 errors per line compared to 0.3 errors per line with typing (P<0.001) (Table 3). The average time taken to correct an error was 0.1 seconds longer with speech recognition software, but this difference was not found to be statistically significant.
Table 3. Number of errors per line

| | Speech recognition software (n=163) | Typing (n=150) |
|---|---|---|
| Mean (standard deviation) | 0.15 (0.26) | 0.3 (0.3) |
| Median | 0.0 | 0.2 |
| Minimum | 0.0 | 0.0 |
| Maximum | 1.3 | 1.5 |
Correlation coefficients between the number of errors and the length of the text were calculated to discern which method of documentation would cause fewer errors in longer texts. The length of the text and the number of errors were significantly correlated in both the speech recognition and typing group (P<0.01). However, the correlation was stronger in the typing group, with a coefficient of 0.82 compared to 0.63 for speech recognition software. This suggests that, for longer notes, typing may lead to more errors than dictating using speech recognition software.
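The correlation analysis above can be sketched as follows. The paper does not name the coefficient used, so a Pearson coefficient is assumed here, and the data points are invented for illustration; they are not the study's observations.

```python
from scipy import stats

# Hypothetical note lengths (in lines) and error counts, illustration only
text_length_lines = [3, 5, 8, 12, 20, 30]
errors_per_note = [0, 1, 1, 3, 5, 9]

# Pearson correlation (an assumption; the paper does not specify the coefficient)
r, p = stats.pearsonr(text_length_lines, errors_per_note)
print(f"r = {r:.2f}, P = {p:.4f}")
```

Running this per group, as the study did, allows the two coefficients to be compared.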
Acceptance
A total of 31 clinicians at the hospital responded to the survey to measure acceptance of speech recognition software for medical documentation. A slight majority (55%, n=17) stated that they preferred to type their medical notes, rather than dictating using speech recognition software. However, 82.4% (n=14) of those who preferred typing also stated that they wanted to see greater use of speech recognition software in the hospital. On average, those who typically used speech recognition software reported that medical documentation took up 47.9% of their working day, compared to 36.5% for those who preferred typing. However, the standard deviation was considerably higher among the former group, at 25.2%, compared to 16.7% in the latter group. The majority of respondents stated that, if they could save time on medical documentation, they would use the time saved to deliver patient care.
In both groups, respondents estimated that error rates for speech recognition would be higher than for typing. On a 5-point Likert scale (where 1 indicated significantly fewer errors with speech recognition compared to typing and 5 indicated significantly more errors), the average ratings for those who regularly used speech recognition and those who regularly used typing were 4.07 and 4.53 respectively. This indicates low confidence in the accuracy of speech recognition software for medical documentation, even among those who regularly used it. However, this contrasts with the observed data for error rates in this study, which demonstrated that speech recognition was more accurate than typing overall.
Many respondents commented that their acceptance of speech recognition software would be improved if certain adjustments were made, such as the creation of a mobile speech recognition software application, a faster speech recognition learning system or a medical-specific vocabulary. There was no significant difference in responses to this section of the questionnaire between those who preferred to use speech recognition and those who preferred to type.
Discussion
The results of this study suggest that increased use of speech recognition software for medical documentation could lead to both time savings and increased accuracy of notes. This more efficient use of time resources could free up clinicians' time for patient care, which could improve clinical outcomes and the patient experience (Dugdale et al, 1999; Plantinga et al, 2004). Results regarding the accuracy of speech recognition software contrast with some previous studies (Vorbeck et al, 2000; Issenman and Jaffer, 2004). This may be because many existing studies are from over 15 years ago, during which time there have been significant advances in speech recognition technology (Johnson et al, 2014; Zuchowski et al, 2020). To the best of the authors' knowledge, the use of direct human observation in a clinical setting to evaluate speech recognition software for medical documentation is unique to the present study, which may also explain the difference in outcomes compared to previous studies.
In the German context, the use of speech recognition software could also lead to financial savings for hospitals, as doctors in Germany work 2.04 hours of paid overtime on average per week (Marburger Bund, 2020). Therefore, improving efficiency could reduce the need for overtime and save staffing costs for hospitals. In a wider context, the reduction of documentation errors has the potential to enhance care quality by reducing consequent medical errors, which could, in turn, reduce overall healthcare costs (Cheng et al, 2009; Zafirah et al, 2018).
The results of this study indicate that clinicians are generally interested in using speech recognition software for medical documentation, but perceive the accuracy of this software as relatively low. If the time and cost benefits of speech recognition software are to be fully realised, it is crucial for the technology to be accepted and trusted by clinicians. Therefore, it is important that any misconceptions regarding this software are corrected before it is implemented in the workplace. It is possible that increasing awareness of how speech recognition can be used in a clinical setting could increase clinicians' confidence in using this technology and thus their acceptance of it. However, further research is needed to fully explore why some clinicians feel apprehensive about using speech recognition software.
Limitations
Participation in this study was voluntary and participating clinicians were able to choose their method of note taking. This may have introduced a selection bias, with clinicians choosing the method they were already positively disposed to. Furthermore, the results may have been affected by factors that were not observable, such as individual variations in clinicians' note-taking ability and experience.
Conclusions
The use of speech recognition software could be more advantageous than typing for at least some aspects of the medical documentation process. In contrast to some previous research, this study found speech recognition software to be faster and more accurate than typing medical notes. However, results also suggested that this method of medical documentation is yet to be fully accepted by clinicians, which would need to be addressed before this software could be comprehensively implemented into a hospital setting.
Key points
Speech recognition software can significantly reduce the amount of time taken to complete medical documentation by clinicians.
Error rates were lower on medical forms completed using speech recognition software compared to those that used traditional typing methods.
Acceptance of speech recognition software among clinicians is limited, which may be partly explained by misconceptions about the accuracy of this technology.
Conflicts of interest
The authors declare that there are no conflicts of interest.
References
- Blackley et al. Physician use of speech recognition versus typing in clinical documentation: a controlled observational study. Int J Med Inform. 2020;141:104178. https://doi.org/10.1016/j.ijmedinf.2020.104178
- Callaway et al. Speech recognition interface to a hospital information system using a self-designed visual basic program: initial experience. J Digit Imaging. 2002;15(1):43–53. https://doi.org/10.1007/bf03191902
- Chapman et al. Contribution of a speech recognition system to a computerized pneumonia guideline in the emergency department. Proceedings. AMIA Symposium. 2000;131–135
- Cheng et al. The risk and consequences of clinical miscoding due to inadequate medical documentation: a case study of the impact on health services funding. HIM J. 2009;38(1):35–46. https://doi.org/10.1177/183335830903800105
- Clynch and Kellett. Medical documentation: part of the solution, or part of the problem? A narrative review of the literature on the time spent on and value of medical documentation. Int J Med Inform. 2015;84(4):221–228. https://doi.org/10.1016/j.ijmedinf.2014.12.001
- Dugdale et al. Time and the patient-physician relationship. J Gen Intern Med. 1999;14(S1):S34–40. https://doi.org/10.1046/j.1525-1497.1999.00263.x
- Goss et al. A clinician survey of using speech recognition for clinical documentation in the electronic health record. Int J Med Inform. 2019;130:103938. https://doi.org/10.1016/j.ijmedinf.2019.07.017
- Henricks et al. The utility and cost effectiveness of voice recognition technology in surgical pathology. Mod Pathol. 2002;15(5):565–571. https://doi.org/10.1038/modpathol.3880564
- Issenman and Jaffer. Use of voice recognition software in an outpatient pediatric specialty practice. Pediatrics. 2004;114(3):e290–3. https://doi.org/10.1542/peds.2003-0724-L
- Johnson et al. A systematic review of speech recognition technology in health care. BMC Med Inform Decis Mak. 2014;14:94. https://doi.org/10.1186/1472-6947-14-94
- Koivikko et al. Improvement of report workflow and productivity using speech recognition—a follow-up study. J Digit Imaging. 2008;21(4):378–382. https://doi.org/10.1007/s10278-008-9121-4
- Krishnaraj et al. Voice recognition software: effect on radiology report turnaround time at an academic medical center. AJR Am J Roentgenol. 2010;195(1):194–197. https://doi.org/10.2214/AJR.09.3169
- Marburger Bund. MB-Monitor 2019: Überlastung führt zu gesundheitlichen Beeinträchtigungen [MB Monitor 2019: overload leads to health impairments]. 2020. https://www.marburger-bund.de/mb-monitor-2019 (accessed 30 November 2021)
- Neri et al. Emergency medicine resident physicians' perceptions of electronic documentation and workflow: a mixed methods study. Appl Clin Inform. 2015;6(1):27–41. https://doi.org/10.4338/ACI-2014-08-RA-0065
- Oxentenko et al. Time spent on clinical documentation: a survey of internal medicine residents and program directors. Arch Intern Med. 2010;170(4):377–380. https://doi.org/10.1001/ARCHINTERNMED.2009.534
- Parente et al. An analysis of the implementation and impact of speech-recognition technology in the healthcare sector. Perspect Health Inf. 2004;1(5)
- Plantinga et al. Frequency of patient-physician contact and patient outcomes in hemodialysis care. J Am Soc Nephrol. 2004;15(1):210–218. https://doi.org/10.1097/01.asn.0000106101.48237.9d
- Prevedello et al. Implementation of speech recognition in a community-based radiology practice: effect on report turnaround times. J Am Coll Radiol. 2014;11(4):402–406. https://doi.org/10.1016/J.JACR.2013.07.008
- Rosenthal et al. Computer-based speech recognition as a replacement for medical transcription. AJR Am J Roentgenol. 1998;170(1):23–25. https://doi.org/10.2214/ajr.170.1.9423591
- Trumm et al. Impact of RIS/PACS integrated speech recognition on report availability. Radiol Manage. 2008;30(6):16–23. http://www.ncbi.nlm.nih.gov/pubmed/19115708
- Vogel et al. Analysis of documentation speed using web-based medical speech recognition technology: randomized controlled trial. J Med Internet Res. 2015;17(11):e247. https://doi.org/10.2196/jmir.5072
- Vorbeck et al. Report generation using digital speech recognition in radiology. Eur Radiol. 2000;10(12):1976–1982. https://doi.org/10.1007/s003300000459
- Zafirah et al. Potential loss of revenue due to errors in clinical coding during the implementation of the Malaysia Diagnosis Related Group (MY-DRG®) casemix system in a teaching hospital in Malaysia. BMC Health Serv Res. 2018;18(1). https://doi.org/10.1186/S12913-018-2843-1
- Zick and Olsen. Voice recognition software versus a traditional transcription service for physician charting in the ED. Am J Emerg Med. 2001;19(4):295–298. https://doi.org/10.1053/AJEM.2001.24487
- Zuchowski et al. Medizinische Spracherkennung im stationären und ambulanten Einsatz: eine systematische Übersicht [Medical speech recognition in inpatient and outpatient use: a systematic review]. Gesundheitsökonomie Qualitätsmanagement. 2020;25(2):E1. https://doi.org/10.1055/a-1115-6980