Computer Speech Recognition in Psychiatry

John S. Leipsic, MD

August 1, 1998

Psychiatric Times

Psychiatric Times Vol 15 No 8

As her patient leaves the consulting room, Susan Roth, M.D., picks up her computer's microphone and begins dictating. "Wake up. Open template recurrent major depression. Patient identification: Mr. Johnson is a 64-year-old married white male. Chief complaint: difficulty sleeping, loss of appetite and depressed mood with suicidal ideation for the last three weeks."

As Roth speaks, words appear on her portable computer screen to complete the patient's intake evaluation. Adding an electronic signature, she prints out a hard copy of her evaluation and saves the medical record to her hard drive.

Using a voice-activated command to her computer, Roth faxes her findings and a cover letter to the referring physician and sends prescription orders to the patient's pharmacy.

Continuous Speech Recognition

A remarkable technology for personal computers, called continuous speech recognition, is emerging for clinical use. Through continuous speech recognition, dictation can be transcribed directly into a medical report. As the clinician speaks into a microphone, the computer "recognizes" his or her speech and "types" it out immediately.

Early speech recognition systems were based on a technology called discrete speech recognition. The user had to pause after each word for the computer to recognize and translate the sounds into text. As computer processor speed has grown exponentially and speech recognition systems have improved, continuous speech recognition has supplanted discrete speech recognition.

Continuous speech recognition lets the user speak naturally into the computer at a normal pace to dictate patient assessments, follow-up notes, consultation letters, medication management notes and even the patient's prescriptions. One can also dictate memos, notes or other documents directly into Microsoft Word, enter numbers into spreadsheets and execute common computer commands via voice, rather than by keyboard or mouse.

Hardware and Setup

Speech recognition programs require specific computer hardware configurations. Minimum system requirements for good performance of speech recognition systems are a Pentium 200 MHz computer with 64 MB of RAM and a CD-ROM drive for software installation. The sound card standard is SoundBlaster 16-bit compatible, but 32-bit works just as well. The wrong sound card or hardware configuration can adversely affect the accuracy of speech recognition.
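The minimum requirements above can be written down as a simple checklist. The following is only a minimal sketch in Python: the `SystemSpec` class, its field names and the `unmet_requirements` function are hypothetical names chosen for illustration, and treating the sound card as a bit depth is a simplification of "SoundBlaster 16-bit compatible." It is not part of any vendor's installer.

```python
from dataclasses import dataclass

# Minimums taken from the requirements stated in the article.
MIN_CPU_MHZ = 200
MIN_RAM_MB = 64
MIN_SOUND_BITS = 16


@dataclass(frozen=True)
class SystemSpec:
    """Hypothetical description of a PC; field names are illustrative."""
    cpu_mhz: int
    ram_mb: int
    has_cd_rom: bool
    sound_card_bits: int  # simplification: compatibility reduced to bit depth


def unmet_requirements(spec: SystemSpec) -> list[str]:
    """Return a list of human-readable problems; an empty list means the
    machine meets the stated minimums."""
    problems = []
    if spec.cpu_mhz < MIN_CPU_MHZ:
        problems.append(f"processor {spec.cpu_mhz} MHz is below {MIN_CPU_MHZ} MHz")
    if spec.ram_mb < MIN_RAM_MB:
        problems.append(f"{spec.ram_mb} MB of RAM is below {MIN_RAM_MB} MB")
    if not spec.has_cd_rom:
        problems.append("no CD-ROM drive for software installation")
    if spec.sound_card_bits < MIN_SOUND_BITS:
        problems.append(f"{spec.sound_card_bits}-bit sound card is below {MIN_SOUND_BITS}-bit")
    return problems


if __name__ == "__main__":
    portable = SystemSpec(cpu_mhz=266, ram_mb=64, has_cd_rom=True, sound_card_bits=16)
    older_pc = SystemSpec(cpu_mhz=166, ram_mb=32, has_cd_rom=True, sound_card_bits=16)
    print(unmet_requirements(portable))
    print(unmet_requirements(older_pc))
```

A machine that passes this check still needs a compatible sound card and microphone in practice, so the list is a starting point rather than a guarantee of accuracy.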

A headset microphone is often supplied with the speech recognition software, but most physicians find headset microphones rather unwieldy. An innovative product is the SpeechMike from Philips, a handheld computer microphone with built-in trackball and speaker. SpeechMike Professional offers dictation-specific controls such as microphone on/off, play and pause buttons.

For portable computer users, the computer's compatibility or certification for speech recognition software is essential. I recommend the Micron TransPort configured with a 266 MHz Pentium processor and 64 MB of RAM, a solid and well-built portable computer. Micron Electronics Inc. has endorsed the emerging technology of continuous speech recognition, including Dragon NaturallySpeaking, with all new portables.

Before speech recognition software can be used, the microphone must be plugged into the computer's sound card and the software installed. While all speech recognition products can be used after initial setup, the best results come after the user trains the system to recognize his or her own speech patterns by reading a passage of text aloud for about 30 minutes. The program will continue to learn and improve its accuracy through repeated dictation.

Medical Applications

Speech recognition technology has tremendous potential for medical information systems, facilitating the transition from manual charting on a paper-based medical record to the long-awaited computerized patient record. The National Academy of Sciences Institute of Medicine included speech recognition in its 1991 "gold standard" study on the computer-based patient record (CPR) with the following ideal CPR attributes:

The ideal CPR supports direct physician entry by dictation into the patient record.
The ideal CPR further supports integrated and interfaced voice dictation via icon on the clinical workstation.
The ideal CPR capitalizes on the repetitive nature of medicine by using templates at a point-of-care facility.

There are currently five suppliers of speech recognition systems: IBM, Philips, Kurzweil, Dragon Systems and Lernout & Hauspie. IBM, Philips and Kurzweil have introduced medical speech recognition systems over the last two years, the first of which was designed for radiology. Only two companies, Voice Input Technologies and Voice Activated Systems Technologies, have developed systems for psychiatric practice; these are based on the Philips and Dragon Systems engines, respectively.

IBM MedSpeak

IBM MedSpeak, released in September 1996, was the first real-time continuous speech package available for physicians. IBM has released two professional modules, MedSpeak/Radiology and MedSpeak/Pathology, but has no current plans for a psychiatry-specific system. IBM also offers two consumer speech recognition packages called IBM Via Voice and Via Voice Gold. I do not recommend these consumer products as stand-alone medical dictation modules, although they may help give novices a feeling for speech recognition by personal computer.

Philips SpeechMagic

Voice Input Technologies developed SpeechWriter Mental Health in 1996 using the Philips dictation engine SpeechMagic and a mental health language model. SpeechWriter is configured for group or hospital practices and is not cost-effective for a single user. SpeechWriter Mental Health has the advantage of a separate voice file to accompany the text file for playback during correction and editing. SpeechWriter does not include or currently support templates for psychiatric documentation.

Philips' batch-processed system differs from the other speech recognition engines by producing reports in two stages after the dictation is completed. First, the physician records a speech file, then SpeechMagic converts the file to text with a one-minute conversion time for each minute of dictation.

Thelma Camus, M.D., a psychiatrist in private practice in Sedona, Ariz., has used Voice Input Technologies' SpeechWriter for 18 months. Camus initially tried Dragon Dictate when it used discrete speech recognition, but was not satisfied with the speed. SpeechWriter achieves a 97% rate of speech-to-text recognition, and is especially good with psychiatric terms and medications based on its built-in vocabulary. Camus has also begun using an Olympus portable digital recorder to dictate and transfer information to her computer. She either dictates between patients, or writes notes during sessions and dictates afterward.

Kurzweil Clinical Reporter

Kurzweil Applied Intelligence Inc.'s Clinical Reporter is configured for radiology, pathology and emergency medicine. The Clinical Reporter allows clinicians to create complete medical reports by voice and capture data for later analysis through a voice-enabled system it calls a "prompted interaction system." During dictation, users respond to the system's prompts with numbers or short phrases. Physicians have found that these prompts slow down dictation and the natural flow of speech.

Lernout & Hauspie, a Belgian firm specializing in speech and language technology, acquired Kurzweil Applied Intelligence Inc. in June 1997, and since that time, development and support of Clinical Reporter has slowed. In May 1998, Lernout & Hauspie released L&H Voice Xpress Plus, a continuous speech recognition product with integration into Microsoft Word. Early reviews have found good command and control features with 95% accuracy of speech recognition.

In 1997, Microsoft purchased an 8% interest in Lernout & Hauspie. Microsoft and Lernout & Hauspie will be sharing technologies to cooperate on future speech initiatives. Microsoft will thereby position itself for future inclusion of speech recognition into the Windows operating system. In October 1997, Microsoft, Lernout & Hauspie and the medical dictation/transcription company MedQuist announced a broad-based strategic alliance to integrate medical reporting with Kurzweil's Clinical Reporter.

Dragon Systems' NaturallySpeaking

Dragon Systems' NaturallySpeaking continuous speech recognition software is widely recognized for its performance, speed, and intuitive, flexible user interface. Currently the market leader in use and acceptance, Dragon NaturallySpeaking allows users to edit and format as they dictate. Dragon NaturallySpeaking features command and control of Windows applications through voice and can handle rates of speech up to 150 words a minute.

John Carleton, M.D., an occupational psychiatrist in practice in Santa Barbara, Calif., finds that Dragon NaturallySpeaking helps him save on transcription costs: "I enjoy being more in control of my dictation; this saves me from having to drop my cassettes off with the transcriptionist, and reduces my time in the report turnover from two weeks to one day."

A companion to speech recognition software is a specialty-specific language model, which improves the speed and accuracy of medical dictation. Voice Input Technologies includes a psychiatry language model with SpeechWriter. Dragon Systems is currently developing a psychiatry language model for its next generation of professional speech recognition systems. Voice Automated Corporation has compiled language models for Disability/Workers Comp Evaluation and Psychology/Psychiatry that work with IBM Via Voice, Via Voice Gold and Dragon NaturallySpeaking Deluxe.

Voice Automated constructs each language model by analyzing thousands of reports within the specialty. Each Voice Automated language model was created with a minimum of 15,000 reports and 12 to 15 million words of specific dictation within the field. By analyzing reports within a specific field for each specialty, 8,000 to 12,000 words are added to cover 97% to 99% of the words used within that field. These numbers mean that physicians will need less time to train words or teach the system a word's context for their specialty.
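To make the idea of vocabulary coverage concrete, here is a minimal sketch in Python of how one might measure what fraction of the words in a set of reports a given vocabulary covers, and how many of the most frequent words are needed to reach a target. This is an illustration only and not Voice Automated's actual method, which the article does not describe in detail; the function names `tokenize`, `coverage` and `words_needed` and the sample reports are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(reports: list[str]) -> Counter:
    """Count how often each word occurs across all reports."""
    return Counter(word for report in reports for word in tokenize(report))


def coverage(vocabulary: set[str], reports: list[str]) -> float:
    """Fraction of all word occurrences in the reports found in the vocabulary."""
    counts = word_counts(reports)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for word, n in counts.items() if word in vocabulary)
    return covered / total


def words_needed(reports: list[str], target: float) -> list[str]:
    """Pick the most frequent words, in order, until the target coverage is met."""
    counts = word_counts(reports)
    total = sum(counts.values())
    chosen, running = [], 0
    for word, n in counts.most_common():
        if total and running / total >= target:
            break
        chosen.append(word)
        running += n
    return chosen


if __name__ == "__main__":
    sample = [
        "patient denies suicidal ideation",
        "patient reports depressed mood",
    ]
    vocab = set(words_needed(sample, 0.97))
    print(len(vocab), coverage(vocab, sample))
```

With a real corpus of thousands of reports, a relatively small set of frequent specialty words accounts for most of what is dictated, which is why a specialty vocabulary reduces the number of words a physician has to train by hand.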

VAST Express NoteTaker

Voice Activated Systems Technologies (VAST) uses Dragon Systems' NaturallySpeaking for medical dictation. VAST Express NoteTaker Psychiatry contains a pre-formatted psychiatric diagnosis-specific report template for initial assessment and progress notes. VAST Express NoteTaker is also available with dictation modules for endocrinology, radiology, pathology and emergency medicine in addition to psychiatry. VAST also offers setup and training of speech recognition systems.

Each of the 35 VAST Express NoteTaker templates outlines the DSM-IV diagnostic criteria for major psychiatric diagnoses with subsequent fields for past history, mental status examination, diagnosis and treatment. VAST Express NoteTaker Templates can be further customized or designed for each clinician's preferred document styles. Templates facilitate the generation of complete notes and allow physicians to automatically include essential information, such as the 1998 Health Care Financing Administration E/M documentation requirements.
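A template of this kind can be thought of as a fixed list of sections that dictation fills in. The sketch below, in Python, is a hypothetical illustration and does not reflect how VAST Express NoteTaker is implemented; the `NoteTemplate` class, its methods and the exact section list are assumptions drawn loosely from the fields named in this article.

```python
from dataclasses import dataclass, field

# Section names are illustrative, based on fields mentioned in the article.
DEFAULT_SECTIONS = (
    "Patient identification",
    "Chief complaint",
    "Past history",
    "Mental status examination",
    "Diagnosis",
    "Treatment",
)


@dataclass
class NoteTemplate:
    """Hypothetical diagnosis-specific note with named sections to fill in."""
    name: str
    sections: tuple = DEFAULT_SECTIONS
    entries: dict = field(default_factory=dict)

    def dictate(self, section: str, text: str) -> None:
        """Store dictated text under one of the template's sections."""
        if section not in self.sections:
            raise KeyError(f"unknown section: {section}")
        self.entries[section] = text.strip()

    def missing(self) -> list:
        """List sections that have not been dictated yet."""
        return [s for s in self.sections if not self.entries.get(s)]

    def render(self) -> str:
        """Produce the note as plain text, marking undictated sections."""
        lines = [self.name]
        for s in self.sections:
            lines.append(f"{s}: {self.entries.get(s) or '[not dictated]'}")
        return "\n".join(lines)


if __name__ == "__main__":
    note = NoteTemplate("Recurrent major depression")
    note.dictate("Chief complaint", "difficulty sleeping and depressed mood")
    print(note.render())
    print("Still needed:", note.missing())
```

Listing the sections that are still empty is one simple way a template can prompt the clinician to complete required documentation before the note is signed.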

The commands and features of speech recognition programs take time to learn. The intricacies of computer hardware, software configuration and system maintenance can also be a challenge to set up and maintain. Yet the advantages of continuous speech recognition are well worth the learning curve.

What speech recognition products are expected in the near future? Intel and Microsoft, the two driving forces of the personal computer industry, are gearing up to include speech recognition in the operating systems of their computers. In 1999, Intel will ship a new processor technology named Katmai, which is designed to enhance complex applications such as 3-D graphics and speech recognition and run at speeds starting at about 500 MHz. Microsoft's strategic alliance with Lernout & Hauspie Speech Products will further accelerate development of the next generation of voice-enabled computing on the Windows platform.
