SpeechPulse Help

SpeechPulse demo

How to install SpeechPulse on macOS (Apple silicon)

Controls Overview

[Image: controls overview]

AI Templates

For AI Templates, you need to download an AI language model as explained here. You can run the model either on the CPU or the GPU.

You can customize the AI Templates via "Settings->Options->AI Templates" or with the edit-template and add-new-template buttons in the main SpeechPulse UI.

When prompting AI models, try to be specific, descriptive, and as detailed as possible about the desired context, outcome, length, format, style, etc.

Experiment with different prompts until you get the desired output.

You can also include example inputs and outputs in AI Templates for better results. Language models use the given examples to tailor the output to the desired format.

Several default templates and example templates included with SpeechPulse are listed below.

Text enhancing

Email formatting

Specification

Summarization

Code generation

Speech Profiles

If you are in a noisy environment and SpeechPulse generates random text, you can use the "Speech Profiles" feature to add a new speech profile that only detects your own voice.

If multiple users share the same SpeechPulse installation, each user can create a separate speech profile sensitive to their voice.

The speech profiles feature can ignore noisy background audio and eliminate text hallucinations. However, it can't separate speech from multiple active speakers.

How to add language models

The SpeechPulse Windows installer includes only the "English (base) L" language model. For better accuracy and multi-language support, you can download additional language models as explained here.

The SpeechPulse macOS version has a built-in model downloader.

GPU support - Windows

To run the GPU models, you need an NVIDIA GPU with CUDA support. You also need to download the CUDA libraries from here, then unzip cuda_libs.zip into the SpeechPulse installation directory (e.g., "C:\Program Files\SpeechPulse") as explained here.

* NVIDIA 551.XX series drivers can cause SpeechPulse to hang or slow down. If you experience this issue, please roll back to a previous NVIDIA driver. SpeechPulse has been tested on NVIDIA Game Ready Driver versions 537.58, 545.92, 546.33, and 546.65 and on NVIDIA Studio Driver versions 537.58, 546.01, and 546.33.

How to update SpeechPulse to the latest version

You can download the latest version of SpeechPulse from here. Then double-click the downloaded exe file to update SpeechPulse.

You don't need to uninstall the older version. SpeechPulse will automatically detect the older installation and update it.

Basic dictation

  1. Open your favorite text editor (Notepad, WordPad, MS Word, Google Docs, etc.).
  2. Press the SpeechPulse Start button to start voice typing.
  3. Click in your text editor to place the cursor where you want the text inserted.
  4. Start speaking in your natural voice. Try to speak in complete sentences for better accuracy.

Voice commands

You can refer to the voice command guide to get the full list of supported voice commands.

The list of active voice commands changes depending on the punctuation mode and whether you are dictating to the built-in editor or to an external application.

Hotkeys

SpeechPulse also supports hotkeys. You can enable and configure hotkeys in the settings menu.

Voice Hotkeys

You can use Custom Voice Hotkeys to trigger keyboard shortcuts with your voice. Simply enter the voice command and corresponding keyboard shortcut, and SpeechPulse will trigger the hotkey when it detects the voice command.

Mappings

You can use custom word/phrase mappings to replace SpeechPulse's text output with your own custom words and phrases.

For example, you can replace the phrase "speech pulse" with "SpeechPulse" using the custom mappings feature.

Similarly, you can remove unwanted words/phrases by replacing them with an empty string.

The mappings feature also supports regular expressions for wildcard matching. For regular expression matching, you need to enter the regex pattern as the search string. All the common regex patterns are supported.

For example, you can use the following mapping to replace "gray color", "grey color", "gray colour", or "grey colour" with the new phrase "Gray color".

[Image: regex mappings]

For case-sensitive regex matching, you need to enable the case-sensitivity option.
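As a rough illustration of how such a mapping behaves, here is a minimal Python sketch. The pattern `gr[ae]y colou?r` is an assumed example, not necessarily the one SpeechPulse ships or the one shown in the original screenshot, and the helper name `apply_mapping` is hypothetical. It also assumes that matching ignores case unless the case-sensitivity option is enabled and that SpeechPulse's regex syntax behaves like Python's `re` module for this pattern.

```python
import re

# Assumed example pattern: matches "gray"/"grey" followed by "color"/"colour".
PATTERN = r"gr[ae]y colou?r"
REPLACEMENT = "Gray color"


def apply_mapping(text, pattern=PATTERN, replacement=REPLACEMENT, case_sensitive=False):
    """Replace every match of `pattern` in `text` with `replacement`."""
    # Assumption: matching ignores case unless case sensitivity is enabled.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.sub(pattern, replacement, text, flags=flags)


print(apply_mapping("I painted it a grey colour."))
```

With `case_sensitive=True`, the same pattern would leave "GREY Colour" untouched because the letters no longer match exactly.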

Custom mappings can be enabled for both Live mode and File Mode.

Text Inserter

You can use the Text Inserter to insert frequently used text snippets with voice commands. Simply enter the voice command and corresponding text snippet, and SpeechPulse will paste the text snippet when it detects the voice command.

Text Inserter supports both plain text and rich text, so you can insert snippets that contain URLs and other formatting.

SpeechPulse Editor

You can open the SpeechPulse editor using a customizable hotkey or with the voice command "Open editor".

The editor can automatically insert text into other applications.

Place the cursor in the text field of the target application (e.g., Notepad, MS Word) and open the editor using the hotkey or voice command. The editor will indicate the target app name in the title bar.

Once you have completed dictating, say "Transfer text" or press the "Transfer Text" button to transfer the text to the target app.

Punctuation Modes

SpeechPulse supports two punctuation modes: Auto punctuation and Manual punctuation.

Auto punctuation

Auto punctuation mode automatically inserts common punctuation marks, including periods, commas, question marks, and exclamation marks.

It also supports "new line" and "new paragraph" commands within a continuous speech segment.

Manual punctuation

With manual punctuation mode, you need to dictate punctuation marks explicitly.

Manual punctuation mode also supports "new line" and "new paragraph" commands within a continuous speech segment.

You can refer to the voice command guide to get the full list of supported voice commands.

* ONNX language models only support Auto punctuation.

* Manual punctuation is only supported for the English language.

Dictating numbers in the manual punctuation mode

You can dictate most numbers the way you normally say them. If you would like to see more formatting, please submit a feature request.

Whole numbers

  • 342 -> Three hundred and forty-two
  • 4,567 -> Four thousand five hundred and sixty-seven
  • 873,120 -> Eight hundred and seventy three thousand one hundred and twenty
  • 7,340,200 -> Seven million three hundred and forty thousand two hundred

Decimals

  • 0.023 -> Zero point zero two three (or Point zero two three)
  • 13.56 -> Thirteen point five six

Fractions

  • 1/2 -> One half
  • 3/4 -> Three fourths (or Three quarters)
  • 121/645 -> One twenty one over six forty five (or One twenty one divided by six forty five)
  • 6 4/5 -> Six and four fifths

Phone numbers

  • 917-642-8073 -> Nine one seven six four two eight zero seven three
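Phone numbers are dictated digit by digit, as the example above shows. The small Python sketch below derives the phrase to say from a written number. It is only an illustration of the pattern in this guide; the function name `spoken_phone` is hypothetical and is not part of SpeechPulse.

```python
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spoken_phone(number):
    """Return the digit-by-digit phrase for a written phone number."""
    # Separators such as "-" or spaces are skipped; only digits are spoken.
    words = [DIGIT_WORDS[ch] for ch in number if ch in DIGIT_WORDS]
    return " ".join(words).capitalize()


print(spoken_phone("917-642-8073"))
```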

Dates

  • May 18, 2024 -> May eighteen two thousand twenty four

Time

  • 10:30 a.m. -> Ten thirty a m
  • 10:30 p.m. -> Ten thirty p m

Currency

  • $20 -> Twenty dollars
  • $20.50 -> Twenty dollars and fifty cents
  • £20 -> Twenty pounds
  • £20.50 -> Twenty pounds fifty (or Twenty pounds and fifty pence)
  • €20 -> Twenty euros
  • €20.50 -> Twenty euros and fifty cents

Percentages

  • 25% -> Twenty five percent
  • 25.6% -> Twenty five point six percent

Prompts

You can use custom prompts to alter SpeechPulse's text generation. Several use cases are as follows:

  • Add words to SpeechPulse's vocabulary by listing the words in a custom prompt. Example prompt: "ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix"
  • Correct the spelling and capitalization. Example prompt: "SpeechPulse, GPT, RTX, Electronic Arts"
  • Correct the punctuation. Example prompt: "Hello, welcome to my lecture."

* Custom prompts are only supported in the Auto punctuation mode.

* Custom prompts are limited to 200 tokens. If a prompt is longer, SpeechPulse considers only its last 200 tokens. Both words and punctuation symbols count as tokens.
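To get a feel for the 200-token limit, the Python sketch below estimates which part of a long prompt would be kept. It is an approximation under a stated assumption: each word and each punctuation symbol is counted as one token, following the description above. The tokenizer SpeechPulse actually uses may split text differently, and the names `rough_tokens` and `kept_tokens` are hypothetical.

```python
import re

MAX_PROMPT_TOKENS = 200  # limit stated in this guide


def rough_tokens(prompt):
    """Split a prompt into words and single punctuation symbols."""
    # Approximation only: the real tokenizer may produce different tokens.
    return re.findall(r"\w+|[^\w\s]", prompt)


def kept_tokens(prompt, limit=MAX_PROMPT_TOKENS):
    """Return the trailing tokens that fit within the limit."""
    return rough_tokens(prompt)[-limit:]


print(len(rough_tokens("Hello, welcome to my lecture.")))
```

Because only the end of a long prompt is considered, the most important words are best placed last.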

Buffer filling indicator

To prevent text hallucinations, you can enable the audio buffer filling indicator via Settings, and try to pause before the indicator fills.

SpeechPulse supports speech segments longer than 30 seconds. However, pausing before 30 seconds can help prevent text hallucinations (30 seconds is the context width of Whisper models).

You can continue to dictate the next segment as soon as the processing indicator for the previous segment appears.

How to get better accuracy

1) Try to reduce the background noise

SpeechPulse has decent accuracy even with moderate background noise. However, for the best accuracy and lowest latency, you should try to lower the background noise as much as possible.

2) Try to speak in complete sentences

Short phrases can confuse the AI language models, causing poor accuracy. So try to speak in complete sentences as much as possible.

3) Use a headset microphone instead of a PC/laptop microphone

PC/laptop microphones can capture faraway sounds and noises, causing random text hallucinations in SpeechPulse. A headset microphone therefore gives better accuracy and fewer hallucinations.

4) Use a larger language model

Larger language models have better accuracy. They also work well under background noise. However, larger models require more RAM and have higher latencies.

Frequently asked questions:

How to fix random text generation

Random text generation, known as text hallucination in AI terminology, can happen for several reasons.

The most common cause is background noise (especially faint human/animal voices). When AI language models receive faint, low-volume voices, they tend to transcribe them incorrectly as random phrases (such as "Thank you." or "Hello, welcome to my lecture."). Sometimes the models will even repeat the same incorrect phrase multiple times in a loop.

You can try the following to fix the text hallucination issue:

  1. Enable the audio buffer filling indicator via Settings, and try to pause before the indicator fills. SpeechPulse supports speech segments longer than 30 seconds. However, pausing before 30 seconds can help prevent text hallucinations (30 seconds is the context width of Whisper models).
  2. Try the Push-to-talk mode. It can reduce text hallucinations by preventing unintended pauses.
  3. If you are using a PC/USB/laptop microphone, try SpeechPulse with a headset microphone. Headset microphones only pick up your voice and won't capture faraway background noise.
  4. If you notice random text generation even if you are not talking, you can add a new speech profile to make SpeechPulse sensitive to only your voice.
  5. You can also try dictating slightly louder so models can differentiate voice from background noise.
  6. If possible, try to lower the background noise (try dictating in a quieter location).
  7. Noise cancellation can distort the audio signal and cause issues like text hallucination. So try disabling the noise cancellation of your noise-canceling microphone. Also try disabling any noise-canceling software on your PC.
  8. Make sure the microphone has sufficient volume. Microphones that are too quiet can cause text hallucinations. If the microphone has a volume/gain control, increase the volume.
  9. Make sure the microphone is not clipping your speech. Try reducing the microphone volume/gain if it is too high.
  10. Different microphones have different pickup patterns and distances. Make sure to place the microphone at the correct distance from your mouth. If the microphone is not in the optimal position, it can degrade the speech signal and cause text hallucinations.
  11. If all else fails, try SpeechPulse with a different microphone.
  12. You can also use the mappings feature to automatically remove common text hallucinations. Simply use a blank replacement string in your mappings entry. You can also use a regular expression to replace all hallucinations of a specific pattern.
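The last item in the list can be pictured with a short Python sketch. The pattern below is a hypothetical example that removes one or more consecutive repeats of "Thank you." by replacing them with an empty string. It assumes the regex syntax accepted by the mappings feature behaves like Python's `re` module for this pattern, and the name `strip_hallucinations` is made up for illustration.

```python
import re

# Hypothetical pattern: one or more consecutive repeats of "Thank you."
HALLUCINATION = r"(?:\bthank you\.\s*)+"


def strip_hallucinations(text):
    """Remove runs of the hallucinated phrase, as a blank replacement would."""
    cleaned = re.sub(HALLUCINATION, "", text, flags=re.IGNORECASE)
    return cleaned.strip()


print(strip_hallucinations("The meeting starts at noon. Thank you. Thank you."))
```

A pattern like this also removes a genuine "Thank you." that you dictate on purpose, so it is worth keeping such mappings as narrow as possible.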

Which microphones work well with SpeechPulse?

SpeechPulse is compatible with any PC/USB/laptop/headset microphone. However, headset microphones usually work better as they only pick up your voice and won't capture too much background noise.

Also, make sure to place the microphone at the correct distance from your mouth. If the microphone is not in the optimal position, it can degrade the speech signal and lower the transcription accuracy.

Some noise-canceling microphones can distort the speech signal and cause issues like text hallucination and poor transcription accuracy. In that case, try disabling the DSP noise-cancellation of your microphone.

Also, make sure your microphone has a sufficient volume. SpeechPulse can take longer to transcribe your speech if the mic volume is too low. It can also cause poor transcription accuracy and text hallucinations.

Why does SpeechPulse sometimes take longer than usual to transcribe speech?

SpeechPulse can sometimes take longer than usual to transcribe your speech for several reasons.

The transcription can take longer if the speech signal is of poor quality. For example, if you place your microphone far away from you, the speech signal will degrade, causing SpeechPulse to try the transcription several times with different internal settings. This repeated processing can significantly increase the total processing time you experience. The same issue can arise if your microphone has a very low volume.

You can try a headset microphone instead of a PC/USB/laptop microphone to prevent longer transcription times. Also, make sure your microphone has sufficient volume and place the mic at a proper distance from your mouth.

Too much background noise can also cause SpeechPulse to take longer than usual to complete the transcription. Faint human/animal voices can confuse AI models, causing them to repeat the transcription process with different internal settings.

If you think background noise is the reason for the longer processing times, try SpeechPulse in a quieter environment to see whether that solves the issue.

How to control capitalization and spacing in the manual punctuation mode?

SpeechPulse automatically capitalizes the first character of a sentence. It treats the dictated text as a new sentence if the character before the current cursor position is a sentence-ending character (.?!), if the cursor is at the beginning of a line, or if the previous character is a whitespace character (space, tab).

If you want to continue a sentence, you can place the cursor right after the last word with no space in between. SpeechPulse will then automatically insert a space before the next dictated text segment and continue the sentence without capitalizing the first letter.
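The rules above can be summarized in a short Python sketch. This is only a model of the behavior described in this section, not SpeechPulse's actual implementation. The function names are hypothetical, and the space added after sentence-ending punctuation is an assumption, since this guide only describes spacing for continued sentences.

```python
SENTENCE_END = ".?!"


def starts_new_sentence(before):
    """Decide whether text dictated after `before` begins a new sentence."""
    if before == "" or before.endswith("\n"):
        return True  # cursor is at the beginning of a line
    prev = before[-1]
    # Sentence-ending character or whitespace (space, tab) before the cursor.
    return prev in SENTENCE_END or prev in " \t"


def insert_segment(before, segment):
    """Append a dictated segment to the text that precedes the cursor."""
    if not segment:
        return before
    if starts_new_sentence(before):
        # Assumption: a space is added after sentence-ending punctuation.
        gap = " " if before and before[-1] in SENTENCE_END else ""
        return before + gap + segment[0].upper() + segment[1:]
    # Continuing a sentence: add a space and leave the first letter as it is.
    return before + " " + segment


print(insert_segment("I went to the", "store today"))
```

Note how a trailing space before the cursor changes the result: "I went to the " followed by "store today" is treated as a new sentence and capitalized, which is why the cursor should sit directly after the last word when continuing a sentence.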

[Image: how to continue a sentence in lowercase]

You can also manually control the capitalization using "Cap", "No cap", and "All cap" commands.

"Cap" command:
Use the Cap command to capitalize words
"No cap" command:
Use the No cap command to prevent capitalizing words
"All cap" command:
Use the All cap command to capitalize all letters of a word

When you continue a sentence, SpeechPulse automatically inserts a space before the next dictated text segment. You can say "No space" before dictating the next text segment to prevent adding a space. The "No space" command works only at the start of a dictated segment.

How to fix automatic capitalization and spacing?

If automatic capitalization and spacing don't work properly, it can be due to incorrect key bindings on your PC.

SpeechPulse uses the hotkeys "CTRL+C", "CTRL+V", "SHIFT+Left", and "SHIFT+Right" to detect the previous/next characters in the text input area (e.g., Notepad). So make sure those hotkeys keep their default behavior on your PC.