How to Install and Use Whisper AI on Windows for Speech Recognition

Learn how to install Whisper AI on Windows with this simple guide. Explore its powerful speech-to-text transcription capabilities today!

Introducing Whisper

What's Whisper?

OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Whisper is a family of pre-trained models for automatic speech recognition (ASR), released in September 2022 by Alec Radford and colleagues at OpenAI. It was trained on roughly 680,000 hours of labeled audio transcription data, which allows it to perform on par with state-of-the-art ASR systems.

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.

Whisper models:

Size      Parameters   English-only model   Multilingual model   Required VRAM   Relative speed
tiny      39 M         tiny.en              tiny                 ~1 GB           ~32x
base      74 M         base.en              base                 ~1 GB           ~16x
small     244 M        small.en             small                ~2 GB           ~6x
medium    769 M        medium.en            medium               ~5 GB           ~2x
large     1550 M       N/A                  large                ~10 GB          1x

In December 2022, OpenAI released an improved large model named large-v2, followed by large-v3 in November 2023.

System Requirements

Python 3.8–3.11

Windows 10, 11

Git, Conda, PyTorch

GPU support requires a CUDA®-enabled card with 4GB+ VRAM
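
Before you begin, you can confirm that the NVIDIA driver is installed and the GPU is visible by running nvidia-smi in a terminal; Git, Conda, and PyTorch are installed in the steps that follow.

> nvidia-smi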

5 Steps to install Whisper AI

This guide uses the Advanced GPU - V100 plan on GPUMart, which is equipped with a dedicated NVIDIA V100 graphics card with 16GB of HBM2 memory and can easily run the latest large-v3 multilingual model. Because Whisper has a number of dependencies, the installation takes a little while, but it is straightforward and consists of the following 5 steps.

Step 1 - Install Git

Download the latest 64-bit version of Git for Windows (https://git-scm.com/download/win), then right-click the downloaded file and run the installer as administrator.
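
Once the installer finishes, open a new Command Prompt or PowerShell window and confirm that Git is on the PATH:

> git --version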

Step 2 - Install Miniconda3 and create Python 3.10 Environment

Miniconda is a minimal installer provided by Anaconda. Please download the latest Miniconda installer (https://docs.anaconda.com/free/miniconda/) and complete the installation.

Whisper requires Python 3.8–3.11 and a recent version of PyTorch. To keep these experiments isolated from other work, set up a dedicated conda environment:

> conda create -n Whisper python=3.10.11
> conda activate Whisper
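
With the environment created and activated, you can confirm that the prompt shows (Whisper) and that the expected interpreter is in use:

> python --version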

Step 3 - Install PyTorch Stable (2.3.0) with CUDA 12.1 support

Whisper requires a recent version of PyTorch. The command below installs the stable 2.3.0 build with CUDA 12.1 support from the pytorch and nvidia conda channels:

> conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
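
Before moving on, it is worth verifying that PyTorch was installed with GPU support. Run this one-liner inside the activated Whisper environment; it should print the PyTorch version and True if CUDA is available:

> python -c "import torch; print(torch.__version__, torch.cuda.is_available())"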

Step 4 - Install Chocolatey and ffmpeg

Open an administrative PowerShell terminal and, from the PS C:\> prompt, run the following command:

> Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

If you don't see any errors, you are ready to use Chocolatey! Whisper also requires FFmpeg, an audio-processing library. If FFmpeg is not already installed on your machine, use the command below to install it.

> choco install ffmpeg
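
You can confirm that both tools are available by opening a new terminal and checking their versions:

> choco --version
> ffmpeg -version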

Step 5 - Install Whisper

Pull and install the latest commit from the Whisper repository, along with its Python dependencies:

> pip install git+https://github.com/openai/whisper.git
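
To verify the installation, import the package and list the checkpoints it knows about; whisper.available_models() returns names such as tiny, base, small, medium, and large-v3:

> python -c "import whisper; print(whisper.available_models())"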

How to Use Whisper for Speech-to-text Transcription

Command-line usage

The following command will transcribe speech in audio files, using the medium model:

> whisper audio.wav --model medium

The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

> whisper chinese.mp3 --language Chinese

Adding --task translate will translate the speech into English:

> whisper chinese.mp3 --language Chinese --task translate

Specify the output format and path:

> whisper Arthur.mp3 --model large-v3 --output_format txt --output_dir .\output
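
If you have many recordings to process, the CLI can be driven from a short PowerShell loop. The sketch below assumes your .mp3 files live in an audio subfolder and writes SRT subtitles to .\output; adjust the paths, model, and output format as needed:

> Get-ChildItem .\audio\*.mp3 | ForEach-Object { whisper $_.FullName --model medium --output_format srt --output_dir .\output }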

To learn more about usage, please see the help:

> whisper -h

Python usage

Transcription can also be performed within Python:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
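
Whisper also exposes a lower-level API, documented in the openai/whisper repository, that works on 30-second audio chunks directly, for example to detect the spoken language before decoding:

import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make a log-Mel spectrogram and move it to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio and print the recognized text
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)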

JupyterLab usage

If you have not installed JupyterLab yet, install it first and then start it. The reference commands are as follows:

(Whisper) PS > conda install -c conda-forge jupyterlab
(Whisper) PS > jupyter lab
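
Inside JupyterLab, the Python API from the previous section works the same way. As a sketch, a notebook cell like the one below (assuming an audio.mp3 file in the working directory) transcribes the file and prints each segment with its timestamps:

import whisper

model = whisper.load_model("medium")
result = model.transcribe("audio.mp3")

# each segment carries start/end times in seconds and the decoded text
for segment in result["segments"]:
    print(f"[{segment['start']:.1f}s -> {segment['end']:.1f}s] {segment['text']}")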

Conclusion

In this tutorial, we covered the basics of getting started with Whisper AI on Windows. Whisper provides a powerful and accurate speech recognition solution for Windows users. By following the steps outlined in this guide, you can install Whisper on your Windows system and start transcribing and translating your own audio files.

Additional - GPU Servers Suitable for Running Whisper AI

Please choose the appropriate GPU server based on the largest model size you need to use. The medium model requires about 5GB of VRAM, and the large model requires about 10GB.

Express GPU - P1000

$64.00/mo
  • 32GB RAM
  • Eight-Core Xeon E5-2690
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro P1000
  • Microarchitecture: Pascal
  • Max GPUs: 1
  • CUDA Cores: 640
  • GPU Memory: 4GB GDDR5
  • FP32 Performance: 1.894 TFLOPS

Basic GPU - GTX 1650

$99.00/mo
  • 64GB RAM
  • Eight-Core Xeon E5-2667v3
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia GeForce GTX 1650
  • Microarchitecture: Turing
  • Max GPUs: 1
  • CUDA Cores: 896
  • GPU Memory: 4GB GDDR5
  • FP32 Performance: 3.0 TFLOPS

Basic GPU - GTX 1660

$139.00/mo
  • 64GB RAM
  • Dual 10-Core Xeon E5-2660v2
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia GeForce GTX 1660
  • Microarchitecture: Turing
  • Max GPUs: 1
  • CUDA Cores: 1408
  • GPU Memory: 6GB GDDR6
  • FP32 Performance: 5.0 TFLOPS

Professional GPU - RTX 2060

$159.00/mo
  • 128GB RAM
  • Dual 10-Core E5-2660v2
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia GeForce RTX 2060
  • Microarchitecture: Turing
  • Max GPUs: 2
  • CUDA Cores: 1920
  • Tensor Cores: 240
  • GPU Memory: 6GB GDDR6
  • FP32 Performance: 6.5 TFLOPS

Basic GPU - RTX 4060

$149.00/mo
  • 64GB RAM
  • Eight-Core E5-2690
  • 120GB SSD + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia GeForce RTX 4060
  • Microarchitecture: Ada Lovelace
  • Max GPUs: 2
  • CUDA Cores: 3072
  • Tensor Cores: 96
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 15.11 TFLOPS
  • Powerful for Gaming, Streaming, Video Editing, Emulators, Rendering.
  • Cost-Effective Choice for AI, Deep Learning.

Advanced GPU - RTX 3060 Ti

$179.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: GeForce RTX 3060 Ti
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 4864
  • Tensor Cores: 152
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 16.2 TFLOPS

Advanced GPU - A4000

$167.20/mo - Summer Sale: Save 40% (Was $279.00)
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A4000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 6144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Good Choice for Android Emulators, 3D Rendering, Video Editing, AI/Deep Learning, Data Science, etc.

Advanced GPU - V100

$229.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2690v3
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia V100
  • Microarchitecture: Volta
  • Max GPUs: 1
  • CUDA Cores: 5,120
  • Tensor Cores: 640
  • GPU Memory: 16GB HBM2
  • FP32 Performance: 14 TFLOPS

If you do not find a suitable GPU server plan, please leave us a message.
