Several voice search and transcription tools can capture the words spoken in an audio or video file (for example, an .mp4) and save them as text or in another document format, which is what mp4 to text conversion typically means.
A poor audio source may still require manual editing to weed out the errors, which are to some degree unavoidable.
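
As a rough illustration of the conversion described above, the sketch below extracts the audio track from an .mp4 and writes the recognized words to a .txt file. It rests on several assumptions that are not part of the original text: that the ffmpeg command-line tool is installed, that mono 16 kHz WAV is a suitable input for the recognizer, and that `transcribe` is a hypothetical placeholder for whichever speech-to-text engine is actually used.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def build_extract_command(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that extracts mono 16 kHz audio from a video."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",           # ignore the video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate (an assumed, common choice)
        wav_path,
    ]


def mp4_to_text(
    video_path: str,
    transcribe: Callable[[str], str],  # hypothetical: WAV path in, text out
    runner=subprocess.run,             # replaceable so ffmpeg can be stubbed
) -> Path:
    """Extract audio, transcribe it, and save the text next to the video."""
    video = Path(video_path)
    wav = video.with_suffix(".wav")
    runner(build_extract_command(str(video), str(wav)), check=True)

    text = transcribe(str(wav))

    out = video.with_suffix(".txt")
    out.write_text(text.strip() + "\n", encoding="utf-8")
    return out
```

The saved .txt file is the natural place for the manual clean-up pass, since recognition errors from a noisy recording end up there unchanged.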