Movie to txt in Dutch

Last Updated or created 2023-01-05

In the past I've converted the speech in some VHS movies to text, using all kinds of tools.
Let's use some open-source tools!

pip install moviepy
pip install SpeechRecognition

Create a Python script with the following:
(I called mine wav2txt.py)

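import contextlib
import math
import wave

import speech_recognition as sr

# moviepy is not needed here; ffmpeg extracts the audio.
# The name must match the WAV file that ffmpeg writes.
movie_audio_file_name = "movieaudiofile.wav"

# Read the WAV header to get the length of the audio in seconds.
with contextlib.closing(wave.open(movie_audio_file_name, 'r')) as f:
    frames = f.getnframes()
    rate = f.getframerate()
    duration = frames / float(rate)

# Number of 60-second chunks, rounded up.
total_duration = math.ceil(duration / 60)

r = sr.Recognizer()
with open("transcription.txt", "a") as out:
    for i in range(0, total_duration):
        # Load one 60-second chunk, starting at minute i.
        with sr.AudioFile(movie_audio_file_name) as source:
            audio = r.record(source, offset=i*60, duration=60)
        try:
            out.write(r.recognize_google(audio, language="nl-NL"))
        except sr.UnknownValueError:
            continue  # no recognizable speech in this chunk
        out.write(" ")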
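
The chunking arithmetic can be checked without an audio file. Below is a minimal sketch, assuming a hypothetical helper called chunk_windows (my own name, not part of SpeechRecognition). It returns the offset and length of every chunk, so you can see that the last chunk is shorter when the recording is not a whole number of minutes.

```python
import math


def chunk_windows(duration_seconds, chunk_seconds=60):
    """Return (offset, length) pairs that cover the whole recording."""
    count = math.ceil(duration_seconds / chunk_seconds)
    windows = []
    for i in range(count):
        offset = i * chunk_seconds
        # The final chunk only runs to the end of the recording.
        length = min(chunk_seconds, duration_seconds - offset)
        windows.append((offset, length))
    return windows


# A recording of 2.5 minutes gives two full chunks and one half chunk.
print(chunk_windows(150))
```

Keep in mind that a fixed 60-second cut can land in the middle of a word, so text around chunk boundaries may come out a bit mangled.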

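Before running the script, convert a movie to WAV with the command below. The output file name must match movie_audio_file_name in the script.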

ffmpeg -i /fileserver/path/koolhoven.mkv movieaudiofile.wav
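
If you have more than one movie, the conversion could also be started from Python. This is only a sketch: build_ffmpeg_command and extract_audio are hypothetical helper names, and the extra flags (-vn to skip video, -ac 1 for mono, -ar 16000 for a 16 kHz sample rate) are my assumption of reasonable settings for speech, not something the plain command above uses.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path="movieaudiofile.wav"):
    """Build the ffmpeg argument list for extracting audio to WAV."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vn",           # drop the video stream
        "-ac", "1",      # assumption: mono
        "-ar", "16000",  # assumption: 16 kHz sample rate
        wav_path,
    ]


def extract_audio(video_path, wav_path="movieaudiofile.wav"):
    """Run ffmpeg; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)


# Show the command without running it.
print(" ".join(build_ffmpeg_command("/fileserver/path/koolhoven.mkv")))
```

Calling extract_audio("/fileserver/path/koolhoven.mkv") would then write movieaudiofile.wav in the current directory, provided ffmpeg is on the PATH.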

Then run python3 wav2txt.py

Output
(Note: these are not timestamped for subtitles.)
I only needed the things being said in the home movie recordings as text.

Ik zit hier in de film The James Dean aan de
wereld voorstelde en daarmee de tienerfilm ingeleverd introduceren zelden werden onrustige 10 asiel zo mooi blootgelegd als ik deze film van Nicolas bij en dat wordt dan meteen toevallig even de mooiste
titels ooit wel eens autocross vanavond kijken we naar de kom ik nog even veel zomer dat je voor het eerste meisje Zoem de eerste baantje

etc.
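
If you do want rough timestamps, the chunk number already tells you which minute the text came from. Below is a minimal sketch, assuming SubRip-style (.srt) entries and one entry per 60-second chunk. The helpers srt_timestamp and srt_entry are hypothetical names of my own, and a one-minute block of text is far too coarse for real subtitles, so treat this as a starting point only.

```python
def srt_timestamp(seconds):
    """Format whole seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},000"


def srt_entry(index, text, chunk_seconds=60):
    """Build one SRT entry for the chunk with the given zero-based index."""
    start = index * chunk_seconds
    end = start + chunk_seconds
    return f"{index + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


# Chunk 61 covers the minute that starts one hour and one minute in.
print(srt_entry(61, "voorbeeldtekst"))
```

In the script you would write srt_entry(i, text) to the output file instead of the bare recognized text.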
