AI-Youtube-Shorts-GeneratorRepository49/100 via “speech-to-text transcription with timestamp alignment”
A python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
Unique: Integrates Whisper transcription directly into the pipeline with automatic timestamp extraction, eliminating the need for separate transcription tools. Uses FFmpeg for robust audio extraction from any video container format, handling codec variations automatically.
vs others: More accurate than generic speech-to-text APIs (Whisper is trained on 680k hours of multilingual audio) and cheaper than human transcription services, while providing timestamps required for video cropping without additional processing steps.