I made a YouTube Video Summarizer using Python and ChatGPT

I’ve been talking about (and playing with) ChatGPT a lot lately. The AI is an excellent tool for novice programmers who want to build things a little beyond their usual skill level, and it can feed into a few neat applications. One I’ve dabbled with lately takes a YouTube URL, reads the transcript, feeds that transcript to ChatGPT, and asks for a summary. In the end, this code does that fairly well! Let me begin, though, with the caveat that I am very much a novice myself.

One additional caveat: because of the limit on the number of ‘tokens’ ChatGPT accepts, this code breaks the YouTube transcript into batches. Those batches are summarized independently of one another, which can mean that ChatGPT can’t see the forest for the trees. It summarizes each batch it reads, not the document as a whole. It also sometimes gets a little confused at the end, if the final ‘batch’ doesn’t give it much to work with. That said, it usually produces a pretty good summary of a video, and as you might notice, it can be adapted for any text input. So, here’s the code. Remember, you’ll need your own OpenAI API key (which you can get after signing up on their website), and you’ll need to install the openai and youtube-transcript-api packages.

import re
import openai
import tkinter as tk
from tkinter import simpledialog
from youtube_transcript_api import YouTubeTranscriptApi

openai.api_key = "YOUR API KEY HERE"

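def extract_video_id(url):
    # Handles both youtube.com/watch?v=<id> and youtu.be/<id> links
    match = re.search(r'(?:v=|youtu\.be/)([^&?#]+)', url)
    if match is None:
        raise ValueError("Could not find a video ID in: " + url)
    return match.group(1)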

def get_transcript(url):
    video_id = extract_video_id(url)
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
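    except Exception as e:
        # Stop here; otherwise a stale transcript.txt from an earlier run would be summarized
        raise SystemExit(f"Could not fetch transcript: {e}")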

    with open('transcript.txt', 'w') as f:
        for line in transcript:
            f.write(line['text'] + '\n')

def generate_batches(max_tokens):
    with open('transcript.txt', 'r') as f:
        transcript = f.read()

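    # Reserve 1048 tokens for the reply. The slice below counts characters, not tokens;
    # in English text a token usually spans several characters, so batches stay under the limit.
    batch_size = max_tokens - 1048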
    batches = [transcript[i:i+batch_size] for i in range(0, len(transcript), batch_size)]

    responses = []
    for batch in batches:
        response = gpt_transcript(batch)
        responses.append(response)
    return responses

def gpt_transcript(transcript):
    # Legacy Completions call; requires an openai package version below 1.0
    completion = openai.Completion.create(
        engine="text-davinci-003",
        prompt="Analyze " + str(transcript) + " and summarize in bullet form. Make sure you capture what tier they are.",
        max_tokens=1048,
        n=1,
        stop=None,
        temperature=0.5
    )
    response = completion.choices[0].text
    print(response)
    return response

def main(url):
    get_transcript(url)
    responses = generate_batches(4071)

    with open('output.txt', 'w') as f:
        for response in responses:
            f.write(response + '\n')

if __name__ == '__main__':
    root = tk.Tk()
    root.withdraw()

    url = simpledialog.askstring("Input", "Enter the YouTube URL:", parent=root)
    if url:
        main(url)
    else:
        print("No URL entered. Exiting program.")

What essentially happens here is that when you run the script, it first produces a popup asking for your YouTube URL. It then runs that URL through a function which extracts the video ID, and passes that ID to the YouTube Transcript API, which fetches the transcript. I have this save to a file, since sometimes I want the full transcript. Be aware that many YouTube transcripts are software-generated and can be a little inaccurate themselves.
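YouTube links come in more shapes than the standard watch?v= form: there are short youtu.be links, and /shorts/ and /embed/ paths. Here is a minimal sketch of an alternative that parses the URL with the standard library instead of a regex. The function name find_video_id and the list of accepted hosts and path prefixes are my own assumptions, not part of the script above, and the list is certainly not exhaustive.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical helper: hosts and path prefixes below are assumptions, not a complete list
YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")
ID_PATH_PREFIXES = ("shorts", "embed", "live")

def find_video_id(url):
    """Return the video ID from a few common YouTube URL shapes, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    path_parts = [part for part in parsed.path.split("/") if part]

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return path_parts[0] if path_parts else None

    if host in YOUTUBE_HOSTS:
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        # Path-based links: /shorts/<id>, /embed/<id>, /live/<id>
        if len(path_parts) >= 2 and path_parts[0] in ID_PATH_PREFIXES:
            return path_parts[1]

    return None
```

Returning None rather than raising lets the caller decide what to do with a bad link, for example showing the popup again.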

Once the transcript is saved, another function reads it back, breaks it into batches that do not exceed ChatGPT’s token limit, and feeds them one by one into ChatGPT. It prints each summary to your terminal and also saves it to an output text file. Be aware that if you run it with a different URL, it will overwrite both the previous transcript and the previous output.
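The batching in the script slices the transcript at fixed character positions, so a batch can start or end in the middle of a word. Below is a rough sketch of a splitter that only breaks between words. The name split_into_batches and the default of 3000 characters are my own assumptions. The default just sits close to the 4071 - 1048 figure the script uses, and like the script it counts characters, not real tokens.

```python
def split_into_batches(text, max_chars=3000):
    """Split text into chunks of at most max_chars characters without cutting words.

    Whitespace (including newlines) is collapsed to single spaces.
    A single word longer than max_chars still ends up in its own oversized batch.
    """
    batches = []
    current = []      # words collected for the batch being built
    current_len = 0   # length of " ".join(current)

    for word in text.split():
        # +1 accounts for the space joining this word to the previous one
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            batches.append(" ".join(current))
            current = [word]
            current_len = len(word)
        else:
            current.append(word)
            current_len += extra

    if current:
        batches.append(" ".join(current))
    return batches
```

For example, split_into_batches("one two three four five", max_chars=9) gives ['one two', 'three', 'four five'].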

Due to the nature of these deep learning AIs, you can feed in the same video twice and get different results, and the quality of the results can vary considerably. I wouldn’t recommend this code for anything important, or trust the output too much, but it can be handy for getting the CliffsNotes on a video. Another notable application is that with very minor modification, you could have this code read any batch of text, not just a YouTube transcript, and produce summaries of it.
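To give a rough idea of that modification, here is a sketch that summarizes any text, or any text file, batch by batch. The names summarize_text and summarize_file are hypothetical. The summarize argument is whatever function turns one batch into a summary, which in the script above would be gpt_transcript. I am also assuming the file is UTF-8.

```python
def summarize_text(text, summarize, batch_size=3000):
    """Cut text into fixed-size character batches and summarize each one.

    `summarize` is any callable that takes a batch (str) and returns a summary (str).
    """
    batches = [text[i:i + batch_size] for i in range(0, len(text), batch_size)]
    return [summarize(batch) for batch in batches]

def summarize_file(path, summarize, batch_size=3000):
    """Read a text file (assumed UTF-8) and summarize it batch by batch."""
    with open(path, "r", encoding="utf-8") as f:
        return summarize_text(f.read(), summarize, batch_size)

# Usage with the script's own function would look like:
#   summaries = summarize_file("notes.txt", gpt_transcript)
```

Passing the summarizer in as an argument also means you can try the batching with a dummy function before spending any API credits.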