Notes on indexing YouTube math channels


Off the top of my head, these are the types of channels I've encountered:

Channels where we want to pick out which videos to keep.

Channels where we want to pick out which videos to exclude. For example, most of Eddie Woo's channel consists of high-quality math videos, but there are a few promotional videos, which we would like to exclude.

Channels where we want to pick out just a few playlists. Maybe only a few playlists are dedicated to math, and everything else is non-mathematical. For example, a university might publish lots of videos, each belonging to a course playlist. Some of those courses are math-related, but most are not.

Channels where videos or playlists are titled according to some sort of pattern. The pattern can be detected by matching the title against a regex. If the titles can't be matched by a regex, or writing the regex would be difficult, we could pass the title to a total predicate instead (a function that returns true or false for every possible title).
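The regex-or-predicate idea could be sketched roughly like this in Python. This is only an illustration: make_title_matcher, the TitleFilter alias, and the sample titles are made-up names, not part of any existing tool. It also assumes the regex should match anywhere in the title unless it is explicitly anchored.

```python
import re
from typing import Callable, Union

# A title filter is either a regex pattern string or a predicate on the title.
TitleFilter = Union[str, Callable[[str], bool]]


def make_title_matcher(title_filter: TitleFilter) -> Callable[[str], bool]:
    """Turn a regex string or a predicate into a uniform title -> bool function."""
    if callable(title_filter):
        # Wrap in bool() so the result is always True or False.
        return lambda title: bool(title_filter(title))
    pattern = re.compile(title_filter)
    # re.search matches anywhere in the title; anchor with ^...$ if needed.
    return lambda title: pattern.search(title) is not None


# Hypothetical titles, for illustration only.
titles = ["Calculus 1 - Lecture 3", "Channel trailer", "Calculus 1 - Lecture 12"]

by_regex = make_title_matcher(r"^Calculus 1 - Lecture \d+$")
by_predicate = make_title_matcher(lambda t: "trailer" not in t.lower())

regex_hits = [t for t in titles if by_regex(t)]
predicate_hits = [t for t in titles if by_predicate(t)]
```

Both filters end up with the same call shape, so the rest of the indexer would not need to care which kind it was given. Note that the sketch does not guard against a predicate that raises on some title; keeping the predicate total is left to whoever writes it.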


Each channel could have a set of commands, where each command is either a bare name or a name-value pair. The include commands would be executed first, then the exclude commands.

include_video: yt_video_id
include_playlist: yt_playlist_id
include_all_playlists
include_videos_by_regex: regex
include_playlists_by_regex: regex
include_videos_by_predicate: predicate
include_playlists_by_predicate: predicate
include_all_videos
exclude_video: yt_video_id
exclude_playlist: yt_playlist_id
exclude_videos_by_regex: regex
exclude_playlists_by_regex: regex
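
A rough sketch of how such a command list might be parsed and applied, with includes always running before excludes. Everything beyond the command names is an assumption made for illustration: the Channel class, parse_command, select_videos, and the sample ids are hypothetical, the "name: value" line syntax is taken literally from the list above, and only a handful of the commands are handled.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Channel:
    # Hypothetical in-memory model: video id -> title, playlist id -> video ids.
    videos: dict[str, str]
    playlists: dict[str, list[str]] = field(default_factory=dict)


def parse_command(line: str) -> tuple[str, Optional[str]]:
    """Split 'name' or 'name: value' into (name, value)."""
    # Split on the first colon only, so a regex value may itself contain colons.
    name, sep, value = line.partition(":")
    return name.strip(), (value.strip() if sep else None)


def select_videos(channel: Channel, lines: list[str]) -> set[str]:
    """Apply include commands first, then exclude commands, whatever their order."""
    commands = [parse_command(line) for line in lines if line.strip()]
    selected: set[str] = set()

    # Pass 1: includes. Commands not listed here are silently ignored in this sketch.
    for name, value in commands:
        if name == "include_all_videos":
            selected |= set(channel.videos)
        elif name == "include_video":
            selected.add(value)
        elif name == "include_playlist":
            selected |= set(channel.playlists[value])
        elif name == "include_videos_by_regex":
            pattern = re.compile(value)
            selected |= {v for v, title in channel.videos.items() if pattern.search(title)}

    # Pass 2: excludes, applied to whatever the includes selected.
    for name, value in commands:
        if name == "exclude_video":
            selected.discard(value)
        elif name == "exclude_playlist":
            selected -= set(channel.playlists[value])

    return selected


# Hypothetical channel that is mostly math, with one promotional video.
channel = Channel(
    videos={"v1": "Intro to limits", "v2": "Buy our merch", "v3": "Chain rule"},
    playlists={"p1": ["v1", "v3"]},
)
# The exclude line comes first on purpose: includes still run before excludes.
result = select_videos(channel, ["exclude_video: v2", "include_all_videos"])
```

Running the includes and excludes as two separate passes means the order of lines in a channel's command list does not matter, which fits the "mostly good channel, minus a few promotional videos" case: include_all_videos plus one exclude_video per unwanted video.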