I've localized the subject in a video with a custom YOLOv4 model, which produces a bounding box around the subject that I use to crop each frame. This localization approach seems incompatible with the current DeepPoseKit example notebooks: creating an annotation set appears to expect fixed-size images, since the frames are stored as a stacked NumPy array, whereas my bounding box changes shape constantly depending on the subject's pose and distance.
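One possible workaround is to crop a fixed-size window centred on each YOLOv4 box instead of cropping to the box itself. Every crop then has the same shape and can be stacked into a single array. This is a minimal NumPy sketch under my own assumptions, not something DeepPoseKit provides. `crop_fixed` is a hypothetical helper. The box format `(x_min, y_min, x_max, y_max)` in pixels is assumed. The window `size` is assumed to be at least as large as the biggest box you expect. Parts of the window that fall outside the frame are zero-padded.

```python
import numpy as np


def crop_fixed(frame, box, size):
    """Hypothetical helper: crop a size x size window centred on a bounding box.

    Assumes box = (x_min, y_min, x_max, y_max) in pixel coordinates.
    Returns the crop and the (x0, y0) offset of its top-left corner in the
    original frame, so keypoints predicted on the crop can be mapped back.
    """
    x_min, y_min, x_max, y_max = box
    cx = int(round((x_min + x_max) / 2))
    cy = int(round((y_min + y_max) / 2))
    half = size // 2
    # Top-left corner of the fixed window (may be negative near frame borders)
    x0, y0 = cx - half, cy - half

    # Zero-filled output keeps the shape constant even at the frame edges
    out = np.zeros((size, size) + frame.shape[2:], dtype=frame.dtype)
    h, w = frame.shape[:2]

    # Intersection of the window with the frame
    fx0, fy0 = max(x0, 0), max(y0, 0)
    fx1, fy1 = min(x0 + size, w), min(y0 + size, h)
    if fx0 < fx1 and fy0 < fy1:
        out[fy0 - y0:fy1 - y0, fx0 - x0:fx1 - x0] = frame[fy0:fy1, fx0:fx1]
    return out, (x0, y0)


# Example: three frames with boxes of different sizes, one touching each corner
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(3, 480, 640, 3), dtype=np.uint8)
boxes = [(100, 120, 180, 260), (0, 0, 50, 40), (600, 400, 640, 480)]
crops = np.stack([crop_fixed(f, b, 256)[0] for f, b in zip(frames, boxes)])
print(crops.shape)  # (3, 256, 256, 3)
```

Keeping the returned `(x0, y0)` offset means a keypoint `(u, v)` annotated or predicted on a crop corresponds to `(u + x0, v + y0)` in the original frame. If the boxes vary too much for a single window size, resizing each crop to a common shape is an alternative. That approach also requires storing the scale factor to map keypoints back.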