Two Faces: How Face Tracking Works
This page provides an in-depth explanation of how the _process_face_tracking_both function operates. It details the process for tracking two faces concurrently, handling the complexity of two moving targets and avoiding common issues like flickering or swapping of face assignments. We'll delve into the logic, algorithms, and nuances that make this function work.
The _process_face_tracking_both function aims to track two distinct faces throughout the video, allowing for a smooth face swap between the two. This requires keeping track of the embeddings, positions, and unique ids for two faces, making decisions on how to handle new faces in each frame, and preventing the tracker from mixing up the identities.
Let's break down the key variables and their roles in this process:
- first_face_embedding (numpy.ndarray, Optional):
  - The face embedding of the first face being tracked. This is a high-dimensional vector that represents the face's features.
  - Initialized to None, assigned to the first matching face, and then updated each frame using a weighted average.
- second_face_embedding (numpy.ndarray, Optional):
  - The face embedding of the second face being tracked.
  - Initialized to None, assigned to the first matching face, and then updated each frame using a weighted average.
- first_face_position (Tuple[float, float], Optional):
  - The (x, y) coordinates of the center of the first face being tracked.
  - Used to calculate position consistency with previous frames.
  - Initialized to None, assigned to the first matching face, and then updated each frame using a weighted average.
- second_face_position (Tuple[float, float], Optional):
  - The (x, y) coordinates of the center of the second face being tracked.
  - Used to calculate position consistency with previous frames.
  - Initialized to None, assigned to the first matching face, and then updated each frame using a weighted average.
- first_face_id (int, Optional):
  - The unique identification number of the first face in each frame.
  - Initialized to None, assigned to the first matching face.
- second_face_id (int, Optional):
  - The unique identification number of the second face in each frame.
  - Initialized to None, assigned to the first matching face.
- first_face_position_history (deque):
  - A double-ended queue storing the last 30 positions of the first face.
  - Used to calculate an average position, which is useful for predicting where the face should be when it is not detected in the current frame.
  - Limited to a size of 30, so it acts as a short memory of where the face was.
- second_face_position_history (deque):
  - A double-ended queue storing the last 30 positions of the second face.
  - Used to calculate an average position, which is useful for predicting where the face should be when it is not detected in the current frame.
  - Limited to a size of 30, so it acts as a short memory of where the face was.
- target_face (Face):
  - An object containing the information about a detected face in the current frame.
  - This includes properties like the bounding box, landmarks, and embedding.
  - The algorithm uses it to decide whether this face is the first or the second face that we were previously tracking.
- source_face (List[Face]):
  - A list containing the Face object for each source face.
  - These are the faces we use to replace the tracked target faces.
- source_index (int):
  - An integer determining which index of source_face to use for this target_face.
  - Used for identifying which of our two faces is going to be swapped.
- source_face_order (List[int]):
  - A list containing [0, 1], which is the order in which the source faces are used. If flipping faces is enabled, this list changes to [1, 0].
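As a minimal sketch of how a bounded position history behaves, the snippet below uses collections.deque with maxlen=30 and averages the stored points. The helper name average_position is an illustrative assumption and does not come from the project; only the queue name and the size of 30 are taken from the description above.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Position = Tuple[float, float]

# Bounded history: once 30 positions are stored, appending drops the oldest one.
first_face_position_history: Deque[Position] = deque(maxlen=30)


def average_position(history: Deque[Position]) -> Optional[Position]:
    """Hypothetical helper: mean (x, y) of the stored positions, or None if empty."""
    if not history:
        return None
    count = len(history)
    mean_x = sum(p[0] for p in history) / count
    mean_y = sum(p[1] for p in history) / count
    return (mean_x, mean_y)


# Simulate 40 frames of a face drifting to the right at constant height.
for frame in range(40):
    first_face_position_history.append((float(frame), 100.0))

print(len(first_face_position_history))               # 30 (frames 10..39 remain)
print(average_position(first_face_position_history))  # (24.5, 100.0)
```

Because the queue is capped, the average follows recent motion only, so a face that has moved across the frame is not pulled back toward where it was many frames ago.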
The _process_face_tracking_both function handles two-face tracking through the following steps:
- Initialization of Variables:
  - We use globals() to check whether the face position history queues already exist. If they do not, we create them and add them to globals(), so the histories persist and are shared between function calls.
  - The function extracts the embedding (target_embedding), position (target_position), and unique id (face_id) of the target_face.
  - use_pseudo_face is a boolean that records whether we need to use a pseudo (fake) face.
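The persistence approach in the Initialization of Variables step can be sketched roughly as follows. This is an illustration under assumptions: get_history and track_frame are hypothetical names, and the real function may create its queues differently.

```python
from collections import deque


def get_history(name: str, maxlen: int = 30) -> deque:
    """Hypothetical helper: return the module-level deque `name`, creating it on first use."""
    if name not in globals():
        # Storing the queue in globals() keeps it alive after the function returns.
        globals()[name] = deque(maxlen=maxlen)
    return globals()[name]


def track_frame(position):
    """Append one position per call and report how many positions are remembered."""
    history = get_history("first_face_position_history")
    history.append(position)
    return len(history)


track_frame((10.0, 20.0))
print(track_frame((12.0, 21.0)))  # 2: the first call's position was kept
```

A module-level variable or a small tracker class would achieve the same persistence more explicitly; the globals() lookup is shown only because it is the mechanism the page describes.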
- Data Structure for Tracked Faces:
  - A dictionary, tracked_faces, is created to store data for both tracked faces in one structure. This makes it easier to loop through each tracked face.

        tracked_faces = {
            0: {
                "embedding": first_face_embedding,
                "position": first_face_position,
                "id": first_face_id,
                "history": first_face_position_history
            },
            1: {
                "embedding": second_face_embedding,
                "position": second_face_position,
                "id": second_face_id,
                "history": second_face_position_history
            },
        }
- Check If All Faces Are Initialized:
  - The code checks whether both first_face_embedding and second_face_embedding are not None, which implies that we are already tracking two faces.
  - If both faces are already being tracked:
    - We initialize best_match_score and best_match_index to -1; they record the best face match found so far.
    - We loop through each tracked face in the tracked_faces dictionary.
    - For each tracked face, we do the following:
      - The embedding, position, and history of the current face are extracted.
      - Similarity: The cosine similarity between the tracked face's embedding and the target_embedding is calculated. The closer this value is to 1.0, the more similar the two faces are.
      - Position Consistency: A score is determined based on the inverse of the distance between the target_position and the average of the position history.
      - Total Match Score: A weighted score of the similarity and position_consistency is calculated using weights from modules.globals.embedding_weight_size and modules.globals.position_size. The code below shows the total score calculation.

            TOTAL_WEIGHT = EMBEDDING_WEIGHT * modules.globals.weight_distribution_size + POSITION_WEIGHT
            score = ((EMBEDDING_WEIGHT * similarity + POSITION_WEIGHT * position_consistency) / TOTAL_WEIGHT)

      - Stickiness: If the unique id of the detected face is the same as the current tracked face's id, we multiply the score by (1 + STICKINESS_FACTOR). This helps to avoid flickering between multiple faces.
      - If the calculated score is greater than the best_match_score, we remember the index of the best match and its score.
    - If the best score is higher than modules.globals.sticky_face_value:
      - We get the data of the tracked face from the tracked_faces dictionary using the best_match_index.
      - We update the embedding of the tracked face with a weighted average of the current embedding and the new target_embedding, using weights from modules.globals.old_embedding_weight and modules.globals.new_embedding_weight.
      - We update the position of the tracked face with a weighted average of the current position and the new target_position, using a static weight of 0.8 for the old position and 0.2 for the new position.
      - The tracked face's id is also updated.
      - The new position is appended to the face's history.
      - We also update the score in a global variable for the face that was tracked, for display in the UI.
      - The corresponding index from source_face_order is determined and stored in source_index.
    - Else if the current best score is below modules.globals.pseudo_face_threshold and modules.globals.use_pseudo_face is true:
      - We set use_pseudo_face to true.
      - If best_match_index is 0, we create a pseudo face using the average of first_face_position_history or, if it is empty, first_face_position.
      - Otherwise, if best_match_index is 1, we create a pseudo face using the average of second_face_position_history or, if it is empty, second_face_position.
      - If the best match was neither 0 nor 1, we use the current target_position.
      - We call _process_face_swap with the pseudo face, source_face, and source_index.
    - Otherwise, if no good match was found or modules.globals.use_pseudo_face is false, we do nothing and return.
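The matching loop in this step can be sketched as below. It is a dependency-free illustration that uses plain lists instead of numpy arrays. The constant values, the exact form of the position consistency score (here assumed to be 1 / (1 + distance)), and the helper names position_consistency and best_match are assumptions rather than the project's actual code.

```python
import math

# Hypothetical values; in the project these come from modules.globals and the UI.
EMBEDDING_WEIGHT = 0.6
POSITION_WEIGHT = 0.4
WEIGHT_DISTRIBUTION_SIZE = 1.0
STICKINESS_FACTOR = 0.2


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def position_consistency(target, history) -> float:
    """Assumed form: 1 / (1 + distance from target to the mean of the history)."""
    if not history:
        return 0.0
    mean_x = sum(p[0] for p in history) / len(history)
    mean_y = sum(p[1] for p in history) / len(history)
    return 1.0 / (1.0 + math.hypot(target[0] - mean_x, target[1] - mean_y))


def best_match(tracked_faces, target_embedding, target_position, face_id):
    """Return (index, score) of the tracked face that best matches the detection."""
    best_match_score, best_match_index = -1.0, -1
    total_weight = EMBEDDING_WEIGHT * WEIGHT_DISTRIBUTION_SIZE + POSITION_WEIGHT
    for index, face in tracked_faces.items():
        similarity = cosine_similarity(face["embedding"], target_embedding)
        consistency = position_consistency(target_position, face["history"])
        score = (EMBEDDING_WEIGHT * similarity
                 + POSITION_WEIGHT * consistency) / total_weight
        # Stickiness: favor the tracked face that already carries this id.
        if face["id"] is not None and face["id"] == face_id:
            score *= 1 + STICKINESS_FACTOR
        if score > best_match_score:
            best_match_score, best_match_index = score, index
    return best_match_index, best_match_score


tracked_faces = {
    0: {"embedding": [1.0, 0.0], "id": 7, "history": [(100.0, 100.0)]},
    1: {"embedding": [0.0, 1.0], "id": 9, "history": [(400.0, 120.0)]},
}
index, score = best_match(tracked_faces, [0.9, 0.1], (104.0, 103.0), face_id=7)
print(index)  # 0
```

In this example the detection is close to tracked face 0 in both embedding and position, and it shares the same id, so face 0 wins by a wide margin. The caller would then compare the returned score against the thresholds described above.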
- Initialization of One or Both Faces:
  - If the above check fails (meaning not both faces are being tracked), then at least one of the faces needs to be initialized.
  - The code extracts the correct source_index from source_face_order.
  - If the extracted source_index is 0:
    - The target_embedding is assigned to first_face_embedding.
    - The target_position is assigned to first_face_position.
    - The face_id is assigned to first_face_id.
    - The target_position is also appended to the first_face_position_history.
  - Else (the source_index is 1):
    - The target_embedding is assigned to second_face_embedding.
    - The target_position is assigned to second_face_position.
    - The face_id is assigned to second_face_id.
    - The target_position is also appended to the second_face_position_history.
- Pseudo Face Call
  - If use_pseudo_face was set to true during the "Check If All Faces Are Initialized" step, we need to call _process_face_swap with a pseudo face:
    - If the current source_index is 0, a pseudo face is created using the first_face_position_history or, if that is empty, the current first_face_position.
    - Otherwise, a pseudo face is created using the second_face_position_history or, if that is empty, the current second_face_position.
    - We then return the result of _process_face_swap with the pseudo face.
- Face Swap Call
  - Otherwise, if no pseudo face is used, we call _process_face_swap with the target_face, source_face, and the source_index and return.
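The position fallback used when building a pseudo face can be sketched as follows. Only the choice of position is shown, since the page does not describe how the pseudo Face object itself is constructed. The function name pseudo_face_position is a hypothetical stand-in.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[float, float]


def pseudo_face_position(best_match_index: int,
                         tracked_faces: Dict[int, dict],
                         target_position: Position) -> Optional[Position]:
    """Hypothetical helper: pick where a pseudo face should be placed.

    Order of preference: average of the matched face's position history,
    then its last known position, then the current target position when
    the index matches neither tracked face.
    """
    face = tracked_faces.get(best_match_index)
    if face is None:
        return target_position
    history = face["history"]
    if history:
        count = len(history)
        return (sum(p[0] for p in history) / count,
                sum(p[1] for p in history) / count)
    return face["position"]


tracked_faces = {
    0: {"history": [(0.0, 0.0), (10.0, 20.0)], "position": (9.0, 9.0)},
    1: {"history": [], "position": (50.0, 60.0)},
}
print(pseudo_face_position(0, tracked_faces, (1.0, 1.0)))   # (5.0, 10.0)
print(pseudo_face_position(1, tracked_faces, (1.0, 1.0)))   # (50.0, 60.0)
print(pseudo_face_position(-1, tracked_faces, (1.0, 1.0)))  # (1.0, 1.0)
```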
- Simultaneous Tracking: This function is optimized for tracking two faces at once without getting them mixed up.
- Weighted Averaging: Weighted averaging of the embeddings and positions ensures a smoother update, reducing abrupt changes and flickering and improving the stability of the tracking.
- History-Based Position Tracking: Using a history of previous positions makes it less likely for the algorithm to lose track of the face, even if the face moves quickly or briefly disappears.
- Stickiness Factor: This helps maintain continuity in face tracking. It is more likely to track the same face between frames than to switch to a different face.
- Pseudo Faces: Pseudo faces can be used if the face is occluded or if the face turns away from the camera.
- Modular Design: The function builds upon helper functions such as extract_face_embedding, get_face_center, _process_face_swap, and cosine_similarity. This promotes code reuse and readability.
- Global Module: The use of global values in modules.globals allows many of these variables, such as thresholds, weights, and distances, to be configured on the fly from the UI without modifying the code.
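To illustrate the weighted averaging mentioned above, here is a minimal sketch of the per-frame update. The 0.8 and 0.2 position weights come from the description on this page; the embedding weights shown are placeholder values standing in for modules.globals.old_embedding_weight and modules.globals.new_embedding_weight, and the blend helper is a hypothetical name.

```python
def blend(old, new, old_weight, new_weight):
    """Element-wise weighted average of two equal-length sequences."""
    return [old_weight * o + new_weight * n for o, n in zip(old, new)]


# Placeholder values; the real ones are configured in modules.globals.
OLD_EMBEDDING_WEIGHT = 0.9
NEW_EMBEDDING_WEIGHT = 0.1

tracked_embedding = [1.0, 0.0]
target_embedding = [0.0, 1.0]
tracked_embedding = blend(tracked_embedding, target_embedding,
                          OLD_EMBEDDING_WEIGHT, NEW_EMBEDDING_WEIGHT)

tracked_position = (100.0, 100.0)
target_position = (110.0, 90.0)
# Static weights from the description: 0.8 for the old position, 0.2 for the new.
tracked_position = tuple(blend(tracked_position, target_position, 0.8, 0.2))

print(tracked_embedding)
print(tracked_position)
```

Giving most of the weight to the old value means a single noisy detection moves the tracked state only slightly, which is what damps flicker. The trade-off is that the tracker reacts more slowly to genuine changes.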
- Performance: This function may require more resources compared to the single-face tracking because of the extra computations for each face being tracked.
- Robustness: While this algorithm is robust, it can still have some issues in cases of rapid movements, extreme angles, or significant occlusions of the face.
- Settings: The specific values used for constants like STICKINESS_FACTOR, embedding_weight_size, position_size, and pseudo_face_threshold all have an impact on the overall performance of the tracker and can be changed on the fly from the UI.
The _process_face_tracking_both function tracks two faces in video by combining embeddings, positions, weighted averages, position history, and a stickiness factor to keep the two identities stable. Understanding how these pieces interact gives insight into the challenges and techniques involved in face swapping with multiple targets.