Deep Dive

Face Matching in 5 Lines of Python: Detect, Encode, Compare

How face matching works under the hood and how to implement it in 5 lines of Python. Covers face detection, embedding vectors, and similarity comparison with real test results.

Two different photos of the same person compared side by side with a green MATCH badge between them

This tutorial uses the Face Analyzer API. See the docs, live demo, and pricing.

Your fintech app needs KYC. A user uploads a selfie and a photo of their ID. Your backend needs to answer one question: is this the same person? Between the shutter click and the green checkmark, three things happen.

This article breaks down those three steps, compares the models used at each stage, and shows how to implement the whole pipeline in 5 lines of Python.

The 3 Steps of Face Matching

Every face matching system, from Apple Face ID to airport e-gates, follows the same pipeline:

Three steps of face matching: 1. Detect face with bounding box, 2. Generate embedding vector, 3. Compare two vectors for similarity

Step 1: Detect

Find the face in the image. The detector outputs a bounding box (x, y, width, height) around each face. This crops out the background, hair, and clothing so the next step only sees the face.

| Detector   | Accuracy (WIDER FACE) | Speed             | Best for                       |
|------------|-----------------------|-------------------|--------------------------------|
| RetinaFace | 96.9% (easy)          | ~3.8 s/face (CPU) | Max accuracy, batch processing |
| MTCNN      | 84.8% (easy)          | ~0.4 s/face       | Good balance, widely used      |
| MediaPipe  | Lower                 | ~0.04 s/face      | Real-time, mobile, edge        |
| YuNet      | Good                  | ~0.03 s/face      | Fastest, real-time on CPU      |

The tradeoff is always accuracy vs speed. RetinaFace catches faces that others miss (small, occluded, angled), but it is 100x slower than MediaPipe. For KYC where the user takes a clear selfie, any detector works. For surveillance with crowded scenes, RetinaFace is the only reliable choice.

Step 2: Encode

Convert the cropped face into a numerical vector called an embedding. A neural network takes the face pixels and outputs a list of numbers, typically 128 or 512 values.

text
Photo of Alice → [0.23, -0.45, 0.78, 0.12, -0.33, ...]  (128 numbers)
Photo of Alice → [0.21, -0.47, 0.76, 0.14, -0.31, ...]  (similar!)
Photo of Bob   → [-0.67, 0.89, -0.12, 0.55, 0.44, ...]  (very different)

The key property: two photos of the same person produce similar vectors. Two photos of different people produce distant vectors. The model learns this during training on millions of face pairs.
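This property can be checked directly with cosine similarity. The snippet below reuses only the five visible values of each example vector above as toy data; real embeddings have 128 or 512 dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

alice_1 = [0.23, -0.45, 0.78, 0.12, -0.33]
alice_2 = [0.21, -0.47, 0.76, 0.14, -0.31]
bob     = [-0.67, 0.89, -0.12, 0.55, 0.44]

print(cosine_similarity(alice_1, alice_2))  # close to 1.0 (same person)
print(cosine_similarity(alice_1, bob))      # negative (different person)
```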

| Model                  | LFW Accuracy | Embedding Size | Notes                                      |
|------------------------|--------------|----------------|--------------------------------------------|
| FaceNet (Google, 2015) | 99.63%       | 128            | Introduced triplet loss. Still competitive |
| ArcFace (2018)         | 99.83%       | 512            | Current standard. Best on hard benchmarks  |
| SFace (2021)           | 99.60%       | 128            | Better on low-res and surveillance images  |

LFW (Labeled Faces in the Wild) is the standard benchmark. All modern models score above 99%, which means the difference in practice comes down to how well they handle edge cases: bad lighting, extreme angles, low resolution.

Step 3: Compare

Measure the distance between two embedding vectors. If the distance is below a threshold, the faces match.

| Metric             | How it works                            | Match threshold | Used by                 |
|--------------------|-----------------------------------------|-----------------|-------------------------|
| Cosine similarity  | Angle between vectors. 1.0 = identical  | > 0.6           | ArcFace, most APIs      |
| Euclidean distance | Straight-line distance. Lower = closer  | < 1.1           | FaceNet (original paper)|

The threshold is the critical parameter. With a distance metric, set it too low and you get false rejections (same person not matched); too high and you get false accepts (different people matched). With a similarity metric the direction flips: a higher threshold is stricter. In production, most systems use cosine similarity with a threshold around 0.6, tuned based on their tolerance for each error type.
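The decision rule itself is a single comparison. A sketch using the Euclidean threshold from the table (1.1, from the original FaceNet paper) on the toy 5-value vectors from the encoding example; real systems tune this value on their own data:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_match(emb_a, emb_b, threshold=1.1):
    """Match if the embeddings are closer than the threshold."""
    return euclidean_distance(emb_a, emb_b) < threshold

alice_1 = [0.23, -0.45, 0.78, 0.12, -0.33]
alice_2 = [0.21, -0.47, 0.76, 0.14, -0.31]
bob     = [-0.67, 0.89, -0.12, 0.55, 0.44]

print(is_match(alice_1, alice_2))  # True  (distance ~0.04)
print(is_match(alice_1, bob))      # False (distance ~2.05)
```

Lowering the threshold trades false accepts for false rejections, which is exactly the tuning decision described above.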

That is the entire pipeline. Detect, encode, compare. Everything else (liveness detection, anti-spoofing, lighting normalization) is built on top of these three steps.

The Code (5 Lines)

The Face Analyzer API runs detect + encode + compare on cloud GPUs. You send two images, it returns match or no match.

python
import requests

response = requests.post(
    "https://faceanalyzer-ai.p.rapidapi.com/compare-faces",
    headers={"x-rapidapi-key": "YOUR_API_KEY", "x-rapidapi-host": "faceanalyzer-ai.p.rapidapi.com"},
    files={"source_image": open("selfie.jpg", "rb"), "target_image": open("id_photo.jpg", "rb")},
)
print(response.json()["body"])

The response tells you if the faces match:

json
{
  "matchedFaces": [
    {
      "boundingBox": {"topLeft": {"x": 0.47, "y": 0.27}, "bottomRight": {"x": 0.89, "y": 0.64}},
      "landmarks": {"eyeLeft": {...}, "eyeRight": {...}, "mouth": {...}}
    }
  ],
  "unmatchedFaces": []
}

matchedFaces is not empty = same person. unmatchedFaces lists faces in the target image that did not match the source face.
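In application code, that check reduces to inspecting the two arrays. A small helper (the function name here is my own, not part of the API):

```python
def same_person(body):
    """True if the API matched the source face to a face in the target image."""
    return len(body.get("matchedFaces", [])) > 0

# Example response shape from the article:
body = {"matchedFaces": [{"boundingBox": {}}], "unmatchedFaces": []}
print(same_person(body))  # True
```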

Real Test: Same Person vs Different Person

We tested with real photos to verify the results.

Test 1: Same person, two different photos

Two different photos of the same person compared with face matching API showing MATCH result
python
# URL and HEADERS are the endpoint and auth headers from the snippet above
files = {
    "source_image": open("photo_1.jpg", "rb"),
    "target_image": open("photo_2.jpg", "rb"),
}
result = requests.post(URL, headers=HEADERS, files=files).json()["body"]
print(f"Matched: {len(result['matchedFaces'])}")      # 1
print(f"Unmatched: {len(result['unmatchedFaces'])}")  # 0

API response:

json
{
  "matchedFaces": [
    {
      "boundingBox": {"topLeft": {"x": 0.31, "y": 0.12}, "bottomRight": {"x": 0.72, "y": 0.88}},
      "landmarks": {"eyeLeft": {"center": {"x": 0.43, "y": 0.41}}, "eyeRight": {"center": {"x": 0.62, "y": 0.42}}}
    }
  ],
  "unmatchedFaces": []
}

Result: matched. Different photo, different angle, different lighting. matchedFaces contains the face with its bounding box and landmarks. unmatchedFaces is empty.

Test 2: Two different people

Two photos of different people compared with face matching API showing NO MATCH result
python
# URL and HEADERS are the endpoint and auth headers from the snippet above
files = {
    "source_image": open("person_a.jpg", "rb"),
    "target_image": open("person_b.jpg", "rb"),
}
result = requests.post(URL, headers=HEADERS, files=files).json()["body"]
print(f"Matched: {len(result['matchedFaces'])}")      # 0
print(f"Unmatched: {len(result['unmatchedFaces'])}")  # 1

API response:

json
{
  "matchedFaces": [],
  "unmatchedFaces": [
    {
      "boundingBox": {"topLeft": {"x": 0.28, "y": 0.10}, "bottomRight": {"x": 0.75, "y": 0.92}},
      "landmarks": {"eyeLeft": {"center": {"x": 0.40, "y": 0.42}}, "eyeRight": {"center": {"x": 0.60, "y": 0.43}}}
    }
  ]
}

Result: not matched. matchedFaces is empty. The target face appears in unmatchedFaces because it does not match the source.

Where Developers Use This

  • KYC onboarding: compare a selfie to the photo on an uploaded ID document
  • Access control: verify an employee against a face database before granting entry
  • Duplicate detection: block users who create multiple accounts with the same face
  • Photo apps: automatically group photos by person using the face repository endpoints
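Duplicate detection, for instance, is just a loop over stored reference photos, calling the same compare endpoint for each. A sketch: `compare_faces` stands in for the HTTP call shown earlier, and all names here (the functions, the user records) are illustrative, not part of the API:

```python
def compare_faces(new_selfie, stored_photo):
    """Stand-in for the POST /compare-faces call shown earlier.
    Returns True when the API reports a non-empty matchedFaces array."""
    # In real code: requests.post(URL, headers=HEADERS, files={...})
    raise NotImplementedError

def find_duplicate(new_selfie, existing_users, compare=compare_faces):
    """Return the first existing user whose stored photo matches the new selfie."""
    for user in existing_users:
        if compare(new_selfie, user["photo"]):
            return user
    return None
```

Injecting `compare` as a parameter keeps the matching logic testable without network calls.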

Frequently Asked Questions

How does face matching work?
Face matching works in three steps. First, a face detector locates the face in the image and draws a bounding box around it. Second, a neural network converts the cropped face into a numerical vector called an embedding (typically 128 or 512 numbers). Third, the system compares two embeddings using cosine similarity or Euclidean distance. If the distance is below a threshold, the faces match.
What is a face embedding?
A face embedding is a numerical vector (a list of numbers like [0.23, -0.45, 0.78, ...]) that represents a face. Two photos of the same person produce similar vectors, while photos of different people produce distant vectors. Models like FaceNet and ArcFace are trained to produce these embeddings.
Can I do face matching without installing dlib or a GPU?
Yes. A Face Comparison API runs the detection and embedding models on cloud GPUs. You send two images via HTTP and get back a match or no-match result. No local model, no dlib, no GPU needed. The free tier includes 30 requests per month.

Ready to Try Face Analyzer?

Check out the full API documentation, live demos, and code samples on the Face Analyzer spotlight page.

Related Articles

Continue learning with these related guides and tutorials.