How does face matching work?

Face matching works in three steps. First, a face detector locates the face in the image and draws a bounding box around it. Second, a neural network converts the cropped face into a numerical vector called an embedding (typically 128 or 512 numbers). Third, the system compares two embeddings using cosine similarity or Euclidean distance. If the distance is below a threshold, the faces match.

What is a face embedding?

A face embedding is a numerical vector (a list of numbers like [0.23, -0.45, 0.78, ...]) that represents a face. Two photos of the same person produce similar vectors, while photos of different people produce distant vectors. Models like FaceNet and ArcFace are trained to produce these embeddings.

Can I do face matching without installing dlib or a GPU?

Yes. A Face Comparison API runs the detection and embedding models on cloud GPUs. You send two images via HTTP and get back a match or no-match result. No local model, no dlib, no GPU needed. The free tier includes 30 requests per month.

Face Matching in 5 Lines of Python: Detect, Encode, Compare

This tutorial uses the Face Analyzer API. See the docs, live demo, and pricing.

Your fintech app needs KYC. A user uploads a selfie and a photo of their ID. Your backend needs to answer one question: is this the same person? Between the shutter click and the green checkmark, three things happen.

This article breaks down those three steps, compares the models used at each stage, and shows how to implement the whole pipeline in 5 lines of Python.

The 3 Steps of Face Matching

Every face matching system, from Apple Face ID to airport e-gates, follows the same pipeline:

Three steps of face matching: 1. Detect face with bounding box, 2. Generate embedding vector, 3. Compare two vectors for similarity

Step 1: Detect

Find the face in the image. The detector outputs a bounding box (x, y, width, height) around each face. This crops out the background, hair, and clothing so the next step only sees the face.

Detector	Accuracy (WIDER FACE)	Speed	Best for
RetinaFace	96.9% (easy)	~3.8s/face (CPU)	Max accuracy, batch processing
MTCNN	84.8% (easy)	~0.4s/face	Good balance, widely used
MediaPipe	Lower	~0.04s/face	Real-time, mobile, edge
YuNet	Good	~0.03s/face	Fastest, real-time on CPU

The tradeoff is always accuracy vs speed. RetinaFace catches faces that others miss (small, occluded, angled), but it is 100x slower than MediaPipe. For KYC where the user takes a clear selfie, any detector works. For surveillance with crowded scenes, RetinaFace is the only reliable choice.

Step 2: Encode

Convert the cropped face into a numerical vector called an embedding. A neural network takes the face pixels and outputs a list of numbers, typically 128 or 512 values.

text

Photo of Alice → [0.23, -0.45, 0.78, 0.12, -0.33, ...]  (128 numbers)
Photo of Alice → [0.21, -0.47, 0.76, 0.14, -0.31, ...]  (similar!)
Photo of Bob   → [-0.67, 0.89, -0.12, 0.55, 0.44, ...]  (very different)

The key property: two photos of the same person produce similar vectors. Two photos of different people produce distant vectors. The model learns this during training on millions of face pairs.

Model	LFW Accuracy	Embedding Size	Notes
FaceNet (Google, 2015)	99.63%	128	Introduced triplet loss. Still competitive
ArcFace (2018)	99.83%	512	Current standard. Best on hard benchmarks
SFace (2021)	99.60%	128	Better on low-res and surveillance images

LFW (Labeled Faces in the Wild) is the standard benchmark. All modern models score above 99%, which means the difference in practice comes down to how well they handle edge cases: bad lighting, extreme angles, low resolution.

Step 3: Compare

Measure the distance between two embedding vectors. If the distance is below a threshold, the faces match.

Metric	How it works	Match threshold	Used by
Cosine similarity	Angle between vectors. 1.0 = identical	> 0.6	ArcFace, most APIs
Euclidean distance	Straight-line distance. Lower = closer	< 1.1	FaceNet (original paper)

The threshold is the critical parameter. Too low and you get false rejections (same person not matched). Too high and you get false accepts (different people matched). In production, most systems use cosine similarity with a threshold around 0.6, tuned based on their tolerance for each error type.

That is the entire pipeline. Detect, encode, compare. Everything else (liveness detection, anti-spoofing, lighting normalization) is built on top of these three steps.

The Code (5 Lines)

The Face Analyzer API runs detect + encode + compare on cloud GPUs. You send two images, it returns match or no match.

python

import requests

response = requests.post(
    "https://faceanalyzer-ai.p.rapidapi.com/compare-faces",
    headers={"x-rapidapi-key": "YOUR_API_KEY", "x-rapidapi-host": "faceanalyzer-ai.p.rapidapi.com"},
    files={"source_image": open("selfie.jpg", "rb"), "target_image": open("id_photo.jpg", "rb")},
)
print(response.json()["body"])

The response tells you if the faces match:

json

{
  "matchedFaces": [
    {
      "boundingBox": {"topLeft": {"x": 0.47, "y": 0.27}, "bottomRight": {"x": 0.89, "y": 0.64}},
      "landmarks": {"eyeLeft": {...}, "eyeRight": {...}, "mouth": {...}}
    }
  ],
  "unmatchedFaces": []
}

matchedFaces is not empty = same person. unmatchedFaces lists faces in the target image that did not match the source face.

Real Test: Same Person vs Different Person

We tested with real photos to verify the results.

Test 1: Same person, two different photos

Two different photos of the same person compared with face matching API showing MATCH result

python

files = {
    "source_image": open("photo_1.jpg", "rb"),
    "target_image": open("photo_2.jpg", "rb"),
}
result = requests.post(URL, headers=HEADERS, files=files).json()["body"]
print(f"Matched: {len(result['matchedFaces'])}")    # 1
print(f"Unmatched: {len(result['unmatchedFaces'])}") # 0

API response:

json

{
  "matchedFaces": [
    {
      "boundingBox": {"topLeft": {"x": 0.31, "y": 0.12}, "bottomRight": {"x": 0.72, "y": 0.88}},
      "landmarks": {"eyeLeft": {"center": {"x": 0.43, "y": 0.41}}, "eyeRight": {"center": {"x": 0.62, "y": 0.42}}}
    }
  ],
  "unmatchedFaces": []
}

Result: matched. Different photo, different angle, different lighting. matchedFaces contains the face with its bounding box and landmarks. unmatchedFaces is empty.

Test 2: Two different people

Two photos of different people compared with face matching API showing NO MATCH result

python

files = {
    "source_image": open("person_a.jpg", "rb"),
    "target_image": open("person_b.jpg", "rb"),
}
result = requests.post(URL, headers=HEADERS, files=files).json()["body"]
print(f"Matched: {len(result['matchedFaces'])}")    # 0
print(f"Unmatched: {len(result['unmatchedFaces'])}") # 1

API response:

json

{
  "matchedFaces": [],
  "unmatchedFaces": [
    {
      "boundingBox": {"topLeft": {"x": 0.28, "y": 0.10}, "bottomRight": {"x": 0.75, "y": 0.92}},
      "landmarks": {"eyeLeft": {"center": {"x": 0.40, "y": 0.42}}, "eyeRight": {"center": {"x": 0.60, "y": 0.43}}}
    }
  ]
}

Result: not matched. matchedFaces is empty. The target face appears in unmatchedFaces because it does not match the source.

Where Developers Use This

KYC onboarding: compare a selfie to the photo on an uploaded ID document
Access control: verify an employee against a face database before granting entry
Duplicate detection: block users who create multiple accounts with the same face
Photo apps: automatically group photos by person using the face repository endpoints

Sources

FaceNet: A Unified Embedding for Face Recognition and Clustering : the Google paper (2015) that introduced triplet loss for face embeddings
ArcFace: Additive Angular Margin Loss for Deep Face Recognition : the current standard loss function for face embedding models (2018)
NIST Face Recognition Vendor Test (FRVT) : the benchmark used to evaluate face recognition systems worldwide
LearnOpenCV: Face Detection Model Comparison : benchmark of face detectors (RetinaFace, MTCNN, MediaPipe, YuNet) on WIDER FACE dataset
RetinaFace and MTCNN Performance Comparison : WIDER FACE accuracy benchmarks (96.9% vs 84.8% on easy set)
IEEE: Comparison of ArcFace, FaceNet and FaceNet512 on DeepFace Framework : LFW accuracy benchmarks for embedding models (99.83% ArcFace, 99.63% FaceNet)

Frequently Asked Questions

How does face matching work?: Face matching works in three steps. First, a face detector locates the face in the image and draws a bounding box around it. Second, a neural network converts the cropped face into a numerical vector called an embedding (typically 128 or 512 numbers). Third, the system compares two embeddings using cosine similarity or Euclidean distance. If the distance is below a threshold, the faces match.
What is a face embedding?: A face embedding is a numerical vector (a list of numbers like [0.23, -0.45, 0.78, ...]) that represents a face. Two photos of the same person produce similar vectors, while photos of different people produce distant vectors. Models like FaceNet and ArcFace are trained to produce these embeddings.
Can I do face matching without installing dlib or a GPU?: Yes. A Face Comparison API runs the detection and embedding models on cloud GPUs. You send two images via HTTP and get back a match or no-match result. No local model, no dlib, no GPU needed. The free tier includes 30 requests per month.

Ready to Try Face Analyzer?

Check out the full API documentation, live demos, and code samples on the Face Analyzer spotlight page.

Face Matching in 5 Lines of Python: Detect, Encode, Compare

The 3 Steps of Face Matching

Step 1: Detect

Step 2: Encode

Step 3: Compare

The Code (5 Lines)

Real Test: Same Person vs Different Person

Test 1: Same person, two different photos

Test 2: Two different people

Where Developers Use This

Sources

Frequently Asked Questions

Ready to Try Face Analyzer?

Related Articles

Build Face-Based Access Control with a Comparison API

RF-DETR vs YOLO vs Cloud API: Which Should You Actually Use in 2026?

Content Moderation for Marketplace Seller Listings