
Computer Vision and Deep Learning: Behind the Scenes of SCARFACE

Wapiki Team
January 10, 2026
11 min read
Computer Vision · TensorFlow · Deep Learning · OpenCV · GPU

The SCARFACE Challenge

SCARFACE must analyze 20+ simultaneous video streams in real time:

  • Face detection (<100ms)
  • Facial recognition (<200ms)
  • Document OCR (ID cards, passports)
  • Anomaly detection
  • All with 99%+ accuracy.

    Computer Vision Pipeline

    1. Face Detection (MTCNN)

    We use MTCNN (Multi-task Cascaded Convolutional Networks) for detection:

    python
    import cv2
    from mtcnn import MTCNN
    
    detector = MTCNN()
    
    def detect_faces(frame):
        # RGB conversion
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    
        # Detection
        faces = detector.detect_faces(rgb_frame)
    
        return faces

    Optimization: we run full detection only every 5 frames and track faces on the frames in between.
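    The skip-and-track loop can be sketched as follows. The `detect` and `track` callables stand in for MTCNN and a lightweight tracker (e.g. an OpenCV tracker); the function and constant names are our illustration, not the production code:

```python
DETECT_EVERY = 5  # full detector pass on every 5th frame

def process_stream(frames, detect, track):
    """Run full detection periodically; propagate boxes with a cheap tracker otherwise."""
    results = []
    boxes = []
    for i, frame in enumerate(frames):
        if i % DETECT_EVERY == 0:
            boxes = detect(frame)        # expensive: full CNN pass
        else:
            boxes = track(frame, boxes)  # cheap: update previous boxes
        results.append(boxes)
    return results
```

    On a 10-frame window this runs the detector twice (frames 0 and 5) and the tracker eight times, which is where most of the per-stream latency budget is recovered.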

    2. Facial Recognition (FaceNet)

    Once the face is detected, we generate a 128D embedding with FaceNet:

    python
    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model
    
    facenet_model = load_model('facenet_keras.h5')
    
    def get_face_embedding(face_image):
        # Resize 160x160 (required by FaceNet)
        face_pixels = cv2.resize(face_image, (160, 160))
    
        # Normalization
        face_pixels = face_pixels.astype('float32')
        mean, std = face_pixels.mean(), face_pixels.std()
        face_pixels = (face_pixels - mean) / std
    
        # Expand for batch
        samples = np.expand_dims(face_pixels, axis=0)
    
        # Embedding
        embedding = facenet_model.predict(samples)[0]
    
        return embedding

    3. Database Matching

    python
    from scipy.spatial.distance import cosine
    
    def find_match(embedding, database_embeddings, threshold=0.6):
        min_distance = float('inf')
        matched_person = None
    
        for person_id, db_embedding in database_embeddings.items():
            distance = cosine(embedding, db_embedding)
    
            if distance < min_distance:
                min_distance = distance
                matched_person = person_id
    
        if min_distance < threshold:
            return matched_person, 1 - min_distance  # Confidence
    
        return None, 0
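    The linear scan above does O(n) Python-level work per query; with 50,000+ embeddings it pays to stack the database into a single NumPy matrix and compute every cosine distance in one matrix product (function and variable names here are ours, not from the production code):

```python
import numpy as np

def find_match_vectorized(embedding, person_ids, db_matrix, threshold=0.6):
    """db_matrix: (n, 128) array; row i is the embedding of person_ids[i]."""
    # Normalize query and database rows, so cosine distance = 1 - dot product
    q = embedding / np.linalg.norm(embedding)
    db = db_matrix / np.linalg.norm(db_matrix, axis=1, keepdims=True)
    distances = 1.0 - db @ q          # cosine distance for every row at once
    best = int(np.argmin(distances))
    if distances[best] < threshold:
        return person_ids[best], 1.0 - float(distances[best])
    return None, 0.0
```

    The Redis-cached embeddings can be loaded into `db_matrix` once and reused across queries, so each lookup is a single BLAS call instead of 50,000 `scipy` calls.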

    Document OCR

    For ID card analysis, we use Tesseract OCR combined with a custom model:

    python
    import cv2
    import pytesseract
    import re
    from PIL import Image
    
    def extract_id_info(id_card_image):
        # Preprocessing
        gray = cv2.cvtColor(id_card_image, cv2.COLOR_BGR2GRAY)
        denoised = cv2.fastNlMeansDenoising(gray)
    
        # OCR
        text = pytesseract.image_to_string(denoised, lang='eng')
    
        # Structured extraction with regex
        patterns = {
            'number': r'No\s*([A-Z0-9]+)',
            'name': r'Name\s*:\s*([A-Z ]+)',      # space, not \s, so the capture stays on one line
            'surname': r'Surname\s*:\s*([A-Z ]+)',
            'birth_date': r'(\d{2}/\d{2}/\d{4})'
        }
    
        extracted = {}
        for key, pattern in patterns.items():
            match = re.search(pattern, text)
            if match:
                extracted[key] = match.group(1).strip()
    
        return extracted
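    Run against a hypothetical OCR output, the regex extraction behaves like this (the sample text and field layout are illustrative; real cards vary by issuer). Note the character class `[A-Z ]` rather than `[A-Z\s]`: allowing arbitrary whitespace would let a greedy capture swallow the following line:

```python
import re

# Illustrative OCR output; real layouts differ by issuer
text = "No AB123456\nSurname : DOE\nName : JOHN\n01/02/1990"

patterns = {
    'number': r'No\s*([A-Z0-9]+)',
    'name': r'Name\s*:\s*([A-Z ]+)',
    'surname': r'Surname\s*:\s*([A-Z ]+)',
    'birth_date': r'(\d{2}/\d{2}/\d{4})',
}

extracted = {key: m.group(1).strip()
             for key, pattern in patterns.items()
             if (m := re.search(pattern, text))}
# extracted == {'number': 'AB123456', 'name': 'JOHN',
#               'surname': 'DOE', 'birth_date': '01/02/1990'}
```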

    GPU Optimizations

    TensorFlow GPU with CUDA

    python
    import tensorflow as tf
    
    # GPU configuration
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        tf.config.experimental.set_memory_growth(gpus[0], True)
    
        # Mixed precision: roughly 2x throughput on Tensor Core GPUs
        policy = tf.keras.mixed_precision.Policy('mixed_float16')
        tf.keras.mixed_precision.set_global_policy(policy)

    Batch Processing

    Instead of processing each face individually, we batch:

    python
    def process_batch(faces, batch_size=32):
        embeddings = []
    
        for i in range(0, len(faces), batch_size):
            batch = faces[i:i+batch_size]
            batch_embeddings = facenet_model.predict(np.array(batch))
            embeddings.extend(batch_embeddings)
    
        return embeddings
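    A quick sanity check of the batching logic, with a stub standing in for FaceNet (the stub and the model parameter are our test scaffolding), shows that the final partial batch is handled naturally by the slice:

```python
import numpy as np

def process_batch(faces, model, batch_size=32):
    """Same batching logic as above, with the model passed in so it can be stubbed."""
    embeddings = []
    for i in range(0, len(faces), batch_size):
        batch = faces[i:i + batch_size]
        embeddings.extend(model.predict(np.array(batch)))
    return embeddings

class StubModel:
    """Maps (n, 160, 160, 3) inputs to (n, 128) embeddings, counting predict calls."""
    def __init__(self):
        self.calls = 0

    def predict(self, x):
        self.calls += 1
        return np.zeros((len(x), 128))
```

    With 70 faces and `batch_size=32`, the model is invoked three times (32 + 32 + 6) instead of 70 times, which is where the GPU throughput gain comes from.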

    System Architecture

    Backend: Spring Boot with gRPC for high-performance communication

    GPU Servers: NVIDIA RTX 3090 (24GB VRAM)

    Database: PostgreSQL for metadata, Redis for embeddings cache

    Queue: RabbitMQ for asynchronous processing
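    Embeddings are small (128 float32 values = 512 bytes), which is what makes the Redis cache practical. A minimal serialization sketch, assuming string keys such as `face:<person_id>` (the key scheme and helper names are our illustration, not the production schema):

```python
import numpy as np

def pack_embedding(embedding: np.ndarray) -> bytes:
    """Serialize a 128-D float32 embedding to 512 bytes for a Redis value."""
    return embedding.astype(np.float32).tobytes()

def unpack_embedding(raw: bytes) -> np.ndarray:
    """Inverse of pack_embedding: 512 bytes back to a (128,) float32 array."""
    return np.frombuffer(raw, dtype=np.float32)
```

    With `redis-py`, a cached lookup is then `unpack_embedding(r.get(f"face:{person_id}"))`, keeping the recognition hot path off PostgreSQL entirely.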

    Production Results

  • 🎯 **Accuracy**: 99.2%
  • ⚡ **Detection latency**: 85ms
  • ⚡ **Recognition latency**: 180ms
  • 📹 **Simultaneous cameras**: 20+
  • 💾 **Database**: 50,000+ indexed faces
  • 🔍 **False positive rate**: 0.3%

    Conclusion

    Production computer vision requires a combination of performant pre-trained models, aggressive GPU optimizations, and a robust system architecture.


    *A Computer Vision project? [Let's discuss](/contact).*
