Background

SAFR can be used to measure the attention span or gaze duration of people in the camera view.  It does this by reporting when people are looking directly at the camera, or within some angle away from the camera in any direction.  It is assumed that the camera is co-located with some object for which measuring gaze or attention span is of interest, such as digital signage.


SAFR face orientation measures

SAFR has two measures of face orientation - both relative to the line of sight from the person's face to the camera lens.  They are:

  1. Pitch, roll and yaw - SAFR can report the angle that the face deviates from the line of sight to the camera in three different planes:
    • Pitch: Vertical angle above or below the direct line of sight to the camera
    • Yaw: Horizontal angle left or right of the direct line of sight to the camera
    • Roll: Rotational angle clockwise or counterclockwise relative to the camera
  2. Center Pose (a.k.a. Center Pose Quality or CPQ) - A measure of how directly a person is looking at the camera.  Center pose combines pitch, yaw and roll into a single easy-to-use metric ranging from 0 (looking away from the camera) to 1 (looking directly at the camera); see the sketch below.
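
As a rough illustration of how an application might consume these values, the sketch below labels a face's orientation from the signs of yaw and pitch and flags direct gaze using CPQ.  The function and field names are assumptions for the example, not a SAFR API; the 0.7 cutoff is SAFR's default direct-gaze threshold, covered later in this article.

    # Illustrative sketch only; the function and argument names are assumptions,
    # not a SAFR API.
    def describe_face(cpq, yaw, pitch):
        """Turn raw orientation values into a human-readable summary."""
        horizontal = "left" if yaw < 0 else "right"    # negative yaw = turned left
        vertical = "down" if pitch < 0 else "up"       # negative pitch = tilted down
        direct = cpq >= 0.7  # default direct-gaze threshold (~20 degrees off center)
        return f"looking {horizontal}/{vertical}, direct gaze: {direct}"

    print(describe_face(cpq=0.85, yaw=-0.1, pitch=0.05))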


Where to find face orientation

The ability for SAFR to report these values is an advanced feature.  It is available in the following ways:


SAFR Desktop Recognition Details

If you enable "Recognition Details" in the View menu of the SAFR Desktop application, you will see a list of faces with metrics printed for each in the bottom panel.  This information is very useful in troubleshooting and tuning facial recognition.


You can read off the following values from the overlay text:

  • Q - Center pose quality.  Ranges from 0 to 1.  Any value below 0.7 suggests the subject is not looking at the camera.
  • Y - Yaw.  Negative values mean the face is turned to the left; positive values mean it is turned to the right.
  • P - Pitch.  Negative values mean the face is tilted down; positive values mean it is tilted up.
  • R - Roll.  Negative values mean the face is rotated counterclockwise; positive values mean it is rotated clockwise.

Other values are S (Sharpness), C (Contrast) and O (Occlusion); the numbers in the upper right are the face's width x height.


For the mapping of angles to values for center pose, yaw, pitch and roll, see Face Angles.


SAFR Event JSON data

While not exposed in the Events window, the raw event data includes Direct Gaze Duration, a measure of how long the subject was looking at the screen.  Direct gaze detection is controlled by the "Direct gaze duration" setting in Recognition Preferences (recognizer.minimum-center-pose-direct-gaze in VIRGO or the SDK).

The default value of 0.7 corresponds to subjects looking within approximately 20° of the center line in any direction.
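
As an illustration of how this threshold turns per-frame center pose values into a gaze duration, here is a minimal sketch that accumulates the time a subject's CPQ stays at or above the threshold.  The function, its input format (chronological (timestamp_ms, cpq) samples) and the interval-crediting rule are assumptions for the example, not SAFR's internal algorithm.

    # Minimal sketch, not SAFR's internal algorithm: accumulate "direct gaze"
    # time from per-frame center pose quality (CPQ) samples.
    DIRECT_GAZE_THRESHOLD = 0.7  # default minimum-center-pose-direct-gaze

    def direct_gaze_duration_ms(samples, threshold=DIRECT_GAZE_THRESHOLD):
        """samples: list of (timestamp_ms, cpq) tuples in chronological order."""
        total = 0
        for (t0, cpq), (t1, _) in zip(samples, samples[1:]):
            if cpq >= threshold:      # subject counted as looking at the camera
                total += t1 - t0      # credit the whole interval to direct gaze
        return total

    frames = [(0, 0.9), (100, 0.8), (200, 0.5), (300, 0.75)]
    print(direct_gaze_duration_ms(frames))  # -> 200 (only the first two intervals qualify)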


The direct gaze duration is reported in the raw event data, where it can be obtained along with the overall event duration.  Events are exposed through a REST API and delivered in JSON format.  The following shows an event with a selected subset of fields, including direct gaze duration and total event duration:

Sample Event JSON data

    {
        "eventId": "e146244d-ec68-4034-8065-488ab77cd7e3",
        "siteId": "Branch0245",
        "sourceId": "CustomerService",
        "date": 1605926724476,
        "startTime": 1605926719616,
        "endTime": 1605926727819,
        "age": 47.0,
        "gender": "male",
        "maxSentiment": 0.08132152,
        "minSentiment": 0.08132152,
        "avgSentiment": 0.08132152,
        "directGazeDuration": 8203,
        ...
    }

The duration is derived from the start and end times.  Times are in Unix epoch milliseconds.
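
For the sample event above, endTime - startTime = 1605926727819 - 1605926719616 = 8203 ms, which here equals the directGazeDuration, suggesting the subject was looking toward the camera for the entire event.  Below is a hedged sketch of retrieving events and computing these durations; the host and path are placeholders, not the documented SAFR Event API endpoint, so consult the SAFR documentation for the real URL, parameters and authentication.

    # Sketch only: the host and path below are placeholders, not the documented
    # SAFR Event API; see the SAFR documentation for the real endpoint and auth.
    import requests

    EVENT_SERVER = "https://safr.example.com"  # placeholder host

    def event_durations(events):
        """Yield (eventId, total ms in view, direct gaze ms) from SAFR event JSON."""
        for e in events:
            total_ms = e["endTime"] - e["startTime"]  # epoch milliseconds
            gaze_ms = e.get("directGazeDuration", 0)
            yield e["eventId"], total_ms, gaze_ms

    events = requests.get(EVENT_SERVER + "/events").json()  # placeholder path
    for event_id, total_ms, gaze_ms in event_durations(events):
        print(f"{event_id}: {total_ms} ms in view, {gaze_ms} ms of direct gaze")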


The above event has an end time, so the subject has left the view of the camera.  You can also get events in real time if you are interested in taking action while a subject is still in view of the camera.  This is done by configuring SAFR to resend events at configurable intervals using the "Update in-progress event attributes" setting (reporter.update-images in VIRGO or the SDK).


Update events will have the same data, but no end time will be included.  The date attribute will indicate the current time of the event.
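
As a sketch of how a client might handle both kinds of events, the snippet below treats an event without an end time as an in-progress update.  The field handling follows the description above; the function itself is illustrative.

    # Sketch: distinguish in-progress updates from completed events.
    # Field usage follows this article; everything else is illustrative.
    def handle_event(e):
        if e.get("endTime") is None:
            # In-progress update: "date" reflects the current time of the event.
            elapsed_ms = e["date"] - e["startTime"]
            print(f"{e['eventId']}: still in view, {elapsed_ms} ms so far")
        else:
            # Final event: the subject has left the camera view.
            print(f"{e['eventId']}: done, {e['endTime'] - e['startTime']} ms total")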


SDK

SAFR offers two SDKs (see the SAFR SDK Documentation).  Both SDKs (the SAFR SDK and the Embedded SDK) offer frame-by-frame reporting of center pose, yaw, pitch and roll.  With each frame of video, SAFR detects all faces and reports the center pose, yaw, pitch and roll for each face.  The developer is then responsible for accumulating the data in ways that are useful.  For example, an online learning application could keep a running measure of attention over the last 60 seconds and alert a student who is not paying attention, as sketched below.
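
Here is a minimal sketch of that online learning example.  The callback name, the shape of the per-frame face data, and the 50% attention policy are assumptions (the real SDK interfaces differ); the point is the accumulation logic the developer must supply.

    # Sketch of the accumulation logic only; the SDK callback shape is assumed.
    from collections import deque
    import time

    WINDOW_S = 60          # look-back window in seconds
    ATTENTION_MIN = 0.5    # alert if attentive in under half the window (assumed policy)

    samples = deque()      # (timestamp, attentive) pairs for the last WINDOW_S seconds

    def on_frame_faces(faces):
        """Called once per video frame with that frame's detected faces (assumed hook)."""
        now = time.time()
        attentive = any(f["centerPoseQuality"] >= 0.7 for f in faces)
        samples.append((now, attentive))
        while samples and now - samples[0][0] > WINDOW_S:
            samples.popleft()          # drop samples older than the window
        ratio = sum(a for _, a in samples) / len(samples)  # fraction of recent frames
        if ratio < ATTENTION_MIN:
            print("Student does not appear to be paying attention")

    on_frame_faces([{"centerPoseQuality": 0.9}])  # example frame with one attentive face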


Event duration and dwell time

A SAFR event has a start and an end time.  An event is considered started when a subject is first detected by SAFR and ended when SAFR is no longer able to track the subject.  For the entire time SAFR is able to detect a subject, it tracks the subject from frame to frame to keep a continuous record of a single person.  This record is known as an event.


SAFR can track the total duration a subject is in view in two fundamentally different ways - by detecting the face alone or by detecting the entire body.  These are explained below, along with the trade-offs of each.


Face Detection

When SAFR is performing face detection, it will only detect a face when the face is visible to the camera.  With the latest high sensitivity face detector, SAFR can detect a face well beyond a profile pose (i.e., faces are detected when subjects are looking away from the camera and only a sliver of the face is visible).  Given this, SAFR can track subjects for long durations while they are in view of the camera, but there are times when subjects will disappear from view and reappear.  When performing anonymous tracking, this results in two separate individuals being reported.


Face + Body Detection

SAFR is also able to detect both the face and a person's body, and it can combine these into a single entity.  This way SAFR can better track a person as they walk around in view of the camera, even if they are facing entirely away from the camera.


When performing anonymous detection, this means more accurate unique counts are obtained.  When performing recognition, SAFR is able to associate the body and face with a person and thus report full dwell time even when a known person is looking away from the camera.


The downside to using person detection is that it is significantly more resource intensive than face detection alone.  For example, a server capable of handling 60 face recognition cameras may only be able to handle 12 person (face+body) detection cameras, as the rough estimate below illustrates.
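
Using those example figures, here is a quick back-of-envelope estimate of server counts for a deployment.  The per-server figures come from the example above and are illustrative only; actual capacity varies with hardware, resolution and settings.

    # Back-of-envelope capacity estimate using the example figures above.
    import math

    FACE_ONLY_PER_SERVER = 60   # example figure from this article
    FACE_BODY_PER_SERVER = 12   # example figure from this article

    cameras = 100
    print(math.ceil(cameras / FACE_ONLY_PER_SERVER))  # -> 2 servers, face-only
    print(math.ceil(cameras / FACE_BODY_PER_SERVER))  # -> 9 servers, face+body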


Summary

SAFR can provide gaze information per person in a few different ways (the Desktop app, the REST API and the SDKs).  Further, SAFR allows dwell time to be collected based either on visibility of the face alone or on presence of the entire person for the whole time they are in view of the camera.