Below is a DRAFT article based on a single response to a specific question.  More work is needed to finish it; it should be generalized.


Question:

I wish to build a Digital Signage System that performs the following functions:

  • Face Detection / Recognition
  • Person Characterization
  • Event Generation and Reporting
  • Viewer Attention Metrics for our Digital screens


Response:

Some thoughts on this application:

  • You should perform a site survey to assess the quality of the input face images we'll be getting from their cameras. (A few months back I filed bugs on Android that were preventing it from using the full camera resolution; they have since been fixed, so we should now use the full camera resolution.)
  • Our Android app can be used as a test tool, but it does not report direct gaze duration or person detection; to test those, they should use the Desktop client. These are possible with the Android SDK.
  • See SAFR SDKs Comparison for a comparison of the SDKs. For this use case either the Embedded SDK or the SAFR SDK is viable, but the SAFR SDK is preferred because it already has tracking implemented, which is important for accurately tracking each person as they traverse the scene. The Embedded SDK is better suited to "snapshot counting," where you measure "volume" at unique points in time and are less concerned with "unique subject" counts.
  • The key differences between the SDKs are:
    • The Embedded SDK takes image input; the SAFR SDK takes video input.
    • The SAFR SDK tracks each subject and includes a local unique ID for each subject being reported on.
      • This makes it much easier to measure total dwell time, attention, and unique traffic counts (see the sketch after this list).
    • Both can perform recognition.
    • The SAFR SDK relies upon a connection to the server for recognition (very low bandwidth); the Embedded SDK performs recognition locally.
    • Both can provide all of the needed metrics: age, gender, sentiment, recognition, and person detection.
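
To make the value of the local unique ID concrete, here is a minimal Kotlin sketch of how tracked events could be turned into dwell times and unique traffic counts. The `TrackEvent` type and its field names are assumptions for illustration, not the SAFR SDK's actual API:

```kotlin
// Hypothetical tracking event; the type and field names are illustrative,
// not the SAFR SDK's actual API.
data class TrackEvent(
    val subjectId: String,   // local unique id assigned by the tracker
    val timestampMs: Long    // time the subject was observed
)

// Dwell time per subject: last sighting minus first sighting.
fun dwellTimesMs(events: List<TrackEvent>): Map<String, Long> =
    events.groupBy { it.subjectId }
        .mapValues { (_, evts) -> evts.maxOf { it.timestampMs } - evts.minOf { it.timestampMs } }

// Unique traffic is simply the number of distinct subject ids.
fun uniqueCount(events: List<TrackEvent>): Int =
    events.distinctBy { it.subjectId }.size

fun main() {
    val events = listOf(
        TrackEvent("a", 0), TrackEvent("a", 4_000),
        TrackEvent("b", 1_000), TrackEvent("b", 9_000),
    )
    println(dwellTimesMs(events)) // {a=4000, b=8000}
    println(uniqueCount(events))  // 2
}
```

Without a stable per-subject ID (as with the image-in, image-out Embedded SDK), the grouping step has nothing reliable to key on.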

The choice of features (face detection vs. person + face detection) used to perform the desired analytics depends upon how important it is to get a complete picture of viewer attention. With face detection, total duration only includes the times when the subject's face is in view, whereas with person detection you get a true representation of how long the subject is in view of the camera. Adding recognition to face detection improves things because it lets you combine multiple events: with person detection you are more likely to get a single event for the entire time a person is in view of the camera, whereas with face detection you get multiple events, and if you are not performing recognition you have no way to combine them, so your metrics become skewed toward a higher count of shorter visits. A sketch of this event-merging logic follows below.
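
As a rough illustration of that skew, the following Kotlin sketch merges face events that share a recognized identity into single visits, bridging short gaps (face turned away, brief occlusion), while unrecognized appearances each remain their own short visit. The `FaceEvent` and `Visit` types, their fields, and the `maxGapMs` threshold are hypothetical placeholders, not SAFR SDK API:

```kotlin
// Hypothetical face event; with recognition enabled identityId is
// populated, otherwise it is null. Names are placeholders, not SAFR SDK API.
data class FaceEvent(val identityId: String?, val startMs: Long, val endMs: Long)

data class Visit(val identityId: String?, val startMs: Long, val endMs: Long)

// Merge face events that share a recognized identity into single visits,
// bridging gaps up to maxGapMs. Events without an identity cannot be
// merged, so each becomes its own short visit, skewing the metrics
// toward many brief visits.
fun mergeVisits(events: List<FaceEvent>, maxGapMs: Long): List<Visit> {
    val (recognized, anonymous) = events.partition { it.identityId != null }
    val merged = recognized.groupBy { it.identityId }.flatMap { (id, evts) ->
        val sorted = evts.sortedBy { it.startMs }
        val visits = mutableListOf(Visit(id, sorted.first().startMs, sorted.first().endMs))
        for (e in sorted.drop(1)) {
            val last = visits.last()
            if (e.startMs - last.endMs <= maxGapMs) {
                visits[visits.size - 1] = last.copy(endMs = maxOf(last.endMs, e.endMs))
            } else {
                visits += Visit(id, e.startMs, e.endMs)
            }
        }
        visits
    }
    return merged + anonymous.map { Visit(null, it.startMs, it.endMs) }
}

fun main() {
    val events = listOf(
        FaceEvent("alice", 0, 2_000),      // face visible
        FaceEvent("alice", 5_000, 8_000),  // same person looks back at the camera
        FaceEvent(null, 10_000, 11_000),   // unrecognized appearance
    )
    // Alice's two appearances merge into one visit; the anonymous one stays separate.
    println(mergeVisits(events, maxGapMs = 4_000))
}
```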

Auto-enrolling faces will allow true unique counting (repeat vs. new visitors). As noted, with person detection you generally get a single event for the entire time a person is in view of the camera (as long as there are no major occlusions and the camera angle is high enough), but auto-enrolling allows combining events from multiple cameras or multiple days. With face recognition only, auto-enrolling allows combining multiple "face appearances" within a single camera view or across cameras and days; the sketch below shows the kind of counting this enables.
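
A minimal sketch of the repeat-vs.-new counting that auto-enrollment enables, assuming a hypothetical `Sighting` record keyed by the auto-enrolled identity (not an actual SAFR SDK type):

```kotlin
// Hypothetical sighting produced once auto-enrollment has assigned an
// identity; names are placeholders, not SAFR SDK types.
data class Sighting(val identityId: String, val camera: String, val day: Int)

// Split visitor-days into new vs. repeat: the first day an identity is
// seen counts as new, any later day it reappears counts as a repeat.
fun newVsRepeat(sightings: List<Sighting>): Pair<Int, Int> {
    val firstSeen = sightings.groupBy { it.identityId }
        .mapValues { (_, s) -> s.minOf { it.day } }
    val visitorDays = sightings.map { it.identityId to it.day }.toSet()
    val repeats = visitorDays.count { (id, day) -> day > firstSeen.getValue(id) }
    return firstSeen.size to repeats
}

fun main() {
    val sightings = listOf(
        Sighting("p1", "lobby", 1), Sighting("p1", "entrance", 1),
        Sighting("p1", "lobby", 2),   // p1 returns on day 2
        Sighting("p2", "lobby", 2),
    )
    println(newVsRepeat(sightings)) // (2, 1): two new visitors, one repeat visit
}
```

Because the identity is shared across cameras and days, the same grouping works whether the sightings come from one screen or many.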

We need to discuss the requirements of auto-enrolling faces ("learning on the fly") with the developer. They need to understand this is only possible if faces are, at some point, larger than 220 px ear to ear and subjects are within 20° of facing the camera. Trying to go lower results in a higher false match rate (FMR) and thus skewed stats; a sketch of such a quality gate follows.
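
A pre-enrollment quality gate built on those two thresholds might look like the sketch below; the `FaceSample` type and its fields are placeholders, since the real SDK exposes face size and pose through its own types:

```kotlin
import kotlin.math.abs

// Hypothetical per-frame face measurements; the SDK's actual face-size
// and pose fields will differ, so treat these names as placeholders.
data class FaceSample(val earToEarPx: Int, val yawDegrees: Double)

// Gate auto-enrollment on the quality thresholds from the text:
// more than 220 px ear to ear and within 20° of facing the camera.
// Enrolling below these thresholds raises the false match rate (FMR)
// and skews the visit statistics.
fun eligibleForEnrollment(sample: FaceSample): Boolean =
    sample.earToEarPx > 220 && abs(sample.yawDegrees) <= 20.0

fun main() {
    println(eligibleForEnrollment(FaceSample(240, 5.0)))   // true
    println(eligibleForEnrollment(FaceSample(180, 5.0)))   // false: face too small
    println(eligibleForEnrollment(FaceSample(240, 35.0)))  // false: pose too oblique
}
```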

We have person detection in the Android Embedded SDK; it is not in the video SDK.

Here is a page Mark compiled, SDK Features, that compares the features in the SDKs. @Mark Molina, can you confirm whether the Android SDK has person tracking (the Android app does not have this, but maybe it is only in the SDK)? Could you also ask about directGazeDuration reporting (I'm pretty sure it is not part of the SDK, but worth asking)?