Sahha Data Science Report 003
Data science report and model status as of 1st June 2022. New model numbers.
Sahha’s mental health status API utilizes phone sensor data to determine depression risk using a combination of statistical modelling and machine learning (If you want to build on Sahha you can join here).
This data is collected passively using Sahha’s API/SDKs after they are installed into mobile apps.
Once this data is analyzed, it provides an objective assessment of depression risk that can supplement or replace subjective assessments, questionnaires, and surveys.
We are making accurate, objective assessments of different mental illnesses available to all digital healthcare providers in a simple-to-integrate SDK and at low cost.
This report provides an overview of how the Sahha depression risk API was created.
Report 003
This report details Sahha’s latest binary classification models (as of 1st June 2022), which return a ‘depressed’ or ‘not depressed’ state for a person based on behavioral data collected through the Sahha SDK.
The data the technology is trained on comes from over 2700 participants in 20 mostly Western countries. These participants feature diverse demographics in age, gender, education, location and income, to name a few.
We chose a highly heterogeneous dataset to ensure our models apply to a diverse group of people and, consequently, are as commercially viable as possible for the businesses that use them.
Participant data is collected via Sahha research apps on iOS and Android, which access native phone sensors and software data. Participants also complete a weekly PHQ-9 survey with two additional questions added to it.
Sahha’s research initiative is ongoing as we look to increase our participant pool and broaden the range of mental illnesses the technology can detect.
What’s changed between 002 and 003?
Accuracy has decreased slightly, but support is significantly higher. Why is this important? More support means the evaluation covers more samples and is therefore more representative of the population.
What do we look for?
The phone sensor and software data being collected includes phone lock/unlock state as well as sleep time, but we will continue to explore non-personally identifiable sensor and software inputs as they become available.
In total we compute 67 features:
30 screen
28 sleep
7 sleep-screen interaction
2 demographic
The types of features computed for each phenotype (screen/sleep) are as follows:
Averages of count data, such as average minutes slept
Standard deviations of count data
Similarity/regularity
Entropy
Time of day features, such as proportion of sleep spent at night
Null count
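As a rough illustration of these feature types, the sketch below computes an average, a standard deviation, a null count, an entropy, and a night-time proportion from toy data. The function names, the input layout (one value per day, or one value per hour of the day), and the definition of "night" as 22:00 to 06:00 are assumptions made for this example; they are not Sahha's actual feature definitions. Similarity/regularity features are not covered here.

```python
import math
from statistics import mean, pstdev

# Assumed definition of "night" (22:00-05:59); the report does not specify one.
NIGHT_HOURS = frozenset({22, 23, 0, 1, 2, 3, 4, 5})


def summarise_daily(values):
    """Mean, population std and null count for one value per day (None = missing)."""
    observed = [v for v in values if v is not None]
    return {
        "mean": mean(observed) if observed else None,
        "std": pstdev(observed) if observed else None,
        "null_count": len(values) - len(observed),
    }


def shannon_entropy(counts):
    """Shannon entropy in bits of a histogram of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def night_proportion(hourly_minutes):
    """Share of minutes that fall in NIGHT_HOURS, given 24 hourly totals."""
    total = sum(hourly_minutes)
    if total == 0:
        return 0.0
    night = sum(m for hour, m in enumerate(hourly_minutes) if hour in NIGHT_HOURS)
    return night / total


# Toy data, invented for illustration only.
sleep_minutes = [420, 480, None, 450]                       # one entry per day
hourly_sleep = [60] * 6 + [0] * 7 + [60] + [0] * 9 + [60]   # hours 0-5, 13 and 23

summary = summarise_daily(sleep_minutes)
print(summary["mean"], summary["null_count"])   # 450 1
print(shannon_entropy([1, 1, 1, 1]))            # 2.0 (uniform over 4 bins)
print(night_proportion(hourly_sleep))           # 0.875
```

Missing days are counted rather than imputed in this sketch, which mirrors the idea of a null count feature: the amount of missing data becomes a signal in its own right.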
After data cleaning and feature extraction, our participant pool is split as follows:
Depression level:
None 72.50%
Mild 18.34%
Moderate 5.27%
Severe 3.89%
The Model
Our best-performing commercial models are all CatBoost classifiers trained for binary classification.
We label classes:
Depressed = 1 (positive class)
Not-Depressed = 0 (negative class)
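With class 1 as the positive class, the balance between sensitivity and specificity for a probabilistic classifier is usually set by the decision threshold applied to the predicted probability of class 1. The sketch below shows that trade-off on toy scores. The report does not say how Sahha chooses its operating point, so the function name, the thresholds, and the data are all assumptions for illustration.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Return (sensitivity, specificity) when predicting 1 for scores >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels and predicted probabilities of class 1 (invented for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.25):
    sens, spec = sensitivity_specificity(y_true, scores, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
# threshold=0.5: sensitivity=0.67, specificity=0.80
# threshold=0.25: sensitivity=1.00, specificity=0.60
```

Lowering the threshold catches more of the depressed class (higher sensitivity) at the cost of more false positives (lower specificity).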
The following are the metrics for our best-performing binary classifier to date. Which model is preferable in practice depends on the desired balance of specificity and sensitivity.
Model: ‘CatBoost Classifier Model’
Key terms:
TP = True positives
The number of correctly predicted positive samples
TN = True negatives
The number of correctly predicted negative samples
FP = False positives
The number of samples incorrectly predicted to be positive
FN = False negatives
The number of samples incorrectly predicted to be negative
Accuracy Score: 0.71 (71%)
Defined as the number of correct predictions (TP + TN) divided by the total number of predictions (TP + TN + FP + FN). This metric can be misleading for imbalanced classification.
Balanced Accuracy Score: 0.71 (71%)
Defined as the average of the recall obtained on each class. This is a more meaningful statistic for imbalanced classification.
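To make the two definitions concrete, here is a minimal sketch that counts TP, TN, FP and FN from labels and predictions and then computes both scores. The toy data is invented for illustration and is not Sahha data: a classifier that always predicts ‘not depressed’ on a set with nine negatives and one positive.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with class 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """Average of the recall on the positive and the negative class."""
    recall_positive = tp / (tp + fn)
    recall_negative = tn / (tn + fp)
    return (recall_positive + recall_negative) / 2


# Toy imbalanced example: 9 negatives, 1 positive, always predicting 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

counts = confusion_counts(y_true, y_pred)
print(counts)                      # (0, 9, 0, 1)
print(accuracy(*counts))           # 0.9
print(balanced_accuracy(*counts))  # 0.5
```

The always-negative classifier scores 90% accuracy while never detecting a positive case. Balanced accuracy exposes this by dropping to 50%, the level of random guessing.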
Depressed class (1)
Precision: 0.48
Recall (or sensitivity): 0.71
Specificity: 0.71
F1: 0.57
Support: 225
Not-Depressed class (0)
Precision: 0.86
Recall (or sensitivity): 0.71
Specificity: 0.71
F1: 0.78
Support: 592
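The reported figures can be cross-checked against each other using standard definitions. F1 is the harmonic mean of precision and recall, accuracy equals the support-weighted average of per-class recall, and balanced accuracy is the unweighted average. In a binary problem the specificity of one class is the recall of the other, which is why both classes list 0.71 for both values. The sketch below uses only the rounded numbers printed above, so small rounding differences from the underlying unrounded values are possible; the dictionary layout and names are chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded per-class metrics as reported above.
reported = {
    "depressed": {"precision": 0.48, "recall": 0.71, "f1": 0.57, "support": 225},
    "not_depressed": {"precision": 0.86, "recall": 0.71, "f1": 0.78, "support": 592},
}

for name, m in reported.items():
    recomputed = round(f1_score(m["precision"], m["recall"]), 2)
    print(name, "F1 reported:", m["f1"], "recomputed:", recomputed)

total_support = sum(m["support"] for m in reported.values())

# Accuracy is the support-weighted mean of per-class recall.
accuracy = sum(m["recall"] * m["support"] for m in reported.values()) / total_support

# Balanced accuracy is the unweighted mean of per-class recall.
balanced_accuracy = sum(m["recall"] for m in reported.values()) / len(reported)

print(total_support)                # 817
print(round(accuracy, 2))           # 0.71
print(round(balanced_accuracy, 2))  # 0.71
```

Because recall is the same on both classes, accuracy and balanced accuracy coincide here even though the classes are imbalanced (225 versus 592 samples).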
The production model as of 1st June 2022 is this model, documented here.
What’s coming?
Improved binary classification specificity and sensitivity (ongoing)
Models that factor in device motion and pedometer data
Multi-class classification models (the ‘Berg’ Model) featuring the ability to determine the probability of a sub-state (e.g. mild depression)
Population baseline model: returns an immediate prediction of the likelihood of depression using population metrics in combination with user demographic information
If you have further questions, please reach out to engineering@sahha.ai or join our Slack Developer Community.