Mobile applications have the advantage of possibly reaching all digitalized parts of society. They can be easily provided since their scalability does not depend on limited resources such as healthcare professionals. Their effect on the general population as part of an insurance-covered stress-prevention program has not yet been evaluated.
Users of the app "7Mind" who participated in a mindfulness-based stress management course (ABSM) answered questionnaires about their mental state. This survey-based, pre-post, single-arm design aimed to measure the users' mindfulness and the overall effects on the participants. The Freiburg Mindfulness Inventory was used to measure mindfulness.
559 paired and 9746 unpaired datasets were analyzed for pre-post differences. The Wilcoxon signed-rank test for paired groups and the Wilcoxon rank-sum test for unpaired groups returned significant improvements in mindfulness, perceived pain, and other variables.
The results show that meditation apps could play a role in improving mental health in the general population. However, the drop-out rates demand further investigation of reasons for non-adherence and a control group design. Additionally, future research needs to investigate whether certain social groups benefit less from digital mindfulness interventions than others.
Mindfulness techniques have shown significant effects on healthy individuals and might positively affect patients suffering from chronic pain and mental disorders. 'Mindfulness' generally describes the awareness of the present moment by observing and accepting unfolding emotions, experiences, thoughts, and sensations non-judgementally. Mobile phone applications can be easily provided since their scalability does not depend on limited resources such as healthcare professionals. They have the potential to reach all digitalised parts of society, including those who cannot attend in-person meditation sessions due to physical limitations.
According to a recent meta-analysis of mindfulness meditation applications, many studies with more than approximately 100 participants investigated the influence of app-based mindfulness-based stress reduction programs on specific groups of people such as students, employees, and patients with chronic diseases (e.g., cancer). The general population was investigated in only a small number of studies or in studies with small sample sizes. As a primary prevention measure, a program offered to the general population has exceptional potential for mental and physical health compared to programs that only address particular groups. This study investigates whether digital mindfulness-based stress management training offered to the general population as a health insurance-covered prevention program can improve their mental health and quality of life. Moreover, it needs to be determined whether all social groups can benefit from such an application to the same degree.
The meditation app '7Mind', which provided the content for the study, has been publicly available since 2015 and was subject to stress and mindfulness research before.
This was a prospective, survey-based, pre/post, single-arm, open-label study. It aimed to investigate whether an app-based intervention can significantly change the Freiburg Mindfulness Inventory (FMI) score, a measure of mindfulness, in the general population as the target group.
No usage data was collected within the smartphone application for this study. A Customer Relationship Management system sent emails with links to the survey platform SurveyMonkey® to all participants after they completed the first module (t0) of the ABSM course. The second questionnaire was sent out only after the final module, which marked t1. The questionnaires are deposited in the appendix (online only).
A user-created anonymous 4-digit identification code was used to match and merge both datasets (t0 and t1). This connected dataset will be referred to as the merged dataset in this paper and in the R transcript (in the appendix). The data was collected by 7Mind® between 13.12.2021 and 26.05.2022. No later responses were considered for this study. Since the course was self-timed, users could start the course independently of the start and end of this data collection.
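The matching step described above can be sketched as follows. This is a minimal illustration in Python, not the author's R code (which is provided in the appendix); the field names `code` and `fmi` and the function `merge_by_code` are hypothetical. The sketch pairs t0 and t1 responses that share an identification code and sets aside codes that occur more than once in either questionnaire, since such codes cannot be assigned to one person.

```python
from collections import Counter


def merge_by_code(t0_rows, t1_rows):
    """Pair t0 and t1 responses by identification code.

    Returns (merged, ambiguous_codes). Codes that appear more than once
    at t0 or at t1 are treated as ambiguous and excluded from the pairs.
    Field names are hypothetical and only serve this illustration.
    """
    t0_counts = Counter(row["code"] for row in t0_rows)
    t1_counts = Counter(row["code"] for row in t1_rows)
    t0_by_code = {row["code"]: row for row in t0_rows}

    merged, ambiguous = [], set()
    for row in t1_rows:
        code = row["code"]
        if code not in t0_counts:
            continue  # no t0 questionnaire with this code
        if t0_counts[code] > 1 or t1_counts[code] > 1:
            ambiguous.add(code)  # duplicate code, cannot be matched safely
            continue
        merged.append({"code": code, "t0": t0_by_code[code], "t1": row})
    return merged, sorted(ambiguous)


# Toy data: code 5678 was created twice at t0, code 0000 has no t0 partner.
t0 = [
    {"code": "1234", "fmi": 28},
    {"code": "5678", "fmi": 30},
    {"code": "5678", "fmi": 25},
    {"code": "9999", "fmi": 31},
]
t1 = [
    {"code": "1234", "fmi": 36},
    {"code": "5678", "fmi": 33},
    {"code": "0000", "fmi": 40},
]
merged, ambiguous = merge_by_code(t0, t1)
print(len(merged), "pair(s) matched; ambiguous codes:", ambiguous)
```

Responses that are excluded here would still be available for an analysis that treats t0 and t1 as unpaired samples.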
The course was available on Android and iOS devices. Participation required the installation of the 7Mind® app. Inclusion criteria were that participants were of legal age and knew German (to understand the course content and answer the questions). Every user who submitted the questionnaire was included in the study. No power calculation was conducted.
All participants submitted the questionnaires voluntarily, agreed to the processing of their data, and could drop out of the study at any time. The users did not receive compensation for participation. Depending on the specific insurance, the costs for the course (75€) were either refunded after completion or covered in advance. This coverage was independent of participating in the study. All participants received access to the app's entire meditation library. The course encouraged users to deepen their practice with the additional content, but the additional practice did not affect the coverage by the insurance.
The course consisted of eight 45-minute-long audio modules. Each included lessons about stress, mindfulness exercises, and a mindful meditation session. The participants received a handout with a content summary after each session. They also had to pass a quiz about the module to continue. Following the guidelines of ZPP (the central German institution for prevention that certifies insurance-covered prevention programs), participants could only finish one module per week.
The participants were free to opt out at any point in the study. No disadvantages arose from discontinuing participation. Personally identifiable data was not collected during the study. The course content was certified as a prevention course by ZPP and examined by this institution for potential adverse effects on its participants (course-ID: KU-ST-NAKHWV). Due to these circumstances, no ethics application was submitted to the Ethics Committee.
SurveyMonkey® is ISO 27001 certified, and its technology aligns with the GDPR (General Data Protection Regulation of the EU). Furthermore, SurveyMonkey® is EU-US Privacy Shield certified.
The t0 questionnaire assessed 42 variables and the t1 questionnaire 46. Thirty-six questions were asked at both t0 and t1 for pre/post comparison; they included the FMI and questions recommended by ZPP for evaluating prevention courses. These questions mainly addressed stress and mental well-being and asked whether chronic pain affects the daily life of the participants. Other questions asked for socio-demographic data (at t0 only) and a rating of the course's success (at t1 only). No data about nationality or ethnicity was collected.
The German FMI short form was used to measure mindfulness. The original FMI short form (in English) is a 14-item assessment that can be used for participants with no experience in mindfulness meditation and was developed by Walach et al. based on the FMI invented by Buchheld et al. The FMI measures the factors "presence" and "acceptance". Higher scores indicate higher mindfulness.
Following the results of a Rasch analysis (an item response analysis that evaluated the homogeneity of the questionnaire), item 13 ("I am impatient with fellow human beings.") was left out for improved internal consistency and construct validity. Sauer et al. concluded that the adjusted two-factorial FMI-13 approximates the Rasch requirements acceptably. To cover this subject as well, the questionnaire for this study asked about patience with fellow human beings separately from the FMI score.
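The scoring rule described above (sum of the FMI items without item 13) can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the author's R code: the response coding of 1 to 4 per item, the assumption that none of the remaining items needs reverse scoring, and the function name `fmi13_score` are assumptions made for this example.

```python
FMI_ITEMS = range(1, 15)            # items 1..14 of the short form
EXCLUDED_ITEM = 13                  # left out following the Rasch analysis
MIN_RESPONSE, MAX_RESPONSE = 1, 4   # assumed response coding per item


def fmi13_score(answers):
    """Sum the 13 scored items; `answers` maps item number -> response."""
    scored_items = [item for item in FMI_ITEMS if item != EXCLUDED_ITEM]
    missing = [item for item in scored_items if item not in answers]
    if missing:
        raise ValueError(f"missing answers for items {missing}")
    for item in scored_items:
        if not MIN_RESPONSE <= answers[item] <= MAX_RESPONSE:
            raise ValueError(f"item {item} is outside the assumed coding")
    return sum(answers[item] for item in scored_items)


# A participant who answers every item with 3; item 13 is ignored.
example_answers = {item: 3 for item in FMI_ITEMS}
print(fmi13_score(example_answers))
```

Under the assumed coding, the score ranges from 13 to 52, with higher values indicating higher mindfulness.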
The CSV file created by SurveyMonkey® was first imported into Excel® to delete data not required for the study and to blind the variable titles for the analysis. This was done by the author, who later analyzed the data. The raw data, the codes for unblinding, and the transcript of all R operations can be found in the appendix.
The blinded variable ID consisted of a random 4-digit algorithm-generated code and two codes that provided necessary information about the scale level and applicable tests.
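One possible way to construct such blinded variable IDs is sketched below in Python. The exact format used in the study is not specified beyond the description above, so the separator, the scale codes (`ORD`, `NOM`), the test codes (`W`, `N`), and the variable names are hypothetical; the blinding in the study itself was carried out in Excel®.

```python
import random


def blind_variable_names(variables, seed=None):
    """Return a dict mapping blinded ID -> original variable name.

    `variables` maps each original name to a (scale_code, test_code)
    tuple. The ID layout "<4 digits>_<scale>_<test>" is an assumption.
    """
    rng = random.Random(seed)
    # Sampling without replacement guarantees unique 4-digit codes.
    codes = rng.sample(range(10000), len(variables))
    unblinding_key = {}
    for (name, (scale_code, test_code)), code in zip(variables.items(), codes):
        blinded_id = f"{code:04d}_{scale_code}_{test_code}"
        unblinding_key[blinded_id] = name
    return unblinding_key


# Hypothetical variables: ordinal items tested with a Wilcoxon test ("W"),
# a nominal item that is only described ("N").
variables = {
    "fmi_item_01": ("ORD", "W"),
    "pain_level": ("ORD", "W"),
    "gender": ("NOM", "N"),
}
key = blind_variable_names(variables, seed=1)
for blinded_id in key:
    print(blinded_id)
```

The returned dictionary plays the role of the codes for unblinding: it is stored separately and only consulted once the blinded tests are finished.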
The exploratory analysis was conducted twice. First, we examined the data of participants whose questionnaires could be linked based on their ID (paired samples in the merged dataset). Then, the analysis was repeated with all submitted datasets so that participants who submitted only one questionnaire or coincidentally generated the same ID as another participant could also be considered. The data of t0 and t1 in the latter analysis were handled as unpaired samples.
All statistical analyses for individual question pre/post comparisons were blinded. Unblinding was done before the FMI score calculation because the calculation required awareness of which parameters needed to be summed. The score was calculated for all participants with matched datasets in the first analysis and for all submitted questionnaires in the second analysis.
Since the FMI asks questions on a Likert scale, the means of analysis were limited to methods suitable for a discrete, ordinal measurement scale.
All variables with observations at t0 and t1 in the merged dataset were compared using the two-sided Wilcoxon signed-rank test for paired groups with continuity correction. The null hypothesis was "t0 and t1 are equal".
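To make the paired test more concrete, the following Python sketch computes a two-sided Wilcoxon signed-rank test using the normal approximation with tie and continuity correction. It is a didactic sketch, not the author's R code; it always uses the normal approximation, whereas statistical software may switch to an exact p-value for small samples without ties. All function names are hypothetical.

```python
import math
from collections import Counter


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of the 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(t0_values, t1_values):
    """Return (v, z, p); v is the rank sum of the positive differences."""
    if len(t0_values) != len(t1_values):
        raise ValueError("paired samples must have the same length")
    # Zero differences carry no information about direction and are dropped.
    diffs = [after - before for before, after in zip(t0_values, t1_values)
             if after != before]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    magnitudes = [abs(d) for d in diffs]
    ranks = average_ranks(magnitudes)
    v = sum(rank for rank, d in zip(ranks, diffs) if d > 0)

    expected = n * (n + 1) / 4
    tie_term = sum(t ** 3 - t for t in Counter(magnitudes).values())
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    delta = v - expected
    correction = 0.5 * ((delta > 0) - (delta < 0))  # moves delta 0.5 toward 0
    z = (delta - correction) / math.sqrt(variance)
    p = min(1.0, 2 * min(normal_cdf(z), 1 - normal_cdf(z)))
    return v, z, p


# Toy data (not study data): every participant improves by a different amount.
before = list(range(10))
after = [b + d for b, d in zip(before, range(1, 11))]
v, z, p = wilcoxon_signed_rank(before, after)
print(f"V = {v}, z = {z:.3f}, p = {p:.4f}")
```

Because the test only uses the ranks of the differences, it is suitable for the ordinal Likert-type responses analysed here.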
The analysis of all submitted questionnaires (with no matched IDs between t0 and t1) was conducted using the two-sided Wilcoxon rank-sum test (equivalent to the Mann-Whitney test) since the sample groups were unpaired.
For both analyses, alpha was set to 0.05. To address the family-wise error rate, alpha was Bonferroni-adjusted for variables involved in multiple tests.
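For the unpaired comparison, a corresponding Python sketch of the two-sided Wilcoxon rank-sum (Mann-Whitney) test is given below, again using the normal approximation with tie and continuity correction. It is an illustration with hypothetical function names and toy data, not the author's R code.

```python
import math
from bisect import bisect_left, bisect_right
from collections import Counter


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rank_sum_test(x, y):
    """Return (u, z, p) for the two-sided rank-sum test of x against y.

    u counts the pairs (x_i, y_j) with x_i > y_j; ties count one half.
    """
    n_x, n_y = len(x), len(y)
    if n_x == 0 or n_y == 0:
        raise ValueError("both samples must contain observations")
    y_sorted = sorted(y)
    u = 0.0
    for value in x:
        below = bisect_left(y_sorted, value)            # y values smaller than x_i
        equal = bisect_right(y_sorted, value) - below   # y values tied with x_i
        u += below + 0.5 * equal

    n = n_x + n_y
    tie_term = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    variance = (n_x * n_y / 12) * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical")
    delta = u - n_x * n_y / 2
    correction = 0.5 * ((delta > 0) - (delta < 0))  # moves delta 0.5 toward 0
    z = (delta - correction) / math.sqrt(variance)
    p = min(1.0, 2 * min(normal_cdf(z), 1 - normal_cdf(z)))
    return u, z, p


# Toy data (not study data): the second group scores higher throughout.
scores_t0 = [1, 2, 3, 4]
scores_t1 = [5, 6, 7, 8]
u, z, p = rank_sum_test(scores_t1, scores_t0)
print(f"U = {u}, z = {z:.3f}, p = {p:.4f}")
```

Unlike the signed-rank test, this test does not use the pairing between t0 and t1, which is why it can include questionnaires that could not be matched.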
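The Bonferroni adjustment divides the significance level by the number of tests in which a variable is involved. A minimal Python sketch follows; the number of tests used in the example is hypothetical, since the number of tests per variable is not stated here.

```python
def bonferroni_alpha(alpha, n_tests):
    """Return the per-test significance level after Bonferroni adjustment."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value, alpha=0.05, n_tests=1):
    """Compare a p-value with the Bonferroni-adjusted significance level."""
    return p_value < bonferroni_alpha(alpha, n_tests)


# Hypothetical example: a variable that is involved in two tests.
adjusted_alpha = bonferroni_alpha(0.05, 2)
print(adjusted_alpha, is_significant(0.03, n_tests=2))
```

In this example, a p-value of 0.03 would be significant for a single test but not after adjusting for two tests.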
RStudio version 2022.02.0+443 "Prairie Trillium" with the "psych" and "ggplot2" packages was used for the analysis. A filter pipeline was used to address duplicates. The R analysis code was transcribed and is attached in the appendix.
All users who responded to the invitation email participated in the study. The 4-digit identification code was based on only three questions, so different participants could create the same code. In these cases, the data could not be assigned unambiguously to one person across the datasets. Such cases had to be excluded from the tests for paired samples due to the ambiguity of the ID, which lowered the number of analyzed observations.
Furthermore, it was possible that participants did not provide their code correctly in the second questionnaire at t1, which made matching impossible. A control group was not part of the study due to the practical limitations of the study design. The absence of a control group does not allow a comparison between the intervention and no intervention.
7117 participants submitted the first questionnaire, and 2629 also submitted the second. 829 datasets could be matched in RStudio by the user-generated ID, which created the merged dataset. 270 of these matched IDs and their data had to be excluded from the analysis for paired samples due to ambiguity in their ID (duplicates). The process of exclusion of datasets is depicted in graph 1. All questionnaires (including those excluded from the analysis for paired samples) were included in the analysis of unpaired samples.
The participants were predominantly female (76.7%) and highly educated (72.8%), which means they had at least a general qualification for university entrance. The average age was 41 ± 12 years. The age and gender distribution are depicted in graph 2.
82% of the participants had not participated in another health prevention course within the last 12 months. The course ratings at t1 indicated high satisfaction at the end of the program for those who finished it. Bar charts and boxplots of all variables, including pre/post comparisons if applicable, are provided within the transcript in the appendix.
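The reported counts can be checked against the sample sizes stated earlier (559 paired and 9746 unpaired datasets) with a few lines of arithmetic. The Python sketch below only uses the numbers given in the text; the variable names are chosen for this illustration.

```python
t0_submitted = 7117          # first questionnaires
t1_submitted = 2629          # second questionnaires
matched_by_code = 829        # datasets matched via the user-generated ID
excluded_duplicates = 270    # matched IDs excluded because of ambiguity

# Paired analysis: matched datasets minus ambiguous ones.
paired_datasets = matched_by_code - excluded_duplicates

# Unpaired analysis: every submitted questionnaire counts as one dataset.
unpaired_datasets = t0_submitted + t1_submitted

# Share of t0 respondents who also returned a questionnaire at t1.
t1_response_share = t1_submitted / t0_submitted

print(paired_datasets, unpaired_datasets, f"{t1_response_share:.1%}")
```

The two totals reproduce the 559 paired and 9746 unpaired datasets used in the analyses.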
The gender- and education-specific non-completion rates were similar, apart from participants with a degree lower than secondary school (see table 1). Since people were not asked about gender and education at t1, the only possible comparison is between t0 and the merged file (which combines t1 responses with the information the same participants gave at t0). It remains unknown how many participants of each gender and educational status submitted a second form whose data could not be matched.
The study's primary aim was to determine whether an app-based intervention can change the FMI score significantly within a general population. There was a median shift of the FMI Score from 28 at t0 to 36 at t1 in the analysis of the merged, paired dataset (table 2). The median shift of the FMI score in the unpaired dataset from 29 at t0 to 35 at t1 was similar (table 3).
The two-sided Wilcoxon signed-rank test for paired groups (merged dataset) returned a significant result that the location shift in the FMI is not equal to 0 (p < 0.01). The 95 per cent confidence interval for the location shift lies between 6.5 and 7.5. The analysis of Cronbach's alpha returned 0.88 at t0 and t1. The data is depicted in table 2.
The two-sided Wilcoxon rank-sum test for unpaired samples (all questionnaires dataset) returned a significant result that the location shift in the FMI is not equal to 0 (p < 0.01). The 95 per cent confidence interval for the location shift lies between 6.9 and 7.0. The analysis of Cronbach's alpha returned 0.88 at t0 and 0.87 at t1. The data is available in table 3.
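Cronbach's alpha, reported above as a measure of internal consistency, relates the sum of the item variances to the variance of the total score. The following Python sketch shows the standard formula on toy data; it is an illustration and not the author's R code, and the function name `cronbach_alpha` is chosen for this example.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Compute Cronbach's alpha.

    `responses` is a list with one list per participant, each holding
    that participant's answers to the k items of the scale.
    """
    if len(responses) < 2:
        raise ValueError("at least two participants are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in responses):
        raise ValueError("every participant needs an answer for every item")

    # Sample variance of each item across participants.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the participants' total scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (not study data): four participants, two items.
toy_responses = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(round(cronbach_alpha(toy_responses), 3))
```

Values close to 1 indicate that the items vary together, which is the case when they measure a common construct.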
The violin plots depict the distribution of responses to individual FMI questions at t0 and t1 of the merged dataset with paired samples.
After the course, the participants gave significantly lower ratings for their state of general health, pain level, and degree of restriction by pain in daily life. All of these changes represent improvements. The dataset also includes statements about improved stress management and other related subjects; these cannot be fully addressed here and are provided in the appendix.
The primary study aim was to find out whether an app-based intervention can change the FMI score significantly within a general population. While the results seem to indicate a clear improvement, they should be interpreted with care due to the multiple limitations of the study design described in 2.7. Methodological limitations.
All statistical tests turned out to be significant, and all differences indicated improvements. Due to the large sample size, significant test results were to be expected even for small differences between the surveys, and they do not by themselves indicate clinical relevance. The median shifts between t0 and t1 were similar for the merged dataset with matched IDs and the dataset that included all questionnaires (including participants who submitted only one answer form).
The results suggest that mindfulness-based stress management programs and meditation can also be taught via an app. The identified improvements of this prevention course will be followed up by sending out questionnaires after six months to examine the long-term effects. Future studies should investigate whether these improvements are sustained over longer periods.
Since the course was in German and only German public health insurance companies covered its costs, the author assumes that most participants lived in Germany, although no data about nationality or ethnicity was collected.
The dominant user group was female and highly educated. This user group might benefit the most, while males and less educated social classes might be reached less. These findings confirm other research concerning prevention measures in general and especially with users of meditation apps.
Since it remains unknown whether the group that signed up for the course is representative of the general population, the gender- and education-specific dropout analysis (which indicates similar non-completion rates for all groups) does not address this issue. The concern appears to be how many lower-educated people and males sign up rather than how many complete the program. Nevertheless, those who signed up might differ from the general population in other variables (such as spiritual interest) that were not measured in this study. Future studies need to investigate the underlying reasons for user group homogeneity and strategies for the inclusion of all social groups.
This study does not provide data that explains the magnitude of the dropout rate. Although the absence of a (voluntarily submitted) t1 questionnaire is not equivalent to termination of the program, a noticeable non-completion rate is to be expected. The matching process between the datasets of t0 and t1 limits the information about those who dropped out, so it cannot be stated with statistical confidence whether they differ significantly from those who remained. However, at least for women and men and for all educational levels above no degree, the remaining percentages did not vary by more than two per cent. Future identification codes for the questionnaires need to consist of more digits to reduce the risk of duplicates.
Non-adherence is a common limitation of digital mental health interventions. It is reasonable to assume that those who had negative experiences during the program were less likely to finish it. On the other hand, participants' willingness to complete a program might be lower in the absence of a severe urge to address one's condition. Future studies need to investigate specifically why participants discontinue courses.
Since apps offer no in-person contact with an instructor, negative experiences and confusion are harder to address (although users could write emails to the provider in this case). The major advantage of mobile applications is that they are scalable, easily accessible, cheap for the public health system, and automated. This turns against them in this context because no in-person advisor can respond to difficulties that occur during the course.
The pre/post design for the paired samples comes with the virtue that no between-person variability played into the comparison of the groups. However, the absence of a placebo group combined with the open-label design does not control the Hawthorne effect (change of behaviour or response to questionnaires due to the awareness of being observed).
There have been multiple approaches in mindfulness science to introduce a control group. Cognitive-behavioural therapies, massages, stress management, or stretching exercises were used in different studies to compare control participants with participants in meditation intervention groups. Such control measures come with the limitation that they probably have their own effect on the examined variables. Zeidan et al. have addressed this issue with so-called "Sham Meditations". Sham meditations are based on breathing exercises and on conveying the belief that participants are meditating. Unlike in mindfulness exercises, participants were not taught how to accept their sensations and thoughts in order to return to the present moment. Zeidan et al. have also shown using fMRI that sham meditation (along with placebo and book-listening control groups) activates different neural correlates than mindfulness meditation.
Future research on mindfulness prevention courses would ideally include three control groups: one with people who receive instructions similar to the mentioned "Sham Meditation", one with people who engage in an activity such as audiobook listening, and a waiting-list group that does not receive any intervention. Moreover, further research is needed that measures biological markers of stress and its neural correlates instead of relying only on participants' self-reports.
The observed reduction of perceived pain after the mindfulness program is in line with other findings in that field. There might be neurophysiological explanations for this improvement. These mechanisms seem to be unique and non-opioidergic and to work differently from placebo. May et al. used naloxone as an opioid antagonist to investigate whether endogenous opioids are responsible for meditation analgesia. The blockade of opioid receptors even enhanced meditation's analgesic effects. The underlying mechanisms of meditation's pain-relieving effects are still to be discovered. The fact that participants in this study also reported a reduction of pain underlines the need for further investigation that includes measurements of neurophysiological correlates of pain.
The results of the analysis indicate that the intervention increases participants' mindfulness. This possible increase in mindfulness is represented by the improvement in the FMI score. The major caveats are the dropouts of participants in the study, the absence of a control group, and the selective audience that is currently reached by programs comparable to the investigated mindfulness-based stress management course. Therefore, solutions are needed to diversify the user group and to increase completion rates. Although the limitations of this study and the program need to be considered, application-based interventions appear to be beneficial for their users and need an established place in modern healthcare systems.
All data used in this publication can be openly accessed via Rico Schmitt’s OSF repository. The R script for the statistical evaluation can also be found in the appendix below.
Note from the editors: In the near future a tutorial in R on the statistical method that was used in this paper will be published in the BEM tutorial collection Stat-o-Sphere. We will also include an extra paragraph explicitly dedicated to this publication to educationally make use of open data publications within BEM.
The corresponding author of the manuscript ensured disclosure of any conflict of interest during the creation of the manuscript. The guidelines of Berlin Exchange Medicine concerning conflicts of interest were taken into account.
Rico Schmitt has been employed by 7Mind® since 3/2022.
Background image by Victor Lu