
Our data

By August 2021 we had:

  • 918,000 text conversations

  • 35.8 million text messages exchanged

  • 361,000 texters across the UK

The importance of Big Data

Because our conversations are text-based, they can be analysed using computational methods, which offer many exciting opportunities when complemented by human-in-the-loop coding and other qualitative approaches, including input from our clinical team of supervisors.

Big data such as this is important for several reasons. Many mental health studies are based on relatively small samples, which come with limitations. Our dataset covers a wide range of issues raised by many different texters, and these can be examined with increasing granularity as the dataset grows.

The large size of our dataset, combined with high temporal precision, offers several opportunities. We can explore how the issues raised by particular groups vary by time of day or in response to particular events, as was seen very clearly after pandemic-related government announcements in late 2020 and early 2021. These features also allow us to see mental health trends in our data. For example, we saw that mentions of the word "virus" in conversations began early in March 2020, as cases of Covid-19 began to increase in the UK but well before the first national lockdown.
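As a rough illustration of this kind of time-based analysis, the sketch below counts how many timestamped messages mention a keyword, grouped by day or by hour. It is a minimal example under stated assumptions, not a description of the actual pipeline: the messages are invented, and the function name `mentions_by` and the data layout are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Invented example messages as (ISO timestamp, text) pairs. Not real service data.
MESSAGES = [
    ("2020-03-02T21:15:00", "I keep reading about the virus and can't sleep"),
    ("2020-03-02T23:40:00", "work has been stressful"),
    ("2020-03-03T08:05:00", "Worried the Virus will reach my gran"),
    ("2020-03-03T22:30:00", "virus news everywhere, virus this, virus that"),
]


def mentions_by(messages, keyword, bucket="%Y-%m-%d"):
    """Count messages that contain `keyword` as a whole word.

    Messages are grouped by a strftime pattern: the default groups by
    calendar day, while "%H" groups by hour of day.
    """
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    counts = Counter()
    for stamp, text in messages:
        if pattern.search(text):
            # Each message counts once, however often the word appears in it.
            counts[datetime.fromisoformat(stamp).strftime(bucket)] += 1
    return counts


daily = mentions_by(MESSAGES, "virus")
hourly = mentions_by(MESSAGES, "virus", bucket="%H")

for day, n in sorted(daily.items()):
    print(day, n)
```

Counting messages rather than raw word occurrences keeps one repetitive message from dominating a day's total. On real data the counts would normally be divided by the total number of messages in each bucket so that busy periods do not look like spikes in interest.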


With a dataset of this scale, it becomes feasible to apply advanced Natural Language Processing (NLP) and machine learning approaches, including deep learning, to analyse the data and build predictive models. Indeed, such approaches are generally necessary, because it becomes increasingly impractical for humans to review all of the data and analyse it thematically by hand. These predictive models can be used for a number of exciting purposes, including predicting risk and identifying conversation themes.
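To make the idea of a model that identifies conversation themes more concrete, here is a deliberately simple sketch. It uses a small multinomial naive Bayes classifier written with the Python standard library, which is a stand-in chosen for brevity and not the deep learning approach described above. The training sentences, the theme labels and the function names are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Invented labelled examples, for illustration only. Not real service data.
TRAIN = [
    ("i feel so alone and nobody talks to me", "loneliness"),
    ("i have no friends and feel alone at night", "loneliness"),
    ("my exams are next week and i am panicking", "anxiety"),
    ("i keep panicking about work and cannot breathe", "anxiety"),
]


def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Collect per-theme word counts, theme frequencies and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the theme with the highest log-probability for `text`."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Add-one (Laplace) smoothing so unseen words do not zero out a theme.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for word in tokens(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict(model, "i feel alone"))
```

A model this small only shows the shape of the task: labelled text goes in, and a predicted theme comes out. Work at the scale described here would need far more labelled data, a held-out test set to measure accuracy, and human review of the predictions.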

These approaches also offer the opportunity to conduct research at scale, providing mental health insights based on data from many thousands of Shout service texters, especially when complemented with qualitative, thematic coding and analyses conducted by humans. Early results from our research projects at Imperial College London show that the latest NLP and machine learning approaches can be used to build models that accurately predict conversation features, including the main issues someone will text us about and texter demographics.