Demographic Research | Volume 39 - Article 22 | Pages 647–670
WhatsApp usage patterns and prediction of demographic characteristics without access to message content
Date received: 06 Jan 2017
Date published: 27 Sep 2018
Keywords: demographics, social media, social network, usage prediction, WhatsApp
Background: Online social networks have become ubiquitous applications that allow people to easily share text, pictures, audio, and video files. Popular networks include WhatsApp, Facebook, Reddit, and LinkedIn.
Objective: We present an extensive study of the usage of the WhatsApp social network, an Internet messaging application that is quickly replacing SMS (short message service) messaging. To better understand how people use the network, we analyze over 6 million encrypted messages from over 100 users, with the objective of building demographic prediction models that use activity data but not message content.
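The abstract does not specify how activity features were derived from the messages. As a minimal sketch, assuming each message is reduced to a metadata record (sender, timestamp, length in characters, attachment flag), per-user usage features can be computed without reading any text. The `MessageMeta` record, the `user_features` helper, and the definition of response time as the gap between another participant's message and the user's next message are hypothetical illustrations, not the authors' pipeline.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MessageMeta:
    """Hypothetical metadata record: the message text itself is never stored."""
    sender: str
    timestamp: float      # seconds since an arbitrary origin
    length: int           # number of characters in the message
    has_attachment: bool


def user_features(messages, user):
    """Content-free activity features for one user in one chat (None if the user never wrote)."""
    msgs = sorted(messages, key=lambda m: m.timestamp)
    own = [m for m in msgs if m.sender == user]
    if not own:
        return None
    # Assumed definition: response time = gap between someone else's message
    # and the user's immediately following message.
    response_times = [
        cur.timestamp - prev.timestamp
        for prev, cur in zip(msgs, msgs[1:])
        if cur.sender == user and prev.sender != user
    ]
    return {
        "n_messages": len(own),
        "mean_length": mean(m.length for m in own),
        "attachment_rate": sum(m.has_attachment for m in own) / len(own),
        "mean_response_s": mean(response_times) if response_times else None,
    }


if __name__ == "__main__":
    # Hypothetical three-message chat between users "a" and "b"
    chat = [
        MessageMeta("a", 0, 12, False),
        MessageMeta("b", 30, 40, True),
        MessageMeta("a", 90, 8, False),
    ]
    print(user_features(chat, "a"))
```

Only lengths, timestamps, and attachment flags are touched, which matches the constraint that the models use activity data rather than message content.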
Methods: We performed extensive statistical and numerical analyses of the data and found significant differences in WhatsApp usage across genders and age groups. We also analyzed the data with the Weka data mining package and the pROC package for R, and studied models built with decision tree, Bayesian network, and logistic regression algorithms.
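Weka (Java) and pROC (R) are not reproduced here. As a self-contained Python stand-in, the sketch below fits a plain logistic regression by batch gradient descent and scores it with the rank-based AUC, that is, the fraction of (positive, negative) pairs in which the positive case receives the higher score, which equals the area under the empirical ROC curve. This is not Weka's logistic regression implementation; the function names, learning rate, epoch count, and feature values are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of the positive class for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(linear score)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """Rank-based ROC AUC: share of (positive, negative) pairs ranked correctly, ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical feature values, e.g. [mean message length, attachment rate]
    X = [[0.2, 0.1], [0.4, 0.3], [0.3, 0.2], [1.5, 0.9], [1.2, 0.8], [1.8, 1.0]]
    y = [0, 0, 0, 1, 1, 1]
    w, b = fit_logistic(X, y)
    scores = [predict_proba(w, b, x) for x in X]
    # Training-set AUC, for illustration only; a real evaluation would use held-out data
    print("weights:", w, "bias:", b, "AUC:", auc(scores, y))
    print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise AUC is quadratic in the number of cases, which is fine for a sample of roughly 100 users; sorting-based rank formulas scale better for larger data.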
Results: We found that gender and age demographics had significantly different usage habits in almost all message and group attributes. We also noted differences in users' group behavior and created prediction models for group usage, including the likelihood that a given group would have relatively more file attachments, a larger number of participants, a higher frequency of activity, quicker response times, or shorter messages.
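The abstract does not state which significance test was used. As one common option, assumed here for illustration, Welch's t-test compares the mean of a usage attribute between two demographic groups without assuming equal variances. The sketch computes the t statistic and the Welch-Satterthwaite degrees of freedom; a p-value would then come from a t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The `welch_t` name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: values of one usage attribute for two independent groups
    (each needs at least two observations and non-zero spread).
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a), sample variance (n - 1)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical daily message counts for two demographic groups
    group_a = [1, 2, 3, 4, 5]
    group_b = [2, 4, 6, 8, 10]
    t, df = welch_t(group_a, group_b)
    print(round(t, 3), round(df, 2))  # -1.897 5.88
```

With many attributes tested across several demographic splits, a multiple-comparison correction would usually be considered before calling differences significant.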
Conclusions: We successfully quantified and predicted users' gender and age demographics. Similarly, we were able to predict different types of group usage. All models were built without analyzing message content.
Contribution: The main contribution of this paper is showing that user demographics can be predicted without access to the text content of users' messages. We present a detailed discussion of the specific attributes contained in all predictive models and suggest possible applications based on these results.
Avi Rosenfeld - Jerusalem College of Technology, Israel
Sigal Sina - Bar-Ilan University, Israel
David Sarne - Bar-Ilan University, Israel
Or Avidov - Bar-Ilan University, Israel
Sarit Kraus - Bar-Ilan University, Israel
Cited References: 24