Case Study: Bellabeat

Project Overview

This case study of Bellabeat is my capstone project for the Google Data Analytics Certificate. Bellabeat is a high-tech company that manufactures health-focused smart products for women.

The aim of the project was to generate data-driven recommendations regarding the marketing strategy for Bellabeat, focusing on one of Bellabeat’s products: the Bellabeat app. The recommendations in this report were built upon observations from analysing smart device data to gain insight into how consumers are using their smart devices.

The report follows the six steps of the data analysis process: ask, prepare, process, analyse, share, and act.

Read More

Please click on the link below to read the case study in full:
https://fridakronquist.github.io/bellabeat_case_study/

Or here to see it on GitHub:
https://github.com/fridakronquist/bellabeat_case_study

Software Used

Ask

Identify business task

The business task was to analyse non-Bellabeat smart device usage data to gain insight into how people were already using their smart devices and, based on the results, to deliver high-level recommendations for how these trends could inform Bellabeat’s marketing strategy.

Consider key stakeholders

The identified key stakeholders were:

  • Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
  • Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
  • Bellabeat marketing analytics team

Prepare

Data source

The data used for this analysis is the FitBit Fitness Tracker Data. The data set is stored on Kaggle and was made available through Mobius.

Data accessibility & privacy

The dataset is licensed CC0: Public Domain, which means the owner has dedicated the work to the public domain by waiving all of their rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.

This means the dataset is open-source and can be copied, modified, and distributed, even for commercial purposes, without asking permission.

Data content

The dataset was generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviours and preferences.

Data organisation

The data was available as 18 CSV files. Each file contained different quantitative data generated by Fitbit health trackers, in either long or wide format.

Data credibility & integrity

The datasets have a small sample size (33 users or fewer). The gender and other demographic information of the users are also unknown, so the data might include users who are not women, Bellabeat’s target customers. There might also be sampling bias, partly because of the unknown demographics and partly because the survey was only distributed via Amazon Mechanical Turk. Additionally, the survey was only open for two months, and the data is from 2016 and might no longer be relevant. For these reasons, the results of this analysis might not be representative, and this case study should be seen as an operational approach.

Process

Packages

The following packages were installed and opened:

  • janitor
  • lubridate
  • Rcmdr
  • scales
  • tidyverse

Import datasets

According to the central limit theorem, given a sufficiently large sample size from a population with finite variance, the distribution of the sample mean will be approximately normal and centred on the population mean. What counts as a sufficient sample size varies depending on industry and business, but sample sizes equal to or greater than 30 are often considered sufficient for the central limit theorem to hold. This analysis only used datasets fulfilling that sample size, which means heart rate data (14 users), sleep data (24 users), and weight data (8 users) were not used.
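The sample-size screen described above can be sketched in a few lines. This is a minimal Python illustration, not the code used in the case study (which relied on R packages); the `Id` column name and the helper names are assumptions made for the example, while the user counts are the ones stated in the text.

```python
MIN_USERS = 30  # threshold named in the text


def count_users(rows, id_field="Id"):
    """Count distinct users in an iterable of dict rows (id_field is assumed)."""
    return len({row[id_field] for row in rows})


def datasets_meeting_threshold(user_counts, min_users=MIN_USERS):
    """Return the names of datasets with at least min_users distinct users."""
    return sorted(name for name, n in user_counts.items() if n >= min_users)


# User counts as stated in the text for the three excluded data types.
stated_counts = {"heart rate": 14, "sleep": 24, "weight": 8}
excluded = sorted(name for name, n in stated_counts.items() if n < MIN_USERS)
print(excluded)
```

In practice, `count_users` would be run on each imported file and the resulting counts passed to `datasets_meeting_threshold`.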

For this analysis, minute-level data would not provide more insight than hourly or daily data. Hence, datasets containing minute-level data were not used.

The dataset called dailyActivity_merged contains daily calories, intensities, and steps, which made the datasets dedicated specifically to those data redundant for this analysis.

To summarise, this analysis used the following datasets:

  • dailyActivity_merged
  • hourlySteps_merged

Clean and format datasets

  • Verify number of users
  • Identify and remove potential duplicates

  • Clean and rename columns

  • Date and time
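The cleaning steps listed above can be sketched as follows. This is a hedged Python illustration rather than the original R workflow: the column names (`Id`, `ActivityDate`, `TotalSteps`) and the month/day/year date format are assumptions about the raw files, and all function names are hypothetical.

```python
import re
from datetime import datetime


def clean_name(name):
    """Convert a column name such as 'TotalSteps' to snake_case ('total_steps')."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def remove_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def clean_rows(rows, date_field="ActivityDate", date_format="%m/%d/%Y"):
    """Deduplicate rows, rename columns, and parse the date column.

    date_field and date_format are assumptions about the raw data.
    """
    date_key = clean_name(date_field)
    cleaned = []
    for row in remove_duplicates(rows):
        new_row = {clean_name(k): v for k, v in row.items()}
        new_row[date_key] = datetime.strptime(new_row[date_key], date_format).date()
        cleaned.append(new_row)
    return cleaned
```

Verifying the number of users after cleaning then amounts to counting the distinct values in the `id` column.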

Analyse & Share

Summary statistics

Initial thoughts regarding the dataset:

  • Mean total steps per day was approx. 7638 steps (SD=5087), and the median was slightly lower (MDN=7406). The minimum recorded was 0 steps and the maximum as much as 36019 steps! It seems there were some very active users who raised the average, but also some sedentary users who balanced them out. Needs further investigation.
  • Mean very active minutes was 21 min (SD=33) and mean fairly active minutes was 13 min (SD=20). This means that, on average, users reached the recommended active minutes to gain significant health benefits. However, the standard deviations were large. Needs further investigation.
  • The activity minutes did not add up to 1440 min (60 min per hour, 24 hours in a day) on each row. This indicates that the users did not wear their tracking devices during the whole day. Needs further investigation.
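The checks behind these observations can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the original R code: the four minute-column names are assumed (already cleaned to snake_case), and the sample values are made up to show the calculation.

```python
from statistics import mean, median, stdev

MINUTES_PER_DAY = 24 * 60  # 1440

# Assumed names of the activity-minute columns after cleaning.
MINUTE_FIELDS = (
    "very_active_minutes",
    "fairly_active_minutes",
    "lightly_active_minutes",
    "sedentary_minutes",
)


def summarise(values):
    """Return mean, sample SD, median, min, and max for a list of numbers."""
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def tracked_minutes(row):
    """Total minutes recorded across all activity levels for one day."""
    return sum(row[field] for field in MINUTE_FIELDS)


def is_full_day(row):
    """True when the recorded minutes cover the whole day."""
    return tracked_minutes(row) == MINUTES_PER_DAY


# Made-up values, only to show the calls.
print(summarise([0, 5000, 10000]))
```

Rows for which `is_full_day` is false are the ones suggesting the tracker was not worn all day.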

Lifestyle type based on daily steps

As the pie chart shows, all lifestyle types were represented in the tracker data. However, most users had a somewhat active (27%), low active (27%), or sedentary lifestyle (25%). Only 21% of the users were considered to have an active (15%) or very active (6%) lifestyle based on their total daily steps.
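A classification like the one summarised above can be sketched as follows. The step thresholds are an assumption (a commonly used pedometer-based banding); the text does not state which cut-offs were applied, and the function names and sample values are hypothetical.

```python
from collections import Counter

# Assumed lower bounds (steps per day) for each lifestyle type, highest first.
LIFESTYLE_BANDS = [
    (12500, "very active"),
    (10000, "active"),
    (7500, "somewhat active"),
    (5000, "low active"),
    (0, "sedentary"),
]


def lifestyle(mean_daily_steps):
    """Map a user's mean daily steps to a lifestyle type."""
    for lower_bound, label in LIFESTYLE_BANDS:
        if mean_daily_steps >= lower_bound:
            return label
    raise ValueError("steps must be non-negative")


def lifestyle_shares(mean_steps_by_user):
    """Percentage of users in each lifestyle type, rounded to whole numbers."""
    counts = Counter(lifestyle(steps) for steps in mean_steps_by_user.values())
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Made-up users, only to show the calls.
print(lifestyle_shares({"a": 3000, "b": 8000, "c": 8100, "d": 13000}))
```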

Active minutes

Reaching the recommended active minutes per day to gain health benefits was achievable regardless of the user’s lifestyle type in terms of steps. However, users seemed more likely to reach the recommended active minutes for high health benefits when they had a somewhat active, active, or very active lifestyle in terms of daily steps. Hence, encouraging sedentary and low active users to be more active during the day seems like a good idea. The emphasis should be on spending more time being fairly active and very active rather than just increasing the number of steps: for example, a brisk walk rather than a slow one.
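To make the comparison against recommended active minutes concrete, here is a hedged Python sketch. The thresholds are assumptions, since the text does not state which guideline was used: 150 moderate-equivalent minutes per week for health benefits and 300 for high benefits, with one vigorous minute counted as two moderate minutes. Treating fairly active minutes as moderate and very active minutes as vigorous is also an assumption.

```python
def weekly_moderate_equivalent(fairly_active_per_day, very_active_per_day):
    """Weekly moderate-equivalent minutes; vigorous minutes count double (assumed)."""
    return 7 * (fairly_active_per_day + 2 * very_active_per_day)


def benefit_level(weekly_equivalent, recommended=150, high=300):
    """Classify a weekly total against the assumed guideline thresholds."""
    if weekly_equivalent >= high:
        return "high"
    if weekly_equivalent >= recommended:
        return "recommended"
    return "below"


# Mean daily values from the summary statistics: 13 fairly active, 21 very active.
weekly = weekly_moderate_equivalent(13, 21)
print(weekly, benefit_level(weekly))
```

Applying the same two functions per user, rather than to the overall means, would show which lifestyle types reach each level.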

Timing of steps

In general, it seems like a good idea to promote more activity throughout the whole day, for example with a small reminder in the app to take an activity break and move the body a bit. Users could be encouraged to commute by foot or bike if they are able, to have a lunchtime walk or workout, and to continue to move after work; in other words, to integrate movement into their everyday life.
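One way to look at the timing of steps is to average the hourly step counts by hour of day. The sketch below is a Python illustration under stated assumptions, not the original R code: the column names (`activity_hour`, `step_total`) and the 12-hour timestamp format are assumptions about the hourly dataset.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_steps_by_hour(
    rows,
    time_field="activity_hour",
    steps_field="step_total",
    time_format="%m/%d/%Y %I:%M:%S %p",
):
    """Average steps for each hour of the day (0 to 23) across all rows.

    The field names and timestamp format are assumptions about the data.
    """
    by_hour = defaultdict(list)
    for row in rows:
        hour = datetime.strptime(row[time_field], time_format).hour
        by_hour[hour].append(float(row[steps_field]))
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Made-up rows, only to show the call.
sample = [
    {"activity_hour": "4/12/2016 1:00:00 PM", "step_total": "100"},
    {"activity_hour": "4/13/2016 1:00:00 PM", "step_total": "300"},
    {"activity_hour": "4/12/2016 12:00:00 AM", "step_total": "0"},
]
print(mean_steps_by_hour(sample))
```

Hours with low averages during waking time are natural candidates for an in-app activity reminder.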

Tracker usage

Recommendations

High-level recommendations for Bellabeat’s marketing strategy.

  • Encourage sedentary and low active users to be more active during the day. Emphasise spending more time being fairly active and very active rather than just increasing the number of steps: for example, a brisk walk rather than a slow one.
  • Send custom activity reminders based on the user’s lifestyle type.
  • Promote more activity throughout the whole day, for example with a small reminder in the app to take an activity break and move the body a bit. Encourage users to commute by foot or bike if they are able, to have a lunchtime walk or workout, and to continue to move after work; in other words, to integrate movement into their everyday life.
  • Add a small reminder in the app to use the fitness tracker if there are long periods of inactivity.
  • Repeat the analysis with Bellabeat’s own data. Since the demographics of the Fitbit users were unknown, it is uncertain whether the results of this analysis are applicable to Bellabeat. The sample is also very small. Hence, it is recommended to use Bellabeat’s own data to perform the same or a similar analysis.
