7  Streaming Service Analytics

7.1 Introduction

In this assignment, I worked as a data analyst for the Streaming Analytics Division. I was tasked with determining whether individuals’ streaming platform preferences are associated with their age group.

7.2 Data Preparation

knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE
)

library(readxl)
library(tidyverse)
library(ggplot2)
library(ggthemes)
library(pheatmap)
Table 7.1: Count of Age Categories for TV Data
tvdata <- read_excel("Streaming Services and Age.xlsx")
count(tvdata, AgeCat)
# A tibble: 3 × 2
  AgeCat     n
  <chr>  <int>
1 18–25    100
2 26–40    100
3 41+      100
Table 7.2: Count of Platforms for TV Data
count(tvdata, Platform)
# A tibble: 5 × 2
  Platform     n
  <chr>    <int>
1 Amazon      54
2 Disney+     61
3 Hulu        46
4 Netflix    111
5 Other       28
Table 7.3: Age and TV Platform Data
contingtab <- table(tvdata$AgeCat, tvdata$Platform)
contingtab
       
        Amazon Disney+ Hulu Netflix Other
  18–25      4      22   23      47     4
  26–40     11      25   16      41     7
  41+       39      14    7      23    17

7.3 Visualization

stacked <- ggplot(tvdata, aes(x = AgeCat, fill = Platform)) +
  geom_bar(position = "fill") +
  labs(
    title = "Streaming Platform by Age Range",
    y = "Streaming Platform Proportion",
    x = "Age Range"
  ) +
  theme_fivethirtyeight()
stacked
Figure 7.1: Streaming Platform by Age Range
clustered <- ggplot(tvdata, aes(x = Platform, fill = AgeCat)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Platform Preference by Age Range",
    x = "Streaming Platform",
    y = "Respondent Number",
    fill = "Age Range"
  ) +
  theme_economist()
clustered
Figure 7.2: Platform Preference by Age Range

7.4 Chi-Square Test of Independence

chitest <- chisq.test(contingtab)
chitest

    Pearson's Chi-squared test

data:  contingtab
X-squared = 68.044, df = 8, p-value = 1.203e-11

A Chi-Square test of independence was conducted to examine the relationship between age range and streaming platform preference. The relationship was statistically significant, χ²(8, N = 300) = 68.04, p < .001. Consequently, we can reject the null hypothesis that platform preference is independent of age range.
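The test can be reproduced without the Excel file by rebuilding the observed counts from Table 7.3. The sketch below does that and also checks a common rule of thumb for this test, that every expected count is at least 5. The names `counts`, `check`, `min_expected`, `all_expected_ok`, and `df_manual` are illustrative and are not part of the original analysis.

```r
# Rebuild the observed counts from Table 7.3
# (rows = age group, columns = platform).
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

check <- chisq.test(counts)

# Rule of thumb (an assumption of this sketch, not stated in the report):
# the chi-square approximation is usually considered adequate when
# every expected count is at least 5.
min_expected    <- min(check$expected)
all_expected_ok <- all(check$expected >= 5)

# Degrees of freedom: (rows - 1) * (columns - 1).
df_manual <- (nrow(counts) - 1) * (ncol(counts) - 1)

check
min_expected
all_expected_ok
df_manual
```

Because each age group has 100 respondents, the expected counts are the same in every row, and the smallest one belongs to the Other column.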

7.5 Observed, Expected, and Residual Values

Table 7.4: Observed Values for TV Data Chi Test
observed <- chitest$observed
observed
       
        Amazon Disney+ Hulu Netflix Other
  18–25      4      22   23      47     4
  26–40     11      25   16      41     7
  41+       39      14    7      23    17
Table 7.5: Expected Values for TV Data Chi Test
expected <- chitest$expected
expected
       
        Amazon  Disney+     Hulu Netflix    Other
  18–25     18 20.33333 15.33333      37 9.333333
  26–40     18 20.33333 15.33333      37 9.333333
  41+       18 20.33333 15.33333      37 9.333333
Table 7.6: Residual Values for TV Data Chi Test
residuals <- chitest$residuals
residuals
       
            Amazon    Disney+       Hulu    Netflix      Other
  18–25 -3.2998316  0.3696106  1.9578900  1.6439899 -1.7457431
  26–40 -1.6499158  1.0349098  0.1702513  0.6575959 -0.7637626
  41+    4.9497475 -1.4045204 -2.1281413 -2.3015858  2.5095057
Note

I am using two standard deviations as the cutoff for “notable deviations,” so a cell counts as notable when the absolute value of its residual exceeds 2.

For 18–25 year olds, far fewer chose Amazon than expected (residual = -3.30). Hulu came close to the cutoff (1.96) but did not exceed it. For 26–40 year olds, there were no notable deviations. For 41+ year olds, substantially more than expected chose Amazon (4.95) and more than expected chose Other (2.51), while fewer than expected chose Hulu (-2.13) and Netflix (-2.30).
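The cutoff can also be applied in code rather than by reading the table. The following is a minimal sketch that recomputes the Pearson residuals from the counts in Table 7.3 and keeps only the cells whose absolute residual exceeds 2. The names `counts`, `fit`, `pearson`, `cutoff`, and `flagged` are illustrative.

```r
# Observed counts from Table 7.3 (rows = age group, columns = platform).
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

fit <- chisq.test(counts)

# Pearson residual: (observed - expected) / sqrt(expected).
# This is the same quantity that chisq.test() stores in fit$residuals.
pearson <- (fit$observed - fit$expected) / sqrt(fit$expected)

# Keep only the cells beyond the cutoff, largest deviation first.
cutoff  <- 2
flagged <- as.data.frame(as.table(pearson), responseName = "residual")
flagged <- flagged[abs(flagged$residual) > cutoff, ]
flagged <- flagged[order(-abs(flagged$residual)), ]
flagged
```

`chisq.test()` also returns standardized residuals in `fit$stdres`. Those are scaled to be closer to a standard normal distribution, so they may be a better match for a "two standard deviations" rule than the Pearson residuals used here.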

7.6 Contributions to the Chi-Square Statistic

Table 7.7: Cell Contributions to Chi Square Statistic
cellcontributions <- (observed - expected)^2 / expected
cellcontributions
       
             Amazon     Disney+        Hulu     Netflix       Other
  18–25 10.88888889  0.13661202  3.83333333  2.70270270  3.04761905
  26–40  2.72222222  1.07103825  0.02898551  0.43243243  0.58333333
  41+   24.50000000  1.97267760  4.52898551  5.29729730  6.29761905
chi_sq_total <- as.numeric(chitest$statistic)
Table 7.8: Cell Contributions to Chi Square Statistic (Percent of Total)
cellcontrib_pct <- 100 * cellcontributions / chi_sq_total
round(cellcontrib_pct, 1)
       
        Amazon Disney+ Hulu Netflix Other
  18–25   16.0     0.2  5.6     4.0   4.5
  26–40    4.0     1.6  0.0     0.6   0.9
  41+     36.0     2.9  6.7     7.8   9.3
pheatmap(
  cellcontrib_pct,
  cluster_rows = FALSE,
  cluster_cols = FALSE,
  display_numbers = TRUE,
  number_format = "%.1f",
  main = "Contribution of Each Cell to Total Chi-Square (in Percent)"
)
Figure 7.3: Contribution of Each Cell to Total Chi-Square (in Percent)

The cell that contributes the most is Amazon for 41+ year olds, at 36.0% of the total statistic. One possible explanation is that Amazon has more content geared toward older adults than the other platforms, or that it enables access to a greater variety of subsidiary platforms that appeal to them (PBS Documentaries, Max, etc.). Respondents aged 41+ may also use Amazon more frequently because they may be more likely to have Amazon Prime accounts already.
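To see how concentrated the association is, the percentage contributions can be summed by platform and by age group. This is a small sketch built on the counts in Table 7.3; `counts`, `fit`, `contrib`, `pct`, `by_platform`, and `by_age` are illustrative names.

```r
# Observed counts from Table 7.3 (rows = age group, columns = platform).
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

fit <- chisq.test(counts)

# Each cell's contribution to the statistic, then as a percent of the total.
contrib <- (fit$observed - fit$expected)^2 / fit$expected
pct     <- 100 * contrib / sum(contrib)

# Share of the statistic attributable to each platform and each age group,
# sorted from largest to smallest.
by_platform <- sort(colSums(pct), decreasing = TRUE)
by_age      <- sort(rowSums(pct), decreasing = TRUE)

round(by_platform, 1)
round(by_age, 1)
```

Summing this way shows that the Amazon column accounts for more than half of the statistic, and that the 41+ row accounts for the largest share among the age groups.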

7.7 Effect Size (Cramer’s V)

n <- sum(contingtab)
chi_sq <- as.numeric(chitest$statistic)
# Named n_rows / n_cols so that `c` does not share a name with base R's c().
n_rows <- nrow(contingtab)
n_cols <- ncol(contingtab)

cramers_v <- sqrt(chi_sq / (n * min(n_rows - 1, n_cols - 1)))
cramers_v
[1] 0.3367584

Cramer’s V = .34, indicating a moderate association between age and platform preference.
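The calculation can be wrapped in a small helper so it can be reused on other tables. The sketch below is one way to do that. `cramers_v_from_table` is a hypothetical helper written for this illustration, not a function from any package. The benchmarks in the comments are Cohen's conventional values for the effect size w (.10 small, .30 medium, .50 large), which are rules of thumb rather than fixed thresholds.

```r
# Observed counts from Table 7.3 (rows = age group, columns = platform).
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

# Hypothetical helper: Cramer's V = sqrt(chi-square / (n * k)),
# where k = min(rows - 1, columns - 1).
cramers_v_from_table <- function(tab) {
  test <- chisq.test(tab, correct = FALSE)
  n <- sum(tab)
  k <- min(nrow(tab) - 1, ncol(tab) - 1)
  sqrt(as.numeric(test$statistic) / (n * k))
}

v <- cramers_v_from_table(counts)

# Cohen's w = sqrt(chi-square / n), which equals V * sqrt(k).
# Conventional benchmarks for w: .10 small, .30 medium, .50 large.
k <- min(nrow(counts) - 1, ncol(counts) - 1)
w <- v * sqrt(k)

round(v, 2)
round(w, 2)
```

On this scale, w for the table falls between the medium and large benchmarks, which is consistent with describing the association as moderate.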

7.8 Final Interpretation

The Chi-Square test revealed a significant relationship between age and platform preference, χ²(8, N = 300) = 68.04, p < .001. The largest contributions came from the 41+ Amazon viewers and the 18–25 Amazon viewers. Far more 41+ viewers used Amazon for streaming than expected, while far fewer 18–25 viewers did. 41+ year olds watching Other also accounted for a substantial share of the association (9.3%). Cramer’s V was .34, indicating a moderate association between age and streaming preference. Because the association is moderate rather than strong, it is likely that other factors (stylistic preference, cost, etc.) are also important.

To better understand the mechanisms behind these deviations, streaming services should conduct further survey research on user preferences. In particular, they should consider the subsidiary streaming services that Amazon Prime viewers might be subscribing to through Amazon Prime, as well as the exact services within the Other category. It is possible that there is a fair amount of overlap between subsidiary Amazon platforms and Other.