As President Trump received criticism for how he responded to the death of Senator John McCain, I noticed that his approval rating took a small but noticeable dip. Coincidence or correlation? This presidency has upended the traditional understanding of how Americans respond to political events. Is it possible to identify any signal or patterns in Trump’s approval ratings, or has fake news penetrated so deeply that the American public is responding to untracked rumors?
In this post, I link Google search trends for “Trump twitter” to his approval ratings. Scroll to the bottom if you want to skip the statistical details :)
Approval ratings: FiveThirtyEight compiles approval polls and calculates a daily weighted average to estimate Trump’s true approval rating. The data are available on FiveThirtyEight’s GitHub. I used the “All polls” model output. It is important to note that building a model on another model’s output is akin to making copies of a copy—it adds noise (and possibly bias). This is a key limitation of these data.
Google search trends: I used the gtrendsR package to extract Google Trends data for the search terms “Trump” and “Trump twitter.” Unfortunately, daily trends are only available when querying the most recent three months; any other query returns weekly data. This is the major limitation of the search trends data. While it is possible to interpolate between the weekly data points or aggregate the daily approval ratings to weeks, more granular data generally allows for better models. The “interest index” provided by Google is relative to all other searches and is scaled to the maximum value in the queried set.
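To make the "aggregate the daily approval ratings" option concrete, here is a minimal Python sketch (the original analysis used R, so this is only an illustration). The function name `weekly_mean` is hypothetical, and keying each week by the Sunday that starts it is my assumption, chosen so the weeks line up with weekly search data.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_mean(daily):
    """Average (date, value) pairs into weeks keyed by the Sunday starting each week.

    Assumption: weeks run Sunday through Saturday.
    """
    buckets = defaultdict(list)
    for day, value in daily:
        # date.weekday() is Monday=0 ... Sunday=6, so step back to the latest Sunday.
        week_start = day - timedelta(days=(day.weekday() + 1) % 7)
        buckets[week_start].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(buckets.items())}
```

Averaging throws away within-week movement, which is the trade-off against interpolating the weekly search data up to days.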
Let’s quickly take a look at these datasets. First is the FiveThirtyEight estimated approval rating for President Trump. Looks like it ranges from about 37% to 47%. I’ve included the confidence bands in this plot, but ignored them in the analysis.
Next are the Google search trends. I included some state-level data, which show remarkably similar trends despite very different political leanings. Thus I only used the Overall US “Trump Twitter” search data for the rest of the analysis. I chose the search term “Trump Twitter” over “Trump” since this is how many people (including myself) access the President’s Twitter account (rather than navigating through Twitter). A quick comparison of “Trump” vs. “Trump twitter” (not shown) suggested to me that “Trump twitter” may be more related to his approval rating.
When do people go to the President’s Twitter? A reasonable guess: when they hear he tweeted something controversial.
Lastly, here is a combined look at Approval rating and “Trump Twitter” search trends. Both have been normalized so we can look at them on a similar scale. It looks like there’s some relationship between the signals! We can see that spikes in search activity often correspond to dips or spikes in approval.
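The post does not say which normalization was used. One common choice is a z-score (subtract the mean, divide by the standard deviation), which puts both series on a unitless scale. A minimal sketch under that assumption, with a hypothetical function name:

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no spread; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Applying `zscore` separately to the approval series and to the search index lets them share one axis without changing the shape of either curve.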
Based on the last plot, it seems reasonable to build a regression model where a unit change in search activity leads to some change in approval. Or better yet, a lagged regression model where, for example, a unit change in search activity the week before leads to some change in approval.
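As a rough illustration of the lagged idea, the sketch below fits approval at time t against search activity at time t - lag using ordinary least squares. This is a deliberately simplified stand-in, not the ARIMA models fitted for this post, and the function name `lagged_ols` is hypothetical.

```python
def lagged_ols(search, approval, lag=1):
    """Fit approval[t] = intercept + slope * search[t - lag] by least squares.

    Both series must be equally long and aligned in time; lag must be >= 1.
    """
    if lag < 1 or len(search) != len(approval) or len(search) <= lag + 1:
        raise ValueError("need aligned series longer than lag + 1 and lag >= 1")
    x = search[:-lag]   # predictor: search activity `lag` steps earlier
    y = approval[lag:]  # response: approval at the later time
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("search series is constant over the fitting window")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

The slope is the "change in approval per unit change in last period's search activity" described above. Plain least squares ignores autocorrelation in the residuals, which is one reason a time series model is the more natural tool.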
I fit a number of ARIMA (autoregressive integrated moving average) time series lagged regression models to the data and they were lackluster. In general, these models identified some effect of search activity, but with very small coefficients.
Below is one such model, an ARIMA(0,1,1). I trained it on a subset of the data and forecasted the most recent period. The grey line represents the true approval rating and the blue line is the prediction based on search activity.
It doesn’t look great. While there may be some use to this model, I think a regression model is likely inappropriate for this problem because:
There can be a positive or negative effect on approval from a bump in search activity.
The magnitude of the effect is highly context-dependent. For example, a ridiculous but harmless tweet may generate a lot of traffic but not affect Pres. Trump’s approval rating.
A Random Walk with Pres. Trump
Okay, maybe we can’t predict Pres. Trump’s approval rating. Maybe his approval rating is just a random walk process, where at each point in time, his approval goes up or down by a draw from some distribution (e.g., a normal distribution).
If this were the case and we looked at changes in approval, we would see a symmetric squiggle. For a normal random walk, we would see the squiggle contained in a uniform band.
We definitely do not see a normal random walk. We can see “bursts” where there are large changes in approval. If we make a histogram of the changes in approval, we can see there is a “long left tail” and a non-symmetric distribution.
To me, this suggests Trump’s approval is not a random walk.
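One way to make this check less eyeball-driven is to compare the skewness of the observed changes with that of a simulated normal random walk, whose steps should have skewness near zero. The sketch below is a Python illustration with hypothetical function names; the starting value and step size are arbitrary choices, not estimates from the data.

```python
import random


def differences(series):
    """Step-to-step changes of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def skewness(values):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant series")
    return m3 / m2 ** 1.5


def simulate_random_walk(n_steps, start=42.0, step_sd=0.3, seed=0):
    """Random walk with independent normal steps (start and step_sd are arbitrary)."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(n_steps):
        walk.append(walk[-1] + rng.gauss(0.0, step_sd))
    return walk
```

A clearly negative `skewness(differences(approval))` alongside a near-zero value for the simulated walk would be consistent with the "long left tail" in the histogram.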
The Changepoints Paradigm
Here is another way to think about approval ratings:
Let’s assume that Trump’s approval is constant plus some noise in some intervals.
There are external shocks/bursts to his approval that lead to some new equilibrium.
This is a changepoint approach to modeling our problem. Changepoint detection has a long history in statistics, and interest in it has picked up again recently. The most basic changepoint model assumes independent normal observations with a change in mean (see Hinkley 1970). I used a non-parametric changepoint detection method based on empirical distributions (Haynes et al. 2017). So let’s change our model:
Assume that Trump’s approval follows some (fixed) distribution in intervals.
And let’s add another assumption: only large shocks above some threshold lead to a new equilibrium.
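For intuition, here is a sketch of the most basic version mentioned above: a single change in mean, found by trying every split point and keeping the one with the smallest total squared error. This is not the non-parametric method used in the analysis, and the function name is hypothetical.

```python
def best_mean_changepoint(series):
    """Return (index, cost) for the split series[:index] / series[index:]
    that minimizes the summed squared deviation from each segment's mean.

    Requires at least two observations.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    def sse(segment):
        mu = sum(segment) / len(segment)
        return sum((v - mu) ** 2 for v in segment)

    best_index, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index, best_cost
```

Finding several changepoints means applying this idea repeatedly, or using a penalized search, and adding a rule for when an extra changepoint is worth its cost.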
Looking at the search trends data, it looks like a search interest index above 40 may qualify as a large shock. I wrote an algorithm that attempts to identify only the beginning of a “shock,” since there may be sustained activity afterwards.
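The details of that algorithm are in the linked code. As a rough idea of what "only the beginning of a shock" could mean, the sketch below flags the first period in which the index rises above the threshold and ignores the periods that stay above it. The function name and the `min_gap` parameter are hypothetical and not taken from the original implementation.

```python
def shock_onsets(interest, threshold=40, min_gap=1):
    """Indices where the interest index first crosses above `threshold`.

    Consecutive above-threshold periods count as one shock. `min_gap` is a
    hypothetical extra guard: a new onset must be at least that many periods
    after the previous one.
    """
    onsets = []
    for i, value in enumerate(interest):
        above = value > threshold
        was_above = i > 0 and interest[i - 1] > threshold
        if above and not was_above:
            if not onsets or i - onsets[-1] >= min_gap:
                onsets.append(i)
    return onsets
```

Raising `min_gap` merges shocks that are separated by only a brief dip below the threshold.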
Putting this model together, each identified shock is marked by a vertical line and labeled with my best guess at the reason for the search traffic. The horizontal lines represent Trump’s mean approval between the identified changepoints.
Looks pretty good! The “shocks” in search activity are generally aligned with changepoints in approval rating. A back-of-the-envelope estimate of the RMSE when using the search “shocks” to locate changepoints in approval is 13.9 days. This suggests there is room for improvement, perhaps by attempting to measure sentiment on social media. But not everyone tweets, and almost everyone googles!
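The post does not spell out how that RMSE was computed. One plausible reading is to match each approval changepoint to the nearest search shock and take the root mean squared gap in days. The sketch below follows that assumed definition, and the function name is hypothetical.

```python
from datetime import date
from math import sqrt


def rmse_days(shock_dates, changepoint_dates):
    """Root mean squared gap, in days, from each changepoint to its nearest shock.

    Assumed definition; the original calculation may differ.
    """
    if not shock_dates or not changepoint_dates:
        raise ValueError("need at least one shock and one changepoint")
    squared_gaps = []
    for changepoint in changepoint_dates:
        nearest = min(abs((changepoint - shock).days) for shock in shock_dates)
        squared_gaps.append(nearest ** 2)
    return sqrt(sum(squared_gaps) / len(squared_gaps))
```

Under this definition an unmatched shock costs nothing, so a stricter version would also penalize shocks that have no changepoint nearby.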
This analysis suggests that when the Google search interest index for “Trump Twitter” exceeds 40 (see note below), we should expect to see a change in Trump’s approval rating either up or down.
Political scientists have long theorized that the “bully pulpit” (i.e., the fact that media will report the president’s speeches, press releases, etc.) serves as an “agenda setting” tool for presidents. It is unprecedented for a president to use the bully pulpit to damage their own popularity and agenda. This analysis suggests that is the case for Pres. Trump.
Note on Google Trends Search: Because of how trends are adjusted, the time period that you select (and any comparisons to other search terms) will determine what this threshold value is. In my query, it is approximately the value for April 8th, 2018.
Code for this analysis can be found on my GitHub.