started 28th June 2021
Google Analytics gives page views per day for websites. Looking at the data, I thought it would be fun to see whether Fourier transforming it would pick out the periodicity.
The data runs from 26th June 2011 to 26th June 2021, 3654 values (2012, 2016 and 2020 were leap years). Saved from Analytics as .csv, the data has the format:
This is correct as far as the .csv format goes: a value is put in quotes if it contains a comma. But the numpy routine I tried did not implement this rule. If the number of views is less than 1000, Analytics saves it as:
All of which is why there is code to read and wrangle the values.
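As a rough sketch of that wrangling, Python's standard `csv` module does honour the quoting rule, so one option is to let it split the fields and then strip the thousands separators by hand. The column layout, the header row, the sample rows and the function name `read_page_views` are all assumptions made up for illustration; the real export may differ.

```python
import csv
import io

import numpy as np

# Hypothetical rows: the real Analytics export layout is an assumption here.
sample = 'Day Index,Pageviews\n6/26/11,"1,234"\n6/27/11,987\n6/28/11,"2,051"\n'


def read_page_views(text):
    """Return daily page views as a float array, dropping thousands separators."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the (assumed) header row
    views = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        # csv.reader has already removed the quotes; remove the comma too
        views.append(float(row[1].replace(",", "")))
    return np.array(views)


views = read_page_views(sample)
print(views)
```

For a real file, the same function could be fed the contents of the file instead of the `sample` string.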
This figure shows the page views against day (tick dates are shown in US format month/day/year):
To set the scene, Fourier analysis deconstructs a curve as a sum of sinusoidal curves of varying frequency. I read the data into an array and passed the array to the numpy fast Fourier transform function for real data. There's a helper function which calculates the frequencies. I used matplotlib to display the Fourier components against frequency, and the result makes no sense at all.
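A minimal sketch of that step, assuming the functions meant are `np.fft.rfft` and `np.fft.rfftfreq`. The `views` array is a synthetic stand-in (a yearly and a weekly sine plus noise), not the real page view data.

```python
import numpy as np

n_days = 3654
days = np.arange(n_days)

# Synthetic stand-in for the page views: a constant level, a cycle of
# 365.4 days (exactly 10 cycles in the record), a weekly cycle and noise.
rng = np.random.default_rng(0)
views = (
    1000
    + 200 * np.sin(2 * np.pi * days / 365.4)
    + 100 * np.sin(2 * np.pi * days / 7)
    + rng.normal(0, 20, n_days)
)

components = np.fft.rfft(views)           # complex Fourier components
freqs = np.fft.rfftfreq(n_days, d=1.0)    # frequencies in cycles per day
amplitudes = np.abs(components)

# Skip component 0 (the constant level) when looking for the strongest cycle
strongest = 1 + np.argmax(amplitudes[1:])
print(strongest, freqs[strongest])
```

With `d=1.0` the sample spacing is one day, which is why the frequencies come out in cycles per day.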
The only timescale here is the day. Frequencies have units of 1/Time, so the frequencies are things like 0.000273 cycles per day. Eh?
It is clearer to consider what is going on in terms of period. The data set is 3654 days long. The zeroth Fourier component has infinite period (or zero frequency) and represents a constant value. The first component has a period of 3654 days. The second a period of 3654/2 days. The third a period of 3654/3 days. And so on...
In terms of frequency, the components are equally spaced at intervals of 1/3654 cycles per day. The lowest frequency, 1/3654, takes 3654 days to complete one cycle. If one is looking for a long periodicity like a year, the corresponding frequency will be low, so it appears at the left hand side of the plot.
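The conversion from frequency to period can be sketched in a few lines. This assumes the frequencies come from `np.fft.rfftfreq`; the variable names are illustrative.

```python
import numpy as np

n_days = 3654
freqs = np.fft.rfftfreq(n_days, d=1.0)   # cycles per day

# Period in days is 1/frequency; component 0 has zero frequency,
# so give it an infinite period rather than dividing by zero.
periods = np.full_like(freqs, np.inf)
periods[1:] = 1.0 / freqs[1:]

# Component k has period n_days / k, so the component nearest a year is:
k_year = round(n_days / 365)
print(k_year, periods[k_year])
```

Plotting amplitude against `periods[1:]` rather than `freqs[1:]` puts the long cycles at the right of the plot and the weekly ones at the left.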
This figure shows the component amplitudes plotted against period in days. Periods of 365 and 365/2 days are shown with red vertical lines.
There is clearly periodicity at year and half year intervals.
At the other end of the spectrum, there is periodicity around weeks (seven days) and half weeks.
The data comes from a wiki devoted to ornamental plants. Plotting a 28 day rolling average shows that interest peaks about a month after the equinoxes - around mid April and October. Perhaps this is due to gardeners in both hemispheres trying to work out what is sprouting. Since there are far more viewers in the North, it is more likely that northern gardeners have two seasons of interest.
But is the twin-peaks-each-year phenomenon due to the two hemispheres or not? One way to find out is to look at page views for individual North and South hemisphere countries. Google Analytics has filters to do this, but they only apply to new data, which makes for a long term project. Quicker is to use the 'segment' feature. This is the 28 day average graph for the USA.
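One way to compute such a rolling average, sketched here with `np.convolve`; the function name `rolling_mean` is made up for illustration and is not necessarily how the figures were produced.

```python
import numpy as np


def rolling_mean(values, window=28):
    """Mean of each run of `window` consecutive values.

    Element i of the result is the mean of values[i : i + window],
    so the result is window - 1 elements shorter than the input.
    """
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


example = rolling_mean(np.arange(10.0), window=4)
print(example)
```

Element `i` of the result covers days `i` to `i + window - 1`, so it can be plotted against the last day of the window or against the middle of it; the choice shifts the curve along the date axis.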
And this is the graph for Australia.
Both have the twin peaks of Spring and Autumn. It is just about possible to argue that the big and small peak positions are swapped - gardeners Google more in Spring and Autumn, but Google most in Spring.
The rolling 28 day average values are plotted against the date at the end of each 28 day window, so the actual peaks are closer to the equinoxes than stated above.
I fiddled the figures slightly by picking a whole number of years. It is not as easy to pick out a period of 365 days if the data size is far from a multiple of 365.
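The effect can be illustrated with a pure 365 day sine wave: when the record holds a whole number of cycles, all of the amplitude lands in one component, otherwise it is smeared over its neighbours. The record lengths and the function name `peak_fraction` below are chosen purely for illustration.

```python
import numpy as np


def peak_fraction(n_days, period=365.0):
    """Largest normalised Fourier amplitude of a unit sine of the given period.

    A value near 1 means a single component captures the whole sine wave.
    """
    days = np.arange(n_days)
    signal = np.sin(2 * np.pi * days / period)
    amplitudes = 2.0 * np.abs(np.fft.rfft(signal)) / n_days
    return amplitudes.max()


whole_years = peak_fraction(3650)      # exactly 10 cycles of 365 days
ten_and_a_half = peak_fraction(3832)   # roughly 10.5 cycles
print(whole_years, ten_and_a_half)
```

The first value comes out at essentially 1, while the second is noticeably smaller because the sine falls between two components.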
Wait, there's more
The Fourier analysis gives both amplitude and phase information for the components. The phase shift is about 82 days, which relative to the start of the data is the 4th April.
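A sketch of how a phase can be turned into a number of days, assuming the phase is read off with `np.angle` and interpreted as the position of the component's peak. The test signal and its 100 day offset are made up so that the result can be checked; they say nothing about how the figure quoted above was derived.

```python
import numpy as np

n_days = 3654
k = 10                        # component number, period n_days / k
period = n_days / k           # 365.4 days
true_peak_day = 100.0         # hypothetical offset used to build the test signal

days = np.arange(n_days)
signal = np.cos(2 * np.pi * (days - true_peak_day) / period)

component = np.fft.rfft(signal)[k]
phase = np.angle(component)   # radians, in the range (-pi, pi]

# For a component A*cos(2*pi*k*n/N + phase) the peak is where the argument
# is zero, i.e. at n = -phase * period / (2*pi), taken modulo the period.
peak_day = (-phase * period / (2 * np.pi)) % period
print(peak_day)
```

The sign convention matters here: a negative phase means the peak falls after the start of the data, a positive one means it falls before.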
It is possible to see the phase is correct by plotting the components with the 28 day average, first the 365 day one:
Second the 365/2 day component:
Finally the two components combined.
Here the 28 day averaged data is plotted at a date in the middle of the 28 days - 14 days different to the earlier figures. The right hand y axis is for the component and the left hand y axis for the data. Code for plotting the data with the components is available below.
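Separately from that code, here is a rough sketch of how a single component can be turned back into a curve: zero every Fourier component except the one of interest and invert the transform. The function name `single_component` and the synthetic `views` array are assumptions for illustration.

```python
import numpy as np


def single_component(values, k):
    """Time-domain curve of Fourier component k alone (mean level excluded)."""
    spectrum = np.fft.rfft(values)
    isolated = np.zeros_like(spectrum)
    isolated[k] = spectrum[k]           # keep amplitude and phase of component k
    return np.fft.irfft(isolated, n=len(values))


n_days = 3654
days = np.arange(n_days)

# Synthetic stand-in: a constant level plus yearly and half-yearly cycles
views = (
    1000
    + 200 * np.cos(2 * np.pi * 10 * days / n_days)
    + 80 * np.cos(2 * np.pi * 20 * days / n_days + 1.0)
)

yearly = single_component(views, 10)        # period 3654/10 days
half_yearly = single_component(views, 20)   # period 3654/20 days
combined = yearly + half_yearly
print(combined[:3])
```

Because the constant level is left out, the components oscillate around zero, which is one reason to give them their own y axis; matplotlib's `Axes.twinx` is one way to do that.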