Analyzing Google Trends data

Yassaman Djafari, AMIMechE
5 min read · Jul 28, 2020

--

Google Trends lets you track search interest in a specific keyword from 2004 onward. It reports the term’s popularity as a percentage relative to its highest point. You can narrow the search to a specific country. You can also restrict it to one of several categories (e.g. arts, business, sports) or view the term’s popularity across all categories. Finally, it gives you a list of related keywords that are either the most searched or have recently been gaining public attention (top and rising related queries, respectively).

Remember that the search interest trends here are based on Google web search, Google News, Google Shopping, or YouTube.

PyTrends in Python

Python has an unofficial API package, PyTrends, that connects to Google Trends and pulls data directly from there for further analysis. PyTrends is available on PyPI and you can install it via pip.

If you are using Jupyter Notebook as your editor, you can install the package with the following command:

!python -m pip install --user pytrends

(the --user flag is needed when you do not have permission to install packages system-wide)

Trends’ Raw Data

Let’s take a look at the raw data and charts that Google gives you, and at what you can do to derive meaningful and helpful information from them.

You can enter a single keyword and look at its trend, or enter up to 5 keywords and compare their trends. You can also set the time range to see the keywords’ popularity over the past hour, past week, past month, past 5 years, or all the way back to 2004.

For instance, I want to write about exercise and bodybuilding, but I am not sure which term people search for most. So I started by comparing: exercise, workout, conditioning, training, and bodybuilding.
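The same comparison can also be requested through PyTrends instead of the website. The following is only a minimal sketch, not the code used for the figures in this article: the function name fetch_trends and its defaults are my own assumptions, it needs network access, and Google may rate-limit requests. In PyTrends, timeframe="all" asks for data since 2004, cat=0 means all categories, and an empty gprop means web search.

```python
# The five terms compared in this article.
KEYWORDS = ["exercise", "workout", "conditioning", "training", "bodybuilding"]


def fetch_trends(keywords=KEYWORDS, timeframe="all", geo="", cat=0, gprop=""):
    """Hypothetical helper: return (timeline, regions, related queries)."""
    keywords = list(keywords)
    if len(keywords) > 5:
        raise ValueError("Google Trends compares at most 5 keywords at a time")

    # Imported here so the rest of the script still loads without pytrends.
    from pytrends.request import TrendReq

    pytrends = TrendReq(hl="en-US", tz=0)
    pytrends.build_payload(
        keywords, cat=cat, timeframe=timeframe, geo=geo, gprop=gprop
    )
    timeline = pytrends.interest_over_time()    # DataFrame indexed by date
    regions = pytrends.interest_by_region(resolution="COUNTRY")
    related = pytrends.related_queries()        # dict with "top" and "rising"
    return timeline, regions, related
```

Calling fetch_trends() with no arguments requests the worldwide web-search timeline for all five terms. To restrict the request to a category such as sports, you would pass that category's numeric ID as cat; I have not listed IDs here, so look them up before relying on one.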

Figure 1. Trends timeline since 2004, in a specific category, based on Google web search data around the world

The first graph shows the trend over the chosen time range and location. It is a fluctuating line chart, and you can find the dates of the peaks and minimums by moving your cursor over the graph.

You can download the data associated with each graph (timeline, geoMap, and related queries) in .csv (comma-separated values) format. I split the timeline data into two separate graphs.

First, I averaged the search interest for each calendar month across the 16 years, to show in which of the 12 months of the year search popularity is highest.

Figure 2. Importing the required libraries and the .csv file data

Keep in mind that it is better to prepare your .csv file in advance so that the headers are clearly defined in the data frame. As you can see in the code, I selected the rows for January of each year, converted the row numbers to integers (they are floats by default), and then averaged the values. In the resulting graph, I highlighted the maximum search interest using the built-in maximum function and annotated it in the plot.

Figure 3. Average interest since 2004 in the term “training” in the sports category for each month; the maximum interest occurred in August
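For readers who want to reproduce the per-month averaging, here is a minimal sketch with pandas. It is not the original notebook code. It groups by calendar month rather than selecting rows by number, and it assumes the timeline export starts with a category line and a blank line followed by a Month column; the sample values are made up purely so the sketch can run.

```python
import io

import pandas as pd

# Made-up sample in the assumed layout of a Trends timeline export.
SAMPLE_CSV = """Category: Sports

Month,training: (Worldwide)
2004-01,40
2004-08,60
2005-01,50
2005-08,80
"""


def monthly_average(csv_source):
    """Average the interest of each calendar month across all years."""
    # Skip the category line; pandas drops the blank line by default.
    df = pd.read_csv(csv_source, skiprows=1)
    df.columns = ["month", "interest"]      # clear headers, as advised above
    df["month"] = pd.to_datetime(df["month"], format="%Y-%m")
    df["interest"] = pd.to_numeric(df["interest"], errors="coerce")
    return df.groupby(df["month"].dt.month)["interest"].mean()


def plot_monthly(averages):
    """Line plot of the monthly averages with the maximum annotated."""
    import matplotlib.pyplot as plt  # imported lazily; needs matplotlib

    fig, ax = plt.subplots()
    ax.plot(averages.index, averages.values, marker="o")
    peak = averages.idxmax()
    ax.annotate(f"max: {averages.max():.0f}", xy=(peak, averages.max()))
    ax.set_xlabel("Month of year")
    ax.set_ylabel("Average search interest (%)")
    return fig


averages = monthly_average(io.StringIO(SAMPLE_CSV))
peak_month = averages.idxmax()   # calendar month with the highest average
print(averages)
print("peak month:", peak_month)
```

With a real export you would pass the file path instead of the in-memory sample, then call plot_monthly(averages) to draw the chart.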

Second, to make sure the keyword has had an upward trend recently, I averaged the search interest over the 12 months of each year. This removes the local fluctuations and leaves a smoother line that shows the ascending or descending behavior. You may encounter an error here because of values below 1 percent. These values are stored as the text <1, so they cannot be used in arithmetic. You can either leave these columns out of the analysis or replace the <1 values with 0.5.
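The <1 replacement and the yearly averaging can be sketched as follows. Again, this is an illustration under assumptions rather than the original code: the column names and numbers are made up, and the helper name yearly_average is hypothetical.

```python
import io

import pandas as pd

# Made-up sample: one column contains the "<1" marker.
SAMPLE_CSV = """Month,training,conditioning
2004-01,40,<1
2004-02,60,2
2005-01,50,<1
2005-02,70,<1
"""


def yearly_average(csv_source, low_value=0.5):
    """Replace "<1" with low_value, then average each keyword per year."""
    # Read everything as text so "<1" does not break numeric parsing.
    df = pd.read_csv(csv_source, dtype=str)
    years = pd.to_datetime(df.pop("Month"), format="%Y-%m").dt.year
    values = df.replace("<1", str(low_value)).astype(float)
    return values.groupby(years.rename("year")).mean()


yearly = yearly_average(io.StringIO(SAMPLE_CSV))
print(yearly)
```

The result is itself a data frame with one row per year, which makes it easy to inspect the numbers before plotting them.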

Figure 4. Average search interest in each year since 2004

As you can see, I set the Y-axis range from 0% to 100% so that the popularity percentage is easier to read. I always prefer to save the computed values in a data frame so I can make sure no error has occurred during the process.

Table 1. Average search interest in each year since 2004
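A short sketch of both habits, fixing the Y-axis and keeping the numbers in a table, might look like this. The yearly values below are invented for illustration only, and plot_yearly is a hypothetical helper name.

```python
import pandas as pd

# Made-up yearly averages, only to make the sketch runnable.
yearly = pd.Series([45.0, 52.5, 61.0], index=[2004, 2005, 2006], name="training")


def plot_yearly(series):
    """Plot yearly averages on a fixed 0 to 100 percent axis."""
    import matplotlib.pyplot as plt  # imported lazily; needs matplotlib

    fig, ax = plt.subplots()
    ax.plot(series.index, series.values, marker="o")
    ax.set_ylim(0, 100)              # full percentage range, not auto-scaled
    ax.set_xlabel("Year")
    ax.set_ylabel("Average search interest (%)")
    return fig


# Keep the numbers next to the chart so they can be checked by eye.
table = yearly.to_frame("average interest (%)")
print(table)
```

Without the fixed limits, matplotlib scales the axis to the data, which can make a small change look like a steep trend.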

For the geoMap file, I preferred bar charts to show each city’s search interest in the specified keyword. Although this percentage is highly dependent on the city’s population, in digital marketing the more regions that are interested in the term, the larger the potential audience or client base.

Figure 5. Search interest based on region
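One possible way to prepare the geoMap data for such a bar chart is sketched below. The two-column layout is an assumption about the export, the city names and values are placeholders, and regional_interest and plot_regions are hypothetical helper names.

```python
import io

import pandas as pd

# Placeholder sample: "City C" has no value, as low-volume regions may not.
SAMPLE_CSV = """City,training
City A,100
City B,35
City C,
City D,62
"""


def regional_interest(csv_source):
    """Return regions with a recorded interest, highest first."""
    df = pd.read_csv(csv_source)
    df.columns = ["region", "interest"]
    df["interest"] = pd.to_numeric(df["interest"], errors="coerce")
    ranked = df.dropna(subset=["interest"]).sort_values(
        "interest", ascending=False
    )
    return ranked.set_index("region")["interest"]


def plot_regions(ranked):
    """Bar chart of search interest per region."""
    import matplotlib.pyplot as plt  # imported lazily; needs matplotlib

    fig, ax = plt.subplots()
    ax.bar(ranked.index, ranked.values)
    ax.set_ylabel("Search interest (%)")
    ax.tick_params(axis="x", rotation=90)
    return fig


ranked = regional_interest(io.StringIO(SAMPLE_CSV))
print(ranked)
print("regions with recorded interest:", len(ranked))
```

The count printed at the end is a rough proxy for how many regions show interest in the term.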

Results

Now that you have analyzed the data and reshaped it into a more tangible format, you can start planning based on the derived data. First, you need an overall chart to determine in which month the market receives your product or content with the most enthusiasm. Then make sure that the chosen keyword or product has a rising trend; based on that, you can predict and plan your product/content delivery for the next 6–12 months. You always want the best time to attract the highest level of attention. Always check the list of top and rising related queries: it gives you new ideas for future products/content, or for how to narrow your idea or change some aspects to make it more presentable.
