How to Use Python and the GSC API to Automate Your SEO Reporting

Table of Contents

If you are still exporting CSVs from Google Search Console every Monday morning, opening them in Excel, manually comparing this week’s impressions against last week’s, copying numbers into a client report template, and sending it before your 9am call — you are spending three to five hours a week on a process that a 60-line Python script can run in 45 seconds, automatically, on a schedule, and deliver to a Looker Studio dashboard, a Slack channel, or a formatted PDF without any human involvement whatsoever.

This is not a post about learning Python from scratch. It is a post about the specific, practical intersection between Python and the Google Search Console API that transforms SEO reporting from a manual, error-prone, agency-hours-consuming obligation into an automated intelligence layer that runs invisibly in the background and surfaces the right information at the right time.

The methodology here is used by SEO professionals at UK agencies and in-house marketing teams at British brands that cannot afford to have skilled analysts spending their Thursday afternoons copy-pasting data. It is accessible to anyone who can follow a set of instructions, install a library, and run a script — no prior Python experience required to get the foundational setup working.

Why Manual Search Console Reporting Is Costing UK Businesses More Than They Realise

The cost of manual SEO reporting is not just the time it takes to produce. It is the analytical quality that gets sacrificed when the production process is so burdensome that the analyst is focused on completing the report rather than interpreting it.

When a report takes three hours to produce, the last thing anyone wants to do is spend another hour interrogating the data. The numbers get inserted, the template gets filled, the email gets sent. Insights that would have been visible to a fresh analyst looking at the data with genuine curiosity get buried under the fatigue of the production process.

Automated reporting inverts this entirely. When data extraction and formatting takes 45 seconds instead of three hours, the analyst’s entire cognitive budget is available for interpretation — for spotting the anomaly in impressions that started six weeks ago and correlates with a specific Google algorithm update, for identifying the keyword cluster that has quietly doubled its average position over the past month without anyone noticing, for catching the client’s most valuable page dropping from position three to position eleven before the client notices it themselves.

Beyond analytical quality, there is a competitive positioning argument. UK agencies that deliver automated, real-time SEO dashboards with alert-based anomaly detection are providing a fundamentally different service from agencies delivering manually assembled monthly PDF reports. The former demonstrates operational sophistication that justifies premium retainer pricing. The latter is increasingly commoditised.

The investment in building the automated reporting infrastructure — a day or two of setup, largely following the steps in this guide — pays back within the first month of operation for any agency or in-house team reporting on more than three client or site domains.

Understanding the Google Search Console API

The Google Search Console API is Google’s official programmatic interface for accessing the same performance data available in the Search Console UI — queries, pages, countries, devices, impressions, clicks, average position, and CTR — but without the limitations of the UI and with full programmable control over how the data is queried, filtered, and exported.

The key advantages of the API over the manual UI export:

Date range flexibility — The Search Console UI limits data exports to 1,000 rows per export and has a fixed 16-month data retention window. The API allows you to pull data across any date range within that window in a single programmatic request, and to store the results in your own database for indefinite retention.

Dimension combinations — The UI allows you to view data by one dimension at a time: queries, pages, countries, devices, or search type. The API allows you to combine up to five dimensions in a single query — for example, pulling query + page + device + country + date in one request, giving you the granular breakdown needed for sophisticated segmentation analysis.

Automation and scheduling — API requests can be triggered on a schedule via Python scripts running locally, via cloud functions (Google Cloud Functions, AWS Lambda), or via scheduled notebooks in Google Colab. The data flows automatically on whatever cadence you define.

Volume — Each API request returns up to 25,000 rows. Multiple requests can be paginated to pull complete datasets for large sites without the row limits that cap UI exports.

The API uses OAuth 2.0 for authentication, meaning your Python script authenticates as your Google account and accesses only the Search Console properties you own or have been given access to. It is secure, auditable, and fully aligned with Google’s terms of service.

Step 1: Setting Up Your Google Cloud Project and API Credentials

Before writing a single line of Python, you need to configure the Google Cloud project that will authenticate your API requests.

Create a Google Cloud Project:

Navigate to console.cloud.google.com. If you do not have a Google Cloud account, create one — it is free and does not require payment details for the services used here. Click “New Project,” give it a descriptive name (“SEO Reporting Automation” works), and create it.

Enable the Google Search Console API:

In your new project, navigate to “APIs & Services” → “Library.” Search for “Google Search Console API” and enable it. This takes about 30 seconds.

Create OAuth 2.0 Credentials:

Navigate to “APIs & Services” → “Credentials.” Click “Create Credentials” → “OAuth client ID.” If prompted, configure the OAuth consent screen first: select “External” as the user type, fill in the application name and your email address, and add your Google account email as a test user.

Back in Credentials, select “Desktop application” as the application type. Download the resulting JSON credentials file and save it as client_secrets.json in the directory where your Python scripts will live. This file contains the client ID and secret that allow your script to authenticate with Google’s OAuth system.

A note on service accounts for UK agencies:

If you are building automated reporting for client sites rather than your own, OAuth credentials require each client to complete an authentication flow, granting your application access to their Search Console property. For a cleaner agency setup, Google Cloud service accounts — which authenticate without interactive login — are preferable. Create a service account under “IAM & Admin” → “Service Accounts,” download its JSON key file, and add the service account’s email address as a user with “Full” permission in each client’s Search Console property. The script then uses the service account key rather than OAuth credentials, enabling fully automated access without client involvement in the authentication process.

Step 2: Installing Python Libraries and Authenticating

With credentials in place, install the required Python libraries. If Python is not already installed, download Python 3.10 or higher from python.org — on macOS and most Linux distributions, it comes pre-installed.

Open a terminal and run:

bash
pip install google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client pandas

These five libraries cover: Google’s authentication flow (google-auth, google-auth-oauthlib, google-auth-httplib2), the API client itself (google-api-python-client), and data manipulation (pandas).

Create a new Python file called gsc_auth.py and add the following authentication setup:

python
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from googleapiclient.discovery import build
import os
import pickle

SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly']

def authenticate_gsc():
    creds = None
    # Load saved credentials if they exist
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If no valid credentials, run the OAuth flow
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'client_secrets.json', SCOPES)
            creds = flow.run_local_server(port=0)
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('webmasters', 'v3', credentials=creds)
    return service

The first time you run this, a browser window opens asking you to log in with your Google account and grant the application access to your Search Console data. After completing this once, the credentials are saved in a token. pickle and subsequent runs authenticate silently without browser interaction — which is what enables scheduled automation.

Step 3: Writing Your First Data Query

With authentication working, you can query any Search Console property you have access to. The core API endpoint is the searchAnalytics.query method.

Here is a complete function for pulling the top 1,000 queries by clicks for a specified date range:

python
import pandas as pd
from datetime import datetime, timedelta

def get_query_data(service, site_url, start_date, end_date):
    """Pull query performance data from GSC API."""
    request_body = {
        'startDate': start_date,
        'endDate': end_date,
        'dimensions': ['query', 'page', 'device', 'country'],
        'rowLimit': 25000,
        'startRow': 0,
        'dimensionFilterGroups': [{
            'filters': [{
                'dimension': 'country',
                'operator': 'equals',
                'expression': 'gbr'  # Filter for UK traffic only
            }]
        }]
    }
    
    response = service.searchanalytics().query(
        siteUrl=site_url,
        body=request_body
    ).execute()
    
    rows = response.get('rows', [])
    
    data = []
    for row in rows:
        data.append({
            'query': row['keys'][0],
            'page': row['keys'][1],
            'device': row['keys'][2],
            'country': row['keys'][3],
            'clicks': row['clicks'],
            'impressions': row['impressions'],
            'ctr': round(row['ctr'] * 100, 2),
            'position': round(row['position'], 1)
        })
    
    return pd.DataFrame(data)

# Example usage
service = authenticate_gsc()
site_url = 'https://seosyrup.co.uk/'
end_date = datetime.today().strftime('%Y-%m-%d')
start_date = (datetime.today() - timedelta(days=28)).strftime('%Y-%m-%d')

df = get_query_data(service, site_url, start_date, end_date)
print(df.head(20))

Note the country: gbr filter. For UK businesses, filtering API responses to UK traffic specifically — rather than global totals — produces more actionable reporting, particularly for businesses whose brand queries have significant international volume that would otherwise dilute the UK-specific performance picture.

Step 4: Building the Automated Reporting Analyses

Raw data from the API becomes reporting value through the analyses built on top of it. Here are the five analyses that deliver the most consistent SEO insight for UK businesses, with the Python code for each.

Analysis 1: Position Band Migration — Who Has Moved Up or Down This Week?

This analysis compares average positions for your top 200 queries this week versus last week and categorises them as significant movers — up or down by two or more positions.

python
def position_movers(service, site_url):
    today = datetime.today()
    
    # This week
    this_week_end = today.strftime('%Y-%m-%d')
    this_week_start = (today - timedelta(days=7)).strftime('%Y-%m-%d')
    
    # Last week
    last_week_end = (today - timedelta(days=8)).strftime('%Y-%m-%d')
    last_week_start = (today - timedelta(days=14)).strftime('%Y-%m-%d')
    
    this_week = get_query_data(service, site_url, 
                               this_week_start, this_week_end)
    last_week = get_query_data(service, site_url, 
                               last_week_start, last_week_end)
    
    # Merge and calculate position change
    merged = this_week[['query', 'position', 'clicks', 'impressions']].merge(
        last_week[['query', 'position']].rename(
            columns={'position': 'position_last_week'}),
        on='query', how='inner'
    )
    
    merged['position_change'] = (
        merged['position_last_week'] - merged['position']
    )
    
    # Filter for significant movers (2+ positions)
    movers = merged[abs(merged['position_change']) >= 2].copy()
    movers = movers.sort_values('position_change', ascending=False)
    
    improvers = movers[movers['position_change'] > 0].head(10)
    decliners = movers[movers['position_change'] < 0].head(10)
    
    return improvers, decliners

improvers, decliners = position_movers(service, site_url)
print("Top improvers this week:")
print(improvers[['query', 'position', 'position_last_week', 
                  'position_change', 'clicks']].to_string())
print("\nTop decliners this week:")
print(decliners[['query', 'position', 'position_last_week', 
                  'position_change', 'clicks']].to_string())

This analysis, run weekly and delivered automatically to a Slack channel or email, is the single most time-sensitive SEO insight available. A page dropping from position four to position nine on a high-volume commercial query needs to be investigated immediately — not discovered three weeks later during a monthly report review.

Analysis 2: CTR Anomaly Detection — Pages Ranking Well But Converting Poorly

Pages ranking in positions one through five with below-average CTR are either suffering from poor title and meta description writing, being suppressed by SERP features (AI Overviews, featured snippets) that appear above them, or targeting queries where the intent does not match the page content.

python
def ctr_anomalies(df, position_threshold=5, ctr_threshold=5.0):
    """
    Identify pages ranking well but with suspiciously low CTR.
    Average UK CTR for position 1-3 is typically 15-30%.
    Pages below ctr_threshold in top positions warrant investigation.
    """
    high_position = df[
        (df['position'] <= position_threshold) & 
        (df['impressions'] >= 100)  # Minimum impressions for statistical relevance
    ].copy()
    
    anomalies = high_position[
        high_position['ctr'] < ctr_threshold
    ].sort_values('impressions', ascending=False)
    
    return anomalies[['query', 'page', 'position', 'ctr', 
                       'impressions', 'clicks']].head(20)

df = get_query_data(service, site_url, start_date, end_date)
anomalies = ctr_anomalies(df)
print("CTR anomalies (good rankings, low click-through):")
print(anomalies.to_string())

For a UK digital marketing agency, this analysis consistently surfaces title tag and meta description optimisation opportunities — pages ranking in positions two or three that are being ignored because the title is generic or the meta description fails to communicate a clear value proposition to a UK searcher.

Analysis 3: Keyword Cannibalisation Detector — Multiple Pages Competing for the Same Query

python
def cannibalisation_report(df, min_impressions=50):
    """
    Find queries where multiple pages are ranking simultaneously,
    indicating potential cannibalisation.
    """
    # Group by query and count unique pages ranking for each
    query_page_counts = df[df['impressions'] >= min_impressions].groupby(
        'query'
    ).agg(
        page_count=('page', 'nunique'),
        total_impressions=('impressions', 'sum'),
        avg_position=('position', 'mean'),
        pages=('page', lambda x: ', '.join(x.unique()))
    ).reset_index()
    
    # Filter for queries where more than one page ranks
    cannibalised = query_page_counts[
        query_page_counts['page_count'] > 1
    ].sort_values('total_impressions', ascending=False)
    
    return cannibalised.head(25)

cannibalisation = cannibalisation_report(df)
print("Potential cannibalisation issues:")
print(cannibalisation.to_string())

Real-world example: A UK professional services firm running this analysis discovered that their homepage and a blog post titled “What Does an Employment Solicitor Do?” were competing for the query “employment solicitor London” — with the homepage ranking at position six and the blog post at position eleven. Neither was optimised to be the canonical ranking target. After consolidating the intent onto a dedicated service page with proper internal linking from both the homepage and the blog post, the service page reached position three within eight weeks. The API-detected cannibalisation was invisible in manual Search Console browsing.

Analysis 4: Content Decay Tracker — Pages Losing Impressions Over Time

python
def content_decay_report(service, site_url):
    """
    Compare page-level impressions over two 28-day periods
    to identify pages with significant impression decline.
    """
    today = datetime.today()
    
    recent_end = today.strftime('%Y-%m-%d')
    recent_start = (today - timedelta(days=28)).strftime('%Y-%m-%d')
    prior_end = (today - timedelta(days=29)).strftime('%Y-%m-%d')
    prior_start = (today - timedelta(days=56)).strftime('%Y-%m-%d')
    
    recent_data = get_query_data(service, site_url, 
                                  recent_start, recent_end)
    prior_data = get_query_data(service, site_url, 
                                 prior_start, prior_end)
    
    # Aggregate to page level
    recent_pages = recent_data.groupby('page').agg(
        recent_impressions=('impressions', 'sum'),
        recent_clicks=('clicks', 'sum')
    ).reset_index()
    
    prior_pages = prior_data.groupby('page').agg(
        prior_impressions=('impressions', 'sum'),
        prior_clicks=('clicks', 'sum')
    ).reset_index()
    
    merged = recent_pages.merge(prior_pages, on='page', how='inner')
    merged['impression_change_pct'] = (
        (merged['recent_impressions'] - merged['prior_impressions']) / 
        merged['prior_impressions'] * 100
    ).round(1)
    
    # Pages with >20% impression decline and meaningful prior volume
    decaying = merged[
        (merged['impression_change_pct'] < -20) & 
        (merged['prior_impressions'] >= 500)
    ].sort_values('impression_change_pct')
    
    return decaying[['page', 'recent_impressions', 
                      'prior_impressions', 'impression_change_pct',
                      'recent_clicks']].head(20)

decay_report = content_decay_report(service, site_url)
print("Content decay alert — pages losing impressions:")
print(decay_report.to_string())

Step 5: Automating Delivery — Scheduling, Dashboards, and Alerts

Analyses that run manually on demand are useful. Analyses that run automatically and deliver outputs without human initiation are transformative. Here are three delivery mechanisms suited to different UK business contexts.

Scheduled execution with Windows Task Scheduler or cron (macOS/Linux)

For scripts running on a local machine or an always-on server, schedule your Python reporting script to run automatically on a defined cadence.

On macOS or Linux, add a cron job by running crontab -e and adding a line like:

bash
# Run SEO report every Monday at 7:30am
30 7 * * 1 /usr/bin/python3 /path/to/your/seo_report.py

On Windows, use Task Scheduler to create a basic task that runs python.exe with your script path as the argument, triggered weekly on Monday morning.

The script writes its output to a CSV, updates a Google Sheet via the Sheets API, or sends a formatted email via Python’s smtplib library — whichever delivery format your team or client prefers.

Google Looker Studio integration

The most polished reporting delivery for UK agency clients is a live Looker Studio dashboard connected directly to a Google Sheet that is updated automatically by your Python script. The workflow: your script writes weekly data to a dedicated Google Sheet (via the gspread Python library), Looker Studio reads from that Sheet as its data source, and the dashboard updates automatically whenever the Sheet is refreshed.

Clients access a live URL at any time and see current SEO performance data without requesting a report, without waiting for a human to produce one, and without any discrepancy between what they see and what their account manager sees.

This delivery model — a permanent, auto-updating dashboard — changes the nature of the client relationship. Monthly report reviews shift from “here is what happened last month” to “you can see the data any time; let’s discuss what we are doing about the interesting patterns.” That is a more strategic, more valuable conversation.

Automated Slack alerts for anomalies

For time-sensitive insights — a significant ranking drop on a commercial keyword, a spike in 404 errors in the log analysis, a CTR anomaly on a high-impression page — automated Slack alerts eliminate the latency between the data event and the response.

Using the slack-sdk Python library and a Slack incoming webhook URL from your workspace settings:

python
from slack_sdk import WebClient

def send_slack_alert(message, webhook_url):
    client = WebClient()
    client.chat_postMessage(
        channel='#seo-alerts',
        text=message
    )

# Example: Alert when a top-10 keyword drops 3+ positions
if not decliners.empty:
    top_decliner = decliners.iloc[0]
    alert_message = (
        f"⚠️ *SEO Alert — Position Drop Detected*\n"
        f"Query: `{top_decliner['query']}`\n"
        f"Position: {top_decliner['position_last_week']} → "
        f"{top_decliner['position']} "
        f"({top_decliner['position_change']:+.1f})\n"
        f"Weekly clicks: {int(top_decliner['clicks'])}"
    )
    send_slack_alert(alert_message, webhook_url)

A Monday morning Slack message listing the week’s top ranking movers, CTR anomalies, and content decay alerts replaces the manual reporting process entirely while delivering better, more timely information than the manual process ever could.

Step 6: Scaling to Multiple Sites

For UK agencies managing multiple client sites, the reporting infrastructure needs to operate across all properties simultaneously without manual intervention per site.

Structure your script to iterate over a list of site URLs and generate separate report outputs for each:

python
CLIENT_SITES = {
    'Client A': 'https://clienta.co.uk/',
    'Client B': 'https://clientb.com/',
    SEO Syrup': 'https://seosyrup.co.uk/
}

service = authenticate_gsc()

for client_name, site_url in CLIENT_SITES.items():
    print(f"\nGenerating report for {client_name}...")
    
    df = get_query_data(service, site_url, start_date, end_date)
    
    # Run all analyses
    improvers, decliners = position_movers(service, site_url)
    anomalies = ctr_anomalies(df)
    cannibalisation = cannibalisation_report(df)
    
    # Write to separate Google Sheet tabs per client
    # (gspread implementation here)
    
    print(f"✓ {client_name} report complete")

A script structured this way generates reports for ten client sites in the time it previously took to produce one — and every report contains more granular, more actionable analysis than a manually assembled monthly PDF could ever deliver.

The Compounding Advantage of Data-Driven SEO Operations

The businesses and agencies that invest in API-driven SEO reporting infrastructure do not just save time. They build a compounding analytical advantage over competitors operating on manual, monthly reporting cycles.

When you can detect a ranking drop within 48 hours rather than three weeks, you can investigate and respond before the client notices and before the traffic impact is severe. When your content decay analysis runs automatically every week, you catch declines at 20% impressions loss rather than 60%. When cannibalisation is flagged algorithmically across your entire site, you find structural issues that would never surface in a routine monthly review.

This operational advantage is not visible to clients in the form of a flashy presentation or a compelling pitch deck. It is visible in results: faster responses to algorithm updates, fewer surprises in quarterly reviews, and a pattern of catching and fixing issues before they become problems. That track record — delivered consistently over a 12-month retainer — is what generates referrals, renewals, and the kind of client relationships that underpin a sustainable UK agency business.

The Python and Search Console API setup described in this guide takes one to two days to build from scratch. Every week thereafter, it returns that investment multiple times over in analyst time saved and insight quality gained.

Want Automated SEO Reporting That Actually Tells You Something?

At SEO Syrup, we build custom SEO reporting automation for UK businesses and agencies — from Search Console API integrations through to multi-site Looker Studio dashboards with anomaly alerting. We design the analyses around your specific business questions, not generic templates, and we build the infrastructure to run reliably on your hosting setup without ongoing maintenance overhead.

If your current SEO reporting is manual, monthly, and primarily backwards-looking, you are missing the real-time intelligence layer that separates reactive SEO management from proactive performance optimisation.

Book your free consultation today →

Tell us about your current reporting setup, your site volume, and the SEO questions you most need answered on a regular basis — and we will show you exactly what an automated reporting architecture would look like for your specific business, and what it would take to build it.

Boost Your Rankings & Get Found on Google

Grow your business with powerful SEO strategies that drive real traffic, leads, and conversions. Let’s turn your website into a consistent growth machine.

 

Ready to Grow Your Online Visibility?

Get expert SEO, paid ads, and digital marketing solutions tailored to your business goals. Start attracting the right customers today with proven strategies.