
How to use Google’s Indexing API in bulk with Python

March 7, 2023 - 7 min reading time - by Morteza Najafi

Gone (well, almost) are the days when you needed to wait around for Googlebot to crawl your site, discover your new or freshly updated pages, and then proceed to index them. Initially released in 2018 and reserved for job posting URLs, Google’s Indexing API allows site owners to directly notify Google when they have added or removed pages on their site.

The Indexing API can be used to:

  • Update URLs in the index and keep search results up-to-date for higher-quality user traffic.
  • Remove any old or unnecessary URLs from the index.
  • Send batch requests to reduce the number of times you call the API.

 

As you may know, in the Indexing API Quickstart, Google recommends using the Indexing API instead of a sitemap because, compared to sitemaps, the API prompts Googlebot to crawl new or updated pages sooner.


In this article, we will look at how to use Python to build a script that sends your site’s URL indexing requests to Google in bulk and asks Google to crawl your pages faster.

As a bonus, you can also use this script to send requests for pages you have updated or want to remove from Google.

Libraries we’ll need

To build this script in Python we will use Google Colab and we will also need the following libraries:

  • oauth2client
  • googleapiclient (installed as google-api-python-client)
  • httplib2
  • json (built into Python)
  • google.colab (for uploading files in Colab)
  • os (built into Python)

To install these libraries on Google Colab, you can use the following command:

 !pip install oauth2client

Or, if you wish to install the libraries on Windows, you will need to enter the following command in ‘Command Prompt’, or in ‘Terminal’ if you are using macOS:

 pip install oauth2client
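If any of the other packages are missing from your environment, they can most likely be installed the same way. As a hedged example, assuming the standard PyPI package names for the Google API client and httplib2:

pip install oauth2client google-api-python-client httplib2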

Using the libraries

After installing the necessary libraries, you will need the following code to import and use them:

from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build
from googleapiclient.http import BatchHttpRequest
import httplib2
import json
from google.colab import files
import os

Preparing the URLs

Next, we need to copy our URLs into a text file, which we will use to inform Google of our edits, new releases and deleted pages. Keep in mind that each batch request is limited to 100 URLs or fewer when using the Indexing API, and the API also has a daily quota. The text file should look like this:

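For illustration, with hypothetical pages on example.com, the file simply lists one URL per line:

https://www.example.com/new-page
https://www.example.com/updated-page
https://www.example.com/old-page-to-remove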

In Google Colab, you can use the following code to upload the text file containing your URLs:

uploaded_file = files.upload()

After that, we need to create a dictionary and prepare the URLs for sending requests. We can do so using the following code:

# Read the uploaded file(s) and build a list of URLs
urls = []
for filename in uploaded_file.keys():
    lines = uploaded_file[filename].splitlines()
    for line in lines:
        urls.append(line.decode('utf-8'))

# Map each URL to the type of notification we want to send
requests = {}
for url in urls:
    requests[url] = "URL_UPDATED"
print(requests)

Note that this code builds the dictionary needed for updating existing content or publishing new content. If you need to remove URLs instead, you can simply use URL_DELETED in place of URL_UPDATED.
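For example, a minimal sketch of building the dictionary for removals instead, assuming the same urls list as above:

# Ask Google to drop these URLs from the index instead of updating them
requests = {}
for url in urls:
    requests[url] = "URL_DELETED"
print(requests)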

Creating and activating Indexing API

To create and activate the API, you will need to go to the Google Developer Console, click on ‘Select a project’ and use the ‘New project’ option to create a new project.


Then, you will need to pick a name for your project and select ‘Create’.


After creating the project, you will then need to select it from the project section of the menu, then select ‘IAM & Admin’ from the left menu and finally pick ‘Service accounts’.


After this step, click on ‘Create service account’ and proceed to create your account. In the first section, pick a name for your account, then click on ‘Create and continue’. Once completed, you can move on to the second step.

In the ‘Grant this service account access to project’ section, pick a role for your account: from the Quick access menu, under the ‘Basic’ section, choose ‘Owner’. Then click on ‘Continue’; in the next step you won’t need to change anything, just click on ‘Done’.


On the page that opens, save the email address shown in the ‘Email’ field, because we are going to need it later. Then click on ‘Actions’ and select ‘Manage keys’.


On the page that opens, from the ‘Add key’ section, click on ‘Create new key’, choose the JSON key type, and save the downloaded file.


Now is the time to activate the Indexing API; to do so, from the ‘APIs & Services’ section, click on ‘Enable APIs & services’.


On the next page, search for ‘Indexing API’. Once you have selected it, click on ‘Enable’ to activate the API.

Adding an email to Search Console

To use the Indexing API, you need to give the service account access to your website’s property in Google Search Console. To do so, open Google Search Console, go to ‘Settings’, click on ‘Users and permissions’ and then add a new user from the ‘Add user’ section. When the new page opens up, enter the email you saved earlier and change its permission to ‘Owner’.


If you didn’t save your email, you will just need to go back to service accounts and copy the email.

The email is also in the JSON file that you downloaded earlier; you can find it by opening the file and looking for the ‘client_email’ field.
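If you prefer to read it programmatically, the json library we imported earlier can do this. A minimal sketch, assuming a hypothetical key file named service_account.json (your downloaded key will have a different name):

# Print the service account email stored in the JSON key file
with open('service_account.json') as key_file:
    print(json.load(key_file)['client_email'])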


Uploading the JSON file

To upload the JSON file in Google Colab, you will need to use the following code:

JsonKey = files.upload()

Then, you will have to find the path the file has been uploaded to. This can be done using the os library. You can use the following code; to make sure the file has actually been uploaded, the block starts with an ‘if’ check:

if JsonKey:
    # List the JSON files in Colab's working directory and keep the first one
    path_to_json = '/content'
    json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
    path = "/content/" + json_files[0]
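As an aside, since files.upload() returns a dictionary keyed by filename and Colab saves uploads to the current working directory, a simpler alternative sketch would be to take the key file’s name directly:

# The dictionary key is the uploaded file's name, which also works as a relative path
if JsonKey:
    path = list(JsonKey.keys())[0]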

Authorizing requests

As you may be aware, Google requires that any program or script using the Indexing API authorize its requests with OAuth 2.0.


To send authorized requests with OAuth 2.0, you need to declare the scope of access you are requesting; for the Indexing API, that is the scope below. For more information on this topic, you can refer to the Authorize Requests page.

SCOPES = [ "https://www.googleapis.com/auth/indexing" ]

Sending requests

Google explains on the Using the Indexing API page that requests must be sent with the POST method to one of the following endpoints:

If you want to send a single request:

ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

If you want to send batch requests:

ENDPOINT = "https://indexing.googleapis.com/batch"

In this script, each URL is added to the batch individually, but because we have already built a dictionary containing up to 100 URLs, all of the requests are sent at once in a single batch.

Now, we need to create the authorized credentials. We do so with the oauth2client library’s ServiceAccountCredentials, using the following code:

credentials = ServiceAccountCredentials.from_json_keyfile_name(path,scopes=SCOPES)
http = credentials.authorize(httplib2.Http())
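As an aside, if you only wanted to notify Google about a single URL, the authorized http object above could be used to POST directly to the single-request endpoint. A minimal sketch, assuming a hypothetical URL:

# Send one notification to the single-request endpoint using the authorized httplib2 client
single_endpoint = "https://indexing.googleapis.com/v3/urlNotifications:publish"
payload = json.dumps({"url": "https://www.example.com/new-page", "type": "URL_UPDATED"})
response, content = http.request(single_endpoint, method="POST",
                                 headers={"Content-Type": "application/json"},
                                 body=payload)
print(content)

In this article, however, we stick with the batch approach below.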

In the next step, we build the service object, define a callback function and then assemble and execute the final batch request. For more information on how this code works, you can read the BatchHttpRequest class documentation.

The final code in this section looks like the following:

service = build('indexing', 'v3', credentials=credentials)

# Callback that prints the response (or the error) for each URL in the batch
def index_api(request_id, response, exception):
    if exception is not None:
        print(exception)
    else:
        print(response)

if JsonKey:
    batch = service.new_batch_http_request(callback=index_api)

    # Add one publish notification per URL to the batch request
    for url, api_type in requests.items():
        batch.add(service.urlNotifications().publish(
            body={"url": url, "type": api_type}))

    batch.execute()

After sending the requests, if everything was done correctly, the responses will be printed like the example below:

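A rough sketch of a single printed response, assuming a hypothetical URL and timestamp:

{'urlNotificationMetadata': {'url': 'https://www.example.com/new-page', 'latestUpdate': {'url': 'https://www.example.com/new-page', 'type': 'URL_UPDATED', 'notifyTime': '2023-03-07T10:00:00.000000000Z'}}}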

You can check the final Google Colab here.

Conclusion

Python is a pretty versatile tool that can help to automate certain tasks. It can be used to extract and analyze your site’s data and, when used properly, it can help you understand and improve how your site is crawled and indexed. I hope you’ll be able to make good use of this tutorial the next time you work with Google’s Indexing API.

Although the API is officially reserved for websites with pages that have a quick turnover – like job posting sites – it remains a powerful tool. And maybe, who knows, it could be expanded to cover more content types in the future, in which case this tutorial will prove to be even more useful.

Morteza Najafi
Morteza Najafi is an SEO specialist with a passion for automating tasks and analyzing data with Python. He focuses on technical and semantic SEO and is also interested in creating scripts to automate repetitive SEO tasks.