Google Search Competitor Analysis with Python & Flask

How I built a simple web app to analyze and summarize the top Google search results for any keyword

🧠 The Problem

When trying to write SEO-friendly content or understand what ranks well in search results, we often open dozens of tabs, scan articles manually, and take scattered notes. It’s slow, repetitive, and inefficient.

Whether you’re a blogger, digital marketer, content strategist, or curious researcher, there’s a real need for a tool that answers:

  • What kind of content is ranking for this keyword?
  • How detailed are the top articles?
  • What keywords and headings are they using?
  • Are they heavy on images, videos, or text?

💡 The Solution

I built a lightweight Flask web app that automates this whole process:

📌 Just type in a keyword and number of articles to analyze (10, 20, or 30) — and the app does the rest.
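Under the hood, that input flow is a standard Flask form round-trip. Here is a minimal sketch of the idea (simplified: it returns plain text instead of rendering the real index.html template, and analyze_keyword is a hypothetical stand-in for the actual scraping pipeline):

```python
from flask import Flask, request

app = Flask(__name__)

def analyze_keyword(keyword: str, count: int) -> str:
    # Hypothetical stand-in for the real scraping/NLP pipeline.
    return f"Analyzed top {count} results for '{keyword}'"

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        keyword = request.form.get("keyword", "")
        count = int(request.form.get("count", 10))  # 10, 20, or 30
        return analyze_keyword(keyword, count)
    return "Enter a keyword to analyze"
```

The real app renders a Bootstrap-styled template instead of returning a string, but the GET/POST split is the same.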


✅ The App Does the Following:

  • Searches Google for relevant URLs
  • Scrapes each article (text, metadata, images, headings, etc.)
  • Filters out short/low-quality content
  • Extracts keywords using spaCy NLP
  • Displays the results in a clean, scroll-free dashboard

⚙️ Tech Stack

Here’s what powers this tool:

  • Flask: Handles the web app’s structure, routing, and form processing.
  • Newspaper3k: Extracts clean article text and metadata from URLs.
  • spaCy: Performs keyword and bigram extraction using NLP.
  • BeautifulSoup: Parses HTML to get meta tags and count heading elements.
  • googlesearch-python: Retrieves relevant URLs from Google based on user queries.
  • Bootstrap: Provides styling and ensures responsive, mobile-friendly layout.

🧱 Features

  • 🔍 Google Search Integration: Uses real-time search results
  • ✂️ Article Filtering: Skips short/invalid articles (<500 words)
  • 🧠 NLP Keyword Extraction: Finds most frequent terms & bigrams
  • 🧾 Meta + Structure Analysis: Includes media count, headers, word count
  • 💻 Responsive UI: One-screen layout with dropdowns and styling
  • 🌀 Loading Indicator: So you know when it’s working
  • 🧾 User Query Display: Shows what keyword was searched
  • 📊 Keyword Summary on Top: High-level stats shown first
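The keyword and bigram extraction boils down to counting non-stopword tokens. The real app uses spaCy for tokenization and stopword filtering; the sketch below shows the same idea with only the standard library (the tiny stopword set is illustrative, not the one the app uses):

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "for"}

def top_keywords(text: str, n: int = 10) -> list[tuple[str, int]]:
    # Lowercase, keep alphabetic tokens, drop stopwords, count frequencies.
    tokens = [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]
    return Counter(tokens).most_common(n)

def top_bigrams(text: str, n: int = 10) -> list[tuple[str, int]]:
    # Pair each token with its successor, then count the pairs.
    tokens = [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]
    pairs = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(pairs).most_common(n)
```

spaCy buys you proper tokenization, lemmatization, and a full stopword list, but the counting logic is the same.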

🖼 UI Preview


🔧 How I Built It

The backend logic revolves around scraping and filtering articles from Google search results, like this:

from newspaper import Article

for url in search_results:
    try:
        article = Article(url)
        article.download()
        article.parse()
    except Exception:
        continue  # skip URLs that fail to download or parse

    if len(article.text.split()) >= 500:
        keywords = extract_keywords_spacy(article.text)
        metadata = extract_meta_tags_and_headings(article.html)
        ...

The front end uses Bootstrap for layout and a bit of JavaScript for the loading spinner. I also used <details> tags in the README and styled the UI to reduce horizontal scrolling.


🧾 Output Specifications of the app.py Flask App

Only high-quality articles are analyzed and summarized. Below are the detailed specifications of what the app processes and outputs:


✅ Inclusion Criteria

  • Minimum Word Count: Only articles with at least 500 words are accepted.
  • Parse Success: Articles must be successfully downloaded and parsed using newspaper3k.
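The word-count check is a small guard applied after a successful parse; a sketch (word_count_ok is my name for it, not necessarily the app's):

```python
MIN_WORDS = 500

def word_count_ok(text: str, minimum: int = MIN_WORDS) -> bool:
    # An article qualifies only if its body has at least `minimum` words.
    return len(text.split()) >= minimum
```

Articles that fail either criterion are silently skipped rather than shown with partial data.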

📊 Output Per Article

For each valid article, the app extracts and displays the following details:

  • Index: A serial number identifying the article in the result list.
  • URL: The direct link to the article.
  • Title: The article’s title, retrieved from the <title> tag.
  • Meta Description: A short description from the <meta name="description"> tag, if available.
  • Character Count: The total number of characters in the main text body of the article.
  • Word Count: The total number of words in the article (must be 500 or more).
  • Media Count: The total number of <img> and <video> elements found in the HTML.
  • Keywords: The top 10 most common words in the article, filtered using spaCy to exclude stopwords.
  • Bigrams: The top 10 most common two-word combinations (e.g., “climate change”).
  • Headings: The count of each HTML heading tag (H1 through H6), shown in a format like H1: 2, H2: 5, ....
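Counting headings is a one-liner with BeautifulSoup; for a dependency-free illustration, the same tally with the standard library's HTMLParser looks like this:

```python
from collections import Counter
from html.parser import HTMLParser

class HeadingCounter(HTMLParser):
    """Tallies h1-h6 opening tags while walking the HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.counts[tag.upper()] += 1

def count_headings(html: str) -> Counter:
    parser = HeadingCounter()
    parser.feed(html)
    return parser.counts
```

For example, count_headings("&lt;h1&gt;A&lt;/h1&gt;&lt;h2&gt;B&lt;/h2&gt;&lt;h2&gt;C&lt;/h2&gt;") yields H1: 1, H2: 2, which the app formats into the H1: 2, H2: 5, ... style shown above.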

📈 Summary Row

At the bottom of the results table, a summary row shows the average values across all valid articles:

  • Average Character Count
  • Average Word Count
  • Average Media Count

Other fields, like keywords, bigrams, meta description, and headings, are not averaged; they are displayed as em dashes instead.
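Computing the summary row is simple column averaging across the accepted articles. A sketch (the dict field names are illustrative, not the app's exact keys):

```python
def summary_row(articles: list[dict]) -> dict:
    # Average the three numeric columns across all accepted articles.
    n = len(articles)
    return {
        "char_count": sum(a["char_count"] for a in articles) / n,
        "word_count": sum(a["word_count"] for a in articles) / n,
        "media_count": sum(a["media_count"] for a in articles) / n,
    }
```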


📌 Extra Output

  • Top 10 Keywords across all accepted articles.
  • Top 10 Bigrams across all accepted articles
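Aggregating across articles amounts to merging the per-article frequency counts before taking the global top 10. A sketch, assuming each article contributes a Counter of keyword frequencies:

```python
from collections import Counter

def overall_top(per_article_counts: list[Counter], n: int = 10) -> list[tuple[str, int]]:
    # Merge per-article frequency counters, then take the global top n.
    total = Counter()
    for counts in per_article_counts:
        total += counts
    return total.most_common(n)
```

The same function works for bigrams, since they are also stored as frequency counts.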

🌐 Try It Locally

You can run the app on your own machine:

git clone https://github.com/paarishaemilie/GoogleSearchCompetitorAnalysis.git
cd GoogleSearchCompetitorAnalysis
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python app.py

Then open http://localhost:5000 in your browser to use the app.


📦 File Structure

GoogleSearchCompetitorAnalysis/
├── app.py # Flask application logic
├── templates/
│ └── index.html # Frontend HTML with Bootstrap styling
├── requirements.txt # Python package list
└── README.md # Project documentation

🔗 GitHub Repo

You can find the full source code here:
👉 paarishaemilie/GoogleSearchCompetitorAnalysis


🚀 What’s Next?

Some cool features I’d love to add:

  • CSV/PDF export of results
  • Word cloud visualizations
  • Sentiment and tone analysis
  • Duplicate domain filtering
  • Timeline trend view across dates

🎯 Final Thoughts

This project taught me how to blend data scraping, natural language processing, and web development into a fast, visual tool. It’s simple but powerful — and could be extended in many ways for SEO research, journalism, or even academic purposes.

Give it a try, fork it, improve it — and let me know what you think!
