Google Search Competitor Analysis with Python & Flask

How I built a simple web app to analyze and summarize the top Google search results for any keyword

🧠 The Problem

When trying to write SEO-friendly content or understand what ranks well in search results, we often open dozens of tabs, scan articles manually, and take scattered notes. It’s slow, repetitive, and inefficient.

Whether you’re a blogger, digital marketer, content strategist, or curious researcher, there’s a real need for a tool that answers:

  • What kind of content is ranking for this keyword?
  • How detailed are the top articles?
  • What keywords and headings are they using?
  • Are they heavy on images, videos, or text?

💡 The Solution

I built a lightweight Flask web app that automates this whole process:

📌 Just type in a keyword and number of articles to analyze (10, 20, or 30) — and the app does the rest.
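Under the hood, that input flow is a standard Flask form round-trip. Here is a minimal sketch of the idea (simplified: it returns plain text instead of rendering the real index.html template, and analyze_keyword is a hypothetical stand-in for the actual scraping pipeline):

```python
from flask import Flask, request

app = Flask(__name__)

def analyze_keyword(keyword: str, count: int) -> str:
    # Hypothetical stand-in for the real scraping/NLP pipeline.
    return f"Analyzed top {count} results for '{keyword}'"

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        keyword = request.form.get("keyword", "")
        count = int(request.form.get("count", 10))  # 10, 20, or 30
        return analyze_keyword(keyword, count)
    return "Enter a keyword to analyze"
```

The real app renders a Bootstrap-styled template instead of returning a string, but the GET/POST split is the same.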


✅ The App Does the Following:

  • Searches Google for relevant URLs
  • Scrapes each article (text, metadata, images, headings, etc.)
  • Filters out short/low-quality content
  • Extracts keywords using spaCy NLP
  • Displays the results in a clean, scroll-free dashboard

⚙️ Tech Stack

Here’s what powers this tool:

  • Flask: Handles the web app’s structure, routing, and form processing.
  • Newspaper3k: Extracts clean article text and metadata from URLs.
  • spaCy: Performs keyword and bigram extraction using NLP.
  • BeautifulSoup: Parses HTML to get meta tags and count heading elements.
  • googlesearch-python: Retrieves relevant URLs from Google based on user queries.
  • Bootstrap: Provides styling and ensures responsive, mobile-friendly layout.

🧱 Features

  • 🔍 Google Search Integration: Uses real-time search results
  • ✂️ Article Filtering: Skips short/invalid articles (<500 words)
  • 🧠 NLP Keyword Extraction: Finds most frequent terms & bigrams
  • 🧾 Meta + Structure Analysis: Includes media count, headers, word count
  • 💻 Responsive UI: One-screen layout with dropdowns and styling
  • 🌀 Loading Indicator: So you know when it’s working
  • 🧾 User Query Display: Shows what keyword was searched
  • 📊 Keyword Summary on Top: High-level stats shown first
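The keyword and bigram extraction boils down to counting non-stopword tokens. The real app uses spaCy for tokenization and stopword filtering; the sketch below shows the same idea with only the standard library (the tiny stopword set is illustrative, not the one the app uses):

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "for"}

def top_keywords(text: str, n: int = 10) -> list[tuple[str, int]]:
    # Lowercase, keep alphabetic tokens, drop stopwords, count frequencies.
    tokens = [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]
    return Counter(tokens).most_common(n)

def top_bigrams(text: str, n: int = 10) -> list[tuple[str, int]]:
    # Pair each token with its successor, then count the pairs.
    tokens = [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]
    pairs = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(pairs).most_common(n)
```

spaCy buys you proper tokenization, lemmatization, and a full stopword list, but the counting logic is the same.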

🖼 UI Preview


🔧 How I Built It

The backend logic revolves around scraping and filtering articles from Google search results, like this:

from newspaper import Article

for url in search_results:
    try:
        article = Article(url)
        article.download()
        article.parse()
    except Exception:
        continue  # skip URLs that fail to download or parse

    if len(article.text.split()) >= 500:
        keywords = extract_keywords_spacy(article.text)
        metadata = extract_meta_tags_and_headings(article.html)
        ...

The front end uses Bootstrap for layout and a bit of JavaScript for the loading spinner. I also used <details> tags in the README and styled the UI to reduce horizontal scrolling.


🧾 Output Specifications of the app.py Flask App

Only high-quality articles are analyzed and summarized. Below are the detailed specifications of what the app processes and outputs:


✅ Inclusion Criteria

  • Minimum Word Count: Only articles with at least 500 words are accepted.
  • Parse Success: Articles must be successfully downloaded and parsed using newspaper3k.
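The word-count check is a small guard applied after a successful parse; a sketch (word_count_ok is my name for it, not necessarily the app's):

```python
MIN_WORDS = 500

def word_count_ok(text: str, minimum: int = MIN_WORDS) -> bool:
    # An article qualifies only if its body has at least `minimum` words.
    return len(text.split()) >= minimum
```

Articles that fail either criterion are silently skipped rather than shown with partial data.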

📊 Output Per Article

For each valid article, the app extracts and displays the following details:

  • Index: A serial number identifying the article in the result list.
  • URL: The direct link to the article.
  • Title: The article’s title, retrieved from the <title> tag.
  • Meta Description: A short description from the <meta name="description"> tag, if available.
  • Character Count: The total number of characters in the main text body of the article.
  • Word Count: The total number of words in the article (must be 500 or more).
  • Media Count: The total number of <img> and <video> elements found in the HTML.
  • Keywords: The top 10 most common words in the article, filtered using spaCy to exclude stopwords.
  • Bigrams: The top 10 most common two-word combinations (e.g., “climate change”).
  • Headings: The count of each HTML heading tag (H1 through H6), shown in a format like H1: 2, H2: 5, ....
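Counting headings is a one-liner with BeautifulSoup; for a dependency-free illustration, the same tally with the standard library's HTMLParser looks like this:

```python
from collections import Counter
from html.parser import HTMLParser

class HeadingCounter(HTMLParser):
    """Tallies h1-h6 opening tags while walking the HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.counts[tag.upper()] += 1

def count_headings(html: str) -> Counter:
    parser = HeadingCounter()
    parser.feed(html)
    return parser.counts
```

For example, count_headings("&lt;h1&gt;A&lt;/h1&gt;&lt;h2&gt;B&lt;/h2&gt;&lt;h2&gt;C&lt;/h2&gt;") yields H1: 1, H2: 2, which the app formats into the H1: 2, H2: 5, ... style shown above.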

📈 Summary Row

At the bottom of the results table, a summary row shows the average values across all valid articles:

  • Average Character Count
  • Average Word Count
  • Average Media Count

Other fields, like keywords, bigrams, meta description, and headings, are not averaged; they are displayed as em dashes instead.
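Computing the summary row is simple column averaging across the accepted articles. A sketch (the dict field names are illustrative, not the app's exact keys):

```python
def summary_row(articles: list[dict]) -> dict:
    # Average the three numeric columns across all accepted articles.
    n = len(articles)
    return {
        "char_count": sum(a["char_count"] for a in articles) / n,
        "word_count": sum(a["word_count"] for a in articles) / n,
        "media_count": sum(a["media_count"] for a in articles) / n,
    }
```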


📌 Extra Output

  • Top 10 Keywords across all accepted articles.
  • Top 10 Bigrams across all accepted articles
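Aggregating across articles amounts to merging the per-article frequency counts before taking the global top 10. A sketch, assuming each article contributes a Counter of keyword frequencies:

```python
from collections import Counter

def overall_top(per_article_counts: list[Counter], n: int = 10) -> list[tuple[str, int]]:
    # Merge per-article frequency counters, then take the global top n.
    total = Counter()
    for counts in per_article_counts:
        total += counts
    return total.most_common(n)
```

The same function works for bigrams, since they are also stored as frequency counts.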

🌐 Try It Locally

You can run the app on your own machine:

git clone https://github.com/paarishaemilie/GoogleSearchCompetitorAnalysis.git
cd GoogleSearchCompetitorAnalysis
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python app.py

Then open http://localhost:5000 in your browser to use the app.


📦 File Structure

GoogleSearchCompetitorAnalysis/
├── app.py # Flask application logic
├── templates/
│ └── index.html # Frontend HTML with Bootstrap styling
├── requirements.txt # Python package list
└── README.md # Project documentation

🔗 GitHub Repo

You can find the full source code here:
👉 paarishaemilie/GoogleSearchCompetitorAnalysis


🚀 What’s Next?

Some cool features I’d love to add:

  • CSV/PDF export of results
  • Word cloud visualizations
  • Sentiment and tone analysis
  • Duplicate domain filtering
  • Timeline trend view across dates

🎯 Final Thoughts

This project taught me how to blend data scraping, natural language processing, and web development into a fast, visual tool. It’s simple but powerful — and could be extended in many ways for SEO research, journalism, or even academic purposes.

Give it a try, fork it, improve it — and let me know what you think!
