How I built a simple web app to analyze and summarize the top Google search results for any keyword
🧠 The Problem
When trying to write SEO-friendly content or understand what ranks well in search results, we often open dozens of tabs, scan articles manually, and take scattered notes. It’s slow, repetitive, and inefficient.
Whether you’re a blogger, digital marketer, content strategist, or curious researcher, there’s a real need for a tool that answers:
- What kind of content is ranking for this keyword?
- How detailed are the top articles?
- What keywords and headings are they using?
- Are they heavy on images, videos, or text?
💡 The Solution
I built a lightweight Flask web app that automates this whole process:
📌 Just type in a keyword and number of articles to analyze (10, 20, or 30) — and the app does the rest.
✅ The App Does the Following:
- Searches Google for relevant URLs
- Scrapes each article (text, metadata, images, headings, etc.)
- Filters out short/low-quality content
- Extracts keywords using spaCy NLP
- Displays the results in a clean, scroll-free dashboard
⚙️ Tech Stack
Here’s what powers this tool:
- Flask: Handles the web app’s structure, routing, and form processing.
- Newspaper3k: Extracts clean article text and metadata from URLs.
- spaCy: Performs keyword and bigram extraction using NLP.
- BeautifulSoup: Parses HTML to get meta tags and count heading elements.
- googlesearch-python: Retrieves relevant URLs from Google based on user queries.
- Bootstrap: Provides styling and ensures responsive, mobile-friendly layout.
🧱 Features
- 🔍 Google Search Integration: Uses real-time search results
- ✂️ Article Filtering: Skips short/invalid articles (<500 words)
- 🧠 NLP Keyword Extraction: Finds most frequent terms & bigrams
- 🧾 Meta + Structure Analysis: Includes media count, headers, word count
- 💻 Responsive UI: One-screen layout with dropdowns and styling
- 🌀 Loading Indicator: So you know when it’s working
- 🧾 User Query Display: Shows what keyword was searched
- 📊 Keyword Summary on Top: High-level stats shown first
🔧 How I Built It
The backend logic revolves around scraping and filtering articles from Google search results, like this:

from newspaper import Article, ArticleException

for url in search_results:
    try:
        article = Article(url)
        article.download()
        article.parse()
    except ArticleException:
        continue  # skip pages that fail to download or parse
    if len(article.text.split()) >= 500:
        keywords = extract_keywords_spacy(article.text)
        metadata = extract_meta_tags_and_headings(article.html)
        ...
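The loop relies on a helper called extract_keywords_spacy. The app itself presumably loads a full spaCy model; the version below is a simplified sketch of how such a helper might look, using a blank English pipeline (tokenizer plus stopword list, no model download) and counting adjacent filtered words as bigrams:

```python
from collections import Counter
import spacy

# Simplified sketch of extract_keywords_spacy. A blank English pipeline
# is enough for tokenization and stopword flags; no trained model needed.
nlp = spacy.blank("en")

def extract_keywords_spacy(text, top_n=10):
    """Return the top_n most frequent non-stopword terms and bigrams."""
    doc = nlp(text.lower())
    # Keep alphabetic, non-stopword tokens only.
    words = [t.text for t in doc if t.is_alpha and not t.is_stop]
    keywords = Counter(words).most_common(top_n)
    # Bigrams = adjacent pairs of the filtered words, joined with a space.
    bigrams = Counter(" ".join(p) for p in zip(words, words[1:])).most_common(top_n)
    return keywords, bigrams
```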
The front end uses Bootstrap for layout and a bit of JavaScript for the loading spinner. I also used <details> tags in the README and styled the UI to reduce horizontal scrolling.
🧾 Output Specifications of the app.py Flask App
Only high-quality articles are analyzed and summarized. Below are the detailed specifications of what the app processes and outputs:
✅ Inclusion Criteria
- Minimum Word Count: Only articles with at least 500 words are accepted.
- Parse Success: Articles must be successfully downloaded and parsed using newspaper3k.
📊 Output Per Article
For each valid article, the app extracts and displays the following details:
- Index: A serial number identifying the article in the result list.
- URL: The direct link to the article.
- Title: The article’s title, retrieved from the <title> tag.
- Meta Description: A short description from the <meta name="description"> tag, if available.
- Character Count: The total number of characters in the main text body of the article.
- Word Count: The total number of words in the article (must be 500 or more).
- Media Count: The total number of <img> and <video> elements found in the HTML.
- Keywords: The top 10 most common words in the article, filtered using spaCy to exclude stopwords.
- Bigrams: The top 10 most common two-word combinations (e.g., “climate change”).
- Headings: The count of each HTML heading tag (H1 through H6), shown in a format like H1: 2, H2: 5, ...
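The meta description, media count, and heading tallies come from the extract_meta_tags_and_headings helper referenced earlier. Here is a hypothetical BeautifulSoup-based sketch of how it might work (the return shape is my own illustration):

```python
from collections import Counter
from bs4 import BeautifulSoup

def extract_meta_tags_and_headings(html):
    """Pull the meta description, media count, and H1-H6 tallies from HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Meta description, if the page provides one.
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta["content"] if meta and meta.has_attr("content") else ""
    # Count <img> and <video> elements.
    media_count = len(soup.find_all(["img", "video"]))
    # Tally headings and format them like "H1: 2, H2: 5".
    headings = Counter(t.name for t in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"]))
    heading_summary = ", ".join(f"{k.upper()}: {v}" for k, v in sorted(headings.items()))
    return {"description": description, "media_count": media_count, "headings": heading_summary}
```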
📈 Summary Row
At the bottom of the results table, a summary row shows the average values across all valid articles:
- Average Character Count
- Average Word Count
- Average Media Count
Other fields like keywords, bigrams, meta description, and headings are not averaged, and are instead displayed as em dashes (—).
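The summary-row math itself is simple averaging. Assuming each parsed article is held as a dict with char_count, word_count, and media_count keys (my assumption, not the app's actual data model), it might look like:

```python
def summarize(articles):
    """Average the numeric columns across all valid articles."""
    n = len(articles)
    return {
        "avg_char_count": round(sum(a["char_count"] for a in articles) / n),
        "avg_word_count": round(sum(a["word_count"] for a in articles) / n),
        "avg_media_count": round(sum(a["media_count"] for a in articles) / n, 1),
    }
```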
📌 Extra Output
- Top 10 Keywords across all accepted articles.
- Top 10 Bigrams across all accepted articles.
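One way to produce these cross-article lists is to merge the per-article counts with a Counter. This is a sketch under the assumption that each article contributes a list of (term, count) pairs:

```python
from collections import Counter

def top_terms_overall(per_article_counts, top_n=10):
    """Merge per-article (term, count) pairs into one overall top-N list."""
    totals = Counter()
    for pairs in per_article_counts:
        totals.update(dict(pairs))
    return totals.most_common(top_n)
```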
🌐 Try It Locally
You can run the app on your own machine:
git clone https://github.com/paarishaemilie/GoogleSearchCompetitorAnalysis.git
cd GoogleSearchCompetitorAnalysis
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python app.py
Then open http://localhost:5000 in your browser to use the app.
📦 File Structure
GoogleSearchCompetitorAnalysis/
├── app.py # Flask application logic
├── templates/
│ └── index.html # Frontend HTML with Bootstrap styling
├── requirements.txt # Python package list
└── README.md # Project documentation
🔗 GitHub Repo
You can find the full source code here:
👉 paarishaemilie/GoogleSearchCompetitorAnalysis
🚀 What’s Next?
Some cool features I’d love to add:
- CSV/PDF export of results
- Word cloud visualizations
- Sentiment and tone analysis
- Duplicate domain filtering
- Timeline trend view across dates
🎯 Final Thoughts
This project taught me how to blend data scraping, natural language processing, and web development into a fast, visual tool. It’s simple but powerful — and could be extended in many ways for SEO research, journalism, or even academic purposes.
Give it a try, fork it, improve it — and let me know what you think!



