AI Web Scraper

An AI-assisted web scraping tool that extracts specific information from websites based on natural-language descriptions. Describe the data you want, and a locally run language model (served by Ollama) pulls it out of the page content.

Features

Natural Language Parsing: Describe what you want to extract in plain English
Interactive Web Interface: Built with Streamlit for easy use
AI-Powered Extraction: Uses Ollama with the Gemma3 model to parse page content
Content Preview: View cleaned DOM content before parsing
Batch Processing: Handles large pages by splitting the content into chunks before parsing
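Chunking keeps each request to the model within a manageable size. A minimal sketch of a fixed-size character split follows; the function name `split_content` and the 6,000-character default are illustrative assumptions, not necessarily what this project uses.

```python
from typing import List


def split_content(text: str, max_length: int = 6000) -> List[str]:
    """Split cleaned page text into consecutive chunks of at most max_length characters."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    # Step through the text in max_length strides; the last chunk may be shorter.
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]


chunks = split_content("a" * 13000)
print([len(c) for c in chunks])  # [6000, 6000, 1000]
```

A plain character split is simple and loses nothing, but it can cut a sentence or a record in half at a chunk boundary; splitting on line breaks or adding a small overlap between chunks are common refinements.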

Prerequisites

Python 3.8+
Google Chrome installed (driven by Selenium during scraping)
Ollama installed, with the Gemma3 model pulled (you can substitute another Ollama model if you prefer)

Installation

Clone the repository:

git clone <repo_url>
cd ai_web_scraper

Install dependencies:

pip install -r requirements.txt

Install and run Ollama with Gemma3:

# Install Ollama from https://ollama.ai
ollama pull gemma3
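Once the model is pulled, Ollama serves it over a local HTTP API (port 11434 by default). The app itself goes through LangChain, but a quick way to confirm the model responds is a direct request to Ollama's `/api/generate` endpoint. This is a standard-library sketch under the assumption that Ollama is running locally with `gemma3` pulled; the helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-prompt generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "gemma3") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma3") -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        # With "stream": False, Ollama returns a single JSON object;
        # its "response" field holds the generated text.
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With Ollama running, `print(generate("Reply with the single word: ready"))` should print a short reply; a connection error usually means the Ollama server is not started.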

Usage

Start the application:

streamlit run ai_scraper.py

Enter a website URL
Click "Scrape Website" to extract content
Describe what information you want to parse (e.g., "all email addresses", "product prices", "contact information")
Click "Parse Content" to get AI-extracted results
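Conceptually, the parse step combines your description with each chunk of scraped content, asks the model about that chunk, and merges the answers. The sketch below shows that loop with the model passed in as a plain callable so it runs without Ollama; the prompt wording, the `parse_chunks` name, and the convention that an empty answer means "nothing found" are assumptions for illustration, not the project's actual prompt.

```python
from typing import Callable, List

# Hypothetical prompt template; the project's real instructions may differ.
PROMPT_TEMPLATE = (
    "Extract only the information matching this description: {description}\n"
    "If nothing matches, return an empty string.\n\n"
    "Text:\n{chunk}"
)


def parse_chunks(chunks: List[str], description: str, llm: Callable[[str], str]) -> str:
    """Ask the model about each chunk and join the non-empty answers."""
    results = []
    for chunk in chunks:
        answer = llm(PROMPT_TEMPLATE.format(description=description, chunk=chunk)).strip()
        if answer:  # skip chunks where the model found nothing
            results.append(answer)
    return "\n".join(results)


def fake_llm(prompt: str) -> str:
    """Stand-in 'model' for demonstration: returns the first word containing '@'."""
    text = prompt.split("Text:\n", 1)[1]
    return next((word for word in text.split() if "@" in word), "")


print(parse_chunks(["Contact: sales@example.com", "No emails here"], "all email addresses", fake_llm))
# sales@example.com
```

In real use, `llm` would wrap a call to the Gemma3 model through Ollama (for example via LangChain). Passing the model in as a callable keeps the chunk-and-merge logic testable on its own.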

Example Use Cases

Extract contact information from business websites
Gather product details from e-commerce sites
Collect news headlines and summaries
Parse job listings for specific requirements
Extract research paper abstracts

Tech Stack

Frontend: Streamlit
Web Scraping: Selenium, BeautifulSoup
AI Processing: LangChain + Ollama (Gemma3)
Language: Python
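Per the stack above, pages are loaded with Selenium and processed with BeautifulSoup, and the app previews cleaned DOM content before parsing. The cleaning idea (drop `<script>` and `<style>` content, keep visible text, remove blank lines) is sketched here with the standard library's `html.parser` instead of BeautifulSoup so it runs with no extra dependencies. It is an illustrative substitute, not the project's code, and the `clean_html` name is hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(html: str) -> str:
    """Return the visible text of an HTML document as stripped, non-empty lines."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    lines = (line.strip() for part in parser.parts for line in part.splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `clean_html("<h1>Shop</h1><script>var x=1;</script><p>Price: $10</p>")` returns `"Shop\nPrice: $10"`. One trade-off of this simple approach is that text split across inline tags (such as `<b>Hello</b> world`) ends up on separate lines.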
