Skip to content

Unknownx007/DomainScrapeX

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

☠️ DomainScrapeX: Automated Target Footprint Data Ingestion & Directory Spider Framework

                                           [     DEDSEC      ]

📝 Description

DomainScrapeX is a fully dynamic, domain-agnostic intelligence gathering and footprint discovery framework engineered natively for Kali Linux setups. Inspired by the tactical, real-time target mapping capabilities seen in Watch Dogs and Mr. Robot, this framework completely avoids hardcoded static path assumptions. It is built to map out any internet domain globally—whether a corporate enterprise network, an educational portal, or a tech startup ecosystem.

By running a pure browser-masked, homepage link-seeding loop, the spider recursively discovers the website's actual internal navigation links, footers, and page nodes on the fly. As the paths are traversed, the core engine runs strict data-scrubbing filters that discard template artifacts and vector noise, capturing only verified corporate emails, international telephony numbers, and complete, non-truncated executive social footprint anchors (LinkedIn, Facebook, TikTok, Twitter/X, Instagram, YouTube, and GitHub).


⚙️ Core Operational Processing Architecture

The core harvesting engine drives its extraction operations dynamically through three distinct technical steps:

  • Step 1: The Homepage Link-Seeding Loop Instead of executing a loud brute-force dictionary pass, the script queries the target homepage using an engineered HTTP browser session. It parses the initial HTML layout structure to harvest every internal link dynamically, automatically seeding the crawling queue with actual valid directory paths.
  • Step 2: Dynamic Asynchronous Page Sweeping The framework traverses the queue up to a strict maximum safety depth boundary. Each page's raw HTML source bytes are ingested cleanly using persistent session structures that successfully bypass automated anti-bot connection drops.
  • Step 3: High-Precision Token Filtering Matrix The raw text content is fed instantly into specialized Regular Expression (Regex) extraction engines that handle formatting edge cases, isolate regional telephone codes, clean away structural CSS code calculations, and retain critical URL parameter handles.

🛠️ Installation & Environment Configuration

1. Clone the Framework Repository Workspace

git clone https://github.com/Unknownx007/DomainScrapeX
cd DomainScrapeX

2. Configure Dependencies

Recreate your python package environment manifest inside your virtual environment using these commands:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

🚀 Operational Workflow Instructions

DomainScrapeX handles secure TLS layers natively, manages link duplication queues automatically, and streams highlights down your screen in real time.

Method A: Interactive Query Menu

python3 domain_scrape.py
  • Enter your target profile route when prompted by the menu (e.g., example.com).

Method B: Direct Command Line Argument Pass

python3 domain_scrape.py example.com

📂 Forensic Reporting Output

Upon final execution, the framework cleans the stream logs and packages the results:

  • DEDSEC CLASSIFIED DOSSIER Dashboard: A beautiful terminal dashboard classifying your harvested communications lists, telephony channels, and external social network matrices.
  • Machine-Readable State Node (db/intel_<domain>.json): A persistent, structured JSON database file recording target metadata node properties for future infiltration mapping phases.

⚖️ Legal & Ethical Usage Notice

Disclaimer: This software development repository card is built solely for authorized security auditing, defensive gap analysis, educational research, and infrastructure assessment compliance. Executing active scanning sequences against unauthorized production targets without explicit, written mutual contractual permission is strictly prohibited. The framework author assumes zero legal accountability for environmental system downtime or programmatic misuse.

About

Domain-agnostic web infrastructure mapping and data ingestion spider. Dynamically crawls target perimeters via link-seeding to harvest validated corporate emails, global phone systems, and social footprint matrices without hardcoded static path dependencies. Generates structured JSON databases and classified terminal dashboards natively.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages