Google Hacking Crawler (Modified Version)

2025/1/11 13:00:45

Here is a demo:

[Image: demo screenshot]


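This project is a modified version of an earlier project (credited to littlebin404 in the script's banner).
What was changed:

  1. Added a mode selection: one mode uses Requests, the other uses Selenium
  2. The original could only read the first page of results; it now decides automatically whether to fetch the next page
  3. Added more search templates
  4. Output is written as CSV: URL + title + domain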
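Point 2 can be pictured as a loop that keeps requesting the next page until a page comes back empty or a page cap is reached. The sketch below is iterative rather than recursive like the script, and `fetch_page` is a hypothetical stand-in for the real request-and-parse step; the offset of 10 results per page and the cap of 10 mirror the values used in the code.

```python
def crawl_all_pages(fetch_page, base_url, max_page=10):
    """Collect results page by page until an empty page or the cap is hit.

    fetch_page(url) is a hypothetical callable returning a list of results.
    """
    collected = []
    for page in range(max_page + 1):          # pages 0..max_page, like the script
        url_page = base_url + '&start={}'.format(page * 10)
        results = fetch_page(url_page)
        if not results:                       # empty page: stop paginating
            break
        collected.extend(results)
    return collected


if __name__ == '__main__':
    # Fake fetcher for illustration: only the first three pages have results
    fake_pages = {0: ['a', 'b'], 10: ['c'], 20: ['d']}

    def fake_fetch(url):
        start = int(url.rsplit('=', 1)[1])
        return fake_pages.get(start, [])

    print(crawl_all_pages(fake_fetch, 'https://example.invalid/search?q=test'))
```

A loop like this also avoids holding one stack frame (or one browser instance) per page, which the recursive form tends to do.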

For point 3, the following search templates were added:

self.search = [
        'site:{domain} intitle:"后台" | intitle:"管理" | intitle:"平台" | intitle:"系统" | intitle:"登录" | intitle:"中心" | intitle:"控制"',
        'site:{domain} inurl:admin',
        'site:{domain} inurl:upload',
        'site:{domain} ext:doc | ext:docx | ext:odt | ext:rtf | ext:sxw | ext:psw | ext:ppt | ext:pptx | ext:pps | ext:csv',
        'site:{domain} intitle:index.of',
        'site:{domain} ext:xml | ext:conf | ext:cnf | ext:reg | ext:inf | ext:rdp | ext:cfg | ext:txt | ext:ora | ext:ini | ext:env',
        'site:{domain} ext:sql | ext:dbf | ext:mdb',
        'site:{domain} ext:log',
        'site:{domain} ext:bkf | ext:bkp | ext:bak | ext:old | ext:backup',
        'site:{domain} inurl:login | inurl:signin | intitle:Login | intitle:"sign in" | inurl:auth',
        'site:{domain} intext:"sql syntax near" | intext:"syntax error has occurred" | intext:"incorrect syntax near" | intext:"unexpected end of SQL command" | intext:"Warning: mysql_connect()" | intext:"Warning: mysql_query()" | intext:"Warning: pg_connect()"',
        'site:{domain} "PHP Parse error" | "PHP Warning" | "PHP Error"',
        'site:{domain} ext:php intitle:phpinfo "published by the PHP Group"',
        'site:pastebin.com | site:paste2.org | site:pastehtml.com | site:slexy.org | site:snipplr.com | site:snipt.net | site:textsnip.com | site:bitpaste.app | site:justpaste.it | site:heypasteit.com | site:hastebin.com | site:dpaste.org | site:dpaste.com | site:codepad.org | site:jsitor.com | site:codepen.io | site:jsfiddle.net | site:dotnetfiddle.net | site:phpfiddle.org | site:ide.geeksforgeeks.org | site:repl.it | site:ideone.com | site:paste.debian.net | site:paste.org | site:paste.org.ru | site:codebeautify.org  | site:codeshare.io | site:trello.com "{domain}"',
        'site:github.com | site:gitlab.com "{domain}"',
        'site:stackoverflow.com "{domain}" ',
        'site:{domain} inurl:signup | inurl:register | intitle:Signup',
        'site:*.{domain}',
        'site:*.*.{domain}',
        '({domain}) (site:*.*.29.* |site:*.*.28.* |site:*.*.27.* |site:*.*.26.* |site:*.*.25.* |site:*.*.24.* |site:*.*.23.* |site:*.*.22.* |site:*.*.21.* |site:*.*.20.* |site:*.*.19.* |site:*.*.18.* |site:*.*.17.* |site:*.*.16.* |site:*.*.15.* |site:*.*.14.* |site:*.*.13.* |site:*.*.12.* |site:*.*.11.* |site:*.*.10.* |site:*.*.9.* |site:*.*.8.* |site:*.*.7.* |site:*.*.6.* |site:*.*.5.* |site:*.*.4.* |site:*.*.3.* |site:*.*.2.* |site:*.*.1.* |site:*.*.0.*)',]
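Each template is a format string with a `{domain}` placeholder. A minimal sketch of how one template could be turned into a search URL follows. The helper name `build_search_url` and the example host `www.google.com` are illustrative assumptions, and unlike the script, this sketch percent-encodes the query with `urllib.parse.quote_plus`.

```python
from urllib.parse import quote_plus

# Same path template as the script's self.url
SEARCH_PATH = '/search?q={search}&btnG=Search&safe=active&gbv=1'


def build_search_url(google_domain, template, target_domain):
    """Fill {domain} into a search template and build a full search URL.

    Hypothetical helper for illustration; not part of the original project.
    """
    query = template.format(domain=target_domain)
    # quote_plus encodes spaces as '+' and escapes characters such as ':' and '|'
    return 'https://' + google_domain + SEARCH_PATH.format(search=quote_plus(query))


if __name__ == '__main__':
    print(build_search_url('www.google.com', 'site:{domain} inurl:admin', 'example.com'))
```

Encoding the query explicitly keeps operators such as `|` and quoted phrases intact regardless of how the HTTP client treats unescaped characters.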

The main code is as follows (the other files can be downloaded from the original project's GitHub repository):

# -*- coding:utf-8 -*-
from gevent import monkey;monkey.patch_all()
from colorama import init,Fore
from multiprocessing import Process
from bs4 import BeautifulSoup
import gevent
import asyncio
import random
import time
import requests
import os
import re
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


'''
A Google search tool based on "site:{domain} inurl:** intext:** " queries.
'''

init(wrap=True)  # on Windows terminals, colored output requires init(wrap=True)
class Google_query(object):
    def __init__(self,key):
        self.key = key
        self.timeout=3
        self.calc=0
        self.url='/search?q={search}&btnG=Search&safe=active&gbv=1'
        #self.url='/search?q={search}'
        self.Google_domain=[]
        self.search = [
        'site:{domain} intitle:"后台" | intitle:"管理" | intitle:"平台" | intitle:"系统" | intitle:"登录" | intitle:"中心" | intitle:"控制"',
        'site:{domain} inurl:admin',
        'site:{domain} inurl:upload',
        'site:{domain} ext:doc | ext:docx | ext:odt | ext:rtf | ext:sxw | ext:psw | ext:ppt | ext:pptx | ext:pps | ext:csv',
        'site:{domain} intitle:index.of',
        'site:{domain} ext:xml | ext:conf | ext:cnf | ext:reg | ext:inf | ext:rdp | ext:cfg | ext:txt | ext:ora | ext:ini | ext:env',
        'site:{domain} ext:sql | ext:dbf | ext:mdb',
        'site:{domain} ext:log',
        'site:{domain} ext:bkf | ext:bkp | ext:bak | ext:old | ext:backup',
        'site:{domain} inurl:login | inurl:signin | intitle:Login | intitle:"sign in" | inurl:auth',
        'site:{domain} intext:"sql syntax near" | intext:"syntax error has occurred" | intext:"incorrect syntax near" | intext:"unexpected end of SQL command" | intext:"Warning: mysql_connect()" | intext:"Warning: mysql_query()" | intext:"Warning: pg_connect()"',
        'site:{domain} "PHP Parse error" | "PHP Warning" | "PHP Error"',
        'site:{domain} ext:php intitle:phpinfo "published by the PHP Group"',
        'site:pastebin.com | site:paste2.org | site:pastehtml.com | site:slexy.org | site:snipplr.com | site:snipt.net | site:textsnip.com | site:bitpaste.app | site:justpaste.it | site:heypasteit.com | site:hastebin.com | site:dpaste.org | site:dpaste.com | site:codepad.org | site:jsitor.com | site:codepen.io | site:jsfiddle.net | site:dotnetfiddle.net | site:phpfiddle.org | site:ide.geeksforgeeks.org | site:repl.it | site:ideone.com | site:paste.debian.net | site:paste.org | site:paste.org.ru | site:codebeautify.org  | site:codeshare.io | site:trello.com "{domain}"',
        'site:github.com | site:gitlab.com "{domain}"',
        'site:stackoverflow.com "{domain}" ',
        'site:{domain} inurl:signup | inurl:register | intitle:Signup',
        'site:*.{domain}',
        'site:*.*.{domain}',
        '({domain}) (site:*.*.29.* |site:*.*.28.* |site:*.*.27.* |site:*.*.26.* |site:*.*.25.* |site:*.*.24.* |site:*.*.23.* |site:*.*.22.* |site:*.*.21.* |site:*.*.20.* |site:*.*.19.* |site:*.*.18.* |site:*.*.17.* |site:*.*.16.* |site:*.*.15.* |site:*.*.14.* |site:*.*.13.* |site:*.*.12.* |site:*.*.11.* |site:*.*.10.* |site:*.*.9.* |site:*.*.8.* |site:*.*.7.* |site:*.*.6.* |site:*.*.5.* |site:*.*.4.* |site:*.*.3.* |site:*.*.2.* |site:*.*.1.* |site:*.*.0.*)',]
        self.target_domain=[]
        self.Proxies_ip=[]
        self.coroutine=[]
        self.ua=[]
        self.header=[]

    def get_total_pages(self, soap):
        # Get the total number of pages from Google search results
        page_info = soap.select('#foot td b')
        if page_info:
            pages = page_info[-1].get_text()
            return int(pages)
        return 1
    def google_search(self,ua,url,proxies,sleep,page=0):
        try:
            time.sleep(int(sleep))
            # Add page parameter for pagination
            url_page = url + '&start={}'.format(page*10)
            resp=requests.get(url=url_page,headers={'user-agent':ua},proxies=proxies,allow_redirects=False,timeout=30)
            if '302 Moved' in resp.text:
                print(Fore.YELLOW + '[!] ' + Fore.WHITE + 'Google CAPTCHA detected!!!')
                exit()
            else:
                soap = BeautifulSoup(resp.text, 'html.parser')
                soap = soap.find_all("a")
                results_exist = self.handle(soap,url)
                # If there are results, repeat search on next page
                if results_exist and page < 10:  # Set max pages to 10
                    self.google_search(ua, url, proxies, sleep, page+1)
        except Exception as r:
            print(Fore.RED+'[-] '+Fore.WHITE+'Error {}'.format(r))
    def google_search2(self, url, sleep, page=0):
        driver = None
        results_exist = False
        try:
            # Configure Selenium
            #driver = webdriver.Chrome(service=Service('path_to_your_chromedriver'))
            driver = webdriver.Chrome()
            time.sleep(int(sleep))
            # Add page parameter for pagination
            url_page = url + '&start={}'.format(page*10)
            driver.get(url_page)

            # Check for a CAPTCHA; the marker below is Google's Chinese-language
            # "unusual traffic" notice and is matched verbatim
            if '我们的系统检测到您的计算机网络中的流量异常' in driver.page_source:
                print(Fore.YELLOW + '[!] ' + Fore.WHITE + 'Google CAPTCHA detected!!!')
                input('Press enter after you have solved the captcha...')

            # Parse the results
            soap = BeautifulSoup(driver.page_source, 'html.parser')
            soap = soap.find_all("a")
            results_exist = self.handle(soap, url)
        except Exception as r:
            print(Fore.RED+'[-] '+Fore.WHITE+'Error {}'.format(r))
        finally:
            # Close the webdriver instance before moving on, even after an error
            if driver is not None:
                driver.quit()
        # If there are results, repeat search on next page
        if results_exist and page < 10:  # Set max pages to 10
            self.google_search2(url, sleep, page+1)
    def handle(self,soap,url):
        # Returns True if results exist on the page
        count=0
        os.makedirs('result', exist_ok=True)  # make sure the output directory exists
        for data in soap:
            res1 = "/url?q"
            res2 = str(data.get('href'))
            if (res1 in res2):
                title=data.find('span')
                if title is None:
                    break
                result = re.findall(".*>(.*)<.*", str(title))
                for x in result:
                    title=x
                # Use a separate name so the requested url is not overwritten
                link=res2.replace('/url?q=', '')
                head, middle, behind = link.partition('&sa')   # strip the extra query string
                print(Fore.GREEN + '[+] ' + Fore.WHITE + 'URL:{} title:{}'.format(
                    head, title))
                # Write to file, closing the handle after each row
                domain = self.get_domain(head)
                with open('result/save.csv', 'a+', encoding='utf-8') as f:
                    print('{},{},{}'.format(head, title,domain), file=f)
                count+=1
        # Zero or one matching link is treated as "no results" (the original checked only for exactly one)
        if count <= 1:
            print('No content or information matching your query: {}'.format(url))
            return False  # No results
        else:
            print(Fore.GREEN + '[*] ' + Fore.WHITE + 'Links: {} Requested URL: {}'.format(count, url))
            return True  # Results exist

    def get_domain(self,head):
        domain = head.replace("https://","").replace("http://","")
        domain = domain.split("/")[0]
        return domain
    # Build the requests
    def Build_Request(self,data):
        for y in data:
            for x in self.search:
                time.sleep(2)
                url='https://'+random.choice(self.Google_domain)+self.url.format(search=str(x).format(domain=y['target_domain']))
                # Create a plain Greenlet object and schedule it
                if self.key=="1":
                    self.coroutine.append(gevent.spawn(self.google_search,ua=y['user-agent'],url=url,proxies=y['proxies'],sleep=y['sleep']))
                elif self.key=="2":
                    self.coroutine.append(gevent.spawn(self.google_search2,url=url,sleep=y['sleep']))
        # Run the greenlets and wait for all of them to finish (joinall takes a list of tasks)
        gevent.joinall(self.coroutine)
        self.coroutine.clear()

    def Do_query(self):
        data={}
        domain_number=len(self.target_domain)
        if(domain_number==0):
            print(Fore.YELLOW + '[!] ' + Fore.WHITE + 'No target domain set, please provide one!!')
            exit()
        for x in range(domain_number):
            if self.calc==100:
                p=Process(target=self.Build_Request, args=(self.header,))
                p.start()
                self.header.clear()
                self.calc=0
                data={}
            data['user-agent']=random.choice(self.ua)
            data['target_domain']=self.target_domain[x]
            data['proxies']={'http':'http://{}'.format(random.choice(self.Proxies_ip)),'https':'http://{}'.format(random.choice(self.Proxies_ip))}

            data['sleep']=random.randint(1, 9)  # random delay of 1 to 9 seconds
            self.header.append(data)
            data = {}
            self.calc+=1


        if len(self.header)>0:
            p = Process(target=self.Build_Request, args=(self.header,))
            p.start()
            self.header.clear()
            self.calc = 0
            data = {}

    def read_file(self,file):
        # Yield each non-empty line with surrounding whitespace removed
        with open(file, 'r', encoding='utf-8') as dk:
            for d in dk:
                data = d.strip()
                if data:
                    yield data

    async def getfile(self):
        if os.path.exists('files/UA.txt') and os.path.exists('files/target_domain.txt') and os.path.exists('files/proxies.txt') and os.path.exists('files/Google_domain.txt'):
            print(Fore.BLUE+'[+] '+Fore.WHITE+'Loading required files...')
        else:
            print(Fore.RED+'[-] '+Fore.WHITE+'Required files are missing, please add them')
            exit()

        print(Fore.GREEN+'[~] '+Fore.WHITE+'Starting google hacking')

        for u in self.read_file('files/UA.txt'):
            self.ua.append(u)

        for t in self.read_file('files/target_domain.txt'):
            self.target_domain.append(t)

        for p in self.read_file('files/proxies.txt'):
            self.Proxies_ip.append(p)

        for g in self.read_file('files/Google_domain.txt'):
            self.Google_domain.append(g)

        self.Do_query()


if __name__ == '__main__':
    author_info = '''
                    *****************************************************                                                  
                    *                Code By littlebin404               *
                    *                   Modify By jq12                  *
                    *              Google Hacking Spider v2!            *                                                  
                    *****************************************************
    '''
    print(author_info)
    key = input("[1]Request (fast)\n[2]Selenium (stable)\nSelect:")
    obj=Google_query(key)
    # asyncio.run creates and closes the event loop; get_event_loop() without a running loop is deprecated
    asyncio.run(obj.getfile())
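
The `handle` and `get_domain` methods strip the `/url?q=` prefix by string replacement and write rows with plain string formatting. As a possible alternative, the standard library can do the same parsing and also take care of CSV quoting when a title contains commas. This is only a sketch: `extract_target`, `to_csv_row`, and the sample href are hypothetical and not part of the original project.

```python
import csv
import io
from urllib.parse import urlparse, parse_qs


def extract_target(href):
    """Return the destination URL from a '/url?q=...' result link, or None."""
    parsed = urlparse(href)
    if parsed.path != '/url':
        return None
    # parse_qs splits the query string and percent-decodes the values
    values = parse_qs(parsed.query).get('q')
    return values[0] if values else None


def to_csv_row(target, title):
    """Format one 'url,title,domain' line, quoting fields when needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow([target, title, urlparse(target).netloc])
    return buf.getvalue()


if __name__ == '__main__':
    href = '/url?q=https://example.com/admin/login.php&sa=U&ved=abc'  # made-up sample
    target = extract_target(href)
    print(to_csv_row(target, 'Login, Admin Panel'), end='')
```

With this approach the trailing `&sa=...` parameters are dropped by the query parser instead of by `partition('&sa')`, and the title column stays a single field even when it contains a comma.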

