Python3.9及以上Pyinstaller 反编译教程(exe转py)

文章目录

前言
1.使用pyinstxtractor.py将exe文件转换成pyc文件
2.给pyc文件添加文件头
3.使用pycdc工具反编译pyc文件，获得源码

前言

经常使用pyinstaller将一些写的python程序打包成了各种exe，时间一长，源码丢失，为了恢复一部分源码，只得将先前编译好的exe反编译成py文件。

Pyinstaller在打包过程中会将py文件编译为pyc文件，然后去掉pyc文件开头的16个字节。然后将python解释器、依赖文件和修改后的pyc文件一起，用一种特殊的自解压格式打包起来，形成可执行文件。这16个字节是 Python 字节码文件的一个Magic Number和版本信息，它们用于标识这是一个 Python 字节码文件以及它是由哪个版本的 Python 编译的。

本文以test.exe为例，将其反编译为test.py

反编译过程大致分为以下三步：
1.使用pyinstxtractor.py将exe文件转换成pyc文件
2.给pyc文件添加文件头
3.使用pycdc工具反编译pyc文件，获得源码

1.使用pyinstxtractor.py将exe文件转换成pyc文件

新建一个pyinstxtractor.py（具体代码较长贴在文末）
将test.exe跟pyinstxtractor.py放在同一个目录中，cmd窗体执行如下命令：

python pyinstxtractor.py test.exe

出现如下Successfully字眼则表示成功

在这里插入图片描述

在该路径下已经生成了一个“test.exe_extracted"的文件夹

在这里插入图片描述

在“test.exe_extracted"的文件夹找到test文件（没有后缀名），一般你的程序是xxx.exe，就找xxx文件

在这里插入图片描述

将test文件名改为test.pyc

在这里插入图片描述

2.给pyc文件添加文件头

这里使用到了一个工具Sublime Text，是一个轻量级、跨平台的文本和源代码编辑器，也可以用其他编辑器。
Sublime Text下载地址：https://download.csdn.net/download/qq_41273999/89396003?spm=1001.2014.3001.5503

将test.pyc用Sublime Text工具打开，此时是以16进制的方式打开的

在这里插入图片描述

打开“test.exe_extracted"文件夹下的PYZ-00.pyz_extracted文件夹，还是用Sublime Text随便打开其中一个pyc文件

在这里插入图片描述

将xxx.pyc文件的第一行16个字节复制下来，添加到test.pyc文件的第一行，保存，如下图：

在这里插入图片描述

3.使用pycdc工具反编译pyc文件，获得源码

有个很好用的库Uncompyle 6可以反编译pyc文件（pip install uncompyle6 安装好后，运行：uncompyle6 test.pyc）
但需要注意的是：Uncompyle 6暂时无法反编译Python 3.9和更高版本产生的pyc文件，所以推荐一个pycdc工具可以将.pyc文件转换为.py，适用于 Python 3.9及更高版本。目前笔者已测试python3.9和最高版本python3.12，可以反编译成功。

获取pycdc工具有两种途径：

（1）可以去Github手动下载pycdc安装包(但程序需要编译)：https://github.com/zrax/pycdc
程序的编译需要用到CMake，还比较麻烦。。。
如果想试试的话可以参考文章：《Windows下搭建Cmake编译环境进行C/C++文件的编译》

（2）除此之外可以下载编译好的可执行文件：https://download.csdn.net/download/qq_41273999/89397541

将修改后的test.pyc文件和pycdc.exe放在同目录下

在这里插入图片描述

执行命令 pycdc.exe test.pyc>test.py，同目录下生成转换后的test.py文件，该文件就是转换出来的源码

在这里插入图片描述

附录

参考文章：《Windows下搭建Cmake编译环境进行C/C++文件的编译》
Sublime Text工具：https://download.csdn.net/download/qq_41273999/89396003?spm=1001.2014.3001.5503
pycdc工具：https://download.csdn.net/download/qq_41273999/89397541

pyinstxtractor.py内容如下：

"""
PyInstaller Extractor v1.9 (Supports pyinstaller 3.3, 3.2, 3.1, 3.0, 2.1, 2.0)
Author : Extreme Coders
E-mail : extremecoders(at)hotmail(dot)com
Web    : https://0xec.blogspot.com
Date   : 29-November-2017
Url    : https://sourceforge.net/projects/pyinstallerextractor/

For any suggestions, leave a comment on
https://forum.tuts4you.com/topic/34455-pyinstaller-extractor/

This script extracts a pyinstaller generated executable file.
Pyinstaller installation is not needed. The script has it all.

For best results, it is recommended to run this script in the
same version of python as was used to create the executable.
This is just to prevent unmarshalling errors(if any) while
extracting the PYZ archive.

Usage : Just copy this script to the directory where your exe resides
        and run the script with the exe file name as a parameter

C:\path\to\exe\>python pyinstxtractor.py <filename>
$ /path/to/exe/python pyinstxtractor.py <filename>

Licensed under GNU General Public License (GPL) v3.
You are free to modify this source.

CHANGELOG
================================================

Version 1.1 (Jan 28, 2014)
-------------------------------------------------
- First Release
- Supports only pyinstaller 2.0

Version 1.2 (Sept 12, 2015)
-------------------------------------------------
- Added support for pyinstaller 2.1 and 3.0 dev
- Cleaned up code
- Script is now more verbose
- Executable extracted within a dedicated sub-directory

(Support for pyinstaller 3.0 dev is experimental)

Version 1.3 (Dec 12, 2015)
-------------------------------------------------
- Added support for pyinstaller 3.0 final
- Script is compatible with both python 2.x & 3.x (Thanks to Moritz Kroll @ Avira Operations GmbH & Co. KG)

Version 1.4 (Jan 19, 2016)
-------------------------------------------------
- Fixed a bug when writing pyc files >= version 3.3 (Thanks to Daniello Alto: https://github.com/Djamana)

Version 1.5 (March 1, 2016)
-------------------------------------------------
- Added support for pyinstaller 3.1 (Thanks to Berwyn Hoyt for reporting)

Version 1.6 (Sept 5, 2016)
-------------------------------------------------
- Added support for pyinstaller 3.2
- Extractor will use a random name while extracting unnamed files.
- For encrypted pyz archives it will dump the contents as is. Previously, the tool would fail.

Version 1.7 (March 13, 2017)
-------------------------------------------------
- Made the script compatible with python 2.6 (Thanks to Ross for reporting)

Version 1.8 (April 28, 2017)
-------------------------------------------------
- Support for sub-directories in .pyz files (Thanks to Moritz Kroll @ Avira Operations GmbH & Co. KG)

Version 1.9 (November 29, 2017)
-------------------------------------------------
- Added support for pyinstaller 3.3
- Display the scripts which are run at entry (Thanks to Michael Gillespie @ malwarehunterteam for the feature request)

"""

from __future__ import print_function
import os
import struct
import marshal
import zlib
import sys
import imp
import types
from uuid import uuid4 as uniquename


class CTOCEntry:
    def __init__(self, position, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name):
        self.position = position
        self.cmprsdDataSize = cmprsdDataSize
        self.uncmprsdDataSize = uncmprsdDataSize
        self.cmprsFlag = cmprsFlag
        self.typeCmprsData = typeCmprsData
        self.name = name


class PyInstArchive:
    PYINST20_COOKIE_SIZE = 24  # For pyinstaller 2.0
    PYINST21_COOKIE_SIZE = 24 + 64  # For pyinstaller 2.1+
    MAGIC = b'MEI\014\013\012\013\016'  # Magic number which identifies pyinstaller

    def __init__(self, path):
        self.filePath = path

    def open(self):
        try:
            self.fPtr = open(self.filePath, 'rb')
            self.fileSize = os.stat(self.filePath).st_size
        except:
            print('[*] Error: Could not open {0}'.format(self.filePath))
            return False
        return True

    def close(self):
        try:
            self.fPtr.close()
        except:
            pass

    def checkFile(self):
        print('[*] Processing {0}'.format(self.filePath))
        # Check if it is a 2.0 archive
        self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)
        magicFromFile = self.fPtr.read(len(self.MAGIC))

        if magicFromFile == self.MAGIC:
            self.pyinstVer = 20  # pyinstaller 2.0
            print('[*] Pyinstaller version: 2.0')
            return True

        # Check for pyinstaller 2.1+ before bailing out
        self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)
        magicFromFile = self.fPtr.read(len(self.MAGIC))

        if magicFromFile == self.MAGIC:
            print('[*] Pyinstaller version: 2.1+')
            self.pyinstVer = 21  # pyinstaller 2.1+
            return True

        print('[*] Error : Unsupported pyinstaller version or not a pyinstaller archive')
        return False

    def getCArchiveInfo(self):
        try:
            if self.pyinstVer == 20:
                self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)

                # Read CArchive cookie
                (magic, lengthofPackage, toc, tocLen, self.pyver) = \
                    struct.unpack('!8siiii', self.fPtr.read(self.PYINST20_COOKIE_SIZE))

            elif self.pyinstVer == 21:
                self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)

                # Read CArchive cookie
                (magic, lengthofPackage, toc, tocLen, self.pyver, pylibname) = \
                    struct.unpack('!8siiii64s', self.fPtr.read(self.PYINST21_COOKIE_SIZE))

        except:
            print('[*] Error : The file is not a pyinstaller archive')
            return False

        print('[*] Python version: {0}'.format(self.pyver))

        # Overlay is the data appended at the end of the PE
        self.overlaySize = lengthofPackage
        self.overlayPos = self.fileSize - self.overlaySize
        self.tableOfContentsPos = self.overlayPos + toc
        self.tableOfContentsSize = tocLen

        print('[*] Length of package: {0} bytes'.format(self.overlaySize))
        return True

    def parseTOC(self):
        # Go to the table of contents
        self.fPtr.seek(self.tableOfContentsPos, os.SEEK_SET)

        self.tocList = []
        parsedLen = 0

        # Parse table of contents
        while parsedLen < self.tableOfContentsSize:
            (entrySize,) = struct.unpack('!i', self.fPtr.read(4))
            nameLen = struct.calcsize('!iiiiBc')

            (entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name) = \
                struct.unpack( \
                    '!iiiBc{0}s'.format(entrySize - nameLen), \
                    self.fPtr.read(entrySize - 4))

            name = name.decode('utf-8').rstrip('\0')
            if len(name) == 0:
                name = str(uniquename())
                print('[!] Warning: Found an unamed file in CArchive. Using random name {0}'.format(name))

            self.tocList.append( \
                CTOCEntry( \
                    self.overlayPos + entryPos, \
                    cmprsdDataSize, \
                    uncmprsdDataSize, \
                    cmprsFlag, \
                    typeCmprsData, \
                    name \
                    ))

            parsedLen += entrySize
        print('[*] Found {0} files in CArchive'.format(len(self.tocList)))

    def extractFiles(self):
        print('[*] Beginning extraction...please standby')
        extractionDir = os.path.join(os.getcwd(), os.path.basename(self.filePath) + '_extracted')

        if not os.path.exists(extractionDir):
            os.mkdir(extractionDir)

        os.chdir(extractionDir)

        for entry in self.tocList:
            basePath = os.path.dirname(entry.name)
            if basePath != '':
                # Check if path exists, create if not
                if not os.path.exists(basePath):
                    os.makedirs(basePath)

            self.fPtr.seek(entry.position, os.SEEK_SET)
            data = self.fPtr.read(entry.cmprsdDataSize)

            if entry.cmprsFlag == 1:
                data = zlib.decompress(data)
                # Malware may tamper with the uncompressed size
                # Comment out the assertion in such a case
                assert len(data) == entry.uncmprsdDataSize  # Sanity Check

            with open(entry.name, 'wb') as f:
                f.write(data)

            if entry.typeCmprsData == b's':
                print('[+] Possible entry point: {0}'.format(entry.name))

            elif entry.typeCmprsData == b'z' or entry.typeCmprsData == b'Z':
                self._extractPyz(entry.name)

    def _extractPyz(self, name):
        dirName = name + '_extracted'
        # Create a directory for the contents of the pyz
        if not os.path.exists(dirName):
            os.mkdir(dirName)

        with open(name, 'rb') as f:
            pyzMagic = f.read(4)
            assert pyzMagic == b'PYZ\0'  # Sanity Check

            pycHeader = f.read(4)  # Python magic value

            if imp.get_magic() != pycHeader:
                print(
                    '[!] Warning: The script is running in a different python version than the one used to build the executable')
                print(
                    '    Run this script in Python{0} to prevent extraction errors(if any) during unmarshalling'.format(
                        self.pyver))

            (tocPosition,) = struct.unpack('!i', f.read(4))
            f.seek(tocPosition, os.SEEK_SET)

            try:
                toc = marshal.load(f)
            except:
                print('[!] Unmarshalling FAILED. Cannot extract {0}. Extracting remaining files.'.format(name))
                return

            print('[*] Found {0} files in PYZ archive'.format(len(toc)))

            # From pyinstaller 3.1+ toc is a list of tuples
            if type(toc) == list:
                toc = dict(toc)

            for key in toc.keys():
                (ispkg, pos, length) = toc[key]
                f.seek(pos, os.SEEK_SET)

                fileName = key
                try:
                    # for Python > 3.3 some keys are bytes object some are str object
                    fileName = key.decode('utf-8')
                except:
                    pass

                # Make sure destination directory exists, ensuring we keep inside dirName
                destName = os.path.join(dirName, fileName.replace("..", "__"))
                destDirName = os.path.dirname(destName)
                if not os.path.exists(destDirName):
                    os.makedirs(destDirName)

                try:
                    data = f.read(length)
                    data = zlib.decompress(data)
                except:
                    print('[!] Error: Failed to decompress {0}, probably encrypted. Extracting as is.'.format(fileName))
                    open(destName + '.pyc.encrypted', 'wb').write(data)
                    continue

                with open(destName + '.pyc', 'wb') as pycFile:
                    pycFile.write(pycHeader)  # Write pyc magic
                    pycFile.write(b'\0' * 4)  # Write timestamp
                    if self.pyver >= 33:
                        pycFile.write(b'\0' * 4)  # Size parameter added in Python 3.3
                    pycFile.write(data)


def main():
    if len(sys.argv) < 2:
        print('[*] Usage: pyinstxtractor.py <filename>')

    else:
        arch = PyInstArchive(sys.argv[1])
        if arch.open():
            if arch.checkFile():
                if arch.getCArchiveInfo():
                    arch.parseTOC()
                    arch.extractFiles()
                    arch.close()
                    print('[*] Successfully extracted pyinstaller archive: {0}'.format(sys.argv[1]))
                    print('')
                    print('You can now use a python decompiler on the pyc files within the extracted directory')
                    return

            arch.close()


if __name__ == '__main__':
    main()