Four Ways to Reverse Python Malware
The Rise of Python Malware in 2025
In the past year, we have observed a growing volume of Python-compiled malware spread through LLM-generated phishing campaigns. These phishing messages most often contain shortened URLs or cloud storage links, such as Dropbox, that lead to the download of malicious ZIP archives. While Python’s reputation for readability and code simplicity might suggest straightforward reverse engineering, the reality is more nuanced. Malware authors often rely on obfuscation, bytecode-only distribution, and custom decryption routines, which makes static analysis challenging or even impractical, especially because bytecode from newer Python versions is poorly supported by most available tools.
In this blog post, I’ll walk through real-world examples of reversing Python-based malware, showcasing tools like uncompyle6, pycdc, and an LLM-based decompiler. For cases where these approaches fail, I’ll also demonstrate dynamic analysis techniques for executing .pyc payloads in a controlled environment to extract relevant strings and indicators.
Distribution
Most of the Python-compiled malware I’ve analyzed recently is distributed through LLM-generated phishing campaigns that abuse legal-related themes such as:
- “A copyright infringement has been detected.”
- “Authentic evidence through investigative methods”
- “Evidence and documents in the file”
- “Verification factors in the investigation”
- “Beweise in der Ermittlungsakte”
- “Full Beweismaterial Urheberrechtsverletzung”
- “Schlussfolgerungen aus der Untersuchung”
- “Pruebas y documentos del expediente de investigación”
- “Rapport sur le contenu et les informations signalées concernant la fanpage”
- “Materiale di prova relativo all’utilizzo non autorizzato di opere musicali”
- “Se ha detectado una infracción de derechos de autor”
- “Rapporto di verifica sull’uso improprio di contenuti protetti da diritto d’autore”
- “Rapport de vérification d’action d’utilisation illégale de contenu sous droit d’auteur”
- “Oznámení o zneužití duševního vlastnictví”
- “Prohlášení o porušení autorských práv”
- “Dopis zastupnika autora u vezi sa neovlašćenim korišćenjem muzičkog materijala”
These phishing emails are often sent in the name of a public institution, law firm, or other legal entity to increase their credibility, although in almost all cases they come from domains unrelated to that entity.
Uncompyle6
The first tool on the list is Uncompyle6, an open-source Python decompiler that supports versions up to Python 3.8. While the majority of Python-based malware families I analyze today are compiled for newer versions, primarily Python 3.10, samples built for older versions are still spreading in the wild. In a recent incident I analyzed a very simple infostealing sample compiled with PyInstaller for Python 3.7. Although Pycdc remains my preferred tool for Python bytecode decompilation, it is wise to keep a broader set of tools at hand as a backup. The following malware sample proves this point: Pycdc failed to decompile it, but Uncompyle6 recovered the infostealing code flawlessly.
As already explained, these malware samples are primarily distributed through phishing emails. In this case, the phishing message led to the download of a malicious ZIP archive, which contained a PE32 Windows executable compiled with PyInstaller. Tools like DetectItEasy and Exeinfo PE identify that PyInstaller was used, and PyInstaller strings are also present in the binary. To extract the embedded components, we can run Pyinstxtractor under Python 3.7.
To select the right tool for the analysis, it is first important to identify the Python version the PYC bytecode file was compiled for. Tools usually recognize this version from the magic number at the beginning of the file. In this example, I used Exeinfo PE, which identified Python version 3.7 and suggested that uncompyle6 can be used.
Uncompyle6 handles the task and recovers the malicious Python code.
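The version check can also be done by hand. The following is a minimal sketch, not the logic of any particular tool: it reads the first four bytes of a .pyc header and maps the little-endian magic number to a CPython release. The function name pyc_python_version and the lookup table are my own illustration, and the table covers only the final releases of Python 3.7 through 3.12.

```python
import struct

# Magic numbers (first two header bytes, little-endian) of final CPython releases.
KNOWN_MAGIC = {
    3394: "3.7",
    3413: "3.8",
    3425: "3.9",
    3439: "3.10",
    3495: "3.11",
    3531: "3.12",
}


def pyc_python_version(header: bytes) -> str:
    """Map the first four bytes of a .pyc file to a CPython version string."""
    # A valid magic is a 16-bit number followed by the bytes "\r\n".
    if len(header) < 4 or header[2:4] != b"\r\n":
        return "not a .pyc header"
    (magic,) = struct.unpack("<H", header[:2])
    return KNOWN_MAGIC.get(magic, f"unknown (magic {magic})")


# Headers starting with 42 0d 0d 0a and 6f 0d 0d 0a respectively.
print(pyc_python_version(bytes.fromhex("420d0d0a")))  # 3.7
print(pyc_python_version(bytes.fromhex("6f0d0d0a")))  # 3.10
```

In practice the four bytes would come from open(path, "rb").read(4) on the extracted payload.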
Although the primary goal of this blog post is to showcase various tools for analyzing Python-compiled malware, I will include a brief analysis of the malicious payload for the sake of completeness. The sample is a simple browser infostealer, designed to extract saved passwords, cookies, browser history, and other user data, which it exfiltrates to a remote FTP server. The decompiled payload code is only lightly obfuscated: it builds the FTP server’s hostname and authentication credentials through simple character concatenation.
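Concatenation of this kind can be undone without running the sample. The sketch below assumes the obfuscation is plain `+` concatenation of string literals in the decompiled source, which may not match every variant. The class and function names are hypothetical, the hostname in the demo is a placeholder rather than an indicator from the sample, and ast.unparse requires Python 3.9 or newer.

```python
import ast


class ConcatFolder(ast.NodeTransformer):
    """Fold 'a' + 'b' expressions made only of string literals into one literal."""

    def visit_BinOp(self, node):
        # Fold the children first so chains like 'a' + 'b' + 'c' collapse bottom-up.
        self.generic_visit(node)
        if (
            isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, str)
            and isinstance(node.right.value, str)
        ):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


def fold_string_concats(source: str) -> str:
    """Return the source with literal string concatenations folded."""
    tree = ConcatFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# Placeholder input, not taken from the analyzed sample.
print(fold_string_concats("host = 'ftp.' + 'exa' + 'mple' + '.invalid'"))
```

Because the code is only parsed and never executed, this can be run directly on decompiler output.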
Hashes of the files analyzed:
- Original ZIP file downloaded from the phishing email, SHA-256: 8267818c68972f72f335f983cc2de1d1302931517703fdc0b6dccd038190e371
- Malicious PYC payload contained in the archive (analyzed above), SHA-256: 37b445b9ae8d2d29328428aa9d7ad140ad41d05ec78a136bffc7b17b8ebf423b
Pycdc
For the second example, I chose a widely distributed Telegram infostealer malware, which I observed to be primarily spread through “Copyright Infringement” or other legally themed phishing notices. These phishing emails typically contain a shortened URL or a link to a cloud storage service, which leads to the download of a large ZIP archive, disguised as a PDF document.
The large downloaded file contains a folder named “_” (underscore) which contains three files:
- “Evidence Report.docx.cmd” – Windows script used to initiate extraction and execution.
- “Images.png” – A renamed WinRAR binary used for extraction of the next stage script.
- “Document.pdf” - Not a real PDF file, but a fake PEM certificate-like payload containing a Base64-encoded string.
The malware uses a LOLBIN (Living Off The Land Binary) technique, abusing certutil to decode Document.pdf. Despite its .pdf extension, the file contains a Base64-encoded data blob wrapped in PEM-style headers. The malware executes the following command to decode it:
certutil -decode Document.pdf Invoice.pdf
The result is saved as “Invoice.pdf”, a ZIP archive that the malware script extracts into the “C:\Users\Public” folder. The extracted content is a Python 3.10.11 runtime directory in which “pythonw.exe” has been renamed to “svchost.exe”. Inside the Lib folder there are legitimate Python files, but also an images.png file, which in reality is a malicious Python script: the second-stage Python decryptor.
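The certutil decoding step can also be reproduced offline, without running anything from the archive. The sketch below assumes the file is ordinary Base64 between "-----BEGIN ...-----" and "-----END ...-----" lines, as described above. The function name decode_pem_like is hypothetical, and the demo input is a synthetic blob, not the real Document.pdf.

```python
import base64


def decode_pem_like(text: str) -> bytes:
    """Decode a Base64 blob wrapped in PEM-style BEGIN/END lines."""
    body = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "-----BEGIN ...-----" / "-----END ...-----" markers.
        if not line or line.startswith("-----"):
            continue
        body.append(line)
    return base64.b64decode("".join(body))


# Synthetic demo input: a few bytes starting with the ZIP signature "PK\x03\x04".
demo = (
    "-----BEGIN CERTIFICATE-----\n"
    + base64.encodebytes(b"PK\x03\x04demo").decode()
    + "-----END CERTIFICATE-----\n"
)
decoded = decode_pem_like(demo)
print(decoded.startswith(b"PK\x03\x04"))  # True when the payload is a ZIP archive
```

Checking the decoded bytes for the ZIP signature is a quick way to confirm that the output is an archive before extracting it.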
The images.png file in the Lib folder is the second-stage Python payload. Its code decrypts, loads, and executes the next-stage Python-compiled payload. Decryption consists of:
- Base85 decoding of binary data (base64.a85decode)
- BZ2 decompression (bz2.decompress)
- Zlib decompression (zlib.decompress)
These steps yield a marshaled Python object containing the malicious compiled bytecode. The script deserializes it into a Python code object and finally executes that code object with an exec() call.
We can run the same decryption steps in a standalone script to retrieve the decrypted third-stage payload, this time as Python-compiled bytecode.
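A standalone version of the chain described above could look like the following sketch. It is not the malware’s own code: decode_stage and load_code are hypothetical names, and the demo input is a harmless code object packed the same way. Two caveats apply. marshal data is specific to the Python version, so the real blob has to be loaded with the matching interpreter (Python 3.10 for this sample). marshal is also not hardened against malicious input, so even loading without exec() belongs in an isolated analysis machine.

```python
import base64
import bz2
import marshal
import zlib
from types import CodeType


def decode_stage(blob: bytes) -> bytes:
    """Undo the three layers: Ascii85 (a85) decoding, then BZ2, then zlib."""
    data = base64.a85decode(blob)
    data = bz2.decompress(data)
    return zlib.decompress(data)


def load_code(marshaled: bytes) -> CodeType:
    """Deserialize the marshaled bytes into a code object WITHOUT executing it."""
    code = marshal.loads(marshaled)
    if not isinstance(code, CodeType):
        raise TypeError(f"expected a code object, got {type(code).__name__}")
    return code


# Harmless demo: pack a trivial code object the same way, then unpack it again.
benign = marshal.dumps(compile("x = 1", "<demo>", "exec"))
packed = base64.a85encode(bz2.compress(zlib.compress(benign)))
code = load_code(decode_stage(packed))
print(code.co_filename, code.co_names)
```

Stopping at the code object, instead of calling exec() as the malware does, keeps the payload inert while still allowing it to be dumped to disk or disassembled with the dis module.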
Without any header manipulation, Pycdc complains about a bad header (“Bad MAGIC!”). A simple workaround is to prepend a valid PYC header, here the Python 3.10 magic number followed by twelve zero bytes, with the following command:
{ printf '\x6f\x0d\x0d\x0a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'; cat stage2.pyc; } > stage2fixed.pyc
For this sample, I deleted the first 12 bytes of the original PYC file and then inserted the new header using the command mentioned above. Both versions of the PYC file, original and patched, are shown in the screenshot:
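The same header patch can be sketched in Python, which is handy on an analysis machine without a Unix shell. The function name patch_pyc_header and its strip parameter are hypothetical. The 16-byte layout assumed here (4-byte magic, 4-byte flags, 8 bytes of timestamp and size) is the header format used since Python 3.7.

```python
# Python 3.10 magic number followed by twelve zero bytes (flags, mtime, size).
PY310_HEADER = bytes.fromhex("6f0d0d0a") + b"\x00" * 12


def patch_pyc_header(data: bytes, strip: int = 0, header: bytes = PY310_HEADER) -> bytes:
    """Drop `strip` leading bytes from `data` and prepend a 16-byte .pyc header."""
    if len(header) != 16:
        raise ValueError("a Python 3.7+ .pyc header is exactly 16 bytes long")
    return header + data[strip:]


# Demo on dummy bytes: remove a 12-byte bogus header, then add the new one.
patched = patch_pyc_header(b"A" * 12 + b"BODY", strip=12)
print(patched[:4].hex(), len(patched))
```

For the sample above, data would be the content of the original PYC file and strip would be 12, matching the manual edit.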
The reconstructed code decrypts two components: shellcode, which is injected into the target process Regasm.exe, and a PE payload, which in this case is a standalone AsyncRAT loader.
Files analyzed:
- Original ZIP archive downloaded from phishing message SHA256: 5a3df84dbb2f080071b5a90b78fc016f02ebe02634fee5c5fbee32ff5d200b56
- Additionally downloaded Python script SHA256: 56a3caa59e9f76d8536499983c08cce37e186499d2aa9409021f80f0837edad6
Running a .pyc File in a Sandbox
Sometimes, Python malware cannot be reliably decompiled due to obfuscation or unsupported bytecode. In such cases, we can execute the entire .pyc file in a controlled environment and then manually call specific functions with desired arguments. In this example, a four-function chain is used to decrypt strings.
壾歈甘栮跳徯(佒倨硑轙嘈皻(熟棢冶皵揱鯢(監鋇裫籚啨貐, [data...]))
In many cases, dynamic execution is more efficient than full static analysis, especially when we only need to extract specific values like URLs or other IOCs. By grepping the encrypted argument lists out of the payload and feeding them into the decryption chain, we can quickly recover the strings they hide.
We first prepare the environment by executing the whole malicious .pyc file. After it has run, we call the decryption functions manually once more and inspect their return values, the decrypted strings. For the dynamic string decryption, we can run the following script:
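One way to collect those argument lists is to walk the constants of the loaded code object instead of grepping text. This is a sketch under an assumption that may not hold for every sample: that the encrypted data is stored as constant tuples of integers, which is how CPython 3.9 and newer compile longer list literals made of constants. The function name iter_int_sequences and the min_len threshold are hypothetical.

```python
from types import CodeType


def iter_int_sequences(code: CodeType, min_len: int = 4):
    """Yield every all-integer constant tuple found in a code object, recursively."""
    for const in code.co_consts:
        if isinstance(const, CodeType):
            # Nested functions, classes, and lambdas carry their own constants.
            yield from iter_int_sequences(const, min_len)
        elif (
            isinstance(const, tuple)
            and len(const) >= min_len
            and all(isinstance(value, int) for value in const)
        ):
            yield list(const)


# Harmless demo code object standing in for the loaded payload.
demo_src = "def f():\n    return (104, 116, 116, 112, 115)\nDATA = (1, 2, 3, 4)\n"
demo_code = compile(demo_src, "<demo>", "exec")
for seq in iter_int_sequences(demo_code):
    print(seq)
```

Since this only inspects constants and executes nothing, it is safe to run on a code object obtained with marshal.load, as long as the interpreter version matches the payload.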
import marshal

# PYC_PATH (path to the extracted payload) and arg_list_1..arg_list_4 (the
# encrypted lists pulled from the payload) must be defined before this point.
pyc_globals = {"__builtins__": __builtins__}
with open(PYC_PATH, "rb") as f:
    f.read(16)  # skip the 16-byte .pyc header
    top_code = marshal.load(f)  # load the marshaled Python code object
try:
    # EXECUTES MALWARE BYTECODE: run only in a controlled environment
    exec(top_code, pyc_globals)
except Exception as e:
    print(f"[!] pyc exec raised: {e} (normal if parts fail)")
for idx, arg_list in enumerate([arg_list_1, arg_list_2, arg_list_3, arg_list_4], 1):
    #print(f"\n========== Decoding arg_list_{idx} ==========")
    try:
        # extract the four global functions used for string decryption
        f1 = pyc_globals['熟棢冶皵揱鯢']
        f2 = pyc_globals['佒倨硑轙嘈皻']
        f3 = pyc_globals['壾歈甘栮跳徯']
        f4 = pyc_globals['監鋇裫籚啨貐']
    except KeyError as e:
        print(f"[!] Missing function: {e}")
        continue
    try:
        #print(f"[1] 監鋇裫籚啨貐: {f4} ({type(f4).__name__})")
        # string decryption in the same series of calls as in the malware
        r1 = f1(f4, arg_list)
        r1_evaluated = list(r1)  # force evaluation of a possibly lazy result
        r2 = f2(r1_evaluated)
        r3 = f3(r2)  # decrypted string for this list
        print(f"[4] 壾歈甘栮跳徯 result: {r3!r}")
    except Exception as e:
        print(f"[!] Error in decoding chain: {e}")
In this case, the presence of domains like t.me and urlvanish.com strongly suggests the script functions as a downloader for a next stage payload.
The Telegram bot is passed in as an argument by the parent process and is not directly present in the Python-compiled payload.
Files analyzed:
- Original ZIP archive downloaded from phishing email SHA256: 7687d428fa52985b442e6cf9b6c3f050cb5aa3bb9e615f6b364bf91e60dc3b03
- Malicious DLL contained in the ZIP SHA256: 33cc4793f18098c448641c7de90ae0bd962389c6a3700e0efcbe39034e744f02
- Stage three PYC payload SHA256: dd0f84b95b7bb2c4e803d5e6be836be9e0557fd2202878ee6dfa30afd0b6fab5
PyLingual.io
PyLingual is an LLM-based decompiler with which I achieved mixed results. It successfully decompiled certain Python samples that traditional tools like pycdc and uncompyle6 failed to handle. However, it’s important to note that PyLingual is a publicly hosted service, so avoid using it on sensitive or private malware samples to prevent unintended disclosure. For such samples, deploying a local instance of the application is recommended.
Conclusion
In this blog post, I presented three distinct Python-based infostealing samples I analyzed over the past months. As demonstrated, the choice of decompilation tool depends heavily on the Python version the payload was compiled for, with many tools struggling to support newer versions like Python 3.9+. While I generally recommend Pycdc for bytecode decompilation, relying on a single tool is rarely sufficient. As shown, alternative tools like uncompyle6 or even LLM-based solutions can prove invaluable in specific cases. Maintaining a diverse toolset in a malware analysis environment remains essential for time-efficient reversing, especially during an incident.
“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” – Abraham Lincoln