AI-Powered Recon: Unearthing PII from Non-English Files

Aayush kumar
2 min read · Jan 18, 2025


In the world of bug bounty hunting, thinking outside the box often leads to success. This write-up shares how I combined traditional reconnaissance techniques with artificial intelligence (AI) to discover and report sensitive information leaked through archived PDF files. The journey highlights the importance of leveraging modern tools to enhance efficiency and overcome barriers, such as language differences.

The Setup

During a reconnaissance phase, I focused on a target domain:
https://<target-domain>.

Step 1: Feeding the Domain to Wayback Machine

To maximize my chances of finding interesting content, I queried the domain on the Wayback Machine, a powerful tool for retrieving archived versions of websites. This allowed me to uncover a range of file types previously hosted but no longer available on the live site.
I systematically searched for file extensions such as:

  • .pdf
  • .csv
  • .xls

This process uncovered several archived PDFs stored under paths like:
https://web.archive.org/web/<timestamp>/<target-domain>/<path-to-files>.pdf.
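
The search above can also be scripted. What follows is a minimal sketch rather than the exact method used in this write-up: it assumes the Wayback Machine's public CDX endpoint and uses `example.com` as a placeholder domain. The function names (`build_cdx_query`, `to_archive_urls`, `list_archived_files`) are hypothetical helpers introduced only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback CDX endpoint; not named in the write-up itself.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
EXTENSIONS = ("pdf", "csv", "xls")


def build_cdx_query(domain, extensions=EXTENSIONS):
    """Build a CDX query URL listing captures under `domain` whose
    original URL ends in one of `extensions` (case-sensitive match)."""
    pattern = r".*\.(" + "|".join(extensions) + r")$"
    params = {
        "url": domain + "/*",             # every path under the domain
        "output": "json",                 # rows as JSON arrays
        "fl": "timestamp,original",       # only the fields we need
        "collapse": "urlkey",             # one row per distinct URL
        "filter": "original:" + pattern,  # keep matching extensions only
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def to_archive_urls(rows):
    """Turn CDX JSON rows into Wayback URLs; the first row is the header."""
    return [
        "https://web.archive.org/web/{}/{}".format(timestamp, original)
        for timestamp, original in rows[1:]
    ]


def list_archived_files(domain):
    """Fetch matching captures for `domain` (performs a network request)."""
    with urlopen(build_cdx_query(domain), timeout=30) as response:
        body = response.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    return to_archive_urls(rows)


if __name__ == "__main__":
    # Placeholder target; only query domains that are in scope for your program.
    for url in list_archived_files("example.com"):
        print(url)
```

Collapsing on `urlkey` keeps one capture per distinct URL, which is usually enough for a first pass; dropping that parameter would list every capture of each file.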

I manually inspected these files for potentially sensitive information. However, the PDFs were written in a language I could not read, so I could not tell whether they contained any PII.

Step 2: Leveraging AI for Analysis

The PDFs were written in a non-English language, making manual analysis time-consuming and prone to errors. To streamline the process, I turned to AI tools capable of multi-language comprehension.

AI-Powered Process

  1. Uploaded the archived PDFs into the AI tool.
  2. Prompted the AI to search for PII.

The AI quickly identified the PII in the files and returned it to me.
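
For readers who prefer to script the prompting step instead of uploading files by hand, here is a rough sketch of how the prompts might be prepared. It assumes the PDF text has already been extracted by some other means and does not call any particular AI service. The prompt wording, the 4000-character chunk size, and the helper names (`chunk_text`, `build_pii_prompts`) are all illustrative assumptions, not what was used in this write-up.

```python
# Hypothetical prompt wording; adjust it to the AI tool you actually use.
PROMPT_TEMPLATE = (
    "The following text was extracted from a PDF and may not be in English.\n"
    "List any personally identifiable information (names, ID numbers, "
    "email addresses, phone numbers, postal addresses). Quote each item "
    "exactly as it appears, translate its label into English, and reply "
    "'NONE' if there is nothing.\n\n"
    "--- BEGIN TEXT ---\n{chunk}\n--- END TEXT ---"
)


def chunk_text(text, max_chars=4000):
    """Split text into pieces of at most max_chars characters,
    breaking on line boundaries where possible."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is cut into fixed-size pieces.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the limit.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def build_pii_prompts(text, max_chars=4000):
    """Return one prompt per chunk of the extracted text."""
    return [PROMPT_TEMPLATE.format(chunk=c) for c in chunk_text(text, max_chars)]
```

Each returned string can then be pasted or sent to whichever multilingual AI tool is available. Asking for items to be quoted exactly as they appear makes it easier to check the model's answer against the original document before reporting.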

And finally the bounty :)

Tools Used

  • Wayback Machine
  • An AI tool capable of multi-language comprehension

Thanks for reading.

Follow me on X : https://x.com/bunny_0417
Follow me on LinkedIn: https://www.linkedin.com/in/bunny0417/
