HUNGARY: How We Use Artificial Intelligence to Turn Messy Paperwork into Useful Information

11.12.2025

In our work at the State Audit Office of Hungary (SAO) we are constantly handed huge stacks of reports, PDFs, scanned letters and other documents that have no consistent format. Going through them by hand is slow, boring, and prone to mistakes.

A few months ago, we tried something different: we built an AI tool to read the documents for us. The first test was a proof of concept (a small pilot project) in which we fed the AI every audit report the SAO has ever produced. All of those reports are now publicly available on our GitHub page as part of an open-government initiative.

The Size of the Job

Since 1989 (the year the SAO was re‑established after the political transition) we have published about 3 500 audit reports. Put together, the text in those files is enormous—roughly 97 million words (in AI‑speak, that was ~153 590 000 “tokens”).

Most large language models (the type of AI we used) can only look at a few thousand words at a time. Think of a person who can read only so many pages before needing a break. Some newer models can handle longer stretches, up to a million words, but even they start to stumble when the text gets too big.
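To put those numbers side by side, here is a small back-of-the-envelope calculation using only the figures quoted above. Since "about 3 500" is itself rounded, the averages are rough.

```python
# Figures quoted in the article (the report count is approximate).
reports = 3_500
words = 97_000_000
tokens = 153_590_000

tokens_per_word = tokens / words
avg_words_per_report = words / reports
avg_tokens_per_report = tokens / reports

print(f"tokens per word:       {tokens_per_word:.2f}")
print(f"avg words per report:  {avg_words_per_report:,.0f}")
print(f"avg tokens per report: {avg_tokens_per_report:,.0f}")
```

An average report is therefore much longer than a few thousand words, while the collection as a whole is far larger than even a million-word window.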

How We Made It Work: One Document at a Time

Imagine a teacher who has to grade an entire class. The teacher doesn’t dump all the exams on the floor, glance at them, and announce “the average grade is 3.8”. Instead, the teacher reads each paper, marks it, and then calculates the class average from those individual scores.

We used the same idea with the AI:

  1. Take a single report (a PDF or scanned file).

  2. Ask the AI a clear, fixed question—e.g., “What were the main findings? What financial figures are mentioned?”

  3. Record the AI’s answer in a structured format.

  4. Move on to the next report and repeat the previous steps.

  5. Summarize the AI’s answers in a spreadsheet.

When every report has been processed, we have a tidy table that summarises the whole collection. From that table we can create charts, spot trends, or answer specific queries—without ever having to read each PDF manually.
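The five steps can be sketched in a few lines of Python. This is only an illustration of the idea, not our production code: `ask_model` and `fake_model` are hypothetical names, and the fake model merely counts words so that the sketch runs without any AI service.

```python
from typing import Callable


def summarize_collection(
    reports: dict[str, str],
    question: str,
    ask_model: Callable[[str, str], dict],
) -> list[dict]:
    """Ask the same fixed question about every report, one report at a time."""
    rows = []
    for name, text in sorted(reports.items()):
        answer = ask_model(question, text)       # one report per call
        rows.append({"report": name, **answer})  # one table row per report
    return rows


# Hypothetical stand-in for a real LLM call, so the sketch runs offline.
def fake_model(question: str, text: str) -> dict:
    return {
        "word_count": len(text.split()),
        "mentions_budget": int("budget" in text.lower()),
    }


rows = summarize_collection(
    {"report_a.pdf": "The budget was exceeded.", "report_b.pdf": "No findings."},
    "What were the main findings?",
    fake_model,
)
# The "class average" is computed from the individual answers, not from the raw texts.
share = sum(row["mentions_budget"] for row in rows) / len(rows)
```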

In short, we turned a mountain of unstructured paperwork into a searchable, easy‑to‑understand dataset—much like turning a jumbled library into a well‑organized catalogue, all with the help of artificial intelligence.

What does the AI tool look like for the auditor?

As you may already know, our SAI has an agile internal software development team, so we could quickly start implementing this.

First, auditors place the PDF and DOCX files to be processed in a predetermined directory on a shared drive (we call these “data folders”), together with a prompt.txt file containing the AI prompt (the question, or the data to extract) to run on all documents.

Hungary pic1

User input files
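Reading such a data folder takes very little code. The sketch below is an assumption about how it could be done: the function name `read_data_folder` is hypothetical, and only the prompt.txt convention and the two file types come from the description above.

```python
from pathlib import Path

SUPPORTED = {".pdf", ".docx"}  # the file types named in the article


def read_data_folder(folder: Path) -> tuple[str, list[Path]]:
    """Return the prompt text and the documents it should be run against."""
    prompt_file = folder / "prompt.txt"
    if not prompt_file.is_file():
        raise FileNotFoundError(f"{folder} has no prompt.txt")
    prompt = prompt_file.read_text(encoding="utf-8").strip()
    documents = sorted(
        path for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED
    )
    return prompt, documents
```

Failing early when prompt.txt is missing keeps a mistyped folder from starting an empty run.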

If, for example, we wanted to know which types of procurement our IT audits covered, and how many audits fell into each type, a simple prompt would look like this:

Hungary pic2

AI prompt example

They then request a processing run (currently by email to IT, but soon automatically, via a designated JIRA ticket type).

Step 1: OCR

When the run starts, OCR-ed text versions of the PDF and DOCX files appear first.

Hungary pic3

OCR-ed files
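Scanned PDFs need a real OCR engine, which is outside the scope of a short example. The DOCX half of this step, however, can be illustrated with the Python standard library alone, because a DOCX file is a ZIP archive whose main text lives in word/document.xml. This is a sketch under that assumption, not a description of our actual tooling; the function names and the ".txt" naming scheme are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path: Path) -> str:
    """Return the paragraphs of a DOCX file as plain text, one per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W}p"):
        # A paragraph is split into runs; join the text of all its <w:t> nodes.
        paragraphs.append("".join(node.text or "" for node in para.iter(f"{W}t")))
    return "\n".join(paragraphs)


def write_text_version(doc: Path, text: str) -> Path:
    """Save the text next to the source, e.g. report.docx -> report.docx.txt."""
    target = doc.with_name(doc.name + ".txt")
    target.write_text(text, encoding="utf-8")
    return target
```

Headers, footers and footnotes are stored in other parts of the archive, so this sketch ignores them.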

Step 2: Sending to the AI (LLM)

The AI question in prompt.txt is run for each file separately, and the AI’s responses are saved in text files matching the original document’s filename.

Hungary pic4

AI output files
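A minimal sketch of this step follows. The `call_llm` parameter is a hypothetical placeholder for whichever model client is in use (in-house or cloud), and the ".response.txt" naming is an assumption. Files that already have a response are skipped, which is one simple way to get the restart behaviour described later in this article.

```python
from pathlib import Path
from typing import Callable


def llm_step(prompt: str, text_files: list[Path], call_llm: Callable[[str], str]) -> list[Path]:
    """Run the same prompt on every text file and save each raw response."""
    responses = []
    for text_file in text_files:
        # report.pdf.txt -> report.pdf.response.txt (hypothetical naming scheme)
        target = text_file.with_name(text_file.stem + ".response.txt")
        if not target.exists():  # skip documents answered in an earlier run
            document = text_file.read_text(encoding="utf-8")
            # One request per document: the prompt followed by the full text.
            reply = call_llm(f"{prompt}\n\n---\n\n{document}")
            target.write_text(reply, encoding="utf-8")
        responses.append(target)
    return responses
```

Deleting a response file and running the step again regenerates just that one answer.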

Step 3: Find JSON in AI response

If the prompt was good and the AI response contains a valid JSON structure, that structure is extracted to a separate JSON file.

Hungary pic5

JSON files from AI response

Each individual JSON response looks something like this:

Hungary pic6

Example output JSON
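Models often wrap the JSON in extra words or code fences, so the extraction has to search for it. One way to do this with the Python standard library is sketched below. It assumes that each answer is a single JSON object; `find_json` is a hypothetical name, not our actual function.

```python
import json


def find_json(response: str):
    """Return the first valid JSON object embedded in the text, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(response):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            value, _end = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from here; try the next brace
        return value
    return None
```

Returning None instead of raising lets the batch continue and simply leaves no JSON file for that document, which makes bad responses easy to spot.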

Step 4: Build Excel or DB table

Finally, a CSV and an XLSX file appear, with each JSON structure on one row of the file. This is done with a very simple SQL query in DuckDB, which is great at processing JSON into CSV or DB tables.

Hungary pic7

Summarized in XLSX and CSV

Hungary pic8

Example output Excel
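We do this step in DuckDB. For readers without it, the same one-row-per-JSON idea can be illustrated with the Python standard library. The sketch below is a plain-Python stand-in, not our SQL, and it assumes every JSON file holds a single object; the `source_file` column and the function name are hypothetical.

```python
import csv
import json
from pathlib import Path


def json_files_to_csv(json_files: list[Path], csv_path: Path) -> list[str]:
    """Write one CSV row per JSON file and return the column names used."""
    records = []
    for path in sorted(json_files):
        record = json.loads(path.read_text(encoding="utf-8"))
        records.append({"source_file": path.name, **record})

    # Union of all keys in first-seen order; a missing field becomes an empty cell.
    columns = list(dict.fromkeys(key for record in records for key in record))

    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        for record in records:
            # Nested lists and objects are stored as JSON text inside the cell.
            writer.writerow({
                key: json.dumps(value, ensure_ascii=False)
                if isinstance(value, (list, dict)) else value
                for key, value in record.items()
            })
    return columns
```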

Of course, much more complex processing is possible (and already happening). Some of our prompts are more than 15 000 words long and return complex JSON, especially those that ask for classification and therefore need to describe every option (sectors, competence centres, parliamentary committees, etc.).

Advantages and disadvantages

The advantages of this approach are the following:

  • It empowers auditors to devise and run AI data extractions on troves of audit-related documents on their own.
  • Although this example was about our past audits, we use this system to process all kinds of unstructured documents.
  • We rely on the shared folders for access management and for sharing the results, which is familiar, centrally controlled, and keeps the setup simple.
  • We make better use of our in-house GPUs, since batch runs on confidential documents can be scheduled for the night.
  • An organization does not need its own in-house AI, however: pay-as-you-go cloud LLM services work too. (We use them as well, for context-size reasons or when we need models we can’t run internally, but only for public documents.)
  • Each step of the system is relatively simple (usually just ~100 lines of Python or shell code), and can therefore be improved or replaced independently when needed.
  • There is no “RAG” (Retrieval Augmented Generation), which would include only parts of the documents in the context; instead, all parts of the documents are considered when the AI is giving an answer.
  • A batch can be restarted, and only newly added files get processed, saving costs and time. Re-processing is still possible: auditors just delete the output files they want regenerated.
  • JSON is pretty expressive. Unlike CSV, it also allows “two-dimensional” structures (arrays and objects), and AI is good with that. So, for example, asking for auditees: [ "Auditee 1", "Auditee 2", "..." ] works well for extracting multiple auditees into a list field, and DuckDB can then easily “unroll” such JSON fields (say, into separate CSVs or DB tables).
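To show what “unrolling” a list field means, here is a plain-Python stand-in for what we do in DuckDB. The `unroll` function and the sample records are made up for illustration.

```python
def unroll(records: list[dict], list_field: str) -> list[dict]:
    """Turn each record with a list field into one row per list element."""
    rows = []
    for record in records:
        rest = {key: value for key, value in record.items() if key != list_field}
        for item in record.get(list_field) or []:
            rows.append({**rest, list_field: item})
    return rows


records = [
    {"report": "r1", "auditees": ["Auditee 1", "Auditee 2"]},
    {"report": "r2", "auditees": ["Auditee 3"]},
    {"report": "r3", "auditees": []},
]
rows = unroll(records, "auditees")
for row in rows:
    print(row["report"], row["auditees"])
```

In this sketch a report with an empty list produces no rows at all, so the unrolled table is best kept alongside the one-row-per-report table rather than replacing it.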

Some disadvantages:

  • Admittedly, the JSON part is a bit confusing at first. We initially wanted “table-like” or CSV output, since that would have been easier for us humans. But LLMs are so heavily trained on JSON-style answers that it was easier to accept this than to fight it. (CSV outputs were full of formatting errors: problems with quoting, extra commas, etc.) So far, once people have seen examples from their peers and then their own first results, everybody has been able to make sense of it.
  • People tend to be over-optimistic about the precision of their questions. Because of this, we usually first do smaller (20-150 document) runs with each new prompt, before allowing it to run fully, especially on pay-per-use LLMs.
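Picking the documents for such a trial run can be automated. The sketch below is one possible approach, not a description of our tooling: `pilot_sample` is a hypothetical helper, and the default of 50 documents is an arbitrary value within the 20-150 range mentioned above.

```python
import random
from pathlib import Path


def pilot_sample(documents: list[Path], size: int = 50, seed: int = 0) -> list[Path]:
    """Pick a reproducible random subset of documents for a trial run."""
    if len(documents) <= size:
        return sorted(documents)
    rng = random.Random(seed)  # fixed seed: the same pilot set on every run
    return sorted(rng.sample(sorted(documents), size))
```

A fixed seed means that two versions of a prompt are compared on the same documents, so differences in the output come from the prompt and not from the sample.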

Some lessons learned

  • For “yes/no” questions, it’s better to use 1 and 0 (as above), since quick sums, averages, etc. are easier to do in Excel with numbers than with TRUE and FALSE.
  • Instead of “yes/no” questions, sometimes it’s better to ask for “percentage” answers, and one can even ask for an explanation in a separate field.
  • For more complex queries, it’s often useful to include a few “example” JSON answers for the AI instead of long and detailed explanations. It picks up on your intent pretty well that way.
  • Sometimes it’s surprising what goes well and what does not. Always verify.
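Several of these lessons can be combined in one prompt. The sketch below builds a prompt that ends with example JSON answers; the task text, the field names (`it_procurement`, `confidence_percent`, `explanation`) and the `build_prompt` helper are all hypothetical, chosen only to show the 1/0, percentage and explanation ideas together.

```python
import json


def build_prompt(task: str, examples: list[dict]) -> str:
    """Append example JSON answers to the task so the model can copy their shape."""
    parts = [task.strip(), "", "Answer with a single JSON object shaped like these examples:"]
    for example in examples:
        parts.append(json.dumps(example, ensure_ascii=False))
    return "\n".join(parts)


prompt = build_prompt(
    "Does the report concern an IT procurement? Explain briefly.",
    [
        {"it_procurement": 1, "confidence_percent": 90,
         "explanation": "Audit of a software licence tender."},
        {"it_procurement": 0, "confidence_percent": 15,
         "explanation": "Audit of road maintenance spending."},
    ],
)
print(prompt)
```

Generating the examples with `json.dumps` guarantees that the samples shown to the model are themselves valid JSON.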

Like everyone else, we are still at the beginning of this journey, and we are always happy to exchange experiences and learn from others. Our team’s motto is “More efficient government through working software”, so we gladly help other SAIs (or other government organizations) when we can. If you want to talk, just reach out to us!

The whole workflow

Hungary pic9

AI document processing