Giulia Paggiola 10/09/2025

LLM for Quality tasks

A short story on using AI for a QARA task and coming up with a framework for doing it faster (4h down to 1h) while keeping it under control.

Task at hand:
A client received its inspection report from the authority by post, in the national language, and needed it digitised and translated into English in order to action it.

1️⃣ Convert scanned PDF to electronic document
ChatGPT 👎 didn’t identify text in the scanned PDF.
Gemini and NotebookLM did it, but I was unconvinced by the accuracy 🧐 .
Google Drive did the job: I uploaded the PDF and opened it as a Google Doc. ✅

2️⃣ Translate electronic document
ChatGPT and Gemini kept hallucinating badly 😵‍💫 .
The "Translate document" function of Google Docs returned a poor literal translation 🥴 .
NotebookLM was accurate but skipped content 😥 .
I ended up translating section by section via Gemini's in-text "AI Refine" function with a very meticulous prompt, then checking each section manually in a side-by-side table 🥵 .

3️⃣ Format electronic document similar to the original
ChatGPT and NotebookLM didn’t work 🤕 .
Gemini could make some basic improvements via the in-text "AI Refine" function, but not via the Google Docs built-in "Ask Gemini" or the browser chat. Interesting how much these differ in capability.
In the end, the formatting fix was mostly manual 🤯 .

Conclusion:
After 4 miserable hours spent on the task, with many failed attempts and far too much manual input, I had a satisfactory document.
But I still wanted to get to the bottom of this. There must be a better way?

So I restarted from scratch with a different approach, which I summarise here in a form inspired by the PDCA / Agile cycle we use in Quality:

⤵️ Plan: Ask AI for the right tools and prompts to achieve your goal. Importantly, "ask AI to ask you" questions or to point out what is unclear, so that you can refine your requirements accurately.
▶️ Do: Approach it step by step. Run your refined prompt for your SUBtask in your selected tool. Quickly review the output and refine the prompt. Change tool if needed.
⏯️ Check: Get AI to verify its results and to help you check them manually by highlighting any discrepancies. For example, “juxtapose the original and translated content in a table section by section and note any discrepancies between the two versions of the text”.
🔁 Act: Tell AI to correct the discrepancies, then re-run the verification step to update the results.
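
For readers who like to script part of the Check step, here is a minimal sketch in Python of the side-by-side comparison. It is not what I used for the task, only an illustration under some assumptions: sections are separated by blank lines, the translation keeps the same section order, and numbers (dates, article references, deadlines) should survive translation unchanged. The names `split_sections`, `compare` and `Row`, the ratio thresholds and the sample text are all made up for this example. It only flags candidates for manual review (skipped sections, changed numbers, suspiciously short or long sections); it does not judge the quality of the translation itself.

```python
import re
from dataclasses import dataclass, field

# Matches integers and simple composites such as 3.2 or 10/09/2025.
NUMBER = re.compile(r"\d+(?:[.,/]\d+)*")


@dataclass
class Row:
    index: int
    original: str
    translated: str
    notes: list = field(default_factory=list)


def split_sections(text):
    """Split text into sections (assumption: blank lines separate them)."""
    return [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]


def compare(original, translated, min_ratio=0.5, max_ratio=2.0):
    """Pair sections by position and note discrepancies worth a manual look."""
    src, dst = split_sections(original), split_sections(translated)
    rows = []
    for i in range(max(len(src), len(dst))):
        o = src[i] if i < len(src) else ""
        t = dst[i] if i < len(dst) else ""
        row = Row(i + 1, o, t)
        if not o or not t:
            # One side ran out of sections: content was skipped or added.
            row.notes.append("section missing on one side")
        else:
            if sorted(NUMBER.findall(o)) != sorted(NUMBER.findall(t)):
                row.notes.append("numbers differ")
            ratio = len(t) / len(o)
            if not min_ratio <= ratio <= max_ratio:
                row.notes.append(f"length ratio {ratio:.2f}")
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Invented sample text, only to show the shape of the output.
    original = "Punkt 1: Frist 30 Tage.\n\nPunkt 2: Siehe Anhang 4.\n\nPunkt 3: Ende."
    translated = "Item 1: deadline 30 days.\n\nItem 2: see Annex 5."
    for row in compare(original, translated):
        status = "; ".join(row.notes) or "ok"
        print(f"{row.index} | {row.original} | {row.translated} | {status}")
```

A reformatted date (for example 10.09.2025 against 10/09/2025) would be flagged as a number mismatch. For this kind of check I would rather have a false alarm than a missed error, since every flagged row still gets a human look.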

Doing it this way, I achieved the same result in 1h and with increased confidence in the accuracy. Still not extremely fast, but considerably faster!

I am curious, how would others have approached this dull task?