Everything in life is a compromise.
Including with AI.
AI: long context windows (LCW) or retrieval-augmented generation (RAG)?
These two approaches let you trade off accuracy, latency, and cost for each question-answering use case.
Do you need high accuracy?
A long context window (naive or cached) could be the way forward, but it is the most expensive option and the slowest to return an answer.
Do you want faster and less expensive answers?
You’d have to sacrifice some accuracy, and I assume you’d still spend a fair bit to implement a RAG stack and pipeline.
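To make the trade-off concrete, here is a minimal sketch of the RAG idea: retrieve only the passages relevant to the question, then build a small prompt from them instead of stuffing the whole corpus into a long context. Everything here is illustrative — the toy corpus, the bag-of-words "embedding," and the function names are my own assumptions, standing in for a real vector database and learned embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; a real RAG stack would chunk documents and
# index learned embeddings in a vector store instead.
DOCS = [
    "Long context windows let the model read the whole document at once.",
    "RAG retrieves only the passages relevant to the question.",
    "Caching a long context amortizes its cost across questions.",
]

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase bag-of-words counts (punctuation stripped).
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the question, keep the top k.
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    # Only the retrieved passages enter the prompt -> fewer tokens,
    # hence lower cost and latency than a full long-context prompt.
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("Which passages are relevant to the question?"))
```

The accuracy cost shows up exactly here: if `retrieve` misses the right passage, the model never sees it, whereas a long context window carries everything and dodges that failure mode at a higher token price.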
See further details in the table below.
Link to the original Google paper.