Sun, March 09, 2025

The challenge with current researcher agents

Draft / mood-post-written-in-10-minutes

I've spent the last few days using OpenAI's Deep Research agent with GPT-4.5. I'm quite impressed with the results it produces, and it has definitely reduced what would previously have been a few hours of browsing to a 10-minute coffee break whilst I wait for the Researcher Agent to do its thing.

At the same time, it exacerbates the following challenges of doing research on the Web:

The most heavily promoted information is what ends up in the result, but that doesn't necessarily mean it is the best or most factually correct information.
Unrelated information creeps in under the guise of an answer to the query. A good example of where I saw this is in this discussion of Verifiable Credentials, where the model included a lengthy discussion of self-sovereign identity and public key infrastructure such as KERI. Whilst this infrastructure often appears in solution architectures with Verifiable Credentials, it was certainly not appropriate to have an entry for KERI in a table comparing different Verifiable Credential standards.

I expect (hope) we will see a number of iterative improvements over the next few months that try to:

where applicable, prioritize academic and authoritative sources, such as those found on Google Scholar, and standards documents
filter out "derived sources", for instance blog posts and news articles that merely summarize an article or publication the model can already access
filter out sources generated by GenAI

As I briefly touched upon in this post, my view is that we can go much further by curating a Web of precise and trusted data, where possible using formal semantics such as an RDF Knowledge Graph. This approach seems sound given that:

GraphRAG is still growing in popularity for grounding models, including for grounding Gemini's answers to Google queries, and
LLMs are getting better at converting text into structured data.

Beyond being sound, this approach is necessary for a couple of reasons:

We can be clearer about the semantic commitments we're making, and
We can search for data rather than pages
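To make "searching for data rather than pages" a little more concrete, here is a minimal sketch using the KERI example from above. It assumes a toy in-memory set of RDF-style (subject, predicate, object) triples; the "ex:" identifiers and the ex:oftenUsedWith predicate are hypothetical placeholders rather than a real vocabulary, and a real system would use an RDF store and SPARQL instead. The point is that once every resource carries an explicit type, a comparison table of Verifiable Credential standards can be built from only the things typed as such, so infrastructure like KERI that merely appears alongside them never lands in the table.

```python
from typing import Optional

# A toy triple store: each fact is a (subject, predicate, object) tuple.
# All "ex:" identifiers are hypothetical placeholders, not a real vocabulary.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("ex:VCDataModel", "rdf:type", "ex:VerifiableCredentialStandard"),
    ("ex:SD-JWT-VC", "rdf:type", "ex:VerifiableCredentialStandard"),
    ("ex:KERI", "rdf:type", "ex:KeyManagementInfrastructure"),
    # KERI is related to VCs, but it is not itself typed as a VC standard.
    ("ex:KERI", "ex:oftenUsedWith", "ex:VCDataModel"),
]


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def instances_of(cls: str) -> list[str]:
    """Subjects explicitly typed as `cls`: the rows a comparison table should contain."""
    return sorted(t[0] for t in match(p="rdf:type", o=cls))


if __name__ == "__main__":
    print(instances_of("ex:VerifiableCredentialStandard"))
```

Because the query asks for things of a given type rather than pages that mention a keyword, relatedness (ex:oftenUsedWith) and identity (rdf:type) stay separate, which is exactly the semantic commitment the first point above is about.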