The challenge with current researcher agents
Draft / mood-post-written-in-10-minutes
I've spent the last few days using OpenAI's Deep Research agent with ChatGPT 4.5. I'm quite impressed with the results it produces, and it definitely reduces what would previously have been a few hours of browsing to a 10-minute coffee break whilst I wait for the research agent to do its thing.
At the same time, it exacerbates the following challenges of doing research on the Web:
- The most heavily advertised information is what features in the results, but that doesn't necessarily mean it is the best or most factually correct information
- Unrelated information creeps in under the guise of an answer to the query. A good example of where I saw this is in this discussion of Verifiable Credentials, where the model included a lengthy discussion of self-sovereign identity and public key infrastructure such as KERI. Whilst this infrastructure often appears in solution architectures with Verifiable Credentials, it was certainly not appropriate to have an entry for KERI in a table comparing different Verifiable Credential standards.
I expect (hope) we will see a number of iterative improvements over the next few months to try to:
- where applicable, prioritize academic / authoritative sources, such as those found on Google Scholar and standards documents
- filter out "derived sources", for instance blog posts and news articles that are just summarizing an article or publication that the model can access
- filter out sources generated by GenAI
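To make the three improvements above a bit more concrete, here is a minimal Python sketch of how they might combine. It is not how any existing research agent works. The `Source` fields (`kind`, `derived_from`, `ai_generated`), the `AUTHORITY_RANK` table and the `select_sources` function are all hypothetical names, and it assumes this metadata is already known, which in practice is the hard part.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ranking: lower numbers are treated as more authoritative.
AUTHORITY_RANK = {"standard": 0, "academic": 1, "blog": 2, "news": 2}


@dataclass
class Source:
    url: str
    kind: str  # assumed label, e.g. "standard", "academic", "blog", "news"
    derived_from: Optional[str] = None  # URL of the publication this source merely summarizes
    ai_generated: bool = False  # assumed to come from some upstream detector


def select_sources(sources):
    """Drop GenAI and derived sources, then put authoritative ones first."""
    accessible = {s.url for s in sources}
    kept = [
        s for s in sources
        if not s.ai_generated
        # Only drop a summary when the original it summarizes is itself accessible.
        and s.derived_from not in accessible
    ]
    # sorted() is stable, so sources of equal rank keep their original order.
    return sorted(kept, key=lambda s: AUTHORITY_RANK.get(s.kind, 3))


sources = [
    Source("https://example.org/blog-summary", "blog", derived_from="https://example.org/spec"),
    Source("https://example.org/ai-article", "news", ai_generated=True),
    Source("https://example.org/independent-blog", "blog"),
    Source("https://example.org/paper", "academic"),
    Source("https://example.org/spec", "standard"),
]
selected_urls = [s.url for s in select_sources(sources)]
print(selected_urls)
```

With these placeholder URLs, the summary post and the AI-generated article are dropped, and the standard and the paper are listed ahead of the independent blog post.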
As I briefly touched upon in this post, my view is that we can go much further than this by curating a Web of precise and trusted data, where possible using formal semantics, such as in a (RDF) Knowledge Graph. This seems sound given that:
- GraphRAG is still growing in popularity for grounding models, including grounding Gemini's answers in Google queries, and
- LLMs are getting better at supporting text to structured data conversions
Beyond soundness, a couple of reasons this is necessary include:
- We can be clearer about the semantic commitments we're making, and
- We can search for data rather than pages
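As a rough illustration of both points, the sketch below stores a few facts as subject-predicate-object triples in plain Python and queries them by pattern. It stands in for a real RDF store and query language. Every `ex:` identifier, the class names and the `match` helper are hypothetical, and the triples are illustrative rather than curated data.

```python
# Illustrative triples; every "ex:" term is a hypothetical identifier.
TRIPLES = {
    ("ex:W3C_VCDM", "rdf:type", "ex:VerifiableCredentialStandard"),
    ("ex:SD_JWT_VC", "rdf:type", "ex:VerifiableCredentialStandard"),
    # KERI is typed as infrastructure, not as a credential standard.
    ("ex:KERI", "rdf:type", "ex:KeyManagementInfrastructure"),
    ("ex:KERI", "ex:appearsAlongside", "ex:VerifiableCredential"),
}


def match(triples, s=None, p=None, o=None):
    """Return the sorted triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Which things are Verifiable Credential standards?" asked of data, not pages.
standards = [
    subject
    for subject, _, _ in match(TRIPLES, p="rdf:type", o="ex:VerifiableCredentialStandard")
]
print(standards)
```

Because the type of each entry is an explicit commitment, a query for credential standards cannot pick up KERI just because it tends to be mentioned on the same pages.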