Deterministic Data Collection Using Probabilistic Methods
Previously, in Automate Thoughtfully with Agents, the key takeaway was to “prioritize using agents as architects over using agents as executors” so that “an upper bound on financial cost” is set, leaving employees and entrepreneurs free to focus on optimizing the gains from those investments. With this in mind, this blog post elaborates on cases where an agent may be well suited to act as an executor.
In this article, an agent is defined as a computer program that has the following properties:
- Is accessible by using a harness (e.g. Google Antigravity)
- Is powered by an AI model (e.g. Google Gemini 3.1 Pro)
- Is served by a software company (e.g. Alphabet’s Google)1, 2, 3
Agents may be well suited to act as executors in data collection workflows that involve ambiguity along dimensions including, but not limited to, the following:
- Differences in languages
- Differences in naming conventions
- Differences in data structures
- Differences in data accuracy
- Differences in data usability
- Differences in data completeness
- Differences in data comprehensiveness
- Differences in data freshness
Before the “agents era”, data collection workflows that involved ambiguity typically consisted of the following steps:
- Understanding the problem to solve
- Planning the data structure to enable eventual analysis
- Designing the information storage, retrieval, extraction, and verification methods
- Formulating process guidelines
- Requesting people to apply the guidelines
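As a rough illustration of the planning and verification steps above, the Python sketch below defines a target record structure and a check that encodes a few process guidelines. The `CompanyRecord` fields and the rules in `check_guidelines` are hypothetical, chosen only for illustration; a real workflow would derive them from the problem being solved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CompanyRecord:
    # Hypothetical target structure, planned up front to enable later analysis.
    name: str
    country_code: str          # guideline: two uppercase letters, e.g. "DE"
    founded: Optional[date]    # guideline: None when the source does not say
    source_url: str            # guideline: every record must cite its source


def check_guidelines(record: CompanyRecord) -> list[str]:
    """Return a list of guideline violations (empty when the record passes)."""
    problems = []
    if not record.name.strip():
        problems.append("name is empty")
    code = record.country_code
    if not (len(code) == 2 and code.isalpha() and code.isupper()):
        problems.append("country_code must be two uppercase letters")
    if record.founded is not None and record.founded > date.today():
        problems.append("founded date is in the future")
    if not record.source_url.startswith(("http://", "https://")):
        problems.append("source_url must be an http(s) URL")
    return problems
```

Written this way, the same check can be applied to every submitted record, whether a person or an agent collected it.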
People do not behave perfectly consistently, so the output of any human-run process is, in effect, probabilistic: two people applying the same guidelines to the same source may still record different values. As a result, both before and since the start of the “agents era”, data collection workflows have relied on probabilistic methods to produce deterministic datasets.
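One way to picture this is to have several collectors (people or agents) answer the same question independently and to accept a value only when enough of them agree. The sketch below uses simple majority voting with an agreement threshold. This is an assumed aggregation rule used for illustration, not necessarily the method used in any particular workflow, and `resolve_field` and `min_agreement` are hypothetical names.

```python
from collections import Counter
from typing import Optional


def resolve_field(answers: list[str], min_agreement: float = 0.6) -> Optional[str]:
    """Return the majority answer if enough collectors agree, else None."""
    if not answers:
        return None
    # Normalise trivial differences so "Berlin " and "berlin" count as one answer.
    normalised = [a.strip().casefold() for a in answers]
    value, count = Counter(normalised).most_common(1)[0]
    # Accept the value only when agreement reaches the threshold; otherwise
    # return None so the field can be escalated for review.
    if count / len(normalised) >= min_agreement:
        return value
    return None


print(resolve_field(["Berlin", "berlin ", "Munich"]))   # prints: berlin
print(resolve_field(["Berlin", "Munich", "Hamburg"]))   # prints: None
```

Each individual answer may vary, but the rule that turns a set of answers into a stored value is fixed, so the same inputs always yield the same dataset entry.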