Y2 Recursive Research: How It Works
The Research Lifecycle
Understanding how Y2 conducts recursive research requires following a single report from initiation to delivery. This document walks through each phase of the process.
Phase 1: Initialization & Configuration
When a report is scheduled or manually triggered, Y2 first determines how to research the topic:
Profile Analysis
The system loads the InfoOps profile and examines:
Recursion Configuration: Is recursive research enabled? What's the maximum depth?
Freshness Configuration: Should sources be validated for recency?
Search Configuration: Topic-specific domains, time ranges, search depth
Model Configuration: Which LLM to use (default: GLM-4.5)
Topic Classification
Y2 automatically classifies topics to optimize search strategy:
Crypto/Finance Topics
Maximum source age: 24 hours (prices change constantly)
Preferred domains: CoinMarketCap, CoinDesk, Bloomberg
Search focus: Current prices, market sentiment, technical analysis
Security Topics
Maximum source age: 30 days (threats evolve quickly)
Preferred domains: CISA, BleepingComputer, TheHackerNews
Search focus: Vulnerabilities, incidents, threat actors
Stock Analysis Topics
Maximum source age: 7 days (earnings cycles)
Preferred domains: Yahoo Finance, Reuters, MarketWatch
Search focus: Earnings, analyst ratings, corporate news
General Topics
Maximum source age: 90 days
No domain restrictions
Balanced breadth of sources
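The classification rules above can be pictured as a small lookup table. This is only a sketch: the category keys, the `TopicProfile` type, and the fallback to the general profile are assumptions made for illustration, while the age limits and domain names come from the lists above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TopicProfile:
    max_age_days: float
    preferred_domains: Tuple[str, ...]  # empty tuple means no domain restriction


# Age limits and domains are taken from the text; key names are hypothetical.
TOPIC_PROFILES = {
    "crypto_finance": TopicProfile(1.0, ("CoinMarketCap", "CoinDesk", "Bloomberg")),
    "security": TopicProfile(30.0, ("CISA", "BleepingComputer", "TheHackerNews")),
    "stock_analysis": TopicProfile(7.0, ("Yahoo Finance", "Reuters", "MarketWatch")),
    "general": TopicProfile(90.0, ()),
}


def profile_for(category: str) -> TopicProfile:
    """Return the profile for a category; unknown categories fall back to
    the general profile (an assumption, not documented behavior)."""
    return TOPIC_PROFILES.get(category, TOPIC_PROFILES["general"])
```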
Execution Strategy Selection
Y2 chooses how to execute the research:
Hybrid Strategy (Default)
Parallel execution at shallow layers (0-1)
Sequential at deep layers (2+)
Best balance of speed and resource efficiency
Breadth-First Strategy
All layers execute in parallel
Fastest completion time (3x faster than sequential)
Higher memory usage
Depth-First Strategy
Each branch completes before the next one starts
Lowest memory footprint
Slower but more controlled
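One way to read the three strategies is as a rule that maps a strategy and a layer number to an execution mode. The function name and strategy identifiers below are hypothetical; only the layer thresholds come from the descriptions above.

```python
def execution_mode(strategy: str, layer: int) -> str:
    """Return "parallel" or "sequential" for a given strategy and layer."""
    if layer < 0:
        raise ValueError("layer must be non-negative")
    if strategy == "breadth_first":
        return "parallel"  # all layers execute in parallel
    if strategy == "depth_first":
        return "sequential"  # each branch completes before the next starts
    if strategy == "hybrid":
        # Parallel at shallow layers (0-1), sequential at deep layers (2+).
        return "parallel" if layer <= 1 else "sequential"
    raise ValueError(f"unknown strategy: {strategy!r}")
```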
Phase 2: Layer 0 - Reconnaissance
The first layer always executes a broad scan to understand the topic landscape.
Query Construction
The system builds an initial query using:
The profile's topic
Topic-specific search hints
Current date context
Custom user prompts (if provided)
Example transformation:
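As an illustration only, a query builder along the following lines could combine the four inputs listed above. The function name, the argument names, and the "as of" date phrasing are assumptions, not Y2's actual query format.

```python
from datetime import date
from typing import Optional, Sequence


def build_layer0_query(
    topic: str,
    hints: Sequence[str] = (),
    today: Optional[date] = None,
    user_prompt: Optional[str] = None,
) -> str:
    """Join topic, search hints, date context, and an optional user prompt."""
    today = today or date.today()
    parts = [topic, *hints, f"as of {today.isoformat()}"]
    if user_prompt:
        parts.append(user_prompt)
    # Drop empty fragments so the query has no stray whitespace.
    return " ".join(p.strip() for p in parts if p and p.strip())
```

Calling `build_layer0_query("ransomware trends", ["vulnerabilities", "incidents"], date(2025, 1, 15))` yields `"ransomware trends vulnerabilities incidents as of 2025-01-15"`.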
Search Execution
Layer 0 uses the most generous search parameters:
10 sources (highest count)
Advanced search depth (includes raw content)
3 chunks per source (full context)
Recent time range (topic-dependent)
What Layer 0 Discovers
The results provide:
Key themes: What are the main discussion points?
Recent developments: What changed recently?
Knowledge gaps: What needs deeper investigation?
Conflicting information: Where do sources disagree?
Example Layer 0 Results
Phase 3: Subtopic Extraction
After Layer 0 completes, GLM-4.5 analyzes the findings to generate subtopics for Layer 1.
The Extraction Process
Y2 sends the Layer 0 results to GLM-4.5 with specific instructions:
Criteria for Good Subtopics:
Specific and actionable (not overly broad)
Related to the main topic
Warrant deeper investigation
Suitable for focused search queries
Number of Subtopics:
Depth 1: 3 subtopics (optimal for parallel execution)
Depth 2: 2-3 subtopics per Layer 1 branch
Depth 3+: 2 subtopics (narrow focus)
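The depth-dependent subtopic counts can be expressed as a small helper that returns an inclusive (minimum, maximum) range. The function name is hypothetical; the counts are those listed above.

```python
from typing import Tuple


def subtopic_range(depth: int) -> Tuple[int, int]:
    """Return the inclusive (min, max) number of subtopics to request."""
    if depth < 1:
        raise ValueError("subtopics are only generated for depth 1 and deeper")
    if depth == 1:
        return (3, 3)  # three subtopics, suited to parallel execution
    if depth == 2:
        return (2, 3)  # two to three per Layer 1 branch
    return (2, 2)  # depth 3+: narrow focus
```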
Example Extraction
Why This Works
Traditional systems would pre-define subtopics before searching. Y2 discovers what to investigate based on what it finds. This is the core innovation:
If Layer 0 reveals nothing about API security, that subtopic won't be generated
If Layer 0 uncovers an unexpected threat, it becomes a subtopic
The research adapts dynamically to the actual information landscape
Phase 4: Layer 1 - Deep Dive
With subtopics identified, Y2 launches parallel research into each one.
Parallel Execution Flow
All subtopics execute simultaneously:
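A minimal sketch of this fan-out, assuming an asyncio-based implementation. `research_subtopic` is a hypothetical stand-in for the real search and analysis calls and simply sleeps briefly.

```python
import asyncio
from typing import List


async def research_subtopic(subtopic: str) -> dict:
    """Placeholder for one subtopic's search and analysis (hypothetical)."""
    await asyncio.sleep(0.01)  # simulates network and model latency
    return {"subtopic": subtopic, "sources": []}


async def run_layer1(subtopics: List[str]) -> List[dict]:
    """Run every subtopic concurrently; results keep the input order."""
    return await asyncio.gather(*(research_subtopic(s) for s in subtopics))


if __name__ == "__main__":
    results = asyncio.run(run_layer1(["subtopic A", "subtopic B", "subtopic C"]))
    print([r["subtopic"] for r in results])
```

Because the coroutines run concurrently, total wall-clock time tracks the slowest subtopic rather than the sum of all three.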
Layer 1 Search Parameters
More focused than Layer 0:
5 sources per subtopic (total: 15 sources)
Advanced search depth
5 chunks per source (deeper extraction)
Contextual query includes Layer 0 findings
Example Layer 1 Execution
Time and Cost Efficiency
Layer 1 completes in ~60 seconds despite retrieving 15 sources because:
The three subtopic searches execute in parallel
GLM-4.5 has fast inference (~10s per subtopic analysis)
Tavily API responds in 3-5 seconds per search
Phase 5: Layer 2 - Verification (Optional)
If maximum depth is 2 or higher, Y2 enters the verification layer.
Purpose of Layer 2
This layer focuses on:
Cross-referencing: Do multiple sources confirm the same facts?
Recency checking: Are there newer sources that contradict earlier findings?
Fact verification: Can we validate specific claims?
Gap filling: Are there still unanswered questions?
Layer 2 Search Parameters
Most conservative approach:
3 sources per verification query
Advanced search depth
Week time range (recent updates only)
Verification-focused queries
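The per-layer search parameters described for Layers 0, 1, and 2 can be summarized in one table. The `LayerSearchParams` type and its field names are illustrative assumptions; fields the text does not specify are left as `None` rather than guessed, and reusing Layer 2 settings for deeper layers is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayerSearchParams:
    max_sources: int
    search_depth: str
    chunks_per_source: Optional[int]  # None where the text gives no figure
    time_range: Optional[str]  # None where the range is topic-dependent or unstated


LAYER_PARAMS = {
    0: LayerSearchParams(10, "advanced", 3, None),  # broad reconnaissance
    1: LayerSearchParams(5, "advanced", 5, None),  # per subtopic
    2: LayerSearchParams(3, "advanced", None, "week"),  # per verification query
}


def params_for_layer(layer: int) -> LayerSearchParams:
    """Layers deeper than 2 reuse the Layer 2 settings (an assumption)."""
    if layer < 0:
        raise ValueError("layer must be non-negative")
    return LAYER_PARAMS[min(layer, 2)]
```

With three Layer 1 subtopics, `3 * params_for_layer(1).max_sources` gives the 15 sources quoted for that layer.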
Example Layer 2 Execution
Phase 6: Source Freshness Validation
After all layers complete, Y2 validates source quality.
Age-Based Scoring
Each source receives a freshness score using exponential decay:
Formula:
Interpretation:
1.0 = Published today (perfect freshness)
0.7 = Published at 25% of max age (good)
0.3 = Published at 50% of max age (acceptable)
0.05 = Published at 100% of max age (stale)
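A minimal sketch of exponential-decay scoring, assuming the form score = exp(-k * age / max_age) with k = 3, chosen so that a source at exactly the maximum age scores about 0.05. Both the functional form and the constant are assumptions. Under this single-constant curve the intermediate scores come out near 0.47 at 25% and 0.22 at 50% of max age, lower than the figures listed above, so the real scoring may use a different curve or rounding.

```python
import math

DECAY_K = 3.0  # assumed constant: exp(-3) is about 0.05 at max age


def freshness_score(age_days: float, max_age_days: float, k: float = DECAY_K) -> float:
    """Return a score in (0, 1]; 1.0 means published today."""
    if max_age_days <= 0:
        raise ValueError("max_age_days must be positive")
    ratio = max(age_days, 0.0) / max_age_days  # clamp future-dated sources to age 0
    return math.exp(-k * ratio)
```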
Content-Based Staleness Detection
Y2 scans source content for warning signs:
Year references: "2020", "2021", "2022" (outdated)
Temporal phrases: "last year", "previous quarter"
Status indicators: "archived", "deprecated", "superseded"
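A content scan of this kind might look like the sketch below. The phrase lists come from the text above; the function name, the signal labels, and the rule that a year counts as outdated when it is more than two years behind the current year are assumptions.

```python
import re
from typing import List

TEMPORAL_PHRASES = ("last year", "previous quarter")
STATUS_INDICATORS = ("archived", "deprecated", "superseded")


def staleness_signals(text: str, current_year: int, max_year_gap: int = 2) -> List[str]:
    """Return warning signs found in source text, in order of detection."""
    signals = []
    # Four-digit years that lag the current year by more than max_year_gap.
    for match in re.finditer(r"\b(?:19|20)\d{2}\b", text):
        year = int(match.group(0))
        if current_year - year > max_year_gap:
            signals.append(f"year:{year}")
    lowered = text.lower()
    for phrase in TEMPORAL_PHRASES + STATUS_INDICATORS:
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            signals.append(f"phrase:{phrase}")
    return signals
```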
Aggregate Metrics
The system calculates overall freshness:
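As a rough sketch, aggregate freshness could be an average of per-source scores plus counts of fresh and stale sources. The field names and the 0.3 cutoff (borrowed from the "acceptable" tier) are assumptions, not the system's documented metrics.

```python
from statistics import mean
from typing import Sequence


def aggregate_freshness(scores: Sequence[float], fresh_threshold: float = 0.3) -> dict:
    """Summarize per-source freshness scores for a whole report."""
    if not scores:
        return {"average": 0.0, "fresh_count": 0, "stale_count": 0}
    fresh = sum(1 for s in scores if s >= fresh_threshold)
    return {
        "average": mean(scores),
        "fresh_count": fresh,
        "stale_count": len(scores) - fresh,
    }
```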
Phase 7: Link Health Checking
Y2 validates that all sources are accessible.
Asynchronous Validation
This happens after report delivery to avoid blocking:
Report is generated and sent to subscribers
Background task validates all links in parallel
Results stored in report metadata
Dead links flagged for review
Validation Process
For each source URL:
Send HTTP HEAD request (doesn't download content)
Wait up to 5 seconds for response
Check status code (200-299 = accessible)
If dead, query Web Archive for snapshot
Web Archive Fallback
When a link is dead, Y2 automatically:
Queries archive.org Wayback Machine API
Retrieves most recent snapshot URL
Stores as alternative source
Flags original as "dead with archive available"
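The validation and fallback steps can be sketched with the Python standard library. The function names and returned status labels are hypothetical. The fallback uses the public Wayback Machine availability endpoint (`https://archive.org/wayback/available`), which is assumed here to be the API the text refers to.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

WAYBACK_API = "https://archive.org/wayback/available"


def is_accessible(status_code: int) -> bool:
    return 200 <= status_code <= 299


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Send a HEAD request; return the status code, or None on failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status
    except OSError:  # covers URLError, timeouts, and connection errors
        return None


def snapshot_url(payload: dict) -> Optional[str]:
    """Extract the closest snapshot URL from an availability response."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archive(url: str, timeout: float = 5.0) -> Optional[str]:
    query = urllib.parse.urlencode({"url": url})
    try:
        with urllib.request.urlopen(f"{WAYBACK_API}?{query}", timeout=timeout) as resp:
            return snapshot_url(json.load(resp))
    except (OSError, ValueError):  # network failure or malformed JSON
        return None


def check_link(url: str) -> dict:
    status = head_status(url)
    if status is not None and is_accessible(status):
        return {"url": url, "state": "alive"}
    archive = find_archive(url)
    state = "dead_with_archive" if archive else "dead"
    return {"url": url, "state": state, "archive_url": archive}
```

A production checker would likely run `check_link` for all sources concurrently, since the text describes parallel background validation.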
Example Link Validation
Phase 8: Aggregation & Synthesis
At the root level (Layer 0), Y2 aggregates all findings.
Source Collection
The system queries each sub-report and collects sources:
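A sketch of how sources might be gathered from a tree of sub-reports and de-duplicated by URL. The report shape (`sources` and `sub_reports` keys, each source carrying a `url`) is a hypothetical structure chosen for illustration.

```python
from typing import Dict, List


def collect_sources(report: dict) -> List[dict]:
    """Walk a report and its sub-reports, keeping the first source per URL."""
    unique: Dict[str, dict] = {}

    def visit(node: dict) -> None:
        for source in node.get("sources", []):
            unique.setdefault(source["url"], source)  # first occurrence wins
        for child in node.get("sub_reports", []):
            visit(child)

    visit(report)
    return list(unique.values())  # dicts preserve insertion order
```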
Final Report Generation
Y2 generates the final report using:
All unique sources from all layers
Synthesized findings from each subtopic
Original topic and user prompts
BLUF (Bottom Line Up Front) structure
Report Structure
Phase 9: Metadata Enrichment
Y2 stores comprehensive metadata about the research process.
Recursion Metadata
Freshness Metadata
Link Validation Metadata
Cost Metadata
Phase 10: Multi-Channel Delivery
Finally, Y2 delivers the report to subscribers.
Delivery Methods
Email
Full HTML report with CSS styling
Hyperlinked references
Optimized for mobile and desktop
Unsubscribe link and preferences
SMS
140-character summary (smsSummary field)
Link to full web version
Critical insights only
Cost-effective for alerts
Webhook
JSON payload with full metadata
HMAC signature for verification
Idempotency key for deduplication
Retry logic with exponential backoff
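The webhook guarantees above map onto standard building blocks. The sketch below assumes HMAC-SHA256 with a hex-encoded signature, a base delay of one second doubling per attempt, and an in-memory set for deduplication. None of these specifics are stated in the text, and all function names are hypothetical.

```python
import hashlib
import hmac
from typing import List, Set


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison avoids leaking information through timing."""
    return hmac.compare_digest(sign_payload(secret, body), signature)


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Delay in seconds before each retry: base, 2*base, 4*base, capped."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]


def is_duplicate(idempotency_key: str, seen: Set[str]) -> bool:
    """Receiver-side check: record the key and report whether it was seen."""
    if idempotency_key in seen:
        return True
    seen.add(idempotency_key)
    return False
```

A receiver would call `verify_signature` on the raw body before parsing the JSON, then use `is_duplicate` to skip deliveries it has already processed.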
Webhook Payload Example
Performance Characteristics
Time Breakdown (Depth 1)
Search Efficiency
The Recursive Advantage
Y2's recursive research system draws its advantage from six properties:
Discovery-Driven Research: Finds what to investigate based on actual findings, not assumptions
Multi-Layer Depth: Broad reconnaissance + focused deep dives + verification
Quality Validation: Freshness scoring + link checking + cross-referencing
Parallel Efficiency: 3x faster than sequential while maintaining quality
Cost Optimization: GLM-4.5 delivers 90% of GPT-4 quality at 10% of cost
Comprehensive Metadata: Full audit trail of research process and costs
This isn't just searching more; it's researching smarter, modeled on how expert human analysts gather intelligence.
