Y2 Recursive Research: How It Works

The Research Lifecycle

Understanding how Y2 conducts recursive research requires following a single report from initiation to delivery. This document walks through each phase of the process.

Phase 1: Initialization & Configuration

When a report is scheduled or manually triggered, Y2 first determines how to research the topic:

Profile Analysis

The system loads the InfoOps profile and examines:

  • Recursion Configuration: Is recursive research enabled? What's the maximum depth?

  • Freshness Configuration: Should sources be validated for recency?

  • Search Configuration: Topic-specific domains, time ranges, search depth

  • Model Configuration: Which LLM to use (default: GLM-4.5)

Topic Classification

Y2 automatically classifies topics to optimize search strategy:

Crypto/Finance Topics

  • Maximum source age: 24 hours (prices change constantly)

  • Preferred domains: CoinMarketCap, CoinDesk, Bloomberg

  • Search focus: Current prices, market sentiment, technical analysis

Security Topics

  • Maximum source age: 30 days (threats evolve quickly)

  • Preferred domains: CISA, BleepingComputer, TheHackerNews

  • Search focus: Vulnerabilities, incidents, threat actors

Stock Analysis Topics

  • Maximum source age: 7 days (earnings cycles)

  • Preferred domains: Yahoo Finance, Reuters, MarketWatch

  • Search focus: Earnings, analyst ratings, corporate news

General Topics

  • Maximum source age: 90 days

  • No domain restrictions

  • Balanced breadth of sources
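
A minimal configuration-style sketch of this classification (the TOPIC_PROFILES name, field names, and domain strings are illustrative, not Y2's actual schema):

    # Illustrative topic classification table mirroring the categories above;
    # not Y2's actual configuration schema.
    TOPIC_PROFILES = {
        "crypto_finance": {
            "max_source_age_hours": 24,
            "preferred_domains": ["coinmarketcap.com", "coindesk.com", "bloomberg.com"],
            "search_focus": "current prices, market sentiment, technical analysis",
        },
        "security": {
            "max_source_age_hours": 24 * 30,
            "preferred_domains": ["cisa.gov", "bleepingcomputer.com", "thehackernews.com"],
            "search_focus": "vulnerabilities, incidents, threat actors",
        },
        "stock_analysis": {
            "max_source_age_hours": 24 * 7,
            "preferred_domains": ["finance.yahoo.com", "reuters.com", "marketwatch.com"],
            "search_focus": "earnings, analyst ratings, corporate news",
        },
        "general": {
            "max_source_age_hours": 24 * 90,
            "preferred_domains": [],  # no domain restrictions
            "search_focus": "balanced breadth of sources",
        },
    }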

Execution Strategy Selection

Y2 chooses how to execute the research:

Hybrid Strategy (Default)

  • Parallel execution at shallow layers (0-1)

  • Sequential at deep layers (2+)

  • Best balance of speed and resource efficiency

Breadth-First Strategy

  • All layers execute in parallel

  • Fastest completion time (3x faster than sequential)

  • Higher memory usage

Depth-First Strategy

  • Each branch completes before starting next

  • Lowest memory footprint

  • Slower but more controlled
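
A minimal sketch of how the three strategies differ in practice (the function name and strategy identifiers are assumptions for this sketch):

    def runs_in_parallel(strategy: str, depth: int) -> bool:
        """Decide whether branches at a given layer depth fan out concurrently."""
        if strategy == "breadth_first":
            return True            # every layer executes in parallel
        if strategy == "depth_first":
            return False           # each branch completes before the next starts
        # "hybrid" (default): parallel at shallow layers 0-1, sequential at 2+
        return depth <= 1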

Phase 2: Layer 0 - Reconnaissance

The first layer always executes a broad scan to understand the topic landscape.

Query Construction

The system builds an initial query using:

  • The profile's topic

  • Topic-specific search hints

  • Current date context

  • Custom user prompts (if provided)

Example transformation:
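
A hypothetical transformation, shown as Python string assembly; the topic, hints, prompt, and date are invented for illustration:

    # Hypothetical inputs for a security-topic profile.
    topic = "Ransomware threats to healthcare"
    hints = "vulnerabilities, incidents, threat actors"   # topic-specific search hints
    user_prompt = "Focus on hospital systems"             # optional custom prompt
    date_context = "2025-06-01"                           # current date at run time

    query = f"{topic}: {hints}. {user_prompt} (as of {date_context})"
    # -> "Ransomware threats to healthcare: vulnerabilities, incidents,
    #     threat actors. Focus on hospital systems (as of 2025-06-01)"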

Search Execution

Layer 0 uses the most generous search parameters:

  • 10 sources (highest count)

  • Advanced search depth (includes raw content)

  • 3 chunks per source (full context)

  • Recent time range (topic-dependent)
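
The same parameters, collected in one place for comparison with the deeper layers described later (field names are illustrative, not the actual search API schema):

    # Per-layer search parameters as described in this document; Layer 0's time
    # range is topic-dependent, Layer 2's is fixed to the past week.
    LAYER_SEARCH_PARAMS = {
        0: {"max_results": 10, "search_depth": "advanced", "chunks_per_source": 3},
        1: {"max_results": 5,  "search_depth": "advanced", "chunks_per_source": 5},
        2: {"max_results": 3,  "search_depth": "advanced", "time_range": "week"},
    }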

What Layer 0 Discovers

The results provide:

  • Key themes: What are the main discussion points?

  • Recent developments: What changed recently?

  • Knowledge gaps: What needs deeper investigation?

  • Conflicting information: Where do sources disagree?

Example Layer 0 Results
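
For the hypothetical ransomware topic used in the query example above, Layer 0 might surface findings such as (invented for illustration):

  • Key themes: ransomware-as-a-service activity against hospital systems, third-party vendor compromise

  • Recent developments: a newly disclosed VPN appliance vulnerability reportedly under active exploitation

  • Knowledge gaps: little detail on patch availability or affected regions

  • Conflicting information: sources disagree on how many organizations were affected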

Phase 3: Subtopic Extraction

After Layer 0 completes, GLM-4.5 analyzes the findings to generate subtopics for Layer 1.

The Extraction Process

Y2 sends the Layer 0 results to GLM-4.5 with specific instructions:

Criteria for Good Subtopics:

  1. Specific and actionable (not overly broad)

  2. Related to the main topic

  3. Warrant deeper investigation

  4. Suitable for focused search queries

Number of Subtopics:

  • Depth 1: 3 subtopics (optimal for parallel execution)

  • Depth 2: 2-3 subtopics per Layer 1 branch

  • Depth 3+: 2 subtopics (narrow focus)

Example Extraction
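
Continuing the hypothetical example, a depth-1 extraction might produce (subtopics invented for illustration):

  1. Third-party vendor compromise as a ransomware entry point into hospital systems

  2. Exploitation status and patch availability for the newly disclosed VPN appliance vulnerability

  3. Operational impact of recent ransomware incidents on healthcare providers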

Why This Works

Traditional systems would pre-define subtopics before searching. Y2 discovers what to investigate based on what it finds. This is the core innovation:

  • If Layer 0 reveals nothing about API security, that subtopic won't be generated

  • If Layer 0 uncovers an unexpected threat, it becomes a subtopic

  • The research adapts dynamically to the actual information landscape

Phase 4: Layer 1 - Deep Dive

With subtopics identified, Y2 launches parallel research into each one.

Parallel Execution Flow

All subtopics execute simultaneously:
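
A minimal sketch of that fan-out using Python's asyncio (the research_subtopic and run_layer_1 names are hypothetical):

    import asyncio

    async def research_subtopic(subtopic: str) -> dict:
        """Placeholder for one focused search plus GLM-4.5 analysis of a subtopic."""
        await asyncio.sleep(0)  # stand-in for the real network and model I/O
        return {"subtopic": subtopic, "findings": []}

    async def run_layer_1(subtopics: list[str]) -> list[dict]:
        """Launch all subtopic branches concurrently and gather their results."""
        return await asyncio.gather(*(research_subtopic(s) for s in subtopics))

    results = asyncio.run(run_layer_1(["subtopic A", "subtopic B", "subtopic C"]))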

Layer 1 Search Parameters

More focused than Layer 0:

  • 5 sources per subtopic (total: 15 sources)

  • Advanced search depth

  • 5 chunks per source (deeper extraction)

  • Contextual query includes Layer 0 findings

Example Layer 1 Execution

Time and Cost Efficiency

Layer 1 completes in ~60 seconds despite gathering 15 sources across three subtopic searches because:

  • Searches execute in parallel (3 simultaneous)

  • GLM-4.5 has fast inference (~10s per subtopic analysis)

  • Tavily API responds in 3-5 seconds per search

Phase 5: Layer 2 - Verification (Optional)

If maximum depth is 2 or higher, Y2 enters the verification layer.

Purpose of Layer 2

This layer focuses on:

  • Cross-referencing: Do multiple sources confirm the same facts?

  • Recency checking: Are there newer sources that contradict earlier findings?

  • Fact verification: Can we validate specific claims?

  • Gap filling: Are there still unanswered questions?

Layer 2 Search Parameters

Most conservative approach:

  • 3 sources per verification query

  • Advanced search depth

  • Week time range (recent updates only)

  • Verification-focused queries

Example Layer 2 Execution
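
Continuing the hypothetical example, Layer 2 might issue verification queries such as (invented for illustration):

  • Has the VPN appliance vulnerability been added to CISA's Known Exploited Vulnerabilities catalog this week?

  • Do independent sources confirm the reported number of affected hospital systems?

  • Is there a vendor advisory or patch release newer than the Layer 1 sources?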

Phase 6: Source Freshness Validation

After all layers complete, Y2 validates source quality.

Age-Based Scoring

Each source receives a freshness score using exponential decay:

Formula:
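
The exact constant is implementation-specific; a form consistent with the description above is

    freshness = e^(-k · source_age / max_age)

where max_age is the topic-specific maximum source age and k is a decay constant chosen so the score falls to roughly 0.05 at the maximum age (the sketch later in this phase assumes k ≈ 3).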

Interpretation:

  • 1.0 = Published today (perfect freshness)

  • 0.7 = Published at 25% of max age (good)

  • 0.3 = Published at 50% of max age (acceptable)

  • 0.05 = Published at 100% of max age (stale)

Content-Based Staleness Detection

Y2 scans source content for warning signs:

  • Year references: "2020", "2021", "2022" (outdated)

  • Temporal phrases: "last year", "previous quarter"

  • Status indicators: "archived", "deprecated", "superseded"
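
A minimal sketch of both checks (the decay constant and marker list are assumptions, not Y2's actual implementation):

    import math

    STALENESS_MARKERS = [
        "2020", "2021", "2022",                   # outdated year references
        "last year", "previous quarter",          # temporal phrases
        "archived", "deprecated", "superseded",   # status indicators
    ]

    def freshness_score(age_days: float, max_age_days: float, k: float = 3.0) -> float:
        """Exponential decay: 1.0 for a source published today, ~0.05 at max age."""
        return math.exp(-k * age_days / max_age_days)

    def looks_stale(content: str) -> bool:
        """Flag a source whose text contains any of the staleness markers."""
        text = content.lower()
        return any(marker in text for marker in STALENESS_MARKERS)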

Aggregate Metrics

The system calculates overall freshness:
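
Illustrative aggregate values (field names and numbers are invented for this sketch):

    aggregate_freshness = {
        "average_freshness": 0.78,       # mean of the per-source scores
        "stale_source_count": 2,         # sources flagged by the content checks
        "oldest_source_age_days": 21,
        "share_within_max_age": 0.93,    # fraction of sources inside the topic's max age
    }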

Phase 7: Link Validation

Y2 validates that all sources are accessible.

Asynchronous Validation

This happens after report delivery to avoid blocking:

  1. Report is generated and sent to subscribers

  2. Background task validates all links in parallel

  3. Results stored in report metadata

  4. Dead links flagged for review

Validation Process

For each source URL:

  1. Send HTTP HEAD request (doesn't download content)

  2. Wait up to 5 seconds for response

  3. Check status code (200-299 = accessible)

  4. If dead, query Web Archive for snapshot

Web Archive Fallback

When a link is dead, Y2 automatically:

  1. Queries archive.org Wayback Machine API

  2. Retrieves most recent snapshot URL

  3. Stores as alternative source

  4. Flags original as "dead with archive available"
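
A minimal sketch of the check-and-fallback logic using the requests library and the public archive.org availability endpoint (error handling trimmed; not Y2's actual code):

    import requests

    def check_source(url: str, timeout: float = 5.0) -> dict:
        """HEAD-check a source URL; if it is dead, look up a Wayback Machine snapshot."""
        try:
            resp = requests.head(url, timeout=timeout, allow_redirects=True)
            if 200 <= resp.status_code < 300:
                return {"url": url, "status": "accessible"}
        except requests.RequestException:
            pass  # treat network errors the same as a dead link

        # Dead link: ask the Wayback Machine for its most recent snapshot.
        wb = requests.get("https://archive.org/wayback/available",
                          params={"url": url}, timeout=timeout).json()
        snapshot = wb.get("archived_snapshots", {}).get("closest", {}).get("url")
        return {
            "url": url,
            "status": "dead_with_archive" if snapshot else "dead",
            "archive_url": snapshot,
        }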

Phase 8: Aggregation & Synthesis

At the root level (Layer 0), Y2 aggregates all findings.

Source Collection

The system queries each sub-report and collects sources:
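
A minimal sketch of that collection step, deduplicating sources by URL across the recursion tree (the report structure assumed here is illustrative):

    def collect_sources(report: dict) -> list[dict]:
        """Walk the report tree and keep the first occurrence of each source URL."""
        seen, unique = set(), []
        stack = [report]
        while stack:
            node = stack.pop()
            for source in node.get("sources", []):
                if source["url"] not in seen:
                    seen.add(source["url"])
                    unique.append(source)
            stack.extend(node.get("sub_reports", []))
        return unique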

Final Report Generation

Y2 generates the final report using:

  • All unique sources from all layers

  • Synthesized findings from each subtopic

  • Original topic and user prompts

  • BLUF (Bottom Line Up Front) structure

Report Structure
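
A typical BLUF-style layout (illustrative; the exact section names may differ):

  • Bottom Line Up Front: a two- to three-sentence answer to the core question

  • Key Findings: the most important developments, one bullet each

  • Detailed Analysis: one section per Layer 1 subtopic

  • Verification Notes: confirmations and contradictions surfaced in Layer 2

  • Sources: all unique references, with freshness indicators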

Phase 9: Metadata Enrichment

Y2 stores comprehensive metadata about the research process.

Recursion Metadata
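
A hypothetical shape for this block (field names and values are illustrative, not Y2's actual schema):

    recursion_metadata = {
        "max_depth": 2,
        "layers_executed": 3,
        "subtopics_per_layer": [1, 3, 6],
        "total_searches": 10,
    }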

Freshness Metadata
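
Likewise illustrative:

    freshness_metadata = {
        "average_freshness": 0.81,
        "stale_sources": 1,
        "oldest_source_age_days": 21,
    }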

Cost Metadata
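
Likewise illustrative:

    cost_metadata = {
        "llm_tokens": 48_500,
        "search_api_calls": 10,
        "estimated_cost_usd": 0.12,
    }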

Phase 10: Multi-Channel Delivery

Finally, Y2 delivers the report to subscribers.

Delivery Methods

Email

  • Full HTML report with CSS styling

  • Hyperlinked references

  • Optimized for mobile and desktop

  • Unsubscribe link and preferences

SMS

  • 140-character summary (smsSummary field)

  • Link to full web version

  • Critical insights only

  • Cost-effective for alerts

Webhook

  • JSON payload with full metadata

  • HMAC signature for verification

  • Idempotency key for deduplication

  • Retry logic with exponential backoff

Webhook Payload Example
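
A hypothetical payload and signature sketch in Python (field names, the header name, and the secret are illustrative, not the actual Y2 schema):

    import hashlib
    import hmac
    import json

    payload = {
        "report_id": "rpt_123",
        "topic": "Ransomware threats to healthcare",
        "summary": "Bottom line up front ...",
        "sources": [{"url": "https://example.com/article", "freshness": 0.92}],
        "metadata": {"max_depth": 2, "total_searches": 10},
        "idempotency_key": "rpt_123-2025-06-01",
    }

    def sign(body: bytes, secret: bytes) -> str:
        """HMAC-SHA256 signature the receiver recomputes to verify the payload."""
        return hmac.new(secret, body, hashlib.sha256).hexdigest()

    body = json.dumps(payload).encode()
    signature = sign(body, b"shared-webhook-secret")
    # Sent alongside the JSON body, e.g. in an "X-Signature" header.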

Performance Characteristics

Time Breakdown (Depth 1)

Search Efficiency

The Recursive Advantage

Y2's recursive research system achieves superior results by:

  1. Discovery-Driven Research: Finds what to investigate based on actual findings, not assumptions

  2. Multi-Layer Depth: Broad reconnaissance + focused deep dives + verification

  3. Quality Validation: Freshness scoring + link checking + cross-referencing

  4. Parallel Efficiency: 3x faster than sequential while maintaining quality

  5. Cost Optimization: GLM-4.5 delivers 90% of GPT-4 quality at 10% of cost

  6. Comprehensive Metadata: Full audit trail of research process and costs

This isn't just searching more—it's researching smarter, modeled after how expert human analysts conduct intelligence gathering.
