Data Flow
Understand the transformation layers that turn raw web data into structured intelligence.
The Information Pipeline
Lens does not simply scrape data. Each item passes through four transformation layers (ingestion, normalization, refinement, and synthesis) before it appears in a report.
Ingestion
Collectors fetch raw bytes from source APIs (JSON, XML, or binary PDFs). This data is immediately stored in the local cache if enabled.
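The ingestion step can be sketched as a small cache-aside helper. This is an illustrative sketch only, not the Lens implementation: the function name `ingest`, the `fetch` callable, the `cache_dir` parameter, and the choice to key cache files on a SHA-256 of the URL are all assumptions. Reading back from the cache on later calls is also an assumption; the text above only states that raw data is stored when the cache is enabled.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def ingest(url: str,
           fetch: Callable[[str], bytes],
           cache_dir: Optional[Path] = None) -> bytes:
    """Return the raw bytes for `url`, caching them locally when enabled.

    `fetch` is a hypothetical collector callable that performs the real
    request; `cache_dir=None` means the cache is disabled.
    """
    if cache_dir is None:
        return fetch(url)  # cache disabled: always go to the source

    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a filesystem-safe file name.
    path = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if path.exists():
        return path.read_bytes()  # assumed behavior: reuse cached bytes

    raw = fetch(url)
    path.write_bytes(raw)  # store the raw bytes before any parsing happens
    return raw
```

Passing the fetcher in as a callable keeps the sketch independent of any particular HTTP client and makes the caching behavior easy to exercise with a stub.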
Normalization
Raw data is mapped to the internal ResearchItem schema. Fields such as "pub_date" and "repo_url" are standardized: dates are converted to ISO-8601 and links are resolved to absolute URLs.
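A minimal sketch of this mapping follows. Only the names ResearchItem, pub_date, and repo_url come from the text above; the `title` field, the accepted date formats, the `normalize` function, and the `base_url` parameter are assumptions made for illustration, and the real schema likely has more fields.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from urllib.parse import urljoin


@dataclass
class ResearchItem:
    # Hypothetical subset of the schema, for illustration only.
    title: str
    pub_date: Optional[str]   # ISO-8601 date, e.g. "2024-03-12"
    repo_url: Optional[str]   # absolute URL


# Assumed input formats; a real collector may see many more.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d/%m/%Y")


def to_iso8601(value: str) -> Optional[str]:
    """Try each known format and return an ISO-8601 date, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize(raw: dict, base_url: str) -> ResearchItem:
    """Map one raw record to a ResearchItem."""
    date = raw.get("pub_date")
    repo = raw.get("repo_url")
    return ResearchItem(
        title=(raw.get("title") or "").strip(),
        pub_date=to_iso8601(date) if date else None,
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        repo_url=urljoin(base_url, repo) if repo else None,
    )
```

For example, a record with pub_date "2024/03/12" and repo_url "/org/repo", normalized against the base URL "https://example.org/papers/1", yields "2024-03-12" and "https://example.org/org/repo".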
Refinement
The ResearchGraph is analyzed for duplicates. Cross-source items representing the same entity (e.g., a paper on both arXiv and Crossref) are merged into a single node.
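The merge can be illustrated with a short sketch. It assumes each item already carries a key that identifies the underlying entity (a DOI is used as a stand-in here); the `Node` class, the `merge_duplicates` function, and the tuple layout are hypothetical and do not describe the actual ResearchGraph API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Node:
    key: str                      # assumed entity identifier, e.g. a DOI
    title: str
    sources: Set[str] = field(default_factory=set)


def merge_duplicates(items: Iterable[Tuple[str, str, str]]) -> List[Node]:
    """Collapse (key, title, source) records that share a key into one node."""
    nodes: Dict[str, Node] = {}
    for key, title, source in items:
        node = nodes.get(key)
        if node is None:
            # First sighting of this entity: create its node.
            nodes[key] = Node(key, title, {source})
        else:
            # Same entity from another collector: record the extra source
            # and keep the title from the first sighting.
            node.sources.add(source)
    return list(nodes.values())
```

Feeding in the same paper once from arXiv and once from Crossref produces a single node whose `sources` set contains both collectors.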
Synthesis
The synthesis engine traverses the ResearchGraph (a DAG) to identify key clusters, then generates the final summary report with full citations.
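As a rough illustration of the traversal, the sketch below walks a DAG in topological order with the standard-library graphlib module (Python 3.9+) and emits numbered entries that cite the items they depend on. The meaning of an edge ("depends on"), the `build_report` function, and the report layout are assumptions; cluster detection is omitted, and the real synthesis engine is certainly more involved.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set


def build_report(depends_on: Dict[str, Set[str]], titles: Dict[str, str]) -> str:
    """Render one line per node, dependencies first, with citation numbers.

    `depends_on` maps a node to the nodes it builds on (assumed semantics).
    """
    # static_order() yields every node after all of its dependencies.
    order = list(TopologicalSorter(depends_on).static_order())
    ref_number = {node: i for i, node in enumerate(order, start=1)}

    lines = []
    for node in order:
        cites = sorted(ref_number[d] for d in depends_on.get(node, ()))
        suffix = " " + "".join(f"[{n}]" for n in cites) if cites else ""
        lines.append(f"[{ref_number[node]}] {titles[node]}{suffix}")
    return "\n".join(lines)
```

Because dependencies are always numbered before the items that build on them, every citation in the output points to an entry that appears earlier in the report.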
Data Integrity
Lens uses a cryptographic hash of each item's content to detect identical items across collectors, even when metadata such as titles differs slightly.
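The idea can be sketched with the standard-library hashlib module. SHA-256 is an assumption (the text does not name the hash function), as are the `content_hash` and `group_identical` helpers and the dictionary layout of an item. Note that only the content bytes are hashed, so differing titles do not affect the result.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def content_hash(content: bytes) -> str:
    """Return a hex digest that identifies the content byte-for-byte."""
    return hashlib.sha256(content).hexdigest()


def group_identical(items: List[dict]) -> List[List[dict]]:
    """Group items whose content hashes match, ignoring all metadata.

    Each item is assumed to be a dict with a bytes value under "content".
    Only groups with more than one member are returned.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        groups[content_hash(item["content"])].append(item)
    return [group for group in groups.values() if len(group) > 1]
```

Two records titled "Deep Nets" and "Deep Nets (preprint)" with byte-identical content land in the same group, while any change to the content itself, even a single byte, produces a different digest.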
Architecture Complete
You've explored the entire technical foundation of the Lens engine.