Pipelines
Transform raw data into structured intelligence through modular processing steps.
Overview
In Lens, a **Pipeline** is a series of transformations applied to collection results. Think of it as middleware for research data: each pipeline step receives the current set of results, performs an operation on them, and passes the output to the next step.
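The step-by-step model described above can be sketched as a plain function chain. This is an illustration only, not the Lens API: the `ResearchItem` fields, `run_pipeline`, and the two example steps are all assumed names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ResearchItem:
    # Hypothetical stand-in; the real Lens ResearchItem may have other fields.
    title: str
    source: str


# A step takes the current results and returns the results for the next step.
Step = Callable[[List[ResearchItem]], List[ResearchItem]]


def run_pipeline(items: List[ResearchItem], steps: List[Step]) -> List[ResearchItem]:
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        items = step(items)
    return items


def drop_untitled(items: List[ResearchItem]) -> List[ResearchItem]:
    """Example step: discard items whose title is empty or whitespace."""
    return [item for item in items if item.title.strip()]


def sort_by_title(items: List[ResearchItem]) -> List[ResearchItem]:
    """Example step: order items alphabetically, ignoring case."""
    return sorted(items, key=lambda item: item.title.lower())


results = run_pipeline(
    [
        ResearchItem("Zeta", "arXiv"),
        ResearchItem("", "Crossref"),
        ResearchItem("alpha", "arXiv"),
    ],
    [drop_untitled, sort_by_title],
)
```

Because every step has the same signature, steps can be reordered or swapped without touching the runner.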
Core Pipeline Steps
Deduplication
Removes identical or near-identical results retrieved from different sources (e.g., the same paper on arXiv and Crossref).
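One way such a step could work is to normalize titles and compare them with a string-similarity ratio. The sketch below uses `difflib` from the Python standard library; the `ResearchItem` fields, the `deduplicate` name, and the 0.9 threshold are assumptions, and the built-in Lens step may use a different matching strategy.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class ResearchItem:
    # Hypothetical stand-in for the Lens result type.
    title: str
    source: str


def normalize(title: str) -> str:
    """Lowercase the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(items: List[ResearchItem], threshold: float = 0.9) -> List[ResearchItem]:
    """Keep the first occurrence of each title; drop later near-duplicates."""
    kept: List[ResearchItem] = []
    kept_keys: List[str] = []
    for item in items:
        key = normalize(item.title)
        is_duplicate = any(
            SequenceMatcher(None, key, seen).ratio() >= threshold
            for seen in kept_keys
        )
        if not is_duplicate:
            kept.append(item)
            kept_keys.append(key)
    return kept


items = [
    ResearchItem("Attention Is All You Need", "arXiv"),
    ResearchItem("Attention is all you need.", "Crossref"),
    ResearchItem("Deep Residual Learning for Image Recognition", "arXiv"),
]
unique = deduplicate(items)
```

The pairwise comparison is quadratic in the number of items, so a larger collection would likely need an exact-match pass on identifiers (for example a DOI, if available) before any fuzzy matching.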
Clustering
Groups related results into conceptual clusters, allowing the synthesis engine to identify primary themes in the research.
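To make the idea concrete, here is a deliberately simple sketch that groups items by word overlap between titles (Jaccard similarity). It is a toy under stated assumptions: a production clustering step would more plausibly compare embeddings, and `ResearchItem`, `cluster`, and the 0.3 threshold are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ResearchItem:
    # Hypothetical stand-in for the Lens result type.
    title: str


def tokens(text: str) -> Set[str]:
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(items: List[ResearchItem], threshold: float = 0.3) -> List[List[ResearchItem]]:
    """Greedy clustering: join the first cluster whose seed item is similar enough."""
    clusters: List[List[ResearchItem]] = []
    for item in items:
        item_tokens = tokens(item.title)
        for group in clusters:
            if jaccard(item_tokens, tokens(group[0].title)) >= threshold:
                group.append(item)
                break
        else:
            # No existing cluster matched, so this item seeds a new one.
            clusters.append([item])
    return clusters


groups = cluster(
    [
        ResearchItem("Graph neural networks for chemistry"),
        ResearchItem("Graph neural networks survey"),
        ResearchItem("Protein folding with transformers"),
    ]
)
```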
Summarization
Uses an LLM to generate concise summaries of individual items or of an entire research cluster.
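A summarization step can be kept independent of any particular model by passing the summarizer in as a callable. In the sketch below, `first_sentence` is a trivial placeholder standing in for an LLM call, and the `abstract` and `summary` fields on `ResearchItem` are assumed for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class ResearchItem:
    # Hypothetical fields; the real Lens type may differ.
    title: str
    abstract: str
    summary: str = ""


def first_sentence(text: str) -> str:
    """Placeholder summarizer: return the first sentence of the text.

    A real step would call an LLM client here instead.
    """
    head, separator, _ = text.partition(". ")
    return head + "." if separator else text


def summarize(
    items: List[ResearchItem], summarizer: Callable[[str], str]
) -> List[ResearchItem]:
    """Return copies of the items with the summary field filled in."""
    return [replace(item, summary=summarizer(item.abstract)) for item in items]


summarized = summarize(
    [ResearchItem("Example", "First point. Second point.")],
    first_sentence,
)
```

Injecting the summarizer also makes the step easy to test without network access or model credentials.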
Custom Pipelines
You can build your own pipeline steps by implementing a Python function that accepts a list of `ResearchItem` objects.
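As a minimal sketch of a custom step, the function below filters items by publication year. The `year` field and the `keep_recent` name are hypothetical, and how a step is registered with Lens is not shown because the source does not specify it.

```python
from dataclasses import dataclass
from functools import partial
from typing import List


@dataclass
class ResearchItem:
    # Hypothetical stand-in; assumes items carry a publication year.
    title: str
    year: int


def keep_recent(items: List[ResearchItem], min_year: int = 2020) -> List[ResearchItem]:
    """Custom step: keep only items published in min_year or later."""
    return [item for item in items if item.year >= min_year]


# functools.partial fixes the parameter so the step takes only the item list.
since_2023 = partial(keep_recent, min_year=2023)

sample = [
    ResearchItem("Old survey", 2015),
    ResearchItem("Mid-era study", 2021),
    ResearchItem("New result", 2024),
]
recent = keep_recent(sample)
newest = since_2023(sample)
```

Returning a new list rather than mutating the input keeps steps free of side effects, which makes them safe to reorder.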
Pipeline Performance
While collection is I/O-bound, pipelines are often CPU-bound or LLM-inference-bound. For large datasets, consider using our built-in async pipeline processors.
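For inference-bound steps, the usual gain comes from overlapping many model calls while capping how many run at once. The sketch below shows that general pattern with `asyncio` from the standard library; it is not the interface of the built-in Lens processors, and `map_concurrently` and `fake_summarize` are assumed names.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fake_summarize(text: str) -> str:
    """Stand-in for a slow LLM call; sleeps briefly, then returns a result."""
    await asyncio.sleep(0.01)
    return text.upper()


async def map_concurrently(
    inputs: List[str],
    worker: Callable[[str], Awaitable[str]],
    limit: int = 4,
) -> List[str]:
    """Run worker over all inputs with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(value: str) -> str:
        async with semaphore:
            return await worker(value)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(value) for value in inputs))


outputs = asyncio.run(map_concurrently(["a", "b", "c"], fake_summarize, limit=2))
```

This helps only when the step spends its time waiting on a remote service. A CPU-bound step gains nothing from `asyncio` alone and would need a process pool such as `concurrent.futures.ProcessPoolExecutor`.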