arXiv Scraping Service

arXiv scraping for scientific preprints, metadata, abstracts, and PDFs.

Extract arXiv's vast preprint database for research trend analysis, topic modeling, citation studies, and academic intelligence across physics, math, and computer science.

research repository, easy difficulty, daily data

About arXiv

What is arxiv.org and what can you scrape

arXiv is an open-access repository hosting electronic preprints and postprints primarily in physics, mathematics, computer science, and related fields. Researchers submit papers for free public access before peer-reviewed publication. Users can search, browse by category, and download PDFs. It serves scientists, academics, and students worldwide for discovering cutting-edge research.

Domain: arxiv.org

Industry: research repository

Difficulty: easy

Data Volume: millions of papers

Freshness: daily

Anti-Bot Measures

static HTML, public OAI-PMH
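Because metadata is exposed over OAI-PMH, a harvest mostly comes down to building `ListRecords` requests and following resumption tokens. The sketch below is a minimal illustration, not a production harvester. The base URL, the `arXiv` metadata prefix, and the `cs` set name are assumptions to verify against arXiv's current OAI-PMH documentation, and the sample XML is hand-written for the example.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint; confirm against arXiv's current OAI-PMH documentation.
BASE_URL = "https://oaipmh.arxiv.org/oai"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(metadata_prefix="arXiv", set_spec=None, from_date=None,
                     resumption_token=None, base_url=BASE_URL):
    """Build an OAI-PMH ListRecords URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, no other argument except the verb may be sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumption token from a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Hand-written sample response, trimmed to the part this sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url(set_spec="cs", from_date="2024-01-01")
follow_up_url = list_records_url(resumption_token=next_token(SAMPLE))
```

A real harvester would fetch each URL, parse the records, and loop until `next_token` returns `None`, pausing between requests to stay within the repository's usage guidelines.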

Extractable Data

arXiv data fields we extract

Structured data categories available from arxiv.org. Fields are configurable to match your schema.

Paper metadata

title, authors, abstract, subjects, doi

Submission info

id, version, submitted date, updated date, announced date

Files

pdf url, source url, log url

Other

journal ref, comments, endorsement
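As a rough illustration of how these field groups might map onto a schema, the sketch below defines a hypothetical `PaperRecord` and a helper that splits a versioned identifier. The class, the helper name, the regular expression, and the PDF URL pattern are all assumptions made for this example, not a fixed delivery format, and only new-style identifiers are handled.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# New-style arXiv identifiers, e.g. "2301.01234" or "2301.01234v2".
# Old-style identifiers such as "hep-th/9901001" are not handled here.
ARXIV_ID = re.compile(r"^(?P<id>\d{4}\.\d{4,5})(?:v(?P<version>\d+))?$")


@dataclass
class PaperRecord:
    """Hypothetical record layout mirroring the field groups above."""
    id: str
    version: Optional[int]
    title: str
    authors: List[str] = field(default_factory=list)
    abstract: str = ""
    subjects: List[str] = field(default_factory=list)
    doi: Optional[str] = None
    journal_ref: Optional[str] = None
    comments: Optional[str] = None

    @property
    def pdf_url(self) -> str:
        # Assumed URL pattern; without a version suffix the URL is unversioned.
        suffix = f"v{self.version}" if self.version else ""
        return f"https://arxiv.org/pdf/{self.id}{suffix}"


def split_identifier(raw: str):
    """Split '2301.01234v2' into ('2301.01234', 2); version is None if absent."""
    match = ARXIV_ID.match(raw.strip())
    if match is None:
        raise ValueError(f"not a new-style arXiv identifier: {raw!r}")
    version = match.group("version")
    return match.group("id"), int(version) if version else None


# Placeholder identifier and title, used only to exercise the sketch.
paper_id, version = split_identifier("2301.01234v2")
record = PaperRecord(id=paper_id, version=version, title="Example title")
```

Keeping `id` and `version` separate makes it straightforward to deduplicate revisions of the same paper while still linking to a specific version's PDF.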

Use Cases

arXiv scraping use cases

Scope is tailored to your target pages and business requirements.

Research Discovery

Academics browse and download preprints to stay current in physics and CS

Trend Analysis

Data scientists track publication trends by category and keywords

Corpus Building

ML engineers collect abstracts and texts for NLP training datasets

Citation Tracking

Bibliometricians analyze author networks and impact metrics

Literature Review

Students aggregate papers for theses and literature surveys
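To make the trend analysis use case concrete, here is a small sketch that counts papers per month and subject. The rows are made up for illustration, and the `monthly_counts` helper and the tuple layout are assumptions rather than a prescribed output format.

```python
from collections import Counter

# Made-up rows for illustration: (submitted date as YYYY-MM-DD, primary subject).
rows = [
    ("2024-01-03", "cs.CL"),
    ("2024-01-17", "cs.CL"),
    ("2024-01-20", "math.PR"),
    ("2024-02-02", "cs.CL"),
]


def monthly_counts(rows):
    """Count papers per (YYYY-MM, subject) pair."""
    return Counter((date[:7], subject) for date, subject in rows)


counts = monthly_counts(rows)
```

The same grouping extends to keywords by replacing the subject with a term matched in the title or abstract.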

Related Targets

Alternatives to arXiv scraping

We may also cover these related targets.

biorxiv.org, medrxiv.org, ssrn.com, ResearchGate, semanticscholar.org, zenodo.org

Engagement

Start your arXiv scraping project

Send us a quick inquiry with your target pages, fields, and delivery requirements.