arXiv scraping for scientific preprints, metadata, abstracts, and PDFs.

Extract data from arXiv's preprint database for research trend analysis, topic modeling, citation studies, and academic intelligence across physics, mathematics, and computer science.

research repository, easy difficulty, daily data

What is arxiv.org and what can you scrape

arXiv is an open-access repository hosting electronic preprints and postprints primarily in physics, mathematics, computer science, and related fields. Researchers submit papers for free public access before peer-reviewed publication. Users can search, browse by category, and download PDFs. It serves scientists, academics, and students worldwide for discovering cutting-edge research.

Domain: arxiv.org
Industry: research repository
Difficulty: easy
Data Volume: millions of papers
Freshness: daily
Anti-Bot Measures: static HTML, public OAI-PMH
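Because the site exposes a public OAI-PMH interface, daily metadata can in principle be harvested without parsing HTML. The following is a minimal sketch of how such a harvest request might be built, not a definitive client. The endpoint URL, the `arXiv` metadata prefix, the `cs` set name, and the helper names `build_list_records_url` and `build_resume_url` are assumptions for illustration and should be checked against arXiv's current OAI-PMH documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against arXiv's current OAI-PMH documentation.
OAI_BASE_URL = "https://oaipmh.arxiv.org/oai"


def build_list_records_url(from_date, set_spec=None,
                           metadata_prefix="arXiv", base_url=OAI_BASE_URL):
    """Build a ListRecords URL for records changed since from_date (YYYY-MM-DD)."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # assumed prefix name
        "from": from_date,
    }
    if set_spec:
        params["set"] = set_spec  # optional subject-area set, e.g. "cs" (assumed)
    return f"{base_url}?{urlencode(params)}"


def build_resume_url(resumption_token, base_url=OAI_BASE_URL):
    """Build the follow-up URL for the next page of a ListRecords harvest.

    In OAI-PMH, resumptionToken is an exclusive argument: it is sent with
    the verb only, without metadataPrefix, from, or set.
    """
    params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_list_records_url("2024-01-01", set_spec="cs"))
    print(build_resume_url("tok123"))
```

A daily job would typically request records changed since the previous run, then follow resumption tokens until the server stops returning one, pausing between requests.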

arXiv data fields we extract

Structured data categories available from arxiv.org. Fields are configurable to match your schema.

Paper metadata

title, authors, abstract, subjects, doi

Submission info

id, version, submitted date, updated date, announced date

Files

pdf url, source url, log url

Other

journal ref, comments, endorsement
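To illustrate how several of the fields listed above could be pulled from one record, here is a minimal sketch that parses a single Atom entry of the kind returned by arXiv's public query API. The sample entry is made-up placeholder data, `parse_entry` is a hypothetical helper, and the mapping of `published` to submitted date and `updated` to updated date is an assumption. Fields not carried in the Atom entry (announced date, source url, log url, endorsement) are not covered here.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Made-up placeholder entry, shaped like an arXiv API Atom <entry>.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:arxiv="http://arxiv.org/schemas/atom">
  <id>http://arxiv.org/abs/0000.00000v2</id>
  <updated>2020-01-02T00:00:00Z</updated>
  <published>2020-01-01T00:00:00Z</published>
  <title>A Placeholder
  Title</title>
  <summary>  Placeholder abstract text. </summary>
  <author><name>First Author</name></author>
  <author><name>Second Author</name></author>
  <arxiv:comment>10 pages</arxiv:comment>
  <link href="http://arxiv.org/abs/0000.00000v2" rel="alternate" type="text/html"/>
  <link title="pdf" href="http://arxiv.org/pdf/0000.00000v2" rel="related" type="application/pdf"/>
  <category term="cs.LG" scheme="http://arxiv.org/schemas/atom"/>
</entry>"""


def parse_entry(entry_xml):
    """Map one Atom <entry> to a flat dict of paper fields."""
    entry = ET.fromstring(entry_xml)

    def text(path):
        node = entry.find(path, NS)
        if node is None or not node.text:
            return None
        # Collapse line breaks and padding inside titles and abstracts.
        return " ".join(node.text.split())

    # Split ".../abs/<id>v<version>" into identifier and version number.
    match = re.search(r"abs/(.+?)(?:v(\d+))?$", text("atom:id") or "")
    pdf_url = next(
        (link.get("href") for link in entry.findall("atom:link", NS)
         if link.get("title") == "pdf"),
        None,
    )
    return {
        "id": match.group(1) if match else None,
        "version": int(match.group(2)) if match and match.group(2) else None,
        "title": text("atom:title"),
        "authors": [" ".join(n.text.split())
                    for n in entry.findall("atom:author/atom:name", NS) if n.text],
        "abstract": text("atom:summary"),
        "subjects": [c.get("term") for c in entry.findall("atom:category", NS)],
        "doi": text("arxiv:doi"),
        "journal_ref": text("arxiv:journal_ref"),
        "comments": text("arxiv:comment"),
        "submitted_date": text("atom:published"),
        "updated_date": text("atom:updated"),
        "pdf_url": pdf_url,
    }


if __name__ == "__main__":
    print(parse_entry(SAMPLE_ENTRY))
```

Optional elements such as the DOI and journal reference come back as `None` when an entry does not include them, so downstream schemas should treat those fields as nullable.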

arXiv scraping use cases

Scope is tailored to your target pages and business requirements.

Research Discovery

Academics browse and download preprints to stay current in physics and CS

Trend Analysis

Data scientists track publication trends by category and keywords

Corpus Building

ML engineers collect abstracts and texts for NLP training datasets

Citation Tracking

Bibliometricians analyze author networks and impact metrics

Literature Review

Students aggregate papers for theses and surveys

Alternatives to arXiv scraping

We may also cover these related targets.

biorxiv.org, medrxiv.org, ssrn.com, ResearchGate, semanticscholar.org, zenodo.org

Start your arXiv scraping project

Send us a quick inquiry with your target pages, fields, and delivery requirements.

Request arXiv sample dataset

Enter your email and we will get in touch with an arXiv Scraping Service sample dataset within one business day.