Legal Word documents

179,730 documents across 67 languages and 9 topics. See /classification for what this label covers.

Documents classified as legal in nature. Examples include statutes and regulations, court filings, terms of service, privacy policies, employment agreements, NDAs, licensing terms, regulatory filings, and standard contracts.

Useful for: contract classification benchmarks, named-entity recognition on legal text, multilingual legal NLP, retrieval-augmented generation over legal corpora, OOXML parsing of complex tabular and styled legal documents.

Topic              Count
Government         96,557
Legal / Judicial   35,496
Finance            14,165
Education          13,105
Healthcare          9,131
Environment         5,862
Nonprofit           2,058
General             1,945
Technology          1,411

Lang   Count    Share
en     43,475   25.4%
lt     26,031   15.2%
ru     20,637   12.1%
sk     11,575    6.8%
es     11,071    6.5%
pl      8,131    4.8%
+ 61 more

Share is computed against the top 20 languages for this type (171,047 docs), matching what the API returns. The remaining 8,683 documents (179,730 total minus 171,047) fall outside the top 20 or have no detected language.
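The share arithmetic can be checked by hand. The sketch below is illustrative only: it copies the counts from the table above, and the names (`LANG_COUNTS`, `share`) are made up for this example rather than taken from the API.

```python
# Counts copied from the language table on this page.
LANG_COUNTS = {
    "en": 43_475,
    "lt": 26_031,
    "ru": 20_637,
    "sk": 11_575,
    "es": 11_071,
    "pl": 8_131,
}
TOP20_TOTAL = 171_047  # denominator stated on this page (top 20 languages)
ALL_DOCS = 179_730     # every document classified as legal


def share(count: int, denominator: int = TOP20_TOTAL) -> float:
    """Percentage share of `count` in `denominator`, rounded to one decimal."""
    return round(100 * count / denominator, 1)


# Shares against the top-20 denominator, as shown in the table.
shares = {lang: share(n) for lang, n in LANG_COUNTS.items()}

# Documents outside the top 20 languages or without a detected language.
outside_top20 = ALL_DOCS - TOP20_TOTAL

# For comparison: English share against all legal documents instead.
en_share_of_all = share(LANG_COUNTS["en"], ALL_DOCS)
```

Because the denominator excludes documents outside the top 20, each share is slightly higher than it would be against the full 179,730 documents.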

ID Filename Topic Lang Conf
424538a69217 1588.docx Government ru 0.99
726d97d50ab3 4030.docx Government ru 0.99
47904b59583d 1538.docx Government ru 0.99
b12b11baf0d4 AFM-Delegation-Loire_Convention-mise-a-dispo-01042023-31032035.docx Government fr 0.99
7b13f40eb89a AFM-Delegation-Loire_Convention-mise-a-dispo-35.docx Government fr 0.99

The ID column shows the first 12 characters of each document's SHA-256 content hash; the full hash is the stable reference. Real public-web filenames vary widely: some are descriptive, some numeric, and some shaped like URL fragments.
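A short ID of this shape could be derived as in the sketch below. Two assumptions are made here that this page does not confirm: that the hash is taken over the raw bytes of the .docx file, and that it is rendered as lowercase hex (which matches the look of the IDs in the table). The helper names `short_id` and `resolve_prefix` are hypothetical.

```python
import hashlib


def short_id(content: bytes, length: int = 12) -> str:
    """First `length` hex characters of the SHA-256 digest of `content`.

    Assumption: the corpus hashes the raw file bytes; this page only says
    "SHA-256 content hash".
    """
    return hashlib.sha256(content).hexdigest()[:length]


def resolve_prefix(prefix: str, full_hashes: list[str]) -> list[str]:
    """Return every full hash that starts with the given short ID.

    A 12-character prefix is not guaranteed to be unique, so the result
    is a list; callers should treat more than one match as ambiguous.
    """
    return [h for h in full_hashes if h.startswith(prefix)]
```

For a local file, `short_id(Path("1588.docx").read_bytes())` (with `from pathlib import Path`) would give the value to compare against the ID column, under the assumptions above.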

# All legal documents
curl "https://api.docxcorp.us/manifest?type=legal" -o legal-manifest.txt

# High-confidence English subset
curl "https://api.docxcorp.us/manifest?type=legal&lang=en&min_confidence=0.8"

See /download for full access patterns.
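The same requests can be made from Python with only the standard library. This is a sketch, not a client for the API: the query parameters (`type`, `lang`, `min_confidence`) are the ones used in the curl commands above, while the helper names and the assumption that the manifest is UTF-8 text with one entry per line (inferred only from the `.txt` filename) are not confirmed by this page.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.docxcorp.us/manifest"


def manifest_url(doc_type, lang=None, min_confidence=None):
    """Build a manifest URL from the filters shown in the curl examples."""
    params = {"type": doc_type}
    if lang is not None:
        params["lang"] = lang
    if min_confidence is not None:
        params["min_confidence"] = min_confidence
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_manifest(url, timeout=30):
    """Download a manifest and split it into lines.

    Assumption: the response body is UTF-8 text, one entry per line.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").splitlines()


# URL for the high-confidence English subset; no request is made here.
english_url = manifest_url("legal", lang="en", min_confidence=0.8)
```

Calling `fetch_manifest(english_url)` performs the actual download; keeping URL construction separate makes the filters easy to inspect before any network traffic is sent.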
