General Word documents
59,877 documents across 76 languages and 10 document types. See /classification for what this label covers.
Documents that did not fit cleanly into one of the other eight topics. Includes general-interest publications, multi-domain content, and organizational documents without a single dominant subject.
Useful for: general-purpose text classification, cross-sector retrieval, and fallback analysis when the domain is unknown.
| Type | Count |
|---|---|
| Educational | 14,183 |
| Forms | 9,921 |
| Creative | 7,848 |
| Reference | 6,959 |
| Administrative | 6,037 |
| Correspondence | 5,772 |
| Policies | 3,057 |
| Technical | 2,695 |
| Legal | 1,945 |
| Reports | 1,460 |
| Lang | Count | Share |
|---|---|---|
| en | 18,692 | 33.4% |
| fr | 7,052 | 12.6% |
| cs | 4,415 | 7.9% |
| unknown | 4,155 | 7.4% |
| de | 3,242 | 5.8% |
| es | 3,041 | 5.4% |
| + 70 more | | |
Share is computed against the top 20 languages for this topic (55,908 docs), matching what the API returns. The remaining 3,969 documents fall outside the top 20 or have no detected language.
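As a quick check of how the Share column is derived, the sketch below recomputes a few values from the counts in the table. The function name `share` is illustrative and not part of any tooling described here; the only inputs are the per-language counts and the two totals quoted on this page.

```shell
# Percentage of `count` within `total`, rounded to one decimal place.
share() {
  awk -v n="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * n / total }'
}

# Against the top-20 total (55,908), as used in the table:
share 18692 55908   # en -> 33.4
share 7052 55908    # fr -> 12.6

# Against all documents in the topic (59,877), for comparison:
share 18692 59877   # en -> 31.2
```

The gap between 33.4% and 31.2% for English shows why the denominator matters when comparing these shares with figures computed over the whole topic.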
| ID | Filename | Type | Lang | Conf |
|---|---|---|---|---|
| 58b3b96fd02e | Inscription-1.docx | Forms | fr | 0.98 |
| dca81eef8914 | Vereinsvorstand_mit_Text.docx | Administrative | de | 0.98 |
| 3e5bdee50017 | 49549 | Technical | en | 0.98 |
| f38771953325 | 52381 | Technical | en | 0.98 |
| 3df8dafff14c | 49485 | Technical | fr | 0.98 |
ID column shows the first 12 characters of the SHA-256 content hash; the full hash is the stable reference. Real public-web filenames vary widely: descriptive, numeric, or URL-fragment shaped.
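To match a local file against the ID column, something like the following may work. It is a sketch under two assumptions that this page does not confirm: that the content hash is taken over the raw file bytes, and that it is written as lowercase hexadecimal. The helper name `short_id` is hypothetical. It relies on GNU `sha256sum`; on macOS, `shasum -a 256` is the usual substitute.

```shell
# Hypothetical helper: print the first 12 hex characters of a file's SHA-256 digest.
# Assumes the hash covers the raw file bytes and is encoded as lowercase hex.
short_id() {
  sha256sum "$1" | cut -c1-12
}

# Example with an empty temporary file, whose SHA-256 digest is well known.
tmp_file="$(mktemp)"
short_id "$tmp_file"   # e3b0c44298fc
rm -f "$tmp_file"
```

Because a 12-character prefix can collide, keep the full digest when a stable reference is needed.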
# All general documents
curl "https://api.docxcorp.us/manifest?topic=general" -o general-manifest.txt
# High-confidence English subset
curl "https://api.docxcorp.us/manifest?topic=general&lang=en&min_confidence=0.8"
See /download for full access patterns.
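When several subsets are needed, building the manifest URL in one place can reduce quoting mistakes. The sketch below uses only the three query parameters that appear in the examples above (`topic`, `lang`, `min_confidence`); the function name `manifest_url` is hypothetical, and the values are passed through without URL encoding.

```shell
# Hypothetical helper: build a manifest URL from a topic and optional filters.
# Usage: manifest_url TOPIC [LANG] [MIN_CONFIDENCE]
manifest_url() {
  local topic="$1" lang="${2:-}" min_conf="${3:-}"
  local url="https://api.docxcorp.us/manifest?topic=${topic}"
  if [ -n "$lang" ]; then
    url="${url}&lang=${lang}"
  fi
  if [ -n "$min_conf" ]; then
    url="${url}&min_confidence=${min_conf}"
  fi
  printf '%s\n' "$url"
}

manifest_url general
manifest_url general en 0.8
```

The result can then be passed to curl, for example `curl "$(manifest_url general en 0.8)" -o general-en-manifest.txt`, where the output filename is only an illustration.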