Commons:File naming
Files uploaded to Wikimedia Commons should have a proper name. The file name is the basis for the file's URL and use in other projects. This page explains in detail the considerations, or naming conventions, on which file names should be chosen. There is often more than one good name for a file. In general, Commons aims to provide stable filenames,[1] so as a matter of principle, even bad filenames will often be left alone. Similarly, files are rarely overwritten.[2] Consequently, new uploads should aim for the highest-quality filenames.
Purpose of file names
Names are used to uniquely identify the item involved. Names should be
- descriptive, chosen according to what the file displays or what its contents portray
- accurate, especially where scientific names, proper nouns, dates, etc. are used
- consistent, following a schema and providing a coherent set of information
Contributors categorizing files frequently have different demands from those who create, process, manage and upload them. Unless there is a compelling reason not to, the uploader's choice of name should be honored. This is a courtesy, not an absolute right, however. If a name is disruptive or inappropriate, a different name may be chosen.[3]
Naming conventions
Media files can be uploaded with names in any language in any script (coded as UTF-8) - see Commons:Language policy. The filename extension (e.g., .jpg) should match the file format (e.g., JPEG), and not be doubled (e.g., .jpg.jpg or .tiff.jpg). The filename should clearly describe what the file contains, but it should be reasonably concise. Whether a filename is perceived to be suitable often depends on the familiarity of the individual with the subject. Items may also be known under different terms to different contributors, and contributors may assume that an entity is best known under the term they themselves use for it, neglecting cultural differences, even within countries. Therefore, it is likely that no single name will completely satisfy all contributors. The following lists various characteristics of a good file name. These should be seen as goals, not as rules written in stone. Generally speaking, stable filenames are more important than perfect filenames, and missing a few of these goals is not justification for moving the file. But when in doubt, aim high.[4]
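The extension rule above (match the format, do not double it) can be checked mechanically before upload. The following is a minimal sketch, not a Commons tool: the `KNOWN_EXTENSIONS` table and the function names are assumptions made for illustration, and the table is deliberately incomplete.

```python
# Hypothetical, incomplete mapping from extension to format name.
KNOWN_EXTENSIONS = {
    ".jpg": "JPEG", ".jpeg": "JPEG", ".png": "PNG",
    ".tif": "TIFF", ".tiff": "TIFF", ".gif": "GIF", ".svg": "SVG",
}

def trailing_extensions(filename):
    """Return the recognized extensions at the end of the name, last one first."""
    parts = filename.lower().split(".")
    found = []
    for part in reversed(parts[1:]):
        extension = "." + part
        if extension not in KNOWN_EXTENSIONS:
            break
        found.append(extension)
    return found

def extension_problems(filename, actual_format):
    """List extension problems, given the real format of the file (e.g. "JPEG")."""
    extensions = trailing_extensions(filename)
    if not extensions:
        return ["no recognized extension"]
    problems = []
    if KNOWN_EXTENSIONS[extensions[0]] != actual_format:
        problems.append("extension does not match format")
    if len(extensions) > 1:
        problems.append("doubled extension")
    return problems

print(extension_problems("Smoky sunset in Taiwan.jpg.jpg", "JPEG"))
```

The actual format has to come from inspecting the file itself; this sketch only takes it as an argument.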
Descriptive
- Meaningful – Names should not consist entirely of auto-generated letters and numbers, such as "DSC123456.jpg". Commons uses the Filename prefix blacklist and the Titleblacklist to enforce this policy.
- Content-based – Prioritize describing the subject matter over indicating its origin. Do not use names that solely consist of dates, the name of the photographer or rights holder, terms like “Flickr”, “original”, “crop”, and/or catalogue numbers. For example, File:20110428 OH K1023900 0014.JPG - Flickr - NZ Defence Force.jpg does not describe its subject matter at all despite its length. However, it is acceptable to include such information within the name so long as the name remains reasonably brief and also contains a description of the subject matter. Remember that details like photographer name and source information can alternatively be included in the file's metadata or description. Situations where the subject is identified with a date, such as the book 1984, are exceptions to this criterion.
- Specific – For images of places, the name should describe a specific location with phrasing that aids in figuring out where the image was taken and what the image depicts, such as Colchester Zoo, 18 Rue Norvins, or Anime Expo 2022. The name should not consist primarily of a broad location, such as File:Paris 319.jpg, Ontario hill, or Japan train station, where the location is so large that only someone who knows the area very well can identify the image. Similarly, unless the file is an icon, clip art, or other illustration where a broad category is most descriptive of the subject, the name should not consist primarily of a generic or broad category such as a word like "smartphone", "screenshot", "queen" or "bird", but rather impart detailed information that would help someone identify the specific object depicted, such as "Nokia N8 Blue (Front)".
- Precise – The name should unambiguously identify the file's subject, and distinguish it from other similar subjects. For example, File:Michaeljackson.jpg should have included some information to distinguish itself from files in Category:Michael Jackson. For distinguishing places, refer to wikivoyage:Wikivoyage:Naming conventions#Disambiguation.
- Correct – The name should describe the file's content and convey what the subject is actually called. Inaccurate names for the file subject, although they may be common, should be avoided. The title given to a work of art by the artist that created it is considered appropriate, even if the name has nothing to do with what is depicted (for example, many works of Dadaism). The name should also be free of obvious errors, such as misspelled proper nouns, incorrect dates, and misidentified objects or organisms. Users are allowed to upload "unidentified" or "unknown" organisms but such files may be renamed upon identification.
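As a rough illustration of the "Meaningful" goal, an uploader could screen a batch for names that look machine-generated before submitting it. The pattern below is an assumed heuristic written for this example; it is not the Filename prefix blacklist or the Titleblacklist, whose actual rules are not reproduced here.

```python
import re

# Assumed heuristic: at most four letters, an optional separator, then only
# digits and separators (e.g. "DSC123456", "IMG_0042", "20110428").
AUTO_GENERATED = re.compile(r"^[A-Za-z]{0,4}[ _-]?[\d _-]+$")

def looks_auto_generated(filename):
    """Return True if the name, without its extension, matches the heuristic."""
    stem = filename.rsplit(".", 1)[0]
    return bool(AUTO_GENERATED.match(stem))

for name in ["DSC123456.jpg", "IMG_0042.JPG", "Nokia N8 Blue (Front).jpg"]:
    print(name, looks_auto_generated(name))
```

A match is only a prompt for human review. A name such as "1984.jpg" also matches, even though a subject identified by a date or number is listed above as an exception.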
Clear
- Concise – The name should be no longer than necessary. Generally a descriptive phrase is sufficient. There is a hard limit of 240 bytes on filenames.[5] Keeping the filename reasonably short reduces the need for truncation when the file is downloaded; for example, on CDs filenames are limited to 31, 64, 197 or 207 bytes or characters, depending on the extensions used. Put the subject up front, so that when names are clipped it is still possible to figure out what the file is about. Only the first 20 characters show up on category pages. For place names, the basic name of the place, without a whole bunch of localizing addenda, is the best, e.g. "Denver" instead of "Denver, Colorado" or "City of Denver".
- Spelled out – Abbreviations, acronyms, and a person's initials are often ambiguous and thus should be spelled out. Although such initialisms are related to the subject of the file, the meaning is not immediately clear to the reader. Spelling out a subject's full name also aids significantly in searching. For purposes of concision, it is allowed to use well-known acronyms and initialisms such as NATO, so long as other parts of the name provide sufficient information to identify the subject, or to use abbreviations for the image source.
- Recognizable – As many people as possible should be able to understand the name, whether they are an expert, someone familiar with the subject area, or someone on the street.
- Intuitive – Names should anticipate what users are likely to type when looking or searching for the subject. Significant keywords not present in the name should be included in the description or metadata.
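The 240-byte limit and the 20-character clipping mentioned under "Concise" are easy to misjudge, because bytes and characters differ for non-ASCII text. This sketch measures both. It assumes the limit applies to the UTF-8 encoding of the full name including the extension but excluding the "File:" prefix; the function names are invented for the example.

```python
MAX_FILENAME_BYTES = 240  # hard limit stated in this guideline
CATEGORY_PREVIEW_CHARS = 20  # characters said to show up on category pages

def byte_length(filename):
    """Length of the name in bytes when encoded as UTF-8."""
    return len(filename.encode("utf-8"))

def fits_limit(filename):
    """True if the name is within the assumed 240-byte limit."""
    return byte_length(filename) <= MAX_FILENAME_BYTES

def category_preview(filename):
    """The leading characters that would remain after clipping."""
    return filename[:CATEGORY_PREVIEW_CHARS]

name = "West facade of Notre-Dame de Paris, 2009-05-26.jpg"
print(byte_length(name), fits_limit(name), category_preview(name))
```

Checking the preview is a quick way to confirm that the subject really is "up front": here the clipped name still reads "West facade of Notre".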
Practical considerations
- Unique – By the design of Commons's software, no two files can have the same name. To prevent collisions, it is encouraged to add strongly distinguishing information such as a source, date, or catalogue identifier, although it is not always necessary.
- Appropriate – Names should be neither vulgar (unless unavoidable) nor pedantic. Names apparently created for the purpose of vandalism, attack, or provocation, such as libelous, insulting, degrading, crude, or offensive descriptions, names containing inappropriate or non-public personal information, or names that are blatant advertising or self-promotion, will be removed immediately. For example, an image of a person with the name "File:1BIGGest_nOSE_everS33n.JPG" is unlikely to remain. Names that are or have been associated with nationalistic, religious or racist causes are allowed provided they are legal to host and otherwise fall within Commons scope, for example a filename like "File:Taiwanese Tiaoyutai islands map.png".
- Neutral – To the greatest extent possible, the name should reflect a neutral point of view, avoiding judgmental, opinionated, promotional, controversial, or biased words or claims. Refer to the Neutral point of view and No original research guidelines of Wikipedia. Commons's neutrality policy is different from Wikipedia's and allows non-neutral filenames in some cases, but this does not mean that such filenames are encouraged.[6]
- Language preserving – Follow the conventions of the source(s) appropriate to the subject and avoid translation or romanization unless these are present in the source(s). If a subject has strong ties to a particular language, the name should use that language. Otherwise, all languages are acceptable in file names; Commons does not prefer one in particular. There is no obligation to use a specific language for new uploads, even if other files in a subject area use that language. Any relevant spelling variation may be used.
Tie-breaking criteria
- Common – When considering different names for a subject, names that are more commonly used (as determined by prevalence in reliable sources) are preferred. Search engines, international organizations, media outlets, encyclopedias, databases, scientific bodies, and scientific journals may be consulted to identify the most common name(s). For places, the name should be the most commonly-used name in the local language. When it comes to organisms and biological subjects, the scientific (Latin) name is recommended.
- Recent – Names used in newer sources are preferred to names used in older sources.
- Consistent – The name should be consistent with the pattern of similar files' names. Many naming conventions exist, and sometimes conflict with other criteria; there is currently no standard set of conventions. The costs and benefits should be weighed when considering any specific convention.
- For batch uploads, use a consistent filename template based on what data is available. Sample templates include {title} ({source}), {title} - {source} {id}, and {brief_description}, {year}.
- Files that form parts of a whole (such as scans from the same book or large images that are divided into smaller portions due to Commons’ upload size restriction) should follow the same naming convention so that they appear together, in order, in categories and lists.
- Certain complex templates (such as those that use BSicons or that display football kits) assume that the images used in them will follow a specific naming convention. Wikisource also uses a specific naming convention for the source files they transcribe.
- If targeting a category, the category name and the date of creation may be used (West facade of Notre-Dame de Paris, 2009-05-26.jpg for a file intended for Category:West facade of Notre-Dame de Paris).[7]
- Consider what page the file might appear on in Wikipedia. Wikipedia has many detailed naming conventions, see the English Wikipedia's naming conventions.
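The sample batch-upload templates can be applied with ordinary string formatting. The sketch below is one possible approach, not a description of any Commons upload tool: the record fields, the "Example Archive" source, and the `build_names` helper are hypothetical. It also rejects duplicate results, since no two files can share a name.

```python
# The three sample templates from this guideline, keyed by invented labels.
TEMPLATES = {
    "title_source": "{title} ({source})",
    "title_source_id": "{title} - {source} {id}",
    "description_year": "{brief_description}, {year}",
}

def build_names(records, template, extension):
    """Fill the template for each record and refuse duplicate filenames."""
    names = []
    seen = set()
    for record in records:
        # Raises KeyError if a record lacks a field the template needs.
        name = template.format(**record) + extension
        if name in seen:
            raise ValueError(f"duplicate filename: {name}")
        seen.add(name)
        names.append(name)
    return names

# Hypothetical batch metadata.
records = [
    {"title": "Map of Denver", "source": "Example Archive", "id": "0001"},
    {"title": "Map of Denver", "source": "Example Archive", "id": "0002"},
]
print(build_names(records, TEMPLATES["title_source_id"], ".jpg"))
```

With the same two records, the "{title} ({source})" template would produce identical names and raise an error, which shows why a distinguishing identifier is worth including when titles repeat.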
Language-specific guidelines
These guidelines apply to names in English. Speakers of other languages may define guidelines for their language in the relevant translations.
- The preferred name style is sentence case (downstyle) with initial capitalization and without ending punctuation, e.g. "Smoky sunset in Taiwan", as it is more readable and contains the most information. Other conventions such as title case may make sense in the context of batch uploads, as it is difficult to infer sentence case from all caps.
- Articles (a, an, the) can often be removed without changing the meaning.
- Names are not full sentences, but small bits of information. In most cases, the proper length is between two and twelve words. One-word names are almost always too ambiguous, and should be avoided. If the name is 20 words it is probably too long, and if it is 30 or more, it is almost definitely too long. English filenames will usually use 1 byte per character (some symbols may fall outside the ASCII character set), allowing 240 characters or approximately 50 words as the hard maximum. For non-ASCII characters, 240 bytes may be much less than 240 characters, as these can take up to 4 bytes per character.[8]
- Names should not go out-of-date. Avoid words and phrases like "current", "incumbent", "expected", "recently", "soon", or "next year", preferring more precise language such as "in 1969" or "fifth president".
- Avoid abusing Unicode - control characters can be omitted, strange punctuation can be replaced with standard quotes and commas, and symbols such as "♥" are often more natural when spelled out ("heart"), also increasing visibility in search. Furthermore some characters do not render correctly at all in certain operating systems and browsers. It is a good idea to stick to letters, numbers, underscore (space), ASCII hyphen/minus/dash, plus, and period (dot), as these do not have any MediaWiki restrictions. Letters with diacritics and accents are acceptable, but so is omitting diacritics and accents (e.g. "Calderón"/"Calderon", "Erdoğan"/"Erdogan").
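The character advice in the last point can be approximated with Python's standard `unicodedata` module. This is a sketch under assumptions: it treats letters, numbers and combining marks in any script as acceptable, allows only the punctuation named above (space, underscore, hyphen, plus, period), and uses invented function names.

```python
import unicodedata

# Punctuation named in the guideline as free of MediaWiki restrictions.
ALLOWED_PUNCTUATION = set(" _-+.")

def unusual_characters(filename):
    """Return characters outside the suggested set, in order of first appearance."""
    found = []
    for character in filename:
        major_class = unicodedata.category(character)[0]
        # L = letters, N = numbers, M = combining marks (diacritics).
        if major_class in ("L", "N", "M") or character in ALLOWED_PUNCTUATION:
            continue
        if character not in found:
            found.append(character)
    return found

def without_diacritics(text):
    """Decompose accented letters and drop the non-spacing combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

print(unusual_characters("I ♥ NY.jpg"))
print(without_diacritics("Erdoğan"))
```

Flagged characters are not forbidden. Commas and parentheses, for instance, are reported by this check yet appear in examples elsewhere in this guideline, so the result is best read as a list of characters worth a second look.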
References
- ↑ as there might be external file clients, and file moving involves significant human and computing resources
- ↑ Commons:Overwriting existing files (a guideline), Template:Dont overwrite
- ↑ Commons:File renaming
- ↑ Commons:Requests for comment/File renaming criterion 2, Commons:Blocking policy, Commons:Project scope#Examples, Commons:Revision deletion
- ↑ The limit was 255 bytes until late 2011 – see Phabricator: T32202. Existing filenames may be up to 255 bytes, but new uploads are restricted to 240. A filename over 240 bytes may break horribly when a new version is uploaded, because the date is prefixed to the filename for old versions of the file, which takes 15 bytes.
- ↑ Commons:Project scope/Neutral point of view
- ↑ Commons:First steps/Uploading files#4. Set an appropriate file name
- ↑ Phabricator: T32202