Document Conversion Glossary: File Types and Terminology

Whether you're a student writing a paper or an educator sharing materials, you've probably run into strange file extensions or formatting mishaps. Documents don't always play nicely across different platforms, and a perfectly polished file on one device can look a bit chaotic on another. Understanding the most common file types and conversion terms can save time, frustration, and confusion, making sharing, reading, and preserving your work much easier.

Glossary of Document File Types

.doc: The legacy binary Microsoft Word format (Word 97-2003); still widely readable but superseded by .docx

.docx: The default Word format since Word 2007; a ZIP package of XML files (Office Open XML) that supports features such as styles, tracked changes, and embedded media

.epub: An open e-book format packaged in a ZIP archive with XML metadata, ideal for e-readers

.html/.htm: HyperText Markup Language files for Web pages; good for sharing educational content online

.mobi: The Mobipocket e-book format, used by Amazon for older Kindle devices; has an HTML-like structure and may carry digital rights management (DRM) data

.odt: OpenDocument Text, an open-standard alternative to .docx and the default text format in LibreOffice and OpenOffice

.pdf: Portable Document Format, a file format that preserves the document's layout and can be opened on virtually any device

.rtf: Rich Text Format, a cross-platform type of text document that supports basic formatting like bold, italics, and margins

.tif/.tiff: A high-quality image format often used for scanning documents

.txt: Plain text file without formatting; highly compatible across platforms
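
Several of the formats above can be recognized from their first few bytes rather than from the file extension, which is useful when an extension is missing or wrong. The following is a minimal sketch, not a complete detector: the function names sniff_format and _sniff_zip are made up for illustration, and the checks cover only the signatures listed in the comments. Because .docx, .epub, and .odt are all ZIP packages, the sketch looks inside the archive to tell them apart.

```python
import io
import zipfile


def sniff_format(data: bytes) -> str:
    """Guess a document format from its leading bytes (illustrative only)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"{\\rtf"):
        return "rtf"
    # TIFF files start with a byte-order mark: little-endian or big-endian.
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "tiff"
    # OLE2 compound file signature, the container used by legacy .doc files
    # (other legacy Office files share it, so this is only a hint).
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2 (possibly .doc)"
    # ZIP local file header: .docx, .epub, and .odt are all ZIP packages.
    if data.startswith(b"PK\x03\x04"):
        return _sniff_zip(data)
    return "unknown"


def _sniff_zip(data: bytes) -> str:
    """Look inside a ZIP package to distinguish .epub, .odt, and .docx."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
            if "mimetype" in names:
                mimetype = zf.read("mimetype").decode("ascii", "replace").strip()
                if mimetype == "application/epub+zip":
                    return "epub"
                if mimetype == "application/vnd.oasis.opendocument.text":
                    return "odt"
            if "word/document.xml" in names:
                return "docx"
    except zipfile.BadZipFile:
        return "unknown"
    return "zip"
```

To check a file on disk, read it in binary mode and pass the bytes in, for example sniff_format(open(path, "rb").read()), where path is whatever file you want to inspect.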

Key Terms in Document Conversion

Conversion: The process of changing a file from one format to another, such as converting Word to PDF

Data Conversion: A broader term for changing data from one format to another, often for interoperability or encoding purposes

Document Conversion: Transforming documents (e.g., scanning paper to PDF or converting a PDF to Word) for improved sharing or processing

File Format: The structure and encoding of data in a file, determining how it's stored and interpreted

Interoperability: The ability of different software or systems to exchange files and use them correctly, even when each works natively in a different format

Machine-Readable Glossary: A list of terms and definitions structured so a computer can easily understand and process the data. This format, using file types like .tbx or .utx, allows for the automated exchange of consistent terminology between different software systems.

Optical Character Recognition (OCR): The technology that converts images of text into machine-readable text
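
The Data Conversion entry mentions encoding, and a small example may make that concrete. The sketch below assumes a plain text file saved in the older Windows-1252 encoding that needs to become UTF-8 so accented characters display correctly everywhere. The function name reencode is hypothetical, and the source encoding has to be known in advance; the code does not try to detect it.

```python
def reencode(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Convert text bytes from one character encoding to another."""
    text = data.decode(source_encoding)   # bytes -> text, using the old encoding
    return text.encode(target_encoding)   # text -> bytes, using the new encoding


# "café" stored as Windows-1252 uses a single byte for the accented letter.
legacy = "café".encode("cp1252")
converted = reencode(legacy, "cp1252")
print(converted)  # b'caf\xc3\xa9'
```

The visible text is unchanged; only the bytes used to store it differ, which is what keeps the file readable on systems that expect UTF-8.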

Tips for Document Conversion Success

  • Check the file format before converting. For editable text, use .docx or .odt; for read-only sharing, a PDF is ideal.
  • Match the file type to the audience's tools; students may require .docx files for editing, while instructors prefer PDFs for consistency.
  • Always consider interoperability: Make sure that the chosen format is compatible across devices and platforms.
  • Use plain text (.txt) for script-based tasks, or whenever formatting is unnecessary and reliability matters most.
  • When preserving layouts, images, or designs, favor PDF or TIFF files over formats that strip formatting.
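
The tips above amount to a short decision rule, which can be written out as a sketch. The function suggest_formats and its parameters are hypothetical names chosen for illustration; the rule simply restates the tips and is not a substitute for checking what your audience's tools can open.

```python
from typing import List


def suggest_formats(
    needs_editing: bool,
    needs_formatting: bool = True,
    is_scanned_image: bool = False,
) -> List[str]:
    """Suggest file formats based on the tips in this glossary."""
    if not needs_formatting:
        # Plain text: no formatting, maximum compatibility.
        return [".txt"]
    if needs_editing:
        # Editable word-processor formats.
        return [".docx", ".odt"]
    if is_scanned_image:
        # Layout-preserving formats suited to scanned pages.
        return [".pdf", ".tiff"]
    # Read-only sharing with the layout preserved.
    return [".pdf"]
```

For example, suggest_formats(needs_editing=True) returns the editable formats, while suggest_formats(needs_editing=False) returns only PDF.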

Proper document conversion keeps content accessible, consistent, and professional. It helps ensure that what you send is what others see, no matter where or how they open it. Especially in academic, legal, and business settings, that precision isn't just a convenience: It's the difference between a file that works and one that frustrates people.
