When working in a Lab, a number of data formats are frequently used. This list is by no means complete, but provides an overview of possible formats.
Images
Text
Analysed Layout and Text Object (ALTO) is an XML format describing the recognised text and the layout of a digitised image, including the position of each word on the page. It is often used together with METS (see below).
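A minimal sketch of reading ALTO with Python's standard `xml.etree.ElementTree`, assuming the ALTO v4 namespace and a hand-written, heavily simplified sample (real files also contain a `Description` section and usually many more blocks). It shows the typical nesting `Page > PrintSpace > TextBlock > TextLine > String`, where each `String` carries the recognised word in `CONTENT` and its position in `HPOS`/`VPOS`. The function names `alto_words` and `alto_lines` are illustrative only.

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace; files from older tools may use the v2 or v3 namespace instead.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hypothetical, heavily simplified ALTO fragment (not taken from a real scan).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String CONTENT="Hello" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
            <SP/>
            <String CONTENT="world" HPOS="300" VPOS="200" WIDTH="170" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def alto_words(xml_text: str) -> list[tuple[str, float, float]]:
    """Return (word, HPOS, VPOS) for every String element, in document order."""
    root = ET.fromstring(xml_text)
    # Coordinates are parsed as floats because ALTO allows fractional values.
    return [
        (s.get("CONTENT", ""), float(s.get("HPOS", 0)), float(s.get("VPOS", 0)))
        for s in root.iterfind(".//alto:String", ALTO_NS)
    ]


def alto_lines(xml_text: str) -> list[str]:
    """Join the words of each TextLine into one line of plain text."""
    root = ET.fromstring(xml_text)
    return [
        " ".join(s.get("CONTENT", "") for s in line.iterfind("alto:String", ALTO_NS))
        for line in root.iterfind(".//alto:TextLine", ALTO_NS)
    ]


print(alto_lines(SAMPLE_ALTO))  # ['Hello world']
print(alto_words(SAMPLE_ALTO))  # [('Hello', 100.0, 200.0), ('world', 300.0, 200.0)]
```

Keeping the coordinates next to the words is what allows, for example, highlighting a search hit on the page image.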
Hypertext Optical Character Recognition (hOCR) is an HTML-based format describing recognised text and its location on an image. It is produced by open source OCR engines such as Tesseract.
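Because hOCR is HTML, it can be read with Python's standard `html.parser`. The sketch below assumes word elements are marked with the class `ocrx_word` and store their bounding box in the `title` attribute as `bbox x0 y0 x1 y1`, as in Tesseract output; the sample document and the `HocrWordParser` class are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment in the style of Tesseract output.
SAMPLE_HOCR = """<html><body>
<div class="ocr_page" title="bbox 0 0 2000 3000">
  <span class="ocr_line" title="bbox 100 200 470 240">
    <span class="ocrx_word" title="bbox 100 200 280 240; x_wconf 95">Hello</span>
    <span class="ocrx_word" title="bbox 300 200 470 240; x_wconf 91">world</span>
  </span>
</div>
</body></html>"""


class HocrWordParser(HTMLParser):
    """Collect (word, bbox) pairs from elements whose class list contains ocrx_word.

    Assumes a word element contains no nested element with the same tag name.
    """

    def __init__(self) -> None:
        super().__init__()
        self.words: list[tuple[str, tuple[int, ...]]] = []
        self._tag = None   # tag name of the word element we are inside, if any
        self._bbox = None  # its bounding box (x0, y0, x1, y1)
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "ocrx_word" not in (attr.get("class") or "").split():
            return
        self._tag, self._bbox, self._text = tag, None, []
        # The title holds ';'-separated properties, e.g. "bbox 100 200 280 240; x_wconf 95".
        for prop in (attr.get("title") or "").split(";"):
            parts = prop.split()
            if len(parts) == 5 and parts[0] == "bbox":
                self._bbox = tuple(int(v) for v in parts[1:])

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            if self._bbox is not None:
                self.words.append(("".join(self._text).strip(), self._bbox))
            self._tag = None


parser = HocrWordParser()
parser.feed(SAMPLE_HOCR)
print(parser.words)  # [('Hello', (100, 200, 280, 240)), ('world', (300, 200, 470, 240))]
```

The same page could be described in ALTO; hOCR simply keeps the information in HTML attributes, so the output can also be opened in a browser.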
Text Encoding Initiative (TEI) is an XML format used to encode text in detail. It is often used for digital editions.
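TEI markup makes the meaning of a text explicit, for example which words are names of persons or places, so that it can be extracted afterwards. A minimal sketch, assuming the TEI P5 namespace and an invented one-paragraph letter; `tei_summary` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal TEI document: a header plus one encoded paragraph.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Dear <persName ref="#jdoe">John</persName>, I arrived in
        <placeName>Amsterdam</placeName> on <date when="1850-03-01">1 March</date>.</p>
    </body>
  </text>
</TEI>"""


def tei_summary(xml_text: str) -> dict:
    """Take the title from the header and the tagged names and dates from the body."""
    root = ET.fromstring(xml_text)
    body = root.find("tei:text/tei:body", TEI_NS)
    return {
        "title": root.findtext("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
                               namespaces=TEI_NS),
        "persons": [(p.text, p.get("ref")) for p in body.iterfind(".//tei:persName", TEI_NS)],
        "places": [p.text for p in body.iterfind(".//tei:placeName", TEI_NS)],
        # The normalised date is in @when; the element text keeps the original wording.
        "dates": [d.get("when") for d in body.iterfind(".//tei:date", TEI_NS)],
    }


print(tei_summary(SAMPLE_TEI))
# {'title': 'Sample letter', 'persons': [('John', '#jdoe')], 'places': ['Amsterdam'], 'dates': ['1850-03-01']}
```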
Data
Comma Separated Values (CSV) is a plain-text format for tabular data: each line is a row, and the values in a row are separated by commas.
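A short sketch with Python's built-in `csv` module, using an invented three-column table. It shows two practical points: a value that itself contains a comma has to be quoted, and every value is read back as a string, because CSV has no data types. The helper `read_csv` is hypothetical.

```python
import csv
import io

# Hypothetical sample; the first title contains a comma, so it is quoted.
SAMPLE_CSV = 'id,title,year\n1,"Letters, volume 1",1850\n2,Diary,1862\n'


def read_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the column names in the header line."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_csv(SAMPLE_CSV)
print(rows[0]["title"])  # Letters, volume 1
print(repr(rows[1]["year"]))  # '1862' (a string, not a number)
```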
JavaScript Object Notation (JSON) is a lightweight, human-readable text format for storing and exchanging structured data as nested objects (key-value pairs) and arrays.
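A minimal round trip with Python's built-in `json` module, using an invented object record. Unlike CSV, JSON keeps nesting and basic types such as numbers, booleans and lists.

```python
import json

# Hypothetical record describing one digitised object.
record = {
    "id": "obj-001",
    "title": "Sample letter",
    "year": 1850,
    "subjects": ["correspondence", "travel"],
    "digitised": True,
}

text = json.dumps(record, indent=2, ensure_ascii=False)  # serialise to a JSON string
print(text)

restored = json.loads(text)  # parse the string back into Python objects
print(restored == record)  # True
```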
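eXtensible Markup Language (XML) is a markup language with a syntax much like HTML, but designed for describing and exchanging structured data rather than for displaying it. Many of the formats in this list, such as ALTO, TEI, METS and EAD, are based on XML.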
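One practical difference from HTML: a browser tolerates an unclosed tag such as `<br>`, while an XML parser rejects any document that is not well-formed. A small sketch with the standard library; `is_well_formed` is a hypothetical helper, and note that well-formedness is not the same as validity against a schema such as the ALTO or TEI schemas.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<record><title>Diary</title></record>"))  # True
print(is_well_formed("<p>line one<br>line two</p>"))  # False: <br> is never closed
print(is_well_formed("<p>line one<br/>line two</p>"))  # True
```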
Structural metadata
Moving Picture Experts Group 21 (MPEG-21) is a multimedia framework standard. Its Digital Item Declaration Language (DIDL) is an XML format used to describe the structure of a digital object, which is why the two are often referred to together as MPEG-21 DIDL.
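In DIDL a digital object is a tree of nested `Item` elements, and each `Component` points to an actual file through a `Resource` element with a `mimeType` and a `ref`. The sketch below walks such a tree; it assumes the DIDL namespace `urn:mpeg:mpeg21:2002:02-DIDL-NS`, an invented newspaper issue with one page, placeholder `example.org` URLs, and a hypothetical helper `walk_items`.

```python
import xml.etree.ElementTree as ET

DIDL = "{urn:mpeg:mpeg21:2002:02-DIDL-NS}"

# Hypothetical, simplified DIDL document: an issue containing one page with two files.
SAMPLE_DIDL = """<didl:DIDL xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS">
  <didl:Item id="issue-1">
    <didl:Item id="page-1">
      <didl:Component>
        <didl:Resource mimeType="image/jp2" ref="https://example.org/page1.jp2"/>
      </didl:Component>
      <didl:Component>
        <didl:Resource mimeType="text/xml" ref="https://example.org/page1.alto.xml"/>
      </didl:Component>
    </didl:Item>
  </didl:Item>
</didl:DIDL>"""


def walk_items(item, depth=0, out=None):
    """Recursively list each Item and the resources of its Components, indented by depth."""
    if out is None:
        out = []
    out.append("  " * depth + (item.get("id") or "(no id)"))
    for comp in item.findall(f"{DIDL}Component"):
        for res in comp.findall(f"{DIDL}Resource"):
            out.append("  " * (depth + 1) + f"{res.get('mimeType')}: {res.get('ref')}")
    for child in item.findall(f"{DIDL}Item"):
        walk_items(child, depth + 1, out)
    return out


root = ET.fromstring(SAMPLE_DIDL)
print("\n".join(walk_items(root.find(f"{DIDL}Item"))))
```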
Metadata Encoding and Transmission Standard (METS) is an XML format which describes the structure of a digital object and links its parts to the corresponding files. It is often used together with ALTO (see above).
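The structure and the files in METS are connected through identifiers: the file section (`fileSec`) lists each file with an `ID` and its location, and the structural map (`structMap`) refers to those IDs from its `div` elements via `fptr/@FILEID`. A minimal sketch that resolves those links for each page, assuming an invented METS file with one page image and its ALTO file at placeholder `example.org` URLs; `files_per_page` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, simplified METS file: one page with an image and its ALTO text.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="IMAGE">
      <file ID="IMG1" MIMETYPE="image/jpeg">
        <FLocat LOCTYPE="URL" xlink:href="https://example.org/p1.jpg"/>
      </file>
    </fileGrp>
    <fileGrp USE="ALTO">
      <file ID="ALTO1" MIMETYPE="text/xml">
        <FLocat LOCTYPE="URL" xlink:href="https://example.org/p1.alto.xml"/>
      </file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="PHYSICAL">
    <div TYPE="book" LABEL="Sample book">
      <div TYPE="page" ORDER="1">
        <fptr FILEID="IMG1"/>
        <fptr FILEID="ALTO1"/>
      </div>
    </div>
  </structMap>
</mets>"""


def files_per_page(xml_text: str) -> dict[str, list]:
    """Map each page ORDER in the physical structMap to the locations of its files."""
    root = ET.fromstring(xml_text)
    # Step 1: build a lookup table file ID -> location from the fileSec.
    locations = {}
    for f in root.iterfind(".//mets:file", NS):
        flocat = f.find("mets:FLocat", NS)
        if flocat is not None:
            locations[f.get("ID")] = flocat.get(XLINK_HREF)
    # Step 2: resolve each page's file pointers through that table.
    smap = root.find("mets:structMap[@TYPE='PHYSICAL']", NS)
    return {
        div.get("ORDER"): [locations.get(p.get("FILEID")) for p in div.iterfind("mets:fptr", NS)]
        for div in smap.iterfind(".//mets:div[@TYPE='page']", NS)
    }


print(files_per_page(SAMPLE_METS))
# {'1': ['https://example.org/p1.jpg', 'https://example.org/p1.alto.xml']}
```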
Bibliographic metadata
Functional Requirements for Bibliographic Records (FRBR) is a conceptual model developed by the International Federation of Library Associations and Institutions (IFLA). It describes bibliographic records from the user's perspective, focusing on the tasks of finding, identifying, selecting and obtaining resources in online library catalogues.
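The core of FRBR is a hierarchy of four entities: a Work is realised through Expressions (for example translations), which are embodied in Manifestations (for example editions), which are exemplified by Items (individual copies). The sketch below models that hierarchy with Python dataclasses; the class fields, the sample novel and the `copies_by_language` helper are invented for illustration and are not part of the FRBR model itself.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A single exemplar, e.g. one copy on a library shelf."""
    shelfmark: str


@dataclass
class Manifestation:
    """A physical embodiment of an expression, e.g. a specific edition."""
    publisher: str
    year: int
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realisation of a work, e.g. the text in a particular language."""
    language: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """A distinct intellectual or artistic creation."""
    title: str
    expressions: list[Expression] = field(default_factory=list)


def copies_by_language(work: Work) -> dict[str, int]:
    """Count the items held for each expression (language) of a work."""
    return {e.language: sum(len(m.items) for m in e.manifestations) for e in work.expressions}


# Hypothetical catalogue data: one work, two language versions, three copies in total.
novel = Work("Example Novel", [
    Expression("nl", [Manifestation("Publisher A", 1860, [Item("A-1"), Item("A-2")])]),
    Expression("en", [Manifestation("Publisher B", 1868, [Item("B-1")])]),
])

print(copies_by_language(novel))  # {'nl': 2, 'en': 1}
```

Grouping records under a single Work is what lets a catalogue show a user every translation and edition of a title in one place.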
BIBFRAME (Bibliographic Framework) was initiated by the Library of Congress to replace the MARC standards with a model based on linked data principles.
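In linked data, a description is a set of subject-predicate-object statements (triples) whose resources are identified by URIs, instead of one self-contained record. The sketch below writes a tiny BIBFRAME description in the N-Triples syntax using plain Python, assuming the BIBFRAME 2.0 namespace and the classes `bf:Work`, `bf:Instance`, `bf:Item` and `bf:Title` with the properties `bf:title`, `bf:mainTitle`, `bf:instanceOf` and `bf:itemOf`; the `example.org` URIs and the helper functions are hypothetical. In practice an RDF library such as rdflib would normally be used.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "https://example.org/"  # hypothetical base URI for the sample resources


def iri(value: str) -> str:
    """Write a URI in N-Triples form."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


work, instance, item = iri(EX + "work/1"), iri(EX + "instance/1"), iri(EX + "item/1")

triples = [
    (work, iri(RDF_TYPE), iri(BF + "Work")),
    (work, iri(BF + "title"), "_:t1"),  # the title is a separate (blank) bf:Title node
    ("_:t1", iri(RDF_TYPE), iri(BF + "Title")),
    ("_:t1", iri(BF + "mainTitle"), literal("Example Novel")),
    (instance, iri(RDF_TYPE), iri(BF + "Instance")),
    (instance, iri(BF + "instanceOf"), work),  # the published form points to its Work
    (item, iri(RDF_TYPE), iri(BF + "Item")),
    (item, iri(BF + "itemOf"), instance),  # a physical copy points to its Instance
]


def to_ntriples(rows) -> str:
    """Serialise (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```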
Resource Description and Access (RDA) is a package of data elements, guidelines, and instructions for creating library and cultural heritage resource metadata that are well-formed according to international models for user-focused linked data applications.
Bibliographic Ontology (BIBO) provides the main concepts and properties for describing citations and bibliographic references (e.g. quotes, books, articles) on the Semantic Web.
Museum metadata
Lightweight Information Describing Objects (LIDO) is an XML harvesting schema which supports a full range of descriptive information about museum objects.
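Because LIDO is a harvesting format, a single `lidoWrap` file can bundle many `lido` records, which an aggregator then flattens into its own structure. A minimal sketch, assuming the LIDO namespace `http://www.lido-schema.org`, an invented painting record, and a hypothetical `harvest` helper that only picks out the record ID, object type and title.

```python
import xml.etree.ElementTree as ET

NS = {"lido": "http://www.lido-schema.org"}

# Hypothetical, simplified LIDO file with a single record.
SAMPLE_LIDO = """<lido:lidoWrap xmlns:lido="http://www.lido-schema.org">
  <lido:lido>
    <lido:lidoRecID lido:type="local">obj-001</lido:lidoRecID>
    <lido:descriptiveMetadata xml:lang="en">
      <lido:objectClassificationWrap>
        <lido:objectWorkTypeWrap>
          <lido:objectWorkType><lido:term>painting</lido:term></lido:objectWorkType>
        </lido:objectWorkTypeWrap>
      </lido:objectClassificationWrap>
      <lido:objectIdentificationWrap>
        <lido:titleWrap>
          <lido:titleSet>
            <lido:appellationValue>Sample still life</lido:appellationValue>
          </lido:titleSet>
        </lido:titleWrap>
      </lido:objectIdentificationWrap>
    </lido:descriptiveMetadata>
  </lido:lido>
</lido:lidoWrap>"""


def harvest(xml_text: str) -> list[dict]:
    """Flatten each lido:lido record into a dict with its ID, object type and title."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": rec.findtext("lido:lidoRecID", namespaces=NS),
            "type": rec.findtext(".//lido:objectWorkType/lido:term", namespaces=NS),
            "title": rec.findtext(".//lido:titleSet/lido:appellationValue", namespaces=NS),
        }
        for rec in root.iterfind("lido:lido", NS)
    ]


print(harvest(SAMPLE_LIDO))
# [{'id': 'obj-001', 'type': 'painting', 'title': 'Sample still life'}]
```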
Archival metadata
Encoding Archival Description (EAD) is an XML standard for encoding archival finding aids.
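A finding aid in EAD is hierarchical: the `archdesc` describes the archive as a whole, and its `dsc` contains nested components (`c01`, `c02`, and so on, or unnumbered `c`) for series, files and items. The sketch prints that hierarchy as an indented outline, assuming the EAD 2002 namespace `urn:isbn:1-931666-22-9`, an invented family archive, and a hypothetical `outline` helper.

```python
import xml.etree.ElementTree as ET

NS = {"ead": "urn:isbn:1-931666-22-9"}  # EAD 2002 namespace
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}  # <c> or <c01> ... <c12>

# Hypothetical, simplified finding aid.
SAMPLE_EAD = """<ead xmlns="urn:isbn:1-931666-22-9">
  <eadheader>
    <eadid>example-001</eadid>
    <filedesc><titlestmt><titleproper>Inventory of the example family archive</titleproper></titlestmt></filedesc>
  </eadheader>
  <archdesc level="fonds">
    <did><unittitle>Example family archive</unittitle><unitdate>1850-1900</unitdate></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file"><did><unittitle>Letters from Amsterdam</unittitle></did></c02>
        <c02 level="file"><did><unittitle>Letters from Paris</unittitle></did></c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Accounts</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def outline(elem, depth=0, out=None):
    """Recursively list components as '[level] unittitle', indented by nesting depth."""
    if out is None:
        out = []
    for child in elem:
        if child.tag.split("}")[-1] in COMPONENT_TAGS:
            title = child.findtext("ead:did/ead:unittitle", default="", namespaces=NS)
            out.append("  " * depth + f"[{child.get('level')}] {title}")
            outline(child, depth + 1, out)
    return out


root = ET.fromstring(SAMPLE_EAD)
print(root.findtext("ead:archdesc/ead:did/ead:unittitle", namespaces=NS))
print("\n".join(outline(root.find("ead:archdesc/ead:dsc", NS))))
```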
Cultural heritage metadata
Europeana Data Model (EDM) is the formal specification of the classes and properties that can be used in Europeana, the EU digital platform for cultural heritage.
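EDM separates the cultural heritage object itself (`edm:ProvidedCHO`) from its digital representations (`edm:WebResource`) and from the provider's aggregation that ties them together (`ore:Aggregation`). The sketch below expresses such a description as JSON-LD with plain Python, assuming the EDM, ORE and Dublin Core namespaces shown; the `example.org` URIs, the institution names and the `linked_cho` helper are invented for illustration, and the set of properties is not a complete list of what Europeana requires.

```python
import json

# Namespace prefixes for the JSON-LD context.
context = {
    "edm": "http://www.europeana.eu/schemas/edm/",
    "ore": "http://www.openarchives.org/ore/terms/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

cho = "https://example.org/object/1"             # hypothetical object URI
web_resource = "https://example.org/images/1.jpg"  # hypothetical image URL

graph = [
    # The object itself, described independently of any digital copy.
    {"@id": cho, "@type": "edm:ProvidedCHO", "dc:title": "Sample still life", "edm:type": "IMAGE"},
    # A digital representation of the object.
    {"@id": web_resource, "@type": "edm:WebResource"},
    # The aggregation links the object, its representation and the provider information.
    {"@id": "https://example.org/aggregation/1", "@type": "ore:Aggregation",
     "edm:aggregatedCHO": {"@id": cho},
     "edm:isShownBy": {"@id": web_resource},
     "edm:dataProvider": "Example Museum",
     "edm:provider": "Example Aggregator",
     "edm:rights": {"@id": "http://creativecommons.org/publicdomain/mark/1.0/"}},
]

doc = {"@context": context, "@graph": graph}
print(json.dumps(doc, indent=2))


def linked_cho(nodes: list[dict]) -> dict:
    """Follow the aggregation's edm:aggregatedCHO link to the described object."""
    by_id = {node["@id"]: node for node in nodes}
    agg = next(n for n in nodes if n["@type"] == "ore:Aggregation")
    return by_id[agg["edm:aggregatedCHO"]["@id"]]


print(linked_cho(graph)["dc:title"])  # Sample still life
```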
CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation.