3.1 Summary of Digital Outputs and Digital Technologies

There will be four digital outputs for the project. Two are Open Source contributions to the academic community as public infrastructure. The third is a series of digital prototype productions of a transmedia publication, demonstrating the research contribution to the field of Open Access and digital publishing. The fourth is a prototype workflow for the generation of custom OER (in cooperation with the OER team of the Afro-European Mokolo initiative).
3.1.2 Enhanced Transmedia Citation and Reference Management

If text is understood in computational terms as a linear string, analogous to a point on the timeline of a video or the trace of a game thread, then such points can be identified, cited, referenced and used in the context of a publication. The design of such a management system is complex: the fixed properties of print media are no longer the only parameters, and the publication becomes more of a recipe for combining a revision point across a variety of distributed media. These features would be explored in a usable beta prototype software implementation as part of the rapid prototyping research process.
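The kind of locator this implies can be sketched as a small data structure. The following Ruby snippet is purely illustrative: the `CitationPoint` name, its fields, and the reference format are assumptions for the sketch, not project code.

```ruby
# Hypothetical sketch: a transmedia "citation point" that addresses a
# location uniformly across media types, as described above (a character
# offset in a linear text string, a timestamp on a video timeline),
# pinned to a specific revision of the source.
CitationPoint = Struct.new(:source_id, :revision, :medium, :locator) do
  # Render a human-readable reference string for this point.
  def to_reference
    case medium
    when :text  then "#{source_id}@#{revision}, chars #{locator[:start]}-#{locator[:end]}"
    when :video then "#{source_id}@#{revision}, t=#{locator[:seconds]}s"
    else             "#{source_id}@#{revision}"
    end
  end
end

text_cite  = CitationPoint.new("essay-042", "r7", :text,  { start: 120, end: 188 })
video_cite = CitationPoint.new("lecture-3", "r2", :video, { seconds: 754 })

puts text_cite.to_reference   # => essay-042@r7, chars 120-188
puts video_cite.to_reference  # => lecture-3@r2, t=754s
```

The revision field is what makes the publication a "recipe": the same point can be re-resolved against a later revision of the distributed media.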
3.1.3 Platform Independent Publication (PIP) Type

This is a software implementation to test the Platform Independent Publication type, using existing open document standards. Through the project's research, co-creation and design research methods, these standards would be implemented to facilitate their use by academic practitioners.
3.1.4 Transmedia Publications: Hybrid Publishing Consortium Prototypes

A series of hybrid publications would be released as digital publications, making use of the existing HPC material and context. Some of the digital assets would manifest as hybrid digital print objects via print-on-demand digital printing. The publishing workflow comprises the following stages:
- Document Validation—writing and authoring
- Document Editing—text, citations, metadata and images
- Media Editing
- Layout Design—typesetting and templates; semi-automatic and automatic
- Publication Collections—libraries, bookshops, academic and OER repositories
- Transmedia Publishing API
In this research project we focus on three of these stages: validation, asset management, and a transmedia API.
Interactive Validator

A validator system ensures that the publication workflow structures documents and publication files according to standards and markup requirements that suit single source publishing workflows. As a result of a successful validation process, documents are machine readable and can be used in a variety of semi-automated publishing workflow processes. The validator has to cover different types of structuring that can be customized, including:
- typesetting (headers, bold, superscript, citation styles, line spacing after paragraphs, etc.);
- semantic document or publication structuring (picture credits, inline quotes, footnotes, pagination, chapters, sections, etc.);
- media inclusions (images, audio, video, equations, game sequences, etc.);
- document and publication file types and publication-ready output (PRO)-specific settings (image color profiles, print settings, job description files (JDFs), image resolutions, media queries, custom user output format design or content settings);
- multi-format document settings, i.e. instructions for converting a single source document into a specific format that involves information loss, such as a plain text file;
- a limited set of descriptive metadata (revision history, author, title, date, etc.).
The validator is also required to be interactive, with a GUI, and to work in an API environment. Interactive feedback is needed because a purely automated validation and structuring process would not be able to make the correct decisions, and files would end up incorrectly formatted. With an interactive system, the user can intervene and make decisions when prompted to do so. Cases where user feedback is needed divide into two categories. First, workflow processing errors: for example, invisible hyperlinks sitting in a document, which can be flagged for removal in the way that a word processor's 'Navigator' function gives a user an overview of a document. Second, common user errors, such as not correctly marking up unordered lists, or using empty paragraph breaks rather than paragraph styles to create space between paragraphs.
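As a minimal sketch of this interactive flow, the following Ruby function flags the two common user errors named above and returns them for a GUI or API consumer to present to the user. The function name, flag structure and rules are invented for illustration; they are not the project's actual rule set.

```ruby
# Sketch of the interactive-validation idea: scan paragraphs and flag
# issues that need a human decision, rather than silently "fixing" them.
def validate_paragraphs(paragraphs)
  flags = []
  paragraphs.each_with_index do |para, i|
    # Common user error: an empty paragraph used for vertical spacing.
    flags << { line: i + 1, issue: :empty_paragraph } if para.strip.empty?
    # Common user error: a hand-typed "- " marker instead of list markup.
    flags << { line: i + 1, issue: :unmarked_list } if para.start_with?("- ")
  end
  flags # a GUI or API client would present these for the user to resolve
end

doc = ["A normal paragraph.", "", "- item typed by hand", "Another paragraph."]
p validate_paragraphs(doc)  # => flags for lines 2 and 3
```

The point of returning flags instead of rewriting the document is exactly the interactivity requirement above: the user decides whether an empty paragraph is an error or intentional.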
Examples of Interactive Structuring Systems

The type of interactive validator we will be developing is for inline interaction within a web-based word-processing authoring environment; an example is the commercial structured-writing product 'EasyDITA': http://easydita.com/explore/author/. Other types of interactive validation systems produce a report for the user to respond to by editing their document and resubmitting for further validation. Examples are the International Digital Publishing Forum (IDPF) EPUB Validator (http://validator.idpf.org/) and the W3C HTML Markup Validation Service (http://validator.w3.org/).
Important features of such systems include differentiating between a document and a publication, which have different structural and metadata requirements. The validation markup and rule set options are: 1. an updatable and customizable markup set; 2. heuristic rule sets; 3. centralized or shared markup profiles. Users interact with the validator via: 1. a GUI; 2. an API; 3. the command line.
Requirements, Specification and Standards for the Validator

A sample of categories for 'validation' and 'single source' file recipes includes:
- document input types (DOCX, HTML5, ODT, GDoc (subset of ODT));
- publication output types, or what we have termed publication-ready outputs (PROs), made up of a combination of file type specifications, metadata requirements for the distribution channel, and 'style guides' for editors and designers creating specific publication components for multi-format publishing. Such components include tables of contents, front cover texts, and back cover texts. A PRO profile is needed per output format, as one format will not automatically translate to another, e.g. a print book to an EPUB;
- the definition of a single source file, which acts as a container for multiple sources of data, for example different image sizes from an external source for responsive web design, or a second external source of citation, bibliographic and other metadata for publishing purposes;
- information for multi-format outputting, design and templating, with per-output-format user tweaks;
- types of media assets that could be included in a document or publication;
- types of structural and semantic information: document layout (e.g., H1, bold, etc.);
- document structure (e.g., pagination, chapters, etc.);
- and metadata fields and standards for a document.
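One way to picture a PRO profile and a single source 'recipe' from the categories above is as plain data. The following sketch is hedged: every field name below is an assumption for illustration, not the project's actual specification.

```ruby
require 'json'

# Illustrative PRO profiles: one per output format, combining file-type
# settings with the components that format needs (invented field names).
pro_profiles = {
  "epub3"   => { "image_max_px" => 1600, "color_profile" => "sRGB",
                 "components" => ["toc", "cover"] },
  "pod_pdf" => { "image_min_dpi" => 300, "color_profile" => "CMYK",
                 "components" => ["toc", "front_cover_text", "back_cover_text"] }
}

# Illustrative single source file: a container pointing at external data
# sources, plus the list of PRO profiles to produce from it.
single_source = {
  "document"  => "typemotion.odt",
  "citations" => "typemotion.bib",   # external bibliographic source
  "outputs"   => pro_profiles.keys   # one PRO profile per output format
}

puts JSON.pretty_generate(single_source)
```

Serializing the recipe as JSON is just one plausible choice here; the point is that the single source file references its data sources rather than embedding them.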
Validator Schematic
An important aspect of the work already conducted by the partners on automated transformation of digital collections is the ability to create and tune metadata exports to multiple XML standards representations, including both MODS and VRA Core 4. In addition, a number of workflows already exist to parse asset structures and automate the creation of MongoDB data structures. Together, these sub-components will provide the project with a mature environment for distributed teams of contributors, allowing them to build fully searchable digital collections of publications' components and create Library of Congress-compliant metadata. The resulting structure can remain as BSON in MongoDB or be exported into other ecosystems such as Tamboti (which uses the native-XML eXist-db). Additionally, MongoDB's multiple application programming language bindings enable it to be interfaced rapidly with other components such as Zotero, and allow structured APIs to be implemented for additional project-specific elements of the technology stack.
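A rough illustration of that export idea, with a plain Ruby hash standing in for a MongoDB/BSON asset record and a deliberately minimal MODS rendering. The record schema is invented, and the output covers only three MODS elements; a real export would be tuned per collection.

```ruby
# Sketch: render a BSON-like asset record (a plain hash here) as a minimal
# MODS 3 record. Only title, name and date are shown; real exports would
# cover many more elements and handle XML escaping.
def to_mods(record)
  <<~XML
    <mods xmlns="http://www.loc.gov/mods/v3">
      <titleInfo><title>#{record[:title]}</title></titleInfo>
      <name><namePart>#{record[:author]}</namePart></name>
      <originInfo><dateIssued>#{record[:date]}</dateIssued></originInfo>
    </mods>
  XML
end

asset = { title: "Typemotion", author: "HPC", date: "2014" }
puts to_mods(asset)
```

Because the source record stays schema-light (BSON), the same record can feed a second renderer targeting VRA Core 4 without restructuring the collection.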
These tools will be used to build sub-components of the technology stack with the following functionality:
- contributor GUIs controlling decomposition of source documents into individual elements and their integration into a managed collection which can be searched effectively;
- creation of element metadata (using MODS, VRA Core, etc where possible) and editing elements in line, together with version control;
- delivery of versioned publication elements to processing modules;
- output template management, editing and version control;
- interfaces with citation management systems, e.g. Zotero;
- interfaces with other content distribution networks.
- Validation rule set. An external document editing system will be able to have our rule set applied to its documents to raise appropriate flags when a questionable content section is encountered (and to have these flags passed to other systems, and, if appropriate, allow changes and edits to be made to the remote document). The result would be that remote documents could be structured for multi-format conversion.
- Assets and asset structuring meta description framework. The effect here is that remote systems could store their content on our system, then make use of that content in different ways to, for example, extract all citations or images and captions from an archive and then extract and use these components on a granular level.
- Templates for authoring and using templates. Templates could be added to the system for use in making multi-format publications using the transformer engine.
- Transformer. A multi-format conversion engine. After a set of content has been approved by the validator, it could then be sent to the transformer with instructions to use a specific template and be outputted to a number of formats (EPUB, responsive HTML, HTML Book-in-browser, PDF, PDF for print-on-demand [PoD] etc).
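The validate-then-transform hand-off might look like the following sketch. The format list, template naming and output naming are placeholders, not A-machine's real interface.

```ruby
# Sketch of dispatching validated content to a multi-format transformer.
# Supported formats and the output-naming scheme are illustrative.
FORMATS = ["epub", "html", "pdf", "pod_pdf"]

def transform(content, template:, formats:)
  unknown = formats - FORMATS
  raise ArgumentError, "unsupported: #{unknown.join(', ')}" unless unknown.empty?
  # Stand-in for real conversion: return the names of the output artefacts.
  formats.map { |f| "#{content[:id]}.#{template}.#{f}" }
end

outputs = transform({ id: "typemotion" }, template: "hpc-default",
                    formats: ["epub", "pdf"])
p outputs  # => ["typemotion.hpc-default.epub", "typemotion.hpc-default.pdf"]
```

Rejecting unknown formats up front mirrors the validator's role in the pipeline: content only reaches the transformer once its instructions are known to be processable.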
The API is used in:
- Validation (other systems can have their content structured via use of our API);
- Assets (currently for testing purposes, to evaluate how multi-format publications can be stored as a single source master file, which acts as a recipe for the various publication format outputs and can accept different sources to be combined into these output documents, e.g., Pandora, Tamboti, Open Journal Systems (OJS), etc.);
- Templates (this requires connection to content distribution networks (CDNs), so that designers can author templates in software and graphic-design libraries they are familiar with, such as Bootstrap. The research challenge here for the project's design team is, in effect, to create modified Bootstrap models for apps, mobile, EPUB, etc. Examples would be the openly licensed and open source frameworks Baker Framework, PugPig (http://pugpig.com/) and famo.us. The CDN model needs to work for template use too, so that users in another system can check out a template we are creating);
- Transformer multi-format conversion (send files for multi-format conversion to our engine A-machine and receive back the output files).
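As a sketch of how a remote system might address these four API areas, the following maps each area to an endpoint. The base URL and paths are entirely invented for illustration; the project's API is not yet specified.

```ruby
# Hypothetical routing for the four API areas listed above:
# validation, assets, templates and transformation.
API_BASE = "https://api.example.org/v1"

def endpoint(area, id)
  case area
  when :validate  then "#{API_BASE}/validate/#{id}"   # structure remote content
  when :assets    then "#{API_BASE}/assets/#{id}"     # store/retrieve components
  when :templates then "#{API_BASE}/templates/#{id}"  # check templates in/out
  when :transform then "#{API_BASE}/transform/#{id}"  # multi-format conversion
  else raise ArgumentError, "unknown API area: #{area}"
  end
end

p endpoint(:validate, "doc-17")  # => "https://api.example.org/v1/validate/doc-17"
```

A remote system would typically call the areas in that order: validate a document, register its assets, pick a template, then request transformation.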
3.5 Technical Methodology

At the project's core, we would use co-creation and design research methods to establish the software requirements needed to address the two central research concerns: transmedia referencing and citation, and document portability. Our research approach includes the following elements:
- Hybrid Publishing Consortium—for the project, the context of the existing HPC prototypes provides an experimental framework to keep our research horizon open to extra-academic innovation practices.
- Rapid Prototyping—a workable transmedia publishing framework will be established early on to experiment with the incorporation of a wide array of content types in publication prototypes.
- Discourse Analysis—focusing on how technology is shaped by the social, methods such as the T-PINC discourse analysis approach (Technology, Power, Ideology, Normativity and Communication; Koubek, University of Bayreuth) will be employed to better understand the forces acting on technology development.
- Software Requirements Process—conventional software methods such as Sommerville’s software requirements process will also be employed, with three repeated phases—requirements gathering, specification and validation.
- Open Source—design methodologies of open code review will be used. All research publishing will be carried out as Open Access, under Creative Commons Attribution ShareAlike 3.0 license.
3.5.1 Comparative Software Tools Assessment

A comparative assessment of available approaches and open source tools will be conducted, including:
- Data Futures—NoSQL repository migration and long term preservation system. Flexible and custom workflows. http://www.data-futures.org/
- A-machine—publication multi-format digital conversion, automatic design templating and distribution. Hybrid Publishing Consortium. http://consortium.io/ and http://a-machine.net/
- Scalar—multi-media authoring and publishing. http://scalar.usc.edu/
- Public Library—book scanning and digital librarianship of contemporary reading and books, with connections to the Open Library and Archive.org projects. https://openlibrary.org/ and https://www.memoryoftheworld.org/
- Amara, Participatory Culture Foundation—Crowdsourcing annotation. http://www.amara.org/
- Textus, OKF—Document annotation. http://textusproject.org/
- Zotero—citation management, Roy Rosenzweig Center for History and New Media. http://zotero.org/
3.5.2 Standards and Formats

For documents, OASIS, W3C and IDPF open formats are used: primarily HTML5 and EPUB3, with attention to the emerging EDUPUB W3C and IDPF initiatives announced in a January 2014 W3C call with contributions from the publishers Pearson and O'Reilly (http://www.idpf.org/epub/profiles/edu/). Citation and referencing would make use of a combination of Dublin Core (DC), the Open Archives Initiative (OAI), and the open Citation Style Language (CSL). Additionally, we would use a set of collection and bibliographic management meta-description frameworks: Visual Resources Association (VRA) Core 4.0 and the Metadata Object Description Schema (MODS) from the US Library of Congress, and the Text Encoding Initiative (TEI) for text.
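To make the citation side concrete, here is a single bibliographic item in CSL-JSON (the data model used by Zotero and CSL processors), alongside the Dublin Core fields it would map to. The values are invented for illustration; only the CSL-JSON field names (`id`, `type`, `title`, `issued`/`date-parts`) follow the real schema.

```ruby
require 'json'

# A minimal CSL-JSON item, as consumed by Zotero and CSL citation processors.
csl_item = {
  "id"     => "typemotion2014",
  "type"   => "book",
  "title"  => "Typemotion",
  "issued" => { "date-parts" => [[2014]] }
}

# The corresponding Dublin Core fields for repository metadata (illustrative).
dc_fields = { "dc:title" => csl_item["title"], "dc:date" => "2014" }

puts JSON.generate(csl_item)
puts JSON.generate(dc_fields)
```

Keeping the CSL-JSON item as the master and deriving the DC fields from it is one plausible way to combine the two standards without duplicating data entry.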
3.5.3 Hardware and Software

Software used in the project will be MongoDB, Ruby, Transpect from le-tex, Scalar, the A-machine framework, Linux, NodeJS and GitHub. The book scanner comes from the Public Library project: http://www.ahrc.ac.uk/News-and-Events/Events/Pages/The-Public-Library-Project.aspx
3.5.4 Data Acquisition, Processing, Analysis and Use

Core data for the project would come from the 'Typemotion' publication and exhibition. Other data would come from partners as well as from stakeholders. Data analysis would be carried out in consultation with industry stakeholders such as the Westminster Data Futures project, HPC, Fiduswriter and le-tex.
3.6 Technical Support and Relevant Experience

Core technical leadership comes from the Westminster Data Futures project, HPC, Sourcefabric, Fiduswriter and le-tex.
3.7 Preservation, Sustainability and Use
3.7.1 Preserving Data

Data Futures specializes in long-term data preservation. Data and code would be stored on Data Futures' distributed and secure long-term digital preservation network.
3.7.2 Ensuring Continued Access and Use of Digital Assets

Published material and software code would be released under open licenses, with published materials deposited in Green Open Access repositories covered by CLOCKSS (Controlled LOCKSS) support. Code would be published on the open repository GitHub and registered with UK and EU Open Source research repositories, for example: http://sparceurope.org/resources-repositories/.