
Integration of Markup and Automated Image Retrieval for Cataloging Comics Signs in Digital Collections

** Editor’s Note: Due to editorial formatting constraints, the below coding samples use double quotation marks. XML markup uses the straight quote (Unicode: U+0022).

By Jacob Murel 


Since the start of the twenty-first century, numerous archives and digital collections devoted to comics and their surrounding fan culture have arisen, ranging from consumer-oriented collection services (e.g., ComiXology) to academic archives developed and maintained by research groups (e.g., Comic Book Readership Archive). Over the same period, scholars of an array of disciplines have advanced various methods of cataloging and retrieving comics across such digital collections. These methods can be divided into two categories: text-based and image-based retrieval. The former is utilized in most digital archives or collections, such as the Digital Comic Museum and ComiXology, and generally involves assigning each comic bibliographic metadata—e.g., author(s), publication date, genre—by which the comics document is indexed and queried. Markup encoding, which may be understood as a subcategory of text-based indexing, is the implementation of XML markup languages for digitally representing the textual and bibliographical features of comics documents. John Walsh’s Comic Book Markup Language (CBML) takes this approach. CBML differs from keyword indexing in that the former records the textual or bibliographic content of a comics document, in addition to its surrounding metadata. That is, CBML can record which characters appear in each panel of each page, how many panels appear on each page, what kinds of transitions occur between panels, etc. Meanwhile, keyword indexing traditionally records document-wide metadata, which can specify author, genre, themes, subject matter, etc. Unlike text-based analysis, image-based analysis utilizes computer vision to analyze the image itself. This category of analysis includes operations like content-based image retrieval (CBIR), an automated process for searching and retrieving images across databases, based on pictorial properties like colors or shapes.
Image search engines like TinEye and Google’s Image Search are two popular examples of such automated image retrieval systems; both utilize computer vision to query images according to internal visual properties rather than surrounding, text-based metadata. Unfortunately, CBIR has seen little if any application to the analysis of comics outside of computer science. Despite the respective advantages of text-based and image-based retrieval methods, scholarship on the digital analysis and cataloguing of comics documents has not explored the potential for combining these two methods when assembling and organizing digital collections of comics.
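To make the idea of retrieval by pictorial properties concrete, the following is a minimal sketch of color-based CBIR, not a description of how TinEye, Google, or any system cited here actually works. It assumes each image has already been reduced to a list of (R, G, B) pixel tuples; the function names and the choice of histogram intersection as the similarity measure are illustrative assumptions.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Quantize each RGB channel into `levels` bins and return a
    normalized histogram mapping (r_bin, g_bin, b_bin) -> proportion."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}  # an empty image has no color distribution
    return {color_bin: n / total for color_bin, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity between 0 and 1; higher means more shared color mass."""
    return sum(min(share, h2.get(color_bin, 0.0)) for color_bin, share in h1.items())


def rank_by_color(query_pixels, collection):
    """Return the names in `collection` (name -> pixel list), ordered
    from most to least similar to the query image by color alone."""
    query_hist = color_histogram(query_pixels)
    scores = {
        name: histogram_intersection(query_hist, color_histogram(pixels))
        for name, pixels in collection.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: one mostly red panel, one entirely blue panel.
collection = {
    "blue_panel": [(10, 10, 250)] * 10,
    "red_panel": [(250, 10, 10)] * 8 + [(10, 10, 250)] * 2,
}
query = [(255, 0, 0)] * 10  # a solid red query image
ranking = rank_by_color(query, collection)
```

Note that the ranking depends only on pixel values: nothing in the sketch knows what the red or blue regions depict or mean, which is the limitation taken up later in this article.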

There are, perhaps, many reasons for the lack of formal integration of text-based and image-based methods of digital comics analysis. One reason is that the cataloging of comics in digital collections—indeed, digital comics analysis as a whole—is not a distinct field of research in the way that, for example, comics narratology or fan culture studies are. While scholars have written on the topic of indexing comics in digital and non-digital collections (e.g., Aggleton; Weiner), Melissa Terras’ remarks on digital image processing scholarship equally apply to the digital analysis of comics: it is “not a coherent, standalone research area with an easy route to adoption,” but rather a subject in which “we are only starting to develop a theoretical framework to understand what it means to depend on processed, digital surrogates of primary sources for research evidence” (Terras 80). What scholarship does exist on the subject seems divided by disciplinary boundaries. Text-based indexing, whether focused on document metadata or markup language, often arises from research in the humanities and library science (e.g., Walsh) while CBIR development transpires exclusively within computer science (e.g., Augereau et al.; Rigaud and Burie). In response to this gap, this article explores the respective benefits and limitations of markup language and CBIR, and how these two methods may be integrated to index and query comics signs across large digital collections.

This article argues that because markup language offers a form of critical, close reading, and CBIR provides an ability to track visual properties apart from meaning accurately and on a large scale, these modes of cataloging prove ideal supplements to one another in indexing, querying, and analyzing comics across digital collections. To this end, the present article takes up comics signs as a topical medium for combining markup and automated image retrieval. The first section provides an overview of scholarship on comics signs, focusing on the work of three notable comics scholars with diverging disciplinary backgrounds. The purpose of this overview is to demonstrate the ambiguous nature of comics signs and how they possess linguistic and pictorial properties that cannot be adequately captured by either markup or automated image analysis alone. The subsequent section discusses markup’s role as a form of critical, close reading and how one markup language specifically designed for encoding comics documents in digital collections—John Walsh’s Comic Book Markup Language—may be expanded so as to digitally record and represent comics signs in comics. The penultimate section examines how CBIR may compensate for the shortcomings inherent in utilizing a text-based method of analysis like markup to digitally index and analyze comics signs. Moreover, this section shows how markup may be used to attenuate CBIR’s own limitations. The ultimate goal of this article is not to provide a technically prescriptive method of such integration, but rather to ignite ongoing conversation regarding the roles of markup language and CBIR in analyzing comics.

Critical Approaches to Comics Signs

Throughout this article, the term comics signs refers to the array of partly motivated, partly conventionalized signs used to represent intangible phenomena, such as emotions or motion, within comics. Examples of comics signs include “heart eyes,” indicating love; floating stars above characters’ heads, representing dizziness; wavy lines above garbage, indicating smell; and so forth. Such signs have received many names throughout comics criticism. Scott McCloud refers to them both as comics’ “visual vocabulary” (Understanding 131) and later as “cartoon symbols” (Making 125). Prior to McCloud, the cartoonist Mort Walker produced a satirical dictionary of these cartoon symbols, referring to them as “indicia,” while in more recent works, media theorist Charles Forceville attempts an earnest catalog of these signs, dubbing them “pictorial runes” (Forceville, “Pictorial” 875), and Bart Eerden further divides these signs into two categories: “pictorial runes” and “indexical signs” (Eerden 245). Cognitive scientist Neil Cohn classifies these symbols as the “closed-class items of [comics’] visual language lexicon,” settling on “morphemes” and “signs” as short-hand references to these items (Language 34). Meanwhile, borrowing terminology from Charles Peirce’s semiotics, comics critic Hannah Miodrag refers to these same signs as “metaphor icons” (Miodrag 177). Though all of these theorists draw from semiotics in theorizing comics’ sign system, the diversity of terms for referencing comics signs reflects each thinker’s distinct critical stance regarding these signs’ position on the text-image spectrum, or in the words of Paul Fisher Davies, on “a continuum of abstraction along which a given line or pattern may be placed by a critic, reader, creator or observer” (Davies 69).
In overviewing three critical approaches to comics signs (those by Cohn, Miodrag, and Forceville), the present section demonstrates how comics signs occupy a contested and liminal space between text and image in comics scholarship, and so provide a unique opportunity to examine how text-based and image-based retrieval methods may be integrated for querying comics signs across digital collections, based on both the signs’ linguistic and visual properties.

As reflected in his heavily linguistic name for comics signs (“visual lexicon”), Cohn, more than the other scholars mentioned above, approaches these signs as linguistic units cognitively processed by readers in a manner similar to verbal and written language—indeed, he writes, “Visual languages use the same strategies of morphological combination as verbal languages” (Language 34). Cohn’s linguistic bent is most evident in his classification of signs according to morpheme categories used in linguistics research. For instance, he writes that, in verbal language, “morphemes can appear in front (prefix), at the end (suffix), inside (infix), or surrounding (circumfix) another,” and that they “can also substitute either a part or whole of another word (umlaut/suppletion)” (34). Cohn uses the same categories in describing the function of signs. He writes, “Just as ‘prefixes’ appear prior to a root morpheme, another class of bound morphemes in visual languages appears above the head of characters as ‘upfixes,’ most often to depict emotional or cognitive states” (42). Hearts and stars are examples of these, respectively indicating love and dizziness. Cohn demonstrates how, as with a prefix moved to the end of a word, and so made a suffix, the meaning of a visual morpheme can change depending on where it appears in relation to another sign. He writes, “While hearts retain the meaning of love or lust no matter where they are placed (due to their fixed symbolic meaning), stars mean different things based on whether they are in the eyes (desirous of fame) or above the head (feeling pain)” (45). He compares this alteration in meaning to “the difference between the words recital and recitation—both use the same stem (recite) but have different ways of turning it into a noun that yield slight variations in meaning” (49). 
In this method of analyzing and categorizing comics signs—and even simply his referring to them as “morphemes”—Cohn presents signs as essentially linguistic units, seemingly analogous to units of written or verbal language itself.

Cohn does warn against carrying his linguistic analogy too far, however. At the beginning of his analysis, Cohn writes, “Despite these similarities [between verbal and visual morphemes], it is again important to emphasize that this analogy does not draw equivalence between these forms (i.e., affixes in verbal language should not necessarily be assumed to use the same cognitive processes as bound morphemes in verbal language)” (34). Yet at the end of his discussion, Cohn writes that “verbal, signed, and visual languages all tap into the same cognitive capacities to accomplish their expressions, but do so in ways unique to each modality” (48-9). A critical reading of these two passages might argue that Cohn contradicts himself in first claiming verbal and visual languages do not use the same cognitive processes, only to later claim they do. A more generous reading might argue that, in both statements, Cohn means verbal and visual languages are cognitively processed in the same general way, although they differ minutely due to variations in abstraction and modality. Nevertheless, Cohn’s theory of comics morphemes clearly approaches comics signs as primarily linguistic objects that are cognitively processed according to criteria generally similar to those of verbal and written language. Cohn critiques past attempts to develop dictionaries or catalogs of comics’ signs, whether satirical (e.g., Walker) or earnest (e.g., Forceville), not because he understands these icons (for him “morphemes”) as non-linguistic entities incapable of being so catalogued, but because such projects “require extensive corpus analyses across numerous comics within and between cultures,” and none so far have been attempted on this scale (34).
Here, Cohn’s primary, and seemingly sole, objection to such dictionaries concerns the massive scale of analysis required for such a project, rather than any implicit “linguistic imperialism” (Mitchell 63) that seeks “to promote the visual to the (perceived) higher status of language,” which Hannah Miodrag may posit (Miodrag 171). Through this position and others, Cohn appears to emphasize the linguistic nature of comics signs.

By contrast, Miodrag critiques such decidedly linguistic approaches to comics. While she discusses Cohn’s research on the cognitive processing of sequential images elsewhere in her book Comics and Language, Cohn’s above scholarship addressing comics’ visual lexicon remains absent from Miodrag’s chapter on comics’ signs. Nevertheless, one imagines Cohn’s above approach may be what Miodrag has in mind when she writes, quoting psychologist and visual theorist Julian Hochberg, that linguistic ways of conceptualizing the visual are “sometimes used in a very misleading sense in drawing unjustified analogies between reading and pictorial perception” (Hochberg 66; qtd. in Miodrag 196). For Miodrag, one unjustified equivalence between comics icons and verbal or written language surfaces in descriptions of icons like “heart eyes” or “dizzy stars” as conventionalized signs. Cohn repeatedly emphasizes the conventionalized nature of these signs’ culturally agreed-upon associations, positing that they convey meaning by normative usage rather than through iconic or representational qualities, and are therefore similar to (albeit not entirely like) the abstract notional marks used in many written languages (Cohn, “Japanese” 188, 192; Language 24; “Lexicon” 47). By contrast, Miodrag writes, “[C]onventionalized though these signs may be, there is a degree of motivation at work in many” (Miodrag 172), in other words, “a certain rationale behind the relation of signifier and signified here, that simply does not exist for words” (173). Thus, she describes signs such as “heart eyes” or “dizzy stars” as visual icons. Drawing from a Peircean semiotics, Miodrag understands iconic signs to be motivated, meaning they are not arbitrary or wholly conventional, but created and chosen to represent some likeness or representative quality of a phenomenon.
For example, she refers to the sharp, jagged lines often attached to a site of impact, representing both the reverberation of sound and the sharp jolt experienced at the site of a painful impact (173). Cohn speaks of these same visual signs as “symbolic,” suggesting that they often bear a “fixed symbolic meaning,” much like morphemes in written or verbal language (Cohn, Language 24, 45). He even goes so far as to contrast their symbolic nature to a Peircean iconicity, the latter of which Miodrag attributes to the same signs (“Lexicon” 50). Yet for Miodrag, the motivated nature of signs—their limited representative relationship with signified phenomena—differentiates them from the arbitrary signs of written language (Miodrag 174). Overemphasis on the conventionalized, as opposed to representational, nature of comics signs is one reason Miodrag considers heavily linguistic approaches to comics misleading, another reason being that such approaches ignore the flexible and contextual nature of comics signs.

Miodrag writes, “[I]n language, meanings are (relatively) fixed in advance, whereas in visual signification [meanings] are far more heavily dependent on context and open to deductive interpretation” (185). This sharply contrasts with Cohn’s designation of comics signs as closed-class lexical items. As he explains, “These types of signs are often more symbolic, and belong to a relatively closed class of visual lexical items. New signs of this type are much harder to invent” (Cohn, Language 24). Elsewhere, Cohn suggests the idea of a closed class not only implies a fixed set of signs, but also those signs’ fixity of meaning, as when he writes that

symbolic signs must be conventional. Indeed, altering symbolic signs would be far harder than altering iconic ones, since they draw their meaning from communally agreed upon conventions. As a result, symbolic signs are forced to be more entrenched, and thus fall into a closed class category of morphological items. (“Lexicon” 50)

Contra this fixity, Miodrag demonstrates how in his comic Asterios Polyp (2009), David Mazzucchelli repeatedly uses a group of small, wavy lines emanating off of people or objects to indicate a variety of phenomena, such as smell, sound, pain, heat, and anger (Miodrag 183). For Miodrag, the motivated nature of comics signs allows them to take on new meanings depending upon context; e.g., the wavy lines can resemble reverberating sound, throbbing pain, or floating steam depending on their visual and narrative context, and this reliance on context due to motivation differentiates comics signs from written language. She writes, “The forms themselves do not necessarily have a prior association in the way that words do and so are potentially far more flexible […] where language can be augmented or shaped by its context, but without ever escaping its preordained significance entirely” (183). In this way, Miodrag understands the motivated nature of comics signs as further differentiating them from written language by allowing for greater flexibility of meaning and reliance upon context.

Miodrag recognizes an objection to her claim regarding comics signs’ reliance on context, wrought by their motivatedness, when she writes, “It could be argued that [a given sign] is merely a vague conventional symbol—that is, we recognize the general signification of emanation, with the context filling in the specific details” (183). Cohn appears to describe such general signification when he writes of the morpheme’s limited flexibility of meaning. Although he understands comics signs as bearing conventionalized meanings, he claims they can produce new meanings through a limited number of sign combinations, wherein conventional symbols combine together or their positions in relation to one another shift, as in the upfix to eye-umlaut shift previously described (Cohn, Language 48). Even here, however, the number of possible combinations and meanings is fixed and pre-ordained by culture-specific conventions, much like the symbols’ respective, individual meanings. This is because, for Cohn, comics signs are entirely conventional symbols, and so can possess only the meanings (whether in combination or isolation) determined by normative usage. By contrast, for Miodrag, the motivated nature of comics signs and their reliance upon context for producing meaning means

there is no imperative to use comics’ conventionalized signifying practices. The lexicon of repeatable signs is available, but visual motivation allows artists to formulate new signifying complexes that can be readily decoded even if they are not familiar. Where motivation enables deductibility, conventionalized signs become optional. (Miodrag 183-4)

This directly contradicts Cohn’s claim that comics signs, being closed-class lexical items, are difficult to invent, and so rarely invented (Cohn, Language 24). Miodrag holds that because comics signs possess an iconicity of the phenomena they depict understood by their narrative and visual context, new signs can be readily produced. For Miodrag, comics signs, unlike written language, are in fact determined more by this context than by conventional usage. Of the aforementioned wavy lines used to depict smell or steam or sound, she writes that “as soon as this set of graphic markings is isolated from the rest of the picture, it not only ceases to mean precisely what it did in that specific context, but potentially ceases to signify anything at all” (Miodrag 184). This near-total reliance upon context not only allows for the ready invention of new signs, but further differentiates comics signs from written language for Miodrag, as words or morphemes mean what they do no matter their context (although there may be limited variations in meaning).

Despite the ostensible similarities between Forceville’s approach and Miodrag’s conception of context-reliance and motivated-ness, Forceville’s attempt to develop a method for cataloging comics signs ultimately suggests a linguistic lens more akin to Cohn’s theory. In one essay, he declares that he aims to “make an inventory of all the runes in a single Tintin album,” his larger goal being the “development of a model for cataloging and analyzing pictorial runes” (“Pictorial” 876), a project he undertakes elsewhere with an Asterix album (“Visual” 75-82; see also Eerden). While noting their visual variations, Forceville attempts to divide the various emanata appearing within a single Tintin album into general categories such as “Spirals” or “Spiky Lines,” under which he lists the various uses of each sign type throughout the comic. For example, of the “Spiky Line” emanata, he lists their use in surrounding a character’s head to signify “generic emotion,” the use of the spiky lines attached to speakers to signify “something comes ‘out’ of the musical or sonic source,” and that of the spiky lines attached to objects to suggest “the sparkling of shiny things” (888). Such a cataloging attempt, what Cohn calls a “dictionary” of comics signs, suggests a linguistic bent in its attempt to break down the comics image into discrete visual units, much as Cohn does with more overtly linguistic terminology. Miodrag argues directly against such breaking-down of an image into discrete units. She writes, “Visual signification […] is radically heterogenous: functional differences within language boil down to distinction between letters, while in visual signification multiple variables (two dimensions, size, value, texture, color, orientation, shape) collaborate in constituting any pictorial sign,” and because of this “it is difficult to conceive of minimal visual units that comply with the linguistic model” (Miodrag 189).
Moreover, for Miodrag, a given sign’s narrative and visual context further complicates any attempt to isolate it from the rest of the image as a distinct visual unit, for the sign means nothing outside of that context. The necessity of considering every image as a whole rather than a composition of discrete, visual units has been expressed by visual theorists and art critics elsewhere (e.g., Arnheim; Kraus; Mitchell). Forceville’s aim of cataloging comics signs by breaking them down into minimal visual units with a limited range of meanings suggests a linguistic approach similar to Cohn’s and at odds with Miodrag’s art-critical framework.

Like Miodrag, Forceville argues that comics signs are motivated icons rather than arbitrary marks, but although this motivated nature distances comics signs from linguistic marks, Forceville still suggests a largely linguistic approach to comics signs. Rather than drawing from art theory in making a claim for motivated-ness, Forceville refers to previous cognitive science research on Conceptual Metaphor Theory, which may be understood, at the risk of oversimplification, as essentially claiming humans are capable of conceptualizing abstract phenomena due to their imaginative processes and embodied, sensorial experience, based on which they establish verbal and visual metaphors for representing abstract concepts (Forceville, “Pictorial” 887; “Visual” 70-2). For Forceville, comics signs are visual metaphors drawing from sensorial experience (e.g., a red face indicates anger) or verbal metaphors (e.g., “spitting fire” denotes anger), and so “constitute a rudimentary ‘language’ that is used to (help) visualize non-visible events and experiences understood to take place, or to have taken place, in a static medium” (Forceville, “Pictorial” 888). Forceville’s referencing these signs as a proto-language highlights how, despite arguing for the motivated-ness of comics signs, he understands them as functioning like verbal metaphors drawing less from localized contextual cues or iconicity than from a collective, unconscious cultural repository of metaphor models used to conceptualize abstract phenomena like emotions (“Visual” 70-1, 82-3). One sign can represent different emotions (e.g., a red face can denote anger or embarrassment), but Forceville claims it does so by working in tandem with other signs to produce a range of allowed, culture-specific meanings, much like verbal or written metaphors (84).
For him, although comics signs may be motivated, this motivated-ness derives less from surrounding pictorial or narrative cues than from a cultural repository of allowed metaphor models, just as in spoken or written metaphors. Unlike Miodrag, Forceville does not understand the motivated-ness of comics signs as negating their linguistic function.

Moreover, although Forceville shares Miodrag’s emphasis on the partially representative nature of comics signs—e.g., he proposes a rationale similar to Miodrag’s regarding the motivatedness behind the use of jagged or wavy lines to represent sound vibrations (888)—he would likely disapprove of her referencing these signs as “metaphor icons,” instead arguing signs have a “metonymic” relationship with the phenomena they depict (887). Much like Miodrag, Forceville draws from Peircean semiotics and its three categories—indexical, iconic, and symbolic—for signs. In further agreement with Miodrag, Forceville quickly concludes comics signs are not abstract, purely conventional marks, and so denies their status as Peircean symbols (Forceville, “Visual” 73-5; Miodrag 172-3). Yet unlike Miodrag, he prefers the term “indexical” to describe comics signs, writing of signs for anger, “Most of these [signs] can be uncontroversially classified as ‘indexes,’ that is, as metonymically motivated signs resulting from anger, although they often take on an exaggerated form” (Forceville, “Visual” 75). His example is the reddening of face in comics that mimics real-world blushing. Forceville rejects any classification of these signs as “iconic,” believing that, because the signified phenomena are immaterial, signs cannot “derive their meaning from the resemblance they bear to that what they signify” (73). Continuing with his anger example, he writes, “Since anger is an abstract concept, it by definition defies iconic representation, and can hence only be rendered by means of indexical and symbolic signs” (73). By contrast, Miodrag considers comics signs to be Peircean icons because, for her, iconicity is not limited to correspondence with physical forms as Forceville suggests. Miodrag writes, “Basic interpretations of the iconic category tend to be skewed towards image icons, suggesting a discernible similarity of form defines the category as a whole. 
However, there exist other types of motivated ‘likeness.’ […] Metaphor icons work by a kind of parallelism of attributes” (Miodrag 173). For Miodrag, comics signs are iconic because, although they cannot represent the physical form of an emotion, they can represent specific attributes associated with that emotion. While Forceville interprets the red face or smoke/fire used to signify anger in comics as drawing from sensorial experience or culture-specific metaphor models, Miodrag’s argument suggests such signs signify anger by representing, however cartoonishly, certain attributes of anger, such as the rise in body temperature that often accompanies anger. In this way, Miodrag emphasizes the representative and pictorial quality of comics signs. By contrast, Forceville’s designation of these as indexical or metonymic signs suggests he positions comics signs closer on the text-image spectrum to abstract notational marks, which constitute most written language, than to images popularly conceived as representational.

This comparison between Cohn, Miodrag, and Forceville shows that scholars conceive of comics signs either as primarily linguistic or pictorial objects, and do so in varying degrees, demonstrating the liminal nature of comics signs. Due to their liminal status, any method for indexing and querying comics signs across a digital collection needs to consider both the linguistic and pictorial features of these signs, and provide methods for indexing and querying according to both aspects. Neither text-based nor image-based retrieval methods can do this alone. While the former may be able to track the meaning of individual signs, it fails to adequately account for a sign’s visual properties. Conversely, while automated image retrieval can query signs according to visual characteristics, e.g., color or size, it cannot track a sign’s meaning; e.g., it cannot discern whether a group of wavy lines within a given panel signify smell, anger, or something else. Given the respective advantages and disadvantages of text-based and image-based retrieval methods, integrating XML-based markup with CBIR becomes an attractive option, as it would enable the indexing and querying of comics signs according to both their linguistic and pictorial properties. To explore this option, the following section provides an overview of markup scholarship and proposes one method for expanding CBML to represent comics signs and their role within the relational network that is the comics image.
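As a rough illustration of what querying by both aspects could look like, the sketch below pairs an interpretive label (the kind of information markup records) with a numeric visual descriptor (the kind of information CBIR computes) for each sign occurrence. It is a hypothetical data structure, not the method this article goes on to propose: the record fields, the two-number feature vectors, and the use of Euclidean distance are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class SignRecord:
    panel: str       # hypothetical panel identifier, e.g. "p1-3"
    meaning: str     # label assigned by a human encoder (markup side)
    features: tuple  # visual descriptor computed from the image (CBIR side)


def query_signs(records, meaning=None, like=None):
    """Filter sign records by encoded meaning, then, if a reference
    descriptor `like` is given, order them by visual similarity to it."""
    hits = [r for r in records if meaning is None or r.meaning == meaning]
    if like is not None:
        hits.sort(key=lambda r: dist(r.features, like))
    return hits


# Hypothetical index: three occurrences of wavy lines with different meanings.
INDEX = [
    SignRecord("p1-3", "smell", (0.9, 0.1)),
    SignRecord("p2-1", "anger", (0.8, 0.2)),
    SignRecord("p4-2", "smell", (0.2, 0.7)),
]

by_meaning = [r.panel for r in query_signs(INDEX, meaning="smell")]
by_look = [r.panel for r in query_signs(INDEX, like=(0.8, 0.2))]
combined = [r.panel for r in query_signs(INDEX, meaning="smell", like=(0.2, 0.7))]
```

In this toy index the visually closest match to an "anger" sign is a "smell" sign, which is the kind of case where neither the label nor the descriptor suffices on its own.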

Comic Book Markup Language

Here, a brief introduction to markup language and its surrounding scholarship is warranted. Perhaps the markup language most widely used in the humanities is the Text Encoding Initiative (TEI), a well-developed model for the digital representation of various document types including print, manuscripts, drama, verse, etc.1 It was developed and is presently maintained by an international consortium of technicians and scholars. TEI’s accompanying TEI Guidelines

make recommendations about suitable ways of representing those features of textual resources which need to be identified explicitly in order to facilitate processing by computer programs. In particular, they specify a set of markers (or tags) which may be inserted in the electronic representation of the text, in order to mark the text structure and other features of interest. (TEI n.p.)

These “features of interest” range from structural features or links between documents to individual words and parts of speech. Obviously, divergent disciplinary and research interests necessitate different encoding practices and features of interest—e.g., an archivist may use TEI for manuscript description, while a linguist may use TEI for linguistic analysis, such as labeling principal parts of speech or recording variance between speech patterns in a transcription. CBML was developed as a sort of expansion of TEI, designed specifically for representing comics documents in this markup language.

In a 2012 issue of Digital Humanities Quarterly, John Walsh introduces his TEI-based Comic Book Markup Language (CBML).2 CBML enables the digital transcription of comics for use in digital collections, archives, and research projects. As markup languages, both TEI and CBML are XML vocabularies consisting of machine-readable tags used to identify, query, and analyze structural and semantic features of an encoded document, whether comics or purely textual works. One purpose behind developing a markup language specifically for comics is, as Walsh writes, “to support the study and analysis of comic books in the way that digital collections of, for instance, English poetry or American fiction support the study and analysis of more traditional literary forms” (Walsh n.p.). He continues,

A large corpus of digitized comic books, along with encoded transcriptions and descriptive metadata, would allow scholars to search the text of comic books, search for keywords related to topics of interest, search for the appearance of particular characters, or search for works by particular writers and artists. Additionally, when exploited to its full potential, a large CBML collection would allow searching—and other forms of computer processing and computational analysis—based on structural, aesthetic, and informational and documentary features peculiar to the genre of comic books. (Walsh)

In this passage, Walsh evidences his prioritization of comics’ bibliographical and textual features. Indeed, as Walsh writes elsewhere, “Comics are a visual, graphic art form combining text and image. CBML/TEI/XML is a text format. While one could certainly use CBML to describe details about any or all of the pictures in a comic book publication, such an effort would undermine the hybrid form of the comic book” (Walsh). Given Walsh’s focus on tracking comics’ textual and bibliographical features when developing CBML, this markup language provides an excellent means of indexing and querying a comics document according to its non-pictorial properties, such as the meanings of individual signs. CBML’s potential for tracking the meaning of individual comics signs is due to markup’s status as a form of critical, close reading.

Jerome McGann writes, “When you mark up a text you are ipso facto reading and interpreting it” (Radiant 143). By digitally representing a document in markup languages like TEI or CBML, the researcher produces an interpretive description of the document in question. Julia Flanders writes of markup’s interpretational nature, “[W]e can understand XML [and by extension, TEI] as a way of expressing perspectival understandings of the text: not as a way of capturing what is timeless and essential but as a way of inscribing our own changeable will on the text—in other words, as a form of reading” (Flanders 60-1). Here, Flanders speaks of markup language as an alternative method of producing a scholarly reading of a source document, not unlike the traditional research essay. The primary difference between encoding and traditional reading is that, with markup language, “the source text and the scholar’s reading and analysis are inextricably intermingled in the encoded document” (Walsh). Johanna Drucker understands markup’s interpretational potential—its ability to produce scholarly readings in the form of coded representations of physical documents—as one of markup’s chief merits. To this end, Drucker describes encoded documents as metadocuments

which describe and enhance information but also serve as performative instruments. […] Metalanguages have a fascinating power, carrying a suggestion of higher order capabilities. As texts that describe a language, naming and articulating its structures, forms, and functions they seem to trump languages that are used merely for composition or expression. (Drucker 11)

Rather than simply observing a given source document, be it a fifteenth-century manuscript or twentieth-century comic book, markup language enables one to dynamically engage with a given document as one dissects its structural, narrative, and rhetorical properties to produce an interpretational description.3 In this way, markup provides a digital alternative to traditional methods of critical analysis and close reading.

Markup’s status as a form of close reading (or, regarding images, close viewing) allows for its intuitive application to comics’ sign system. Given Walsh’s focus on comics text, it seems natural that he proposes a method for encoding comics’ word balloons, one of comics’ most common signs. CBML contains an original non-TEI element <cbml:balloon>,4 which is meant to contain any text that appears within a given balloon. There are numerous attributes that may be attached to this balloon element (attributes modify and define the elements to which they are attached), but the two most pertinent to this article are the @rendition and @type attributes, used to indicate the visual form and type of balloon, respectively. The speech balloon’s type and form often connect in comics; e.g., dialogue spoken through an electronic device like a television or radio typically appears in a balloon with pointy edges as opposed to the speech balloon’s archetypal smooth, circular shape. Walsh provides the following sample markup for such an audio balloon, which appears in Stan Lee and Steve Ditko’s The Amazing Spider-Man #5 (1963) (n.b. portions of Walsh’s encoding have been removed from this sample for clarity):

<cbml:balloon rendition=”#uc” type=”audio” subtype=”telecast” who=”#jjj”> 

My name is J. Jonah Jameson, publisher of Now magazine and the Daily Bugle! I am sponsoring this program in the public interest, to expose Spider-Man to the public as the menace he is!

</cbml:balloon> (Walsh)

Notably, Walsh’s <cbml:balloon> element does not record the visual appearance of the text balloon, e.g., its location on the page or shape, but rather its meaning, e.g., what its attachment to a character/object and its shape/form signify. While CBML’s text format cannot capture all of the nuances and variances of a text balloon’s visual properties, such as line thickness or shape, markup can record the meaning signified by those visual properties; e.g., it can record whether a balloon’s form signifies shouting or thinking. Thus, rather than attaching attributes to the <cbml:balloon> element to describe its appearance, the @type and @subtype attributes designate the meaning behind the balloon’s visual form. This method for encoding text balloons can also be applied to comics signs more generally.

Although Miodrag convincingly argues for the “metaphoric iconicity” of balloons as pictorial icons, balloons differ from other comics signs, like “heart eyes” or “dizzy stars,” in that they operate as “carriers” (to borrow a term from Cohn) for text, and so will nearly always be accompanied by some textual content (Miodrag 179; Cohn, Language 35-7). While this difference separates balloons from other comics signs, scholars like Miodrag and Cohn frequently classify them together because balloons and comics signs appear to function in similar ways. This suggests the possibility of developing a <cbml:sign> element, based on Walsh’s balloon element, which would represent other comics signs that do not carry text. Much like the balloon element, <cbml:sign> would receive the attributes @who, @type, and @subtype (if needed), and be contained within the <cbml:panel> element. In TEI, @who points to another element in the markup file that describes or names the character with whom the element carrying @who is associated in the source document. In order to indicate with which character(s) a given sign is associated, the @who attribute attached to <cbml:sign> would point via a URI reference5 to one or more character ID(s), in much the same way it functions when attached to <cbml:balloon>. The @type and @subtype attributes would be used to indicate which emotion or phenomenon the sign signifies. Using the @who, @type, and @subtype attributes would enable the markup to track the meaning behind a comics sign, and its relationship to other elements within the comics image or document. An example demonstrates how this element would function in representing a given comics panel.

Figure 1

In Panel A of fig. 1, a group of wavy lines floats above Edgar Allan Poe’s head to signify his anger upon hearing the Raven squawk, “Nevermore!” Using Walsh’s CBML and the proposed <cbml:sign> element, Panel A of fig. 1 may be encoded as follows:

<cbml:panel xml:id=”panelA”>

<cbml:balloon type=”speech” who=”#raven”>

Nevermore!

</cbml:balloon>

<cbml:sign type=”anger” who=”#poe”/>

</cbml:panel>

In the encoding of this scene, a <cbml:panel> element represents the panel as a whole, and contains both a <cbml:balloon> element for the raven’s speech and a <cbml:sign> element for the “anger lines” above Poe’s head. The <cbml:sign> element receives the attribute @type=“anger” to indicate that the wavy lines signify Poe’s anger, while its @who attribute indicates with which character this sign for anger is associated. The URI link “#poe” points to an element elsewhere in the document containing information—such as character names, role, gender, species, age, etc.—about the character assigned the ID “poe.” As can be seen in this example, the <cbml:sign> element indicates the signified emotion as well as the character(s), if any, with whom the sign is meant to be associated. The @rendition attribute, which Walsh includes with his <cbml:balloon> element, is here avoided so as to leave the tracking and cataloging of signs’ visual properties to automated image analysis, the integration of which will be discussed later in this article.
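
How such an encoding might be queried can be sketched in a few lines of Python. The sketch below is illustrative only: the namespace URIs, the simplified character list (<listPerson>), and the function name are assumptions made for this example, and <cbml:sign> is the element proposed in this article rather than part of published CBML. It reads the Panel A encoding and resolves each sign’s @who pointer to a character name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions for this sketch; check them against the schema in use.
NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "cbml": "http://www.cbml.org/ns/1.0",
}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified, hypothetical document: a character list plus the encoded Panel A.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:cbml="http://www.cbml.org/ns/1.0">
  <listPerson>
    <person xml:id="poe"><persName>Edgar Allan Poe</persName></person>
    <person xml:id="raven"><persName>The Raven</persName></person>
  </listPerson>
  <cbml:panel xml:id="panelA">
    <cbml:balloon type="speech" who="#raven">Nevermore!</cbml:balloon>
    <cbml:sign type="anger" who="#poe"/>
  </cbml:panel>
</TEI>"""


def signs_with_characters(xml_text):
    """Return (panel id, sign type, character names) for each proposed <cbml:sign>."""
    root = ET.fromstring(xml_text)
    # Map each character ID to a display name so @who pointers can be resolved.
    names = {
        person.get(XML_ID): person.findtext("tei:persName", namespaces=NS)
        for person in root.findall(".//tei:person", NS)
    }
    results = []
    for panel in root.findall(".//cbml:panel", NS):
        for sign in panel.findall("cbml:sign", NS):
            # @who may hold several space-separated pointers such as "#raven #poe".
            ids = [ref.lstrip("#") for ref in sign.get("who", "").split()]
            results.append(
                (panel.get(XML_ID), sign.get("type"), [names.get(i, i) for i in ids])
            )
    return results


print(signs_with_characters(DOC))
```

For the sample document, the function reports a single sign in panelA, of type “anger,” associated with Edgar Allan Poe. Nothing in the query touches the sign’s visual form; it operates only on the encoder’s recorded interpretation.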

The decision to forgo any description of comics signs’ visual properties in markup is largely due to the problems of categorizing and indexing images in text-based formats. While select comics signs may be categorized according to general visual forms that possess consistent, cultural meanings (e.g., hearts=love or Zs=sleep), such general forms do not acknowledge the nuances represented by divergent visual properties. For example, each of fig. 2’s four panels contains a group of wavy lines that, when compared to those of the other panels, represents a distinct immaterial phenomenon; respectively, the wavy lines represent anger (Panel A), heat (Panel B), pain (Panel C), and noise (Panel D). Encoding them all as <cbml:sign type=“wavylines”> fails to acknowledge the visual nuances contributing to their respective meanings, for the minute visual differences between line groups can be just as significant to their individual meanings as is their narrative context (cf. Miodrag 189). Moreover, readers will likely describe the same sign using different terminology. Some may object to describing all of fig. 2’s line groups as wavy lines, arguing that Panel D’s line group is more aptly described as jagged lines. Others may distinguish between wavy and curvy lines, claiming Panel A uses curvy lines to signify anger while Panel B uses wavy lines to signify heat. The field of image indexing has witnessed numerous attempts, such as the development of indexing thesauri like the Thesaurus for Graphic Materials, to overcome complications involved in the subjective classification of visual forms. But even these thesauri are not necessarily successful in overcoming indexer subjectivity, and often come with additional problems.6 However, while descriptions of a given visual sign may differ widely from person to person, the sign’s meaning remains much more consistent. Indeed, signs function because, once positioned in a specific context, readers generally recognize the same signification in a sign. Text-based means of cataloging images, including CBML, are better suited for tracking the more limited non-visual significations of individual comics signs—i.e., their meanings and relationships to other elements—than for tracking their unlimited visual differences.

Figure 2

Both Forceville and Cohn recognize that one emotion or sensation can be communicated through different emanata, and that one given emanation can have numerous and divergent meanings depending on its context (Cohn, Language 47-8; Forceville, “Pictorial” 887-8). Miodrag is right to assert that comics signs are malleable—capable of signifying various phenomena depending on context—and that this is due to their considerable lack of meaning prior to having any such context (Miodrag 183-4). But once placed within a specific visual and narrative context, a comics sign takes on a largely fixed and limited signification. Admittedly, certain signs can take on multiple meanings within one context, yet even then, a given sign possesses a limited range of related, and potentially synonymous, meanings. E.g., Zs floating above a sleeping character’s head may be interpreted as signifying sleep or signifying snoring during sleep. In this example, however, one can argue snoring itself is an indexical sign for sleep, as one does not snore while awake, and so the Z sign, by signifying a character’s snoring, ultimately signifies the character is asleep. Yet should an encoder or encoding group feel the difference between snoring and not snoring during sleep is important in a comics document, and that Zs are meant to signify the former, then the @type attribute for Zs in that comics document would simply take a value of “snore” rather than the “sleep” value it might take in comics in which Zs only indicate one is asleep and not, necessarily, that one snores. This example shows that a given sign’s meaning changes depending on local (panel- or sequence-specific), global (document-wide), and even culture-specific contextual information. As a form of critical, close reading, markup offers an apt means of uncovering and recording the highly context-specific meanings of individual comics signs.

In order to accurately represent a comics document in markup, the encoder must understand the function and nature of each of the document’s constitutive features. Thus, the encoder must conduct a close, critical analysis of the comics document in order to understand each sign’s given meaning based on local and global (i.e., document-wide) contextual details, whether visual or narrative. Through such detail-oriented critical analysis, CBML provides a localized close reading of comics documents. But when used to encode a large quantity of comics, CBML allows for the large-scale, corpus analysis of a comics collection. In fact, any large-scale analysis of a digital, encoded collection will be an analysis based upon critical, close readings—that is, readings attentive to the contextual features of each image and document. The production of large-scale analysis from markup’s close reading is what Walsh implies when he writes,

Individual comic books and graphic narratives may be better understood through strategies of close reading, with careful attention paid to minute details of linguistic and visual language and document features, but comic books as a system and extended comic book narratives—which often unfold over decades in thousands or tens of thousands of individual documents by hundreds of creators (writers and artists)—benefit also from the [large-scale analysis] (of text, image, and metadata) from comic book documents. (Walsh)

By retaining the attention to detail characteristic of close reading while still providing a means for massive, corpus analyses across collections based on the meaning of comics signs, as well as their bibliographic metadata (e.g., data indicating in what works they appeared, who authored them, etc.), markup languages like CBML produce a continual engagement with comics on both the individual and collective level. Previous studies in digital image indexing have evidenced that text-based image queries across digital image collections focus almost entirely on an image’s meaning rather than its visual properties (Armitage and Enser 292-4; Jörgensen 172). The requisite close reading and critical analysis of a comics document when encoding suggests CBML can provide an apt means of uncovering and recording the signified meanings, as well as the context, of individual comics signs.
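
A minimal sketch of such a corpus-level query follows. It assumes a set of encoded documents that use the proposed <cbml:sign> element; the file names, @type values, and CBML namespace URI are invented or assumed for illustration, not drawn from an existing collection.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SIGN_TAG = "{http://www.cbml.org/ns/1.0}sign"  # assumed CBML namespace URI

# Hypothetical miniature corpus: file names and @type values are invented.
CORPUS = {
    "strip_a.xml": """<div xmlns:cbml="http://www.cbml.org/ns/1.0">
      <cbml:panel><cbml:sign type="anger" who="#poe"/></cbml:panel>
      <cbml:panel><cbml:sign type="sleep" who="#poe"/></cbml:panel>
    </div>""",
    "strip_b.xml": """<div xmlns:cbml="http://www.cbml.org/ns/1.0">
      <cbml:panel><cbml:sign type="anger" who="#raven"/></cbml:panel>
    </div>""",
}


def count_sign_types(corpus):
    """Tally the recorded meaning (@type) of every sign across the corpus."""
    counts = Counter()
    for xml_text in corpus.values():
        root = ET.fromstring(xml_text)
        counts.update(sign.get("type") for sign in root.iter(SIGN_TAG))
    return counts


def documents_with_sign(corpus, sign_type):
    """List the documents in which at least one sign carries the given meaning."""
    return sorted(
        name
        for name, xml_text in corpus.items()
        if any(s.get("type") == sign_type for s in ET.fromstring(xml_text).iter(SIGN_TAG))
    )


print(count_sign_types(CORPUS))
print(documents_with_sign(CORPUS, "sleep"))
```

Each count in such a tally rests on an encoder’s individual reading of a panel, which is the sense in which large-scale analysis of an encoded collection remains grounded in close reading.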

As Walsh acknowledges, given CBML’s text format, “one could certainly use CBML to describe details about any or all of the pictures in a comic book publication, [yet] such an effort would undermine the hybrid form of the comic book” (Walsh). CBML’s text format and the close reading involved in its implementation are better suited to recording the significations of individual comics signs. Rather than attempting to resolve the complications involved in cataloging visual forms with text-based markup, this section proposes an expansion of CBML, for the purpose of recording the context-specific meanings of comics signs, that takes advantage of markup’s status as a form of close reading for interpreting document features. This approach leaves any querying based on visual properties to automated methods of image analysis capable of tracking similarities or differences in color, size, texture, etc. across massive image sets. The following section begins with a survey of select recent work in CBIR and digital image analysis regarding comics, and from there elaborates on how CBIR and CBML may be usefully integrated so as to develop a holistic method of indexing and querying comics signs.

CBIR, CBML, and Interpretation

Content-based image retrieval (CBIR) is a computational process of searching and retrieving images from a large collection based on properties such as color or shape, as opposed to searching via image metadata as one might do in a digital image archive. CBIR enables an extension of the basic method of image criticism—that is, comparative analysis—across significantly larger datasets, and at faster speeds, than the solitary critic may be capable of matching. Numerous computer scientists have explored the potential for implementing automated, content-based analysis of comics images, and thereby showcased the potential for an extensive corpus analysis of comics images based on pictorial properties alone. Three major research areas for automated content analysis of comics are text recognition (see Aramaki et al.; Liu et al.), boundary detection (whether of word balloons [see Correia and Gomes] or panels [see Pang et al.]), and character recognition (see Iwata et al.). While each of these areas focuses on analyzing one feature of comics across a large image set, other scholars have incorporated methods for tracking several features together to analyze collections via “high-level” descriptions, e.g., artistic style or genre (Augereau et al. 2, 7). For instance, W.T. Chu and W.C. Cheng coupled automated methods for detecting screentones (i.e., shading and texturing) with those for detecting panel layouts, as a way of querying comics according to the styles of individual manga artists (Chu and Cheng). As Chu and Cheng’s method for analyzing artistic style demonstrates, computers cannot track meaning or other such intangible qualities of a document; they are only capable of analyzing large image sets according to pixel-level analyses that cross-examine similarities and differences in color, hue, shape, and other visual properties.
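
To make the pixel-level character of such comparison concrete, the following sketch ranks images by color similarity using histogram intersection, a generic CBIR technique. It is not the method of any study cited here. Images are represented as plain lists of RGB tuples, and the page names are hypothetical.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Quantize RGB pixels into bins**3 buckets and return normalized frequencies."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the color mass shared by two normalized histograms."""
    return sum(min(share, h2.get(bucket, 0.0)) for bucket, share in h1.items())


def rank_by_similarity(query_pixels, collection):
    """Order the names in `collection` from most to least similar to the query."""
    query = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query, color_histogram(pixels)), name)
        for name, pixels in collection.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical four-pixel "pages"; real pages would be decoded from image files.
pages = {
    "red_page": [(250, 10, 10)] * 4,
    "blue_page": [(10, 10, 250)] * 4,
    "mixed_page": [(250, 10, 10)] * 2 + [(10, 10, 250)] * 2,
}
print(rank_by_similarity([(255, 0, 0)] * 4, pages))
```

The ranking depends only on how color is distributed across pixels. Whether a red region depicts a blushing face or a fire engine plays no part in the score.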

One canonical example of automated image analysis applied to comics is Lev Manovich’s essay “How to Compare One Million Images?” Therein, Manovich uses automated image analysis and data visualizations to cross-analyze a set of approximately one million manga pages according to criteria such as color, saturation, hue, etc. Such research yields several obvious benefits—much larger sets of image data may be compared, and much more rapidly, than via traditional, analog criticism, and such massive automated image comparison may address pictorial properties without being influenced by their human-endowed meanings (Manovich 261-2). A problem with Manovich’s study, however, is his suggestion that this automated comparison of large image sets can somehow offer an interpretation of the image data that is not encumbered by the critic’s subjectivity. Near the beginning of his essay, Manovich writes,

Having at our disposal very large cultural data sets which can be analyzed automatically with computers and explored using interactive interfaces and visualization opens up new possibilities. [..] Instead of being fuzzy and blurred, our horizon (knowledge of a cultural field as whole) can become razor sharp and at the same time acquire a new depth (being able to sort and cluster millions of artifacts along multiple dimensions). This would enrich our understanding of any single artifact because we would see it in relation to precisely delineated larger patterns. It would also allow us to make more confident statements about the field at large. (252)

While Manovich is correct that the increased scale allowed by digital modes of analysis allows for new insights previously inaccessible to the individual scholar, he overreaches in his valuation of digital analysis, as is particularly evident in the above final sentence. Manovich’s idea of being able to produce “more confident statements about the field at large” with “razor sharp” precision suggests that he sees large-scale, automated image analysis as being able to obtain a view of a given image collection closer to reality than the view an individual critic can obtain, as though automated analysis depicts the collection as it is in the world, unencumbered by interpretation. But this is not how comics—or indeed, art at large—should be approached. In compiling over one million images for a computer to compare, with the express goal of discovering all constitutive features and variations of his chosen manga image set, Manovich never engages with his art objects (i.e., manga pages) as they are designed to be encountered—that is, read individually and in smaller groups (e.g., two pages at a time as the reader experiences them). Instead, hoping to reveal the image collection’s essential nature, he appears to view the computer as an interpretive agent acting on his behalf.

In light of this misguided approach, research on the application of CBIR to comics collections may benefit from Jerome McGann’s remark concerning digital textual scholarship: “what is needed is a dynamic engagement with [images] and not a program aimed at discovering the objectively constitutive features of what [an image] ‘is’” (“Dialogue” 105). To this end, CBIR may be supplemented with CBML given the latter’s status as a form of critical reading and resultant ability to engage dynamically with comics documents. Beginning their survey of contemporary advancements in automated comics image analysis, Augereau et al. write,

To fully analyze and understand the content of comics, we need to consider natural language processing to understand the story and the dialogues; and computer vision to understand the line drawings, characters, locations, actions, etc. A high-level analysis is also necessary to understand events, emotions, storytelling, the relations between the characters, etc. (2)

Due to its inability to track meaning and interpret images as humans do, CBIR cannot conduct Augereau et al.’s “high-level analysis” of events, emotions, and story. By contrast, markup’s nature as a form of critical, close reading suggests CBML serves as a potential means for this high-level analysis. A system that integrates CBIR and CBML can enable analysis and understanding of comics collections according to visual, textual, and high-level properties. However, this is not precisely how CBML has been utilized in previous attempts to integrate CBML and CBIR.

Nguyen et al. propose a different method for integrating CBML with deep learning technology. The purpose of their integration is to develop a means to “index the visual and textual content of comic book images for content-based retrieval systems such as search engines” (Nguyen et al. 6). Their integration offers a “processing pipeline from global image analysis to precise content extraction” (2). In this pipeline, “comic book images are processed and then panels are extracted,” after which point Nguyen et al.’s pipeline “use[s] a state-of-the-art algorithm to detect (localize) texts in panels” (7). Once text has been detected in the panel, the pipeline employs a combination of algorithms and “state-of-the-art object detection models” to detect both balloons and characters’ faces. The computer then “analyze[s] the relationship between balloons and characters” (7). Once all this information has been extracted, “[i]n the last stage, all extracted information is stored in a description file (CBML) to support the online procedure (e.g., search engines)” (7). As this brief overview may illustrate, Nguyen et al. employ CBML primarily as a storehouse for the textual and narrative data collected from a computer’s automated content analysis. In approaching CBML as a means of recording data from the computer’s automated analysis, Nguyen et al. tacitly forgo markup’s utility as a form of critical, close reading, which digital humanities scholars such as Drucker, McGann, and Flanders hold up as one of markup’s chief strengths. This is not to accuse Nguyen et al. of misunderstanding markup’s purpose or potential. What Nguyen et al. reveal in this approach to CBML, however, is their larger aim of implementing computational methods in order to objectively analyze and catalog comics.

The unavoidable problem with this overarching aim of Nguyen et al.’s implementation of CBML is that their resulting markup still represents a specific reading of a comics document: the surface-level reading of a computational agent. In Nguyen et al.’s study, CBML records a reading that views the comics document as a mass of textual and/or visual data with no overarching narrative meaning or structure. For instance, in their initial example concerning the computer’s analysis of a Spider-Man comic panel, the computer produces a CBML file that records the pictorial surface areas of panels, balloons, even character faces. The following is an excerpt from their sample:

<cbml:character who=”#c1” height=”958” width=”304” x=”53” y=”109”></cbml:character>

<cbml:face height=”458” width=”154” x=”102” y=”118”></cbml:face>


<cbml:balloon tailDirection=”SE” tailTip=”131,155” who=”#c1” height=”498” width=”324” x=”25” y=”6” n=”0”>






<surface lrx=”25” lry=”106” lux=”349” uly=”504”>

<zone points=”334,184 333,192 340,192 . . . 315,195 331,495 332,113 334,171”/>


</cbml:balloon> (Nguyen et al. 3)


The various attributes containing coordinate-point values—e.g., @lrx, @lry, and @points—are derived from TEI and designate which pixels of the digitized image the specified element covers. As this sample markup shows, the computer-generated CBML file is entirely devoid of meaning-related information, consisting almost solely of visual, surface-level information. What is produced is a faux-objective, superficial analysis of the comics image in markup. While this sort of file is useful for designating what surface area of the given comics image is to be annotated with information on balloons, characters, or comics signs, it provides no information regarding the iconographic meaning of visual elements in the sample image—e.g., it does not indicate whether the balloon signifies speech, thought, shouting, or something else. Of course, this gap in the markup can be filled by having an encoder or a group of encoders review the generated CBML files and revise the markup to include the non-surface-level information the computer is incapable of recognizing, deciding matters such as whether a detected sign signifies anger or smell or whether a balloon’s visual form signifies thought or speech. This method of markup encoding, however, risks integrating two contradictory modes of reading images.
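
A short sketch shows what can and cannot be recovered from coordinate attributes of this kind. It reuses the @x, @y, @width, and @height values from the excerpt above and assumes, for illustration, that @x and @y give the top-left corner of a region in pixels and that the CBML namespace URI is as shown. Neither assumption is confirmed by the excerpt itself.

```python
import xml.etree.ElementTree as ET

CBML = "http://www.cbml.org/ns/1.0"  # assumed CBML namespace URI

# Attribute values copied from the excerpt; the wrapping panel element is added
# here only so that the fragment parses as a single document.
SAMPLE = f"""<cbml:panel xmlns:cbml="{CBML}">
  <cbml:character who="#c1" height="958" width="304" x="53" y="109"/>
  <cbml:face height="458" width="154" x="102" y="118"/>
</cbml:panel>"""


def bounding_box(element):
    """Return (left, top, right, bottom), assuming @x/@y mark the top-left corner."""
    left, top = int(element.get("x")), int(element.get("y"))
    return (left, top, left + int(element.get("width")), top + int(element.get("height")))


def contains(outer, inner):
    """True if the inner box lies entirely within the outer box."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and outer[2] >= inner[2]
        and outer[3] >= inner[3]
    )


root = ET.fromstring(SAMPLE)
character_box = bounding_box(root.find(f"{{{CBML}}}character"))
face_box = bounding_box(root.find(f"{{{CBML}}}face"))
print(character_box, face_box, contains(character_box, face_box))
```

Under these assumptions the markup supports geometric questions, such as whether the detected face falls inside the detected character region, but it offers no attribute from which a query could learn what either region signifies.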

Nguyen et al.’s CBML example reflects how the computer reads the image—in discrete units of information sans meaning—but this is not how people read and find meaning in images. Near the end of their essay, Nguyen et al. write that the computer’s CBML file allows “an original comic book page image to be automatically split into several sub-images corresponding to panels with a precise visual and textual description of their respective contents” (27). But this computer-based sectioning of images into discrete units diverges from humanities-based picture criticism. As Kari Kraus writes, “Images are typically thought to be autographic: they operate through what we now think of as ‘analog’ representation methods, with smoothly continuous rather than discrete and stepwise units of information” (237). Image researchers, among whom comics scholars may be numbered, are typically concerned more with an image’s meaning or interpretation, values found in the image’s overall “unitary pattern” (Kraus 248; cf. Layne 584), than with its basic building blocks, like hue and shading—indeed, these are often considered only (if at all) in light of the image’s meaning. While Nguyen et al.’s division of the comics image into discrete sub-images appears to be a useful means for designating which section of the image is to be annotated with information regarding, say, a comics balloon or sign, it is not true to how humans read and find meaning in images.

Numerous comics scholars have argued that no comics image—with its immanent textual and visual elements—can be read or analyzed in isolation from all of the others appearing alongside it, nor can one part of the image be read apart from the rest of the image (see Groensteen; Postema). Sectioning a comics image within its assigned, meaning-making CBML is difficult. For instance, as mentioned previously in this article, Forceville includes red character faces among comics signs. How would these faces’ blushing be annotated in Nguyen et al.’s proposed system? Would the blushing be attached to the <cbml:character> element in some way? Would there be a separate <cbml:sign> element pointing to the character? If the latter, would this element’s surface area span the same region as the <cbml:character> or <cbml:face> elements? What about using markup to encode “speed lines” and blurring that signifies speed, such as if one were developing markup for a panel that depicts the DC Comics superhero Flash running at super-speed? In this imagined panel, the Flash is the only concrete object, amidst a blurred background that contains numerous horizontal “speed lines.” How would this be represented in the markup? Is it best to attach an @type attribute to the <cbml:panel> element? Should one insert a separate <cbml:sign> element that spans the whole panel and points to the Flash? Returning to the previous section’s wavy lines example: if the lines appear over a red, sweaty-faced character, readers may infer the lines represent steam to signify anger. But if the lines appear over a garbage can, among a fly or two, readers will infer the lines represent smell. If a sign’s visual context is just as much (if not more) a contributor to its meaning as that sign’s stand-alone form, can one meaningfully divide the sign, e.g., the wavy lines, from its visual context, e.g., the garbage can and flies?

These examples show that, given the heavily interwoven nature of comics signs—indeed, comics as a whole—the generation of markup by a computer with the intent of discerning meaning makes little sense. Sectioning a given image for analysis by an automated image retrieval system may make sense given that computers view and query image files in sections. As a form of close reading, however, markup is a process of making meaning—a high-level analysis of the comics image—and so something the computer is incapable of fully and usefully exploiting. Given its power to record and represent the meaning behind textual, visual, and bibliographic elements, and, in turn, produce a specific reading of an image, markup is of most use when it reflects a human reading of a comics image. Any effort to “fully analyze and understand the content of comics” or conduct a high-level analysis of comics cannot forgo human-based critical readings; content analysis cannot be entirely left to automated algorithms and detection models. As Taylor Arnold and Lauren Tilton write of large-scale digital analyses of images,

[O]ne must “view” visual materials before studying them. Viewing, which we define as an interpretive action taken by either a person or a model, is necessitated by the way in which information is transferred in visual materials. Therefore, in order to view images computationally, a representation of elements contained within the visual material—a code system in semiotics or, similarly, a metadata schema in informatics—must be constructed. (2)

While Tilton and Arnold write that this initial viewing may be performed by an algorithmic model, even such “algorithmic viewing” requires an initial human-based critical viewing of the documents in question, as no such model can be implemented effectively unless the researcher has an in-depth knowledge of the visual documents for which the model is designed. This is especially true for attempts to catalog a variety of comics signs across a digital collection, given that the meaning of any one sign, as with nearly any feature of a comics image, is heavily dependent on context. Because markup concerns the meaning behind and connection between elements of a given document, it is a form of analysis beyond the scope of automated systems.

Figure 3: Panel sequence modeled after Cohn, “Limits”

This does not mean that, within a system that integrates CBML and CBIR, the former cannot be used to annotate the comics image. But when markup is used to record the relationship between pictorial elements and their collectively produced meaning, dividing a comics image into subsections is ill-suited to markup’s purpose. Dividing the image within a markup file that designates the signification of comics signs suggests a reading of the image in which the meanings of those signs (indeed, potentially of all elements of the image) are each restricted to a specific topographical region on the image surface. When representing the meanings of comics signs via CBML, markup is better suited to recording the textual, visual, and bibliographical structures of the comics image, rather than its visible surface area, and may more usefully leave the latter to CBIR. Rather than designate the spatial area of comics elements using coordinate-point attribute values, as Nguyen et al. do, the CBML file is better suited to recording only the various elements along with their structural and narrative relationships. For example, in this approach, fig. 3 would be encoded as follows:

<div type="panelGrp" subtype="dailyStrip">

<graphic url="/Users/jacobmurel/Documents/cbml_research/signs_paper/figure03.jpeg"/>

<cbml:panel characters="#raven" xml:id="panel01" n="1">

<sound who="#raven">

</sound>

<cbml:sign who="#raven" type="emanata" subtype="sound"/>

</cbml:panel>

<cbml:panel characters="#poe" xml:id="panel02" n="2">

<cbml:balloon who="#poe" type="speech">

</cbml:balloon>

</cbml:panel>

<cbml:panel characters="#poe" xml:id="panel03" n="3"/>

<cbml:panel characters="#raven #poe" xml:id="panel04" n="4">

<cbml:sign who="#raven #poe" type="impactStar"/>

</cbml:panel>

</div>

Here, the markup does not record the visual information of the comics panel sequence but focuses on the structural and narrative relationships of elements, e.g., which characters and signs appear in which panels, to whom each balloon or sign is connected, and the assumed reading order of the panels in sequence. Through the use of the TEI <graphic> element, however, the markup remains attached to its corresponding image file, allowing the markup for the panel sequence to be easily retrieved when CBIR is used to search a collection based on the appearance of a comics sign potentially contained within the image, such as “noise emanata” or “impact stars.” This annotation system allows comics images to be indexed and queried according to both their linguistic and pictorial properties. The integration of CBML and CBIR along these lines utilizes the respective advantages of each (CBIR’s ability to match visual characteristics and CBML’s usefulness for recording the meaningful relationships between elements) while limiting their respective shortcomings. In this framework, human and computer agents work together to operate a holistic analytic system that produces results with greater efficacy and precision than either type of agent can working alone.
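The link between markup and image file described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation of any existing system. It assumes the fragment is wrapped in the TEI namespace and in a CBML namespace URI (given here as http://www.cbml.org/ns/1.0, which should be checked against the CBML schema), it uses the <cbml:sign> element proposed in this article, and it substitutes a hypothetical relative path for the image file. The function names index_signs and lookup are invented for illustration. The CBIR step itself is not modeled; the sketch only assumes that it yields a sign label such as “impactStar.”

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
CBML_NS = "http://www.cbml.org/ns/1.0"  # assumed namespace URI; verify against the CBML schema
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded name of xml:id

# Simplified version of the fig. 3 encoding; "figure03.jpeg" is a hypothetical path.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:cbml="http://www.cbml.org/ns/1.0"
     type="panelGrp" subtype="dailyStrip">
  <graphic url="figure03.jpeg"/>
  <cbml:panel characters="#raven" xml:id="panel01" n="1">
    <cbml:sign who="#raven" type="emanata" subtype="sound"/>
  </cbml:panel>
  <cbml:panel characters="#poe" xml:id="panel02" n="2">
    <cbml:balloon who="#poe" type="speech"/>
  </cbml:panel>
  <cbml:panel characters="#poe" xml:id="panel03" n="3"/>
  <cbml:panel characters="#raven #poe" xml:id="panel04" n="4">
    <cbml:sign who="#raven #poe" type="impactStar"/>
  </cbml:panel>
</div>
"""


def index_signs(xml_text):
    """Map each sign type to the image, panel, and characters it is attached to."""
    root = ET.fromstring(xml_text)
    graphic = root.find(f"{{{TEI_NS}}}graphic")
    url = graphic.get("url") if graphic is not None else None
    index = defaultdict(list)
    for panel in root.iter(f"{{{CBML_NS}}}panel"):
        for sign in panel.iter(f"{{{CBML_NS}}}sign"):
            index[sign.get("type")].append({
                "image": url,
                "panel": panel.get(XML_ID),
                "who": sign.get("who", "").split(),
            })
    return dict(index)


def lookup(index, sign_type):
    """Return the markup records for a sign label supplied by an image search."""
    return index.get(sign_type, [])


if __name__ == "__main__":
    idx = index_signs(SAMPLE)
    for record in lookup(idx, "impactStar"):
        print(record["image"], record["panel"], record["who"])
```

In this arrangement the markup stores no coordinates. The image search is responsible for locating a sign visually, and the index returns only the structural record, that is, which panel the sign belongs to and which characters it is connected to.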


Given the introductory nature of this article, there remain many areas for further development and research to successfully integrate CBML and CBIR in the annotative manner described. A central concern consists in defining what is cataloged as a sign and what is not, a difficult issue considering that Cohn, Miodrag, and Forceville all differ in their interpretations of what counts as a discrete comics sign. For instance, Forceville includes reddened character faces alongside “smoke lines” and “steam lines” for indicating anger, classifying all as pictorial runes (Forceville, “Visual” 82). Miodrag makes no mention of reddened character faces. In reading Mazzucchelli’s Asterios Polyp, however, she implies that Mazzucchelli’s stylizations of characters, including line types and colors unique to certain characters, function similarly to signs like “smoke lines” and “steam lines” (Miodrag 174-7). Unlike Miodrag, Cohn distinguishes more explicitly between the ways an artist draws objects or individuals (what he calls “open-class lexical items”) and the discrete signs, like “steam lines,” used to represent immaterial phenomena, which he calls “closed-class lexical items” (Cohn, Language 24). Other divisions are easy to imagine. For example, although Eerden largely shares Forceville’s critical framework, he differentiates between “pictorial runes” (e.g., “steam lines” for anger) and “indexical signs” (e.g., red faces) (Eerden 245-6). Different research teams and individuals will work from different critical frameworks in developing the sort of annotated retrieval method described, and thus have diverging criteria for what is prioritized as a “comics sign.” While they may simply adopt one of the frameworks described above or work from an alternative critical approach, developers will nevertheless need to engage with the above positions in developing encoding criteria based on a critical conception of comics signs.

Additionally, the most obvious avenue for future work in integrating text-based and image-based retrieval methods is the technical development of an annotated retrieval model similar to that introduced in this article. While the proposed CBML element <cbml:sign> can be easily created through a modification of the CBML schema made freely available by John Walsh,[7] CBML’s technical integration with CBIR would require strategic, cross-disciplinary partnerships between scholars. Yet before such practical implementation may proceed, a more detailed discussion of the technical integration of CBML and CBIR needs to take place. The present article is, admittedly, conceptual and so perhaps vague regarding the technical specifics of integrating CBML with CBIR. This is in part because the article aims to advance an interdisciplinary discussion on the integration of text-based and image-based retrieval methods, rather than produce a prescriptive methodology. Both markup and CBIR in relation to comics have primarily been the domain of professionals working largely in isolation from one another. Given comics’ status as a hybrid medium, it seems fitting that any indexing and querying methods for comics collections might involve a hybridization of critical, theoretical, and technical research by scholars working in tandem across disciplines. Precisely what this would look like in practice remains to be seen.


[1] For an extended and seminal introduction to TEI, see Vanhoutte 2004.

[2] Walsh’s CBML has been built upon since its initial introduction in DHQ. For examples of these developments, see Bateman et al. 2017 and Murel 2020.

[3] Examples of digital projects that utilize TEI markup are Harlem Shadows and the Women Writers Project. The Map of Early Modern London offers an interesting example of incorporating XML markup with GIS technology to annotate a digitized scan of the 17th-century Civitas Londinum woodblock print. For an example of the sort of large-scale analysis markup enables, see Connell and Flanders 2020.

[4] In documentation on XML markup languages (of which TEI and CBML may be considered species types), element names are wrapped inside pointed brackets (< >) while attribute names are preceded by an “@” sign. The present article follows this formatting.

[5] In TEI markup, a URI (Uniform Resource Identifier) reference is a unique character string that references another element in the markup file. In this article’s following markup examples, for instance, an element would be assigned an @xml:id of “raven” or “poe” to which other elements would point via the URI references “#raven” or “#poe.”

[6] For the sake of brevity, this article will not go into all of the complications involved in implementing indexing thesauri. These complications are discussed in Cawkell 413; Chen and Rasmussen 294; Matusiak 284-7; Thomas 202-3; and Woll 22. Nevertheless, Neil Cohn’s aforementioned reservations about a catalogue of comics signs are not unwarranted. The number of different signs for signifying anger in comics is potentially limitless—new signs can be invented every day and numerous modifications made to extant signs (Miodrag 181). When one considers this, a comprehensive catalogue of comics signs seems impractical. 

[7] Available for download here:

Works Cited

Aggleton, Jen. “Defining Digital Comics: A British Library Perspective.” Journal of Graphic Novels and Comics, vol. 10, no. 4, 2019, pp. 393-409.

Aramaki, Y., et al. “Text Detection in Manga by Combining Connected-Component-Based and Region Classifications.” Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, 25-28 September 2016, pp. 2901-5.

Armitage, L. and P. Enser. “Analysis of User Need in Image Archives.” Journal of Information Science, vol. 23, no. 4, 1997, pp. 287-99.

Arnheim, Rudolf. “On the Nature of Photography.” Critical Inquiry, vol. 1, no. 1, 1974, pp. 149-61.

Arnold, Taylor and Lauren Tilton. “Distant Viewing: Analyzing Visual Corpora.” Digital Scholarship in the Humanities, March 2019, pp. 1-14. Accessible at:

Augereau, O., et al. “A Survey of Comics Research in Computer Science.” Journal of Imaging, vol. 4, no. 7, 2018, pp. 1-19. Accessible at:

Bateman, J., et al. “An Open Multilevel Classification Scheme for the Visual Layout of Comics and Graphic Novels: Motivation and Design.” Digital Scholarship in the Humanities, vol. 32, no. 3, 2017, pp. 476-510.

Cawkell, A.E. “Picture-Queries and Picture Databases.” Journal of Information Science, vol. 19, no. 6, 1993, pp. 409-23.

Chen, Hsin-Liang and Edie Rasmussen. “Intellectual Access to Images.” Library Trends, vol. 48, no. 2, 1999, pp. 291-302.

Chu, W.T. and W.C. Cheng. “Manga-Specific Features and Latent Style Model for Manga Style Analysis.” Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20-25 March 2016, pp. 1332-6.

Cohn, Neil. “A Visual Lexicon.” The Public Journal of Semiotics, vol. 1, no. 1, 2007, pp. 35-56.

Cohn, Neil. “The Limits of Time and Transitions: Challenges to Theories of Sequential Image Comprehension.” Studies in Comics, vol. 1, no. 1, 2010, pp. 127-46.

—. The Visual Language of Comics: Introduction to the Structure and Cognition of Sequential Images. Bloomsbury, 2013.

Connell, Sarah and Julia Flanders. “Writing, Reception, Intertextuality: Networking Women’s Writing.” Journal of Medieval and Early Modern Studies, vol. 50, no. 1, 2020, pp. 161-80.

Correia, J.M. and A.J. Gomes. “Balloon Extraction from Complex Comic Books Using Edge Detection and Histogram Scoring.” Multimedia Tools and Applications, vol. 75, no. 18, 2016, pp. 11367-90.

Davies, Paul Fisher. Comics as Communication: A Functional Approach. Springer, 2019.

Drucker, Johanna. SpecLab: Digital Aesthetics and Projects in Speculative Computing. University of Chicago Press, 2009.

Eerden, Bart. “Anger in Asterix: The Metaphorical Representation of Anger in Comics and Animated Films.” Multimodal Metaphor, edited by Charles Forceville and Eduardo Urios-Aparisi, De Gruyter, 2009, pp. 243-64.

Flanders, Julia. Digital Humanities and the Politics of Scholarly Work. 2005. Brown University, PhD dissertation. ProQuest Dissertations and Theses.

Forceville, Charles. “Pictorial Runes in Tintin and the Picaros.” Journal of Pragmatics, vol. 43, 2011, pp. 875-90.

—. “Visual Representations of the Idealized Cognitive Model of Anger in the Asterix Album La Zizanie.” Journal of Pragmatics, vol. 37, 2005, pp. 69-88.

Groensteen, Thierry. The System of Comics. Translated by Bart Beaty and Nick Nguyen. Jackson, University Press of Mississippi, 2007.

Hochberg, Julian. “The Representation of Things and People.” Art, Perception, and Reality, edited by E.H. Gombrich, et al., Johns Hopkins University Press, 1972, pp. 47-94.

Iwata, M., et al. “A Study to Achieve Manga Character Retrieval Method for Manga Images.” Proceedings of the 2013 27th Brazilian Symposium on Software Engineering (SBES 13), Tours, France, 7-10 April 2013, pp. 309-13, IEEE Computer Society: Washington, D.C., 2013.

Jörgensen, Corinne. “Attributes of Images in Describing Tasks.” Information Processing and Management, vol. 34, nos. 2/3, 1998, pp. 161-74.

Kraus, Kari. “Picture Criticism: Textual Studies and the Image.” The Cambridge Companion to Textual Scholarship, edited by Neil Fraistat and Julia Flanders, Cambridge UP, 2013, pp. 236-56.

Layne, Sara Shatford. “Some Issues in the Indexing of Images.” Journal of the American Society for Information Science, vol. 45, no. 8, 1994, pp. 583-8.

Liu, X., et al. “Text-Aware Balloon Extraction from Manga.” The Visual Computer: International Journal of Computer Graphics, vol. 32, no. 4, 2016, pp. 501-11.

Manovich, Lev. “How to Compare One Million Images.” Understanding Digital Humanities, edited by D. Berry, Palgrave, 2012, pp. 249-71.

Matusiak, Krystyna K. “Towards User-Centered Indexing in Digital Image Collections.” OCLC Systems and Services: International Digital Library Perspectives, vol. 22, no. 4, 2006, pp. 283-98.

McCloud, Scott. Making Comics: Storytelling Secrets of Comics, Manga, and Graphic Novels. William Morrow, 2006.

—. Understanding Comics. Kitchen Sink Press, 1993.

McGann, Jerome. “Dialogue and Interpretation at the Interface of Man and Machine: Reflections on Textuality and a Proposal for an Experiment in Machine Reading.” Computers and the Humanities, vol. 26, 2002, pp. 95-107.

—. Radiant Textuality: Literature After the World Wide Web. Palgrave Macmillan, 2001.

Miodrag, Hannah. Comics and Language: Reimagining Critical Discourse on the Form. University Press of Mississippi, 2013.

Mitchell, W.J.T. Iconology: Image, Text, Ideology. University of Chicago Press, 1987.

Murel, Jacob. “On the Use of XML Markup Language in Comics Criticism.” Digital Scholarship in the Humanities, 2020, pp. 1-19.

Nguyen, N.V., et al. “Digital Comics Image Indexing Based on Deep Learning.” Journal of Imaging, vol. 4, no. 7, 2018, pp. 1-34. Accessible at:

Pang, X., et al. “A Robust Panel Extraction Method for Manga.” Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, 3-7 November 2014, pp. 1125-8.

Postema, Barbara. Narrative Structure in Comics: Making Sense of Fragments. RIT Press, 2013.

Rigaud, Christophe and Jean-Christophe Burie. “Computer Vision Applied to Comic Book Images.” Empirical Comics Research: Digital, Multimodal, and Cognitive Methods, edited by Alexander Dunst, et al., Routledge, 2018, pp. 104-24.

TEI P5: Guidelines for Electronic Text Encoding and Interchange. TEI Consortium, 2010. Electronic edition accessible at:

Terras, Melissa. “Image Processing in the Digital Humanities.” Digital Humanities in Practice, edited by Claire Warwick, et al., London, Facet Publishing, 2012, pp. 71-90.

Thomas, Julia. “Getting the Picture: Word and Image in the Digital Archive.” European Journal of English Studies, vol. 11, no. 2, 2007, pp. 193-206.

Vanhoutte, Edward. “An Introduction to the TEI and the TEI Consortium.” Literary and Linguistic Computing, vol. 19, no. 1, 2004, pp. 9-16.

Walker, Mort. The Lexicon of Comicana. iUniverse, 1980.

Walsh, John. “Comic Book Markup Language: An Introduction and Rationale.” Digital Humanities Quarterly, vol. 6, no. 1, 2012. Accessible at:

Weiner, Robert G., editor. Graphic Novels and Comics in Libraries and Archives: Essays on Readers, Researchers, History and Cataloguing. McFarland, 2010.

Woll, Johanna. “User Access to Digital Image Collections of Cultural Heritage Materials: The Thesaurus as Pass-Key.” Art Documentation: Journal of the Art Libraries Society of North America, vol. 24, no. 2, 2005, pp. 19-28.

Posted in Volume 12, Issue 2