To enhance the quality of the detection of the bodies on paintings, a proper training dataset has to be created. The Pose Annotation Project for Artworks (PAPA) is a platform where people can annotate images from the photographic collection of the Bibliotheca Hertziana in Rome.
More than five thousand hands were extracted from a selection of six thousand paintings from the photographic collection of the Bibliotheca Hertziana in Rome. Have an overview of the hands collection!
The hand can point, touch, grasp, sign and gesticulate, and has been the source of many symbolic codifications which, to a certain extent, have prevailed over time. Yet traditional approaches in art history tend to focus on a small portion of depicted hands, mostly symbolic ones.
The work proposed here intends to use computational methods to have a more global view of painted hands in early modern European paintings—a study of the detail at large scale. To this end, the digitized photographic collection of the Bibliotheca Hertziana—Max-Planck Institute for Art History was used.
A total of three experiments were developed. They all rely on a primary step of detection and extraction of the hands with the help of a machine learning model for body pose estimation.
The first experiment tests whether the machine can be trained to automatically recognize specific hand poses, based on an existing classification proposed by Dimova. The goal of this process is to reveal the propensity of the hand to communicate with the viewer and the existence of a form of vocabulary.
The second experiment addresses the concept of embodied knowledge by exploring the whole corpus of hands through the gestures of the researcher.
The last experiment is an attempt to recontextualise the hands with their corresponding iconography. A new type of network visualisation is also created to better analyze how types of hands combine in specific contexts.
The corpus of study is an important aspect of the research. Although we are working with big data, the content of the corpus helps frame the scope of the study both historically and materially.
Here we are working with the digitized photographic collection of the Hertziana – Max-Planck Institute for Art History. Photographic collections were created with the development of photography during the second half of the 19th century and play an important role in academic research. Photography quickly replaced an illustrative tradition and gave art historians a useful tool to easily transport, sort, classify and compare artworks. Most digitization of these collections started in the 1990s and brought new archiving considerations concerning the relation between the photographic object and what it represents. Yet the fact that we are using digital images of photographic representations of artworks, sometimes in black and white, raises many questions.
Like photography, the digital image and its propensity for manipulation raise concerns about objectivity and truth. The digital image corresponds to a mathematical form of the object it refers to (a series of transformations to represent it on screen) and eludes its materiality. In our case, we ignore the photographic object and the digital image is used as an eikon: the image of the painting.
A pre-trained machine learning model for body pose estimation called OpenPose was used. First, the hands are detected and then automatically cropped based on the keypoint coordinates provided by the model. Click on the image to reveal the results!
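The cropping step can be sketched as follows. This is a minimal illustration and not the project's actual code: it assumes each hand is given as a list of (x, y, confidence) keypoints, as OpenPose reports them, and the function name, the confidence threshold and the margin are hypothetical choices.

```python
from typing import List, Optional, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)


def hand_crop_box(
    keypoints: List[Keypoint],
    image_size: Tuple[int, int],
    min_confidence: float = 0.1,  # assumed threshold, not taken from the project
    margin: float = 0.2,          # assumed padding, as a ratio of the hand extent
) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) around the confident keypoints.

    Returns None when no keypoint reaches the confidence threshold,
    i.e. when the hand is considered undetected.
    """
    points = [(x, y) for x, y, c in keypoints if c >= min_confidence]
    if not points:
        return None

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = image_size

    # Pad the tight bounding box so that fingertips are not cut off.
    pad = margin * max(max(xs) - min(xs), max(ys) - min(ys))

    left = max(0, int(min(xs) - pad))
    top = max(0, int(min(ys) - pad))
    right = min(width, int(max(xs) + pad) + 1)
    bottom = min(height, int(max(ys) + pad) + 1)
    return left, top, right, bottom
```

The resulting box follows the (left, upper, right, lower) convention, so it could be passed to an image library's crop function, for example Pillow's `Image.crop`.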
The detection process is not perfect. Here only two hands out of three were found. This is because the model was trained not on artworks but on photographic images representing our contemporary Western life. Computational approaches are not synonymous with objectivity, as the results are influenced by these original training datasets and by a specific computational representation of the hand through keypoints. There is a need to enhance these models to increase the accuracy of the results, and to acknowledge the many missing hands and the constraints of the computational process.
Defining categories of hand gestures is rather complex, yet fundamental for the training of a machine learning model for the automated estimation of hand gestures. Attempts from past art historians have shown the different perspectives that can be taken for the analysis of hand gestures. Hands can have multiple communicative functions and various origins that are sometimes difficult to trace. The primary sketching of the conceptual landscape of hands—which borrows ontologies proposed by the art historians Wittkower, Gombrich, Barasch and Chastel, in relation to the theories of the cognitivist McNeill—does not solve our categorisation issue.
Dimova, in her work Le langage des mains dans l'art, proposes one of the most comprehensive research projects on western painted hand gestures from the 17th and 18th centuries. She created a lexicon representing a selection of 30 illustrated chirograms and their possible meanings. Below are the first two pages representing these chirograms.
Because the collection of hands does not hold all of Dimova's chirograms, and due to computational requirements, the initial 30 categories were reduced to 9. These 9 categories were manually populated with images from the collection. Various experiments were then developed using different types of features. The first are geometric features, computed from the keypoints extracted with the OpenPose model. The second approach used the cropped images of the hands directly. With the latter, the aim was to see whether a model could determine new salient visual features during training.
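To give an idea of what a geometric feature can look like, the sketch below turns the keypoints of one hand into a vector that does not depend on where the hand sits in the painting or on how large it is. The exact features used in the experiments are not described here, so this is only one plausible choice. It assumes that the wrist is the first keypoint, as in the OpenPose hand model, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalized_keypoint_features(points: List[Point], wrist_index: int = 0) -> List[float]:
    """Flatten hand keypoints into a position- and scale-independent vector."""
    wrist_x, wrist_y = points[wrist_index]

    # Translate so that the wrist becomes the origin.
    centered = [(x - wrist_x, y - wrist_y) for x, y in points]

    # Scale by the distance from the wrist to the farthest keypoint.
    scale = max(math.hypot(x, y) for x, y in centered)
    if scale == 0:
        raise ValueError("all keypoints coincide with the wrist")

    return [value / scale for point in centered for value in point]
```

Two hands with the same pose but a different size or location in the image yield the same vector, which is what a classifier trained on keypoints would need.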
Despite many efforts and different experiments undertaken for the classification of hands in early modern paintings, no convincing results were achieved. The classifier never reached accuracy levels that would allow us to claim the creation of a model for painted hand gesture recognition.
The reasons for this result are various:
How is it possible to access a specific hand gesture in a collection of more than five thousand painted hands? The Gesture for Artworks Browsing (GAB) application makes it possible to search for a painted hand pose using the researcher's own hand. The application offers a new perspective on using the body of the researcher to question the content of paintings, and on the embodied knowledge this may involve.
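How GAB matches the researcher's hand to the painted hands is not detailed here. One simple way to picture such a search, given purely as an assumption, is a nearest-neighbour lookup: the pose of the researcher's hand and every painted hand are described by feature vectors of equal length, and the painted hands closest to the query are returned. The function and identifier names below are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def most_similar_hands(
    query: Sequence[float],
    collection: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k hands whose features are closest to the query.

    Closeness is measured with the Euclidean distance; the result is a
    list of (hand identifier, distance) pairs, closest first.
    """
    ranked = sorted(
        (math.dist(query, features), hand_id)
        for hand_id, features in collection.items()
    )
    return [(hand_id, distance) for distance, hand_id in ranked[:k]]
```

A linear scan like this is sufficient for a few thousand hands; a larger collection would call for an indexing structure.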
The two images below do not represent the same hand gesture. Yet they have the same orientation and finger positions. Because they share similar features, they are automatically clustered together. After the left hands are flipped, they are assigned to different clusters that better match their palm orientation.
Similarly, the second set of hands below represents the same gesture. Once the left hands are flipped, they are automatically clustered together.
As observed in the various experiments on the prediction of iconographic chirograms, computational approaches have the potential to cover a larger number of hands than traditional methods in art history. Here we explore the potential of computational methods to take into consideration all the hands from the collection and create new categories. To this end we:
As we can see in the graph below, the distribution of left and right hands across the clusters shows that they are separable: hand types are not equally represented within every cluster. After a visual inspection of the clusters, we determined that, although left and right hands with a similar shape and orientation are found in the same cluster, they do not represent the same gestural patterns. The orientation of the hands plays an important role in the definition of the clusters, but while a right hand on the chest presents visual features similar to those of a left speaking hand, their meanings differ significantly, and they should not belong to the same cluster if the gestural pattern is taken into account.
To create a proper clustering according to gestural patterns, left and right hands were split, and the original keypoints of the left hands were horizontally flipped. The features were then computed again for the left hands and merged with the untouched right-hand keypoint features. If you click on the graph, you can reveal the new distribution with flipped hands. It shows much more balanced clusters for each hand type, and their visual examination confirmed a gestural coherence within the clusters. We therefore differentiate gestural patterns, obtained by flipping the left hands, from geometric patterns. By geometric patterns we mean the general orientation and shape of the hands independently of the hand type, which reflect the visual effect these hands produce in the composition.
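The flipping of left hands described above can be sketched as follows. This is an illustrative reading of the step, not the original code: it assumes each hand comes with a "left" or "right" label and a list of (x, y) keypoints, and it mirrors the keypoints about the vertical axis through the middle of the hand. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def flip_horizontally(points: List[Point]) -> List[Point]:
    """Mirror keypoints about the vertical axis through the hand's centre."""
    xs = [x for x, _ in points]
    axis_sum = min(xs) + max(xs)  # twice the x position of the mirror axis
    return [(axis_sum - x, y) for x, y in points]


def harmonize_hand_types(hands: List[Tuple[str, List[Point]]]) -> List[List[Point]]:
    """Flip left hands and leave right hands untouched.

    After this step every hand can be described as if it were a right
    hand, so the features computed from it reflect the gestural pattern
    rather than the hand type.
    """
    return [
        flip_horizontally(points) if side == "left" else list(points)
        for side, points in hands
    ]
```

The features are then recomputed on the harmonized keypoints before clustering, so that a left and a right hand performing the same gesture end up close to each other.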
Using unsupervised clustering methods allows us to shape new categories of hand poses. Yet the categories previously shaped by Dimova are an essential foundation for the analysis of early modern European painted hands. Integrating these categories into the clustering process of the machine shows the important relation between art historical theories and computational methods. Through close collaboration, we enhance both approaches and foster a wider understanding of hand poses. It also shows the possibility of fine-tuning the computational process with new or existing knowledge on hands.
Computational methods allow us to consider the whole dataset of hands extracted from the collection of paintings. We were able to cluster them, including categories defined by the art historian Temenushka Dimova. These different clusters represent different hand poses that are used as modes of visual communication by the painter, either for the benefit of the composition or the narrative.
The whole process can be divided into different steps, each revealing new concerns, both methodological and regarding our understanding of early modern painted hands.
Based on these hand combinations, we can take some distance from the detail and gain a more global understanding of the gestural patterns in play. Each hand was assigned to a specific cluster, either one from previous works in art history or one generated through computational methods.
The process allows us to study, among other things, the most common hand combinations. As we can see in the image above, the second most common combination reflects the great majority of religious representations found in the collection of the Bibliotheca Hertziana. The dominance of Joint palms praying also reflects its manifest use for religious expression. It is typically this type of observation, made at the level of the pair of hands, that reinforces the idea of a correlation between the type of hand and the iconography of the painting.
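Counting hand combinations can be sketched in a few lines. The example assumes that each figure is reduced to a pair of cluster labels, one per hand, and that the order of the two hands does not matter. Apart from "Joint palms praying", the labels used in the usage note are invented for illustration, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_common_combinations(
    pairs: Iterable[Tuple[str, str]],
    n: int = 3,
) -> List[Tuple[Tuple[str, ...], int]]:
    """Count unordered pairs of hand clusters and return the n most frequent."""
    # Sorting each pair makes ("A", "B") and ("B", "A") the same combination.
    counts = Counter(tuple(sorted(pair)) for pair in pairs)
    return counts.most_common(n)
```

For instance, `most_common_combinations([("Joint palms praying", "Joint palms praying"), ("Pointing", "Open palm"), ("Open palm", "Pointing")], n=1)` would report the pointing and open-palm pair with a count of two.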
The iconographic attribution is based on the ICONCLASS system. Each painting is attributed an iconography from the system based on keywords extracted from the title. The requests to the system to find corresponding iconographies are performed in an automated way with the help of a script. For example, the painting Madonna with Child corresponds to the iconography 11F The Virgin Mary.
“Like an artist, an art historian has a style” [Carrier, 1989]
Is there a proper style of doing art history?
A proper style of addressing hands in art?
Could the computational realm define its own style for the apprehension of the historical artifact?
There is no right answer to these questions. I would argue that there are as many styles as there are perspectives on the object of study—and with each style comes a new way of interpretation.
When it comes to the research style of the machine, several points have to be outlined:
The major difference brought by the computational perspective is the analysis of the detail at a large scale. Computer vision methods reframe images to reveal the detail and to study it in confrontation to thousands of similar details.
This new research paradigm induces different points of reflection:
In order to be studied and processed by the machine, i.e. compared through grouping and classification methods, salient features have to be defined. The aim is to simplify the information into measurable characteristics. Here it corresponds to properties that best represent the pose:
The original painting is an Allegory of Faith by Alessandro Moretto from 1540. The origin of the commission is not well established, and the work was most probably painted between Verona and Brescia. Moretto's painting diverges in several respects from contemporary representations of Faith. Its novelty lies in the isolated figure and the text written on the strip: “Justus ex fide vivit” (“The righteous by faith shall live”).
The hand is pointing to the strip. The verse, repeated by Saint Paul, was commented on by Martin Luther (1483-1546) at the time of the Reformation.
The painting potentially hides a message about the religious context of the 16th century, with a direct reference to the Catholic principles called into question by Luther.
Alessandro Moretto, Allegory of Faith, 1540, Oil on canvas, 102 x 78 cm, Saint Petersburg, The State Hermitage Museum
Searching for similar hand gestures with the GAB application, we find another intriguing pose, mostly found within the iconography of Virgins, such as Mary or Saint Catherine.
Jacopo Palma, A blonde woman, around 1520. Oil on wood, 77.5 x 64.1 cm, London, The National Gallery.
Called the inverted V, the hand gesture can carry multiple meanings.
In the eyes of Leo Steinberg, the hand is part of a system of representation surrounding the body of the young Jesus to express his humanity and the peculiar bond between the mother and her child [Steinberg, 1996].
According to Mauro Zanchi, the reversed V is an apotropaic sign that recalls the horn of Pan or the crescent moon of Diana. When performed by young women, it refers to the virginal status of the character, a sign that is not uncommon in the Veneto of the first quarter of the 16th century. When performed by mothers, the sign represents protection. In religious iconography, the sign would therefore prefigure the tragic ending of Jesus on the cross [Zanchi, 2017].
Mostly found at the time of Catholic Reformation, the inverted V could also refer to the holy Trinity and the fundamental role of the Virgin in the salvation of humankind. The gesture reaffirms the important position of Mary in the Holy story and portrays divine love.
Perino del Vaga, Madonna with Child, 1535-1540, Oil on canvas, 121 x 91 cm, Florence, Gli Uffizi. Detail of the hand.
With the help of the network graph, we can visualize the types of hand poses most commonly used in the iconography of Mary. As with other religious representations, praying gestures are central. Other central clusters are the ones representing horizontal flat hands. Most of these hands are not symbolic but caring gestures, in interaction with the body of the child, a form of maternal symbolism. This symbolism is confirmed and further expanded by Steinberg, for whom these gestures reveal the incarnation and manhood of Christ [Steinberg, 1996].
These peculiar hands either shape the lexicon of a form of universal motherhood, or the one of the descent of God into manhood, or the one of the Divine love of Mary.
Although the computational tools do not allow us to settle the meaning of the inverted V, they allow us to pay closer attention to unseen hands; to accelerate the research process; to define new associations; and to augment the original lexicon provided by Dimova.
Furthermore, these tools work hand in hand with a proper historical investigation and a work of recontextualization.