Valentine Bernasconi

Studying the collection is necessary for transparency about its content and for later analysis of the results. It provides a better knowledge of the material and makes it possible to tailor cleaning processes to the specificities of the collection. It is also part of an effort to contextualise the digital image.

The dataset

The corpus of study is an important aspect of the research. Although we are working with big data, the content of the corpus helps frame the scope of the study, both historically and materially.

Here we are working with the digitized photographic collection of the Bibliotheca Hertziana – Max Planck Institute for Art History. Photographic collections emerged with the development of photography during the second half of the 19th century and play an important role in academic research. Photography quickly replaced an illustrative tradition and offered art historians a convenient tool to transport, sort, classify and compare artworks. The digitization of most of these collections started in the 1990s and raised new archiving considerations about the relation between the photographic object and what it represents. Yet the fact that we are using digital images of photographic representations of artworks, sometimes in black and white, raises many questions.

Similar to photography, the digital image and its manipulative propensity raise concerns about objectivity and truth. The digital image corresponds to a mathematical form of the object it refers to, a series of transformations that represent it on screen, and it eludes its materiality. In our case, we ignore the photographic object, and the digital image is used as an eikon: the image of the painting.

Predicting iconographic chirograms

The categorization of Temenuzhka Dimova

Shaping classes of hand poses

The complexity of automated prediction of hands

A chirogram or pictorial chirogram is a term introduced by Temenuzhka Dimova in her work to refer to the iconographic configurations of the hand. It describes the hand gesture represented through graphic means.

A classifier is a supervised machine learning model trained to predict the class of a given object. To this end, the model is usually trained on a training dataset made of objects paired with their corresponding classes.
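As an illustration of this training setup, below is a minimal sketch of one of the simplest supervised classifiers, a nearest-centroid model. It is not the model used in the experiments described here (the text does not name one), and the function names, feature vectors and class labels are hypothetical.

```python
from math import dist


def fit_centroids(features, labels):
    """Average the training vectors of each class into one centroid per class."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Attribute the class whose centroid is closest (Euclidean distance) to the vector."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Hypothetical toy training set: 2D feature vectors with chirogram-like labels.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["pointing", "pointing", "praying", "praying"]
model = fit_centroids(train_x, train_y)
print(predict(model, [1.0, 1.0]))  # pointing
```

Training here is simply averaging labelled examples; prediction assigns the class of the nearest average, which makes the dependence on well-populated, well-separated classes easy to see.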

Defining categories of hand gestures is complex, yet fundamental for training a machine learning model for the automated estimation of hand gestures. Attempts by past art historians have shown the different perspectives that can be taken for the analysis of hand gestures. Hands can have multiple communicative functions and various origins that are sometimes difficult to trace. A preliminary sketch of the conceptual landscape of hands, which borrows ontologies proposed by the art historians Wittkower, Gombrich, Barasch and Chastel and relates them to the theories of the cognitivist McNeill, does not solve our categorisation issue.

Dimova, in her work Le langage des mains dans l'art, proposes one of the most comprehensive research projects on Western painted hand gestures from the 17th and 18th centuries. She created a lexicon of 30 illustrated chirograms and their possible meanings. Below are the first two pages representing these chirograms.

Because the collection of hands does not contain examples of all of Dimova's chirograms, and due to computational requirements, the initial 30 categories were reduced to 9. These 9 categories were manually populated with images from the collection.
Various experiments were then developed using different types of features. The first are geometric features, computed from the keypoints extracted with the OpenPose model. The second directly used the cropped images of the hands. With the latter, the aim was to see whether a model could learn new salient visual features during training.
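The text does not specify how the geometric features were derived from the keypoints. The sketch below shows one plausible normalisation, assuming 21 (x, y) keypoints per hand with the wrist at index 0 (the layout of the OpenPose hand model): the hand is centred on the wrist and scaled, so that the pose, rather than its position or size in the painting, drives the comparison. The function name and the normalisation itself are illustrative assumptions.

```python
from math import hypot

NUM_KEYPOINTS = 21  # OpenPose and MediaPipe hand models both use 21 points; index 0 is the wrist


def keypoint_features(keypoints):
    """Turn 21 (x, y) keypoints into a translation- and scale-invariant feature vector.

    Assumed normalisation (for illustration only): centre the hand on the wrist,
    then divide by the largest wrist-to-keypoint distance.
    """
    if len(keypoints) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} keypoints, got {len(keypoints)}")
    wx, wy = keypoints[0]
    centred = [(x - wx, y - wy) for x, y in keypoints]
    scale = max(hypot(x, y) for x, y in centred) or 1.0  # guard against a degenerate hand
    features = []
    for x, y in centred:
        features.extend((x / scale, y / scale))
    return features


# A hypothetical hand whose keypoints lie on a horizontal line.
hand = [(10.0, 10.0)] + [(10.0 + i, 10.0) for i in range(1, 21)]
moved_and_enlarged = [(2 * x + 5, 2 * y - 3) for x, y in hand]
print(keypoint_features(hand) == keypoint_features(moved_and_enlarged))  # True
```

The result is a 42-dimensional vector that stays the same when a hand is moved or enlarged, which is the property a pose comparison needs.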

Despite many efforts and different experiments undertaken for the classification of hands in early modern paintings, no convincing results were achieved. The classifier never reached accuracy levels that would allow us to claim the creation of a model for painted hand gesture recognition.

The reasons for this result are various:

Resource and computational limitation
Hands similarity
Keypoint features
Missing context

The interface was kept simple. You perform a five-second hand gesture in front of your webcam, and the video is recorded. Each frame is then analyzed in the background to retrieve similar hand poses from the collection of paintings. The results are displayed as a .gif animation that aims to reproduce the original gesture with painted hands.

Early modern painted hands come in all sorts of shapes and orientations.

Experiencing painted hands

How is it possible to access a specific hand gesture in a collection of more than five thousand painted hands? The Gesture for Artworks Browsing (GAB) application makes it possible to search for a painted hand pose using the researcher's own hand pose. The application offers a new perspective on using the researcher's body to question the content of paintings, and on the embodied knowledge this may involve.

A simple unsupervised k-nearest neighbors model (k-NN) was fitted on the collection of hands, using features extracted from the 21 keypoints detected by the OpenPose model. The MediaPipe framework was used for live hand detection from the webcam; for each frame, MediaPipe provides the same 21-keypoint annotation. The retrieval of similar painted hand poses is then performed with the fitted k-NN model, which returns the five closest painted hands in the feature space.
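The retrieval step can be sketched as a brute-force nearest-neighbour search. This is a dependency-free stand-in, not the project's code: the text does not say which library provided the k-NN model, and the function names and toy vectors are hypothetical. Fitting amounts to storing the feature vectors of the collection; each webcam frame is then one query returning the k closest painted hands (k = 5 in the application).

```python
import heapq
from math import dist


def fit_index(collection):
    """A brute-force k-NN 'fit' just stores (hand_id, feature_vector) pairs."""
    return list(collection.items())


def k_nearest(index, query, k=5):
    """Ids of the k painted hands closest to the query vector (Euclidean distance)."""
    closest = heapq.nsmallest(k, index, key=lambda item: dist(item[1], query))
    return [hand_id for hand_id, _ in closest]


def retrieve_per_frame(index, frame_features, k=5):
    """One query per recorded webcam frame, giving one list of painted hands per frame."""
    return [k_nearest(index, features, k) for features in frame_features]


# Hypothetical 2D features; real vectors would be derived from the 21 keypoints.
index = fit_index({"hand_a": [0.0, 0.0], "hand_b": [1.0, 0.0], "hand_c": [5.0, 5.0], "hand_d": [10.0, 10.0]})
print(retrieve_per_frame(index, [[0.2, 0.0], [9.0, 9.0]], k=2))
# [['hand_a', 'hand_b'], ['hand_d', 'hand_c']]
```

A brute-force search scans every stored hand for each query; for a collection of a few thousand hands this remains cheap enough for per-frame use, while tree-based indexes become worthwhile for much larger collections.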

“The understanding of art relies on experience” [Koering, referring to Leo Steinberg]
The new system invites us to think in paintings, extending the classic art historical approach of thinking about paintings [Elkins, 1999]: what is a body and which gestures does it allow, how does a painter perceive the body, and which personal and contextual influences shape its representation? Consequently, embodied knowledge is at play here. It implies a post-cognitivist conception of cognition, which holds that knowledge comes from a direct interaction between the body and the world, and that human intelligence resides at their intersection [Penny, 2017]. The body becomes a research tool that has the potential to offer a new understanding of painted hand gestures.

Creating new categories of hands

Clustering hands

Flipping left hands

Adding symbolic categories

A cluster is obtained from clustering methods that use specific features to group hands according to similarity criteria. The goal is to create new categories of hands based on postural similarity. A total of 40 clusters were defined with this method, which corresponds to 40 different hand poses or hand types.
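The text does not name the clustering method. As a hedged illustration, plain k-means (Lloyd's algorithm) groups feature vectors into k clusters by alternating an assignment step and an update step; with k = 40 it would produce 40 hand types. The naive initialisation and the toy data below are assumptions made for brevity.

```python
from math import dist


def kmeans(points, k, iterations=100):
    """Lloyd's k-means. Returns (labels, centroids).

    Illustration only: initialising with the first k points is deterministic but naive.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each hand joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2D features; the study defined 40 clusters on keypoint features.
labels, centroids = kmeans([[0, 0], [10, 10], [0, 1], [10, 11]], k=2)
print(labels)  # [0, 1, 0, 1]
```

In practice, a more robust initialisation such as k-means++ and several restarts would usually be preferred over taking the first k points.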

This is a projection of a selection of hands in a 2D space based on their keypoint features. The color surrounding the hands corresponds to their cluster defined by an unsupervised clustering method.

The two images below do not represent the same hand gesture. Yet they have the same orientation and finger positions. Because they share similar features, they are automatically clustered together. After flipping the left hands, they are assigned to different clusters, in closer accordance with their palm orientation.

Similarly, the second set of hands below represents the same gesture. Once the left hands are flipped, they are automatically clustered together.

As observed in the various experiments on the prediction of iconographic chirograms, computational approaches can cover a larger number of hands than traditional methods in art history. Here we explore the potential of computational methods to take into account all the hands in the collection and create new categories. To this end, we:

Apply clustering methods on the collection
Flip the left hands
Add symbolic categories

As we can see in the graph below, the distribution of left and right hands across the clusters is uneven: the two hand types are not equally represented within every cluster. After a visual inspection of the clusters, we determined that, although left and right hands with a similar shape and orientation are found in the same cluster, they do not represent the same gestural patterns. Indeed, the orientation of the hands plays an important role in the definition of the clusters, but while a right hand on the chest presents visual features similar to those of a left speaking hand, their meanings differ significantly, and they should not belong to the same cluster if the gestural pattern is taken into account.
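The imbalance described above can be quantified per cluster as the share of left hands among its members. This is a minimal sketch with hypothetical cluster labels and "left"/"right" tags: a share close to 0 or 1 indicates a cluster dominated by one hand type, while a share near 0.5 indicates a balanced cluster.

```python
from collections import Counter, defaultdict


def left_hand_share(cluster_labels, hand_types):
    """Fraction of left hands in each cluster (1.0 = only left hands, 0.0 = only right)."""
    counts = defaultdict(Counter)
    for cluster, hand_type in zip(cluster_labels, hand_types):
        counts[cluster][hand_type] += 1
    return {cluster: c["left"] / sum(c.values()) for cluster, c in sorted(counts.items())}


# Hypothetical labels for five hands spread over two clusters.
print(left_hand_share([0, 0, 1, 1, 1], ["left", "left", "right", "right", "left"]))
```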
To create a proper clustering according to gestural patterns, left and right hands were split, and the original keypoints of the left hands were horizontally flipped. The features were then recomputed for the left hands and merged with the unchanged right-hand keypoint features. The new distribution with flipped hands shows much more balanced clusters for each hand type, and their visual examination confirmed a gestural coherence within the clusters. We therefore distinguish gestural patterns, obtained by flipping the left hands, from geometric patterns. By geometric patterns we mean the general orientation and shape of the hands independently of the hand type, which reflect the visual effect these hands produce in the composition.
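Horizontally flipping a left hand amounts to mirroring the x coordinate of each keypoint. In the sketch below, the mirror axis is the vertical line through the wrist (index 0), which is an assumption: the text only says the keypoints were flipped, and for wrist-centred features the choice of vertical axis does not change the result. The (hand_type, keypoints) input format is hypothetical; after flipping, the features would be recomputed for the left hands.

```python
def flip_horizontally(keypoints):
    """Mirror (x, y) keypoints across the vertical line through the wrist (index 0)."""
    wx, _ = keypoints[0]
    return [(2 * wx - x, y) for x, y in keypoints]


def flip_left_hands(hands):
    """Mirror left hands so that all hands are compared as right hands.

    `hands` is a list of (hand_type, keypoints) pairs; right hands are returned unchanged.
    """
    return [
        (hand_type, flip_horizontally(kps) if hand_type == "left" else kps)
        for hand_type, kps in hands
    ]


# Hypothetical three-point hands (real ones have 21 keypoints).
hands = [("left", [(5, 1), (7, 2), (4, 3)]), ("right", [(5, 1), (7, 2), (4, 3)])]
print(flip_left_hands(hands))
# [('left', [(5, 1), (3, 2), (6, 3)]), ('right', [(5, 1), (7, 2), (4, 3)])]
```

Flipping twice returns the original keypoints, so the operation only changes how a left hand is compared, not its underlying pose.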

Using unsupervised clustering methods allows us to shape new categories of hand poses. Yet the categories previously defined by Dimova are an essential foundation for the analysis of early modern European painted hands. Integrating these categories into the machine clustering process shows the important relation between art historical theories and computational methods. Through close collaboration, we enhance both approaches and foster a wider understanding of hand poses. It also shows that the computational process can be fine-tuned with new or existing knowledge about hands.

Understanding gestural patterns as a network

Combining hands

Working with clusters

Iconographic attribution

Creating a network

Using the network

Computational methods allow us to consider the whole dataset of hands extracted from the collection of paintings. We were able to cluster them, including the categories defined by the art historian Temenuzhka Dimova. These clusters represent different hand poses used by the painter as modes of visual communication, serving either the composition or the narrative.

To understand these gestural patterns as a network, we:

Group the hands according to the similarity of their keypoint features
Combine them based on the iconographic context
Define new visualizations to understand the relation between combinations of hands and specific narratives

The whole process can be divided into different steps, each revealing new concerns, both methodological and regarding our understanding of early modern painted hands.

Based on these hand combinations, we can take some distance from the detail and gain a more global understanding of the gestural patterns at play. Each hand was assigned to a specific cluster, either one from previous work in art history or one generated through computational methods.

The process allows us to study, among other things, the most common hand combinations. As we can see in the image above, the second most common combination reflects the great majority of religious representations found in the collection of the Bibliotheca Hertziana. The dominance of Joint palms praying also reflects its widespread use for religious expression. It is precisely this type of observation, made at the level of pairs of hands, that reinforces the idea of a correlation between the type of hand and the iconography of the painting.

The iconographic attribution is based on the ICONCLASS system. Each painting is assigned an iconography from the system based on keywords extracted from its title.

The queries to the system to find corresponding iconographies are automated with a script.

For example, the painting Madonna with Child corresponds to the iconography 11F The Virgin Mary.
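The attribution can be sketched as a keyword lookup on the painting title. This is only an illustration: the actual pipeline sent requests to the ICONCLASS system through a script that is not reproduced here, so a local, hypothetical keyword table stands in for those requests. Only the mapping of Madonna to 11F The Virgin Mary comes from the example above.

```python
import re

# Hypothetical keyword table. Only the "Madonna" -> 11F example comes from the text;
# the real pipeline queried the ICONCLASS system instead of a local table.
KEYWORD_TO_ICONCLASS = {
    "madonna": ("11F", "The Virgin Mary"),
    "virgin": ("11F", "The Virgin Mary"),
}


def attribute_iconography(title, table=KEYWORD_TO_ICONCLASS):
    """Return the (notation, label) of the first title keyword found in the table, else None."""
    for word in re.findall(r"\w+", title.lower()):
        if word in table:
            return table[word]
    return None


print(attribute_iconography("Madonna with Child"))  # ('11F', 'The Virgin Mary')
```

Paintings whose titles contain no known keyword return None and would need manual attribution or a broader query.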

A web interface was then created. It displays the networks for each iconography found in the collection of paintings. The goal of the network visualisation is to let the user select an iconography and explore which hand poses are most commonly used, and in which combinations.

The hand types are represented by vertices. Each vertex holds an image of the most central hand pose in its cluster. Each edge connects a pair of clusters, and its width corresponds to the number of times that pair was found for the iconography. The color of the edge refers to the iconography and helps distinguish iconographies when multiple networks are displayed at the same time.
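The network data can be sketched in two parts, assuming each painting's hand pairs have already been turned into (iconography, (cluster_a, cluster_b)) records. The most central hand of a cluster is taken here to be its medoid in the keypoint feature space, which is an assumption since the text does not define "most central"; edge weights are counts of each cluster pair per iconography. Function names and data are hypothetical.

```python
from collections import Counter, defaultdict
from math import dist


def most_central_hand(vectors):
    """Index of the medoid: the member with the smallest total distance to all others."""
    return min(range(len(vectors)), key=lambda i: sum(dist(vectors[i], v) for v in vectors))


def edge_weights(records):
    """Aggregate (iconography, (cluster_a, cluster_b)) records into weighted edges.

    Edges are undirected, so (a, b) and (b, a) are counted as the same edge.
    """
    networks = defaultdict(Counter)
    for iconography, (a, b) in records:
        networks[iconography][tuple(sorted((a, b)))] += 1
    return networks


# Hypothetical data: cluster ids are integers, "11F" is the Virgin Mary iconography.
records = [("11F", (3, 7)), ("11F", (7, 3)), ("11F", (3, 3))]
print(dict(edge_weights(records)["11F"]))  # {(3, 7): 2, (3, 3): 1}
print(most_central_hand([[0, 0], [1, 0], [10, 0]]))  # 1
```

A medoid is always an actual member of the cluster, which suits the interface: the vertex can display a real painted hand rather than an averaged pose that exists in no painting.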

Creating combinations of hands means that each pair of hands found in the same painting is considered a combination. Because these hands were assigned to specific clusters, we assume that a specific combination of clusters can be determined for each painting.
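Generating the combinations and counting the most frequent ones across the collection can be sketched with the Python standard library. Each painting is represented by the list of cluster ids of its hands (a hypothetical input format); every unordered pair becomes a combination, sorted so that (a, b) and (b, a) are counted together.

```python
from collections import Counter
from itertools import combinations


def painting_combinations(cluster_ids):
    """Every unordered pair of hands in one painting, as a sorted pair of cluster ids.

    A painting with n hands yields n * (n - 1) / 2 combinations.
    """
    return [tuple(sorted(pair)) for pair in combinations(cluster_ids, 2)]


def most_common_combinations(paintings, n=3):
    """Count cluster pairs over all paintings and return the n most frequent ones."""
    counts = Counter()
    for cluster_ids in paintings:
        counts.update(painting_combinations(cluster_ids))
    return counts.most_common(n)


# Hypothetical collection: each inner list holds the cluster ids of one painting's hands.
paintings = [[1, 2], [2, 1], [3, 3]]
print(most_common_combinations(paintings, n=1))  # [((1, 2), 2)]
```

Because the number of pairs grows quadratically with the number of hands, paintings crowded with figures contribute many more combinations than two-hand portraits, which may be worth normalising for when comparing iconographies.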