Jonathan Bordo is associate professor of cultural studies at Trent University, where he teaches aesthetic and cultural theory. His current project is a monograph entitled The Landscape without a Witness: An Essay in Modern Painting.
Einstein, 1933: "There are certain occupations, even in modern society, which entail living in isolation and do not require great physical or intellectual effort. Such occupations as the service of lighthouses and light-ships come to mind."1 Solitude, Einstein argued, would be perfect for the young scientist engaged with philosophical and mathematical problems. His own youth, we are tempted to speculate, might be thought of this way, the Bern patent office where he had earned a living seeming no more than a distant oceanic lightship. Consistent with this picture of otherworldliness, we have enshrined Einstein as the philosopher-scientist who, unmindful of the noise from his office work, rethought the foundations of his discipline and toppled the Newtonian absolutes of space and time.
Where did these images come from? Why were the people in the photos labelled this way? What sorts of politics are at work when pictures are paired with labels, and what are the implications when they are used to train technical systems?
But when we look at the training images widely used in computer-vision systems, we find a bedrock composed of shaky and skewed assumptions. For reasons that are rarely discussed within the field of computer vision, and despite all that institutions like MIT and companies like Google and Facebook have done, the project of interpreting images is a profoundly complex and relational endeavour. Images are remarkably slippery things, laden with multiple potential meanings, irresolvable questions, and contradictions. Entire subfields of philosophy, art history, and media theory are dedicated to teasing out all the nuances of the unstable relationship between images and meanings.
While WordNet attempts to organize the entire English language, ImageNet is restricted to nouns (the idea being that nouns are things that pictures can represent). In the ImageNet hierarchy, every concept is organised under one of nine top-level categories: plant, geologic formation, natural object, sport, artifact, fungus, person, animal and miscellaneous. Below these are layers of additional nested classes.
When a user uploads a picture, the application first runs a face detector to locate any faces. If it finds any, it sends them to the Caffe model for classification. The application then returns the original image with a bounding box around each detected face and the label the classifier has assigned to it. If no faces are detected, the application sends the entire scene to the Caffe model and returns the image with a label in the upper left corner.
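The control flow described above can be sketched in a few lines of Python. This is only an illustrative outline, not the application's actual code: the names `annotate`, `crop`, `Annotation`, `detect_faces` and `classify` are hypothetical, the image is a plain nested list standing in for real pixel data, and the detector and classifier are passed in as ordinary functions rather than as a real face detector or Caffe model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-ins: a box is (left, top, right, bottom) in pixels,
# and an image is a list of rows of pixel values.
Box = Tuple[int, int, int, int]
Image = Sequence[Sequence[int]]


@dataclass
class Annotation:
    label: str
    box: Optional[Box]  # None means the label applies to the whole scene


def crop(image: Image, box: Box) -> List[Sequence[int]]:
    """Cut the region inside the box out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def annotate(
    image: Image,
    detect_faces: Callable[[Image], List[Box]],
    classify: Callable[[Image], str],
) -> List[Annotation]:
    """Label each detected face, or the whole scene if no face is found."""
    boxes = detect_faces(image)
    if not boxes:
        # No faces: the entire scene goes to the classifier.
        return [Annotation(label=classify(image), box=None)]
    # One label per detected face, kept together with its bounding box.
    return [Annotation(label=classify(crop(image, box)), box=box) for box in boxes]
```

The sketch makes one point visible: the classifier always returns some label, whether it is handed a face or a whole scene, so every uploaded picture comes back with a category attached.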
The dataset itself continued the practice of collecting hundreds of thousands of images of unsuspecting people who had uploaded pictures to sites like Flickr. But the dataset contains a unique set of categories not previously seen in other face-image datasets. The IBM DiF team asks whether age, gender and skin colour are truly sufficient for generating a dataset that can ensure fairness and accuracy, and concludes that even more classifications are needed. So they move into truly strange territory, including facial symmetry and skull shapes to build a complete picture of the face. The researchers claim that the use of craniofacial features is justified because it captures much more granular information about a person's face than gender, age and skin colour alone. The paper accompanying the dataset specifically highlights prior work showing that skin colour is itself a weak predictor of race, but this raises the question of why moving to skull shapes is appropriate.
Constantine V. Nakassis is a linguistic anthropologist with interests in language in culture; semiotics; film theory; mass media; brands; and youth culture. His regional focus is Tamil Nadu, India. He organizes the annual Chicago Tamil Forum workshop and is Chair of the Committee on Southern Asian Studies (2020-2023).
In formal terms, visual sociologists like Grady have pleaded for visual essays with strong narrative structures (as in Harper's narrative approach mentioned before), whilst others have advocated a more experimental approach, arguing that its value resides in its capacity to produce experiences. The narrative approach has had a bigger influence in photography and documentary cinema because it expands the narrative conventions that frame the investigation and attempts to give life to the social factors implied in concrete lives. However, it has been criticized because it can jeopardize the scientific commitment to developing a theory from valid and representative data. The answer to this quandary comes from visual anthropology, which affirms that ethnographic film is a particular genre capable of framing a theory when it exposes the director's role and develops a filming strategy that includes social and cultural context. Documentary in particular is a rich field for research (for once, more so than photography), because it has developed numerous conventions for supplying material about the context (narration, voice-over, inclusion of additional material, outside sound, etc.). To its rise can be added the technological development of the field and lower production costs, which have made it more accessible.89