
Sonifying Computer Vision for Cyclist Awareness of Oncoming Vehicles

a prototype concept

I research the development of new forms of multimodal (audio-visual) display systems, which serve a range of purposes from creative expression to general communication platforms. These systems embody new ways of integrating sound and image, in particular by developing new technologies for spatializing and articulating sound.

I am particularly interested in combining Computer Vision and sonification technologies, in which a computational analysis of the local environment can communicate important out-of-sight information to a person via sonic cues. The same technology can be used to produce new media artworks, new forms of wearable safety equipment, and new kinds of large-scale interactive audio-visual display environments.

Top 20 HCI books of all time (#11)

A unique aspect of my research agenda is its integration of humanistic methodologies into an expanded design context. This is consistent with what has been theorized as “third wave” human-computer interaction: HCI that takes the full scope of human and cultural experience into the purview of its design considerations.

I have found phenomenology, semiotics, and hermeneutics to be invaluable intellectual resources for conceptualizing my research, rich in concepts that can be applied to these new technologies. These humanistic disciplines are especially well suited to integrating the role of imagination and the interpretation of experience into both the production and reception of new media technologies.

One of the features of new media art, which I have conceptualized as the production of ‘content systems,’ is that such works can also serve as general platforms for kinds of media other than those specifically designed for the artwork. I have schematized this notion as follows:

Content Production: the creation of media for pre-existing technologies, such as film and video, or gaming platforms.

Content Systems: the creation of new technology platforms for the purpose of communicating new aesthetic content.

Platforms: technology systems that can “host” a wider range of media.

New media art can be understood as the general production of content systems, or new forms of technology tied to specific aesthetic experiences. But because the content of such systems is typically audio-visual in nature, it can often be ‘swapped out’ for other audio-visual content. Thus, new media art has as one of its aspects the production of multi-purpose platforms that could conceivably host other types of content.

There is a precedent for this kind of fluid space of practice, which for my own research focus I have termed ‘neo-Davincian’: in the Renaissance, the same new techniques and technologies could simultaneously produce artworks, new kinds of representations, and new kinds of practical equipment. One of my theses is that computational media and design, particularly in the area known as ‘creative coding’ (coding in art and design application areas), easily partakes of this fluid practical space between creativity, representation, and practical design implementation, since the same code, tools, and platforms contribute equally to aesthetic, designed, and modeled artifacts.

transcending contemporary art and design binaries with Renaissance thinking

The prototype I discuss below is organized under the heading of “CV/Son,” which stands for Computer Vision and Sonification. These two disciplines are usually distinct; my research aims to bring them together in order to develop a wide range of new designs and products from a shared base of technology and software that integrates the two domains.

Throughout my investigations, methods and ideas from humanistic inquiry remain central. For example, in Technology and the Lifeworld, philosopher of technology Don Ihde identifies three kinds of relationship in the assimilation of new technologies by the self: Embodied (the bodily appropriation of an artifact), Hermeneutic (the use of an artifact to read the world) and Alterity (the artifact as introducing a layer of difference or strangeness to the experience of the world).

Don Ihde’s technical modalities

This is but one example of the several philosophical frameworks that I have found to be usefully applicable to the invention of new forms of multimodal display. The creation of robust and meaningful bridges between humanistic inquiry and the design of new technologies is central to my project.

Skull Buzzer

Skull Buzzer combines a rear-mounted camera with bone conduction headphone technology built into a bike helmet to communicate oncoming traffic conditions and other surrounding visual information to the cyclist via sonification of recognized visual objects. The current prototype is envisioned on a Raspberry Pi, using OpenCV in Python together with SoniPy.
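As a sketch of the envisioned pipeline, the loop below pairs OpenCV capture and detection with a tone-mapping step standing in for the SoniPy output stage. The detector, camera index, and pitch range are illustrative assumptions rather than details of a built prototype, and the hardware-dependent loop is left uncalled so the pure mapping function can be exercised anywhere:

```python
# Sketch of the Skull Buzzer main loop. Assumptions: a rear-facing camera
# readable via OpenCV, and a pitch-mapping stand-in for the SoniPy output
# stage. The alert logic is a pure function so it can be tested off-device.

def alert_pitch_hz(bbox_height_px: float, frame_height_px: float,
                   base_hz: float = 220.0, max_hz: float = 880.0) -> float:
    """Map apparent object size (a rough proximity proxy) to a warning pitch:
    the larger the detected vehicle appears, the higher (more urgent) the tone."""
    closeness = min(max(bbox_height_px / frame_height_px, 0.0), 1.0)
    return base_hz + closeness * (max_hz - base_hz)

def run_on_device():
    # Hardware-dependent loop; requires OpenCV and a camera. OpenCV's stock
    # HOG person detector is used purely as a placeholder for a trained
    # rear-view vehicle detector.
    import cv2
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        boxes, _ = hog.detectMultiScale(frame)
        for (x, y, w, h) in boxes:
            hz = alert_pitch_hz(h, frame.shape[0])
            print(f"object at ({x},{y}) -> warn at {hz:.0f} Hz")  # SoniPy would render this

# run_on_device()  # uncomment on the Pi with a camera attached

print(round(alert_pitch_hz(240, 480)))  # 550
```

On the actual device the placeholder person detector would be replaced by a model trained on rear-approaching vehicles, and the print call by a SoniPy synthesis voice.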

Computer Vision

Computer vision is a multidimensional field of study that allows systems to understand and interpret visual data. It involves the collection, processing, and analysis of digital images to recognize patterns, assess situations, and make informed decisions. The technology can be likened to the human ability to perceive and interpret the visual world; however, it goes a step further by not just seeing, but understanding and making sense of what it sees.

In the context of road safety, computer vision plays a crucial role in recognizing objects such as vehicles, cyclists, and pedestrians. Cameras or stereo sensors continuously capture images of the environment, building up a picture the system can interpret. Semantic segmentation and object detection are used to recognize objects and assign them specific features, and deep learning models then determine how the system should respond to the detected objects.

The application of computer vision is also extended to detecting and tracking cyclists. A vision-based framework uses a mixture model of multiple viewpoints trained by Support Vector Machines (SVM) and an extended Kalman filter (EKF) to estimate the position and velocity of the bicycle. This can be particularly challenging due to the non-rigidity of the bicycle and the changing appearance from different viewpoints.
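The estimation step can be illustrated with a much-simplified stand-in for the EKF: a one-dimensional, constant-velocity Kalman filter that recovers a tracked vehicle's range and closing speed from a series of range readings. The cited framework filters a richer, nonlinear bicycle state; this sketch, with assumed noise settings, only shows the predict/update structure:

```python
def kalman_track(measurements, dt=0.1, meas_var=1.0, accel_var=0.5):
    """Estimate (range_m, closing_speed_mps) from noisy range readings
    using a linear constant-velocity Kalman filter (illustrative settings)."""
    x, v = measurements[0], 0.0            # state: range and closing speed
    p = [[10.0, 0.0], [0.0, 10.0]]         # state covariance (broad prior)
    for z in measurements[1:]:
        # predict with a constant-velocity motion model
        x += v * dt
        p00 = p[0][0] + dt * (p[0][1] + p[1][0]) + dt * dt * p[1][1] + accel_var
        p01 = p[0][1] + dt * p[1][1]
        p10 = p[1][0] + dt * p[1][1]
        p11 = p[1][1] + accel_var
        # update: fuse the new range measurement
        s = p00 + meas_var                 # innovation covariance
        k0, k1 = p00 / s, p10 / s          # Kalman gain
        resid = z - x
        x += k0 * resid
        v += k1 * resid
        p = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return x, v

# a car closing at a steady 5 m/s, sampled every 0.1 s from 50 m out
rng, spd = kalman_track([50 - 0.5 * i for i in range(60)])
print(round(rng, 1), round(spd, 1))  # converges toward 20.5 m and -5 m/s
```

The appeal of the filter for this use case is that the velocity estimate comes for free from position measurements alone, which is exactly what a camera-based range estimator provides.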

When integrated into a cyclist’s helmet, this technology can provide valuable real-time feedback about the surrounding environment. The visual information captured by the system is translated into sonified information, a form of sound signals. This allows the cyclist to be aware of potential dangers, such as approaching vehicles, pedestrians, or even other cyclists, thus enhancing their safety on the road. The computer vision system in this scenario acts as an extra set of eyes for the cyclist, constantly monitoring the environment and relaying critical information via audio cues.

mood board of conveying environmental information to a cyclist via a pairing of computer vision and bone conduction technologies

Limitations of Small Mounted Mirrors

Small mirrors attached to a cyclist’s handlebars or glasses do offer some degree of rearward visibility, but they have several limitations that make them less than ideal for ensuring complete awareness of approaching vehicles. Here are some reasons why small mirrors are not the most effective solution:

Limited field of view: Small mirrors provide a limited field of view, which means they offer only a narrow perspective of what’s happening behind the cyclist. This restricted view may not capture all approaching vehicles or provide a comprehensive understanding of the traffic situation.

Blind spots: Mirrors mounted in fixed positions may create blind spots, where certain areas behind the cyclist remain out of sight. These blind spots can be particularly problematic when multiple vehicles are approaching or when the cyclist is in a complex traffic environment.

Reliance on head position: To effectively use small mirrors, the cyclist must constantly adjust their head position to align the mirrors with their line of sight. This requirement can be distracting, as it diverts attention from the road ahead and demands extra effort from the cyclist to maintain a proper view. Moreover, frequent head movements can lead to neck strain or discomfort during long rides.

Interpretation of images: Small mirrors provide a relatively small reflection, making it challenging for the rider to interpret the images accurately. Judging the distance, speed, and intentions of approaching vehicles can be difficult with limited visual information. This can compromise the cyclist’s ability to make well-informed decisions and react promptly to potential hazards.

Environmental factors: Mirrors can be affected by environmental conditions such as rain, fog, or glare from the sun. These factors may obstruct the clarity of the reflected image or make it harder to see approaching vehicles, further reducing the effectiveness of small mirrors.

Distractions and cognitive load: Constantly checking small mirrors requires the cyclist to divide their attention between the road ahead and the mirror’s reflection. This can increase cognitive load and decrease the ability to concentrate on other crucial aspects of cycling, such as maintaining balance, anticipating traffic patterns, and responding to changing road conditions.

Types of small mountable mirrors: on the handlebars and on the rider’s glasses
A composite video feed from rear and side cameras instead of an actual rear view mirror

A computer vision system mounted on the back of a cyclist’s helmet can use sonic signals to warn the rider of oncoming vehicles that may pose a risk of collision. Here’s how such a system can work:

Object detection: The computer vision system uses cameras or other sensors to detect and track objects, specifically vehicles, approaching from behind the cyclist. Through image processing and analysis, it identifies and distinguishes these objects from the surrounding environment.

Distance estimation: Once the computer vision system detects an oncoming vehicle, it uses depth estimation algorithms to calculate the distance between the cyclist and the vehicle. This estimation can be based on factors such as object size, perspective, and parallax.

Risk assessment: Using the calculated distance, the system assesses the proximity of the vehicle to the cyclist. If the distance falls within a predefined threshold, indicating that the vehicle is too close and may pose a risk of collision, the system triggers an alert.

Audible cues: The computer vision system generates audible cues, such as warning beeps or spoken alerts, to notify the cyclist of the approaching vehicle. These audible cues can be transmitted through speakers or earphones integrated into the helmet, ensuring that the warnings are clearly heard by the rider.

Alert intensity: The intensity or frequency of the audible cues can be modulated to provide the cyclist with a sense of the vehicle’s proximity. For example, the alerts may increase in frequency or become more urgent as the vehicle gets closer, giving the rider a better understanding of the potential danger.

User customization: The system can allow users to customize the audible cues based on their preferences. Cyclists may have the option to choose different alert tones or adjust the volume to suit their comfort level. This customization ensures that the alerts are effective and not overly distracting.

By providing audible cues based on the analysis of oncoming vehicles, the computer vision system mounted on the cyclist’s helmet offers an additional layer of safety. It helps the cyclist maintain situational awareness and react promptly to potential collision risks from vehicles approaching from behind.
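The distance-estimation, risk-assessment, and alert-intensity steps above can be sketched with the classic pinhole-camera relation: an object of known real width W that appears w pixels wide under a focal length of f pixels is roughly d = f·W / w away. The focal length, nominal car width, danger threshold, and beep-interval mapping below are all illustrative assumptions:

```python
# Pinhole-model sketch of the distance -> risk -> alert chain.
# All constants are assumptions for illustration, not calibrated values.

FOCAL_PX = 700.0        # camera focal length, in pixels (from calibration)
CAR_WIDTH_M = 1.8       # assumed real-world width of a typical car
DANGER_M = 10.0         # alert when the vehicle is closer than this

def estimate_distance_m(bbox_width_px: float) -> float:
    """Distance from apparent size: d = f * W / w (similar triangles)."""
    return FOCAL_PX * CAR_WIDTH_M / bbox_width_px

def alert_for(bbox_width_px: float):
    """Return (is_danger, beep_interval_s): beeps speed up as the car nears."""
    d = estimate_distance_m(bbox_width_px)
    if d >= DANGER_M:
        return False, None                  # no alert outside the threshold
    return True, max(0.1, d / DANGER_M)     # 0.1 s floor on the beep interval

print(round(estimate_distance_m(140), 1))   # 700*1.8/140 -> 9.0 m
print(alert_for(63))                        # 20 m away -> (False, None)
```

In practice the distance estimate would be fused with cues the list above mentions (perspective, parallax), but even this single-cue version shows how a bounding box becomes a graded warning.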

safe distance between cyclist and car
dangerous distance between cyclist and truck

Bone Conduction

Bone conduction headphones work by leveraging the natural resonance of our bones to transmit sound. Unlike traditional headphones that send sound waves through the air and into the ear canal, bone conduction headphones convert audio signals into vibrations. These vibrations are then transmitted through your skull, bypassing your eardrums completely, and directly stimulating the cochlea — the organ in your inner ear responsible for converting vibrations into electrical signals that your brain perceives as sound.

This technology offers several unique benefits, the most significant of which is that it leaves the ear canal open. This means that while you’re listening to music or a podcast, you can also hear the sounds of your environment. It provides a sort of dual audio experience, where you can enjoy your audio content without being cut off from the world around you. This is especially beneficial for those who engage in outdoor activities like running, biking, or hiking, where awareness of one’s surroundings is crucial for safety.

some preliminary research questions

Moreover, bone conduction headphones provide a solution for individuals with certain types of hearing impairment. Because these headphones bypass the eardrum, they’re a great alternative for people who have eardrum damage or other issues related to the outer or middle ear.

Despite their many advantages, it’s worth noting that bone conduction headphones can still cause hearing damage if used at excessively high volumes, and their sound quality, while improving, is generally not as rich or full as that of high-quality traditional headphones.

background on bone conduction headphones

Sonification

The interdisciplinary field of sonification involves the use of sound to convey information, data, or patterns to individuals. It focuses on the transformation of non-auditory data into audible representations, allowing users to perceive and interpret complex information through their sense of hearing. Sonification techniques aim to enhance understanding, provide alternative sensory modalities for data analysis, and support decision-making processes.

In the specific use case of translating information from a computer vision system to audible cues for a cyclist, sonification can be employed to convert visual information, such as the presence and proximity of vehicles, into a language of sounds that the cyclist can interpret while riding. Here’s a detailed explanation of the process:

Data representation: The computer vision system captures visual data from cameras or sensors and processes it to extract relevant information, such as the position, size, and distance of vehicles approaching from behind. This data is then translated into a format suitable for sonification.

Mapping data to sound parameters: Sonification involves mapping data variables to sound parameters, such as pitch, volume, duration, timbre, and spatial positioning of sounds. Each parameter represents a specific aspect of the visual information to be conveyed. For example, the proximity or distance of a vehicle could be represented by changes in pitch or volume.

Designing auditory icons or earcons: Auditory icons and earcons are audio symbols that convey specific meanings or concepts. In the context of a computer vision system for cyclists, these auditory icons or earcons can be designed to represent different types of vehicles (e.g., car, truck, bicycle) or actions (e.g., approaching, accelerating, decelerating).

Alert generation: Based on the visual information processed by the computer vision system, the sonification algorithm generates appropriate auditory cues or alerts. These alerts should convey the relevant information about approaching vehicles to the cyclist, helping them assess the potential risks and make informed decisions.

Cognitive mapping and training: To effectively understand the sonified information, the cyclist needs to develop a cognitive mapping between the auditory cues and the corresponding visual information. This mapping can be facilitated through training or practice sessions, where the cyclist learns to associate specific sounds with specific visual situations.

Iterative design and evaluation: The sonification system should undergo iterative design and evaluation processes to ensure its effectiveness and usability. User feedback and testing can help refine the mapping strategies, adjust sound parameters, and optimize the overall sonification design for the cyclist’s needs and preferences.

By translating visual information from a computer vision system into an audible language, sonification enables the cyclist to perceive and interpret critical data without relying solely on visual attention. It allows them to maintain focus on the road ahead while staying aware of approaching vehicles from behind. However, it’s essential to strike a balance between the clarity of auditory cues and the avoidance of auditory overload to ensure that the sonification system effectively communicates the visual information without overwhelming the cyclist.
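The parameter-mapping scheme described above can be sketched in a few lines: distance drives pitch and volume, while a per-class timbre acts as a simple earcon. The class names, frequency range, and volume floor here are illustrative assumptions, not values from the prototype:

```python
# Minimal parameter-mapping sonification sketch: closer objects get a
# higher pitch and a louder tone; timbre serves as a per-class earcon.
# All mappings and ranges are assumed for illustration.

EARCON_TIMBRE = {"car": "sine", "truck": "square", "bicycle": "triangle"}

def sonify(vehicle_class: str, distance_m: float, max_range_m: float = 30.0):
    """Map a detected object to (timbre, frequency_hz, volume 0..1)."""
    closeness = min(max(1.0 - distance_m / max_range_m, 0.0), 1.0)
    freq_hz = 220.0 + closeness * (880.0 - 220.0)   # sweep from A3 to A5
    volume = 0.2 + closeness * 0.8                  # never fully silent
    return EARCON_TIMBRE.get(vehicle_class, "sine"), freq_hz, volume

t, f, v = sonify("truck", 15.0)
print(t, f, round(v, 2))        # square 550.0 0.6
print(sonify("car", 30.0))      # at max range: quiet, low tone
```

A deployed version would feed these parameters to a synthesis engine (SoniPy, in the envisioned prototype) and refine the mappings through the iterative evaluation process described above.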


Sonifying Computer Vision for Cyclist Awareness of Oncoming Vehicles was originally published in Cycle Sage on Medium.



This post first appeared on Making Electronic Music, Visuals And Culture.
