Minds in 3D: AI is Seeing the Unseen with Spatial Intelligence

6 min read · May 16, 2024

I’ve always been fascinated by the 3D world, how we perceive it, and how machines might one day do the same. This curiosity led me to explore Spatial AI, a field that allows AI to “see” in 3D! This article begins with the basics of spatial intelligence, spatial data, and Spatial AI. It then explains AI depth fusion, showing how machines perceive depth and interact with real environments. Finally, it walks through real-world applications in robotics, navigation, and urban planning that show the field’s impact.

Looking ahead, the article elaborates on the future potential of spatial AI to further transform how machines understand and engage with the world around them. This work highlights the growing importance of spatial intelligence in developing advanced AI applications.

Spatial Intelligence — One of the Eight Intelligences

Howard Gardner, a developmental psychologist and research professor of Cognition and Education at Harvard University, first proposed his theory of multiple intelligences in 1983. By 1999 he had identified eight intelligences: linguistic, logical-mathematical, musical, spatial, bodily/kinesthetic, interpersonal, intrapersonal, and naturalistic.

Spatial intelligence, named for its focus on space, is the ability to visualize and manipulate objects in three dimensions. In simpler terms, it’s our brain’s superpower for understanding and reasoning about the world around us, not just in terms of length and width but also depth. It involves skills like mental rotation (visualizing objects from different angles), spatial awareness (understanding your position relative to your surroundings), and interpreting spatial relationships between objects.

We use spatial intelligence when we imagine a route to an address and mentally rotate landmarks to guide our way, when we design a house and envision the layout of rooms and furniture, or when we maneuver a car into a tight spot, judging its size and position relative to the parking space.

Sample item of the type used in the Paper Folding task and sample item from the Vandenberg Mental Rotation Test. In the Paper Folding Test, the diagrams on the left show a piece of paper being folded and a hole being punched in the paper. The task is to say which of the five diagrams on the right shows how the paper will look when it is unfolded. In the Vandenberg Mental Rotation Test, the task is to determine which two of the four figures on the right are rotations of the figure on the left. source

People with high spatial intelligence are like natural-born visual thinkers. They can use visual information, whether a real-world scene or an abstract concept, to solve problems, design solutions, and even imagine new possibilities. Spatial intelligence is a superpower for thinking in three dimensions and beyond.

Spatial Data

Spatial data is any information that describes where something is and how it relates to other things in space (not only on Earth). It includes:

Latitude and longitude coordinates
Addresses and postal codes
Aerial and satellite imagery
Elevation and terrain data
Land cover and land use data
Transportation networks like road, rail, and public transit networks
Points of interest (POIs), such as restaurants, hotels, stores, or tourist attractions
Climate and weather data such as temperature, precipitation, and other meteorological variables, either as historical data or real-time observations
Demographic information about the population, age, income, and other socioeconomic characteristics aggregated for specific geographic areas
Natural resources and hazards: the location and extent of resources like water bodies, minerals, or oil reserves, as well as the occurrence and risk of hazards like earthquakes, floods, or wildfires
Even social media posts with location tags
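To make the idea of "location and relationship" concrete, here is a minimal sketch of one of the simplest spatial computations: the distance between two latitude/longitude points, and the nearest point of interest to a given position. It assumes a spherical Earth (the haversine formula), and the POI names and coordinates are hypothetical values chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_poi(lat, lon, pois):
    """Return the name of the POI closest to (lat, lon)."""
    return min(pois, key=lambda name: haversine_km(lat, lon, *pois[name]))


# Hypothetical points of interest (illustrative coordinates only).
pois = {
    "cafe": (48.8584, 2.2945),
    "museum": (48.8606, 2.3376),
}

closest = nearest_poi(48.8585, 2.2950, pois)
print(closest, round(haversine_km(48.8585, 2.2950, *pois[closest]), 3), "km")
```

Real GIS tools typically use an ellipsoidal Earth model and spatial indexes instead of a linear scan, so treat this only as an illustration of the kind of question spatial data lets us ask.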

AI Depth Fusion and its Connection with Spatial Intelligence

AI depth fusion is a technique that pairs what an AI sees with how far away things are. It combines visual data (“what the AI sees”) with depth information (“how far away things are”). This creates a more complete picture of the environment, allowing the AI to understand the scene’s 3D structure.

AI depth fusion provides the data (3D understanding) that fuels spatial intelligence in AI systems. By analyzing the fused data (image + depth), the AI can reason about:

Object size and location in 3D space
Distances and relationships between objects
The overall layout and structure of the environment

Example: Imagine an AI trying to navigate a room. Visual data alone might tell it that there’s a chair and a table, but depth fusion helps it understand the chair’s height, its distance from the table, and how much space there is to maneuver around it.
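The chair-and-table example can be sketched in a few lines. The snippet below assumes a pinhole camera model and a depth sensor that reports metric depth per pixel. The intrinsics (fx, fy, cx, cy), the pixel coordinates, and the depth values are all hypothetical; in a real system they would come from camera calibration, an object detector, and a depth sensor.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera parameters in pixels (hypothetical values are used below)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth_m, k):
    """Lift pixel (u, v) with metric depth into a 3D point in the camera frame."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy  # image y axis points down
    return (x, y, depth_m)


# Assumed intrinsics for a 640x480 sensor.
K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# "What the AI sees" (pixel positions) fused with "how far away" (depth in metres).
chair = backproject(320, 240, 2.0, K)
table = backproject(570, 240, 2.0, K)
chair_top = backproject(320, 140, 2.0, K)
chair_bottom = backproject(320, 340, 2.0, K)

gap_m = math.dist(chair, table)                   # distance between the two objects
chair_height_m = chair_bottom[1] - chair_top[1]   # vertical extent of the chair

print(f"chair-table distance: {gap_m:.2f} m, chair height: {chair_height_m:.2f} m")
```

The image alone gives only pixel offsets. Once each pixel is paired with a depth value, the same offsets become distances in metres.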

Spatial AI

While traditional AI often excels at tasks like image recognition or text analysis, focusing on the “what” in an image or document, spatial intelligence in AI goes beyond just recognizing objects. It allows AI to understand the “where” and “how” of objects — their location, size, distance, and how they interact with each other in 3D space.

Rather than trying to model everything in a scene, Spatial AI focuses on what matters most. It grabs the key details and builds a picture of the world in real time!

Spatial AI brain: an imagining of how the representation and processing graph structures of a general Spatial AI system might map to a graph processor. The key elements we identify are the real-time processing loop, the graph-based map store, and blocks which interface with sensors and output actuators. Note that we envision additional ‘close to the sensor’ processing built into visual sensors, aiming to reduce the data bandwidth (eventually in two directions) between the main processor and cameras, which will generally be located some distance away. Source
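To give a feel for the ideas of a graph-based map store and a real-time loop, here is a deliberately tiny sketch. It is not the architecture from the figure: the class GraphMap, the function run_loop, the frame format, and the safety threshold are all hypothetical, and the sensor is assumed to sit still at the origin.

```python
import math
from dataclasses import dataclass, field

SAFE_DISTANCE_M = 0.5  # hypothetical threshold for the toy "actuator" decision


@dataclass
class GraphMap:
    """Sparse map: nodes are named 3D positions, edges hold pairwise distances."""
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def observe(self, name, position):
        """Insert or update one landmark and refresh the edges that touch it."""
        self.nodes[name] = position
        for other, q in self.nodes.items():
            if other != name:
                self.edges[frozenset((name, other))] = math.dist(position, q)


def run_loop(frames, graph):
    """Toy sense -> update map -> act loop over pre-recorded observations."""
    commands = []
    for frame in frames:                       # sense: one dict of detections
        for name, position in frame.items():   # update: only key objects are stored
            graph.observe(name, position)
        nearest = min(math.dist((0.0, 0.0, 0.0), p) for p in graph.nodes.values())
        commands.append("stop" if nearest < SAFE_DISTANCE_M else "go")  # act
    return commands


graph = GraphMap()
frames = [
    {"chair": (0.0, 0.0, 2.0)},
    {"table": (1.0, 0.0, 2.0)},
    {"chair": (0.0, 0.0, 0.4)},  # the chair is now very close to the sensor
]
commands = run_loop(frames, graph)
print(commands, len(graph.nodes), len(graph.edges))
```

The map holds only a handful of named objects and their relationships instead of every pixel, which is the sense in which such a system keeps only what matters and updates it as new observations arrive.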

Four components of Spatial AI

Spatial data
Spatial intelligence, which lets AI systems interpret, navigate, and manipulate the physical world
AI algorithms, such as machine learning and deep learning techniques, designed specifically to handle spatial data and extract insights
Data fusion techniques that combine visual data (images, videos) with depth data (for example, from LiDAR) to create a richer 3D understanding

The interesting question is: what can be achieved with such a powerful mix?

Spatial AI real-world applications

Autonomous Vehicles: Spatial AI is crucial for navigating and operating self-driving cars, drones, and other autonomous vehicles. It enables these vehicles to understand and navigate their environment by detecting objects, predicting the movements of other vehicles and pedestrians, and planning safe routes.
Robotics: In industrial and consumer robotics, spatial AI helps robots move and operate efficiently within a space. This includes assembling products, navigating through warehouses, and providing services in homes and offices.
Augmented and Virtual Reality (AR/VR): Spatial AI enhances AR and VR experiences by allowing devices to interact more intuitively with the user’s environment. This includes placing virtual objects in real-world spaces, enhancing gaming experiences, and providing interactive educational tools.
Geographic Information Systems (GIS): Spatial AI analyzes spatial data, enhances map-based analytics, and optimizes logistics and planning. It provides insights derived from analyzing large sets of geographical data to help in urban planning, environmental monitoring, and disaster management.
Healthcare: In healthcare, spatial AI can assist with navigation during surgical procedures, particularly those involving implants or the precise removal of tissues. It helps create and use 3D models of organs or other body parts to plan surgeries or train medical professionals.
Smart Cities: Spatial AI contributes to the development of smart cities through applications in traffic management, public safety, and infrastructure planning. It helps analyze the flow of people and vehicles to optimize traffic lights, public transport schedules, and emergency response strategies.
Retail and Inventory Management: Spatial AI helps manage store layouts by analyzing customer movement and optimizing product placements. It also enhances inventory management through automated warehouses where robots efficiently stock and retrieve products.
Agriculture: In agriculture, spatial AI enables precision farming techniques such as monitoring crop health, predicting yields, and efficiently applying resources like water and fertilizers based on the spatial variation across a field.

Future

The future of spatial AI is promising, with many advances and applications expected to emerge. Key areas of development include more immersive and realistic AR and VR experiences; safer and more effective drones, self-driving cars, and robots; better energy management; and improved environmental monitoring, urban planning, and disaster response.

As spatial AI continues to advance and be integrated into various aspects of our lives, it is critical to address the ethical and privacy concerns associated with collecting, analyzing, and sharing spatial data.

References

Components of Spatial Intelligence. (n.d.). Retrieved from https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=2d3103e8de852e7346de496f4bccb3dac11699fa&utm_campaign=what-is-spatial-intelligence

Davison, A. (2018). FutureMapping: The Computational Structure of Spatial AI Systems. arXiv:1803.11288. Available at https://arxiv.org/pdf/1803.11288

Li, F., Hogg, D. C., & Cohn, A. G. (2024). Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark. arXiv:2401.03991. Available at https://arxiv.org/pdf/2401.03991