Crossing the Chasm to the Metaverse: Why Virtual Reality will grow explosively in the next 3 years
The big tech news of the holiday season was the Oculus Quest 2 virtual reality headset and its blockbuster Christmas sales. Headlines like “How the metaverse won Christmas” highlighted the incredible sales of the Quest 2, which on Christmas morning took the Oculus app to the #1 spot in the iOS App Store.
More important than raw popularity, the Quest 2 has dramatically expanded the market for VR from the urban, “early adopter” groups to a much broader segment of the population:
For decades, virtual reality has been limited to specialized applications like flight simulation and surgeon training, and consumer VR has remained a niche medium for hardcore gamers. The Oculus Quest 2 has changed the rules of the game. Today, we are in the early stages of a tidal wave of VR adoption:
Facebook has brute-forced the consumer virtual reality market into existence by reducing the cost of immersive VR by ~95%, subsidizing the world with cheap, high-quality VR headsets that are continuously improving
VR experiences are rapidly expanding beyond gaming into entertainment, fitness, and other categories that will reach a much broader user base
Machine learning, specifically Generative AI, has become much more powerful in the last two years and is powering advances in VR hardware, software, and tooling
Social virtual worlds such as Roblox, Fortnite, and Minecraft, collectively referred to as “The Metaverse,” have created highly engaging experiences and profitable business models that are driving investment and consumer attention to VR as an experiential medium for these worlds
VR is in the early stages of “crossing the chasm” from young, early-adopter hardcore gamers to a diverse set of experiences for the mainstream market. For many years, venture capitalists were disappointed to find themselves “too early” to a VR market that only found niche adoption — very soon, they will find themselves too late.
1. Facebook (now Meta) has brute-forced the consumer VR market into existence
The holiday success of the Oculus Quest 2 has been a long time in the making.
Consistent with Mark Zuckerberg’s intentions to pivot Facebook to a VR-delivered, immersive “metaverse” company, the-company-now-known-as-Meta is investing $10 billion per year into VR[1] and advancing low-cost VR hardware faster than ever before.
The Quest 2 is the fruit of this investment, and the device that has kicked off today’s mainstream era of consumer VR. The Quest 2 is revolutionary on two fronts:
The $300 price point is completely unprecedented for a device of this quality, delivering an experience that would have cost $10,000 ten years ago and $3,000 three years ago.
The system is completely standalone and self-contained, unlike other VR systems which require a connected cable to a gaming PC and two position-tracking devices set up in the room. This:
Reduces the cost of the total system from $3000+ to $300
Opens up experiences like fitness and dancing which would be awkward with a connected cable
Lets the user find any room in the house rather than being confined to a specific room — critical both for experiences like fitness that need more physical space, and for making VR experiences feel personal rather than like a “family computer”
Opens up VR to the group of people who have neither the interest nor aptitude to purchase and configure a gaming PC
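As a quick check on the price figures quoted above, the reduction can be worked out directly. This is a minimal sketch that uses only the article's round numbers; the helper name `percent_reduction` is made up for illustration. The two comparisons come out at roughly 90% and 97%, which brackets the ~95% figure cited in the introduction.

```python
def percent_reduction(old_price, new_price):
    """Percentage drop from old_price to new_price."""
    return 100 * (old_price - new_price) / old_price

QUEST_2_PRICE = 300  # USD, the article's figure

# Earlier system costs quoted in the article (USD, round numbers)
for label, old_price in [("ten years ago", 10_000), ("three years ago", 3_000)]:
    drop = percent_reduction(old_price, QUEST_2_PRICE)
    print(f"vs. {label}: {drop:.0f}% cheaper")
```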
Make no mistake: the Quest 2 is not a high-end headset. It does not compete with PC-based gaming headsets like the Valve Index, let alone even more specialized headsets.[2] But it’s inexpensive, it’s easy to use, it’s in stock, and people love it:
There are definitely some people who report having a great first 10 minutes with the headset and then leaving it on a shelf for six months, especially those who get motion sickness in VR. However, the game developers who build for Oculus have reported strong retention numbers:
VR has arrived — we are now in the “Post Quest 2” era of virtual reality. Every year, there will be more capable, more comfortable, and cheaper headsets strapped to more faces. These will come both from Meta and from its competitors: gaming companies like Sony and Valve have products in-market, while Apple is rumored to be entering the market later this year.
2. VR is rapidly expanding beyond gaming into entertainment, fitness, social, and other categories
Currently, the consumer VR world is dominated by gaming. This isn’t surprising:
Gamers are early adopters of new hardware and have long been willing to spend money for greater immersiveness
Gaming has an established revenue model with most of the games on the Oculus store selling for one-time fees of $10-30
The Oculus userbase is currently very young and gaming-oriented
This mirrors the early history of the iPhone, whose first breakout hit was Angry Birds — long before companies like Waze, Uber, and Tinder created entirely new business models enabled by mobile devices.
But even today, VR experiences go well beyond gaming. VRChat is, in my opinion, the most interesting VR app today, and a glimpse of what the medium can be for people. At first glance, it’s disorienting: you enter a room filled with bizarre characters, all talking over each other, with no sense of what’s going on. The audio is spatially anchored, so you walk closer to find out more:
As you approach the group, you find that this isn’t just a shouting match but a cocktail party: groups of people are having individual conversations, people are mingling between groups, and the people are using this strange new medium for an age-old purpose: meeting people and making friends. To many users, these friends feel no less real in VRChat than in the physical world. I have met multiple couples in VRChat who consider themselves boyfriend and girlfriend, and spend lots of time together in the app, but have never met each other in “real life.” I spoke to a 16-year-old kid in VRChat who had left school due to bullying and extreme social anxiety, but in VRChat was an extroverted raconteur and the center of attention. He said to me, “I feel more like myself in here.”
Bigscreen VR is another social app, but focused on watching videos together. Users can share videos from any streaming service to their friends while having a conversation on the side, just like watching TV or sports at home with friends:
Supernatural is a fitness app that turns a workout from “running on a grey treadmill in your basement” to “shattering jewels with magic wands on the surface of the Moon.” Supernatural distracts you from the fact that you’re exercising, which makes it much easier to summon the willpower to actually do it. Despite Supernatural’s relatively limited capabilities, it is a breakout hit and Meta recently made a $400 million offer to acquire the company.[3]
In all of these applications, VR isn’t just taking a physical-world experience and transliterating it to the digital world. The VR experience is actually better on many parameters than the physical experience. As users get more comfortable with the medium, developers will create more ambitious experiences that are even less tied to the constraints of physical reality.
Virtual reality is in a similar position to the iPhone in 2009:
The low-priced headsets like Quest 2 are more impressive than anything that’s come before, but not yet “insanely great”[4]
Adoption hasn’t reached the point where half of your friends use a VR headset
The software experiences are just beginning to scratch the surface of the medium’s limitless possibilities
Problems (1) and (2), hardware quality and adoption, will be solved by Meta and its competitors. Mark Zuckerberg has decided that Meta will be the iOS of the virtual reality medium, and is spending as much money as it takes to make that happen. Perhaps in response to Apple’s app tracking changes that severely impacted Facebook’s revenue, Meta has decided that it must directly own the consumer hardware platform at any cost. Competition from Sony, Apple, Valve, and other entrants[5] will only further erode prices and accelerate adoption.
The rapidly growing ecosystem will provide tailwinds to anyone building VR experiences: companies who ship great VR experiences will have an early position in a rapidly growing and improving installed base of VR headsets. The headsets will continue to improve on the five core parameters (resolution, field of view, frame rate, head tracking latency, and comfort[6]) while also delivering new capabilities:
Meta’s higher-end VR headset, codenamed Project Cambria, will have cameras inside the headset pointed at the user’s face, to record a user’s facial expressions and replicate them on the user’s VR avatar, enabling more expressive conversations
Controls will continue to improve and will allow a user to use their hands and body to control their virtual representation as fluently as they control their physical body
Improved haptics and feedback will allow a user to “touch” their virtual environment, feeling the walls, or objects, or the rain falling on their hands
In the smartphone market, we’re at the stage where we know that the iPhone 14 will be better than the iPhone 13, but not revolutionarily different. VR is at the “iPhone 3G” stage: each iteration will bring major new capabilities and will unlock new experiences and markets.
3. Machine Learning is unlocking vastly more immersive VR experiences
Virtual reality has concrete obstacles to usage: you have to strap a headset to your face, the controllers to your hands, find a clear space to use it, and perhaps endure the mockery of your spouse. The payoff for this effort is the illusion of presence: an environment that can make your brain feel that you actually are in the virtual experience, rather than looking at it or consuming it as a medium. The illusion of presence is the entire raison d’etre of consumer VR.[7]
The nature of these illusions is that they need to be perfect to be immersive. If you’re walking down the street and you see a thousand people behaving normally and one person floating six feet above the ground, you’ll know that something is amiss. When these illusions break in VR, the user snaps out of their presence illusion and the experience is ruined.
Today, VR apps are typically limited to cartoonish levels of fidelity to avoid creating a level of immersion that cannot be maintained. To get deeper and deeper immersion, it’s simply not feasible to invest increasingly more human effort to make more immersive worlds.
Grand Theft Auto 5, for example, famous for its incredibly detailed open-world model of Los Angeles, cost $265 million to produce, more than any other video game in history. Still, it wouldn’t be immersive by VR standards: You can’t go inside most buildings, there are only a few types of people in the game speaking the same few canned phrases, and there are a limited number of ways to interact with other characters. It would take a lifetime to fully characterize the behaviors of even a single person, let alone an entire city — manual content creation will not scale to the requirements of immersive VR.
Immersion at scale requires automated content generation techniques. Generative AI is the new set of techniques that can create that content at scale.
What is Generative AI?
Generative artificial intelligence (AI), also called generative machine learning (ML),[8] is a category of statistical techniques that learn from large sets of training data in order to synthesize novel data, and that have advanced explosively in the last 5 years. These techniques, when used to create misleading videos, were termed “deepfakes” in 2018:
We can construct a deepfake in our minds easily enough — imagine Morgan Freeman delivering the Gettysburg Address. You can do this because you know what Morgan Freeman’s voice sounds like, and can imagine how he would say words that you haven’t actually heard him say. Generative AI works in a similar way, extracting patterns from training data like “people who say ‘ice cream’ this way and ‘prank’ that way would tend to say ‘four score’ like this.” What once took a talented impressionist now needs only a trained speech model and a few seconds of Morgan Freeman’s voice.
In the last two years, generative models have become staggeringly more powerful, and can do things like create a synthetic image just from the caption (!), create a video of an ancestor from an old photograph (!), create synthetic voices for narration and video games, compose music in the style of a popular artist, and write the code that a software engineer would have written themselves.
How will Generative AI create immersive VR environments?
Synthesizing Natural and Built Environments
The world has 200 million square miles of surface area, and the city of Los Angeles, modeled at a $265 million cost in Grand Theft Auto 5, is only 500 of those square miles. Manually creating an immersive replica of the Earth, with 3-dimensional mountains, trees, cities, and buildings, would be economically and operationally infeasible.
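To put a rough number on “economically infeasible,” the article's own figures can be extrapolated. This is a back-of-the-envelope sketch only: it assumes cost scales linearly with area, which is our assumption and not a claim from the source, and it ignores the fact that most of the Earth's surface is ocean and that the game's budget covered far more than map building. The function name is hypothetical.

```python
# Round figures taken from the article
EARTH_SURFACE_SQ_MI = 200_000_000
LOS_ANGELES_SQ_MI = 500
GTA5_COST_USD = 265_000_000

def extrapolated_cost(area_sq_mi,
                      ref_area_sq_mi=LOS_ANGELES_SQ_MI,
                      ref_cost_usd=GTA5_COST_USD):
    """Naive linear scaling of a reference build cost to a larger area."""
    return ref_cost_usd * area_sq_mi / ref_area_sq_mi

scale = EARTH_SURFACE_SQ_MI / LOS_ANGELES_SQ_MI      # how many "Los Angeleses"
cost = extrapolated_cost(EARTH_SURFACE_SQ_MI)        # in USD
print(f"{scale:,.0f}x the area, about ${cost / 1e12:,.0f} trillion")
```

Even if the real figure were off by a factor of a thousand in either direction, the conclusion would hold: hand-built content cannot cover the planet.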
The latest iteration of Microsoft Flight Simulator, released in 2020, uses automated techniques to stunning effect, creating a simulated replica of the entire world. An Austrian company called Blackshark.ai used 2-dimensional satellite photographs of the Earth to synthesize nearly every forest, mountain, and building (1.5 billion!) on the planet. This database is far too vast to be stored locally on the user’s device — it’s streamed from the cloud as necessary based on the user’s location and trajectory.
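The article does not describe how this streaming is actually implemented, so the following is only a generic sketch of the idea: divide the world into tiles, predict where the user will be over the next minute from position and velocity, and keep just the tiles around that path in a local cache. Every name and parameter here (`TILE_SIZE_KM`, `tiles_to_fetch`, `TileCache`, the lookahead window) is a hypothetical choice for illustration.

```python
import math

TILE_SIZE_KM = 10.0  # hypothetical edge length of one square world tile

def tile_of(x_km, y_km):
    """Map a world position to the integer index of the tile containing it."""
    return (math.floor(x_km / TILE_SIZE_KM), math.floor(y_km / TILE_SIZE_KM))

def tiles_to_fetch(pos, velocity, lookahead_s=60.0, steps=6, radius=1):
    """Tiles around the current position and the straight-line predicted path.

    pos is (x, y) in km, velocity is (vx, vy) in km/s.
    """
    needed = set()
    for i in range(steps + 1):
        t = lookahead_s * i / steps
        cx, cy = tile_of(pos[0] + velocity[0] * t, pos[1] + velocity[1] * t)
        # Include a margin of neighbouring tiles around each sampled point
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                needed.add((cx + dx, cy + dy))
    return needed

class TileCache:
    """Keeps only the tiles currently needed; fetch stands in for a cloud request."""

    def __init__(self, fetch):
        self._fetch = fetch
        self.tiles = {}

    def update(self, pos, velocity):
        wanted = tiles_to_fetch(pos, velocity)
        for key in wanted - set(self.tiles):      # download what is missing
            self.tiles[key] = self._fetch(key)
        for key in set(self.tiles) - wanted:      # evict what is out of range
            del self.tiles[key]
        return wanted

# Usage: a plane at (5 km, 5 km) flying east at 0.2 km/s (about 720 km/h)
requests = []
cache = TileCache(fetch=lambda key: requests.append(key) or f"tile-{key}")
cache.update((5.0, 5.0), (0.2, 0.0))
print(len(cache.tiles), "tiles cached after", len(requests), "requests")
```

A real system would also prioritize tiles by distance and level of detail; the point of the sketch is only that the device holds a small moving window of the world at any moment.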
The experience is one that can’t really be communicated in words. You can get in a plane in MS Flight Simulator, fly to your own house, and see it in three dimensions, with the actual trees in your front yard! Try to tell which side of the video below is MS Flight Simulator and which is real life:
These techniques aren’t limited to creators with Microsoft-sized budgets. Products like Autodesk’s Recap 360 allow architects to construct 3D models from ordinary photographs, and startups like Polycam and Luma are building AI-powered next-generation photogrammetry tools that will allow anyone to create a high-fidelity 3D model with just a few phone photos.
Synthesizing Social Environments
The most important aspect of creating an immersive world is people who exist in VR in all of their richness and complexity. In a social experience like VRChat, these are real people, who need to be realized in VR with their body language, facial expressions, and other aspects of social communication. For example, any cartoonist knows that the eyebrows are an essential component of communication:
But how can a user see their friend’s eyebrows in VR, when those eyebrows are completely covered by the VR headset? Meta’s solution, which will ship with their high-end Cambria headset next year, has cameras inside the headset that see the area around a user’s mouth. Using a training set of people’s facial expressions outside VR, Meta has built a generative AI model of how a person’s eyebrows and mouth move together, and can reconstruct the likely position of your eyebrows from the position of your mouth and the tone of your voice. The result is a hyper-realistic virtual avatar that can show your entire facial expression even while you’re wearing a VR headset over your face:
For games and simulations, like a training app for a medical student to learn how to communicate difficult news to a patient, generative AI will play an even larger role. Generative voice models, like those from Replica, can create a synthetic voice for the patient, while conversational AI models can simulate what the patient would say in such a difficult situation. Generative AI will synthesize realistic gait and body language for simulated characters, and “open-world” games will start to actually resemble an open world where you can go anywhere and have a natural voice conversation with anyone.
Powering end-user creation
Generative AI will not just power VR developers but eventually put incredible creation power in the hands of VR end-users. A person inside a VR world will be able to ask for “a light yellow Craftsman-style house on a maple-lined street in suburban Philadelphia” and have it created immediately, with an architecturally matching interior, a TV in the living room to watch the Eagles game with friends who come to visit, and furniture from the user’s preferred brands (for a fee, of course). This is not science fiction — it is a medium-term extension of generative AI capabilities that exist today.
Generative AI will deliver creative superpowers for VR developers and end-users alike.
4. “The Metaverse” is drawing investment and mindshare to social virtual worlds
The buzzword of the moment is “The Metaverse”: everyone from Mark Zuckerberg to Microsoft CEO Satya Nadella to VR enthusiasts to cryptocurrency promoters is talking about their “metaverse strategy.” The “metaverse” term has taken on a life of its own, so the exact definitions can be overlapping and confusing, but the terminology has been important in highlighting the incredible business opportunities in purely digital worlds.
Games like Fortnite (Creative Mode), Minecraft, and Roblox have created large, open-ended online environments where the experience feels more like “play” than “gaming”: there is no mandatory goal, no universal winner, and people can create their own games and experiences inside these worlds instead of just playing what the game developer provides. These platforms have each built userbases in the hundreds of millions and revenue streams in the billions of dollars without any VR support, and support large developer ecosystems that generate (real-life!) incomes for creators.
The definitive reading on virtual worlds is Meagan Loyst’s Metaverse 101 primer, which covers:
Their business models and growing user engagement
Their digital economies of goods, services, virtual real estate, fashion, art, and live performances
Why they are succeeding today at a scale much greater than the earlier generation of worlds like Second Life
VR is the optimal experiential medium for virtual worlds. Loyst is correct to say that there are technical barriers to delivering a massively interactive VR experience today, and that there simply aren’t enough VR headsets out there yet for VR to be a significant percentage of the Fortnite/Minecraft/Roblox userbases. But these barriers will erode quickly, and these virtual worlds have the revenue and the userbases to start pulling people to VR when the platform is ready for it. VR’s environmental and social immersion will make it the flagship delivery medium for virtual worlds.
Closing Thoughts
The future of virtual reality is bright, and approaching fast. The market still needs to deliver stickier experiences that make people return to VR regularly, the way we all frequently check our smartphones today. But with the core pieces already in place, and with new headsets from Meta and Apple in 2023, the next three years will be very exciting.
Many people in the “pundit class” — venture capitalists, journalists, corporate executives — aren’t seeing the opportunity yet because VR is not following the typical tech adoption trends of starting in the major cities and making its way outward.
This is a novel adoption pattern for consumer technology, and an encouraging one, indicating that VR has already crossed the chasm from the early adopter crowd (who will inevitably move on to the next new thing) into the early and enthusiastic mainstream.
VR’s future has arrived.
I am interested in talking to companies who are creating novel VR experiences or who are building generative models or other tooling for immersive VR creation. If you are building the future of virtual reality, please contact me at amal@dorai.org .
[1] This level of investment would have been essentially impossible to scale this fast through private capital markets. This means that hardware competitors outside of Meta’s trillion-dollar-peer group will need to compete in higher-end markets rather than in entry-level headsets.
[2] VR headsets for enterprise use, such as pilot training, surgeon training, factory visualization, etc. are vastly higher quality than anything in the consumer market. These devices, from companies like Varjo and VRgineers, start at ~$5,000 and are not targeted at the consumer market. However, capabilities in these headsets like field-of-view and high resolution will come down the price curve over time.
[3] Even this (relatively) paltry acquisition is under scrutiny from the FTC, providing some indication that Facebook will not be allowed to acquire every breakout hit on the platform. FTC Commissioner Lina Khan will not be Commissioner forever, but her overall skepticism of anticompetitive behavior by Big Tech has bipartisan support.
[4] Unlike the iPhone in 2009, the “insanely great” VR headsets already exist; they’re just very expensive. This means that the challenges in improving the low-end experience are a matter of making existing technology cheaper, and not needing to invent non-existent technology.
[5] Current competitors of the Oculus Quest 2 in the consumer/gaming markets include the Valve Index at the high end, the HP Reverb in the medium tier, and Sony PlayStation VR in console gaming. Apple is rumored to be entering the market as well, but it’s not yet clear whether Apple will be competing in virtual reality or exclusively in augmented reality.
[6] Comfort is primarily a function of weight and balance, which will be a key challenge for the Oculus Quest platform, as the self-contained nature of the headset means that the CPU, battery, tracking cameras, and everything else are all attached to the user’s face:
[7] The presence illusion has two major components as defined by Mel Slater: the place illusion (the sensory perceptions of the environment) and the plausibility illusion (the components of the environment behaving in the way that you expect them to). The reflection of the stadium lights in the freshly waxed floor before an NBA game is part of the place illusion. The basketball bouncing off that floor with the correct physics is part of the plausibility illusion.
[8] There is a vigorous debate about the distinction, or lack thereof, between machine learning and artificial intelligence and whether the techniques described here should be called “artificial intelligence.” I do think the term “artificial intelligence” can be misleading (generative models are based on statistical mathematical operations which bear little resemblance to what we would normally call “intelligence”), but I use “Generative AI” here since that is the more common term.