Joe Bertolami is looking into the future.
In 2005, with the Xbox 360 a strong force in the market, the Microsoft senior software engineer is considering what’s next. He and his team are doing incubation work for the future of Xbox, figuring out what the next several years might look like, and looking at what various technology manufacturers have in the pipeline.
He’s getting, as he puts it, a “tangible peek” into the future. Meeting with companies such as Samsung and LG, Bertolami is looking not at what the companies have in stores, not at what they’ll put on shelves in six months, but at where they think technology will be in two years. “And that’s if things go optimistically well,” he says. He’s seeing what these companies come up with when they shoot for the moon.
“Basically, all of these companies have their own prototyping divisions,” Bertolami says. “The people working there — kind of like our people — were very eager to push the boundary forward. You would talk to the Samsung display team, and they’d tell you everything they’re envisioning for higher-fidelity displays. But also stereo 3D, headsets, and all these things. You’d see basically varying levels of collaboration between Microsoft and these other companies where we had these relationships.”
During this period, one technology begins getting a good amount of attention within Microsoft: depth-sensing cameras, cameras that can recognize the size of a room and the objects within. It isn’t the first time Microsoft has investigated camera-based technology. A couple of years prior, for instance, Microsoft senior product planner Richard Velazquez had a 3D camera on his product roadmap for Xbox, before putting it in what he calls “the Boneyard,” with other discarded ideas.
Over the years, depth sensing had been something on the minds of a lot of people at Microsoft, jumping from an exciting new technology, to something on the backburner, to an integral part of one of Xbox’s most ambitious projects to date: Kinect, a motion-sensing peripheral for the Xbox 360, meant to eliminate the need for standard controllers. It would be a large-scale investment for Microsoft, but one that wouldn’t pan out in the long run.
Speaking to 12 people who worked on and with the peripheral, from former Microsoft employees to third-party developers, we pieced together a story that begins as a small team on a skunkworks project and balloons into a companywide effort, requiring multiple divisions, angry children, the help of waiters and waitresses, and an insistence that people keep their clothes on.
It begins in Israel, when five soldiers find themselves unemployed.
It’s 2005, and Aviad Maizels has the show Teenage Mutant Ninja Turtles on his mind. “When I was 10 years old, my kid brother was 5, and he wanted to watch an episode of Teenage Mutant Ninja Turtles,” he told Engadget in 2013. “The episode had a very specific name, and he told me to write it on an empty video cassette. I asked him, ‘Why would you want me to write it?’ He said that if I write it on the tape, that is the show the machine will play. It seems really silly now, but that’s how we want technology to work ... we want it to be magic.”
Maizels doesn’t have a job, and neither do his friends Ophir Sharon, Alex Shpunt, Dima Rais, and Tamir Berliner. They all met in the Israel Defense Forces, which Israeli men are required to serve in for at least three years after turning 18. They all have science backgrounds; four of them previously worked in research and development for the military, and all of them have degrees in mathematics, engineering, or an equivalent field. Rather than look for jobs, the friends decide to start their own company, as Maizels told Engadget, to “come up with the next big thing.”
What is the next big thing? A “technology that made technology itself disappear.”
In their early discussions, the five friends settle on the game industry as their new company’s focus, with Maizels saying they felt games had become “stagnant” and repetitive. “Finish one game, then you get into the next game, and it’s a similar story, similar actions,” Maizels recounted to Engadget. But what if you could play a game without holding a controller? What if the human body itself were the controller?
The group names its company PrimeSense, a Tel Aviv-based startup focused on depth-sensing technology and how it can be used to map a moving body. Its breakthrough is a camera that can map humans and objects in 3D, as well as recognize gestures for hands-free control. In March 2006, PrimeSense shows off its technology for the first time at the annual Game Developers Conference as it searches for a business partner. It leaves that show with a host of new contacts — including Microsoft, with which PrimeSense stays in touch, setting up a second appointment with the tech giant a few months later.
Alex Kipman is in Los Angeles at the 2006 E3 conference. As he’d later recount in a Wired article, his heart is racing.
Kipman, Microsoft’s general manager of hardware incubation, is meeting PrimeSense, looking at its depth-sensing tech, seeing how it tracks the body’s movements. “He totally got it and understood what could be done with the technology,” co-founder Tamir Berliner told Wired.
A few years later, the two parties will be inextricably tied together, when Microsoft contracts PrimeSense to help it develop Kinect. But that won’t happen overnight. In 2006, depth sensing is still a new, unproven, and — most importantly — expensive proposition for Microsoft. For the collaboration to work, a few things need to happen. First, PrimeSense’s depth sensor is both too big and too expensive; that needs to change in order to produce it at a consumer scale. “The hardware has to disappear in a way that makes it almost invisible to the user,” Raghu Murthi, Microsoft’s former general manager of natural user interface hardware for the Xbox 360, told Wired, “so that the consumer is always interacting with the system, but not really aware of how.”
And even then, PrimeSense’s technology alone isn’t enough to ship to consumers. If Microsoft is to invest in depth-sensing cameras, it needs to develop a piece of hardware that uses them in a way people will want to buy. That, too, comes with its own host of problems. And that, too, is expensive. Kipman has his work cut out for him.
But before all that, Nintendo will release a new console that will turn the game industry on its head.
It’s November 2006, and Nintendo’s Wii is revolutionizing the game industry, becoming the fastest-selling console of all time.
The Wii sold the dream of physically engaging with video games. It got people off the couch, standing and moving, turning an otherwise sedentary activity into something active. Its accessibility — through the motion-controlled Wii Remotes that took the place of standard controllers — made it an easily adoptable console for families and elderly people. Its library, full of sports games and family titles, appealed to a wider audience than the Mature-rated titles that littered the Xbox 360 and the PlayStation 3. It was a hit in nursing homes.
Nintendo had drawn a line in the sand. Microsoft, already considering its next steps, looks at its options for the future of Xbox. On the one hand, there’s depth-sensing technology. It’s interesting, but after some evaluation, the company decides not to go all-in on it. The technology just isn’t there yet; Microsoft isn’t able to do full motion sensing without the use of something like skeletal tracking, which would come later. “So we could kind of detect bumps and stuff. We sort of were able to extrapolate, like, ‘Yeah, these are hand positions. We can do some stuff with hand positions,’” Brian Murphy, a former creative director and game designer at Microsoft, says about the company’s early tests with depth sensing. “But it wasn’t anything like the skeletal tracking.”
“[We were] just using the depth data and the RGB camera in a more raw form,” he says.
There is still research into the technology, though. Bertolami and his team still tinker with ideas and research cameras. In summer 2006, Microsoft’s development kit for the Xbox Live Vision camera, a short-lived webcam peripheral for the Xbox 360, even ships with a demonstration program for rudimentary depth sensing and facial tracking from the California-based company GestureTek — though at the time, the company’s chief technical officer Francis MacDougall dismisses the idea of depth sensing taking down the Wii, telling Gamasutra, “It’s just a demo.”
Despite the interest, for the time being, depth sensing isn’t Xbox’s next big step.
Yet there is a growing sense within Microsoft that the company needs to have some sort of response to Nintendo’s new juggernaut. With depth-sensing technology not ready to get Microsoft’s green light, the company leans into other options.
“There were several different ways we could’ve gone,” Velazquez says.
“When controllers first came out with the dual thumbsticks and everything else, that eventually just became the standard,” Velazquez says. “So the discussion was, ‘Is this Wiimote tech device just going to be the future standard for gaming?’”
Microsoft begins looking into developing its own motion-tracking device, and experiments with a “waggle stick” or “Xbox Wiimote,” as Murphy puts it. It considers other alternative controller methods, Velazquez says, such as a concept for an “indestructible controller that you can throw or toss or bounce around.” He compares the idea to a Nerf ball, something that was “harmless if thrown around, wouldn’t break if dropped or thrown, and could be interchanged and shared with other players.” There are also ideas for motion controllers where players hold “one in one hand, one in the other, just like the PlayStation Move,” Velazquez tells Polygon.
Microsoft works with these ideas for a while, until two new employees help dig up some old ideas.
Kudo Tsunoda throws a football through the TV. The TV throws it back. It requires some fancy editing to make it look right, but he’s able to play a virtual game of catch without a controller, just his hands. He is, in essence, the controller.
Tsunoda joins Microsoft in 2008 along with Darren Bennett as general manager and creative director, respectively. They join forces with Kipman. As others experiment with what might be Xbox’s Wii equivalent, Tsunoda, Bennett, and Kipman, along with a small team, work on their own skunkworks project based around depth cameras and depth sensing. The key difference between this and the previous depth initiative is that the new effort brings machine learning into the mix, to identify a user’s skeleton with PrimeSense’s technology. They make several different demos, and, much like with the Wii before it, they pitch new ways to engage with games, ditching the standard controller in favor of a camera that can read the human body’s movements, translating that data into on-screen actions.
It’s these demos that start to change the course of Microsoft’s vision for the future. With Kipman at the head, the small team is able to show, not tell, what a device utilizing depth-sensing and skeletal tracking technologies can bring to gaming.
“He’s really great with presentations, and sort of persuasive,” Velazquez tells Polygon. “Being able to actually create these experiences for people to do it so that they can see what it could become was really the most powerful part that, I think, led to his team really getting the funding and the traction to kick this off in full gear.”
[Ed. note: Microsoft declined to participate in this story, turning down interview requests for Bennett, Kipman, Tsunoda, and others who still work at the company.]
The team also has what Bertolami calls a “shockingly practical and accurate” pipeline for where it thinks the technology will be in two years — presumably when it might launch. The tech is brittle, and Kipman’s team perhaps cuts a few corners for the sake of the presentation, but the group puts together the broad strokes of the algorithm and pipeline Kinect will end up shipping with.
“And that’s pretty remarkable given this was, like, late 2008, early 2009, and [they were] doing it basically just by themselves without a whole lot of help from anybody else,” Bertolami says.
The team’s demos, featuring pitches for depth sensing, voice recognition, and skeletal tracking, catch the attention of Microsoft executives. They come at a serendipitous time, too, as some in management at Xbox are starting to lean away from developing a Wii equivalent. They don’t want to make, as Bertolami puts it, a “Wii 2.”
Word comes from the top down that, despite initially walking away from it, Microsoft is going to invest in depth-sensing technology. The project is greenlit around the holiday season of 2008. The team working on motion controllers is absorbed by Tsunoda, Bennett, and Kipman’s team, and work begins in earnest in early 2009.
“[Kipman] was the person [who] saw it from the angle of Kinect,” Bertolami says. “The pitches that I was more clearly seeing were not the Kinect angle. There were other interesting angles as well, but the person who really saw that and really got it at the very earliest phase would be Alex. When he came up with that concept and started to build his team, and pitch it, then we put our heads together, because obviously there was a lot that we had learned in our exploration of depth-sensing cameras. And so that kind of all merged together. That’s why you’ll see a collection of different people’s names on a lot of the early patents, because it was a combination of a core team that he was staffing and then also people who were sort of already involved in depth camera investigations for Xbox.”
The Wii is no longer a blueprint for the company, but an inspiration, an impetus to do something bigger, better. “Instead of just having this view into what a wand is doing, how a wand is moving through a 3D space, we’re going to understand how the bones of your body [work], and how your actual body movement is actually happening,” Bertolami says.
Kinect is born, at the time codenamed “Natal” after a city in Kipman’s native Brazil. Over the next few years, it will spin up to become a massive project for Microsoft.
It’s 2009, and Velazquez is traveling the world, going into strangers’ homes and taking pictures of their living rooms. He has the Wii on his mind.
Velazquez is doing ethnographic research, going around the United States, as well as to Asia and Europe, looking at what accommodations people have to make to use the Wii, in an effort to understand what Microsoft will need to do to make Kinect fit comfortably in people’s homes. It’s a bit of a tricky dilemma, as the Wii and Kinect are fundamentally two different pieces of hardware — the Wii needs to sense the data coming from two Wiimote controllers, while Kinect needs to register not only a player’s body and movements, but the room they’re standing in. Compounding that issue is that every body and every room are different. How will the Kinect read both a tall father and his short daughter? What about a brightly lit living room versus a dimly lit basement? How about the size of a large living room in the Midwest — where property is comparatively cheap — versus the tiny, cramped living spaces of most New York City apartments?
“We were taking pictures of those homes and bringing [them] back to the engineering team so they could look at [them] and see all the different environments that this had to work in,” Velazquez says.
“Like, some people had their living room set up for a Wii all the time,” he says. “Other people, when they wanted to play Wii or whatever, they had to significantly adjust and move the coffee table out of the way and adjust it so far and do all this other stuff.”
One of the solutions Microsoft lands on ends up being a controversial — and very expensive — one.
“One of the biggest investments we made was just in the motor to allow the Kinect to adjust itself,” Velazquez says. “I won’t say exactly how much it was, but let’s just say it was more than a dollar. We sold 10 million Xbox Kinect units in the first  days, or something like that, so, you know, however many dollars it was, it was more than 10 million.”
From the earliest days of Kinect, one of the device’s core pillars was about immersing the player in the games, removing the controller and replacing it with the human body. That immersion is broken, Velazquez argues, if players are constantly having to adjust the device so that the camera can detect them. As he puts it, “It’s breaking the magic.”
“At the end of the day, that decision was made at the higher levels,” Velazquez says. “And ultimately, I feel, they made the right choice and they made that investment.”
Similarly, Bertolami is researching what it will take to make Tsunoda, Bennett, and Kipman’s demo something shippable in a consumer form. Bertolami’s team is tasked with looking over the early pipeline, assessing the kind of work it will take to build up a skunkworks project into something that can “ship out to tens or hundreds of millions of people,” as he puts it.
“One of the major struggles was performance,” Bertolami says. “Early on, we knew that you could deliver a very good experience if you had significantly more performance than what the Xbox 360 would have in it, or would have to spare, rather.”
The problem surfaces early and persists throughout Kinect’s development. Over the course of approximately 22 months, Microsoft’s product team makes adjustments to Kinect, refining and optimizing it until it delivers the same experience and quality of tracking with only a fraction of the performance initially needed. That work requires adjusting the camera’s resolution, working with Microsoft Research and game developers to assess how the technology is performing in-game, deciding what it can run on the GPU and CPU, and “making lots and lots of adjustments and tunings.”
On the software side, developers within Microsoft are pitching ideas to capitalize on Kinect’s motion controls, while also trying to make the peripheral accessible to the whole family. Internally, people working on the Kinect abide by a set of principles, such as that Kinect should, as one former development lead speaking anonymously puts it, “be as fun to watch as it is to play. It should be easy to start. It should work for everyone.”
“It’s all about broadening the market,” he adds. “The original pitch for Kinect Adventures — it’s a game that you play with your family. Get the family together. Get together Mom and Dad and kids, and have an adventure, compete against each other.”
Kinect Adventures, developed by a team of approximately 40 people at Microsoft Studios subsidiary Good Science Studio, is made as a pack-in title for Kinect. It features five different game types, including Reflex Ridge, where players race along a moving platform avoiding, ducking from, or jumping over obstacles along a path. It is inspired, the lead says, by “Brain Wall,” a segment on the Japanese game show The Tunnels’ Thanks to Everyone, where contestants try to quickly position their bodies in ridiculous poses to fit through similarly shaped holes in a moving wall coming toward them. The game also gives the team a good chance to test the in-development Kinect in an actual game.
“I mean, we did a lot of work with the development of the platform itself,” the lead says. “When I first [joined] the team, it was a lot of work on partnering and working with the different skeletal tracking systems that were being developed to try to actually run the core of the system.”
“The research itself was evolving as we were building the product,” Bertolami says. “This was probably one of the first products I worked on where research was so heavily involved with the product effort.”
In Redmond, Washington, where Microsoft is based, group program manager Richard Irving is brought on board to help ensure the Kinect will be a commercial success.
From the jump, Kinect is an ambitious project. The device is to combine depth technology, skeletal tracking, and voice recognition, all into one mass-produced product that needs to hit a standard retail price for a video game peripheral — before a lot of this technology has been produced at such a scale. So, Microsoft pulls Irving and a group of other leaders from their current projects and places them on the Kinect initiative to figure it out.
“At the time, we didn’t know what it was going to take to actually deliver Kinect,” Irving says. “We just knew that it was going to be big and impactful and important.”
As it turns out, it’s going to take a lot of hands on deck — thousands of hands to solve trillions of problems. When Microsoft begins to dig into all the variables its new product will encounter in homes around the world, it becomes evident that there is no easy solution.
“We did the math at one point, and if you were to try and test all of the possible permutations, it was over a trillion test cases that you would’ve had to run manually in a lab somewhere,” Irving says. “There just wasn’t enough time in computing history to actually have tested all of those cases in a lab, and so the question was, How do we get all of these scenarios covered in the time frame that we need to?”
It isn’t just about living room sizes and human height. Microsoft needs to test for as many ethnicities, accents, dialects, skin tones, hair types, clothing types, and other variables as possible to get Kinect out the door. The solution is the biggest “take-home program” Microsoft has ever done up until this point, Irving says. Employees at the company at large, not just in the Xbox division, are invited to take home and test out pre-release hardware. It is QA testing on a mass scale.
Letting employees take Kinects home helps expose the in-development hardware to numerous variables it might not easily encounter at Microsoft. A former lead at Microsoft Studios, speaking to Polygon anonymously because of nondisclosure agreements he signed while with the company, says the Kinect had trouble sensing his wife, a tall Asian woman. It wasn’t necessarily a skin tone problem, he says, but rather a body type problem. “A lot of the early skeletal data was from volunteers at Microsoft — you know, a lot of white guys.”
“We had bugs where pregnant women at different phases would get different sets of results tracking, because the [machine learning] didn’t know what to do with the bump on their stomach because we didn’t have data for it,” he continues. “Voice is one that the data set [...] worked way better if you spoke U.S. English than it did if you had an accent.”
“One of the funnier stories is, we had to remind people that, ‘Hey, we are actually looking at these images,’” Velazquez says about the take-home testing. “‘So, just remember that. Don’t play in front of your Kinect naked.’”
The Kinect is no longer just an Xbox project — it is a Microsoft project. Nongaming divisions of the company, such as Microsoft Research and Windows, are brought on board to help out. The Bing team plays a significant role in bringing the Kinect’s speech recognition and natural language processing online. Given the scale of the effort, only a short list of Microsoft companies and divisions are left untouched by Kinect.
Much like how it had to come up with new ways to test software, Microsoft has to come up with new ways to manufacture its new peripheral at scale; PrimeSense’s technology has never been manufactured at the scale of a consumer electronics device before. Doing so requires inventing new manufacturing processes, techniques, and equipment. To test this new pipeline, Microsoft builds a facility that allows the company to invent and refine the processes needed to get Kinect made.
“There was a point in time in the development life cycle where all of the hardware was being built at that small-scale manufacturing facility,” Irving says, “and they were using that process to test the manufacturability of Kinect.”
Having so many people on board is exciting, say those interviewed for this story. And it speaks to the company’s wider ambitions for Kinect.
“From Microsoft’s perspective, it wasn’t just about video games. Right? It was about the future of computing,” Irving says. “Which, if Microsoft is really going to bet its resources on something like Kinect — gaming is a great business, but Microsoft is so much bigger than gaming. When you look at what Microsoft cared about [with] Kinect, they really cared about the future of computing.”
Steven Spielberg is standing on stage next to Don Mattrick at the Galen Center in Los Angeles. The storied director is here to help show Kinect — using the codename “Natal” — to the world at Microsoft’s 2009 E3 press conference.
“Two months ago, Don shared with me the Natal experience, and the gamer in me went out of my mind when I got to really be interactive with this,” Spielberg tells the crowd. “I think more dramatically, I felt like I was present for [a] historic moment. A moment as significant as the transformation from the square-shaped movie screen to CinemaScope, and then to IMAX. So as a creator I could suddenly envision a new way of personalizing gameplay, the gameplay experience, making it possible to even change the paradigm of storytelling and of social interaction.
“I think what Microsoft is doing [with the Kinect] is not about reinventing the wheel. It’s about no wheel at all.”
Spielberg’s script hits all the Kinect talking points, specifically highlighting how the device will appeal to people who are otherwise too intimidated by standard controllers. Despite how big the game industry is in 2009, 60% of homes still don’t own a console, he says.
“And Don and I have always agreed that the only way to bring interactive entertainment to everybody is to make the technology invisible,” Spielberg says. “Only then can we shine the spotlight where it belongs, which is on you and the fun you can have with a technology that recognizes not only your thumbs and your wrist, but your entire being.”
All in all, Microsoft spends around 25 minutes of its press briefing talking about Kinect. It shows off a game where players can use their hands to paint on a digital canvas, and a soccer experience where players wave their hands to deflect incoming soccer balls. The show closes with a demo for Milo & Kate, the next game from storied developer Peter Molyneux — a project, the developer says, that will be “a landmark in computer entertainment.” It will end up never coming out.
In Cambridge, Massachusetts, Kinect comes along at a near-perfect time for Harmonix CEO Alex Rigopulos.
In 2009, sales of Harmonix’s smash-hit Rock Band series are declining. The developer is deep in research and development, trying to figure out what its next big franchise will be. Its conclusion is a full-body dancing game. Initially, Rigopulos says, the idea is that players will strap 3D spatial trackers onto their wrists and ankles — either devices created by Harmonix, which had plenty of experience developing peripherals, or an off-the-shelf alternative. Neither option, he adds, is ideal.
“And then, out of the blue, Microsoft contacted us — we had worked with them quite a bit on the Rock Band franchise — and disclosed us very early on their plans for Kinect,” Rigopulos says. “We were just blown away, because it was the perfect technical solution for the game we wanted to build, and they were going to be manufacturing it and marketing it as a first party, which relieved us of having to do that ourselves.”
Harmonix is all-in on Kinect.
Patrick Hackett, on the other hand, is unimpressed with Kinect. It’s vaporware, he says, an “awful peripheral that will die immediately.” Drew Skillman doesn’t disagree.
“As an intelligent consumer, you can kind of check off the things that are going to do well and not. That one just didn’t feel like it was going to do well,” Hackett says about his initial impressions seeing the peripheral. “Then I got to play with it, and it was a very different story.”
It isn’t until Hackett, a senior gameplay programmer at San Francisco-based developer Double Fine Productions, sees Skillman, his co-worker and Double Fine lead technical artist, making prototypes with the Kinect that his interest is piqued. He’d never seen a depth camera before — he didn’t even know what one was — but after seeing Skillman using the Kinect not as a game peripheral but a “piece of technology,” he is all-in.
“There was someone in New York at [New York University’s Interactive Telecommunications Program] who wrote a set of libraries for Processing,” Skillman says. “And there was this implementation — he, like, wrote the glue that let anyone play with the Kinect as a piece of technology. And that just totally opened so many doors and it introduced so many of us to ‘Whoa, there’s all this crazy shit you can do if you have a depth camera and skeleton tracking and it’s matched to an RGB image.’”
“It was at that point that I started to realize that, like, ‘Oh my God, these depth cameras are things that I never understood before and suddenly they’re going to be everywhere,’” Hackett says.
For developers, making games for Kinect often means fundamentally changing design philosophies, and having to change ways of thinking around simple features such as menus. On the one hand, at Harmonix, this requires a lot of iteration and experimentation, and, in the case of menus, completely starting from scratch and having to reinvent design and user interface primitives, Rigopulos says. On the other hand, it is artistically liberating for the company to get the chance to reinvent the wheel.
“It’s normally not a priority when you’re developing [a] regular console game,” Rigopulos says, “but all of the sudden you’re given new input means and you just have this new opportunity to say, ‘OK, let’s make just navigating the menus an adventure.’”
In a game using a controller and standard button-press inputs, knowing whether or not someone presses the “jump” button is a relatively easy task. On Microsoft Studios’ Kinect launch title Kinect Adventures, it takes two engineering years to get it right.
As with QA testing the peripheral itself, the issues come down to just how many variables Kinect might encounter in the public’s hands, says the former lead designer. For example, if a heavier-set person turns rather than jumps, the sensor can lose track of where it thought the user’s hip bone was. If the Kinect isn’t angled so that it can see the floor, then it can’t see whether a player’s feet leave the ground — another problem Microsoft has to compensate for. It also has to detect jumps that don’t happen, and players who can’t physically jump off the ground. Jumping proves to be a “super expensive” problem for the studio, according to the designer.
Some employees get creative when testing games on Kinect. Kristie Fisher, a game user researcher for Microsoft, invites friends — who are willing to sign nondisclosure agreements — over to QA-test the game Dance Central. Packed into her “small” apartment, they are able to test the game in a party scenario, she says. Numerous people use cardboard cutouts in place of actual humans when running skeletal tests. “We bought a bunch of cardboard standees of Gandalf, Elvis, and Darth Vader we used to test when we didn’t want to get up from our desks,” says Tim Schafer, co-founder of Double Fine. Additionally, at Microsoft, Kinects are set up in The Commons, a shopping mall on Microsoft’s campus where employees can eat, get their hair cut, or have large meetings.
“I think you had to be super creative with it,” Hackett says.
“Yeah,” Skillman echoes. “I feel like it really was at the start of this trend towards if you want to do things that are crazy and really creative, you can’t get bogged down spending days implementing the basics of the functionality of whatever the peripheral is or the sensor is, whatever. And the Kinect was perfect in that it did all of the heavy lifting and it gave you this box to play in. So all you had to do is, you just had to be willing to go apeshit inside of that box — which was very much the style of Double Fine development.”
Double Fine especially has fun when developing for Kinect. To test games, employees bring their kids into work. They take the hardware out in public, bringing still-in-development Kinect games to bars and restaurants. Sometimes, waiters and waitresses get so intrigued by what Double Fine is doing, they come over and play too. The team wants to put Kinect on the back of a truck bed and drive it around San Francisco, but the device’s infrared sensor gets blown out in sunlight, which hinders their plans. They find plenty of other ways to play with the peripheral, though.
“We did find one time that bouncing a yoga ball, it made the Kinect think it was a person in a fetal position bouncing up and down,” Hackett says. “Which was, like, terrifying when you show the debug render. It was really gross.”
Hackett and Skillman make numerous prototypes, including inappropriate ideas they are never “allowed to check in, ever.” Other less lewd ideas simply aren’t good fits for Kinect’s target demographic.
“We did all this work, which we’re still really proud of, to create a background plate based on your movements,” Skillman says. “So as you move around and play the game, we capture background pixels, which lets us do things like entirely remove your body and do all kinds of weird stuff where we take you out of the scene. [We created] a sniper experience where you get shot in the head and we can actually remove your head and make it explode. Which is, of course, not shippable.”
Microsoft’s plan is to load the Kinect with family-friendly titles at launch — which make developers like Double Fine and Harmonix great candidates to develop games for the device. Focusing on family-friendly games comes down to pinpointing what features the company wants to put front and center when revealing Kinect to the world, says Irving.
“The challenge when you have something with a product that has never been to market before — that consumers have never really seen before — the challenge that you have is figuring out what are the one or two or three things that you’re going to tell consumers that it does, and how are you going to back that up and reinforce that every step of the way?” he says.
“And so, despite all of the cool things that Kinect could do, what we decided that it should do at launch was stand-up, full-body tracking. And there [was] a set of games that [lent] themselves to stand-up, full-body tracking, and they were much more family-oriented.”
Irving continues, “When we did all of our market research and user testing, those were the games that were resonating most with consumers along with the capabilities of the device.”
Nintendo’s Wii also plays a big part in that decision making. As Irving points out, the Wii had been successful at two things: getting people off the couch and playing, and, more importantly, getting people who normally didn’t play games to, well, play games. It appealed to a broader audience than the standard video game console. Kinect, Microsoft hopes, will do the same thing.
“It wasn’t meant to be a hardcore gaming peripheral at all,” Velazquez says about Kinect. “It was specifically meant to bring in families and people you wouldn’t associate with an Xbox console.”
That’s how Kinect ends up with launch titles such as Kinectimals, Kinect Adventures, and Harmonix’s Dance Central. But targeting a device and games toward children means testing them with children. Which is no easy feat.
Unfinished games are, of course, unfinished, and testing them requires playing through buggy builds that don’t quite work right yet. It’s not necessarily a concept children understand, or are sympathetic to when they just want to play a video game. For Fisher, the game user researcher, working with children during the QA process feels like having to double as a preschool teacher or family counselor. “I saw some tears and some fights between siblings, and it was a little discouraging at times,” she says.
“I mean, kids are great,” a designer speaking anonymously says. “They’ll try anything, but they’ll tell you if it sucks. They don’t have any filter. They’re like, ‘This doesn’t work.’”
“There were times when you’d be demoing [games] and nothing would be happening; you would just be rendering the video from the perspective of the Kinect on a big TV,” Skillman says. “And then kids would always charge it. That was weird. Kids would just charge the TV.”
Bypassing its core audience isn’t a guaranteed strategy for Microsoft. Video game peripherals haven’t always had the best track record in terms of sales or quality, and the primary precedents for in-home motion gaming are the Wii and Sony’s EyeToy, a peripheral for the PlayStation 2, both of which are very different devices from Kinect. To get consumers on board, Microsoft needs to get the Kinect in front of people. And Velazquez needs to do some math.
In Redmond, Velazquez is running numbers. He’s surprised by the results.
To drum up excitement for Kinect, Microsoft stages a U.S. mall tour in 2010, setting up large kiosks where people can test out the device and play games. For a lot of people, it is enough to sell them on the idea of purchasing Kinect — and they are willing to pay a good amount for it.
Part of Velazquez’s job as product planner is estimating how many Kinect sensors Microsoft will sell, and at what price. Initially, the assumption at the company is that Kinect will launch at $100, he says, citing the prices and sales of similar products on the market. Many at Microsoft hope the device will ship at that price because it’s a “psychological sweet spot of a price point,” he adds. But as the company gets closer to launch, it notices that the more people demo the device, the more people want it.
“When somebody saw somebody playing the Kinect, their interest levels and purchase intent went up,” Velazquez says. “When somebody actually experienced a game for themselves, it was like another order of magnitude different. So it was distinctly higher than that.”
Microsoft raises the price of Kinect from $100 to $149 for launch, and Velazquez sees research showing it could charge “much more” than even that. The only specific reason he gives for why Microsoft doesn’t is the Xbox 360 Arcade Edition, which is priced at $199.99; people at the company feel it would be weird to launch a peripheral for the same cost as a full console. Irving, speaking about the device’s price change, says the Kinect is not a cheap device for Microsoft to manufacture. Raising the price helps cover the costs of the more expensive components in the device, such as the depth sensor. The company lands on the $149 price based on its need to make the Kinect a sound business, he says, as well as on what consumers would be willing to pay for a device that needs another piece of hardware, the Xbox 360, to work.
But still, $149 isn’t exactly cheap for a video game peripheral at the time. The EyeToy launched for only $49. Guitar Hero 5, then part of a multibillion-dollar franchise for publisher Activision, launched for only $99 when it debuted a year earlier than Kinect. To increase the value of the Kinect in consumers’ eyes, Microsoft bundles the device with games, such as Kinect Adventures, as well as with the Xbox 360 itself.
“[We] were motivated to charge as little as possible to accelerate adoption of the Kinect from the very beginning,” Irving says in a follow-up message, “but it took a long time to really understand how much everything was going to cost to build at scale, and how we were going to bring it to market (standalone, bundled with a game, console+kinect bundled together) to figure out a price point that would meet the two requirements of being inexpensive enough for consumers to adopt and a sustainable business for Microsoft.”
“In the retail industry, it’s always easier to start off high and then [go lower]; you can never really raise the price once you launch something,” Velazquez says. “So the question is, do we do the console strategy for most consoles except maybe the Wii? It’s like, you subsidize the first few consoles — you sell them almost at a loss — just so you could build up the base and make up the money in games. That could’ve been the strategy for Kinect, but it was just showing that people really wanted this. So, that was the interesting part as well — how well of a financial success, I would say, it was.”
In September 2010, Microsoft expects to sell 3 million Kinect units during the holiday season, and it hopes the device will extend the then-5-year-old Xbox 360’s life span by another five years.
“We’re treating the launch of Kinect as an entirely new platform launch, as almost a new generation,” Xbox product director Aaron Greenberg tells Gamasutra at the time. “For us that does extend the product life cycle.”
It doesn’t hurt that Microsoft is putting Kinect everywhere it can to increase interest. Leaning on the tagline “You Are the Controller,” meant to speak to the accessibility of the Kinect, ads are plastered on hundreds of millions of bottles of Pepsi, featured in nongaming magazines like People and InStyle, and run during TV shows such as Glee and Dancing With the Stars. Oprah Winfrey gives out free Kinects on her show. Microsoft throws a celebrity-filled launch party in Beverly Hills hosted by High School Musical star Ashley Tisdale. It holds an event with hundreds of dancers in New York City’s Times Square the night before the Kinect’s launch.
“This is about reaching new audiences and creating a spin wheel that will keep momentum going long after launch,” Xbox general manager of global marketing communications Robert Matthews says at the time in a news release about the ad campaign. “It’s about igniting consumer passion, empowering advocacy and amplifying that passion through our marketing.”
“In terms of scope and scale, Kinect is one of the most comprehensive marketing campaigns in Xbox history, as measured by the breadth of partnerships, digital and social marketing integration, and broad consumer outreach,” he adds.
Microsoft releases Kinect in North America on Nov. 4, 2010, with global releases following throughout the month. As Microsoft’s numbers predict, people really do want the device. The company sells an average of 133,000 units a day for 60 days, totaling 8 million units, winning Kinect the Guinness World Record for being the “Fastest-Selling Consumer Electronics Device.” By March 2011, Microsoft announces it has sold 10 million Kinects and 10 million retail Kinect games, calling the device “an overwhelming success.”
But all the money in the world couldn’t make Kinect a lasting success, whether that was the money Microsoft spent making and marketing it or the money consumers spent buying it.
Kinect’s 15 launch titles didn’t exactly set the world on fire, with the highest-rated game, Dance Central, earning an 82 on Metacritic, and the lowest, Deca Sports Freedom, landing at a 26. Most of the others fell somewhere between 40 and 70, with the average score for the lineup being 57.6.
Third-party developers weren’t exactly champing at the bit to develop for Kinect, either. Though the peripheral sold well, even 10 million units paled in comparison to the more than 55 million Xbox 360s in the wild at the time. Even if a game had a 100% attachment rate on the Kinect, it would still reach less than a fifth of the audience it might on the Xbox 360. Combined with the major shifts in development philosophy that Kinect required, that math kept a lot of major third-party studios away; they developed solely for the Xbox 360 instead.
And even if a third-party studio did decide to develop for Kinect, it had the device’s technical limitations to deal with. Kinect didn’t lend itself to experiences people had come to expect from Xbox 360 games, leaving some feeling as though the Kinect was a gimmick.
“When you made the decision to support the Kinect sensor, you were inherently limiting your audience,” Irving says. “If you were a big enough budget title or if you’re a studio that needs the economics to work for them, you’re going to target the largest set, the broadest configuration that you possibly can.”
Of course, Microsoft’s whole modus operandi when developing the Kinect was to target families instead of traditional players. A decade later, though, people speaking to Polygon for this piece have complicated feelings about that decision.
“I think in some ways it could have been a failure of expectations,” Fisher says. She adds that the games that did best for Kinect were always the games that worked around its technical limitations. “And those are family games, party games. I think when Kinect was originally announced, the idea was like, ‘This is the future. You’re just going to control a game with your hand. It’s going to be incredible,’ and so people were kind of picturing the full range of really complex game mechanics and the full range of really complex game experiences. And then I think there’s a broader mismatch between expectations and reality.”
“Would Kinect be more successful than it’s been if Battlefield and Call of Duty took great advantage of it? You know, maybe,” Irving says. “But that’s also IP that is incredibly well-refined for the gamepad and for the controller. The intersection of Call of Duty players and people who are good with the Xbox gamepad is incredibly high. Right? And so there are other scenarios that we imagined, shooters and other types of games, other genres of games, that [lent] themselves to older audiences, taking advantage of Kinect with speech recognition and head tracking and things like that. But I don’t feel bad that there weren’t more of those games that took advantage of it, because those games were already great with the controller.”
It was a perfect storm of issues that kept a lot of developers and players from adopting the Kinect. But that didn’t stop Microsoft from continuing to support it. The company had already made a large-scale investment in developing, marketing, and releasing the device, and it continued to push the Kinect forward into its plans for the next generation of consoles.
The device was a part of Xbox’s future.
Until it wasn’t.
When improving Kinect in its next iteration, Microsoft first made some obvious choices: better speech recognition, lower latency, new depth-sensing technology, things like that. Technical improvements people had come to expect from a new generation of hardware.
“Basically, [Microsoft was pitching us on] a much-improved-on-every-axis version of Kinect in terms of the quality and reliability of the sensing data you were going to be getting off the device,” Harmonix’s Rigopulos says. “Because there was a fair amount of fudging that had to be done in the first generation of Kinect to deal with the latency or noise you would be getting off of the device. A lot of that was going to be cleaned up with the new Kinect.”
But the biggest change was how the next Kinect would be sold. The peripheral was to be packed in with every Xbox One, Microsoft announced at the console’s reveal event in May 2013. Not only that, but Kinect would need to be plugged in at all times for players to even use the console. Microsoft called the new Kinect an “essential and integrated” part of the Xbox One, saying “game and entertainment creators can build experiences that assume the availability of voice, gesture and natural sensing, leading to unrivaled ease of use, premium experiences and interactivity for [players].”
“The primary reason for including an Xbox Kinect in every Xbox One, initially, was because you want the developers to be able to say, ‘Yeah, this is available to every single person who has that console, so it’s worth my time, resources, and investment dollars to develop something with the Kinect in mind, because everyone has it,’” Velazquez says.
In August 2013, in response to consumer backlash, Microsoft walked back the news that the Xbox One required the new Kinect to be plugged in, announcing that the console would function without it. But the sensor was still bundled with every Xbox One, which launched on Nov. 22, 2013, for $499 — $100 more expensive than Sony’s PlayStation 4, released the same month. In May 2014, Microsoft walked that back, too, announcing that a Kinect-less Xbox One would be launching the next month for $399.
Part of the decision to downplay the Kinect’s role in the Xbox One came down to the same factor that had sparked its inclusion in the first place: developer support, or in this case, the lack of it.
Irving, who didn’t ultimately make the decision to unbundle the Kinect, but had input into it, boils it down to two reasons. One is that bundling the Kinect with every console was an expensive proposition. “That’s an OK cost to pay, assuming the lion’s share of your portfolio is taking advantage of it. That helps justify or rationalize the investment,” he says. “When they’re not, then you look at it and you have to ask some critical questions, like [...] ‘Is including this in every package or bundle really to our benefit?’”
The second reason Irving offers comes down to asking people what they’re buying the Xbox One for, whether that be “console and the Kinect and the games? Or [are they] buying it for different reasons?”
“And so ultimately what it came down to is, some game developers are adopting, some game developers aren’t. Some consumers want this, some consumers don’t,” Irving continues. “Gosh, let’s give the flexibility to every developer and to every player — every gamer — to decide, ‘Do I want the Kinect-enabled experience or not,’ and create the ability to buy [an Xbox One] without a Kinect sensor.”
“As an outside observer, it seems like one clearly sold better than the other,” says Irving, who left Microsoft in 2016.
The news was brutal, Rigopulos says.
Harmonix had been working on a new motion-controlled game called Fantasia: Music Evolved in conjunction with Disney. It featured songs from popular artists such as Lady Gaga, Queen, and Nicki Minaj. It was an expensive, ambitious project, Rigopulos says: “The entire business case for that project was premised upon the assumption that the Kinect was going to be bundled with every Xbox One.”
“I don’t begrudge them that at all,” he continues. “It was a necessary decision that they had to make to adjust to the market reality at that time, so it was a completely reasonable decision on Microsoft’s part. But it was a decision that really made it much more difficult to make that project viable.”
For a bit, Harmonix continued to prototype new ideas for Kinect, both during and after Fantasia’s development, but ultimately the company put them on the back burner until there was evidence that there was a high enough install base for Kinect on Xbox One. “Which never really materialized in a big way,” Rigopulos says.
By E3 2015, Microsoft was silent on the Kinect. In 2017, the company announced it was no longer manufacturing Kinect for Xbox One, and would cease selling the device once retailers sold out. The Xbox One S and Xbox One X both shipped without the dedicated ports for Kinect, requiring USB adapters to use it — and Microsoft discontinued those adapters in early 2018.
As far as the game industry was concerned, Kinect was dead.
In Mountain View, California, Drew Skillman keeps looking at his laptop, repeatedly pulling up Kinect videos on his Vimeo page.
Skillman and Patrick Hackett, his former Double Fine co-worker, both work for Google now; they co-created Tilt Brush, a VR app for room-scale painting. When Hackett heard Microsoft was discontinuing Kinect, he says he immediately went out and bought one and re-downloaded Double Fine’s Kinect Party. He still has it. He plays it with his kids.
For Skillman, the Kinect holds a special, and very important, place in his heart.
“It’s so funny, at least speaking for the two of us. I’m looking at my Vimeo page, and the Kinect basically shaped my entire career,” he says.
Kristie Fisher also works for Google now. Richard Velazquez is at Amazon. Richard Irving spent time at Hulu before moving on to his newest, currently unannounced venture with, per his LinkedIn, “a founding team that includes the creators of Xbox Live, Kinect for Xbox 360, and Hulu with Live TV.” Joe Bertolami works for Snap Inc., the company behind Snapchat; Alex Rigopulos is still the CEO of Harmonix; Tim Schafer is still leading Double Fine, which Microsoft acquired last year; and Brian Murphy is the co-founder of the VR and AR development studio Drifter Entertainment.
Many people speaking for this story point to the influence Kinect had on the tech industry.
“Yeah, it was almost [necessary that] someone had to go first, and that’s where everyone learned all these lessons,” Skillman says, laughing.
Kinect didn’t take off the way some people hoped, but motion control lives on in virtual reality gaming. In Rigopulos’ opinion, VR delivers on the promises of motion gaming in ways Kinect never could. “With Kinect you were still limited to a screen, a TV screen, which was this small window into a world,” he says. “The beauty of VR is you’re in the world.”
“I view [Kinect] as a material step forward, a discontinuous leap forward in the evolution of motion gaming that accomplished a lot despite its shortcomings,” Rigopulos says. “And I think [it] helped pave the way for some of the fruits of that advancement that are going to come to bear in VR.”
Amazon has sold more than 100 million Alexa devices, which, like the Kinect, use voice recognition to register user commands. New iPhones come with a forward-facing camera that uses depth-sensing technology to scan a user’s face, letting users unlock their phones or access sensitive material such as passwords. Kinect was one of the first mass-produced depth cameras for consumer use.
Many of the people interviewed have also encountered Kinect in their professional lives, often outside of gaming. “After Microsoft, I was at a company called OTOY,” Bertolami says. “OTOY’s very big on camera technologies and AI and rendering. We were using Kinect at one point, because we were evaluating whether Kinect could give us a better reconstruction of actors versus some of the other stuff that OTOY was also developing. [...] Kinect is one of the least expensive ways to do that — as long as the quality level is acceptable to you.”
Kinect has been used for stroke recovery, to translate sign language, by NASA to control a robotic arm, and in the demilitarized zone between North Korea and South Korea to monitor for objects crossing the border. It might’ve died in the game industry, but Kinect has lived on through alternate uses.
In February 2019, Microsoft even announced a new Kinect, albeit without any gaming functionality. The Azure Kinect is a PC peripheral and development kit, allowing people to use its artificial intelligence to build computer vision and speech models.
“Kinect’s greatest strengths were getting you and my friends and the industry to think really, really hard and differently about the experiences they were having on their game console and on other devices,” Irving says.
“To that end, it was really a great story about how so many different disparate parts of Microsoft — who, again, wouldn’t otherwise work on video games — came together to work on this really cool thing,” he says.
“There was really no prior art to draw from or to be constrained by,” Skillman says. “Pretty much everything you did — based on our experience, at least — was the first time [people] had seen it and done it.”
“Yeah,” Hackett says. “It was weird and new.”
Special thanks: Kenneth Shepard
Correction: A previous version of this article said the Xbox 360 Arcade Edition launched at a price of $199. The console debuted at $279.99 and received a price cut one year later to $199.99. We’ve edited the article to reflect this.