Most software today must support both novice and expert users. You can let novices discover available features through visible clues, such as drop-down menus that list all potential actions, or, as Microsoft's Ribbon tries to do, you can show the most relevant actions all at once depending on the user's current context. Supporting novices is all about visibly showing them what actions and features are possible. Expert users, on the other hand, can quickly tire of and become frustrated with the same scaffolds that help novice users so much. If an expert already knows the specific action to take, even a few extra mouse movements and clicks per action will start to slow them down. This is why we usually offer keyboard shortcuts for known actions, from simple ones like copy and paste all the way up to the most niche actions found in software such as Adobe Photoshop. So how can you make it clear to users what is possible? A great place to start is to look at how people interact with objects in the real world, and the idea of affordances (Norman, 1990; Gibson, 1979).
In the real world, it's clear what we can do with many physical objects. We don't think twice about picking up a book, flipping through its pages, or handing it to someone else. We can pick up a photograph and turn it over to see if anything is written on the back explaining where or when it was taken, or who is in the picture. Interacting with these real-world objects just seems natural. One of the goals of gesture-based systems is to bring this natural feeling of interacting with real-world objects to how we interact with digital objects; this is why such systems are sometimes referred to as Natural User Interfaces (NUIs). Therefore, rather than trying to come up with new, complicated ways to interact with digital objects, your first goal when designing gesture-based systems should be to leverage how people already interact with objects and with each other.
With traditional, mouse-based systems, how can people tell which items on a computer screen are clickable, or how would you tell the difference between regular text and a link on a web page? Affordances are the characteristics of objects (both real and digital) that provide visual clues about what interactions are possible with them. For example, when you walk toward the front door of a building you've never visited and reach out to open the door, how do you know whether to push or pull? Even if there is a "push" or "pull" sign on the door, you probably don't really think about it; you just interact with the door. Chances are the door has visual clues, or affordances, that you pick up on. If the door has any sort of physical handle that you could wrap your hand around, you will likely grab that handle and pull. Likewise, if the door has a flat piece of metal at about arm height that looks different from the rest of the door, you will likely place your hand against this area and push. You may not notice these types of affordances of everyday objects, but you interact with them all day.
When designing software interfaces, it is therefore important to design objects so that they give off affordances that will make interacting with the software as simple as walking up to and opening a door. In the same way that you shouldn't have to spend time trying to understand how to open a door, your users shouldn't have to waste time trying to understand which objects are clickable, which objects allow text to be entered, and which objects simply cannot be interacted with.
I have spent time introducing the concept of affordances because one of the key challenges when moving to gesture-based systems is offering affordances that sufficiently guide users. Showing what's possible becomes more difficult in the gesture world. The fact that something is visible goes a long way in letting you know that you can interact with it: you see something, you may touch it. But now imagine the case where you provide pictures that can be moved with just one finger, while two or more fingers can be used to shrink or stretch the picture. With gesture- and touch-based interactions, it is best to make possible actions as explicit as possible. Try to get people engaged by making the interface seem inviting, fun, and playful. I will discuss the idea of engagement further in the following section, but for now just understand that a great way for people to learn how to use your system is to get them to touch it and to want to keep touching it. All it takes is for a user to accidentally touch a picture with multiple fingers and see it change size to understand that such things are possible. After one such unexpected, and hopefully fun, encounter, users will likely become more comfortable just trying things out to see what is possible.
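The one-finger-moves, two-fingers-scale behavior described above can be sketched as a small piece of gesture logic. This is a hypothetical, minimal sketch, not any particular toolkit's API; the function names and the representation of touches as (x, y) tuples are my own assumptions.

```python
from math import hypot

def classify_gesture(touch_points):
    """Map the number of active touches to the action it affords:
    one finger moves a picture, two or more scale it."""
    if len(touch_points) == 1:
        return "move"
    if len(touch_points) >= 2:
        return "scale"
    return "none"

def scale_factor(start_points, current_points):
    """Pinch/stretch factor: ratio of the current finger spread
    to the spread when the gesture began."""
    (x1, y1), (x2, y2) = start_points[:2]
    (x3, y3), (x4, y4) = current_points[:2]
    start_spread = hypot(x2 - x1, y2 - y1)
    current_spread = hypot(x4 - x3, y4 - y3)
    return current_spread / start_spread
```

For example, fingers that start 100 pixels apart and end 200 pixels apart yield a scale factor of 2.0, stretching the picture to twice its size. The key design point is that both behaviors are discoverable by accident, exactly as described above.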
There are some ways to make possible actions much more explicit. "Phicons" are physical icons that can be used to show possible actions. Phicons are really just small physical objects (think chess- or checkers-piece sized) that a system can identify through tags that cameras pick up and recognize. Microsoft Surface uses these types of tags, which are recognized by its five cameras sitting below the screen. Phicons can be kept off the screen when not in use, serving a similar purpose to menus (keeping functionality out of the way) but without taking up screen real estate. Menus, moreover, are difficult to place well when multiple concurrent users share a screen, each with their own physical orientation to it. You can put images or text on phicons to give users information about what action or idea they represent.
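Under the hood, recognizing a phicon usually reduces to looking up the tag the cameras detected and triggering the action that phicon represents. The sketch below assumes this lookup-table design; the tag IDs, action names, and function signature are all illustrative, not the actual Microsoft Surface API.

```python
# Hypothetical registry of tag IDs to the actions their phicons afford.
# Real tag identifiers depend on the tagging scheme the cameras read.
PHICON_ACTIONS = {
    0x2A: "magnify",   # magnifying-glass phicon
    0x3B: "annotate",  # pen phicon
}

def on_tag_detected(tag_id, position):
    """When the cameras recognize a tag, return the action it affords
    at the phicon's on-screen position, or None for unknown tags."""
    action = PHICON_ACTIONS.get(tag_id)
    if action is None:
        return None  # ignore unrecognized objects rather than guess
    return (action, position)
```

Keeping this mapping explicit means adding a new phicon is just adding a registry entry, which mirrors the design goal above: functionality stays off-screen until its physical object appears.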
For example, imagine a round magnifying-glass phicon, maybe four or five inches in diameter, that can simply be placed over a section of the screen to magnify it. The zoom amount can be increased or decreased by rotating it. Compare this to looking for, or trying to remember, where the zoom function is hidden in a menu. This example introduces subtle yet important challenges with multi-user systems. In many multi-user systems, users will typically be positioned so that they each have a different orientation to the screen. Therefore, if instead of a phicon for magnification we wanted to reuse a traditional zoom function, where would we put it? What is right side up for one user would be upside down or sideways for others. In addition, if one user increases the zoom, it changes what all users are looking at. With the phicon approach, users can choose to magnify only a specific area of content: the area covered by the magnification phicon.
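The rotate-to-zoom mapping for the magnifier phicon can be expressed as a simple function of rotation angle. This is a sketch under assumptions of my own: the 30-degrees-per-doubling step, the function name, and the exponential mapping are all illustrative choices, not something the text specifies.

```python
# Assumed step size: every 30 degrees of clockwise rotation doubles the zoom.
DEGREES_PER_DOUBLING = 30.0

def zoom_for_rotation(rotation_degrees, base_zoom=1.0):
    """Map the phicon's rotation to a zoom level: clockwise rotation
    increases magnification, counter-clockwise decreases it."""
    return base_zoom * 2.0 ** (rotation_degrees / DEGREES_PER_DOUBLING)
```

With these assumptions, a 30-degree clockwise turn doubles the zoom and a 30-degree counter-clockwise turn halves it. An exponential mapping keeps equal turns feeling like equal zoom changes at any magnification, and because the zoom applies only to the region under the phicon, other users' views are unaffected.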
Along with usefulness and usability, we always strive for desirability in designing good user experiences. In traditional systems design, visual aesthetics plays a big role in increasing the desirability of products, as well as contributing positively to usability when applied properly. In fact, studies have shown that systems that look more aesthetically pleasing are assumed to be more usable: when users evaluated systems that differed mainly in aesthetics (with functionality and usability held constant), aesthetics affected how they rated perceived usability. Another reason to design engaging experiences for your users is that the experience can have residual effects that last well beyond the interactions with your system. Emotional attachment, whether positive or negative, created during interactions with a product transfers to how people feel about that product's brand.
With gesture-based systems we of course continue to include desirability as a major goal, but going beyond just visual aesthetics, something that we can call "interaction aesthetics" becomes even more important than with traditional systems. If visual aesthetics have to do with how we feel when we visually perceive something, interaction aesthetics have to do with our perception of how it feels when we interact with something. Interaction aesthetics are all about how it feels to use a product during an activity over time, including very subtle aspects of the interactions that we may not even notice consciously.
The ideal type of interaction is what philosopher Martin Heidegger called "ready-to-hand," which is when we don't even notice a tool that we are using, and in some ways the tool becomes an extension of our own body or self. For example, once you get into the flow of hammering, you think about hitting the nail more than you think about the hammer itself. To contrast with the good engagement of "ready-to-hand," Heidegger used the term "present-at-hand" for the case where you are unfortunately too aware of the tool you wish to use. For example, while in the flow of writing a paper using a word processor, all of a sudden your flow is interrupted by the software asking whether you are sure you really want to do an action that you've already specified you want to do. Your goal as a designer is to keep people engaged in a seamless flow of activity, or, as Heidegger would put it, you want your gesture-based system to always seem "ready-to-hand."
The big challenge for you, then, is to never force users to look for hidden features, actions, or interactions, because this interrupts the flow of activity and therefore decreases engagement. Possible interactions should be perceivable just by looking at the surface features and affordances of both the digital (on-screen) objects and the physical objects (such as digital cameras and phicons) that make up your system.