From Text to Pictures

The researchers are developing new theoretical models and technology to automatically convert descriptive text into 3D scenes that represent the text's meaning. They do this via the Scenario-Based Lexical Knowledge Resource (SBLR), a resource they are creating from existing sources (PropBank, WordNet, FrameNet) and from automated mining of Wikipedia and other unannotated text. In addition to predicate-argument structure and semantic roles, the SBLR includes necessary roles, typical role fillers, contextual elements, and activity poses, which enable deep analysis of input sentences and assembly of appropriate elements from libraries of 3D objects to depict the fuller scene a sentence implies. For example, "Terry ate breakfast" does not tell us where Terry ate (kitchen, dining room, restaurant) or what he ate (cereal and a doughnut, or rice, umeboshi, and natto). These unstated elements must be supplied from knowledge of the typical role fillers appropriate to the information that is specified in the input. Note that the SBLR has a component that varies by cultural context.
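The default-filling step described above can be sketched in a few lines. This is a hypothetical illustration only: the dictionary, role names, and `fill_roles` function are invented for this example and do not reflect the actual SBLR data or implementation.

```python
# Toy stand-in for the SBLR's typical-role-filler knowledge: for an
# (action, object) frame, defaults for roles the sentence leaves
# unspecified, keyed by cultural context (all data is illustrative).
TYPICAL_FILLERS = {
    ("eat", "breakfast"): {
        "us": {"location": "kitchen", "food": ["cereal", "doughnut"]},
        "jp": {"location": "kitchen", "food": ["rice", "umeboshi", "natto"]},
    },
}

def fill_roles(action, obj, specified, context="us"):
    """Merge roles stated in the sentence with typical defaults for the frame."""
    defaults = TYPICAL_FILLERS.get((action, obj), {}).get(context, {})
    scene = dict(defaults)   # start from typical role fillers
    scene.update(specified)  # explicitly stated information always wins
    return scene

# "Terry ate breakfast" specifies only the agent; location and food
# are supplied from the (culturally varying) defaults.
scene = fill_roles("eat", "breakfast", {"agent": "Terry"}, context="jp")
# scene -> {"location": "kitchen", "food": ["rice", "umeboshi", "natto"],
#           "agent": "Terry"}
```

A real system would derive the `specified` roles from parsing and semantic-role labeling rather than passing them in by hand, but the merge of stated and typical fillers follows the same pattern.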

Textually generated 3D scenes will have a profound, paradigm-shifting effect on human-computer interaction, giving people unskilled in graphical design the ability to express intentions and constraints directly in natural language, bypassing standard low-level direct-manipulation techniques. This research will open up the world of 3D scene creation to a much larger group of people and a much wider set of applications. In particular, the research will target middle-school students who need to improve their communicative skills, including those whose first language is not English or who have learning difficulties; a field study in a New York after-school program will test whether use of the system can improve literacy skills. The technology also has the potential to interest a more diverse population in computer science at an early age, as interactions with K-12 teachers have indicated.

Funding source

NSF IIS