Generative objects: summon AI-generated objects onto the surfaces around you #405
salmanmkc wants to merge 33 commits into
A prompt becomes a placed, draggable object: GenerativeObjects.imagine() asks the AI image model to generate an image, decodes it into a texture, and drops a billboard into the scene in front of the user, occludable by real-world depth. Pure helpers (aspect-preserving scale, place-in-front pose) and the orchestration are unit-tested with mocked AI + texture source. Wired into Core/Options via enableGenerativeObjects() and exported from xrblocks.ts.
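The two pure helpers named above (aspect-preserving scale, place-in-front pose) could look roughly like the sketch below. The function names, the `maxSide` parameter, and the tuple vector type are assumptions for illustration; the PR's actual signatures are not shown in this thread.

```typescript
// Illustrative sketch only: names and signatures are hypothetical.
type Vec3 = [number, number, number];

interface Size {
  width: number;
  height: number;
}

/** Fit an image into a box whose longest side is `maxSide`, keeping its aspect ratio. */
function aspectPreservingScale(
  imageWidth: number,
  imageHeight: number,
  maxSide: number
): Size {
  if (imageWidth <= 0 || imageHeight <= 0) {
    // Degenerate input: fall back to a square.
    return {width: maxSide, height: maxSide};
  }
  const aspect = imageWidth / imageHeight;
  return aspect >= 1
    ? {width: maxSide, height: maxSide / aspect}
    : {width: maxSide * aspect, height: maxSide};
}

/** Point `distance` units along the camera's forward direction. */
function placeInFront(
  cameraPosition: Vec3,
  cameraForward: Vec3,
  distance: number
): Vec3 {
  // Normalise so callers may pass an unnormalised forward vector.
  const len = Math.hypot(...cameraForward) || 1;
  return [
    cameraPosition[0] + (cameraForward[0] / len) * distance,
    cameraPosition[1] + (cameraForward[1] / len) * distance,
    cameraPosition[2] + (cameraForward[2] / len) * distance,
  ];
}
```

Keeping both helpers free of scene state is what lets them be unit-tested without mocking the renderer.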
Add keyOutBackground (pure, tested) and a browser CanvasBackgroundTextureSource that decodes the generated image, keys out the plain background, and returns a CanvasTexture so the subject reads as a cutout rather than a flat card. Gated by GenerativeOptions.removeBackground (on by default); Core swaps in the canvas source when enabled.
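A minimal sketch of a corner-sampled background key on raw RGBA data, to show the shape of the idea. This is not the PR's `keyOutBackground`; the name `keyOutBackgroundSketch`, the Euclidean colour distance, and the `tolerance` default are all assumptions.

```typescript
// Illustrative sketch only: not the PR's implementation.
function keyOutBackgroundSketch(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  tolerance = 40 // assumed threshold, in 0..255 colour-distance units
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(rgba); // copy, so the input stays untouched
  if (width <= 0 || height <= 0) return out;

  // Estimate the background colour as the average of the four corner pixels.
  const corners = [0, width - 1, (height - 1) * width, height * width - 1];
  let r = 0;
  let g = 0;
  let b = 0;
  for (const p of corners) {
    r += rgba[p * 4];
    g += rgba[p * 4 + 1];
    b += rgba[p * 4 + 2];
  }
  r /= corners.length;
  g /= corners.length;
  b /= corners.length;

  // Make every pixel close to the background colour fully transparent.
  for (let i = 0; i < width * height; i++) {
    const distance = Math.hypot(
      out[i * 4] - r,
      out[i * 4 + 1] - g,
      out[i * 4 + 2] - b
    );
    if (distance <= tolerance) out[i * 4 + 3] = 0;
  }
  return out;
}
```

In a browser, the input would typically come from `getImageData()` on a canvas and the result would be written back before building the texture.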
Speak or pinch to summon an AI-generated object into your space via xb.core.generative.imagine(); generated subjects are keyed to cutouts, placed in front of you, draggable, and occluded by real depth. Voice trigger via SpeechRecognizer; pinch cycles preset prompts so it works without a mic.
Add enableGenerativeObjects() to the options list, an xb.core.generative.imagine usage snippet, and a generative/ directory-map entry.
Add @google/genai to the importmap (the demo failed to init Gemini without it) and a key-entry overlay that resolves the key from ?key= > localStorage > keys.json > a prompt, matching the world_companion/objects_3d demos.
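The key-resolution order described above (`?key=` > localStorage > keys.json > a prompt) can be sketched as a function over injected sources. The `KeySources` interface and `resolveApiKey` name are hypothetical; the demo's real overlay code is not shown here.

```typescript
// Illustrative sketch only: interface and function names are hypothetical.
interface KeySources {
  queryString: string; // e.g. window.location.search
  stored: string | null; // e.g. a value read from localStorage
  loadKeysFile: () => Promise<string | null>; // e.g. fetch and parse keys.json
  promptUser: () => Promise<string | null>; // e.g. show the key-entry overlay
}

async function resolveApiKey(sources: KeySources): Promise<string | null> {
  // 1. ?key= in the URL wins.
  const fromQuery = new URLSearchParams(sources.queryString).get('key');
  if (fromQuery) return fromQuery;

  // 2. A previously stored key.
  if (sources.stored) return sources.stored;

  // 3. keys.json; a missing or unreadable file is treated as "no key".
  const fromFile = await sources.loadKeysFile().catch(() => null);
  if (fromFile) return fromFile;

  // 4. Last resort: ask the user.
  return sources.promptUser();
}
```

Passing the sources in rather than reading globals directly keeps the precedence logic testable outside a browser.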
Every pinch/click summoned a new object, which fought with grabbing an existing one. Track whether a select started on an existing generative object and, if so, let DragManager move it instead of summoning. Add a keyboard 'G' summon for desktop where dragging uses the mouse.
Add a quaternionFacingCamera helper and a GenerativeObjects.update() that turns tracked objects to face the user each frame, gated by the new GenerativeOptions.billboard flag (on by default). Keeps the flat cutout from looking paper-thin from the side. Pure helper + billboard behavior unit-tested.
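For intuition, a face-the-camera rotation can be computed without any scene graph. The sketch below assumes a yaw-only billboard (rotation about the world Y axis) and a flat object whose front faces +Z, as three.js planes do; whether `quaternionFacingCamera` is yaw-only or a full look-at is not stated in this thread, and the name `yawQuaternionFacing` is hypothetical.

```typescript
// Illustrative sketch only: assumes yaw-only billboarding and a +Z-facing front.
type Position = [number, number, number];

interface Quat {
  x: number;
  y: number;
  z: number;
  w: number;
}

function yawQuaternionFacing(objectPos: Position, cameraPos: Position): Quat {
  // Horizontal direction from the object to the camera; height is ignored.
  const dx = cameraPos[0] - objectPos[0];
  const dz = cameraPos[2] - objectPos[2];
  // Rotating +Z about Y by theta gives (sin theta, 0, cos theta).
  const theta = Math.atan2(dx, dz);
  return {x: 0, y: Math.sin(theta / 2), z: 0, w: Math.cos(theta / 2)};
}
```

Calling this once per frame for each tracked object, and copying the result into the object's quaternion, gives the billboard behaviour described above.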
Add a netblocks-styled 🎙️ push-to-talk button (speech was undiscoverable before) and move the status HUD to the top-left so it no longer collides with the simulator's settings gear.
Add an opt-in GenerativeOptions.relief that builds the object as a densely subdivided plane displaced by the generated image's brightness (three.js displacementMap + bumpMap on a lit standard material) instead of a flat cutout, giving real shaded surface relief. Approximate (brightness is not true depth) and needs a light in the scene; default off. Structure unit-tested.
Press R to switch subsequently summoned objects between flat cutout and 2.5D relief (pausing billboarding so you can orbit the relief), and add ambient + directional lights so the lit relief material shows shading.
Raycast the camera forward against the depth mesh and place the object there: stand it on horizontal surfaces, float it off vertical ones so it doesn't blend into walls, falling back to in-front-of-camera with no hit. Also opt the material into the occlusion shader (the layer alone only builds the mask) so it's hidden behind real geometry.
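The stand-on-horizontal / float-off-vertical rule can be sketched as a pure function of the hit point and surface normal. The `minUpDot` threshold, the `wallGap` distance, and the function name are assumptions chosen for illustration, not values from the PR.

```typescript
// Illustrative sketch only: thresholds and names are hypothetical.
type SurfaceVec = [number, number, number];

interface SurfacePose {
  kind: 'horizontal' | 'vertical';
  position: SurfaceVec;
}

function poseOnSurface(
  hitPoint: SurfaceVec,
  normal: SurfaceVec,
  objectHeight: number,
  wallGap = 0.15, // assumed offset from walls, in metres
  minUpDot = 0.7 // assumed cosine threshold for "faces up"
): SurfacePose {
  const len = Math.hypot(normal[0], normal[1], normal[2]) || 1;
  const n: SurfaceVec = [normal[0] / len, normal[1] / len, normal[2] / len];

  if (n[1] >= minUpDot) {
    // Upward-facing surface: stand the object on it (its centre sits half a height up).
    return {
      kind: 'horizontal',
      position: [hitPoint[0], hitPoint[1] + objectHeight / 2, hitPoint[2]],
    };
  }
  // Anything else is treated as a wall: float the object off it along the normal.
  return {
    kind: 'vertical',
    position: [
      hitPoint[0] + n[0] * wallGap,
      hitPoint[1] + n[1] * wallGap,
      hitPoint[2] + n[2] * wallGap,
    ],
  };
}
```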
Add a draggable uiblocks control panel (summon/speak/relief/clear) that head-leashes to follow the user, plus a top-right on-screen button bar and a push-to-talk voice button. Summoning is now via the controls/voice only (click-to-spawn removed). Also enable spatial UI and the depth texture for occlusion, and use the 'flare' icon for summon.
draggable=true alone wasn't enough: DragManager.beginDragging bails when there's no draggingMode, so grabbing never started. Set draggingMode to TRANSLATING.
Hi Salman, thank you for your contribution on this! I would like to request that this be switched to a demo. I wouldn't say this is ready to be put inside the SDK (for now). We need to think carefully about the high-level picture of abstract --------> photorealistic. Internally, we have a demo like this, but with better quality & confidential tech :)
you sure the keys.json wasn't in the repo root or the wrong folder? it should be relative. and sure, will move this to demo only. wow, that's cool to know there's a better internal demo, is it sorta similar to likeness-level quality?
Per review on google#405, the generative objects feature moves out of the SDK and becomes a demo. Remove the xb.core.generative subsystem, the Options enableGenerativeObjects()/GenerativeOptions, the barrel exports for the orchestrator/object/options/texture-source, and the SKILL.md references. BackgroundKeyer (pure RGBA chroma-key) and GenerativeObjectUtils (generic billboard/face-camera math) stay in src/ as small, unit-tested helpers the demo imports; the orchestration moves to the demo in the next commit.
The generative objects orchestration now lives in demos/generative_object/src/ (GenerativeObjects, GenerativeObject, GenerativeOptions, TextureSource) built by rollup like the drone/animalattack demos, instead of the SDK. The demo owns a GenerativeObjects script and adds it via xb.add() so dependency injection still resolves AI/camera/scene/depth.

Fixes carried over from the review while moving:
- only wire depth occlusion when depth is actually present, so objects don't render transparent against an empty occlusion map when depth is off
- prefer the depth mesh's geometric face normal for surface orientation (the per-vertex normals are not kept fresh) and update the full-resolution mesh so placement raycasts hit current geometry
- a generation token so an in-flight generate that resolves after clearObjects() is discarded instead of adding a stale object
- build the relief displacement map lazily and dispose every distinct texture

Also splits generation into generateBillboard(image), the image-to-object half, to sketch where an SDK ai.generateBillboard(image) could sit, and loads the built src/build/main.js with the keys.json root fallback.
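The generation-token fix listed above follows a common pattern: bump a counter on clear, and have each in-flight request compare the counter it started with against the current one. A minimal sketch, with a hypothetical `GenerationGate` class and strings standing in for generated objects:

```typescript
// Illustrative sketch only: class name and string "objects" are stand-ins.
class GenerationGate {
  private token = 0;
  readonly objects: string[] = [];

  /** Runs `generate` and keeps the result only if no clear happened meanwhile. */
  async summon(generate: () => Promise<string>): Promise<string | null> {
    const startedWith = this.token;
    const result = await generate();
    if (startedWith !== this.token) {
      // clearObjects() ran while we were waiting: drop the stale result.
      return null;
    }
    this.objects.push(result);
    return result;
  }

  clearObjects(): void {
    this.token++; // invalidates every in-flight summon
    this.objects.length = 0;
  }
}
```

In the real demo the discarded branch would presumably also dispose any texture already created for the stale result.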
Two reasons a summoned object could be hard to see:
- groundOnSurface raycast placed it on whatever surface was ahead, so looking across the room dropped it on a far wall, tiny and easy to miss. Cap the grounding distance (maxGroundDistance, 2 m) and fall back to in-front placement when the surface is farther.
- the image prompt asked for a white background, which the background keyer then cut out along with pale subjects (a white paper airplane vanished). Ask for a saturated chroma-green background instead so the corner-sampled keyer keeps any non-green subject.
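The grounding-distance cap amounts to a small decision function. The sketch below uses the 2 m default from the commit message; the `SurfaceHit` shape and `choosePlacement` name are hypothetical.

```typescript
// Illustrative sketch only: types and function name are hypothetical.
type Point = [number, number, number];

interface SurfaceHit {
  distance: number; // metres from the camera to the hit
  point: Point;
}

interface Placement {
  mode: 'surface' | 'in-front';
  position: Point;
}

function choosePlacement(
  hit: SurfaceHit | null,
  inFrontPosition: Point,
  maxGroundDistance = 2 // cap from the commit message, in metres
): Placement {
  if (hit !== null && hit.distance <= maxGroundDistance) {
    return {mode: 'surface', position: hit.point};
  }
  // No hit, or the surface is too far away to be a useful anchor.
  return {mode: 'in-front', position: inFrontPosition};
}
```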
The control buttons' idle/hover fill colors (#2a2a2a -> #3a3a3a) differed by only a few percent of brightness, so hovering produced no perceptible change. Use a dark chip for idle and a clear purple for hover (with a brighter click flash), matching the agent_hands demo.
…panel
The depth mesh is in the scene for occlusion and surface placement, so the reticle's whole-scene raycast also hits it; standing within ~1m of a wall makes it the closest hit and grabs hover from the control panel. No-op the depth mesh's raycast so the reticle skips it, and restore the real raycast briefly inside raycastSurface_ so object placement still grounds on the geometry. Same approach as the agent_hands demo.
…Mesh.raycast
The previous fix no-op'd depthMesh.raycast (via a WeakMap save/restore in raycastSurface_). That was not exception-safe: a throw in intersectObject left the real raycast installed, permanently re-enabling wall hover, and it disabled the depth mesh for every raycaster, not just the reticle. Set depthMesh.ignoreReticleRaycast = true instead: the reticle skips it while .raycast stays intact, so raycastSurface_ places objects normally.
…ders
Each generated object's occlusion shader was added to the engine-wide depth.occludableShaders set in setupOcclusion_ and never removed, so repeated summon/clear cycles grew the set unboundedly and the occlusion pass kept writing uniforms into stale shaders of disposed materials. Track a per-object cleanup that deletes the shader from the set, and run it in clearObjects().
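The per-object cleanup pattern can be sketched as follows. A plain `Set` stands in for `depth.occludableShaders`, and `TrackedShaders` is a hypothetical class; only the register-then-unregister shape is the point.

```typescript
// Illustrative sketch only: a plain Set stands in for depth.occludableShaders.
const occludableShaders = new Set<object>();

class TrackedShaders {
  private cleanups: Array<() => void> = [];

  /** Registers a shader and remembers how to unregister it. */
  add(shader: object): void {
    occludableShaders.add(shader);
    this.cleanups.push(() => {
      occludableShaders.delete(shader);
    });
  }

  /** Runs every cleanup so the shared set does not grow across cycles. */
  clearObjects(): void {
    for (const cleanup of this.cleanups) cleanup();
    this.cleanups = [];
  }
}
```

Storing a closure per object, rather than clearing the whole set, leaves entries registered by other parts of the engine untouched.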
…halo
The flat billboard material was transparent with no alphaTest, so edge filtering blended the keyed-out chroma-key background color into a green fringe around the cutout. Add alphaTest so fully-keyed pixels are discarded (matching the relief material).
…e reticle
ignoreReticleRaycast only excludes the depth mesh from the SDK reticle, but the spatial-UI buttons hover via their own scene raycast, so standing near a wall still stole hover. No-op depthMesh.raycast so every raycaster skips it, and restore it for raycastSurface_ in a try/finally so a throw can't leave it interactive.
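The exception-safe restore described above can be sketched with a simplified stand-in for a mesh. `RaycastTarget`, `RaycastFn`, and `withRealRaycast` are hypothetical names, and the raycast signature is reduced to "push hits into an array" rather than three.js's real `(raycaster, intersects)` form.

```typescript
// Illustrative sketch only: simplified stand-ins, not three.js types.
type RaycastFn = (hits: string[]) => void;

interface RaycastTarget {
  raycast: RaycastFn;
}

/** Installed by default so every raycaster skips the mesh. */
const noopRaycast: RaycastFn = () => {};

/** Temporarily installs the real raycast, then always puts the no-op back. */
function withRealRaycast<T>(
  mesh: RaycastTarget,
  realRaycast: RaycastFn,
  run: () => T
): T {
  mesh.raycast = realRaycast;
  try {
    return run();
  } finally {
    // Runs on both normal return and throw, so the mesh never stays interactive.
    mesh.raycast = noopRaycast;
  }
}
```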
adds a `GenerativeObjects` primitive (`xb.core.generative.imagine(prompt)`) that turns a text/voice prompt into a placed, draggable object in your space. gemini generates an image, we key out the plain background into a clean cutout, and drop it onto the real-world surface you're looking at, occluded by depth and facing you.

fills a gap: image gen existed only as a low-level call (`Gemini.generate`), nothing turned a prompt into a placed, interactive object. it's the runtime verb the gem/canvas can compose but can't synthesize itself ("create a thing you pinch and drag") as a one-liner, which I believe Ruofei wanted in the past

what's in it

primitive (`src/generative/`): `GenerativeObjects` script + `imagine(prompt, opts)` places a `GenerativeObject` on the surface you're looking at. resolves null if AI is unavailable.

wired into Core/Options via `enableGenerativeObjects()` and exported from `xrblocks.ts`. pure helpers (scale, placement, facing, background keying) are unit-tested; 45 colocated tests, full suite green, build/lint/prettier clean.

demo (`demos/generative_object/`):

try it

serve the repo, open `demos/generative_object/index.html`, paste a gemini key, hit summon (or speak). objects land on the surface you're looking at; grab to move.

notes

`?key=` is prototyping-only, same caveat as the other AI demos.