Week Reviews

Week 29 2026: Image Enhancements

Leave a Comment / Web Design, Week Reviews / Justin

Accomplishments: Importing Mobile Data, Automated Tripod Removal, Automated Furniture Removal
Bonus: Benefits of the Mobile Tool, Adams Family Vintage Store Update

Capturing photos at eye level from all angles comes with many visual challenges. How do we hide the photographer? How do we hide the camera’s stand? Recent developments solved hiding the photographer with layered captures, and way back in Week 1 we identified how to hide the camera stand. This week expands and rethinks those workflows so they integrate with the end-to-end process. The same approach used for camera stand removal also extends to removing furniture from the whole room. Together these changes make the system more robust and cohesive.

Importing Mobile Data

[Figure: Import Data]

The mobile app allows us to capture images, remove people, and track the sequence of images collected. The hardware in the phone currently affords us an incredible preview of the work. Transferring this to a computer allows us to perform more operations on the photos, use the sequence, and reconstruct the 3D scene much faster.

A new feature to export the data was added to the mobile app. This exports information like the order of images taken, which images overlap, how they overlap, and the masks drawn to cut out the parts of images with people in them. Importing this information into the computer’s application offers many benefits. It allows the computer to cut and combine the full-resolution images rather than compressed previews. It also minimizes the need to manually combine and link sections of point clouds, saving time and focus.

Automated Tripod Removal

[Figures: Original, Removed]

In the very first week of these updates I shared the first workflow for automating tripod removal in 360 photos. I learned a lot from that solution, such as the tools and steps required to efficiently and effectively remove tripod feet from the nadir (bottom) of a 360 photograph.
This week I took that understanding and identified a new path, using updated tools and integrating directly with the overall application. This new process works faster, with less manual input, and can be triggered automatically upon image upload. The fewer steps required to go from captured photos to a shared 3D experience, the better. This improvement serves that goal, deprecating the previous self-contained process in favor of an integrated tool that can be triggered automatically.

Automated Furniture Removal

[Figures: Original, Removed]

Back in Week 23 I created a proof of concept to show that furniture removal could be performed effectively on a local machine, in a reasonable time, and used in 3D positioning and reconstruction. This week I took those concepts and began integrating them into the service. Now, after tripods are removed from the floor, the high-resolution image can be run through a process which selectively removes furniture and attempts to inpaint, or imagine, what is most likely behind it. This often leads to furniture-free rooms that can later be used for reconstruction.

Bonus: Benefits of the Mobile Tool

While bridging the mobile information and the computer tool, I reflected on what benefit this sibling application has. At first glance it could appear superficial. A lot of time and work has gone into estimating depth and positioning 360 photos on the mobile phone. This still takes around 30 seconds of processing time, and the visual consists of voxel cubes which differ greatly from the final result. The positioning and point clouds also differ, because this inference is only used for immediate visualization. What benefit do we get from all of this effort, maintenance, and increased processing?

The mobile application addresses three major concerns: capturing the correct visual data, ensuring valid pose-to-pose connections, and reducing the manual effort to connect photos. Let’s imagine some scenarios without the mobile app.
For one, let’s imagine an open field with hiding spots far from the camera’s position and wireless signal. The photographer has two options. Option one is to set a timer, run to a hiding spot, and then return to the camera. This takes extra time depending on how far away the hiding spots are and how often the photographer must return from them. It also relies on the shot not changing over that time. Outdoors, perhaps a deer is nearby, or the sun is briefly free from cloud cover. The scene may change during the attempt to hide. Option two is to take two photos, with the photographer moving between them. A postprocessing workflow can then cut the person out of each image and link the images together. Perhaps lighting or other features changed during this process without the photographer noticing, and postprocessing requires extra manual time. The mobile app relieves these issues by allowing the photographer to capture multiple photos immediately and see direct feedback on what the combined capture will look like. This saves time and ensures that the capture will reflect the final result.

Let’s imagine another scenario where you’re capturing a home. Doorways are incredibly tricky for positioning 360 images, as the amount of overlapping data is often very small. One great current example is the 3rd floor center balcony of my Supalai Place environment. The photos on either side of the doorway were taken approximately 2 meters apart, consistent with other photos taken throughout the house. However, in this case the pose-to-pose alignment was very far off. The outdoor image was sharply angled and more distant than the ground truth. This was simple to fix with the manual adjustment tool; however, it was just that, manual. This consumes my time and focus. Some instances may be too much for even a manual adjustment to fix, and in those cases the photographer may have to return to the site, where features may have changed, leading to more time, effort, and money.
With the mobile app, photographers can monitor their camera-to-camera poses on-site. When a pose is too far away or crooked, this visual can inform the photographer that an
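The on-site pose check described above can be sketched roughly as follows. This is a minimal Python sketch, not the app’s actual code: the `RelativePose` fields, the function name, and both thresholds (3 meters and 10 degrees) are assumptions chosen only for illustration, loosely based on the roughly 2 meter spacing mentioned for the Supalai Place photos.

```python
import math
from dataclasses import dataclass


@dataclass
class RelativePose:
    """Hypothetical pose of a new photo relative to its anchor photo."""
    dx: float          # translation in meters
    dy: float
    dz: float
    roll_deg: float    # tilt around the viewing axis
    pitch_deg: float   # tilt up or down


def pose_warnings(pose, max_distance_m=3.0, max_tilt_deg=10.0):
    """Return human-readable warnings for a suspicious camera-to-camera pose.

    Both thresholds are illustrative assumptions, not values from the app.
    """
    warnings = []
    distance = math.sqrt(pose.dx ** 2 + pose.dy ** 2 + pose.dz ** 2)
    if distance > max_distance_m:
        warnings.append(f"pose is {distance:.1f} m from its anchor (too far)")
    tilt = max(abs(pose.roll_deg), abs(pose.pitch_deg))
    if tilt > max_tilt_deg:
        warnings.append(f"pose is tilted {tilt:.0f} degrees (crooked)")
    return warnings


# A plausible doorway capture versus a badly estimated one.
good = RelativePose(dx=2.0, dy=0.0, dz=0.0, roll_deg=1.0, pitch_deg=2.0)
bad = RelativePose(dx=5.0, dy=1.0, dz=0.5, roll_deg=25.0, pitch_deg=3.0)
print(pose_warnings(good))
print(pose_warnings(bad))
```

A check like this only flags a pose for the photographer to look at; deciding whether to reshoot would still be a visual judgment made on-site.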

Week 28 2026: Mobile Capture Improvements

Leave a Comment / Web Design, Week Reviews / Justin

Accomplishments: Mobile (Voxel Empty Space, Human Detection, Saved Projects)
Bonus: UX (Orthographic Transition, Dynamic Location Hotspots), Architecture (Template HTML)

New improvements to the mobile capture tool strengthen its ability to capture and display environment details. Reducing noise in the output improves its visual representation, while saving projects allows scenes to be returned to and updated if pauses are necessary. And the new human detection system provides a workflow to automatically detect and remove humans while stitching multiple photos together.

Beyond mobile, improvements were made to the architecture and user experience. Now, when looking at a diorama from a high angle, the camera’s view smoothly transitions from a natural perspective to a more architectural, or orthographic, flat look. Other changes were added too, like adjusting a hovered hotspot’s transparency and hiding the hotspot for the camera’s current location. Changes were made to the architecture as well, introducing favorite icons (favicons) and replacing copy-pasted HTML files with a reusable template.

Mobile Improvements:

[Figures: Voxel Representation, Human Detection & Edit]

The goal of mobile capture is to get on-site feedback for adjustments before returning to a workstation for processing. A sequence of unedited voxel clouds can become very noisy, as depth estimation often stretches and becomes incorrect at sharp edges and far distances. A voting process, similar to the one used in the reconstruction phase, was implemented to swiftly remove noisy, low-confidence points.

Human detection was also added as an improvement to the editing workflow. Using Apple’s lightweight ML models we can automatically detect humans in the live feed and captured images. These detections can be used to mask out parts of the image to be replaced by a following capture. In large open areas, hiding from the camera may take the phone out of range. This helps automate the detection and hiding of people, including the photographer.
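The mask-and-replace idea can be illustrated with a small sketch. This is an assumption-laden simplification rather than the app’s implementation: images are plain nested lists of pixel values, the mask is a grid of booleans (True where a person was detected), and the function name `composite` is hypothetical. It also assumes the camera did not move between the two captures.

```python
def composite(first, second, mask):
    """Fill masked pixels of `first` with pixels from `second`.

    first, second: images as rows of pixel values, same dimensions.
    mask: rows of booleans, True where `first` should be replaced.
    """
    if not (len(first) == len(second) == len(mask)):
        raise ValueError("images and mask must have the same height")
    result = []
    for first_row, second_row, mask_row in zip(first, second, mask):
        if not (len(first_row) == len(second_row) == len(mask_row)):
            raise ValueError("images and mask must have the same width")
        result.append([
            s if masked else f
            for f, s, masked in zip(first_row, second_row, mask_row)
        ])
    return result


# "P" marks pixels showing a person; the second capture shows the
# background ("b") where the person stood in the first capture.
first_capture = [["a", "P", "a"],
                 ["a", "P", "a"]]
second_capture = [["a", "b", "P"],
                  ["a", "b", "P"]]
person_mask = [[False, True, False],
               [False, True, False]]

print(composite(first_capture, second_capture, person_mask))
```

A hard per-pixel swap like this leaves visible seams if lighting changes between captures, so a real pipeline would likely feather the mask edges.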
Projects can now be saved and returned to later. Downsampled images, depth maps, and position data are stored and ready for later sessions.

Bonus: UX Improvements

[Figures: Orthographic Perspective, Hidden Hotspots]

New improvements have been made to the user experience (UX). These visual improvements relate to how dioramas can be viewed and how location hotspots react to the viewer.

The diorama was updated to slowly transition from a perspective (3D depth) view of the environment toward a more orthographic (2D, architectural) view. This change in perspective makes the visual appear more like a floor plan when viewed from up high. This view is often more familiar and interpretable than the angles visible when viewing in perspective 3D.

Dynamic transparency for location hotspots was improved. Now when hovering over a hotspot, the 3D cursor hides and the hotspot transitions to a more solid color. This provides improved visual cues for which hotspot is about to be clicked. Also, hotspots below the viewer now fade out, and back in, over time. This prevents the user from re-selecting their current hotspot and causing an unnecessary transition.

Architecture: Each experience begins with an HTML file. This file acts as the landing page for the experience. It loads basic things, like the favorite icon and the experience’s root file. Most of the time these HTML files are very similar, with minor changes like the name of the file called or the title of the page. To make this easier to maintain, I created a template HTML file. Now tours can share the same reused information while injecting their specific information into it. This simplifies maintenance and allows changes to one file to improve all experiences.

Summary: Mobile capture represents the input, and the user’s experience represents the output. This week’s changes improve both and bring them closer together.
When capturing a user experience, it’s important to identify problems and adjust as soon as possible. Changes like automatic human detection and voxel noise removal make the capturer’s workflow faster and more clearly display the idea of the final result at the initial stages. User experience improvements like shifting perspectives and dynamic hotspot transparency afford the user more visual cues about the scene and a more familiar display of the information within it.
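The reusable HTML template from the Architecture section could look something like the sketch below. It uses Python’s standard `string.Template` purely for illustration; the placeholder names (`title`, `favicon`, `root_file`), the file names, and the page layout are assumptions, not the actual template.

```python
import html
from string import Template

# One shared landing page; each tour injects only what differs.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>$title</title>
  <link rel="icon" href="$favicon">
</head>
<body>
  <script src="$root_file"></script>
</body>
</html>
""")


def render_page(title, root_file, favicon="favicon.ico"):
    """Fill the shared template with one tour's specific values."""
    return PAGE_TEMPLATE.substitute(
        title=html.escape(title),
        favicon=html.escape(favicon),
        root_file=html.escape(root_file),
    )


# Hypothetical tour: only the title and root file change per experience.
print(render_page("Supalai Place", "supalai/tour.js"))
```

With this shape, changing the favicon or adding a shared tag means editing one template instead of every copied file.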

Week 27 2026: Visual Feedback

Leave a Comment / Web Design, Week Reviews / Justin

Accomplishments: New Customization Tool, Mobile Capture Tool Updates, Minor UX Fixes

Seeing is believing. When designing an experience, you want immediate feedback as you brainstorm and create your design. This feedback spans the whole process, from capturing images to tailoring menus and color schemes to fit the needs of the project.

New Customization Tool:

[Figures: 2D Updater, VR Updater, Thumbnail Updater, Edits List]

The standard way of adjusting tours with certain tools involves manually describing changes in a text file, saving it, and refreshing a page to see how it looks. Experienced designers can quickly identify and update the files for the changes they’d like. For those less familiar, or looking to experiment, this can be a cumbersome and slow process. With this new customization tool, the skill barrier for adjusting experiences has been drastically lowered, and the speed of doing so has greatly improved.

With this tool, changes can be made with sliders and dropdowns which live-update the custom grid menus in both 2D and VR. I have also added a feature to select and update thumbnails for each photo, so that they can be set to more visually desirable locations for the grid. This all happens live, with visual feedback for the designer to consider. I have also integrated last week’s camera position tools into this interface and included a list of images that shows the designer which have been updated and which have not.

Lowering the skill barrier, improving efficiency, and simplifying the tasks of changing thumbnails and camera positions, all without manually editing fields in a file, greatly improves the process of designing a tour, shortens the time, and adds more value to the individual parts.

Mobile Capture Tool Updates:

[Figure: Updated Visual]

Mobile Capture empowers designers to see the structure of the final result while on-site and during capture.
This ensures that things are not missed and gives the opportunity to reshoot or gather more information for areas that appear to need more detail. This can reduce later reshoots and manual processing.

This week, I implemented multiple new features to support this process. The first major feature connects the 360 camera to the application. For this case, I use an Insta360 X5. Insta360 offers a software development kit (SDK) for interfacing with their cameras. Once approved, I was able to implement this within my application and begin viewing a live feed from the camera. The capture button now signals the camera to capture a photo, and the camera sends the photo back to the phone for further processing. The SDK and connection also afford the opportunity to adjust the camera’s settings remotely. Things like exposure, HDR, resolution, and more can be changed from the mobile phone without having to disconnect or manually adjust within the camera.

With camera connectivity I could now build out sequential photography and point cloud representations. Each captured photo is run through the process against the previous “anchor” photo. This generates the new point cloud and its pose relative to the last camera. Each new photo is then displayed as a new node, creating an interconnected tree to represent the space. Selecting an anchor node lets you return to previous areas to start new branches or refine existing ones.

A big problem I face is capturing tours in open areas: large rooms or outdoor scenes where there is no place to hide, or where the best place to hide is outside of the mobile phone’s connectivity range. This can mean spending more time hiding and returning than actually capturing the photos. For this I added an edit workflow. Now, after a photo is taken, you can choose to draw over parts of the photo you want cut out. Then you can take a second photo with the camera and combine them.
As long as the camera does not move, the two photos should blend well together, letting you appear in both shots while removing you from the final result.

Point clouds can be great visualization tools, but when zoomed in they may appear too sparse to show the detail needed. For this I have updated the 3D visualization on mobile to display a voxel grid. Voxels are just cubes, and these can be colored to display the scene and updated as more data is captured. These solid structures make the scene easier to see from close up or far away, offering better visual feedback to the designer.

Minor UX Fixes: A few small irregularities were identified and resolved this week. One involved the VR menu, which would not maintain its position relative to the user’s head after a transition. Naturally, when going from one end of a house to another, we expect to feel like we have moved. However, subconsciously we see a menu as an extension of ourselves, like a watch or a tablet. When we teleport, we expect it to teleport with us. That makes it jarring when it stays in place or shifts slightly after every transition. This is now resolved, and the menu remains fixed to the viewer. This came alongside a minor fix for a case where, in VR or 2D, selecting items from the menu would sometimes transition in 3D, causing a disorienting move through walls and fixtures rather than a simple fade out and fade in.

The multi-floor feature came with substantial changes to how dioramas are loaded and experienced. One consequence was a momentary lack of texture in the diorama view when transitioning from the depth-mapped scene. I added a new step that preloads the textures for the diorama so that they are immediately ready when transitioning.

A minor issue was identified with the auto-sizing of the 2D menu. When entering and leaving full screen, the width of the 2D menu would change even if the browser size had not. Now the 2D menu remains consistent after exiting full screen.
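To make the voxel grid idea from the mobile capture updates concrete, here is a minimal sketch of turning colored points into colored cubes. It is not the app’s implementation: the function name, the point format, and the choice to average colors per cube are all assumptions for illustration.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group colored points into cubes of side `voxel_size`.

    points: iterable of ((x, y, z), (r, g, b)) pairs.
    Returns {(i, j, k): (r, g, b)} where the color is the average of
    every point that fell inside that cube.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for (x, y, z), (r, g, b) in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        acc = sums[key]
        acc[0] += r
        acc[1] += g
        acc[2] += b
        acc[3] += 1  # number of points in this cube
    return {
        key: (acc[0] / acc[3], acc[1] / acc[3], acc[2] / acc[3])
        for key, acc in sums.items()
    }


cloud = [
    ((0.1, 0.1, 0.1), (255, 0, 0)),
    ((0.2, 0.3, 0.4), (0, 0, 255)),   # same 0.5 m cube as the first point
    ((1.5, 0.0, 0.0), (0, 255, 0)),   # a different cube
]
print(voxelize(cloud, voxel_size=0.5))
```

Because each cube keeps a running point count, new captures can be merged into the same grid, and cubes with very few points could be dropped as noise.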
Summary: What you see is what you get. Giving the designer more

Week 26 2026: Mid-Year Review

Leave a Comment / Web Design, Week Reviews / Justin

Accomplishments: Multi-Floor Display, Updated Experiences, Mid-Year Review
Bonus: Section Generation, Camera Orientation Tools, Local Language Automation

Halfway around the sun. That is how far we have travelled over the last 6 months. An incredible distance. And with this week, another incredible distance has been covered, this time in the world of spatial experiences.

Each week has included accomplishments and research aimed at detailing and creating processes and tools for efficiently delivering better spatial experiences. Many improvements along the way contribute to this, most notably to the user experience and the spatial reconstruction tool. While an incredible number of improvements went into these, a meaningful summary can be found in my H1 video linked alongside this page.

This week brought H1 to the finish line and delivered a meaningful end-to-end workflow, alongside tools for delivering meaningful spatial experiences more efficiently, and affordably, than 6 months prior. To complete this chapter, I implemented a few final features. I improved the multi-floor display tool first drafted in Week 2. I also defined section generation in the frontend, to pair with the responsive grid designed that same week. I created a new translator tool with a frontend that simplifies control over what gets translated and automatically performs the translations using a local model rather than calling an API, while still working nicely with the structures defined in Week 5. And to better tailor the visual experience, I created two new editors for setting the camera’s position when viewing the diorama, as well as each scene’s starting view.

Each new feature has been a step closer to the original vision. Nothing shows that better than the fact that many of the features in the final week still cooperate with and improve upon the concepts defined in the early weeks.
Multi-Floor Display

[Figure: Multi-Floor Display]

One of the driving inspirations for this greater effort was determining how best to share large, multi-floor scenes. Many spaces have multiple levels, and the larger the spaces, the more they benefit from spatial awareness and the user being able to see where to go. A standard 3D model is often opaque and can be difficult to navigate. If you have a 3-floor home and a walled-off room in the center of floor 2, it is very unlikely you can see or click it.

But what if the tour were smart enough to know when you want an unhindered view of the floor? When facing a floor on the horizontal plane, you often only see its walls, as the floor and ceiling are parallel to your vision. Here is a great opportunity to display other floors. Since you can’t see the ceiling or floor, levels above and below the selected one can fade in and become clickable to change the active floor. And when you angle up, the room’s floor comes into view and the walls fade away. These angles offer the opportunity to select locations, and require the other floors to disappear for focus and an uninterrupted click.

This result solves one of the bigger problems for a challenging tour, Supalai Place. This 3-story building with interiors and exteriors has many rooms, some of which would be difficult to select without multi-floor display. With this feature, it is now much simpler to move quickly across the house. It reduces manually walking through, or reading the gallery, to just 2 clicks: one to the diorama, and one to the room.

Bonus: Section Generation

The responsive grid works great to simplify navigation and describe areas within a building alongside text descriptions. A challenge with it was manually defining a group and naming it for each scene. This is trivial for a dozen photos. It is much more cumbersome for a series of 100+ items, and much more prone to human error.
To solve this, I added a feature to the frontend where multiple images may be selected and assigned to a selection. These selections can then be exported alongside the other data and used when automatically writing the tour.xml file. This feature replaces manually placing keys throughout a file with a GUI-based definition, including drag-selecting multiple cameras for a shared selection.

Bonus: Camera Placement Tools

A beautiful diorama is best seen from a good angle. If the camera places itself below or far away from the diorama, users will regularly have to move and zoom the camera to find a good location. This hinders the ability to navigate and appreciate a space. Also, when selecting a position, the camera’s angle may be unexpected. Staring at a blank wall or having a screen full of leaves may be momentarily jarring. And again, this causes the viewer to have to move and zoom in order to get an idea of what the scene is.

These were difficult to accept. Fixes do exist for them: a good camera position and angle can be calculated manually, and coordinates can be copied manually for each photo’s camera to look at. This can add minutes or hours when creating an experience. With two new tools, these tasks now take seconds per instance. For each diorama, place the camera in the desired spot, and at the click of a button the replacement data is available. The same goes for camera orientation per image: just adjust the camera, click a button, and its information is prepared.

Bonus: Local Language Automation

[Figure: Translation Tool]

Multilingual support currently requires two steps: generating the translations and displaying them. The process was mostly formed in Week 5. Displaying them has remained largely the same. The user selects a language and immediately all translated text is updated. However, under the hood, a lot has changed. Previously, translations were done by manually typing a list of keys that were to be searched for and translated.
Then, a paid API call was made to a machine learning tool in order to translate the files. And then new files were generated for each translation.
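Returning to the multi-floor display described earlier in this post, the fade behavior can be sketched as a simple function of camera pitch. This is only an illustration of the idea: the function name and the 10 and 30 degree thresholds are assumptions, and the real viewer may use a different curve entirely.

```python
def other_floor_opacity(pitch_deg, fade_start_deg=10.0, fade_end_deg=30.0):
    """Opacity of the floors above and below the active one.

    pitch_deg: camera pitch relative to the floor plane, where 0 means
    looking level at the walls and 90 means looking straight at the floor.
    The two thresholds are illustrative assumptions.
    """
    angle = abs(pitch_deg)
    if angle <= fade_start_deg:
        return 1.0   # level view: other floors fully visible and clickable
    if angle >= fade_end_deg:
        return 0.0   # angled view: other floors hidden for a clean click
    # Linear fade between the two thresholds.
    t = (angle - fade_start_deg) / (fade_end_deg - fade_start_deg)
    return 1.0 - t


for pitch in (0, 20, 45):
    print(pitch, other_floor_opacity(pitch))
```

A viewer would presumably also disable click handling on the other floors once their opacity reaches zero, so that the active floor’s hotspots receive an uninterrupted click.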

Week 25 2026: Exporting the Experience

Leave a Comment / Web Design, Week Reviews / Justin

Accomplishments: Exporting the Experience, Remade Spatial Experiences
Bonus: Mobile Proof of Concept

This week’s focus has turned back toward the end application of the 3D reconstruction: the spatial experience. New features across the frontend and Blender were built to automate the generation and transfer of the information necessary to generate and navigate a 3D scene. In addition, I spent some free time preparing a proof of concept on a mobile device.

Exporting to Experience

[Figure: Exporting Hotspot Locations]

To take the information we have in the frontend and view it in a bespoke virtual tour, we must transfer and translate it in a way that the virtual tour software can use. The 3D model and textures are simple enough to copy and export within the frontend, then download like a normal file. Extracting the hotspots was also fairly simple and improved on the process built all the way back in Week 1. Now, rather than manually running a script in Blender, the hotspots are placed upstream as soon as the point clouds are generated.

Another callback to Week 1 can be seen in the updates to exporting scenes into code that works with the tour-building software. Previously, multiple scripts were used to patch different parts of the experience into the file(s). Now, a single script can be run to prepare a directory and a series of compatible files. This reduction in complexity and maintenance comes from many of the core changes and pipelines developed throughout this project. The earlier problems are solved, the more cohesive the output becomes, as each fix compounds throughout the workflow.

Taking the content generated by the 360 to 3D process and integrating it with the spatial experience connects the tool to its first purpose and gives value to its output.

Remade Spatial Experiences

[Figures: Supalai Remade, Villa Remade, Pattaya Remade]

The 3D models for Pattaya, Villa Korbhun, and Supalai Place have all been successfully integrated and tested in their spatial experiences.
This proves the mesh data integrates and displays well with the tour software. More than anything, it feels incredible to see a tour as complex as the Supalai Place home displayed and navigable in 3D.

Bonus: Mobile Proof of Concept

[Figure: Villa Mask View]

360 to 3D reconstruction has been solved, though some edge cases still persist. Larger tours may need manual intervention to select and connect smaller groups. Some scenes may not pose well together and appear considerably misaligned, affecting the whole environment. Both of these could be fixed with visual feedback during capture.

I decided to see if it would be possible to convert part of the current process to run on a mobile device, targeting the iPhone 14 Pro. The iPhone 14 Pro only makes around 4GB of RAM available to the user and includes a CPU, GPU, and ANE (Apple Neural Engine), all powerful though far less so than the 24GB 4090 GPU I have been testing on. I still persisted in trying, with the understanding that tasks may be limited and slower. I recalled recent developments in model-efficiency techniques, and my familiarity with quantization left me determined to try.

After dozens of attempts, I had experienced and learned a lot about the iPhone’s architecture and processes. Most methods used in PyTorch machine-learning models translate directly into counterparts available for Apple’s ANE; however, many do not. These often require replacements with similar but slower code. The translation from a PyTorch model to Apple’s Core ML was fairly simple and straightforward. What made things more complex was attempting to run inference with this model. Its size started out reasonable; however, attempting to run it on the ANE, a chip meant for AI models, led to RAM ballooning, loading far more than the model, and crashing the application. When loaded for use, the model often took up more memory than its size on disk because of how certain operations were compiled and translated for the ANE.
One alternative was to run it on the CPU, where the model would stay at its on-disk size at the cost of slower inference. This worked great, leading to my first successful point cloud from a 360 photo, all performed on an iPhone. This was a great achievement, though it was far too slow and far too little. At least two images need to be run through inference and compared in order to pose them. With more photos comes more RAM and more time. That pushed the budget close to the limit, and it was very slow.

This is when I began to investigate quantization and palettization, as well as changing the precision value. These techniques help shrink the machine-learning model while preserving quality the vast majority of the time. Applying combinations of these techniques to different degrees decreased the memory requirements just enough to leave room for some parts of the application, but not much. Remember, the 3D scene, user interface, connection to a 360 camera, live preview, and so much more all have to be managed in memory. And each photo means another point cloud, which can take up significant space in memory. A tour of hundreds of photos may take over 1GB.

How could we squeeze more space from this machine-learning model? This is where I learned that machine-learning models have different subsections that can be broken down: encoders, aggregators, decoders, and more. These parts can exist on their own, which means we do not need them all in memory all the time, only one at a time. It also means I could squeeze more size out of each, which improves efficiency in both RAM and CPU performance. Further reducing the models worked out great, with my maximum memory footprint reaching around 50% of the available RAM. This gives incredible space for point clouds and other features, even for incredibly large tours.

With the models now so small, I did attempt to run them on the ANE.
While certain parts could run, the memory required to run them quickly ballooned far beyond comfortable levels and would likely
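For readers unfamiliar with quantization, the idea mentioned above can be shown with a tiny sketch. This is generic uniform 8-bit quantization written in plain Python for illustration only. It is not Core ML’s implementation and not the configuration used in this project, and the function names and sample weights are made up.

```python
def quantize(weights, bits=8):
    """Map float weights to small integer codes plus a scale and offset."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1               # 255 codes above zero for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [lo + code * scale for code in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]     # made-up example values
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# Each weight now needs 1 byte instead of 4 (float32), at the cost of
# a rounding error of at most half a quantization step.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(worst_error <= scale / 2 + 1e-12)
```

Palettization follows a related idea, replacing each weight with an index into a small lookup table of shared values. Both trade a small amount of precision for a much smaller memory footprint.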

Week 24 2026: Model Improvements

Leave a Comment / Web Design, Week Reviews / Justin

Accomplishments: Mesh Improvements, Texture Improvements
Bonus: UI Improvements

The 3D model is the ultimate visual output of the 360 to 3D reconstruction process. Its quality is judged subjectively by the user and quantitatively by the file’s size. Qualitatively, we were able to improve the visual look of the 3D model by editing the UVs and better distributing the image textures across the scene, with smoother fades between them. Quantitatively, we were able to improve the process by reducing reconstruction time by about 84%. I also improved the reconstruction workflow, allowing groups of photos to be posed individually and connected at joints.

Mesh Improvements:

[Figures: Slow, Fast]

After noticing a reconstruction take close to 1 hour for a 52-image sequence, I was determined to improve the performance. After investigating, I found two likely improvements. One was a case where the CPU and GPU were switching contexts quite often, and the other was limiting the distance over which a camera should vote on points. I was able to rework the process to keep contexts focused on either the CPU or the GPU and swap more efficiently. I was also able to limit the valid range for each camera. The depth estimation is only valid up to a certain distance, so a camera at one end of the house shouldn’t be contributing votes to a room on the other side. This massively reduced the calculations for determining the final point cloud and mesh. Combined, these changes cut the reconstruction time for a 52-image sequence from 51 minutes down to just 8.

Texture Improvements:

[Figures: Before UV, After UV, Before Texture, After Texture]

Balancing visual fidelity with file size is a challenging task. The level of fidelity required is determined by the final application through which the user will experience the 3D model. The current application is intended to be an environmental navigation tool for spatial tours.
While last week’s result was adequate for this, sharp transitions and blotchy textures littered the scene. Seeing slices of walls or pillows and sharp changes in contrast is uncanny and distracting. I was able to set up a process which establishes an area between the seams for blending. This softens each transition and provides a gradient for color to change over. This limits sharp contrast changes and color mismatches across photos and produces a much more appealing image.

Part of this process included improvements to the UV maps. UV maps are the unfolded faces of an object, cut into islands, just like coloring origami on a flat sheet of paper before folding it. Before the UV map improvements there were an incredible number of very small islands. These little triangles littered the scene and often had textures slightly different from the area around them, standing out quite obviously. Updates were made to grow these islands. Now more triangles are connected to their neighboring islands, and the islands overall are much larger. This allows us to better spread the textures across surfaces for more complete and consistent visuals.

Bonus: UI Updates:

[Figures: Villa Mask View Supalai F1 Actions Menu Pattaya Park Whole Park Min Floor 3 Floor 2]

The user interface was completely overhauled to provide a simpler interface and better conform to the new workflow. Images can now be uploaded in groups and automatically proceed through the downsample and masking operations. Masks and RGB images are visible directly from the image’s dropdown in the persistent content window. Masked groups can be posed via an action and immediately display in the 3D viewport. From here a camera’s height can be provided to scale the whole cloud. Groups can also be linked together at a joint to allow for more piecewise alignments. Merged groups can be modeled, and when complete they again appear directly in the 3D viewport.
With these changes I have streamlined the end-to-end process and reduced the number of views to maintain from several to just one.

Summary

Better visual fidelity and faster processing. Simpler workflow and less to maintain. These simple benefits are the result of persistent iteration and reflection on a process that continues to evolve, with each change funneling it closer to its final form. Its effectiveness continues to increase, and I am grateful for the changes accomplished this week.
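For readers curious about the seam blending described under Texture Improvements, here is a minimal sketch of a linear blend across a seam. It is an assumption about one way such a gradient could work, not the implementation used this week; the function names and the band width are made up for the example.

```python
def blend_weight(distance_to_seam, band_width):
    """Weight given to photo A at a signed distance from the seam.

    Negative distances lie on photo A's side, positive on photo B's side.
    Outside the blending band one photo is used exclusively; inside it
    the weight falls linearly from 1 to 0.
    """
    half = band_width / 2.0
    if distance_to_seam <= -half:
        return 1.0
    if distance_to_seam >= half:
        return 0.0
    return 0.5 - distance_to_seam / band_width


def blend_color(color_a, color_b, distance_to_seam, band_width):
    """Mix two RGB colors according to the position inside the band."""
    w = blend_weight(distance_to_seam, band_width)
    return tuple(w * a + (1.0 - w) * b for a, b in zip(color_a, color_b))


# Two photos disagree on the brightness of the same wall.
bright, dark = (200, 200, 200), (100, 100, 100)
for d in (-5, -2.5, 0, 2.5, 5):
    print(d, blend_color(bright, dark, d, band_width=10))
```

A wider band gives a softer transition at the cost of mixing more of each photo into its neighbor, so the width is a tuning choice rather than a fixed value.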


Week 23 2026: Texturing

Web Design, Week Reviews / Justin

Accomplishments: Texturing. Bonus: Reconstruction Improvements, Furniture Removal Proof of Concept.

Paint, color, and shades add visual depth and give texture to items and surfaces. In 3D graphics the process of coloring a 3D model is called texturing. This week I designed a process to take the photos from the 360 images at their camera positions and use them to paint color, and texture, onto the 3D model. Texture painting relies upon the accuracy of the 3D scene. Holes in the walls will not get any color and realism will be lost. The same goes for floating artifacts and blobs that catch color and leave empty sides or shadows on the walls behind them. Camera positions are also more important than ever here: the further off the positions are, the higher the likelihood of the seams between images appearing cut or shifted. New improvements were made to camera alignment and mesh quality, which further improved the resulting textures. Lastly, a proof of concept was performed on furniture-removed photos. The process as of today was effective in reconstructing a model with the furniture removed.

Texturing
[Images: Batch Selection, Results Overview]

Adding texture takes our 3D models from looking like molded cream cheese and turns them into realistic models. The difference is similar to a plaster craft before and after paint is added. The process is straightforward. The 3D model is built of many tiny faces, and every face can be mapped to a two-dimensional square, much like how origami begins as flat paper and is folded into its intricate design. On this flat paper we can color in the faces. How do we choose what to color in which face? We use the cameras’ positions and the direction each face is pointing. Each camera looks at the surfaces of the model, such as walls and floors, and finds the faces that are closest to it and that it views at a good angle. Those faces are then drawn by that camera.
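The "closest camera with a good angle" rule can be sketched as follows. This is an illustrative guess at the selection logic, not the actual implementation: the scoring formula, the `min_cos` threshold, and all names are assumptions, and a real pipeline would also need to check that nothing blocks the camera's view of the face.

```python
import math


def best_camera_for_face(center, normal, cameras, min_cos=0.2):
    """Pick the camera that should paint a face, or None if none qualifies.

    `center` is the face center, `normal` its unit normal, and `cameras`
    maps a (hypothetical) camera name to a 3D position. A camera
    qualifies when the face points toward it (cosine of the viewing
    angle at least `min_cos`); among those, nearer and more head-on
    cameras score higher.
    """
    best_name, best_score = None, 0.0
    for name, position in cameras.items():
        to_camera = [p - c for p, c in zip(position, center)]
        distance = math.sqrt(sum(v * v for v in to_camera))
        if distance == 0.0:
            continue
        cos_angle = sum(n * v for n, v in zip(normal, to_camera)) / distance
        if cos_angle < min_cos:
            continue  # face points away from, or only grazes, this camera
        score = cos_angle / distance
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# A wall face at y = 2 whose normal points back into the room.
wall_center, wall_normal = (0.0, 2.0, 1.5), (0.0, -1.0, 0.0)
cameras = {
    "near": (0.0, 0.0, 1.5),
    "far": (0.0, -6.0, 1.5),
    "behind_wall": (0.0, 4.0, 1.5),
}
print(best_camera_for_face(wall_center, wall_normal, cameras))
```

Here the camera behind the wall is rejected by the angle test, and the nearer of the two remaining cameras wins.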
Using these techniques we can see the rooms brought much closer to life with the color added.

Bonus: Reconstruction Improvements
[Images: High or Low Res A, High or Low Res B, Side By Side, Vertex Count]

Texturing quality relies heavily on the camera’s position and the model’s quality. A variety of improvements were identified to raise that quality in order to better texture the output. Through a variety of updates, slight camera drifts were resolved. Resolving these drifts led to better alignment of the point clouds, and this better alignment led to a more accurate resulting mesh. These changes also came with the additional benefit of faster processing.

One frequent issue was holes appearing in the 3D model. Corners of rooms, surfaces at sharp angles to the cameras, and unseen crevices below desks and furniture often resulted in holes. Some investigation was performed and new hole-filling techniques were implemented to resolve these problems.

While holes meant too little geometry, floaters meant too much. Around the scenes it was common to find floating blobs. Sometimes these made sense, like tassels dangling from a lamp where the rope was too thin to make geometry for but the tassels were large enough. Other times the floaters were less meaningful, like bands from sharp edges that many closer cameras would deny. Adjustments to the voting criteria and other settings drastically reduced the floater count. Fewer floaters means less challenge in texturing the scene, as cameras do not have to worry about the unseen back side of the floaters (like the dark side of the moon) or the shadows they may cast on the walls.

Another important step was to decimate, or reduce, the size of the model. The model is like origami, and each side after a fold is called a face. The more faces we have, the more steps it takes to make the model, the more processing is needed, and the more data is transferred over the internet.
With decimation we reduced the mesh by about 89%, close to 100,000 faces. This can greatly reduce file size while having little effect on the visual results.

Bonus: Furniture Removal Proof of Concept
[Images: Empty Image, Furniture Image, Side By Side, 3D no furniture]

For the real estate domain an interesting feature may be to see a room free of furniture and clutter. In many cases the purchaser is buying the house, not the interior design and items that come with it. Through machine learning tools we can remove furniture from the spaces. The tools attempt to recreate the 360 image with the idea of no furniture. This inevitably leads to hallucinations. All of the information behind the furniture is imagined and not real. However, most of this information can be acceptable. A hardwood floor is likely to keep going under a bed if it’s seen on both sides, and a wall is likely to remain intact even behind a curtain.

These hallucinations can lead to inconsistencies between shots. Perhaps one image imagines the floor’s color to have been affected by the sun over the years, and another does not. Small inconsistencies like this may stand out. Viewers would likely benefit from being told in advance that scenes like this include imagined information, so that they can prepare for these uncanny effects. Luckily, even with the inconsistency of the imagery, the layout itself remains mostly intact. When run through the reconstruction process the resulting room is incredibly accurate. The layout remains intact and the hallucinations appear to be consistent enough to work well with the depth estimations.

Summary

Texturing takes us the final step back to the original proof of concept performed back in Week 12. In these 11 weeks we have been able to identify and resolve many of the challenges found in taking a sparse, unordered dataset of 360 images and producing a textured 3D mesh that is web-ready and of good quality.
This week brought us to that milestone through improvements in positioning, mesh generation, and


Week 22 2026: Pose Improvements

Web Design, Week Reviews / Justin

Accomplishments: Batch Match Testing, Same Side Opening Check, Mask Removal, Narrow Passage Fallback, Doorway Opening Check, Rotation Aware Match Validation.

Positioning, meaning matching and alignment, is likely the step I have returned to and iterated on the most throughout this development process. The real 3D world is complex, and a 2:1 image of RGB pixels only gives us so much information about it. With each iteration I have squeezed more and more useful information out of these 360 photos. This week was no different. Using the refined masks and depths extracted from each photo, I was able to improve camera matching to the point that every dataset now pairs its photos in a way that is acceptable against the ground truth. To confirm that each step forward for one dataset was not a step back for another, I created a tool to automate running the matching process for batches of projects. And to run these during my breaks, rather than overnight, I found an opportunity to reduce fine matching time again by limiting the doorway matches promoted from the coarse step.

Batch Match Testing
[Images: Batch Selection, Results Overview]

The testing tools created last week offer a great summarized view of the results and how they compare to my expectations. However, I felt my time could be better used than manually running one test, uploading a file, and running another, especially with no notification that a test has completed. I decided to automate the process. Now I can select multiple defined projects and the tool will go through them one by one, reset each to the matching stage, and perform the matching process. It will download the results files, make the comparisons, and move on to the next project. Thanks to this, I can test my changes across multiple datasets overnight or while I step away for lunch. As each change I make has a wider impact on the process, automated tools like this are imperative to guard against hidden regressions in performance.
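The shape of such a batch run can be sketched like this. It is a simplified stand-in rather than the real tool: `run_matching` is a hypothetical callable that replaces the reset, match, and download steps, and the project names and pair format are invented for the example.

```python
def run_batch(projects, run_matching, expected):
    """Run matching for each project and compare against expected pairs.

    `run_matching(project)` is a hypothetical callable returning the
    matched image pairs for one project. `expected` maps each project to
    its ground-truth pairs. One failing project does not stop the batch.
    """
    report = {}
    for project in projects:
        try:
            found = set(run_matching(project))
        except Exception as error:
            report[project] = {"status": "error", "detail": str(error)}
            continue
        wanted = set(expected[project])
        report[project] = {
            "status": "ok" if found == wanted else "differs",
            "missing": sorted(wanted - found),
            "unexpected": sorted(found - wanted),
        }
    return report


# Stand-in for the real matching step, with made-up results.
def fake_matching(project):
    results = {
        "villa": [("a", "b"), ("b", "c")],
        "park": [("a", "b")],
    }
    return results[project]


expected = {
    "villa": [("a", "b"), ("b", "c")],
    "park": [("a", "b"), ("b", "c")],
}
report = run_batch(["villa", "park", "unknown"], fake_matching, expected)
for project, result in report.items():
    print(project, result)
```

Catching errors per project matters for unattended runs: a crash in one dataset should still leave results for the others when I come back.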
Same Side Opening Check
[Images: Results Before and After]

A recent change improved match confidence by promoting more through-doorway matches and increasing the mask availability for doorways. This greatly improved matches, while also inflating the time to complete, as more pairs were promoted from the fast coarse step into the slow fine matching. I wanted to reduce these many new, improper doorway-promoted matches in a way that had no chance of affecting the good through-doorway candidates. My previous sentence had the key I needed: “through doorway” means through the doorway. I did not need to promote everything that had a meaningful number of keypoints in a doorway; I needed to promote matches that were through the doorway. This problem was already solved in a recent week. If a keypoint lands on both images’ masks, the cameras are likely on the same side of that doorway. If a keypoint lands on one image’s mask and not on the other’s, it is likely a feature of the other camera’s room, and the first camera sees it through the doorway. Reducing the promoted matches to only candidates that see through a doorway drastically reduced the output and processing time for fine matching. This made automated testing possible not only overnight, but during meals and other breaks.

Mask Removal
[Images: Blob Removal Before, Blob Removal Before, Small Mask Removal Before, Small Mask Removal After]

The tool being used to detect opening masks is imperfect. From testing it seems our choices are to identify far too many masks, or far too few. We can trim data but we cannot spawn more, so the choice was made to take more masks and manage the consequences. Two consequences of this were keypoints landing on small, wrong masks and promoting improper doorway pairs, and hallucinated openings causing holes during reconstruction. The small masks issue was reasonably straightforward to fix.
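Before moving on to the fix, the same-side check from the previous section can be sketched in code. This is an illustration of the rule as described, with assumptions labeled: masks are plain 2D lists of 0 and 1, matches are pairs of pixel coordinates, and the `min_through` threshold and function names are invented for the example.

```python
def on_mask(mask, point):
    """True when the pixel (x, y) falls on the doorway mask."""
    x, y = point
    return bool(mask[y][x])


def count_through_doorway(matches, mask_a, mask_b):
    """Count keypoint matches that land on exactly one image's mask.

    On both masks: the cameras are likely on the same side of the doorway.
    On exactly one mask: one camera likely sees the point through it.
    On neither mask: the match is unrelated to a doorway.
    """
    through = 0
    for point_a, point_b in matches:
        if on_mask(mask_a, point_a) != on_mask(mask_b, point_b):
            through += 1
    return through


def should_promote(matches, mask_a, mask_b, min_through=2):
    """Promote a pair to fine matching only with enough through-doorway hits."""
    return count_through_doorway(matches, mask_a, mask_b) >= min_through


# Tiny 4x2 masks standing in for full 360 doorway masks.
mask_a = [[0, 1, 1, 0],
          [0, 1, 1, 0]]
mask_b = [[0, 0, 0, 0],
          [0, 0, 1, 1]]
matches = [
    ((1, 0), (0, 0)),  # on A's mask only: through the doorway
    ((2, 1), (2, 1)),  # on both masks: same side
    ((0, 0), (3, 1)),  # on B's mask only: through the doorway
    ((0, 1), (0, 0)),  # on neither mask
]
print(count_through_doorway(matches, mask_a, mask_b))
print(should_promote(matches, mask_a, mask_b))
```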
For a 360 image, any valid mask through a doorway will likely take up some portion of the height and width of the image. If the doorway is 50 ft away, the door will be very small and likely not a reasonable pair for the cameras to use in positioning. If the doorway is 10 ft away, it will likely appear much bigger in the photo and be a much more reasonable transition from one photo to the next. Knowing this, I was able to reduce the false positive masks by removing any blob below certain size thresholds. This reduced the ceiling lights, windows, shelves, and other structures that had been falsely chosen as openings.

The reconstruction piece was partly solved by removing small masks; however, some bigger hallucinations did persist. Using the information about which side of a doorway an image is on, I was able, after matching, to construct an updated mask that only included masks known to be used through doorways. This stops 3D reconstruction from cutting holes in walls or cabinets. Far away cameras, or ones at sharp angles that do not use a doorway, may now hallucinate a flat surface where the doorway should be. Luckily, the 3D reconstruction process already downweights this information and should remove it during the process.

Narrow Passage Fallback
[Images: Overlapping Poses, Proper Positioning]

A strange issue has been consistently noticed when positioning pieces of the first floor balcony. Even when connected to the proper images, their positioning was incredibly off. After investigating, this appeared to occur because the shared keypoints used for positioning were very few and only in one area between the cameras. For example, consider two cameras looking at each other on a thin, long balcony. The sky is removed, and the many keypoints that remain are on buildings in the distance, or directly behind either camera. Some keypoints may be on the floor, on the ceiling if present, and perhaps on the wall. Featureless surfaces, like a tiled floor or flat wall, offer very few keypoints.
The surfaces behind each camera on a long alley may be outside the known good range and get cut off. This leads to only using the


Week 21 2026: 3D Model Improvements

Web Design, Week Reviews / Justin

Accomplishments: Improved 3D Model Quality. Bonus: Mask Comparison Tool, Match Comparison Tool.

A 3D visualization of an environment does not need to be perfect, it just needs to be believable. This week we took unbelievable cut objects, or meshes, with artifacts and sharp corners and made them more believable, with more faithful features and smoother surfaces, all with faster performance. In addition to this, I developed two new tools to help me test the bulk effects of my changes across a variety of datasets.

Improved 3D Model Quality
[Images: Artifact Model, Flat Model, Hole Filled, Villa Result, Pattaya Floorplan, Pattaya Result, Hallucinations]

So far two strategies have been tested to take the aligned point clouds and reconstruct a 3D mesh. Week 17’s results were sharp, jagged, and missing large regions. Last week’s outputs were aliased, cut like ribbons, and included strange carvings and artifacts. This week I decided to try a variety of ways to improve this process, one of which was to use the best of both worlds. Week 17’s process relied mainly on 2D math to carve and define the 3D model prior to constructing it. Last week’s process worked mainly in 3D to perform the steps and compute the model from visible voxels. The 2D path lacked the ability to effectively occlude rays or work with unseen edges. The 3D path could handle those, though its reconstruction logic led to either puffy or ribbon-cut results, often with artifacts.

I was able to apply concepts from both processes together in a new hybrid pipeline. The result had correct normals for backface culling, holes were filled, walls were smooth, and ribbon cuts were nowhere to be found. This made for much more correct 3D models, though initially at a cost. The earlier implementation ran some sequences quite slowly. With review and revision, I was able to bring performance to a reasonable level while maintaining quality.
Some improvements still need to be made. Some floating blobs have appeared, and doorways can at times be covered by the hole-filling step. These results will hopefully be improved in future iterations, and for now they are much more manageable artifacts than the ribbon cuts and jagged lines found in the previous week.

Bonus: Mask Comparison Tool
[Images: CVAT Tool, Mask Comparison]

Up until now I have manually reviewed changes to all of the data outputs. This means I have skimmed images to determine whether masks are appearing where I expect them. This has been done as needed, and it was fast enough for development. As the datasets I am using continue to grow, so does the opportunity to regress or fail certain steps in the process. It’s unreasonable to expect myself to manually review hundreds of images after each attempted change, so I devised tools to assist with this process.

The first tool is for mask comparison. The masking step is currently very important. It identifies things like sky, glass, and open doorways. It is imperfect, and as I change things, sometimes it catches more doorways while also missing others. To make sure the results do not regress, I must check that they improve across the board. To do this I set up the mask comparison tool. This tool takes the annotated data for all opening masks and compares it to the ground truth data. Ground truth data is the data I expect of the mask, and it is data I had to manually create. Using an interesting annotation tool called CVAT, I skimmed through hundreds of images and updated the AI-generated annotations to represent what I expected them to be for each opening. Now I can compare future runs of the process against this data to see whether we are getting closer to or further from the expectation. I can also compare against previous results to see whether a new change has improved things overall, and whether some positions have gotten worse.
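One common way to score a mask against its ground truth is intersection over union (IoU). The sketch below uses it to flag images that got worse between two runs. The choice of IoU, the function names, and the tiny example masks are assumptions for illustration; the comparison tool itself may measure things differently.

```python
def mask_iou(predicted, truth):
    """Intersection over union of two binary masks given as 2D lists."""
    intersection = 0
    union = 0
    for row_p, row_t in zip(predicted, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly.
    return 1.0 if union == 0 else intersection / union


def worsened_images(truth, previous, current):
    """Names of images whose mask moved further from the ground truth."""
    worse = []
    for image in truth:
        before = mask_iou(previous[image], truth[image])
        after = mask_iou(current[image], truth[image])
        if after < before:
            worse.append(image)
    return worse


truth = {"img1": [[1, 1], [0, 0]], "img2": [[0, 1], [0, 1]]}
previous = {"img1": [[1, 0], [0, 0]], "img2": [[0, 1], [0, 1]]}
current = {"img1": [[1, 1], [0, 0]], "img2": [[0, 1], [0, 0]]}

print(mask_iou(current["img1"], truth["img1"]))
print(worsened_images(truth, previous, current))
```

In this example the new run fixes the first image but loses half of the doorway in the second, which is exactly the "catches more while missing others" case worth surfacing.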
Bonus: Match Comparison Tool
[Images: Match Comparison, Phase Comparison]

Comparing matches is another important step in the process. Matching itself consists of three sub-steps: coarse, fine, and spatial matching. Each step has a variety of conditions that can change its output, and changing one can affect the later steps. Being able to see the accuracy of a run compared to the ground truth helps me identify where things need improving, as well as where things may have gotten worse over multiple runs.

Both of these tools still require some manual effort to receive the benefit, though far less than previously required. Further automating this process to include overnight jobs could be beneficial for testing various datasets as a whole.

Summary

This week’s results include a huge milestone. The workflow can make a high-resemblance 3D model for multiple datasets with only equirectangular images. To better research and develop further improvements to accuracy, and to curb hallucinations, I began creating testing tools to monitor the accuracy of the masks and match results. These should help preserve the current quality of output while I make changes in order to improve it.
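As a footnote to the match comparison tool, per-phase accuracy can be expressed as precision and recall over image pairs. This is a sketch under assumptions: the same ground-truth pair list is used for every phase, pairs are unordered, and the names and sample data are invented for the example.

```python
def pair_key(a, b):
    """Treat (a, b) and (b, a) as the same match."""
    return tuple(sorted((a, b)))


def score_phase(found_pairs, truth_pairs):
    """Return (precision, recall) of one matching phase against ground truth."""
    found = {pair_key(*pair) for pair in found_pairs}
    truth = {pair_key(*pair) for pair in truth_pairs}
    correct = len(found & truth)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


truth = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
phases = {
    "coarse": [("a", "b"), ("c", "b"), ("a", "d"), ("c", "d")],
    "fine": [("a", "b"), ("b", "c")],
}
for name, pairs in phases.items():
    print(name, score_phase(pairs, truth))
```

Reading the two numbers together shows where a phase fails: low precision means it lets wrong pairs through, while low recall means it drops pairs that should have survived.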


Week 20 2026: 3D Reconstruction

Web Design, Week Reviews / Justin

Accomplishments: Defined Requirements, Drafted Reconstruction Plan, Reconstruction – Second Iteration.

At this point we have taken 360 images, accurately graphed them, and estimated the 3D structure as point clouds, or collections of floating points in 3D. Each camera has its own point cloud, and these point clouds overlap. To take these floating clouds of points and make a solid room we must identify these overlaps, and other irregularities, in order to make the structure as realistic as possible. I reviewed the current state of the point clouds and the issues that came up, and began reviewing tools and techniques. From this I drafted a plan to use these tools to solve the problems currently faced, and began to implement it.

Defined Requirements

Every problem comes with a goal. The better the goal is defined, the more likely we are to achieve it. The goal for the reconstruction step starts off simple: “Convert overlapping 3D point clouds into single 3D mesh”. In practice, it grows much more complicated than that. You might recall that I had previously designed a reconstruction phase in week 17. Those fused point clouds looked great from many directions, but not all. A variety of issues had appeared. One phase used cube-mapping, which ripped seams through entire rooms. Another stage attempted to vote on geometric confidence, and at times would delete information on the other side of a wall or retain hallucinated geometry in the wrong spaces. I took note of these and other issues and began fresh in assessing what a viable reconstruction pipeline would look like and what it would need to do. I also took the time to identify concrete examples of issues already found in the point clouds. Naming these issues and identifying their locations gives us valuable tests we can perform to ensure the pipeline is performing to our expectations.
The issues to be tested include the following:

Hallucinated Depths: The current depth estimation approach struggles within doorways, and the information stretched into the other room is often false.
[Images: Hallucinated Depth, Hallucinated Depth]

Flat Doorways: Some doorways have no depth at all and appear flat along the wall. A virtual tour needs to see through these gaps in order to display hotspots. These flat surfaces must be opened.
[Images: Flat Wall, Flat Wall]

Sharp Edges With No Alternative: Cameras can only see information from their perspective. So flat walls or sides of cabinets not visible to the camera do not initially receive depth data and appear as holes in the geometry. If no other camera sees that surface we must make a best guess at what that surface could look like so that we don’t have surprise holes everywhere.
[Images: Sharp Edge Cloud, Sharp Edge 2D]

Sharp Edges With Alternative: If one camera sees a sharp edge, and is unsure what’s behind it, another camera might see what is actually behind the edge. In cases like this, the camera that can see the geometry should be able to merge it with the other camera.
[Images: Edge Hole, Replacement Geometry]

Misplaced Surfaces – Far Apart: Depth accuracy is only consistent until about 2.5 m with the current tool. After that, depths may band or stretch. In a large room this can cause depths to appear in the wrong place or beyond a wall. We must identify where those walls are and either merge or remove the wrongly positioned data.
[Images: Extended Wall, Occluding Wall]

Misplaced Surfaces – Close Together: Even nearby cameras can estimate depths at slightly different positions. The same wall may appear at different positions all within a foot of each other. This geometry should be identified as representing the same thing and merged into one single wall.
[Images: Overlapping Wall, Overlapping Wall]

These tests, among other things, represent the challenges faced in converting estimated depth maps into accurate 3D objects.
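To make the last case concrete, here is a minimal sketch of merging wall estimates that sit close together. It reduces the problem to one dimension: each camera reports how far a wall is along a shared axis, and estimates within a tolerance are averaged into one wall. The 0.3 m tolerance (roughly the "foot" mentioned above), the function name, and the sample numbers are assumptions, and real geometry would need the same idea applied to planes in 3D.

```python
def merge_close_offsets(offsets, tolerance=0.3):
    """Group wall positions lying within `tolerance` and average each group.

    `offsets` are positions in meters of the same candidate walls as
    estimated by different cameras along one axis. A group is closed
    once the next value is more than `tolerance` from the group's first
    value, so no merged wall spans more than the tolerance.
    """
    merged = []
    group = []
    for value in sorted(offsets):
        if group and value - group[0] > tolerance:
            merged.append(sum(group) / len(group))
            group = []
        group.append(value)
    if group:
        merged.append(sum(group) / len(group))
    return merged


# Three cameras disagree slightly about one wall, two about another.
estimates = [3.0, 3.25, 3.125, 7.5, 7.75]
print(merge_close_offsets(estimates))
```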
Drafted Reconstruction Plan

With the requirements outlined, along with the information obtained through previous phases in the process, we can begin drafting our plan. This week’s process consisted of over 10 steps. I will summarize the concepts here.

Most of the time doorways provide inaccurate depth information. They are the most important part of an image for positioning; however, the depth through them is incredibly misplaced. I determined that removing it altogether made the most sense. This meant that no geometry through doorways would appear or interfere with better geometry on the other side. It was also important to ensure that cameras do not provide empty votes to geometry on the other side of doorways. When voting, a line is drawn from the camera to the cubic unit of space (foot, meter, centimeter, etc.) that is being voted on. Usually this stops at the camera’s own geometry; however, if we remove the geometry through a doorway it could go on infinitely! To avoid that I added a step to create an invisible wall right in the opening that blocks the line, or ray, when it reaches it. This ensures negative voting remains within the bounds of the known good geometry at all times.
[Image: Doorway Stop]

Edges, as previously mentioned, often result in empty space or missing geometry. I added a step that stretches out a flat surface between all edges, like a curtain over a window. These points weakly cover the flat space. They will remain if nothing else determines them to be inaccurate.
[Image: Close Camera]

Identifying occluding walls was another important step. When a far camera tries to draw its wall on the other side of a much closer camera’s wall, that closer camera should block the ray, or line, from going through it. This also applies to empty space voting, so that the far camera cannot remove geometry it should not be able to see in the first place.

Getting surface normals remains an important step. This labels the direction that each point is facing.
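As a small illustration of what a normal tells us, the sketch below labels a point by the direction its normal faces. The up axis, the 30 degree tilt allowance, and the label names are assumptions for the example, not settings from the pipeline.

```python
import math


def label_normal(normal, up=(0.0, 0.0, 1.0), tilt_degrees=30.0):
    """Label a surface normal by how it relates to the (unit) up axis."""
    length = math.sqrt(sum(n * n for n in normal))
    if length == 0.0:
        raise ValueError("normal must not be the zero vector")
    # Cosine of the angle between the normal and the up axis.
    cos_up = sum(n * u for n, u in zip(normal, up)) / length
    near_vertical = math.cos(math.radians(tilt_degrees))
    near_horizontal = math.sin(math.radians(tilt_degrees))
    if cos_up >= near_vertical:
        return "faces up"
    if cos_up <= -near_vertical:
        return "faces down"
    if abs(cos_up) <= near_horizontal:
        return "faces sideways"
    return "angled"


for normal in [(0, 0, 1), (0, 0, -2), (1, 0, 0), (1, 0, 1)]:
    print(normal, label_normal(normal))
```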
Ceilings face downward, floors face up, some things may face at an angle. This will help us identify two sides of a wall when they overlap, as ideally they

