Justin

Week 23 2026: Texturing

Leave a Comment / Web Design, Week Reviews / Justin

Accomplishments: Texturing. Bonus: Reconstruction Improvements, Furniture Removal Proof of Concept.

Paint, color, and shades add visual depth and give texture to items and surfaces. In 3D graphics, the process of coloring a 3D model is called texturing. This week I designed a process to take the photos from the 360 images at their camera positions and use them to paint color and texture onto the 3D model.

Texture painting relies upon the accuracy of the 3D scene. Holes in the walls will not get any color, and realism will be lost. The same goes for floating artifacts and blobs that catch color and leave empty sides or shadows on the walls behind them. Camera positions are also more important than ever here: the further off the positions are, the higher the likelihood that the seams between images appear cut or shifted.

New improvements were made to camera alignment and mesh quality, which further improved the resulting textures. And lastly, a proof of concept was performed on furniture-removed photos. The process as of today was effective in reconstructing a model with furniture removed.

Texturing

[Figures: Batch Selection; Results Overview]

Adding texture takes our 3D models from looking like molded cream cheese and turns them into realistic models. The difference is similar to a plaster craft before and after paint is added.

The process is straightforward. The 3D model is built of many tiny faces, and every face can be mapped to a two-dimensional square, kind of like how origami begins as flat paper and folds into its intricate design. On this flat paper we can color in the faces.

How do we choose which color goes on which face? We use the cameras’ positions and the direction each face is pointing. For each camera, the process looks at the faces of the model (the walls, floors, and so on) and finds the ones that are closest and at a good angle in relation to that camera. It then determines that these faces will be drawn by that camera.
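The face-to-camera choice described above can be sketched in a few lines. This is only an illustration, not the actual pipeline: the function name, the `min_cos` cutoff, and the "angle divided by distance" score are all assumptions made for the example, and it ignores occlusion (whether something sits between the camera and the face).

```python
import math

def best_camera_for_face(center, normal, cameras, min_cos=0.2):
    """Pick the camera that should paint one face, or None if no camera fits.

    center  -- (x, y, z) center of the face
    normal  -- unit-length (x, y, z) direction the face is pointing
    cameras -- list of (x, y, z) camera positions
    min_cos -- hypothetical cutoff; rejects cameras that see the face edge-on
    """
    best_index, best_score = None, 0.0
    for index, cam in enumerate(cameras):
        to_cam = [c - p for c, p in zip(cam, center)]
        dist = math.sqrt(sum(v * v for v in to_cam))
        if dist == 0.0:
            continue
        # Cosine of the angle between the face normal and the direction to
        # the camera: 1.0 means head-on, 0.0 edge-on, negative means behind.
        cos_angle = sum(n * v for n, v in zip(normal, to_cam)) / dist
        if cos_angle < min_cos:
            continue
        # Assumed score: favor close cameras with a direct view of the face.
        score = cos_angle / dist
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

A face on the floor pointing up, with one camera below the floor and two above it, would be assigned to the nearer of the two cameras above it.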
Using these techniques we can see the rooms brought much closer to life with the color added.

Bonus: Reconstruction Improvements

Texturing quality relies heavily on the cameras’ positions and the model’s quality. A variety of improvements were identified to raise that quality in order to better texture the output.

Through a variety of updates, slight camera drifts were resolved. Resolving these drifts led to better alignment along the point clouds, and this better alignment led to a more accurate resulting mesh. These changes to the process also came with the benefit of improved performance speed.

One frequent issue noticed was holes appearing in the 3D model. Corners of rooms, sharp angles to cameras, and unseen crevices below desks and furniture often resulted in holes. Some investigation was performed, and new hole-filling techniques were implemented to resolve these problems.

While holes meant too little geometry, floaters meant too much. Around the scenes it was common to find floating blobs. Sometimes these made sense, like dangling tassels from a lamp where the rope was too small to make geometry for but the tassels were large enough. Other times the floaters were less meaningful, like bands from sharp edges that many closer cameras would vote against. Adjustments to the voting criteria and other settings drastically reduced the floater count. Fewer floaters means less challenge in texturing the scene, as cameras do not have to worry about the unseen back side of the floaters (like the dark side of the moon) or the shadows they may cast on the walls.

Another important step was to decimate, or reduce, the size of the model. The model is like an origami, and each side after a fold is called a face. The more faces we have, the more steps it takes to make the model, the more processing is needed, and the more data is transferred over the internet. With decimation we reduced the mesh by about 89%, close to 100,000 faces.
This can greatly reduce file size while having little effect on the visual results.

Bonus: Furniture Removal Proof of Concept

[Figures: High or Low Res A; High or Low Res B; Side By Side; Vertex Count]

For the real estate domain, an interesting feature may be to see a room free of furniture and clutter. In many cases the purchaser is buying the house, not the interior design and items that come with it.

Through machine learning tools we can remove furniture from the spaces. The tools attempt to recreate the 360 image with the idea of no furniture. This inevitably leads to hallucinations. All of the information behind the furniture is imagined and not real. However, most of this information can be acceptable. A hardwood floor is likely to keep going under a bed if it’s seen on both sides. Or the wall is likely to remain intact even behind a curtain.

These hallucinations can lead to inconsistencies between shots. Perhaps one image imagines the floor’s color to have been affected by the sun over years, and another does not. Small inconsistencies like this may stand out. Viewers would likely benefit from being told beforehand that scenes like this include imagined information, so that they can prepare for these uncanny effects.

Luckily, even with the inconsistency of the imagery, the layout itself remains mostly intact. And when run through the reconstruction process, the resulting room is incredibly accurate. The layout remains intact, and the hallucinations appear to be consistent enough to work well with the depth estimations.

Summary

Texturing takes us the final step back to the original proof of concept performed in Week 12. In these 11 weeks we have been able to identify and resolve many of the challenges found in taking a sparse, unordered dataset of 360 images and reproducing a textured 3D mesh that is web-ready and of good quality. This week brought us to that milestone through improvements in positioning, mesh generation, and texture addition. And we even had time to test the


Week 22 2026: Pose Improvements


Accomplishments: Batch Match Testing, Same Side Opening Check, Mask Removal, Narrow Passage Fallback, Doorway Opening Check, Rotation Aware Match Validation.

Positioning, or matching and alignment, is likely the step I have returned to and iterated on the most throughout this development process. The real 3D world is complex, and a 2:1 series of RGB pixels only gives us so much information about it. With each iteration I have squeezed more and more useful information out of these 360 photos. This week was no different. Using the refined masks and depths extracted from each photo, I was able to match the cameras well enough that every dataset now pairs each photo in a way that is acceptable against the ground truth. To confirm that each step forward for one dataset was not a step back in another, I created a tool to automate the matching process for batches of projects. And to run these during my breaks, rather than overnight, I found an opportunity to reduce fine matching time again by limiting the coarse-promoted doorway masks.

Batch Match Testing

[Figures: Batch Selection; Results Overview]

The testing tools created last week offer a great summarized view of the results and how they compare to my expectations. However, I felt my time could be better used than manually running one test, uploading a file, and running another, especially with no notification that a test has completed. I decided to automate the process. Now I can select multiple defined projects, and the tool will automatically go one by one, reset them to the matching stage, and perform the matching process. It will download the results files, make the comparisons, and move on to the next project.

Thanks to this, I can test my changes across multiple datasets overnight or while I step away for lunch. As each change I make has a wider impact on the process, automated tools like this are imperative to guard against hidden regressions in performance.
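The batch loop described above might look roughly like the following. This is a minimal sketch, not the tool itself: `run_matching` stands in for "reset the project to the matching stage, run it, and download the results", and every name here (including the project names in the usage note) is hypothetical.

```python
def normalize(pairs):
    """Treat (a, b) and (b, a) as the same camera pair."""
    return {tuple(sorted(pair)) for pair in pairs}

def compare_pairs(found, expected):
    """Compare one project's matched pairs against its expected pairs."""
    found, expected = normalize(found), normalize(expected)
    return {
        "missing": sorted(expected - found),      # expected but not matched
        "unexpected": sorted(found - expected),   # matched but not expected
        "passed": found == expected,
    }

def run_batch(projects, run_matching, expected_by_project):
    """Run matching for each project in turn and collect a report.

    run_matching is a caller-supplied function (an assumption of this
    sketch) that takes a project name and returns its matched pairs.
    """
    report = {}
    for name in projects:
        pairs = run_matching(name)
        report[name] = compare_pairs(pairs, expected_by_project[name])
    return report
```

Called as `run_batch(["villa", "pattaya"], run_matching, expected)`, the report shows per project whether any expected pair went missing after a change, which is the kind of hidden regression the batch tool is meant to surface.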
Same Side Opening Check

[Figures: Results Before and After]

A recent change improved match confidence by promoting more through-doorway matches and increasing the mask availability for doorways. This greatly improved matches, while also inflating the time to complete, as more pairs were promoted from the fast coarse step into the slow fine matching. I wanted to determine how to reduce these many new, improper doorway-promoted matches in a way that had no chance of affecting the good through-doorway candidates.

My previous sentence had the key I needed. “Through doorway” means through the doorway. I did not need to promote everything that had a meaningful number of keypoints in a doorway; I needed to promote matches that were through the doorway. This problem had already been solved in a recent week. If a keypoint lands on both images’ masks, the cameras are likely on the same side of that doorway. If the keypoint lands on one image’s mask and not on the other’s, it is likely a trait of the other camera’s room, and the first camera sees it through the doorway.

Reducing the promoted matches to only candidates that see through a doorway drastically reduced the output and processing time for fine matching. This made automated testing possible not only overnight, but during meals and other breaks.

Mask Removal

[Figures: Blob Removal Before; Blob Removal Before; Small Mask Removal Before; Small Mask Removal After]

The tool being used to detect opening masks is imperfect. From testing, it seems our choices are to identify far too many masks or far too few. We can trim data, but we cannot spawn more, so the choice was made to take more masks and manage the consequences. Two consequences of this were keypoints landing on small, wrong masks and promoting improper doorway pairs, and hallucinated openings causing holes during reconstruction.

The small masks issue was reasonably straightforward to fix.
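As a rough illustration of a small-mask filter, the sketch below drops every connected blob whose bounding box covers too small a share of the image height or width. It is an assumption-heavy example rather than the actual implementation: the mask is a plain grid of 0/1 values, the two fraction thresholds are made-up defaults, and the left/right wrap-around of a 360 image is ignored.

```python
from collections import deque

def remove_small_blobs(mask, min_height_frac=0.1, min_width_frac=0.02):
    """Return a copy of a 0/1 mask grid with small blobs removed.

    A blob is a group of 4-connected mask pixels. It is kept only if its
    bounding box spans at least the given fractions of the image height
    and width (both thresholds are hypothetical).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood fill to collect every pixel of this blob.
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            height = max(y for y, _ in blob) - min(y for y, _ in blob) + 1
            width = max(x for _, x in blob) - min(x for _, x in blob) + 1
            if height >= min_height_frac * rows and width >= min_width_frac * cols:
                for y, x in blob:
                    cleaned[y][x] = 1
    return cleaned
```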
For a 360 image, any valid mask through a doorway will likely take up some portion of the height and width of the screen. If the doorway is 50 ft away, the door will be very small and likely not a reasonable pair for the cameras to use in positioning. If the doorway is 10 ft away, it will likely appear much bigger in the photo and be a much more reasonable transition from one photo to the next. Knowing this, I was able to reduce the false positive masks by removing any blob below certain thresholds. This reduced the ceiling lights, windows, shelves, and other structures that had been falsely chosen as openings.

The reconstruction piece was partly solved by removing small masks; however, some bigger hallucinations did persist. Using the information about which side of a doorway an image is on, I was able, after matching, to construct an updated mask that only included masks known to be used through doorways. This stops 3D reconstruction from cutting holes in walls or cabinets. Far away cameras, or ones at sharp angles that do not use a doorway, may now hallucinate a flat surface where the doorway should be. Luckily, the 3D reconstruction process already downweights this information and should remove it during the process.

Narrow Passage Fallback

[Figures: Overlapping Poses; Proper Positioning]

A strange issue has been consistently noticed when positioning pieces of the first floor balcony. Even when connected to the proper images, their positioning was incredibly off. After investigating, this appeared to occur because the shared keypoints used for positioning were very few and only in one space between the cameras.

For example, take two cameras looking at each other on a thin, long balcony. The sky is removed, and the many keypoints that remain are on buildings in the distance or directly behind either camera. Some keypoints may be on the floor, or the ceiling if present, and perhaps the wall. Featureless surfaces, like a tiled floor or flat wall, offer very few keypoints.
The surfaces behind each camera on a long alley may be outside the known good range and get cut off. This leads to only using the


Week 21 2026: 3D Model Improvements


Accomplishments: Improved 3D Model Quality. Bonus: Mask Comparison Tool, Match Comparison Tool.

A 3D visualization of an environment does not need to be perfect; it just needs to be believable. This week we took unbelievable cut objects, or meshes, with artifacts and sharp corners and made them more believable with more identical features and smoothed surfaces, all with faster performance. In addition to this, I developed two new tools to better help me test the bulk effects of my changes across a variety of datasets.

Improved 3D Model Quality

[Figures: Artifact Model; Flat Model; Hole Filled; Villa Result; Pattaya Floorplan; Pattaya Result; Hallucinations]

So far, two strategies have been tested to take the aligned point clouds and reconstruct a 3D mesh. Week 17’s results were sharp, jagged, and missing large regions. Last week’s outputs were aliased, cut like ribbons, and included strange carvings and artifacts. This week I decided to try a variety of ways to improve this process, one of which was to combine the two.

Week 17’s process relied mainly on 2D math to carve and define the 3D model prior to constructing it. Last week’s process worked mainly in 3D to perform the steps and compute the model result from visible voxels. The 2D path lacked the ability to effectively occlude rays or work with unseen edges. The 3D path could handle those, though its reconstruction logic led to either puffy or ribbon-cut results, often with artifacts.

I was able to apply concepts from both processes together in a new hybrid pipeline. This allowed for the best of both worlds. The result was smooth, with correct normals for backface culling; holes were filled, walls were smooth, and ribbon cuts were nowhere to be found.

This made for much more correct 3D models, though initially at a cost. The previous implementation had performed some sequences quite slowly. With review and revision, I was able to increase the performance to a reasonable level while maintaining quality.
Some improvements still need to be made. Some floating blobs have appeared, and doorways at times can be covered by the hole-filling step. These results may hopefully be improved in future iterations, and at this time they represent much more manageable artifacts compared to the ribbon cuts and jagged lines found in the previous week.

Bonus: Mask Comparison Tool

[Figures: CVAT Tool; Mask Comparison]

Up until now I have manually reviewed changes to all of the data outputs. This means I have skimmed images to determine if masks are appearing where I expect them. So far this has been done as needed, and it was fast enough for development. As the datasets I am using continue to grow, so does the opportunity to regress or fail certain steps in the process. It’s unreasonable to expect myself to manually review hundreds of images after each attempted change. So I devised tools to assist with this process.

The first tool is for mask comparison. The masking step is currently very important. It identifies things like sky, glass, and open doorways. It is imperfect, and as I change things, sometimes it catches more doorways while also forgetting others. So that the results do not regress, I must check that they improve across the board.

To do this I set up the mask comparison tool. This tool takes the annotated data for all opening masks and compares it to the ground truth data. Ground truth data is the data I expect of the mask. It is data I had to manually create. Using an interesting annotation tool called CVAT, I skimmed through hundreds of images and updated the AI-generated annotations to represent what I expected them to be for each opening.

Now I can compare future runs of the process against this data to see if we are getting closer to or further from the expectation. I can also compare against previous results to see whether a new change has improved things overall, and whether some positions have gotten worse.
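A comparison like this can be sketched with a simple overlap score. The example below uses intersection over union (IoU), which is my assumption for illustration; the post does not say which metric the tool uses. Masks are flat lists of 0/1 values, and the function and key names are hypothetical.

```python
def iou(predicted, truth):
    """Intersection over union of two equal-length 0/1 masks.

    1.0 means the masks agree exactly; 0.0 means no overlap at all.
    Two empty masks count as a perfect match.
    """
    intersection = sum(1 for p, t in zip(predicted, truth) if p and t)
    union = sum(1 for p, t in zip(predicted, truth) if p or t)
    return 1.0 if union == 0 else intersection / union

def compare_runs(truth, previous, current):
    """Score two runs against ground truth, image by image.

    Each argument maps an image id to its mask. The report flags any
    image whose score dropped between the previous and current run.
    """
    report = {}
    for image_id, truth_mask in truth.items():
        before = iou(previous[image_id], truth_mask)
        after = iou(current[image_id], truth_mask)
        report[image_id] = {
            "before": before,
            "after": after,
            "regressed": after < before,
        }
    return report
```

Reporting per image, rather than one averaged number, is what makes it possible to see that a change helped overall while a few positions got worse.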
Bonus: Match Comparison Tool

[Figures: Match Comparison; Phase Comparison]

Comparing matches is another important step in the process. Matching itself has three sub-steps: coarse, fine, and spatial matching. Each step has a variety of conditions that can change its output, and changing one can affect the later steps. Being able to see the accuracy of a run compared to the ground truth helps me identify where things need improving, as well as where things may have gotten worse over multiple runs.

Both of these tools still require some manual effort to receive the benefit, though far less than previously required. Further automating this process to include overnight jobs could be beneficial for testing various datasets as a whole.

Summary

This week’s results include a huge milestone. The workflow can make a high-resemblance 3D model for multiple datasets with only equirect images. To better research and develop further improvements to accuracy, and to curb hallucinations, I began creating testing tools to monitor the accuracy of the masks and match results. These should help preserve the current quality of output while I make changes in order to improve it.


Week 20 2026: 3D Reconstruction


Accomplishments: Defined Requirements, Drafted Reconstruction Plan, Reconstruction – Second Iteration.

At this point we have taken 360 images, accurately graphed them, and estimated the 3D structure as point clouds, or a matrix of floating points in 3D. Each camera has its own point cloud, and these point clouds overlap. To take these floating clouds of points and make a solid room, we must identify these overlaps, and other irregularities, in order to make the structure as realistic as possible. I reviewed the current state of the point clouds and the issues that came up, and began reviewing tools and techniques. Through this I drafted a plan to use these tools to solve the problems currently faced, and began to implement it.

Defined Requirements

Every problem comes with a goal. The better the goal is defined, the more likely we are to achieve it. The goal for the reconstruction step starts off simple: “Convert overlapping 3D point clouds into single 3D mesh”. In practice, it grows much more complicated than that.

You might recall that I had previously designed a reconstruction phase in Week 17. These fused point clouds looked great from many directions, but not all. A variety of issues had appeared. One phase used cube-mapping, which ripped seams through entire rooms. Another stage attempted to vote on geometric confidence, and at times would delete information on the other side of a wall or retain hallucinated geometry in the wrong spaces. I took note of these and other issues and began fresh in assessing what a viable reconstruction pipeline would look like and need to perform.

I also took the time to identify concrete examples of issues already found in the point clouds. Naming these issues and identifying their locations will give us valuable tests we can perform to ensure the pipeline is performing to our expectations.
The issues to be tested include the following:

Hallucinated Depths: The current depth estimation approach struggles within doorways, and the information stretched into the other room is often false. [Figures: Hallucinated Depth; Hallucinated Depth]

Flat Doorways: Some doorways have no depth at all and appear flat along the wall. A virtual tour needs to see through these gaps in order to display hotspots. These flat surfaces must be opened. [Figures: Flat Wall; Flat Wall]

Sharp Edges With No Alternative: Cameras can only see information from their perspective. So flat walls or sides of cabinets not visible to the camera do not initially receive depth data and appear as holes in the geometry. If no other camera sees that surface, we must make a best guess at what that surface could look like so that we don’t have surprise holes everywhere. [Figures: Sharp Edge Cloud; Sharp Edge 2D]

Sharp Edges With Alternative: If one camera sees a sharp edge and is unsure what’s behind it, another camera might see what is actually behind the edge. In cases like this, the geometry from the camera that can see it should be merged with the other camera’s. [Figures: Edge Hole; Replacement Geometry]

Misplaced Surfaces – Far Apart: Depth accuracy is only consistent until about 2.5m with the current tool. After that, depths may band or stretch. In a large room this can cause depths to appear in the wrong place or beyond a wall. We must identify where those walls are and either merge or remove the wrongly positioned data. [Figures: Extended Wall; Occluding Wall]

Misplaced Surfaces – Close Together: Even nearby cameras can estimate depths at slightly different positions. The same wall may appear at different positions, all within a foot of each other. This geometry should be identified as representing the same thing and merged into one single wall. [Figures: Overlapping Wall; Overlapping Wall]

These tests, among other things, represent the challenges faced in converting estimated depth maps into accurate 3D objects.
Drafted Reconstruction Plan

With the requirements outlined, along with the information attained through previous phases in the process, we can begin drafting our plan. This week’s process consisted of over 10 steps. I will summarize the concepts here.

The majority of the time, doorways provide inaccurate information. They are the most important part of an image for positioning; however, the depth is incredibly misplaced. I determined that removing it altogether made the most sense. This meant that no geometry through doorways would appear or interfere with better geometry on the other side.

It was also important to ensure that cameras do not provide empty votes to geometry on the other side of doorways. When voting, a line is drawn from the camera to the cubic unit of space (foot, meter, centimeter, etc.) that is being voted on. Usually this stops at the camera’s own geometry; however, if we remove the geometry through a doorway, it could go on infinitely! To avoid that, I added a step to create an invisible wall right in the opening of a mesh that blocks the line, or ray, when it reaches it. This ensures negative voting remains within the bounds of the known good geometry at all times. [Figure: Doorway Stop]

Edges, as previously mentioned, often result in empty space or missing geometry. I added a step that stretches out a flat surface between all edges, like a curtain over a window. These points weakly cover the flat space. They will remain if nothing else determines them to be inaccurate. [Figure: Close Camera]

Identifying occluding walls was another important step. When a far camera tries to draw its wall on the other side of a much closer camera’s wall, that closer camera should block the ray, or line, from going through it. This also applies to empty space voting, so that the far camera cannot remove geometry it should not be able to see in the first place.

Getting surface normals remains an important step. This labels the direction that each point is facing.
Ceilings face downward, floors face up, some things may face at an angle. This will help us identify two sides of a wall when they overlap, as ideally they


Week 19 2026: Positioning


Accomplishments: Rethink Matching and Alignment, New Spatial Awareness Checks, Doorway Mask Improvement, Doorway Side Check, Test New Depth Estimator. Bonus: Frontend Updates.

Breaking down problems into smaller problems is often the right call, though sometimes even this can have unintended consequences. I had broken down the problem of finding camera positions into two steps: matching cameras, and aligning them together. With non-determinism and a lack of data, these steps alone could not guarantee a consistently correct outcome. I returned to look at the bigger picture and saw that positioning the cameras would benefit from a conceptual shift. Rather than matching and then aligning, we focus on the original problem, “posing”, and with the combined data can determine better, more consistent outcomes. This conceptual shift helped me identify three new spatial awareness checks, led me to review new AI models for depth and mask detection, and led me to improve the doorway identification system. All these changes necessitated frontend updates for compatibility, and while performing these I also added new tools for visualizations.

Rethink Matching and Alignment

[Figure: Through Doorway Matches]

Let’s review the data we have available. To start, we begin with a sparse and unordered group of 360 equirectangular images. Some rooms of the property may only be covered by one image through a narrow doorway. We derive depth through an AI tool, and attempt to identify doorways with another. Both of these values vary in accuracy. We feed the images and masks through two image matchers. The coarse matcher is like a big rake: it swiftly removes the majority of very different photos and narrows them down to the ones that look most similar. The fine matcher is like a much smaller comb that takes longer and removes the most unalike images. Up until now, we determined camera connections solely on the heuristic of how “alike” two camera scenes were.

Here is where I needed to step back.
Image matchers match images by likeness. That means two different bedrooms will score higher than a bedroom and a restaurant. The fine matching does a great job at determining if it’s the same bedroom, and that is why it works so well for matching rooms. It can match between rooms well too. It can tell every pixel that exists within the doorway. However, even with all the pixels in a doorway, that may only be 15 percent of a scene, and two rooms with similar paint and tile may score at a similar rate. Two similar looking rooms may very well be a better match than two different rooms connected by a door, as that is what matching is for: similarity, not structure.

So I looked to clearly define what I want. I wanted matching, because similarity is an important heuristic that tells me the likelihood that two images are related. And I wanted those matches in their correct positions in 3D space. They couldn’t just be similar; they had to be an extension of the same space.

So I decided to collapse the separate steps of “match then align” and rework them under the umbrella of “positioning”. Here the steps work towards the same goal: finding the proper position for each image. Matching still runs, but it no longer determines the final result. The non-deterministic heuristic of match confidence is now constrained to act as a gate, or threshold. Two highly confident matches will no longer compete to be the primary based on visual likeness; instead, two visually alike images will proceed to the new stage, spatial awareness.

Spatial awareness matches images beyond their aesthetics. It uses the accurate area of estimated depth to align the two images. Based on this alignment it determines four things: camera proximity, line of sight, distance, and spatial agreement. Based on these, we cull spatially inconsistent matches and determine the most spatially similar and close camera positions to be the final topology.
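The "gate, then rank spatially" idea can be sketched as follows. Everything specific here is an assumption for illustration: the `Candidate` fields, the three thresholds, and the exact ranking order are invented for the example and are not the values or rules used in the real pipeline.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    pair: tuple            # (image_a, image_b)
    confidence: float      # visual match confidence, 0..1
    distance_m: float      # estimated distance between the two cameras
    line_of_sight: bool    # can the cameras see each other after alignment
    agreement: float       # how well the aligned point clouds agree, 0..1

def choose_match(candidates, min_confidence=0.5,
                 min_distance_m=0.1, min_agreement=0.6):
    """Pick one candidate using confidence only as a gate.

    All thresholds are hypothetical. Visual confidence decides who is
    allowed in, but never who wins.
    """
    gated = [c for c in candidates if c.confidence >= min_confidence]
    # Cull spatially inconsistent matches: cameras stacked on top of each
    # other, or point clouds that disagree once aligned.
    consistent = [c for c in gated
                  if c.distance_m >= min_distance_m
                  and c.agreement >= min_agreement]
    if not consistent:
        return None
    # Prefer line of sight, then higher agreement, then the closer camera.
    return min(consistent,
               key=lambda c: (not c.line_of_sight, -c.agreement, c.distance_m))
```

With this shape, a look-alike room that scores very high on visual confidence but aligns poorly is culled, while a less similar but spatially consistent neighbor is selected.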
This change has made outputs more consistent and accurate across every dataset.

Spatial Awareness Checks

[Figures: Far Camera; Close Camera]

Determining a camera’s pose requires that the images be similar and that the spaces they contain agree. A variety of values are used to determine overall spatial agreement, including camera proximity, line of sight, distance, and point cloud consistency.

Camera proximity and line of sight do a great job of removing bad matches quickly. Once two point clouds are aligned, the process checks each camera’s expected position. Cameras that are in nearly the same position are often missed copies or bad alignments. The algorithms currently in use often find no translation for bad matches. This is a big indicator of connections that are misleading and should be removed.

Camera line of sight is another great step in removing bad connections. Say you have two images around an L-shaped corner. They are very similar, and have many matching points at the joint of the hallway. However, the cameras themselves cannot see each other; a wall lies between them. This puts the match on hold. The desired use case, virtual tours, should often include line of sight photography for the user to virtually “walk” through the space and view the next room’s hotspot in 3D. This check promotes cameras that can see each other, and swiftly removes outliers where two distinctly different rooms become aligned and the cameras cannot connect at all. Images without line of sight remain as a fallback if no good matches are found. This can be useful in some instances where opened doors always block one entrance or the other.

Camera distance tells us how far apart the process thinks the cameras are. Once aligned, we can estimate this. This helps us build a graph with connections that are closer and more likely to follow the path of the photographer throughout the shoot. This also helps us manage the inaccuracies of the depth estimation.
The further from the camera, the less accurate the 3D point cloud. So closer cameras preserve higher-accuracy point clouds throughout the 3D scene. Spatial agreement, or point cloud


Week 18 2026: Speed & UI


Accomplishments: Caching, Smart Bind Mounts, Project Based Processing, Improved Coarse Matching. Bonus: UI Design, UI Implementation, Manual Adjustments.

The time it takes to test a new feature can be substantial. In previous weeks we’ve seen single runs of a service take over 17 minutes, and get brought down to under 2. This week I looked for more opportunities to reduce the time between attempts and completion. Through this process I was able to save substantial build time through caching and clever Docker ordering. I was able to save time between datasets by preserving previous runs as projects. And during this search I was able to find an alternative tool that performed even better with my dataset in the same time. With improved development speeds across the board, I was able to design, implement, and validate a user interface (UI) for this tool all within the same week. This UI further improves my ability to test features through project management and monitoring. What weeks ago took an hour of manual copy, paste, and wait now takes just a few clicks and moments.

Caching

[Figures: Without Cache; With Cache]

Caching allows us to take work that was completed once before and use it again. Our phones save our passwords, and save us time logging into sites. Imagine if instead we had to fill out a captcha and come up with a new password every time. It’s better just to remember it, or let our phones do the work. I did this by caching the compiled Torch, and by changing the order in which Docker copies the large (3+ GB) AI models used in this project.

Torch is a tool used in operating AI models. This tool is written generically, and can be applied across many different hardware devices. Often when it’s applied, it gets compiled. This means that the generic tool is broken down and translated to run a specific way for that hardware. This tool is multiple gigabytes (GB), so compiling it every time for the same hardware is very wasteful. I set up a cache and check system.
After compilation it saves the information. The next time the container is built, it checks if there is a saved version and, if so, whether it works for the current hardware. If yes, it loads that instead of compiling again, saving seconds to minutes depending on the build.

Another trick I learned relates to the way Docker “layers” an image. Docker takes a variety of information and prepares it in a way that it can be run on any hardware using Docker. In this process it stacks layers of information on top of each other. When building a Docker image, it checks the previous image to see what layers changed. Once it finds the lowest layer that changed, it rebuilds that layer and everything above it. Up until now, my large AI models appeared in the later Docker layers. This means that just changing a single word in a log statement could take minutes to rebuild and test! To fix this, I moved the AI models up to earlier layers in the build.

Smart Bind Mounts

A smaller trick I employed is something called a “bind mount”. This hooks an existing Docker container directly into the repository open in my code editor. This allows certain files to be immediately available to the container upon saving, and further improves speed.

Project Based Processing

A notable quality of life improvement for this week is certainly project based processing. Previously I would only work with one dataset at a time, running it through the long processes and verifying the output. Afterwards I would need to erase all of the data and revert back to square one in order to run a different dataset. This week that all changed. With a major refactor of every service, I was able to store and retrieve data within subdirectories linked to the specific project. This keeps data separate, and allows me to pick and choose what dataset I want to work with without removing any data. This preserves each unit of work completed up to this point per project.
Keeping each project’s data separate also allows for more consistent results, as small changes between runs can be difficult to pinpoint when all previous steps had to be repeated.

Improved Coarse Matching

[Figures: Weight Comparison; Time To Complete]

While searching for alternative options to reduce the run time of the slowest service (matching), I came across the LoMa tool, an incredibly recent tool aimed at providing high quality matching across images at a speed comparable to LightGlue. This alone enticed me to try it. With a quick implementation I was able to test a LightGlue replacement, and to my surprise saw confidence levels during the coarse phase drastically improve. Close rooms appeared closer, and far rooms appeared further. This gave me confidence that it would improve the quality of top matches being fed into the fine matching stage. The times appeared nearly identical, so I chose to swap in this new tool, which appears to better fit the current use case.

Note: Non-determinism

While testing coarse matching I began to notice something interesting in the matching process as a whole. It appeared inconsistent. Run to run I would notice slight changes to the confidence scores. Previously I had believed this to be a result of my changes, though now it appeared to happen even without my intervention. I noticed that quite a few steps during this phase seemed to produce random, non-deterministic results. Non-determinism means that the same inputs do not always produce the same outputs. The results vary, and may deviate slightly with each run. This quality occurred throughout the process. Spherical RANSAC was changing slightly, the distribution of points had the potential to, and most of all the LoMa and RoMa passes. I found solutions to achieve determinism within these steps. Some of them, like the ones in RANSAC, I applied. However, the changes necessary to add determinism to LoMa and RoMa drastically reduce the speed. I contemplated if the speed adjustment was worth it for


Week 17 2026: Match Speed


Accomplishments: Fine Matching Speed Reduced By 90%. Bonus: Improved Pose Graph Accuracy, Normal Estimation, Depth Confidence Voting, Fused Point Cloud.

As the quantity and capacity of datasets grow, so does the time it takes to test them all. Testing under so many conditions can be cumbersome, and even more so when one of the steps takes over 15 minutes to complete. This week I set out with the main goal of increasing productivity by improving the fine matching speed as best I could. I was able to reduce its run time by over 90%, while also having enough time to further improve the pose graph and begin implementing the following meshing step.

Fine Matching Speed Reduced By 90%

[Figures: Slow Speed; High Speed]

By far the longest step in the pipeline is currently the fine matching process. Multiple factors make this process so time consuming. We discussed many of these in Week 14. We solved the N-Choose-2 comparison problem by using the lightning fast LightGlue as an early coarse refinement step, narrowing down what might need to be compared. We also ran a Spherical RANSAC operation (whose run time, at the time, was also reduced by 90%) on the matches to further confirm them. And then we progressed into a time intensive fine matching stage with RoMa V2. RoMa V2 provides further confidence that the images are actually matches, which matches are the strongest, and where in the image they match. It is very time intensive, and through a variety of steps we were able to reduce this time by 90% as well.

First, we were able to introduce threading. Some tasks needed to be handled by the CPU, and some by the GPU. Previously, the GPU was stopping while the CPU completed its tasks, leaving time on the table. Now the GPU performs all its tasks, handing off the information as it goes for the CPU to pick up, so the two work in tandem.
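The hand-off between the GPU and CPU work described above can be sketched with a queue and a worker thread. This is a minimal Python sketch under assumptions: `gpu_stage` and `cpu_stage` are stand-in callables invented for the example (not RoMa V2 functions), and the queue size of 8 is arbitrary.

```python
import queue
import threading


def run_pipeline(pairs, gpu_stage, cpu_stage, max_pending=8):
    """Overlap a GPU stage and a CPU stage through a bounded hand-off queue.

    gpu_stage and cpu_stage are hypothetical stand-ins for the real work.
    """
    handoff = queue.Queue(maxsize=max_pending)  # bounded so results do not pile up
    results = []
    done = object()  # sentinel telling the CPU worker to stop

    def cpu_worker():
        while True:
            item = handoff.get()
            if item is done:
                break
            results.append(cpu_stage(item))

    worker = threading.Thread(target=cpu_worker)
    worker.start()
    for pair in pairs:
        # The main thread keeps the GPU stage busy and hands results off as it goes.
        handoff.put(gpu_stage(pair))
    handoff.put(done)
    worker.join()
    return results
```

Because there is a single consumer reading a first-in, first-out queue, results come back in the same order as the input pairs. How much real overlap this gives depends on the GPU library releasing Python's global interpreter lock while it works.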
We also receive a huge time improvement from using fuse-local-corr, a feature provided by RoMa’s developer that was made specifically to run faster on certain hardware. We also added a change to compile the libraries to work with our specific hardware. This runs on startup and takes around 2 minutes, but once performed, each subsequent run performs much faster! And lastly, we tweaked the way we distributed our matches across tiles to be more efficient. Through these changes, RoMa V2’s time was brought down significantly. It went from taking over 17 minutes to just over 80 seconds. This greatly improves our ability to test matches and later improvements, while also increasing efficiency as datasets continue to grow.

Bonus: Improved Pose Graph Accuracy

[Figures: Floor Two Update; Floor One Update; Rotation Topology]

Last week we tested 4 datasets and only half of them appeared very accurate. After reviewing the results I was able to identify a variety of improvements to make. One is opening match rescue. In some cases openings from the coarse matching have very low confidence and do not enter the second round of matching. Now, regardless of confidence, if a certain fraction of the matching points appear within an opening we promote the pair to the following round. This is purely additive, so that this low confidence candidate does not take the place of a higher confidence one in the limited number of positions being picked for round two. Instead, the number of items for round two expands. Now our doorways are preserved through coarse matching to be confirmed in fine matching.

I also made some improvements to the distance calculation formula. Some matches had still been winning competitions even when more of their points were further away. To combat this I added a maximum limit to the point depth. This means that extremely far points should not contribute to the confidence of the matched images.
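The exact distance formula is not given here, so the following is only a minimal Python sketch of the idea of a depth limit. It assumes the pair score is a plain sum of per-point confidences, and the 8.0 default for `max_depth` is a placeholder value, not the one used in the pipeline.

```python
import numpy as np


def depth_capped_score(confidences, depths, max_depth=8.0):
    """Sum match confidence using only points at or below max_depth.

    Summing confidences is an assumption for this sketch; the threshold is a placeholder.
    """
    confidences = np.asarray(confidences, dtype=float)
    depths = np.asarray(depths, dtype=float)
    near = depths <= max_depth  # far points are dropped from the score entirely
    return float(confidences[near].sum())
```

Under this scoring, a pair with a handful of nearby matches can outrank a pair whose matches are numerous but all far away.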
Skyscrapers, or fixtures beyond a long corridor, should not outweigh the features nearby, even if there are many more of them.

Bonus: Normal Estimation

[Figure: Normal Map]

With so many kinds of estimation (Depth, Pose Graph, Keypoint, etc.) it’s nice to finally hear about a “Normal” one. To those unfamiliar with 3D graphics the concept of ‘normals’ may sound anything but. Normals are a thing, and every point on a surface in a 3D scene has one. You can imagine a normal as a little toothpick sticking out of the point in the direction the surface most ‘flatly’ faces. For example, on the floor beneath your feet, the normals point straight up. On the ceiling, they point straight down. Around your glass of water, they point outward in every direction. To reconstruct a 3D scene every point in our point clouds must have them too. Thankfully this is a common problem, and calculations exist to help in solving for the normals based on a depthmap and the camera’s position. Now the pipeline has accurate normals for the generated 3D depthmaps, to be used in later steps.

Bonus: Depth Confidence Voting

[Figures: Individual Maps; Confidence Voting Results]

So far we have our matches, our pose graph, and our depth maps. When posed, most of the point clouds overlap. There are overlapping walls. There are false positives, where edges stretch out or small plants or items end up in different positions. Far away data like high ceilings or distant rooms may be inconsistent across a few point clouds. There are even so many loose points that you can’t fly through the aligned space to admire it visually! What can we do to reduce the quantity of false positive data across our depthmaps? We can vote. Since each of the cameras knows where it is in relation to the others, and each has its 2D depthmap available with the distance to each point, we can use these to construct a 3D matrix and compare across images what should be where. This way, if 3 cameras next to the bed all say the bed is in the center of the room,


Week 16 2026: Alignment Improvements


Accomplishments: Depth Aware Alignment, Precise Yaw Calculation, Pose Graph Estimation. Bonus: Improved Opening Detection, Pose Graph Accuracy Check.

Continuing the effort of refining the camera’s determined positions, I considered what further improvements could be made to the alignment process. The current pose graph and topology continue to improve yet remain imperfect. I considered the data I have available and have confidence in. I have nearby depth data, highly accurate matching range, and accurate image segmentation. With this, I improved matching and topology by using depth to adjust match confidence. I also improved yaw rotation calculation with an epipolar calculation. And lastly I improved opening mask identification by combining offset images.

Depth Aware Alignment

[Figures: No Ceiling Floor; Ceiling and Floor; Pre Depth Topology; Post Depth Topology; Pre Depth Alignment; Post Depth Alignment]

At first glance the topology may look incredible; however, under closer inspection issues appear. I noticed that nodes like Main Entrance 3 and Main Room 7 were connected. These are great matches about 10 feet apart. However, an even better match for the main entrance would be Main Room 2, which is about half the distance. With perfect metric depths at any distance this likely wouldn’t cause issues. However, the current depth estimation tool worsens drastically as distance grows. This means that matching points on the far end of either node may be 15-20 feet apart, and the points between them are more likely to reach the edge of the estimation’s accuracy. The closer the matches, the more likely the depths are to be accurate.

I determined the best approach would be a two step process. The first step is to run an additional match check on the ceiling and floor of each image. This is likely to further increase the number of nearby matches for each image and give us better grounding during alignment. The next step is to construct the topology based on the distance of each match.
If one image has two pairs with high visual overlap, and one pair is much closer in distance than the other, the closer match should prevail. With these additions, the topology improved, nearby matches became preferred, and alignment improved as well with the new floor and ceiling anchoring.

Precise Yaw Calculation

[Figures: Rotation Visualization; Rotation Topology]

Up until now alignment has been performed by a 9 degree of freedom (DOF) solver, constrained to 4 degrees of freedom (rotation: yaw, translation: x y z, scale: none). I believe that the data collected by the 360 camera, along with the confident matches identified, could accurately handle yaw rotation separately from translation. This decouples the rotation from the noisy depthmap data used in the previous 4-DOF alignment solving process. To achieve this I implemented an epipolar calculation process. This process is similar to the doorway recovery process described in Week 15. It takes both equirectangular images and projects them onto a sphere. The matched points are used to cast rays, or lines, out from the center of each sphere. The spheres are then rotated to determine the best rotation where both sets of rays intersect most accurately. This gives a highly accurate rotation, which can be seen in the improvements to the pose graph.

Pose Graph Optimization

While the yaw has grown incredibly accurate there is still potential for slight rotation inaccuracies to compound and drift over time. In an attempt to overcome this, I have implemented a step for Pose Graph Optimization. This takes the estimated rotations amongst the high confidence connections in the topology and adjusts all rotations to minimize the overall drift.

Bonus: Improved Opening Detection

[Figures: Door Mask; Door Edge; Doorway Mistake; Door Repaired]

I had noticed there were instances where images on either side of a doorway were not being connected in the minimum spanning tree (MST). Instead similar rooms were beating the doorways in the confidence rankings.
This could be two bedrooms with the same sliding screen doors and tile floors, or it could be two balconies that see the same building in the distance. I began reviewing the matches and identified a variety of causes that could lead to this behavior.

The first and most obvious cause was improper opening masking. For one equirectangular image the doorframe was split, half on the left edge and half on the right. The image recognition tool I use is not 360 aware and does not understand that what ends on one edge continues on the other side. Because of this it could not accurately identify that a single doorway spanned both edges of the image. I was able to solve this by performing an additional mask step. I offset the image, rotating it 180 degrees so that the door is in the center, and run the mask again. Then I combine the masks to get the full picture.

Then I noticed a second masking issue. Some openings were being confused for glass! This was probably due to the glossy, well kept tile floor and the cleanliness of the space. I tested a variety of solutions with varying complexity. I learned that the image recognition model outputs individual object masks and confidence values. That means that each blob of glass gets its own level of confidence. I considered comparing confidences until I noticed that the doorways received far less confidence than the glass in this instance. Perhaps there is a more complex and interesting solution to be found with more effort, though at this step I pivoted, deciding on the simpler solution of excluding glass from openings. Previously, glass masks took precedence over opening masks, removing features. Now, matches are preserved if they exist in an opening, and are still compared in the later processes.

Some adjustments were made to the distance calculations as well. In Week 15 we implemented the first attempt at depth aware matching and saw improvements for the interior of a space.
This week we improved upon this implementation by drawing a hard cutoff for matches after a certain distance. This


Week 15 2026: Camera Positioning


Accomplishments:
Depthmap improvements: Removed Sky, Flattened Glass, Metric Depth Estimation.
Alignment improvements: Distributed Sample of Matched Points, Restricted Alignment Vectors (Yaw only, no scale), Doorway Recovery, Sharp Edge Recovery.

Each step of the process relies on the accuracy of the data prepared by each of the previous steps. To most accurately determine the position of each camera, we must confirm that the matches are accurate in 3D space, and that they fit together just like they would in the real world. This week, we were able to greatly improve our alignment accuracy by using available data to refine the depthmaps, identify low-accuracy depth matches, and recover with an alternative alignment process.

Depthmap Improvements

Sky Removal

[Figures: EXR; Sky Removal]

This simple step removes all points related to the sky for a given depthmap. This reduces the visual clutter when building the process while also simplifying the process of cleaning up the point clouds down the line. The empty void of the atmosphere has no accurate depth, and therefore has no value for these stages of the process.

Flattened Glass

[Figures: Glass Mask; Hole In Glass; Flattened Glass]

Clear glass in architecture is often used to see beyond a point. When looking at glass windows or walls in 3D it’s common to expect to see the solid structure, rather than sporadic holes in the space. I added a procedure to flatten the points identified as glass in the output of the depthmapping tool. The depthmapping tool itself did fairly well at identifying tinted glass as a flat object. I noticed that in some cases the glass would be ignored and the depth would be generated through the glass. This inconsistency led to surprising holes in the wall. Glass can also cause slight refraction, or a bend in the light going through it. This caused the holes to further corrupt the depth estimation with inaccurate, refracted depth values. Flattening the glass worked well and even improved the overall shape of the room in some cases.
This improves the accuracy of the individual point cloud and should reduce the time spent during some future cleanup phase.

Metric Depth Estimation

[Figures: Side By Side Alignment Comparison; Top Down Alignment Comparison; Kitchen Misalignment; Flat Floor Point Cloud 1; Flat Floor Point Cloud 2; Real World Step]

In Week 12’s report I discussed the tradeoffs between two tools used for depth estimation. One offered metric accuracy most of the time but had warped structure; the other had accurate structure but its metric accuracy varied widely. I moved forward with the second tool for its improved geometric accuracy, believing that someday metric accuracy could be applied with a scale adjustment. Today was that day.

While ideating ways to improve my image alignments I reflected on what intrinsic data is available from the included photos. Upon this reflection I realized there were two important values I had taken for granted: the camera’s height, and the camera being level with the horizon (more on that in a following section). During shooting the camera remains at about eye level on a stand. This stand maintains roughly the same height throughout the capture process. Because the depthmaps tend to be accurate for the floor immediately below the camera, we can scale this distance to match the height of the camera’s stand. This drastically improved the consistency across depthmaps.

While the metric depth estimation drastically improves the process, some challenges still persist. The current stand in use is adjustable. That means that over time, from small changes and gravity, the camera may accumulate centimeters of change in height. Also, slight angles like those of a hill or slope may lead to improper ground estimation and a larger inaccuracy in scale. Both of these can be solved with real-world adjustments to the stand and capture process. A fixed height stand with a tilt-able base could keep the camera level and over the pole while fixing its height to the ground with great accuracy.
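The camera-height scaling described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the depthmap is equirectangular and stores distance from the camera, the bottom rows look almost straight down at a flat floor, and the function name, the `nadir_rows` parameter, and the use of a median are choices made for the example.

```python
import numpy as np


def scale_to_camera_height(depth, camera_height_m, nadir_rows=8):
    """Rescale a depthmap so the floor directly under the camera sits at the stand height.

    Assumes the last rows of an equirectangular depthmap look almost straight down.
    """
    depth = np.asarray(depth, dtype=float)
    floor_patch = depth[-nadir_rows:, :]
    valid = floor_patch[np.isfinite(floor_patch) & (floor_patch > 0)]
    if valid.size == 0:
        raise ValueError("no valid depth samples below the camera")
    estimated_height = float(np.median(valid))  # median resists a few bad pixels
    scale = camera_height_m / estimated_height
    return depth * scale, scale
```

For example, if the depthmap puts the floor 0.8 units below the camera and the stand is 1.6 m tall, every depth in the map is multiplied by 2. A stand that drifts by a few centimeters would shift every scaled depth by the same proportion, which is why the fixed-height stand matters.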
Another challenge this scale does not yet solve is that of incorrect depth over slight changes in elevation. For example, in the related images there exists a ground truth elevation change between the hardwood floor and the kitchen area. The depthmaps for both the kitchen and the hardwood floor ignored this change and hallucinated a level area. Because of this, the floor for the kitchen appears confused, with two estimates, one from each camera. This may be something that can be rectified during a point cloud cleanup phase, though if left unchecked it has the potential to accumulate scale drift over time.

The metric depth estimation greatly improves the accuracy of depthmaps while drastically reducing the opportunity for, and magnitude of, scale drift. All identified challenges have potential solutions to be implemented later.

Alignment Improvements

Distributed Matches

[Figures: Sparse Keypoints; Real Pixel Density; Sparse Alignment; Distributed Alignment; Sparse Points; Distributed Points]

This week began with only the highest confidence matches being used to align two point clouds. Sometimes the highest confidence matches group themselves in small areas. This makes alignment difficult. Think of it like a door. With hinges it has 1 or 2 matches on the same vertical line. A door still connects at those points whether it’s open or closed. Now think of a deadbolt, or a lock. When the door must be locked AND on its hinges, it only has one position. Instead of taking only the highest confidence matches, I decided to use a distribution of high confidence matches. In most cases this spread the matches across a wider area. While the individual matches may be less perfect, the overall structure will be more accurate. For example we can look at the secondary entrance and the shoe closet. The sparse, highest confidence keypoints cluster in a few small straight lines. Then we look at the dense matching. Here the points spread along a much wider field of view, almost the whole horizon.
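One way to picture a distributed selection is to split the panorama into vertical strips and keep the best matches from each strip. The Python sketch below is an illustration under assumptions: binning by horizontal pixel position only, the bin count, and the name `distributed_top_matches` are all choices made for this example rather than the selection rule actually used.

```python
import numpy as np


def distributed_top_matches(x_pixels, confidences, image_width, bins=36, per_bin=2):
    """Return indices of the highest-confidence matches from each horizontal strip.

    Binning by x position alone is an assumption made for this sketch.
    """
    x_pixels = np.asarray(x_pixels)
    confidences = np.asarray(confidences, dtype=float)
    bin_ids = np.minimum((x_pixels * bins // image_width).astype(int), bins - 1)
    selected = []
    for b in np.unique(bin_ids):
        members = np.flatnonzero(bin_ids == b)
        order = np.argsort(confidences[members])[::-1]  # best first within the strip
        selected.extend(members[order[:per_bin]].tolist())
    return sorted(selected)
```

With matches at x = 0, 1, 2, 3, 500, 900 in a 1000 pixel wide image, a plain top 3 by confidence could pick three points from the same cluster on the left edge, while one pick per strip keeps a point from each of the three areas.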
By transitioning from sparse, highest confidence, keypoints to a distributed selection of high confidence keypoints I greatly improved the degree of accuracy when aligning two point


Week 14 2026: Match Filtering


Accomplishments: Improved Matching Efficiency, Improved Matching Quality, Minimum Spanning Tree With Maximum Confidence, Feature Masking.

Manually curating a few known-good images to be matched and processed is quick and relatively immune to false positives. However, applying that same manual process to 50+ images grinds workflows to a halt and quickly becomes strewn with incorrect matches. Without GPS or sequential metadata, chaotic image datasets suffer from the quadratic growth of any-to-any (n-choose-2) comparisons. This is especially painful for CPU-bound processes. Furthermore, this expanded dataset compounds the likelihood of introducing false positives that can slip through current feature-matching algorithms. To effectively work with large datasets, we need to generate a minimum spanning tree where all images are connected, with high confidence regarding what each image matches to and exactly where those matches are located. Achieving this requires a multi-step approach: reducing CPU operations, iteratively filtering matches, and masking out corrupting environmental features like clouds and glass.

Improved Matching Efficiency

[Figures: AnyLoc Reduced Time; LightGlue Time; AnyLoc islands; n choose 2 graph]

The fundamental purpose of the matching stage is to look at every photo and compare it against every other photo to find the best connections. Historically, our pipeline handled this in three steps: ALIKED (extracting specific pixels as features), LightGlue (comparing extracted pixels across every image in an any-to-any configuration), and Spherical RANSAC (converting images to spheres to validate matches without distortion). The problem? The number of comparisons needed to match every photo to every other photo grows quadratically as the dataset scales. A small batch of just 52 photos requires over 1,300 unique comparisons (52 × 51 / 2 = 1,326). Our primary bottleneck was Spherical RANSAC, a heavily CPU-bound operation.
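The pair count quoted above can be checked with a short Python sketch. It uses only the standard library; the helper name `pair_count` is made up for this example.

```python
from itertools import combinations
from math import comb


def pair_count(n_images: int) -> int:
    """Number of unique image pairs in an any-to-any comparison (n choose 2)."""
    return comb(n_images, 2)


# Enumerating the pairs directly gives the same total as the formula.
pairs = list(combinations(range(52), 2))
print(pair_count(52), len(pairs))  # 1326 1326
print(pair_count(104))             # 5356
```

Doubling the dataset from 52 to 104 images roughly quadruples the work, from 1,326 to 5,356 pairs, which is why an early filter on pairs matters more as datasets grow.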
By switching Spherical RANSAC from a standard Python library to a just-in-time (JIT) compiled version, we were able to execute the math directly on the hardware rather than pushing it through a Python abstraction layer. This single optimization cut the operation time by over 90%, bringing a 20+ minute process for 52 images down to just 2 or 3 minutes.

We also explored using AnyLoc as a pre-check. AnyLoc is a “Global Descriptor”; instead of extracting individual features, it generalizes the image (e.g., turning a photo into a mathematical representation of “Inside, big bedroom, beige walls”). Because it runs entirely on the GPU, it is blazing fast, tearing through the 52 images in just 35 seconds. However, AnyLoc tends to group high-confidence matches by type rather than topology. It creates disconnected “islands” of similar rooms (like matching two separate bedrooms together) rather than finding the connective path (like a hallway leading to a bedroom). While AnyLoc is incredibly fast and worth considering for future iterations, its tendency to falsely combine distinct but similar rooms means it requires careful handling before full inclusion in the pipeline.

Ultimately, by removing the abstraction bottleneck of our CPU process, we successfully reduced our overall matching time by 95%.

Quality Improvement

[Figures: LightGlue False Positive; RANSAC False Positive; RoMa2 Doorway]

Speed is irrelevant if the feature matching yields incorrect results. The matches generated by LightGlue and Spherical RANSAC often include false positives. Two identical doors, two sides of a symmetrical feature, or even similar moldings and furniture in completely separate rooms can trick standard matchers into drawing a connection. To solve this, we looked to RoMa V2 (Robust Matching), a dense feature matcher. What sets RoMa apart is its ability to rank every pixel in the image by confidence and match them at extreme angles.
When we pass our challenging image pairs through RoMa V2, the false positive matches still appear, but they are flagged with remarkably low confidence scores and lack shared pixel density. This allows us to aggressively filter out the low-confidence noise while retaining dense, high-confidence pixel groups, even when looking through narrow doorways.

The trade-off for this accuracy is compute time. Dense matching requires dense math. RoMa V2 takes significantly longer than LightGlue: its run time grows with the number of pairs compared, and it takes over 5 minutes for a 52-image dataset. Because of this, RoMa V2 is not a standalone silver bullet. It cannot process the entire dataset in a reasonable timeframe on its own, meaning we still rely on LightGlue as an early-stage filter to narrow down the workload before RoMa V2 confirms the final matches.

Minimum Spanning Tree & Maximum Confidence Path

[Figures: AnyLoc Topology; LightGlue Topology; LightGlue + RoMa2]

If we don’t confirm that every point cloud has valid matches and belongs to a unified tree of connections, we end up with incorrect geometry and floating, disconnected islands. LightGlue is reasonably fast, but its confidence scores for a correct match through a doorway might be identical to the score of a false positive between two disconnected rooms with similar moldings. RoMa V2 is the slowest, but it is the most accurate and provides the highest confidence for tricky bridges like doorways.

To get the minimum spanning tree with the highest confidence paths as quickly as possible, we re-architected the workflow to play to the strengths of both tools. First, we use LightGlue to perform the any-to-any matching and rank the pairs by confidence, narrowing down the field of likely candidates. We then intentionally skip the long Spherical RANSAC step, as RoMa V2 will handle match quality confirmation. Next, we determine the top-ranked confidence values and begin building our topological tree.
If any disconnected islands or orphaned nodes exist, we expand the tree to include slightly lower confidence values until a continuous path connects everything. Finally, RoMa V2 processes this curated list of rankings. It confirms the confidence values, outputs high-confidence dense matching points for each pair, and validates the bridges through doorways. This dual-model approach yields a unified minimum spanning tree connected entirely by maximum-confidence paths.

Feature Masking

[Figures: Bathroom Masked Matches; Balcony Masked Matches; Main Room Masked Matches; Bathroom Mask; Balcony Mask; Main Room Mask]

Glass, mirrors, and the sky are notorious problem-causers in computer vision. To an algorithm, a mirror on a wall often looks like a doorway into a completely identical room. A glass window might not register as a solid surface, but rather a hole leading to

