Week Reviews

Progress summary for a specific week

Week 21 2026: 3D Model Improvements

Accomplishments: Improved 3D Model Quality. Bonus: Mask Comparison Tool, Match Comparison Tool.

A 3D visualization of an environment does not need to be perfect; it just needs to be believable. This week we took unbelievable cut objects, or meshes, with artifacts and sharp corners and made them more believable, with more identical features and smoothed surfaces, all with faster performance. In addition, I developed two new tools to help me test the bulk effects of my changes across a variety of datasets.

Improved 3D Model Quality

[Figures: Artifact Model, Flat Model, Hole Filled, Villa Result, Pattaya Floorplan, Pattaya Result, Hallucinations]

So far two strategies have been tested to take the aligned point clouds and reconstruct a 3D mesh. Week 17’s results were sharp, jagged, and missing large regions. Last week’s outputs were aliased, cut like ribbons, and included strange carvings and artifacts. This week I decided to try a variety of ways to improve this process, one of which was to combine the best of both worlds.

Week 17’s process relied mainly on 2D math to carve and define the 3D model prior to constructing it. Last week’s process worked mainly in 3D to perform the steps and compute the model result from visible voxels. The 2D path lacked the ability to effectively occlude rays or work with unseen edges. The 3D path could handle those, though its reconstruction logic led to either puffy or ribbon-cut results, often with artifacts.

I was able to apply concepts from both processes together in a new hybrid pipeline. The result was smooth, with correct normals for backface culling; holes were filled, walls were smooth, and ribbon cuts were nowhere to be found. This made for much more correct 3D models, though initially at a cost: the previous implementation had performed some sequences quite slowly. With review and revision, I was able to increase the performance to a reasonable level while maintaining quality.
Some improvements still need to be made. Some floating blobs have appeared, and doorways can at times be covered by the hole-filling step. These results will hopefully improve in future iterations, and for now they represent much more manageable artifacts than the ribbon cuts and jagged lines found in the previous week.

Bonus: Mask Comparison Tool

[Figures: CVAT Tool, Mask Comparison]

Up until now I have manually reviewed changes to all of the data outputs. This means I have skimmed images to determine if masks are appearing where I expect them. This has been done as needed, and was fast enough for development. As the datasets I am using continue to grow, so does the opportunity to regress or fail certain steps in the process. It’s unreasonable to expect myself to manually review hundreds of images after each attempted change. So I devised tools to assist with this process.

The first tool is for mask comparison. The masking step is currently very important. It identifies things like sky, glass, and doorway openings. It is imperfect, and as I change things, sometimes it catches more doorways while also missing others. To keep the results from regressing, I must check that they improve across the board. To do this I set up the mask comparison tool. This tool takes the annotated data for all opening masks and compares it to the ground truth data. Ground truth data is what I expect the mask to be, and I had to create it manually. Using an annotation tool called CVAT, I skimmed through hundreds of images and updated the AI-generated annotations to represent what I expected for each opening. Now I can compare future runs of the process against this data to see if we are getting closer to or further from the expectation. I can also compare against previous results to see whether a change has improved things overall, and whether some positions have gotten worse.
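As a rough illustration of the comparison described above, here is a minimal sketch that scores each opening mask against its ground truth and flags images that got worse between two runs. The metric (intersection over union), the function names, and the `tolerance` value are my own assumptions for illustration and are not taken from the actual tool.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # 1 = opening pixel, 0 = background (assumed encoding)


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union


def compare_runs(
    truth: Dict[str, Mask],
    old: Dict[str, Mask],
    new: Dict[str, Mask],
    tolerance: float = 0.01,  # hypothetical allowance for noise between runs
) -> Tuple[Dict[str, Tuple[float, float]], List[str]]:
    """Return per-image (old, new) scores and the images that regressed."""
    report = {}
    regressed = []
    for name, gt in truth.items():
        old_score = iou(old[name], gt)
        new_score = iou(new[name], gt)
        report[name] = (old_score, new_score)
        if new_score < old_score - tolerance:
            regressed.append(name)
    return report, regressed
```

A per-image report like this makes it possible to see an overall improvement while still catching the individual positions that regressed.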
Bonus: Match Comparison Tool

[Figures: Match Comparison, Phase Comparison]

Comparing matches is another important step in the process. It consists of three sub-steps: coarse, fine, and spatial matching. Each step has a variety of conditions that can change its output, and changing one can affect the later steps. Being able to see the accuracy of a run compared to the ground truth helps me identify where things need improving, as well as where things may have gotten worse over multiple runs.

Both of these tools still require some manual effort to receive the benefit, though far less than previously required. Further automating this process to include overnight jobs could be beneficial for testing various datasets as a whole.

Summary

This week’s results include a huge milestone. The workflow can make a high-resemblance 3D model for multiple datasets with only equirect images. To better research and develop further improvements to accuracy, and to curb hallucinations, I began creating testing tools to monitor the accuracy of the masks and match results. These should help preserve the current quality of output while I make changes to improve it.


Week 20 2026: 3D Reconstruction

Accomplishments: Defined Requirements, Drafted Reconstruction Plan, Reconstruction – Second Iteration.

At this point we have taken 360 images, accurately graphed them, and estimated the 3D structure as point clouds, or a matrix of floating points in 3D. Each camera has its own point cloud, and these point clouds overlap. To take these floating clouds of points and make a solid room we must identify these overlaps, and other irregularities, in order to make the structure as realistic as possible. I reviewed the current state of the point clouds and the issues that came up, then reviewed tools and techniques. From this I drafted a plan to use these tools to solve the problems currently faced, and began to implement it.

Defined Requirements

Every problem comes with a goal. The better the goal is defined, the more likely we are to achieve it. The goal for the reconstruction step starts off simple: “Convert overlapping 3D point clouds into single 3D mesh”. In practice, it grows much more complicated than that. You might recall that I had previously designed a reconstruction phase in Week 17. These fused point clouds looked great from many directions, but not all. A variety of issues had appeared. One phase used cube-mapping, which ripped seams through entire rooms. Another stage attempted to vote on geometric confidence, and at times would delete information on the other side of a wall or retain hallucinated geometry in the wrong spaces.

I took note of these and other issues and began afresh in assessing what a viable reconstruction pipeline would look like and need to do. I also took the time to identify concrete examples of issues already found in the point clouds. Naming these issues and identifying their locations gives us valuable tests we can run to ensure the pipeline is performing to our expectations.
The issues to be tested include the following:

Hallucinated Depths: The current depth estimation approach struggles within doorways, and the information stretched into the other room is often false.
[Figures: Hallucinated Depth, Hallucinated Depth]

Flat Doorways: Some doorways have no depth at all and appear flat along the wall. A virtual tour needs to see through these gaps in order to display hotspots. These flat surfaces must be opened.
[Figures: Flat Wall, Flat Wall]

Sharp Edges With No Alternative: Cameras can only see information from their perspective. So flat walls or sides of cabinets not visible to the camera do not initially receive depth data and appear as holes in the geometry. If no other camera sees that surface we must make a best guess at what that surface could look like so that we don’t have surprise holes everywhere.
[Figures: Sharp Edge Cloud, Sharp Edge 2D]

Sharp Edges With Alternative: If one camera sees a sharp edge, and is unsure what’s behind it, another camera might see what is actually behind the edge. In cases like this, the camera that can see the geometry should be able to merge it with the other camera.
[Figures: Edge Hole, Replacement Geometry]

Misplaced Surfaces – Far Apart: Depth accuracy is only consistent until about 2.5m with the current tool. After that, depths may band or stretch. In a large room this can cause depths to appear in the wrong place or beyond a wall. We must identify where those walls are and either merge or remove the wrongly positioned data.
[Figures: Extended Wall, Occluding Wall]

Misplaced Surfaces – Close Together: Even nearby cameras can estimate depths at slightly different positions. The same wall may appear at different positions all within a foot of each other. This geometry should be identified as representing the same thing and merged into one single wall.
[Figures: Overlapping Wall, Overlapping Wall]

These tests, among other things, represent the challenges faced in converting estimated depth maps into accurate 3D objects.
Drafted Reconstruction Plan

With the requirements outlined, along with the information obtained through previous phases in the process, we can begin drafting our plan. This week’s process consisted of over 10 steps. I will summarize the concepts here.

The majority of the time, doorways provide inaccurate depth information. They are the most important part of an image for positioning; however, the depth is badly misplaced. I determined that removing it altogether made the most sense. This meant that no geometry through doorways would appear or interfere with better geometry on the other side.

It was also important to ensure that cameras do not provide empty votes to geometry on the other side of doorways. When voting, a line is drawn from the camera to the cubic unit of space (foot, meter, centimeter, etc.) that is being voted on. Usually this stops at the camera’s own geometry; however, if we remove the geometry through a doorway, it could go on infinitely! To avoid that I added a step to create an invisible wall right in the opening of a mesh that blocks the line, or ray, when it reaches it. This ensures negative voting remains within the bounds of the known good geometry at all times.

[Figure: Doorway Stop]

Edges, as previously mentioned, often result in empty space or missing geometry. I added a step that stretches out a flat surface between all edges, like a curtain over a window. These points weakly cover the flat space. They will remain if nothing else determines them to be inaccurate.

[Figure: Close Camera]

Identifying occluding walls was another important step. When a far camera tries to draw its wall on the other side of a much closer camera’s wall, that closer camera should block the ray, or line, from going through it. This also applies to empty space voting, so that the far camera cannot remove geometry it should not be able to see in the first place.

Getting surface normals remains an important step. This labels the direction that each point is facing.
Ceilings face downward, floors face up, some things may face at an angle. This will help us identify two sides of a wall when they overlap, as ideally they


Week 19 2026: Positioning

Accomplishments: Rethink Matching and Alignment, New Spatial Awareness Checks, Doorway Mask Improvement, Doorway Side Check, Test New Depth Estimator. Bonus: Frontend Updates.

Breaking down problems into smaller problems is often the right call, though sometimes even this can have unintended consequences. I had broken down the problem of finding camera positions into two steps: matching cameras, and aligning them together. With non-determinism and a lack of data, these steps alone could not guarantee a consistently correct outcome. I returned to look at the bigger picture and saw that positioning the cameras would benefit from a conceptual shift. Rather than matching then aligning, we focus on the original problem, “posing”, and with the combined data can determine better, more consistent outcomes. This conceptual shift helped me identify three new spatial awareness checks, led me to review new AI models for depth and mask detection, and to improve the doorway identification system. All these changes necessitated frontend updates for compatibility, and while performing these I also added new tools for visualizations.

Rethink Matching and Alignment

[Figure: Through Doorway Matches]

Let’s review the data we have available. We begin with a sparse and unordered group of 360 equirectangular images. Some rooms of the property may only be covered by one image through a narrow doorway. We derive depth through an AI tool, and attempt to identify doorways with another. Both of these values vary in accuracy. We feed the images and masks through two image matchers. The coarse matcher is like a big rake: it swiftly removes the majority of very different photos and narrows them down to the ones that look most similar. The fine matcher is like a much smaller comb that takes longer and removes the remaining unalike images. Up until now we determined camera connections solely on the heuristic of how “alike” two camera scenes were. Here is where I needed to step back.
Image matchers match images by likeness. That means two different bedrooms will score higher than a bedroom and a restaurant. The fine matching does a great job of determining if it’s the same bedroom, and that is why it works so well for matching rooms. It can match between rooms well too. It can identify every pixel that exists within the doorway. However, even with all the pixels in a doorway, that may only be 15 percent of a scene, and two rooms with similar paint and tile may score at a similar rate. Two similar looking rooms may very well be a better match than two different rooms connected by a door, as that is what matching is for: similarity, not structure.

So I looked to clearly define what I want. I wanted matching, because similarity is an important heuristic that tells me the likelihood that two images are related. And I wanted those matches in their correct positions in 3D space. They couldn’t just be similar; they had to be an extension of the same space.

So I decided to collapse the separate steps of “match then align” and rework them under the umbrella of “positioning”. Here the steps work towards the same goal: finding the proper position for each image. Matching still runs, but it no longer determines the final result. The non-deterministic heuristic of match confidence is now constrained to act as a gate, or threshold. Two highly confident matches will no longer compete to be the primary based on visual likeness; instead, two visually alike images will proceed to the new stage: spatial awareness.

Spatial awareness matches images beyond their aesthetics. It uses the accurate area of estimated depth to align the two images. Based on this alignment it determines four things: camera proximity, line of sight, distance, and spatial agreement. Based on these we cull spatially inconsistent matches, and determine the most spatially similar and close camera positions to be the final topology.
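The "confidence as a gate, spatial checks as the decider" idea above can be sketched as follows. This is only an illustration: the `Candidate` fields, the threshold values, and the ranking order (line of sight first, then distance, then agreement) are assumptions of mine, not the pipeline's actual rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    other_image: str
    match_confidence: float   # visual likeness reported by the matcher
    camera_distance_m: float  # distance between the cameras after alignment
    line_of_sight: bool       # can the two cameras see each other?
    spatial_agreement: float  # 0..1 consistency of the aligned point clouds


def pick_connection(
    candidates: List[Candidate],
    gate: float = 0.5,            # hypothetical confidence threshold
    min_distance_m: float = 0.1,  # near-zero translation suggests a bad alignment
    min_agreement: float = 0.6,   # hypothetical agreement threshold
) -> Optional[Candidate]:
    # 1. Match confidence only decides who is allowed to continue.
    passed = [c for c in candidates if c.match_confidence >= gate]
    # 2. Cull spatially inconsistent matches.
    consistent = [
        c for c in passed
        if c.camera_distance_m >= min_distance_m
        and c.spatial_agreement >= min_agreement
    ]
    if not consistent:
        return None
    # 3. Prefer cameras that can see each other, then the closest camera,
    #    then the best agreement. Confidence no longer breaks ties.
    return min(
        consistent,
        key=lambda c: (not c.line_of_sight, c.camera_distance_m, -c.spatial_agreement),
    )
```

With this shape, a slightly less similar image that is close and visible wins over a near-identical room on the other side of the building.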
This change has made outputs more consistent and accurate across every dataset.

Spatial Awareness Checks

[Figures: Far Camera, Close Camera]

Determining a camera’s pose requires that the images be similar and that the spaces they capture agree. A variety of values are used to determine overall spatial agreement, including camera proximity, line of sight, distance, and point cloud consistency.

Camera proximity and line of sight do a great job of removing bad matches quickly. Once two point clouds are aligned, the pipeline checks each camera’s expected position. Cameras that are in nearly the same position are often missed copies or bad alignments. The algorithms currently in use often find no translation for bad matches. This is a big indicator of connections that are misleading and should be removed.

Camera line of sight is another great step in removing bad connections. Say you have two images around an L-shaped corner. They are very similar, and have many matching points at the joint of the hallway. However, the cameras themselves cannot see each other; a wall lies between them. This puts the match on hold. The desired use case, virtual tours, should usually include line-of-sight photography so the user can virtually “walk” through the space and view the next room’s hotspot in 3D. This check promotes cameras that can see each other, and swiftly removes outliers where two distinctly different rooms become aligned and the cameras cannot connect at all. Images without line of sight remain as a fallback if no good matches are found. This can be useful in instances where opened doors always block one entrance or the other.

Camera distance tells us how far apart the alignment thinks the cameras are. Once aligned we can estimate this. This helps us build a graph with connections that are closer, and that more likely follows the path of the photographer throughout the shoot. This also helps us manage the inaccuracies of the depth estimation.
The further from the camera, the less accurate the 3D point cloud. So closer cameras persist higher accuracy point clouds throughout the 3D scene. Spatial agreement, or point cloud


Week 18 2026: Speed & UI

Accomplishments: Caching, Smart Bind Mounts, Project Based Processing, Improved Coarse Matching. Bonus: UI Design, UI Implementation, Manual Adjustments.

The time it takes to test a new feature can be substantial. In previous weeks we’ve seen single runs of a service take over 17 minutes, and get brought down to under 2. This week I looked for more opportunities to reduce the time between attempts and completion. Through this process I was able to save substantial build time through caching and clever Docker ordering. I was able to save time between datasets by preserving previous runs as projects. And during this search I found an alternative tool that performed even better with my dataset in the same time. With improved development speeds across the board I was able to design, implement, and validate a user interface (UI) for this tool, all within the same week. This UI further improves my ability to test features through project management and monitoring. What weeks ago took an hour of manual copy, paste, and wait now takes just a few clicks and moments.

Caching

[Figures: Without Cache, With Cache]

Caching allows us to take work that was completed once before, and use it again. Our phones save our passwords, and save us time logging into sites. Imagine if instead we had to fill out a captcha and come up with a new password every time. It’s better just to remember it, or let our phones do the work. I did this by caching the compiled Torch, and changing the order in which Docker copies the large (3+GB) AI models used in this project.

Torch is a tool used in operating AI models. This tool is written generically, and can be applied across many different hardware devices. Often when it’s applied, it gets compiled. This means that the generic tool is broken down and translated to run a specific way for that hardware. This tool is multiple gigabytes (GB), so compiling it every time for the same hardware is very wasteful. I set up a cache and check system.
After compilation it saves the information; the next time the container is built it checks whether there is a saved version and, if so, whether it works for the current hardware. If yes, it loads that instead of compiling again, saving seconds to minutes depending on the build.

Another trick I learned relates to the way Docker “layers” an image. Docker takes a variety of information and prepares it in a way that it can be run on any hardware using Docker. In this process it stacks layers of information on top of each other. When building a Docker image, it checks the previous image to see what layers changed. It finds the lowest layer that changed and rebuilds that layer and everything above it. Up until now, my large AI models appeared in the later Docker layers, above the code. This means that just changing a single word in a log statement could take minutes to rebuild and test! To fix this, I moved the step that copies the AI models earlier in the build, below the layers that change often.

Smart Bind Mounts

A smaller trick I employed is something called a “bind mount”. This hooks the repository open in my code editor directly into an existing Docker container. This allows certain files to be immediately available to the container upon saving, and further improves speed.

Project Based Processing

A notable quality of life improvement for this week is certainly project based processing. Previously I would only work with one dataset at a time, running it through the long processes and verifying the output. Afterwards I would need to erase all of the data and go back to square one in order to run a different dataset. This week that all changed. With a major refactor of every service, I was able to store and retrieve data within subdirectories linked to the specific project. This keeps data separate, and allows me to pick and choose what dataset I want to work with without removing any data. This preserves each unit of work completed up to this point per project.
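A per-project layout of this kind might look roughly like the sketch below. The stage names, the directory structure, and the helper functions are hypothetical and chosen only for illustration; the real services may organize their data differently.

```python
from pathlib import Path
from typing import List

# Hypothetical pipeline stages; the real services may use other names.
STAGES = ("masks", "depth", "matches", "poses")


def stage_dir(root: Path, project: str, stage: str) -> Path:
    """Return (and create) the directory that holds one stage of one project."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    path = root / project / stage
    path.mkdir(parents=True, exist_ok=True)
    return path


def completed_stages(root: Path, project: str) -> List[str]:
    """List the stages that already have output, so they need not be rerun."""
    done = []
    for stage in STAGES:
        directory = root / project / stage
        if directory.is_dir() and any(directory.iterdir()):
            done.append(stage)
    return done
```

Because every service reads and writes under `root / project`, switching datasets becomes a matter of changing the project name rather than deleting previous output.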
It also allows for more consistent results, as small changes between runs can be difficult to pinpoint when all previous steps have to be repeated.

Improved Coarse Matching

[Figures: Weight Comparison, Time To Complete]

While searching for alternative options to reduce the run time of the slowest service (matching), I came across the LoMa tool, an incredibly recent tool aimed at providing high quality matching across images at a speed comparable to LightGlue. This alone enticed me to try it. With a quick implementation I was able to test it as a LightGlue replacement, and to my surprise saw confidence levels during the coarse phase drastically improve. Close rooms appeared closer, and far rooms appeared further. This gave me confidence that it would improve the quality of top matches being fed into the fine matching stage. The times appeared nearly identical, so I chose to swap to this new tool, which appears to better fit the current use case.

Note: Non-determinism

While testing coarse matching I began to notice something interesting in the matching process as a whole. It appeared inconsistent. Run to run I would notice slight changes to the confidence scores. Previously I had believed this to be a result of my changes, though now it appeared to happen even without my intervention. I noticed that quite a few steps during this phase seemed to produce random, non-deterministic results. Non-determinism means that the results cannot be consistently determined. They vary, and may deviate slightly with each iteration. This quality occurred throughout the process. Spherical RANSAC was changing slightly, the distribution of points had the potential to, and most of all the LoMa and RoMa passes varied. I found solutions to achieve determinism within these steps. Some of them, like the ones in RANSAC, I applied. However, the changes necessary to add determinism to LoMa and RoMa drastically reduce the speed. I contemplated if the speed adjustment was worth it for


Week 17 2026: Match Speed

Accomplishments: Fine Matching Time Reduced By 90%. Bonus: Improved Pose Graph Accuracy, Normal Estimation, Depth Confidence Voting, Fused Point Cloud.

As the quantity and capacity of datasets grows, so does the time it takes to test them all. Testing under so many conditions can be cumbersome, and even more so when one of the steps takes over 15 minutes to complete. This week I set out with the main goal of increasing productivity by improving the fine matching speed as best I can. I was able to reduce its run time by over 90%, while also having enough time to further improve the pose graph and begin implementing the following meshing step.

Fine Matching Time Reduced By 90%

[Figures: Slow Speed, High Speed]

By far the longest step in the pipeline is currently the fine matching process. Multiple factors make this process so time consuming. We discussed many of these in Week 14. We solved the N-Choose-2 comparison problem by using the lightning fast LightGlue as an early coarse refinement step, narrowing down what might need to be compared. We also ran a spherical RANSAC operation (whose run time was also reduced by 90% at the time) on the matches to further confirm them. And then we progressed into a time intensive fine-matching stage with RoMa V2. RoMa V2 provides further confidence that the images are actually matches, which matches are the strongest, and where in the image they match. It is very time intensive, and through a variety of steps we were able to reduce this time by 90% as well.

First we introduced threading. Some tasks needed to be handled by the CPU, and some by the GPU. Previously, the GPU was stopping while the CPU completed its tasks, leaving time on the table. Now the GPU performs all its tasks, handing off the information as it goes for the CPU to pick up and work in tandem.
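The hand-off described above is a producer and consumer pattern. The sketch below shows one way to structure it with the standard library; `gpu_stage` and `cpu_stage` are placeholder callables standing in for the real matching and post-processing steps, and `max_pending` is an assumed limit, so treat this as an outline rather than the actual implementation.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List


def run_pipeline(
    pairs: Iterable[Any],
    gpu_stage: Callable[[Any], Any],  # placeholder for the GPU matching step
    cpu_stage: Callable[[Any], Any],  # placeholder for the CPU post-processing
    max_pending: int = 4,             # hypothetical cap on queued hand-offs
) -> List[Any]:
    handoff: "queue.Queue[Any]" = queue.Queue(maxsize=max_pending)
    results: List[Any] = []
    done = object()  # sentinel telling the worker there is nothing left

    def cpu_worker() -> None:
        while True:
            item = handoff.get()
            if item is done:
                break
            results.append(cpu_stage(item))

    worker = threading.Thread(target=cpu_worker)
    worker.start()
    for pair in pairs:
        # The main thread keeps the GPU busy and moves on to the next pair
        # as soon as the result has been queued for the CPU worker.
        handoff.put(gpu_stage(pair))
    handoff.put(done)
    worker.join()
    return results
```

With a single worker the results keep their input order. In CPython the two stages only truly overlap when the GPU call releases the interpreter lock while it waits, which native GPU libraries commonly do; a CPU-heavy `cpu_stage` written in pure Python would need a process pool instead.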
We also receive a huge time improvement from using fuse-local-corr, a feature provided by RoMa’s developer that was made specifically to provide faster times on certain hardware. We also added a change to compile the libraries to work with our specific hardware. This runs on startup and takes around 2 minutes, but once performed, each subsequent run is much faster! And lastly, we tweaked the way we distribute our matches across tiles to be more efficient. Through these changes, RoMa V2’s time was brought down significantly. It went from taking over 17 minutes to just over 80 seconds. This greatly improves our ability to test matches and later improvements, while also increasing efficiency as datasets continue to grow.

Bonus: Improved Pose Graph Accuracy

[Figures: Floor Two Update, Floor One Update, Rotation Topology]

Last week we tested 4 datasets and only half of them appeared very accurate. After reviewing the results I was able to identify a variety of improvements to make. One is opening match rescue. In some cases openings from the coarse matching have very low confidence and do not enter the second round of matching. Now, regardless of confidence, if a certain fraction of the matching points appear within an opening, we promote the match to the following round. This is purely additive, so that the low confidence candidate does not take the place of a higher confidence one in the limited number of positions being picked for round two. Instead, the number of items for round two expands. Now our doorways are preserved through coarse matching to be confirmed in fine matching.

I also made some improvements to the distance calculation formula. Some matches had still been winning competitions even when more of their points were farther away. To combat this I added a maximum limit to the point depth. This means that extremely far points should not contribute to the confidence of the matched images.
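The additive nature of opening match rescue can be made concrete with a small sketch. The `CoarseMatch` fields, `top_k`, and `min_opening_fraction` are assumed names and values for illustration; the post does not state the actual fraction or the number of slots in round two.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoarseMatch:
    pair: str
    confidence: float
    opening_fraction: float  # share of matched points that fall inside an opening mask


def select_for_fine_matching(
    matches: List[CoarseMatch],
    top_k: int = 2,                     # hypothetical number of regular slots
    min_opening_fraction: float = 0.3,  # hypothetical rescue threshold
) -> List[CoarseMatch]:
    ranked = sorted(matches, key=lambda m: m.confidence, reverse=True)
    # The regular slots are filled purely by confidence.
    selected = ranked[:top_k]
    # Rescued doorway candidates are appended, so they never displace
    # a higher-confidence match; round two simply grows.
    rescued = [m for m in ranked[top_k:] if m.opening_fraction >= min_opening_fraction]
    return selected + rescued
```

Appending rather than replacing is the design choice that keeps the rescue from lowering the quality of the regular candidates.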
Skyscrapers, or fixtures beyond a long corridor, should not outweigh the features nearby, even if there are many more of them.

Bonus: Normal Estimation

[Figure: Normal Map]

With so many kinds of estimation (Depth, Pose Graph, Keypoint, etc.) it’s nice to finally hear about a “Normal” one. To those unfamiliar with 3D graphics the concept of ‘normals’ may sound anything but. Normals are a thing; every point in a 3D scene has them. You can imagine them as a little toothpick sticking out of the point in the direction it most ‘flatly’ faces. For example, on the floor beneath your feet, the normals point straight up. On the ceiling, they point straight down. On your glass of water, they point outward in every direction around the glass. To reconstruct a 3D scene every point in our point clouds must have them too. Thankfully this is a common problem, and calculations exist to help in solving for the normals based on a depthmap and the camera’s position. Now the pipeline has accurate normals for the generated 3D depthmaps to be used in later steps.

Bonus: Depth Confidence Voting

[Figures: Individual Maps, Confidence Voting Results]

So far we have our matches, our pose graph, and our depth maps. When posed, most of the point clouds overlap. There are overlapping walls. There are false positives, where edges stretch out or small plants or items end up in different positions. Far away data like high ceilings or distant rooms may be inconsistent over few point clouds. There are even so many loose points that you can’t fly through the aligned space to admire it visually! What can we do to reduce the quantity of false positive data across our depthmaps? We can vote. Since each of the cameras knows where it is in relation to the others, and each has its 2D depthmap available with the distance to each point, we can use these to construct a 3D matrix and compare across images what should be where. This way, if 3 cameras next to the bed all say the bed is in the center of the room,


Week 16 2026: Alignment Improvements

Accomplishments: Depth Aware Alignment, Precise Yaw Calculation, Pose Graph Estimation. Bonus: Improved Opening Detection, Pose Graph Accuracy Check.

Continuing the effort of refining the cameras’ determined positions, I considered what further improvements could be made to the alignment process. The current pose graph and topology continue to improve yet remain imperfect. I considered the data I have available and have confidence in. I have nearby depth data, a highly accurate matching range, and accurate image segmentation. With this, I improved matching and topology by using depth to adjust match confidence. I also improved yaw rotation calculation with an epipolar calculation. And lastly I improved opening mask identification by combining offset images.

Depth Aware Alignment

[Figures: No Ceiling Floor, Ceiling and Floor, Pre Depth Topology, Post Depth Topology, Pre Depth Alignment, Post Depth Alignment]

At first glance the topology may look incredible; however, under closer inspection issues appear. I noticed that nodes like Main Entrance 3 and Main Room 7 were connected. These are great matches about 10 feet apart; however, an even better match for the main entrance would be Main Room 2, which is about half the distance. With perfect metric depths at any distance this likely wouldn’t cause an issue. However, the current depth estimation tool worsens drastically as distance grows. This means that matching points on the far end of either node may be 15-20 feet apart, and the points between them are more likely to reach the edge of the estimation’s accuracy. The closer the matches, the more likely the depths are to be accurate.

I determined the best approach would be a two step process. The first step is to run an additional match check on the ceiling and floor of each image. This is likely to further increase the number of nearby matches for each image and give us better grounding during alignment. The next step is to construct the topology based on the distance of each match.
If one image has two pairs that have high visual overlap, and one has much closer overlap than the other, the closer match should prevail. With these additions, the topology improved, nearby matches became preferred, and alignment improved as well with the new floor and ceiling anchoring.

Precise Yaw Calculation

[Figures: Rotation Visualization, Rotation Topology]

Up until now alignment has been performed by a 9 degree of freedom (DOF) solver, constrained to 4 degrees of freedom (rotation: yaw, translation: x y z, scale: none). I believed that the data collected by the 360 camera, along with the confident matches identified, could accurately handle yaw rotation separately from translation. This decouples the rotation from the noisy depthmap data used in the previous 4-DOF alignment solving process. To achieve this I implemented an epipolar calculation process. This process is similar to the doorway recovery process described in Week 15. It takes both equirectangular images and projects them onto a sphere. The matched points are used to cast rays, or lines, out from the center of the sphere. The spheres are then rotated to determine the rotation where both sets of rays intersect most accurately. This gives a highly accurate rotation, which can be seen in the improvements to the pose graph.

Pose Graph Optimization

While the yaw has grown incredibly accurate, there is still potential for slight rotation inaccuracies to compound and drift over time. In an attempt to overcome this, I have implemented a step for Pose Graph Optimization. This takes the estimated rotations amongst the high confidence connections in the topology and adjusts all rotations to minimize the overall drift.

Bonus: Improved Opening Detection

[Figures: Door Mask, Door Edge, Doorway Mistake, Door Repaired]

I had noticed there were instances where images on either side of a doorway were not being connected in the minimum spanning tree (MST). Instead, similar rooms were beating the doorways in the confidence rankings.
This could be two bedrooms with the same sliding screen doors and tile floors, or it could be two balconies that see the same building in the distance. I began reviewing the matches and identified a variety of causes that could lead to this behavior.

The first and most obvious cause was improper opening masking. For one equirect image the doorframe was split, half on the left edge and half on the right. The image recognition tool I use is not 360 aware and does not understand that what ends on one edge begins on the other side. Because of this it could not accurately identify that a doorway existed on both sides of the image. I was able to solve this by performing an additional mask step. I would offset the image, rotating it 180 degrees so that the door is in the center, and run the mask again. Then I would combine the masks to have the full picture.

Then I noticed a second masking issue. Some openings were being confused for glass! This was probably due to the glossy, well-kept tile floor and the cleanliness of the space. I tested a variety of solutions for this improvement with varying complexity. I learned that the image recognition model outputs individual object masks and confidence values. That means that each blob of glass gets its own level of confidence. I considered comparing confidences until I noticed that the doorways received far less confidence than the glass in this instance. Perhaps there is a more complex and interesting solution to be found with more effort. At this step, though, I pivoted, deciding on the simpler solution of excluding glass from openings. Previously, glass masks took precedence over opening masks, removing features. Now, matches are preserved if they exist in an opening, and are still compared with the later processes.

Some adjustments were made to the distance calculations as well. In Week 15 we implemented the first attempt at depth aware matching and saw improvements for the interior of a space.
This week we improved upon this implementation by drawing a hard cutoff for matches after a certain distance. This

Week 16 2026: Alignment Improvements Read More »

Week 15 2026: Camera Positioning

Accomplishments
  • Depthmap improvements
      • Removed Sky
      • Flattened Glass
      • Metric Depth Estimation
  • Alignment improvements
      • Distributed Sample of Matched Points
      • Restricted Alignment Vectors (Yaw only, no scale)
      • Doorway Recovery
      • Sharp Edge Recovery

Each step of the process relies on the accuracy of the data prepared by each of the previous steps. To most accurately determine the position of each camera, we must confirm that the matches are accurate in 3D space, and that they fit together just like they would in the real world. This week, we were able to greatly improve our alignment accuracy by using available data to refine the depthmaps, identifying low-accuracy depth matches, and recovering with an alternative alignment process.

Depthmap Improvements

Sky Removal

[Figure: EXR Sky Removal]

This simple step removes all points related to the sky for a given depthmap. This reduces the visual clutter when building the process while also simplifying the process of cleaning up the point clouds down the line. The empty void of the atmosphere has no accurate depth, and therefore has no value for these stages of the process.

Flattened Glass

[Figures: Glass Mask, Hole In Glass, Flattened Glass]

Clear glass in architecture is often used to see beyond a point. When looking at glass windows or walls in 3D it’s common to expect to see the solid structure, rather than sporadic holes in the space. I added a procedure to flatten the points identified as glass output by the depthmapping tool. The depthmapping tool itself did fairly well at identifying tinted glass as a flat object. I noticed that in some cases the glass would be ignored and the depth would be generated through the glass. This inconsistency led to surprising holes in the wall. Glass can also cause a slight refraction, or bend in the light going through it. This caused the holes to further corrupt the depth estimation with inaccurate refracted expectations. Flattening the glass worked well and even improved the overall shape of the room in some cases.
This improves the accuracy of the individual point cloud and should reduce the time spent during some future cleanup phase.

Metric Depth Estimation

[Figures: Side By Side Alignment Comparison, Top Down Alignment Comparison, Kitchen Misalignment, Flat Floor Point Cloud 1, Flat Floor Point Cloud 2, Real World Step]

In Week 12’s report I discussed the tradeoffs between two tools used for depth estimation. One offered metric accuracy most of the time but had warped structure; the other had accurate structure but its metric accuracy varied widely. I moved forward with the second tool for its improved geometric accuracy, believing that someday metric accuracy could be applied with a scale adjustment. Today was that day.

While ideating ways to improve my image alignments I reflected on what intrinsic data is available from the included photos. Upon this reflection I realized there were two important values I had taken for granted: the camera’s height and its being level with the horizon (more on that in a following section). During shooting the camera remains at about eye level on a stand. This stand maintains roughly the same height throughout the capture process. Because the depthmaps tend to be accurate in the floor immediately below the camera, we can scale this distance to align with the height of the camera’s stand. This drastically improved the consistency across depthmaps.

While the metric depth estimation drastically improves the process, some challenges still persist. The current stand in use is adjustable. That means that over time, through small changes and gravity, the camera may accumulate centimeters of change in height. Also, slight angles like those of a hill or slope may lead to improper ground estimation and a larger inaccuracy in scale. Both of these can be solved with real-world adjustments to the stand and capture process. A fixed height stand with a tilt-able base could keep the camera level and over the pole while fixing its height to the ground with great accuracy.
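A minimal sketch of the floor-based scale adjustment, assuming the depthmap is an equirectangular array of distances from the camera, that its bottom rows look almost straight down at the floor, and that the stand height is known. The function name, the 5% nadir band, and the 1.5 m height are illustrative assumptions, not values taken from the pipeline.

```python
import numpy as np

def scale_depth_to_camera_height(depth, camera_height_m, nadir_fraction=0.05):
    """Rescale a relative equirectangular depthmap so that the floor
    directly below the camera sits at the known camera height.

    depth: 2D array (rows = latitude, columns = longitude) of distances.
    nadir_fraction: hypothetical share of bottom rows treated as
    "straight down"; within a few degrees of nadir the distance to the
    floor is approximately the camera height.
    """
    rows = max(1, int(depth.shape[0] * nadir_fraction))
    nadir = depth[-rows:, :]
    valid = nadir[np.isfinite(nadir) & (nadir > 0)]
    if valid.size == 0:
        raise ValueError("no valid depth samples below the camera")
    # The median resists outliers such as the stand itself or noisy pixels.
    scale = camera_height_m / np.median(valid)
    return depth * scale, scale

# Example: a relative depthmap whose floor reads 0.5 units below the camera.
relative = np.full((100, 200), 0.5)
metric, factor = scale_depth_to_camera_height(relative, camera_height_m=1.5)
```

Applying one factor to the whole depthmap keeps the geometry intact and only changes its size, which is why a drifting stand height or a sloped floor translates directly into scale error.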
Another challenge this scale does not yet solve is that of incorrect depth over slight changes in elevation. For example, in the related images there exists a ground truth elevation change between the hardwood floor and the kitchen area. The depthmaps for both the kitchen and the hardwood floor ignored this change and hallucinated a level area. Because of this, the floor for the kitchen appears confused, with two estimates, one from each camera. This may be rectifiable during a point cloud cleanup phase, though if left unchecked it has the potential to accumulate scale drift over time.

The metric depth estimation greatly improves the accuracy of depthmaps while drastically reducing the opportunity for and magnitude of scale drift. All identified challenges have potential solutions to be implemented later.

Alignment Improvements

Distributed Matches

[Figures: Sparse Keypoints, Real Pixel Density, Sparse Alignment, Distributed Alignment, Sparse Points, Distributed Points]

This week began with only the highest confidence matches being used to align two point clouds. Sometimes the highest confidence matches group themselves in small areas. This makes alignment difficult. Think of it like a door. With hinges it has 1 or 2 matches on the same vertical line. A door still connects at those points whether it’s open or closed. Now think of a dead bolt, or a lock. When the door must be locked AND on its hinges, it only has one position.

Instead of taking only the highest confidence matches, I decided to use a distribution of high confidence matches. In most cases this spread the matches across a wider area. While the individual matches may be less perfect, the overall structure will be more accurate. For example, we can look at the secondary entrance and the shoe closet. The sparse, highest confidence keypoints cluster in a few small straight lines. Then we look at the dense matching. Here the points spread along a much wider field of view, almost the whole horizon.
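The post does not say how the distribution of matches is chosen, so the following is only one plausible sketch: split the equirectangular image into longitude bins and keep the most confident matches in each bin. The function name, bin count, and per-bin limit are hypothetical.

```python
import numpy as np

def select_distributed_matches(x_px, confidence, image_width, bins=36, per_bin=5):
    """Return indices of the most confident matches in each longitude bin.

    x_px: horizontal pixel position of each match in the equirectangular image.
    confidence: matcher confidence for each match.
    Binning by longitude forces the selection to spread around the horizon
    instead of clustering on one feature.
    """
    x_px = np.asarray(x_px, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    bin_ids = np.minimum((x_px / image_width * bins).astype(int), bins - 1)
    keep = []
    for b in range(bins):
        members = np.flatnonzero(bin_ids == b)
        if members.size == 0:
            continue  # nothing matched in this direction
        order = np.argsort(confidence[members])[::-1]  # highest confidence first
        keep.extend(members[order[:per_bin]].tolist())
    return sorted(keep)

# Three very confident matches share one narrow region; two weaker ones lie elsewhere.
x = [10, 12, 14, 500, 900]
conf = [0.99, 0.98, 0.97, 0.60, 0.70]
chosen = select_distributed_matches(x, conf, image_width=1000, bins=10, per_bin=1)
```

A plain top-3 by confidence would keep only the clustered matches (the hinge), while the binned selection keeps one of them plus the two spread-out matches (the hinge and the lock).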
By transitioning from sparse, highest confidence, keypoints to a distributed selection of high confidence keypoints I greatly improved the degree of accuracy when aligning two point

Week 15 2026: Camera Positioning Read More »

Week 14 2026: Match Filtering

Accomplishments
  • Improved Matching Efficiency
  • Improved Matching Quality
  • Minimum Spanning Tree With Maximum Confidence
  • Feature Masking

Manually curating a few known-good images to be matched and processed is quick and relatively immune to false positives. However, applying that same manual process to 50+ images grinds workflows to a halt and quickly becomes strewn with incorrect matches. Without GPS or sequential metadata, chaotic image datasets suffer from the quadratic performance scaling of any-to-any (n-choose-2) comparisons. This is especially painful for CPU-bound processes. Furthermore, this expanded dataset compounds the likelihood of introducing false positives that can slip through current feature-matching algorithms.

To effectively work with large datasets, we need to generate a minimum spanning tree where all images are connected, with high confidence regarding what each image matches to and exactly where those matches are located. Achieving this requires a multi-step approach: reducing CPU operations, iteratively filtering matches, and masking out corrupting environmental features like clouds and glass.

Improved Matching Efficiency

[Figures: AnyLoc Reduced Time, Lightglue Time, AnyLoc islands, n choose 2 graph]

The fundamental purpose of the matching stage is to look at every photo and compare it against every other photo to find the best connections. Historically, our pipeline handled this in three steps: ALIKED (extracting specific pixels as features), LightGlue (comparing extracted pixels across every image in an any-to-any configuration), and Spherical RANSAC (converting images to spheres to validate matches without distortion).

The problem? The number of any-to-any comparisons grows quadratically as the dataset scales. A small batch of just 52 photos requires over 1,300 unique comparisons. Our primary bottleneck was Spherical RANSAC, a heavily CPU-bound operation.
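As a quick check of the comparison count, here is a small sketch that enumerates every unordered pair for a 52-image batch. The image names are placeholders.

```python
from itertools import combinations
from math import comb

def pair_count(n):
    """Number of unordered pairs among n images: n choose 2."""
    return n * (n - 1) // 2

# Placeholder names standing in for a 52-photo dataset.
images = [f"img_{i:02d}" for i in range(52)]
pairs = list(combinations(images, 2))

print(len(pairs))       # 1326
print(pair_count(104))  # 5356, about four times the work for twice the photos
```

Doubling the dataset roughly quadruples the number of pairs, which is why every per-pair cost in the matching stage matters so much.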
By switching from a standard Python library to a just-in-time (JIT) compiled version, we were able to execute the math as compiled machine code rather than pushing it through a Python abstraction layer. This single optimization cut the operation time by roughly 90%, bringing a 20+ minute process for 52 images down to just 2 or 3 minutes.

We also theorized using AnyLoc as a pre-check. AnyLoc is a “Global Descriptor”; instead of extracting individual features, it generalizes the image (e.g., turning a photo into a mathematical representation of “Inside, big bedroom, beige walls”). Because it runs entirely on the GPU, it is blazing fast, tearing through the 52 images in just 35 seconds. However, AnyLoc tends to group high-confidence matches by type rather than topology. It creates disconnected “islands” of similar rooms (like matching two separate bedrooms together) rather than finding the connective path (like a hallway leading to a bedroom). While AnyLoc is incredibly fast and worth considering for future iterations, its tendency to falsely combine distinct but similar rooms means it requires careful handling before full inclusion in the pipeline.

Ultimately, by removing the abstraction bottleneck of our CPU process, we successfully reduced our overall matching time by 95%.

Quality Improvement

[Figures: Lightglue False Positive, Ransac False Positive, RoMa2 Doorway]

Speed is irrelevant if the feature matching yields incorrect results. The matches generated by LightGlue and Spherical RANSAC often include false positives. Two identical doors, two sides of a symmetrical feature, or even similar moldings and furniture in completely separate rooms can trick standard matchers into drawing a connection. To solve this, we looked to RoMa V2 (Robust Matching), a dense feature matcher. What sets RoMa apart is its ability to rank every pixel in the image by confidence and match them at extreme angles.
When we pass our challenging image pairs through RoMa V2, the false positive matches still appear, but they are flagged with remarkably low confidence scores and lack shared pixel density. This allows us to aggressively filter out the low-confidence noise while retaining dense, high-confidence pixel groups, even when looking through narrow doorways.

The trade-off for this accuracy is compute time. Dense matching requires dense math. RoMa V2 takes significantly longer per pair than LightGlue, and in an any-to-any configuration that cost grows quadratically with the number of images, taking over 5 minutes for a 52-image dataset. Because of this, RoMa V2 is not a standalone silver bullet. It cannot process the entire dataset in a reasonable timeframe on its own, meaning we still rely on LightGlue as an early-stage filter to narrow down the workload before RoMa V2 confirms the final matches.

Minimum Spanning Tree & Maximum Confidence Path

[Figures: AnyLoc Topology, LightGlue Topology, LightGlue + RoMa2]

If we don’t confirm that every point cloud has valid matches and belongs to a unified tree of connections, we end up with incorrect geometry and floating, disconnected islands. LightGlue is reasonably fast, but its confidence score for a correct match through a doorway might be identical to the score of a false positive between two disconnected rooms with similar moldings. RoMa V2 is the slowest, but it is the most accurate and provides the highest confidence for tricky bridges like doorways.

To get the minimum spanning tree with the highest confidence paths as quickly as possible, we re-architected the workflow to play to the strengths of both tools. First, we use LightGlue to perform the any-to-any matching and rank the pairs by confidence, narrowing down the field of likely candidates. We then intentionally skip the long Spherical RANSAC step, as RoMa V2 will handle match quality confirmation. Next, we determine the top-ranked confidence values and begin building our topological tree.
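The post does not name the tree-building algorithm, so the sketch below swaps in a standard technique, Kruskal’s algorithm with a union-find structure, run over pairs sorted from highest to lowest confidence. The function name and the `(confidence, i, j)` tuple layout are assumptions made for illustration.

```python
def max_confidence_spanning_tree(num_images, scored_pairs):
    """Build a spanning tree that prefers the most confident pairs.

    scored_pairs: iterable of (confidence, i, j) with image indices i and j.
    Returns (tree_edges, connected), where tree_edges is a list of
    (i, j, confidence) and connected is False if some images stay isolated.
    """
    parent = list(range(num_images))

    def find(x):
        # Find the island representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Walk the ranking from most to least confident pair.
    for conf, i, j in sorted(scored_pairs, reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this pair joins two separate islands
            parent[root_i] = root_j
            tree.append((i, j, conf))
        if len(tree) == num_images - 1:
            break  # every image is connected
    return tree, len(tree) == num_images - 1

# Four images; image 3 is only reachable through a low-confidence pair.
pairs = [(0.90, 0, 1), (0.80, 1, 2), (0.95, 0, 2), (0.30, 2, 3)]
tree, connected = max_confidence_spanning_tree(4, pairs)
```

Because the ranking is consumed in descending order, a lower confidence pair is accepted only when it is the best remaining way to attach an otherwise disconnected image.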
If any disconnected islands or orphaned nodes exist, we expand the tree to include slightly lower confidence values until a continuous path connects everything. Finally, RoMa V2 processes this curated list of rankings. It confirms the confidence values, outputs high-confidence dense matching points for each pair, and validates the bridges through doorways. This dual-model approach yields a unified minimum spanning tree connected entirely by maximum-confidence paths.

Feature Masking

[Figures: Bathroom Masked Matches, Balcony Masked Matches, Main Room Masked Matches, Bathroom Mask, Balcony Mask, Main Room Mask]

Glass, mirrors, and the sky are notorious problem-causers in computer vision. To an algorithm, a mirror on a wall often looks like a doorway into a completely identical room. A glass window might not register as a solid surface, but rather a hole leading to

Week 14 2026: Match Filtering Read More »

Week 13 2026: Modular and Repeatable

Accomplishments
  • Built Docker Images
  • Refactored Proof of Concept

Bonus
  • Depth Model Comparison
  • Many Image Point Clouds

This week’s goal was to take a rapidly built prototype and convert it into a foundation that’s simpler to deploy and adjust. I set up the files to work with a tool that packages them in a way that they can easily be run on any device. I also rewrote much of the code so that steps can happen independently of each other. As a bonus, I was able to begin modifying the components to accept more than 2 images for 3D reconstruction, and generate a combined point cloud.

Built Docker Images

[Figure: Docker Images]

The goal of Docker is compatibility. It packages a program along with all the tools it needs to run on any system. It’s kind of like having a universal adapter where, no matter where you go, you can plug into the wall and charge. A Docker image is the output. This includes the program and all the tools needed to run it anywhere.

Creating the Docker image comes with some steps and considerations. This includes writing out which tools the program will need to run. Since our 360 to 3D pipeline includes multiple smaller programs, it is common for their needs to overlap. To satisfy this we use something called a “base image” which includes all of the shared tools which may be required. Setting up the base image can be tedious, as it takes time to determine the version numbers of each tool that are compatible with all the rest. For small tools measured in kilobytes or megabytes, this may not be concerning. However, multiple programs in this pipeline rely on tools that are gigabytes in size! Reusing these is very important to preserve efficiency and save on system memory.

Another consideration is shared storage. Containers started from Docker images by default write within their own spaces and do not have access to each other. This led me to create a “shared volume”. With a shared volume, each container can read and write its own data in a shared space.
This is helpful in large data scenarios (like 360 photography) where each program may require access to the full dataset. Preparing the Docker images helps transition a rapidly produced proof of concept into something less brittle that can be preserved and recreated on any system running Docker.

Refactored Proof of Concept

A rapidly made proof of concept (POC) often includes shortcuts to more efficiently determine the effectiveness of a process. Upon approval, a new design should be constructed that includes the slower-to-develop supporting steps, which make it easier to change the code efficiently while preserving functionality going forward. I converted my POC from sequential “get the job done” steps into a more modular “accomplish this step” approach.

I added application programming interface (API) endpoints to the program. Instead of manually updating a file and running the program once, I can now call the program any time and submit different information at that time. This gives swift flexibility in where, when, and how I interact with it. This also gives me a better opportunity to multi-task. I set up each endpoint to begin a job. Instead of waiting for the sequential job to be completed, I can now periodically request a status update from a different API endpoint. The benefits of this will be much appreciated as processing time increases in line with the size of the data set being uploaded.

Breaking down sequential processes into atomic steps decouples logic and improves cohesion. This ensures that each atomic step only relies on what it absolutely needs and only affects what’s crucial. Setting a foundation in this manner permits future steps to be added with narrow impact on the other programs.

Bonus: Depth Model Comparison

[Figures: Model Comparison, Combined Mesh Comparison, Strange Projection]

This week I experimented with the different machine learning models in the tool which calculates the depth for each photo.
One model seemed to offer improved depth detection through doorways. This will be helpful when feature matching must take place beyond a shared doorframe. Overall the newly tested model appears to offer similar results. A few small oddities have been identified, like a strange bubble appearing around the entryway table, and a broad stretching happening to a small plastic standee placed on a shelf. I believe a step to clean up, reduce, and combine point clouds will be needed. While these appear to be new issues, I believe they may be solved by a cleanup phase and the benefits of expanded doorway depth may be retained.

Bonus: Many Image Point Clouds

[Figures: 3 Camera Pose, 3 Image Point Cloud, N Choose 2]

With the improved foundation afforded by the containerization (Docker) and refactoring I was able to more swiftly modify the process to remove the previous limit of 2 images for pairing. A myriad of minor tweaks and process changes were necessary for this step. The new layout made it quick to identify and apply the required changes. The point cloud result excites me. This visualization represents the process’s ability to not only combine two 3D projections, but also refine and expand that combination as more are added.

Making these changes also called to mind the many changes that are necessary to perform this process efficiently as the number of photos scales. Steps like downsampling and generating depth maps scale 1:1 with the number of photos uploaded. If it takes 0.1 seconds (s) to generate a depthmap for one photo, each extra photo will add an extra 0.1s. This scaling does not hold for our feature matching step. Combining two photos, A and B, takes 1 step, matching A-to-B. Combining three photos takes 3 steps, matching A-to-B, A-to-C, and C-to-B. This can be written as a formula, and simplified to N * (N - 1) / 2. That is the total number of matches for N items. This grows incredibly fast. While 4 items only need 2 more matching steps

Week 13 2026: Modular and Repeatable Read More »

Week 12 2026: Proof of Concept 360 -> 3D

Accomplishments
  • 360 -> 3D End-to-End Proof of Concept

Recent weeks of research have outlined the problems that must be solved to deliver a pipeline where sparse 360 imagery can be accurately reconstructed into a 3D mesh. A variety of tools and algorithms were compared in order to support this process. Ultimately an initial pipeline was developed in which a merged 3D mesh could be produced from just two sparse 360 images.

360 -> 3D End-to-End Proof of Concept

Workflow

The whole process of converting 360 images into a 3D model requires a variety of smaller problems to be solved. The workflow intends to solve the following problems:
  • Image downsampling
  • Feature extraction
  • Feature Matching
  • Match Filtering
  • Depthmap Estimation + Point Cloud Generation
  • Convert matches to 3D
  • Align point clouds
  • Convert Point Cloud to Mesh

Image Downsampling

Image downsampling means shrinking an image’s size. This step is required to conform to a downstream tool’s image size requirement. Downsampling is a double-edged sword. In this case, it greatly reduces processing time; however, it also reduces the detail used in identifying features and calculating depth. A future iteration may improve on this by downsampling later in the process.

Feature Extraction

[Figure: Feature Extraction 1]

The feature extraction step looks for unique areas of pixels across an image. These coordinates are later used to compare two images and see what features may overlap.

Feature Matching

[Figure: Matched Features]

The features calculated in the previous step are compared across two images. The algorithm does its best to find groups of pixels that appear similar across the two images. In the example you might notice that some of the matches appear incorrect. The ceiling’s light fixture is very noticeable. The camera moves to the opposite side of the light. Because the fixture is symmetrical, the same features are found on both sides and incorrectly matched by the matcher.
A smaller example can be seen on the black sliding doors.

Match Filtering

[Figure: Filtered Matches]

Special feature filtering techniques exist for equirectangular images which help to filter out false positives like those we found in the previous step. The result is far fewer matches, and a far higher percentage of valid matches. The density of matches on the main door and wall art largely remains, while the matches on the ceiling light and sliding door are almost entirely removed. Quality is preferred to quantity in this step, and will help us better align the output in a later stage.

Depthmap Estimation + Point Cloud Generation

Comparing Techniques

In deciding which tool to use for depthmap and point cloud generation I began by comparing two tools. Tool A was built to accept equirectangular images and output point clouds with reliable metric depth. Tool B was built to accept equirectangular images and output point clouds with reliable relative depth.

Testing Tool A

[Figures: Top View Scale Comparison, Side View Scale Comparison, Storage Closet Washroom Comparison, Main Room Comparison, Depthmap, Main Entrance Warping]

I ran a variety of equirectangular images through Tool A and laid out the output to visually assess the results. From the top and side view I noticed surprising consistency in scale across the images. Each room appears to be the same size compared to other projections from the same room. Take note of the two examples of the storage closet washroom, as we will compare these with the output from Tool B.

There was an example where the scale of the room appeared to be almost half that of other similar images. Take a look at the living room comparison. The larger point clouds were consistent with the scale estimates across all the other point clouds. The smaller ones appear as a noticeable outlier. These photos were taken on the stairs at a different elevation from all the other photos. My best guess is that this is the result of a bias introduced in the AI model.
Perhaps the AI model’s training data heavily favored images taken from the floor of a room compared to ones on a staircase. If this is true, the output may assume that the ceiling and floor are equally close and may pull them to the size of the average room.

Another oddity shows if we look at the point clouds for the main entrance. The walls bow in considerably. From close up, these warped angles appear across the entire output of point clouds.

Testing Tool B

[Figures: Side View Scale Comparison, Storage Closet Washroom Comparison, Depthmap]

Running a similar series of equirectangular images through Tool B, we received a series of results with qualities considerably different from Tool A’s. The scale across images at times appeared consistent; however, it often was not. Look at the storage closet washroom in this example. The size of one is almost half the size of the other! On the bright side, we can see across images that there is far less warping and walls appear straight.

Verdict

[Figures: Tool A Warping, Tool B Consistency]

The goal of the depth estimation step is to prepare an estimate of what the geometry around the camera should look like in 3D. The metric accuracy of Tool A would support a result where near-accurate measurements could be made against the 3D geometry. To get this benefit we would have to overcome two problems. We would need to resolve the warping AND ensure scale consistency even across outlier positions like the staircase. For Tool B we cannot rely on metric accuracy. We can rely on geometric consistency and would only need to solve the scale problem in order to use its output. I decided to proceed with Tool B as it is able to satisfy the requirements of the step with the fewest added steps required. Metric accuracy would be a great feature, and can be added in the future as Tool B’s output should give us a 1:x scale replica in most cases.

Convert Matches to 3D

[Figures: EXR Depthmap, Matching Points in 3D]

The matching points output by an earlier step represent where points match

Week 12 2026: Proof of Concept 360 -> 3D Read More »

© 2025 Justin Codair