Accomplishments
- Depth Aware Alignment
- Precise Yaw Calculation
- Pose Graph Optimization
Bonus
- Improved Opening Detection
- Pose Graph Accuracy Check
Continuing the effort to refine the camera’s determined positions, I considered what further improvements could be made to the alignment process. The current pose graph and topology continue to improve yet remain imperfect. I took stock of the data I have available and have confidence in: nearby depth data, highly accurate matching range, and accurate image segmentation. With this, I improved matching and topology by using depth to adjust match confidence. I also improved the yaw rotation calculation with an epipolar calculation. Lastly, I improved opening mask identification by combining offset images.
Depth Aware Alignment
At first glance the topology may look incredible; however, under closer inspection issues appear. I noticed that nodes like Main Entrance 3 and Main Room 7 were connected. These are great matches about 10 feet apart; however, an even better match for the main entrance would be Main Room 2, which is about half the distance.
With perfect metric depths at any distance this likely wouldn’t cause an issue. However, the current depth estimation tool worsens drastically as distance grows. This means that matching points on the far end of either node may be 15-20 feet apart, and the points between them are more likely to reach the edge of the estimation’s accuracy. The closer the matches, the more likely the depths are to be accurate.
I determined the best approach would be a two-step process. The first step is to run an additional match check on the ceiling and floor of each image. This is likely to further increase the number of nearby matches for each image and give us better grounding during alignment. The next step is to construct the topology based on the distance of each match. If one image has two pairs with high visual overlap, and one pair is much closer than the other, the closer match should prevail.
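The second step can be sketched as a distance-adjusted score. This is only an illustration of the idea, not the pipeline's actual scoring: the CandidateEdge type, the falloff_ft parameter, and the overlap values in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateEdge:
    node_a: str
    node_b: str
    overlap: float      # visual overlap / match confidence in [0, 1] (assumed scale)
    distance_ft: float  # estimated distance between the two camera positions


def adjusted_score(edge, falloff_ft=10.0):
    """Down-weight visual overlap as the estimated distance grows.

    falloff_ft is an illustrative tuning value, not taken from the pipeline.
    """
    return edge.overlap / (1.0 + edge.distance_ft / falloff_ft)


def best_neighbor(node, edges, falloff_ft=10.0):
    """Return the candidate edge touching `node` with the highest adjusted score."""
    candidates = [e for e in edges if node in (e.node_a, e.node_b)]
    return max(candidates, key=lambda e: adjusted_score(e, falloff_ft), default=None)
```

With made-up overlaps of 0.90 for Main Room 7 at 10 feet and 0.85 for Main Room 2 at 5 feet, best_neighbor("Main Entrance 3", edges) picks Main Room 2, since the closer pair keeps more of its score.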
With these additions, the topology improved, nearby matches became preferred, and alignment improved as well with the new floor and ceiling anchoring.
Precise Yaw Calculation
Up until now, alignment has been performed by a 9 degree of freedom (DOF) solver constrained to 4 degrees of freedom (rotation: yaw; translation: x, y, z; scale: none). I believe that the data collected by the 360 camera, along with the confident matches identified, could accurately handle yaw rotation separately from translation. This decouples the rotation from the noisy depth map data used in the previous 4-DOF alignment solving process.
To achieve this I implemented an epipolar calculation process. This process is similar to the doorway recovery process described in Week 15. It takes both equirectangular images and projects them onto a sphere. The matched points are used to cast rays, or lines, out from the center of the sphere. The spheres are then rotated to determine the best rotation where both sets of rays intersect most accurately. This gives a highly accurate rotation, which can be seen in the improvements to the pose graph.
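One way to sketch a depth-free yaw search is shown below. It is a simplified stand-in, not necessarily the implementation used here: it assumes a y-up, z-forward sphere convention, a yaw-only rotation, and at least three matches, and the function names are hypothetical. The idea is that for the correct yaw, every matched ray pair lies in a plane containing the line between the two cameras, so the plane normals all become perpendicular to one shared direction.

```python
import numpy as np


def pixels_to_bearings(uv, width, height):
    """Map equirectangular pixels (u, v) to unit rays (assumed y up, z forward)."""
    uv = np.asarray(uv, dtype=float)
    lon = (uv[:, 0] / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - uv[:, 1] / height) * np.pi
    return np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)], axis=1
    )


def yaw_matrix(yaw):
    """Rotation about the vertical (y) axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def estimate_yaw(b1, b2, steps=3600):
    """Grid-search the yaw that best satisfies the epipolar constraint.

    b1, b2: (N, 3) unit rays for matched points in image 1 and image 2.
    For the correct rotation R, each normal (R @ b1_i) x b2_i is perpendicular
    to the translation direction, so the stacked normals have a near-zero
    smallest singular value. No depth values are used.
    """
    best_yaw, best_cost = 0.0, np.inf
    for yaw in np.linspace(-np.pi, np.pi, steps, endpoint=False):
        normals = np.cross(b1 @ yaw_matrix(yaw).T, b2)
        cost = np.linalg.svd(normals, compute_uv=False)[-1]
        if cost < best_cost:
            best_yaw, best_cost = yaw, cost
    return best_yaw
```

A grid search keeps the sketch short and avoids local minima; a real pipeline would likely add outlier rejection and refine around the best grid cell.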
Pose Graph Optimization
While the yaw has grown incredibly accurate, there is still potential for slight rotation inaccuracies to compound and drift over time. To counter this, I have implemented a Pose Graph Optimization step. It takes the estimated rotations among the high-confidence connections in the topology and adjusts all rotations to minimize the overall drift.
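As a rough illustration of how rotation error can be spread across a graph, here is a minimal yaw-only least-squares sketch. It assumes a connected graph, equal weight on every edge, and node 0 pinned at yaw 0; optimize_yaws and its edge format are hypothetical, not the pipeline's interface.

```python
import numpy as np


def wrap(angle):
    """Wrap an angle in radians into [-pi, pi)."""
    return (angle + np.pi) % (2.0 * np.pi) - np.pi


def optimize_yaws(num_nodes, edges, iterations=5):
    """edges: list of (i, j, measured) where yaw_j is roughly yaw_i + measured."""
    # Initialise by walking outward from node 0 along the edges.
    yaws = np.zeros(num_nodes)
    seen, frontier = {0}, [0]
    while frontier:
        node = frontier.pop()
        for i, j, meas in edges:
            if i == node and j not in seen:
                yaws[j] = yaws[i] + meas
                seen.add(j)
                frontier.append(j)
            elif j == node and i not in seen:
                yaws[i] = yaws[j] - meas
                seen.add(i)
                frontier.append(i)

    # One row per edge, plus a final row that pins node 0.
    A = np.zeros((len(edges) + 1, num_nodes))
    for row, (i, j, _) in enumerate(edges):
        A[row, i], A[row, j] = -1.0, 1.0
    A[-1, 0] = 1.0

    # Repeatedly solve for the update that minimises the squared wrapped residuals.
    for _ in range(iterations):
        residuals = [wrap(yaws[j] - yaws[i] - meas) for i, j, meas in edges]
        b = -np.array(residuals + [yaws[0]])
        delta, *_ = np.linalg.lstsq(A, b, rcond=None)
        yaws = yaws + delta
    return wrap(yaws)
```

For a four-node loop where every edge measures 92 degrees, the 8 degrees of loop error are shared evenly and the nodes settle at 0, 90, 180, and 270 degrees.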
Bonus: Improved Opening Detection
I had noticed there were instances where images on either side of a doorway were not being connected in the minimum spanning tree (MST). Instead, similar rooms were beating the doorways in the confidence rankings. This could be two bedrooms with the same sliding screen doors and tile floors, or it could be two balconies that see the same building in the distance. I began reviewing the matches and identified a variety of causes that could lead to this behavior.
The first and most obvious cause was improper opening masking. For one equirect image the doorframe was split, half on the left edge and half on the right. The image recognition tool I use is not 360 aware and does not understand that what ends on the edge begins on the other side. Because of this it could not accurately identify that a doorway existed on both sides of the image. I was able to solve this by performing an additional mask step. I would offset the image, rotating it 180 degrees so that the door is in the center, and run the mask again. Then I would combine the masks to have the full picture.
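The offset-and-combine step can be sketched with a horizontal roll of half the image width, which corresponds to 180 degrees in an equirectangular image. The segment argument is a hypothetical stand-in for the image recognition tool; it is assumed to return a boolean mask of the same height and width.

```python
import numpy as np


def seam_aware_mask(image, segment):
    """Combine a mask of the original image with one from a half-turn offset copy.

    segment: hypothetical callable, image (H, W, ...) -> boolean mask (H, W).
    """
    shift = image.shape[1] // 2                       # half the width = 180 degrees
    direct = segment(image)                           # mask as the tool sees the image
    rolled = segment(np.roll(image, shift, axis=1))   # seam content moves to the center
    restored = np.roll(rolled, -shift, axis=1)        # back to the original columns
    return direct | restored
```

Taking the union means a door found in either pass is kept, at the cost of also keeping any false positive from either pass.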
Then I noticed a second masking issue. Some openings were being confused for glass! This was probably due to the glossy, well-kept tile floor and the cleanliness of the space. I tested a variety of solutions for this with varying complexity. I learned that the image recognition model outputs individual object masks and confidence values. That means that each blob of glass gets its own level of confidence. I considered comparing confidences until I noticed that the doorways received far less confidence than the glass in this instance. Perhaps there is a more complex and interesting solution to be found with more effort. At this step, though, I pivoted to the simpler solution of excluding glass from openings. Previously, glass masks took precedence over opening masks, removing features. Now, matches are preserved if they exist in an opening, and are still compared in the later processes.
Some adjustments were made to the distance calculations as well. In Week 15 we implemented the first attempt at depth aware matching and saw improvements for the interior of a space. This week we improved upon this implementation by drawing a hard cutoff for matches after a certain distance. This will help keep balconies from matching with each other through distant architecture instead of with their doorways.
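The change in precedence can be expressed as a single boolean rule. This is a minimal sketch assuming the glass and opening masks are boolean arrays of the image's size and that keypoints are integer (x, y) pixel coordinates; the function names are hypothetical.

```python
import numpy as np


def keep_mask(glass, opening):
    """Pixels where features survive: anything not glass, plus anything inside an opening."""
    return ~glass | opening


def filter_keypoints(keypoints, glass, opening):
    """Drop keypoints that fall on glass unless they also fall inside an opening.

    keypoints: (N, 2) integer array of (x, y) pixel coordinates.
    """
    keypoints = np.asarray(keypoints)
    keep = keep_mask(glass, opening)
    return keypoints[keep[keypoints[:, 1], keypoints[:, 0]]]
```

Under the previous precedence the rule would have been simply ~glass, which discards every feature in a doorway mislabeled as glass.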
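A hard cutoff of this kind could look like the sketch below. The threshold value, the units, and the match layout are assumptions for illustration only; the actual cutoff distance is not stated here.

```python
import numpy as np


def drop_distant_matches(matches, depth_a, depth_b, max_depth_ft=20.0):
    """Discard matches whose estimated depth exceeds a cutoff in either image.

    matches: (N, 4) integer array of (xa, ya, xb, yb) pixel coordinates.
    depth_a, depth_b: (H, W) depth maps, assumed to be in feet.
    max_depth_ft: illustrative threshold, not the pipeline's value.
    """
    matches = np.asarray(matches)
    da = depth_a[matches[:, 1], matches[:, 0]]  # depth at each point in image A
    db = depth_b[matches[:, 3], matches[:, 2]]  # depth at each point in image B
    return matches[(da <= max_depth_ft) & (db <= max_depth_ft)]
```

Requiring both sides to be within range is the stricter choice; a match on a distant building is rejected even if one image's depth estimate happens to be short.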
Bonus: Pose Graph Accuracy Check
Throughout this week I reviewed the current state of the pipeline’s accuracy against 4 datasets available to me. I am proud to say the current pipeline provides a fully accurate pose graph for 2 of the 4 datasets, with the final 2 very near complete.
I ran this sequence against the photos taken during my Pattaya and Chiang Mai stays. Both recreated perfectly, except for the few photos taken at alternative heights (bed and bathtub views). The first floor of Supalai Place is incredibly close. And the second floor was mostly resolved after the improved opening detection updates.
Summary
This week we continued to improve the pose graph by using the meaningful and accurate data available to us. Nearby depths are now used to better compare matches. For rotation, depth is ignored in favor of angles, which measure rotation more accurately. Each rotation is then used to better distribute error across the graph. Improvements were also made to make openings more assertive.
Seeing the progress so far, and the ability to construct accurate pose graphs across a few distinct datasets, fills me with determination and courage to see how far we can go!


