Justin

Week 14 2026: Match Filtering

Accomplishments: Improved Matching Efficiency, Improved Matching Quality, Minimum Spanning Tree with Maximum Confidence, Feature Masking

Manually curating a few known-good images to be matched and processed is quick and relatively immune to false positives. However, applying that same manual process to 50+ images grinds workflows to a halt and quickly becomes strewn with incorrect matches. Without GPS or sequential metadata, chaotic image datasets suffer from the quadratic performance scaling of any-to-any (n-choose-2) comparisons. This is especially painful for CPU-bound processes. Furthermore, an expanded dataset compounds the likelihood of introducing false positives that can slip through current feature-matching algorithms. To work effectively with large datasets, we need to generate a minimum spanning tree where all images are connected, with high confidence in what each image matches to and exactly where those matches are located. Achieving this requires a multi-step approach: reducing CPU operations, iteratively filtering matches, and masking out corrupting environmental features like clouds and glass.

Improved Matching Efficiency

[Figures: AnyLoc reduced time; LightGlue time; AnyLoc islands; n-choose-2 graph]

The fundamental purpose of the matching stage is to compare every photo against every other photo to find the best connections. Historically, our pipeline handled this in three steps: ALIKED (extracting specific pixels as features), LightGlue (comparing extracted pixels across every image in an any-to-any configuration), and Spherical RANSAC (converting images to spheres to validate matches without distortion). The problem? Matching any one photo to every other photo grows quadratically as the dataset scales. A small batch of just 52 photos requires over 1,300 unique comparisons. Our primary bottleneck was Spherical RANSAC, a heavily CPU-bound operation.
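The pair count described above is easy to verify; a quick sketch of the n-choose-2 growth:

```python
from math import comb

# Any-to-any matching compares every unique pair of images:
# n choose 2 = n * (n - 1) / 2 comparisons.
for n in (10, 52, 200):
    print(f"{n} images -> {comb(n, 2)} comparisons")
```

For 52 images this gives 1,326 comparisons, the "over 1,300" figure above; at 200 images it is already nearly 20,000.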
By switching from a standard Python library to a just-in-time (JIT) compiled version, we were able to execute the math directly on the hardware rather than pushing it through a Python abstraction layer. This single optimization cut the operation time by over 90%, bringing a 20+ minute process for 52 images down to just 2 or 3 minutes.

We also theorized using AnyLoc as a pre-check. AnyLoc is a “global descriptor”; instead of extracting individual features, it generalizes the image (e.g., turning a photo into a mathematical representation of “inside, big bedroom, beige walls”). Because it runs entirely on the GPU, it is blazing fast, tearing through the 52 images in just 35 seconds. However, AnyLoc tends to group high-confidence matches by type rather than topology. It creates disconnected “islands” of similar rooms (like matching two separate bedrooms together) rather than finding the connective path (like a hallway leading to a bedroom). While AnyLoc is incredibly fast and worth considering for future iterations, its tendency to falsely combine distinct but similar rooms means it requires careful handling before full inclusion in the pipeline.

Ultimately, by removing the abstraction bottleneck of our CPU process, we reduced our overall matching time by 95%.

Improved Matching Quality

[Figures: LightGlue false positive; RANSAC false positive; RoMa2 doorway]

Speed is irrelevant if the feature matching yields incorrect results. The matches generated by LightGlue and Spherical RANSAC often include false positives. Two identical doors, two sides of a symmetrical feature, or even similar moldings and furniture in completely separate rooms can trick standard matchers into drawing a connection. To solve this, we looked to RoMa V2 (Robust Matching), a dense feature matcher. What sets RoMa apart is its ability to rank every pixel in the image by confidence and to match pixels at extreme angles.
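The actual Spherical RANSAC internals aren't shown here, but the kind of change involved can be sketched. This is a minimal, hypothetical example using Numba's @njit (with a plain-Python fallback if Numba is absent): a hot numeric loop of the sort RANSAC runs millions of times, compiled to machine code instead of interpreted:

```python
import numpy as np

try:
    from numba import njit          # JIT-compile the function to machine code
except ImportError:                 # fall back to plain Python so the sketch still runs
    def njit(func):
        return func

@njit
def count_inliers(errors, threshold):
    # Hypothetical RANSAC-style inner loop: count how many residuals
    # fall under a threshold. Loops like this are where a JIT pays off,
    # since each iteration no longer goes through the interpreter.
    count = 0
    for i in range(errors.shape[0]):
        if errors[i] < threshold:
            count += 1
    return count

errors = np.array([0.01, 0.5, 0.02, 0.9])
print(count_inliers(errors, 0.1))  # 2 residuals under the 0.1 threshold
```

The function names and thresholds are illustrative only; the point is that the decorated loop compiles once and then runs at native speed on subsequent calls.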
When we pass our challenging image pairs through RoMa V2, the false positive matches still appear, but they are flagged with remarkably low confidence scores and lack shared pixel density. This allows us to aggressively filter out the low-confidence noise while retaining dense, high-confidence pixel groups—even when looking through narrow doorways.

The trade-off for this accuracy is compute time. Dense matching requires dense math. RoMa V2 takes significantly longer than LightGlue; run any-to-any, its cost grows quadratically with the pair count, taking over 5 minutes for a 52-image dataset. Because of this, RoMa V2 is not a standalone silver bullet. It cannot process the entire dataset in a reasonable timeframe on its own, meaning we still rely on LightGlue as an early-stage filter to narrow down the workload before RoMa V2 confirms the final matches.

Minimum Spanning Tree & Maximum Confidence Path

[Figures: AnyLoc topology; LightGlue topology; LightGlue + RoMa2]

If we don’t confirm that every point cloud has valid matches and belongs to a unified tree of connections, we end up with incorrect geometry and floating, disconnected islands. LightGlue is reasonably fast, but its confidence score for a correct match through a doorway might be identical to the score of a false positive between two disconnected rooms with similar moldings. RoMa V2 is the slowest, but it is the most accurate and provides the highest confidence for tricky bridges like doorways. To get the minimum spanning tree with the highest-confidence paths as quickly as possible, we re-architected the workflow to play to the strengths of both tools. First, we use LightGlue to perform the any-to-any matching and rank the pairs by confidence, narrowing down the field of likely candidates. We then intentionally skip the long Spherical RANSAC step, as RoMa V2 will handle match quality confirmation. Next, we determine the top-ranked confidence values and begin building our topological tree.
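The tree-building step can be sketched as a Kruskal-style pass over confidence-ranked pairs: take the highest-confidence pairs first, and only admit a lower-confidence pair when it connects images that are not yet in the same component. This is a minimal sketch with hypothetical scores, not the pipeline's actual code:

```python
def max_confidence_tree(n_images, scored_pairs):
    """Greedily keep the highest-confidence pairs that connect new images
    (Kruskal's algorithm on confidence = a maximum spanning tree)."""
    parent = list(range(n_images))  # union-find over image indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    # Highest confidence first; lower-confidence pairs are only admitted
    # when they bridge otherwise-disconnected islands.
    for conf, a, b in sorted(scored_pairs, reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b, conf))
    return tree  # n_images - 1 edges when the graph is connected

# Hypothetical LightGlue-style scores for 4 images:
pairs = [(0.9, 0, 1), (0.8, 1, 2), (0.85, 0, 2), (0.4, 2, 3)]
print(max_confidence_tree(4, pairs))
```

Note how the weak 0.4 pair still makes it into the tree: it is the only bridge to image 3, which mirrors the "expand to slightly lower confidence values until everything connects" behavior described below.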
If any disconnected islands or orphaned nodes exist, we expand the tree to include slightly lower confidence values until a continuous path connects everything. Finally, RoMa V2 processes this curated list of rankings. It confirms the confidence values, outputs high-confidence dense matching points for each pair, and validates the bridges through doorways. This dual-model approach yields a unified minimum spanning tree connected entirely by maximum-confidence paths.

Feature Masking

[Figures: bathroom masked matches; balcony masked matches; main room masked matches; bathroom mask; balcony mask; main room mask]

Glass, mirrors, and the sky are notorious problem-causers in computer vision. To an algorithm, a mirror on a wall often looks like a doorway into a completely identical room. A glass window might not register as a solid surface, but rather a hole leading to


Week 13 2026: Modular and Repeatable

Accomplishments: Built Docker Images, Refactored Proof of Concept, Bonus Depth Model Comparison, Many Image Point Clouds

This week’s goal was to take a rapidly built prototype and convert it into a foundation that’s simpler to deploy and adjust. I set up the files to work with a tool that packages them so they can easily be run on any device. I also rewrote much of the code so that steps can happen independently of each other. As a bonus, I was able to begin modifying the components to accept more than two images for 3D reconstruction and generate a combined point cloud.

Built Docker Images

[Figure: Docker images]

The goal of Docker is compatibility. It packages a program along with all the tools it needs to run on any system. It’s kind of like having a universal adapter: no matter where you go, you can plug into the wall and charge. A Docker image is the output. This includes the program and all the tools needed to run it anywhere.

Creating the Docker image comes with some steps and considerations, including writing out which tools the program will need to run. Since our 360-to-3D pipeline includes multiple smaller programs, it is common for their needs to overlap. To satisfy this we use something called a “base image”, which includes all of the shared tools that may be required. Setting up the base image can be tedious, as it takes time to determine the version numbers of each tool that are compatible with all the rest. For small tools measured in kilobytes or megabytes, this may not be concerning. However, multiple programs in this pipeline rely on tools that are gigabytes in size! Reusing these is very important to preserve efficiency and save on system storage.

Another consideration is shared storage. By default, Docker containers write within their own spaces and do not have access to each other’s. This led me to create a “shared volume”. With a shared volume, each container can contribute, read, and write its own data to a shared space.
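The base-image and shared-volume ideas can be sketched in a short compose file. The service, image, and volume names here are hypothetical, not the project's actual configuration:

```yaml
# Two pipeline stages built from one shared base image,
# exchanging data through a single named volume.
services:
  feature-extraction:
    image: pipeline-base:latest     # hypothetical base image with the shared, multi-GB tools
    volumes:
      - pipeline-data:/data         # each stage reads and writes the shared dataset here
  feature-matching:
    image: pipeline-base:latest
    volumes:
      - pipeline-data:/data

volumes:
  pipeline-data:                    # the shared volume both containers mount
```

Because both services reference the same base image, the large shared layers are stored once; the named volume gives each container a common /data path for the full 360 dataset.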
This is helpful in large-data scenarios (like 360 photography) where each program may require access to the full dataset. Preparing the Docker images helps transition a rapidly produced proof of concept into something less brittle that can be preserved and recreated on any system running Docker.

Refactored Proof of Concept

A rapidly made proof of concept (POC) often includes shortcuts taken to more efficiently determine the effectiveness of a process. Upon approval, a new design should be constructed that includes the slower-developed supportive steps, improving the ability to efficiently make changes and preserve functionality going forward. I converted my POC from sequential “get the job done” steps into a more modular “accomplish this step” approach.

I added application programming interface (API) endpoints to the program. Instead of manually updating a file and running the program once, I can now call the program at any time and submit different information with each call. This gives swift flexibility to where, when, and how I interact with it. This also gives me a better opportunity to multi-task. I set up each endpoint to begin a job. Instead of waiting for the sequential job to be completed, I can now periodically request a status update from a different API endpoint. The benefits of this will be much appreciated as processing time increases in line with the size of the dataset being uploaded.

Breaking down sequential processes into atomic steps decouples logic and improves cohesion. This ensures that each atomic step relies only on what it absolutely needs and affects only what’s crucial. Setting a foundation in this manner permits future steps to be added with narrow impact on the other programs.

Bonus: Depth Model Comparison

[Figures: model comparison; combined mesh comparison; strange projection]

This week I experimented with the different machine learning models in the tool that calculates the depth for each photo.
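The start-a-job / poll-for-status pattern described above is framework-agnostic; stripped of any particular web framework, it might look like this (function and status names are hypothetical, and a real endpoint would return immediately instead of joining the worker thread):

```python
import threading
import uuid

jobs = {}  # job_id -> status, shared by the "start" and "status" endpoints


def start_job(payload):
    """What a POST /jobs endpoint might do: register the job,
    kick off the work in the background, and hand back an id."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = "running"

    def work():
        # ... a long-running pipeline step would go here ...
        jobs[job_id] = "done"

    worker = threading.Thread(target=work)
    worker.start()
    worker.join()  # only so this demo is deterministic; a real endpoint returns at once
    return job_id


def job_status(job_id):
    """What a GET /jobs/<id> endpoint might return when polled."""
    return jobs.get(job_id, "unknown")


job = start_job({"images": 52})
print(job_status(job))
```

The caller keeps only the returned id and polls the status endpoint periodically, which is what makes long pipeline runs non-blocking for the client.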
One model seemed to offer improved depth detection through doorways. This will be helpful when feature matching must take place beyond a shared doorframe. Overall the newly tested model appears to offer similar results. A few small oddities have been identified, like a strange bubble appearing around the entryway table, and a broad stretching of a small plastic standee placed on a shelf. I believe a step to clean up, reduce, and combine point clouds will be needed. While these appear to be new issues, I believe they may be solved by a clean-up phase, and the benefit of expanded doorway depth may be retained.

Bonus: Many Image Point Clouds

[Figures: 3-camera pose; 3-image point cloud; n choose 2]

With the improved foundation afforded by the containerization (Docker) and refactoring, I was able to more swiftly modify the process to remove the previous limit of 2 images for pairing. A myriad of minor tweaks and process changes were necessary for this step. The new layout made it quick to identify and apply the required changes. The point cloud result excites me. This visualization represents the process’s ability to not only combine two 3D projections, but also refine and expand that combination as more are added.

Making these changes also called to mind the many changes that are necessary to perform this process efficiently as the number of photos scales. Steps like downsampling and generating depth maps scale 1:1 with the number of photos uploaded. If it takes 0.1 seconds (s) to generate a depthmap for one photo, each extra photo will add an extra 0.1 s. This scaling does not hold for our feature matching step. Combining two photos, A and B, takes 1 step: matching A-to-B. Combining three photos takes 3 steps: matching A-to-B, A-to-C, and B-to-C. This can be written as a formula and simplified to N * (N – 1) / 2, the total number of matches for N items. This grows incredibly fast. While 4 items only need 3 more matching steps
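The contrast between the two growth rates can be checked directly (the 0.1 s per depthmap is the illustrative figure from above, not a measurement):

```python
def depthmap_time(n, per_photo=0.1):
    # Linear: one depthmap per photo, so cost scales 1:1 with photo count.
    return n * per_photo

def match_count(n):
    # Quadratic: one comparison per unique pair, N * (N - 1) / 2.
    return n * (n - 1) // 2

for n in (2, 3, 4, 10, 52):
    print(f"{n} photos: {depthmap_time(n):.1f}s of depthmaps, {match_count(n)} match steps")
```

At 52 photos the linear step is still only about five seconds of depthmaps, while the pairwise matching has ballooned to 1,326 comparisons, which is why the matching stage dominates as datasets grow.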


Week 12 2026: Proof of Concept 360 -> 3D

Accomplishments: 360 -> 3D End-to-End Proof of Concept

Recent weeks of research have outlined the problems that must be solved to deliver a pipeline where sparse 360 imagery can be accurately reconstructed into a 3D mesh. A variety of tools and algorithms were compared in order to support this process. Ultimately, an initial pipeline was developed where, with just two sparse 360 images, a merged 3D mesh could be completed.

360 -> 3D End-to-End Proof of Concept

[Figure: workflow]

The whole process of converting 360 images into a 3D model requires a variety of smaller problems to be solved. The workflow intends to solve the following problems:

- Image downsampling
- Feature extraction
- Feature matching
- Match filtering
- Depthmap estimation + point cloud generation
- Convert matches to 3D
- Align point clouds
- Convert point cloud to mesh

Image Downsampling

Image downsampling means shrinking an image’s size. This step is required to conform to a downstream tool’s image size requirement. Downsampling is a double-edged sword: in this case, it greatly reduces processing time; however, it also reduces the detail used in identifying features and calculating depth. A future iteration may improve on this by downsampling later in the process.

Feature Extraction

[Figures: feature extraction examples]

The feature extraction step looks for unique areas of pixels across an image. These coordinates are later used to compare two images and see what features may overlap.

Feature Matching

[Figure: matched features]

The features calculated in the previous step are compared across two images. The algorithm does its best to find groups of pixels that appear similar across the two images. In the example you might notice that some of the matches appear incorrect. The ceiling’s light fixture is very noticeable. The camera moves to the opposite side of the light, and because the fixture is symmetrical, the same features are found on both sides and incorrectly matched by the matcher.
A smaller example can be seen on the black sliding doors.

Match Filtering

[Figure: filtered matches]

Special feature-filtering techniques exist for equirectangular images which help to filter out false positives like those we found in the previous step. The result is far fewer matches, and a far higher percentage of valid matches. The density of matches on the main door and wall art largely remains, while the ceiling light and sliding door are almost entirely removed. Quality is preferred to quantity in this step, and will help us better align the output in a later stage.

Depthmap Estimation + Point Cloud Generation

[Figure: comparing techniques]

In deciding which tool to use for depthmap and point cloud generation, I began by comparing two tools. Tool A was built to accept equirectangular images and output point clouds with reliable metric depth. Tool B was built to accept equirectangular images and output point clouds with reliable relative depth.

Testing Tool A

[Figures: top view scale comparison; side view scale comparison; storage closet washroom comparison; main room comparison; depthmap; main entrance warping]

I ran a variety of equirectangular images through Tool A and laid out the output to visually assess the results. From the top and side views I noticed surprising consistency in scale across the images. Each room appears to be the same size compared to other projections from the same room. Take note of the two examples of the storage closet washroom, as we will compare these with the output from Tool B. There was one example where the scale of the room appeared to be almost half that of other similar images. Take a look at the living room comparison: the larger point clouds were consistent with the scale estimates across all the other point clouds, while the smaller ones appear as a noticeable outlier. These photos were taken on the stairs, at a different elevation from all the other photos. My best guess is that this is the result of a bias introduced in the AI model.
Perhaps the AI model’s training data heavily favored images taken from the floor of a room compared to ones on a staircase. If this is true, the output may assume that the ceiling and floor are equally close and may pull them to the size of the average room. Another oddity shows if we look at the point clouds for the main entrance: the walls bow in considerably. From close up, these warped angles appear across the entire output of point clouds.

Testing Tool B

[Figures: side view scale comparison; storage closet washroom comparison; depthmap]

Running a similar series of equirectangular images through Tool B, we received results with qualities considerably different from Tool A’s. The scale across images at times appeared consistent; however, it often was not. Look at the storage closet washroom in this example: the size of one is almost half the size of the other! On the bright side, we can see across images that there is far less warping and walls appear straight.

Verdict

[Figures: Tool A warping; Tool B consistency]

The goal of the depth estimation step is to prepare an estimate of what the geometry around the camera should look like in 3D. The metric accuracy of Tool A would support a result where near-accurate measurements could be made against the 3D geometry. To get this benefit we would have to overcome two problems: we would need to resolve the warping AND ensure scale consistency even across outlier positions like the staircase. With Tool B we cannot rely on metric accuracy, but we can rely on geometric consistency, and we would only need to solve the scale problem in order to use its output. I decided to proceed with Tool B, as it satisfies the requirements of the step with the fewest added steps. Metric accuracy would be a great feature and can be added in the future, as Tool B’s output should give us a 1:x scale replica in most cases.

Convert Matches to 3D

[Figures: EXR depthmap; matching points in 3D]

The matching points output by an earlier step represent where points match


Week 11 2026: Architecting 360 -> 3D Reconstruction

Accomplishments: Pose Estimation Testing, 360 -> 3D Tool Matrix, 360-to-3D Reconstruction Architecture, GMAT Study Guide

Architecture: “The art and technique of designing and building” (Link)

Constructing something big often begins with outlining and constructing small atomic steps. To start baking a cake, first you crack an egg. I began testing options to estimate camera positions from 360 photos and have since refined the steps related to their 3D reconstruction. I studied and reviewed dozens of tools and libraries related to this process and compiled a matrix of related details. I also outlined my goals, and the steps toward them, for taking the GMAT assessment tests.

Pose Estimation Testing

Determining where each photo was taken is a foundational step for beginning 3D reconstruction. While this is trivial in many cases, efficient, affordable, and high-quality 3D environment reconstruction comes with constraints that make it much more challenging. I tested a variety of commercial solutions and available libraries to better understand the process and how my requirements affect it.

[Figures: Matterport test; RealSee AI test]

Commercial tools exist that perform the structure-from-motion (SfM) step to determine the cameras’ positions as part of a greater process. Similar datasets were provided to Matterport and RealSee AI. Matterport was able to take the sparse camera data and combine it, while RealSee AI seemed to fail at aligning most of the cameras. Matterport bundles the camera positions as an export with the completed mesh, at a cost. This cost may be fair and affordable for many projects; however, it raises the barrier to entry and puts environmental preservation and immersive experiences out of reach for many places.

[Figures: Metashape camera poses; Metashape mesh; DAP point cloud; 1 camera]

RealityScan and Metashape are commercial tools with much different associated costs. In 1 year, RealityScan will cost at most what it costs to deliver 21 scans with Matterport.
Metashape is a one-time purchase and is more affordable than Matterport after 59 reconstructions. RealityScan does not yet natively support 360 photos and failed to correctly position most images. Metashape did a much better job with some test data: every camera was correctly aligned. I tried to see if this would also lead to a usable mesh; however, the result was left with many holes. Further, with the sparse test data of Supalai Place, it failed to position more than 50% of the cameras.

[Figures: sphere with tracking points; 15/37 cameras; accurate result]

I moved on from commercial solutions and began looking at libraries. One workflow involved SuperPoint, SphereGlue, and OpenMVG. It first took each 360 photo and identified the important areas on its surface. It then sought to match those areas across the other 360 photos. This worked surprisingly well for a first attempt, getting 15/37 photos correctly oriented. Getting the remaining 22 photo positions was much harder. The original output used a process that attempts to triangulate positions based on multiple photos. While the 22 photos may share much overlap, the points reviewed were not sufficient for the algorithm to pair them with the other 15 photos, and they were dropped.

[Figures: tracking points in chain; large-scale stretching; overlap with tangent]

This will likely be the case for sparse photo sets indoors, where each image is only guaranteed one line of sight to another image. To overcome this I experimented with a chain where images were oriented sequentially. This ensured all images would be positioned, but it came at the cost of quality. Since the positions aren’t all compared to each other, the distance between images is harder to calculate consistently. This appears to lead to overlapping images and tangents at the start or end of the chain. Rather than a proper series of two circles, the output was consistently one circle with another either stretched, overlaid, or including a straight tangent line.
The quality output received from Matterport’s system indicates that a meaningful reconstruction can be performed on data as sparse as mine. The varying results delivered by the other tools confirm that sparse data comes with unique qualities that, if not addressed, lead to missing or misplaced cameras. I will continue to identify these qualities and review other libraries toward a solution.

360 to 3D Tool Matrix

[Figure: 3D tool matrix]

Inputting 360 photos and outputting a 3D model can have vastly different outcomes depending on the steps taken in between. The industry is wide and new, which leads to many different takes on how to perform the process. Through continued exposure to this environment I have begun to identify a variety of important qualities for assessing the tools, so I decided to build a matrix. This matrix includes information like license, commercial availability, purpose, and whether a tool is outdated or replaced by another. This is especially important in assessing the overall viability of a workflow and keeping track of the tools that have been tested. With over 90 tools there is a lot to keep track of!

360-to-3D Reconstruction Architecture

[Figure: AI architecture diagram]

Understanding the goal, our requirements, and the limitations of the available tools is key to laying out a successful roadmap.

Goal: Input 360 equirectangular images from a sparse dataset and generate a navigable digital twin.

Requirements:

- Indoor and outdoor images accepted.
- Must work even with only a single line of sight to another camera, even through doorways. (Ex: one image in the hallway, one in the bathroom.)
- Must work even when rooms are repeated. (Ex: a hotel or dorm may have two identical rooms.)
- Mirrors and reflections should appear flat and not hallucinate rooms beyond the plane.
- Should work with non-standard layouts like curved glass walls, large rooms, and open spaces.
Tools:

- Segment Anything Model (SAM) 3: identify segments based on context like “mirror, window, sky”.
- LightGlue: identify features and match them across images.
- SphereSFM (COLMAP): use matched features to perform structure from motion and deliver camera pose estimation.
- Depth Anything Panorama (DAP): generate depthmaps from equirectangular photos.
- LGET-NET: room layout estimation from 360 equirectangular images.
- Open3D: 3D point cloud reconstruction via truncated signed distance function (TSDF).


Week 10 2026: Research – Hosting & Reconstruction

Accomplishments: Hosting Architecture and Pricing Research, Multi-floor Diorama Workflow Research, Bonus 360 Reconstruction Research

“Plans are nothing; planning is everything.” – Dwight D. Eisenhower

Virtual tours must be delivered efficiently and reliably. These two requirements drove much of my research this week. While I didn’t make any direct updates to the software, the development path has become much clearer. I spent time refining the hosting and service requirements for delivering 360 tour files over the web, and outlined workflows for handling multi-floor and large-space 3D reconstructions. After refining the initial workflow and identifying its bottlenecks, I began exploring and evaluating alternative solutions.

Multi-floor Diorama Workflow Research

[Figures: improper occlusion; manual reconstruction texturing; photogrammetry texturing]

Dioramas provide an eye-catching overview of a 3D scene and offer an intuitive way to navigate a tour. In a previous post, I made my first attempt at creating a usable tour for a dense, three-story property. However, the workflow I used was time-consuming, prone to system crashes, and ultimately unscalable. This week, I challenged myself to test alternative workflows to deliver this service more effectively.

As a quick refresher, dioramas are 3D models that represent a physical space. These models need to reflect the environment with enough visual accuracy that viewers can easily figure out where they are and where they want to go. Crucially, this navigation must happen without lag. Because 3D models and their associated images can have large file sizes that take time to load over the internet, we must carefully balance visual quality with performance.

In my previous workflow, I started with a flat “plane”, essentially a digital piece of paper. I would cut and stretch the edges until it matched the floor plan, then stretch the sides up to build the walls.
After that, I added the ceiling and interior details like countertops, appliances, and inner walls. The result was an incredibly low-poly object with a small file size, but the manual labor required to build it was unreasonable. Furthermore, the process put incredible stress on the 3D modeling program, which struggled to load over 100 uncompressed 11K .jpg files just to calculate the model’s textures. It was a great learning exercise, but it highlighted the severe challenges of recreating spaces manually.

For my second iteration, I tried to speed up the manual recreation by starting with a cube instead of a flat plane. Stretching and cutting a cube handled the floor, walls, and ceiling simultaneously, drastically increasing the speed of the build. However, this process still took considerable time and completely missed smaller, non-cubic details like couches or shelves. Without these details, the visual “hotspots” weren’t properly occluded from one another, causing their circular icons to appear cut or incomplete. Adding those missing details manually would simply add that saved time right back into the process.

This second workflow also involved texturing the 3D model by projecting portions of the 360-degree photos onto the digital walls and floors. When done perfectly, the high resolution of the 360 images creates a fantastic display. In practice, however, it requires massive manual effort. The 3D surfaces are divided into sections, and the software decides which parts of the image to project based on distance and angle—not based on what looks best. This forces you to go back through the scene and manually adjust the surfaces to use the correct image projections. The smallest model I created for the three-story home had over 5,000 triangle surfaces requiring manual review. That is simply not acceptable.

In my third attempt, I decided to sacrifice file size in favor of speed. I used the raw photogrammetry scan taken on location as the 3D model itself.
Unfortunately, phone-generated photogrammetry scans still require significant manual intervention. The datasets needed to generate full-house scans are often too large for a phone’s memory, meaning you have to take multiple smaller scans and manually align them. The edges of these stitched pieces often feature incomplete or warped geometry—a wall might flare out unnaturally, or a room at the edge of the scan might devolve into a discolored blob. All of this requires manual cleanup.

Photogrammetry also struggles heavily with reflections. This can be as obvious as a mirror or as subtle as the glossy sheen on a wooden dresser. Reflective surfaces create strange, warped artifacts in the scene that look incredibly uncanny to a viewer trying to admire a space. Furthermore, if certain angles aren’t captured perfectly, they render as large, undefined blobs. All these anomalies degrade the final output and require time-consuming manual fixes. The result is far from perfect, heavily bloated in file size, and costly to produce in terms of time.

So, how can we improve? There are a few remaining options that might reduce the manual labor required for these low-poly reconstructions:

- The tool used for texturing the walls has a feature to texture based on vertex groups. This has the potential to quickly map individual cameras to vast areas of triangles, greatly reducing texturing time.
- Camera positions could be calculated using structure-from-motion (SfM) algorithms. This would eliminate the need to manually place and align each camera in the digital space.
- We could abandon the diorama entirely for larger scenes, using a basic, untextured mesh solely for occlusion and hotspot placement.

While each of these options improves the process slightly, they still fail to overcome the biggest hurdle: the manual labor required to piece everything together. To offer 3D reconstructions at larger scales, the process must be heavily automated.
It needs to produce a consistent, high-quality output that visually supports seamless navigation, and it must compress down enough to download quickly and run smoothly within the memory constraints of mobile devices.

Hosting Architecture and Pricing Research

[Figure: AI architecture diagram]

Everything you see on the internet lives on a physical computer somewhere and travels over a network to reach you. Companies offer this service at a cost, differentiating themselves based on how they deliver that content. The virtual tours I create, along


Week 9 2026: VR Image Gallery

Accomplishments: Minor UX Improvements and Bugfixes, Research Video Playback Challenges, Paginated VR Grid Menu, Bonus Thumbnail Update Tool

To continue improving the user experience (UX) for viewers in virtual reality (VR), I decided to create a 3D version of the image gallery menu designed in Week 2. A gallery menu option allows viewers to select the exact scene they want to view in the fewest clicks. This greatly reduces the friction when navigating dense tours or sporadic image galleries. Through this implementation, I identified edge cases affecting the existing VR menus, including an issue tied to the video playback process that requires more investigation. As a bonus, I was able to produce a tool for editing the panorama thumbnails that appear in the custom grid view, to allow for optional adjustments.

Minor UX Improvements and Bugfixes

[Figures: info spot with HTML popup; 3D cursor in VR]

The increased focus on VR functionality has led to more user journeys being tested and more improvements being identified. Hotspots, the images we use in 3D, are used for a variety of cases and should appear or disappear based on specific criteria. They also animate. When using menus like the media controls menu, a viewer will be surprised if these animations are delayed or are not synchronized between the immersive view and the 2D view. The 360 images transition too, and it’s important that these transitions handle well in VR so as not to disorient the viewer. All of these challenges found resolution through my work this week.

The two categories of hotspot that had incorrect visibility were the information hotspots (infospots) and the VR 3D cursor. Information hotspots were implemented in Week 6; when hovered over, they spawn a 2D HTML snippet (or iframe) onto the screen. This occurs in 2D space and uses the document object model (DOM), which is not available in 3D immersive VR scenes.
Because the HTML could not be displayed, I added a value to each infospot which triggers the HTML display. I then added a check when entering and exiting VR to show or hide them. Now the infospot feature does not appear when in an immersive VR view. In VR we don’t have a traditional mouse and cursor. We often select items by pointing with our controller or finger. Usually a line is drawn from the hand to the nearest surface where you are pointing, and a small circle or 3D cursor appears to show what you are about to click. This feedback tells viewers they can interact, and what they are about to interact with. While working with last week’s VR menu, I noticed that hiding the menu also hid the 3D cursor. The logic that hides the VR menu searches the scene for all hotspots with a specific tag, animates them out, and removes them from the scene. The 3D cursor in use comes from a third-party tool. Unknowingly, I had given my VR menu items the same tag that was being used by the 3D cursor. I updated the tag on my items to be more specific to my use case, and this fixed the issue. When you turn a faucet or flick a switch, it’s expected that water or light immediately appears. That is the nature of reality. Cause begets action immediately. Virtual reality is different: it can have a cause and a pause. A delayed reaction immediately signals to our brain “something might be wrong”. While interacting with the media player I noticed a variety of steps which led to animations and reactions lagging behind or completely ignoring my input. Animating between scenes, or transitioning, in VR would freeze. Under certain conditions toggling the media center would stall or never react, and in many cases would get desynchronized from its 2D counterpart. For the media controls menu, most of the cases identified were resolved by setting flags and adjusting timings. Some extra steps were added to check and confirm synchronization between 2D and 3D menus. 
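Several of these fixes came down to toggling visibility at the right moment. As a rough sketch in plain JavaScript (the field names, such as `domPopup`, are hypothetical, not the project's actual ones), the enter/exit VR check for infospots might look like this:

```javascript
// Sketch: hide DOM-based infospot popups while in immersive VR, since
// the document object model cannot be rendered inside a 3D scene.
// `infospots` and the `domPopup` flag are illustrative names.
function updateInfospotVisibility(infospots, inImmersiveVR) {
  return infospots.map((spot) => ({
    ...spot,
    // A spot that relies on an HTML popup is hidden while in VR;
    // everything else keeps its current visibility.
    visible: spot.domPopup ? !inImmersiveVR : spot.visible,
  }));
}
```

Running a check like this whenever the VR state changes is one way to keep the immersive and 2D views from drifting out of sync.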
In the case of the VR transitions I had to get a bit creative. Scenes transition differently depending on how you are moving between them. In a tour without 3D elements, the 360 images can simply fade and blend from one to another. In 3D tours the camera’s position teleports to another location during the transition. During this move there are a few moments where the 3D space appears stretched and warped due to how the previous image is painted onto it. Instead, we use an intermediate scene of all black. Imagine navigating a website and every time you clicked a new tab, the whole screen, including the menu bar, went completely black for a moment. You may ask yourself what might be wrong. Without a menu bar we cannot return home or navigate to another page; our only option is to reset the browser or go back. In VR these options are much more costly. If you’re immersed in an environment, asking you to restart the browser or refresh the page is like asking you to get out of the pool and dry off just so that you can hop in again. It’s cumbersome and sours the experience. Further, you’re not just staring at a black screen, you become immersed in a black void. With no visual anchor points you can quickly feel disoriented and concerned, even in this brief moment of transition. To address the loss of control and the disorientation of a black void, I updated the transitions to retain only the VR menu. This gives a consistent reference point, along with a feeling of control and the ability to make a change if the black screen were to persist beyond the normal transition period. Research Video Playback Challenges Bad Bunny Youtube Video “Andrea” Adjusting timings and setting flags solved most, but not all, challenges with animating the media controls menu. I wondered why I had not seen these issues during development, and why they were only appearing intermittently. Issues with logic and

Week 9 2026: VR Image Gallery Read More »

Week 8 2026: VR Menu

Accomplishments Added Control Panel to the VR mode. VR Related Bugfixes Progress bars visual update The virtual tours I create are intended to be accessible across three device categories: traditional computers, touchscreen mobile devices, and virtual reality (VR) headsets. Recently, research and development have focused on 2D features accessible to the first two categories. This week, I shifted my focus to the VR user experience (UX). Through this effort, I identified and resolved bugs, improved feature parity by providing a menu in VR, and implemented shared features that also enhance the 2D experience. VR Bugfixes In the overview, I describe three categories of devices relevant for testing their unique user experiences. This neglects to mention how VR devices effectively function as two categories. VR devices often offer a traditional 2D web browsing experience, as well as an immersive 360 experience that places you inside the webpage. In this 2D experience, two issues were immediately apparent: the control center buttons were not visible, and there was no option to enter VR. These hurdles needed to be resolved. After investigating, I found both stemmed from the same source: a neglected feature from the template where VR devices display a prompt encompassing the screen, immediately offering the user the option to enter VR. At some point, the necessary calls to load and handle this screen were removed, and the action to hide the control center was built for a previous, much shorter version. I restored the prompt and adjusted the control panel animations so everything would display properly. Now the viewer can again begin on a 2D screen and choose to enter VR. VR Menu 3D VR Menu AI Dolphin & Bunny Using VR Pear Phone On Hover Changes A viewer can enter a virtual experience, but how can they control it? In some 3D tours there are hotspots, or markers, placed on the ground or throughout the space that can be clicked. These take the viewer to the next photo. 
But how would you control a video or an unrelated gallery? How would you hide hotspots to remove distractions and better immerse yourself in the scene? One option would be to map these actions to buttons on a controller. However, this could increase the learning curve and decrease engagement. Viewers would need to learn that “B” means hide icons, “A” means play, “X” means pause, and so on. Volume and video scrubbing would become even more complicated. Maintenance would also increase, since controllers vary across devices, and some devices have no controllers at all. Instead, we can provide a menu in VR. This emulates a 2D experience for controlling video, altering the scene, and iterating through gallery photos. Initially, I hoped to reuse the logic and design of the 2D menu within virtual reality. This cannot currently be done in modern web browsers. The 2D menu is built into the document object model (DOM). The DOM works well on 2D displays, but its structure does not translate into three-dimensional space. Since VR headsets are essentially two 2D screens positioned directly in front of your eyes, it might seem possible to place the menu directly on those screens. However, with screens only centimeters from your eyes, this would likely feel like a sign hanging from both eyelashes. The lack of depth would be disorienting and uncomfortable. A new solution was required. You may recall the previously mentioned hotspots—2D images positioned in 3D space that perform actions when clicked. These are already familiar to users and are often the primary method of interaction within 3D environments. Using this approach, we can rebuild the user interface with necessary adjustments to meet the expectations and limitations of 3D space. In 2D DOM menus, we have convenient grouping tools called containers. These allow items to be spaced evenly horizontally or vertically. Containers provide alignment controls, and spacing is defined locally within them. 
This means moving one container moves everything inside it. This convenience does not exist in 3D with hotspots. Each item must be individually placed relative to where the viewer is looking. This makes maintenance and future changes more complex. If you initially place a toolbar at the bottom and later move it to the top, every toolbar item must be updated individually. I was grateful to have waited until the 2D menu matured before building the 3D menu. Since the 2D design has been tested, the likelihood of major changes to the 3D version is reduced. During early testing of the 3D menu, I noticed that when scenes changed, the menu immediately disappeared. Opinions on this behavior may vary, which suggests that changing a scene does not inherently imply the menu should hide; the two actions can be logically separate, and therefore should be. A viewer may prefer to keep the menu positioned to their right while iterating through scenes or preparing to start a video. After a brief investigation, I found a way to keep the menu visible and in its relative position while moving through scenes in both galleries and 3D tours. Another difference unique to the 3D menu is button indicators. In 2D, a highlight indicates when a button is hovered. In 3D, I initially struggled to replicate this behavior. After stepping away briefly, I considered alternative approaches. Instead of mimicking a flat highlight, I leveraged 3D space to animate movement similar to physical buttons. This animation feels responsive and clearly conveys interaction, so I chose to adopt it. Working in 3D comes with both challenges and benefits. One key benefit is the consistency of the viewer’s perspective. In 2D, we have to account for devices of many shapes, sizes, and resolutions. One menu must look good on a standard desktop monitor as well as ultra-wide screens, tall cell phones, older phones at 1080p, and newer screens beyond 4K. 
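Since each hotspot must be placed individually, the layout math a 2D container would normally handle has to be done by hand. A minimal sketch of evenly spacing a row of menu items around the viewer's gaze direction (the angle and spacing values are illustrative, not the project's real numbers):

```javascript
// Sketch: compute evenly spaced horizontal angles (in degrees) for a
// row of n menu hotspots, centered on where the viewer is looking.
// `centerDeg` and `spacingDeg` are illustrative parameters.
function menuItemAngles(count, centerDeg = 0, spacingDeg = 10) {
  // First item sits half the total row width to the left of center.
  const start = centerDeg - ((count - 1) * spacingDeg) / 2;
  return Array.from({ length: count }, (_, i) => start + i * spacingDeg);
}
```

Centralizing the arithmetic like this is one way to soften the maintenance burden: moving the whole toolbar becomes a change to one `centerDeg` value rather than an edit to every item.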
I put effort into crafting one design that aims to satisfy all viewport configurations, even ones as strange as the (fictional) Pear Phone. In VR, this challenge is reduced. Most human heads and

Week 8 2026: VR Menu Read More »

Crane Beach Dune Trail – Winter

Open 360 Viewer Crane Beach Dune Trail Join me on this winter walk through the dunes of Crane Beach in Ipswich, MA. This journey spawned as an attempt to spot a snowy owl. While we did not see any on our trip, maybe you will spot one in the 360 photos! This being my first winter hike, I did not know what to expect. To our luck, the trail was well trodden. In only a few areas did my feet sink into the snow; the greatest depth was only 12 inches. Overall, the walk was easy and the lack of wind made for a comfortable time. During our journey we came across other hikers, as well as an eagle and some small birds. A deer was briefly spotted too. The dunes amongst the untouched snowscape provided an interesting visual contrast. A beach in the New England winter may not sound appealing to most, but many may be surprised by how good a time can still be had. Do you have a space you think I should check out? Or one that may be worth capturing in 360? I would love for you to share your suggestions in the comments or by messaging me below. 

Crane Beach Dune Trail – Winter Read More »

Week 7 2026: Video

Accomplishments Video Integration Rebuilt User Interface Bonus: Multi-Crop tool One of the paramount goals of virtual tour design is to immerse the viewer enough to suspend disbelief so that their mind may experience what it feels like to be somewhere else. These feelings are often the result of what is currently stimulating our senses mixed with our memories of similar stimuli. Many moments are still, like sitting in a kitchen or viewing a painting in a museum. These still moments often provide a simple stimulus to our sense of sight. Animated moments like walking through a forest or watching the waves on a beach give off strong stimuli to multiple senses. Imagine the whistle of wind through the trees or the touch of mist from a crashing wave. 360 video captures the life of a scene along with audio. Providing a seamless integration of 360 videos into tours can further engage the viewer with the experience and increase the overall impact. Video Integration UI With Media Controls UI Without Media Controls Getting 360 videos to play on their own is a simple process, with examples easily available. Integrating these videos with photo galleries and depthmapped images in a 3D space, however, came with challenges. Most examples display videos as their own scene. This decision creates conflicts when integrating them into 3D tours. In 3D scenes, images are depthmapped onto the space. This “paints” the photo onto a 3D model at an angle from where the camera is located, providing the optical illusion that you are inside the space. This comes at the cost of slightly more processing to place and stretch the image onto the 3D model. Since this only occurs once per photo, the performance impact is negligible. Videos, however, are a series of photos—sometimes 30 or 60 frames per second. The same negligible performance cost for a single photo grows massively when applied to video. Have you ever visited a website and had an unexpected video start playing? 
Surprises like this can make us uncomfortable. This discomfort puts us on alert and can negatively affect our feelings about an experience. Viewers should decide when a video plays. How can we play a video in a 3D space while still using it as a depthmap when navigating the scene, and also give the user agency over when it starts? The solution is to use both a photo and a video. We can export the first frame of a video and use that image for the depthmap. This way, a user can navigate to the scene and be met with a familiar experience. We can then add a toggle button that alters the user interface and loads the video. Since the image is the first frame of the video, the transition occurs seamlessly. The scene can then load the video without using depthmapping, avoiding the associated performance challenges. Rebuilt User Interface Up to this point, the user interface (UI) had been a series of modifications and Frankenstein-style enhancements to the default skin provided in the tool I use. To compound this, each tour type (Gallery, 2D, 3D) had its own skin with its own modifications. Every change I had made so far was quick and effective. The video integration toggle, however, came with a series of more complex changes and animations that broke that streak. The UI’s architecture was chaotic, included unused features, and was becoming bloated. This made it very difficult to add new features, especially since they needed to be implemented across multiple tour types. I have significantly grown my understanding of the framework and UI design since beginning this project. With this in mind, I decided to rebuild the UI. Visually, nothing changed outside of the new features. Under the hood, however, the code’s cognitive complexity has been drastically reduced. Actions have been separated into their own files, unused features have been removed, and configurability has been added. This configurability allows the same UI file to be used across different projects. 
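As a rough illustration of that configurability, one shared UI definition can be specialized per tour type through a small config object. The feature flags and tour-type names below are hypothetical examples, not the project's actual options:

```javascript
// Sketch: one shared UI definition, configured per tour type instead of
// maintaining a separately modified skin for each. All flag names and
// presets here are illustrative.
const defaultConfig = {
  showGallery: false,
  showMediaControls: false,
  show3DCursor: false,
};

function buildUiConfig(tourType, overrides = {}) {
  const presets = {
    gallery: { showGallery: true },
    '2d': { showMediaControls: true },
    '3d': { showMediaControls: true, show3DCursor: true },
  };
  // Later sources win: defaults < tour-type preset < per-project overrides.
  return { ...defaultConfig, ...(presets[tourType] || {}), ...overrides };
}
```

With this shape, each project supplies only its deviations, and the shared defaults stay in one place.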
Now, updates and maintenance can be performed on a single file, and the effects cascade to all related projects. Items in a visual display are often nested within other items, which are further nested within others. If one item changes, parent items may also change or resize depending on their relationship. This made it difficult to animate background resizing, because the background was a parent of the items within it. When the items inside changed size, the parent would immediately grow. By changing the layout architecture, it became far simpler to provide a dynamic display of media controls and animate the elements smoothly. Bonus: Multi-Crop Tool Social Platforms With 360 Support Multi-Crop Tool Support for 360 photos is consistently available across devices and web browsers. This makes it surprising that most major social platforms still lack support for it. Facebook supports both 360 photos and videos, and YouTube supports videos only. How can I share more immersive views within the confines of the tools available on other platforms? Instagram, Reddit, and other platforms often offer an image carousel. Within it, photos can be viewed one at a time and slide in from either direction. With correctly cropped photos, a rotating view can be emulated using these features. Because each photo is static, it becomes even more important to ensure each image is whole and that nothing critical to the composition is cut off. Splitting someone’s face across two photos, for example, may not be visually pleasing. I began brainstorming what the process would be to break a 360 photo into these cropped images. Many tools would allow me to manually crop, export, and reposition the images, but this seemed tedious, slow, and prone to error. If I miscalculated where the second or third crop should go, or was off by a pixel, the entire output would be affected. 
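The crop arithmetic itself is simple to sketch: given the width of the equirectangular image and a number of crops, every crop's horizontal offset follows from one starting position, so adjusting that single value reframes all of the crops together. A sketch with illustrative parameters:

```javascript
// Sketch: split an equirectangular 360 photo of width `imageWidth` into
// `count` crops spaced evenly around the full rotation. Offsets wrap
// modulo the image width; a crop crossing the seam would be stitched
// from both ends. All parameter names are illustrative.
function cropOffsets(imageWidth, count, startX = 0) {
  const step = imageWidth / count;
  return Array.from({ length: count }, (_, i) =>
    Math.round((startX + i * step) % imageWidth)
  );
}
```

Deriving every offset from `startX` is what makes reframing one crop automatically reframe the rest, avoiding the off-by-a-pixel errors of manual cropping.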
I decided the ideal solution would be one where every crop is immediately visible, and reframing one would automatically reframe all of them. I also determined that adjusting

Week 7 2026: Video Read More »

Week 6 2026: Maintenance

Accomplishments Preserving the User Experience Test List Documentation Workflow Documentation Bonus Architecture Update Infospot Annotations With time all things change, and so does our perspective. When developing features it is easy to become so immersed in the details that you develop a sense that you’re aware of everything about the topic. With a head so buried in sand, we might forget the tide will soon come. It takes stepping away and changing topic in order to return with a fresh set of eyes to identify things that weren’t seen before. This week was a practice in that respect. I have identified and resolved many edge cases which led to unexpected behavior within a virtual tour. These fixes don’t change the ideal user experience. Instead, they ensure that it will occur more often. To protect this experience I began collecting a list of tests to perform after changes. To better recreate these virtual experiences I have outlined the workflow and key steps required to deliver them. Preserving the User Experience Glitch, Bug, Error. These words have become synonymous with technology. The interesting thing about computers is that they almost always do exactly what they are told (if we don’t consider single-event upsets: https://en.wikipedia.org/wiki/Single-event_upset). So really, glitches, bugs, and errors are results of misunderstood or incomplete expectations. For example, virtual tours can be viewed across desktop or mobile. When I added the option to right-click and drag the diorama view last week, I had expected this to work the same on mobile. Then, to my surprise, using two fingers to drag on mobile appeared to “glitch”, only zooming in and out. Another step needed to be added which checked for a two-finger drag, not just a right-click. Other improvements were added. In Click and Go 3D, when viewing the diorama, you can now double-click on an area in the 3D model and it will determine the closest hotspot. 
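A minimal sketch of that closest-hotspot lookup, treating hotspots as plain `{x, y, z}` points (the real implementation depends on the tour framework's scene representation):

```javascript
// Sketch: given a double-clicked point in the 3D model, find the
// nearest hotspot so viewers don't have to hit the tiny circle exactly.
// Points and hotspots are plain {x, y, z} objects for illustration.
function nearestHotspot(point, hotspots) {
  let best = null;
  let bestDistSq = Infinity;
  for (const h of hotspots) {
    const dx = h.x - point.x;
    const dy = h.y - point.y;
    const dz = h.z - point.z;
    // Compare squared distances; no need for the square root.
    const distSq = dx * dx + dy * dy + dz * dz;
    if (distSq < bestDistSq) {
      bestDistSq = distSq;
      best = h;
    }
  }
  return best;
}
```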
Previously, you had to click precisely on the tiny circle hotspots. This had the potential to cause frustration on mobile. The 3D cursor is now hidden on mobile. Previously, it would appear where the phone was last tapped and just sit on the wall. I tested this using the tour from a tropical villa in Pattaya, Thailand. The cursor sticking to the wall reminded me of the geckos native to the resort. Sizing is a challenge I often face when defining user interfaces. One file is expected to cover infinite sizes and many aspect ratios. How can we ensure text is equally visible on a screen regardless of resolution? We can scale the size of the values based on the screen’s size. This way it will always display as a percentage of the screen, rather than a static pixel count. I updated the custom grid to introduce font scaling and reduce the chance of fonts appearing difficult to see. These, along with more minor fixes and improvements, were implemented to provide a better user experience as viewers navigate virtual tours. Test List Documentation Test List Many of us have visited a grocery store, purchased our ingredients, and only realized when we returned home that we forgot something important. Lists are a great way to avoid this. I find that it’s easy to remember many things, but unlikely to remember everything always. This is why I began keeping a list of tests, or expectations, I have about the virtual tours I produce. When I change the code it’s important to return to these tests to check that I have everything the way I expect it. Sometimes one change, in one place, may affect something totally different in another place, in a way we would never expect. So as confident as I may be that my code is perfect, confidence is not fact, and lists are often more reliable. Workflow Documentation Simplified Workflow AI Graphic A recipe is another form of list. It’s a list of steps (sort of like a program (wink emoji)). Many of us have tried to follow a recipe from memory. 
Sometimes this leads to comments about how “it came out different this time”, or “did you use more sugar?”. Having a recipe ensures that something can be reproduced, and the closer we follow the recipe, the more accurate the reproduction. Virtual tours are no different. You need ingredients (text content), kitchen tools (blender, camera, custom scripts), and an order of steps (take pictures before placing them in a 3D scene). All of this leads to a result. To keep providing results at a similar quality, it’s important to have a recipe. So this week I organized and expanded upon some notes I have gathered about the process and documented them. This four-page workflow document currently covers the steps from photography to upload for a single-floor 3D virtual tour. I expect this document to change with time, as more outputs are defined, different needs are identified, and different improvements are made. Bonus: Architecture Updates Directory Structure AI Graphic When the power goes out, standalone devices require us to go around and reset their clocks. Each one we have to reset on its own. Computers almost never have to be reset nowadays. This is because the time isn’t kept only inside the computer; it is checked against some shared location, and that makes life much easier. I wanted the same to be true for my virtual tours. Until now, virtual tours were self-contained. They lived in their own folder, with their own plugins and other files. Recently I had to update a plugin file for all of the tours on my website. It was the same file, with one instance stored locally in each of the tours. It’s annoying to update one thing in six places. It’s much better to update six things in one place! This week I was able to dedicate some time to identifying if and how this could be done at a larger scale. I wanted to outline a structure that would work for any

Week 6 2026: Maintenance Read More »


© 2025 Justin Codair