Publications
I am currently on the job market and actively searching for new opportunities! If my research can contribute in any way, please reach out at rajatmodi62@gmail.com
2017-2025
[1] Asynchronous Perception Machine(s): First working implementation of the GLOM architecture. It was previously only a theoretical idea. Our implementation is 10x faster than ViT-B/16 and performs 2% better than state-of-the-art OpenCLIP ViT-H on ImageNet. Designed as a spiritual successor to capsule networks.
Venue: NeurIPS 2024
Links: [Paper] [Poster] [Code] [Patent]
[2] Layer Query Networks: Introduces constant-time feature extraction across any layer in a DNN. While existing networks are sequential ($O(\text{depth})$), LQNs operate in $O(1)$ time. Achieved 15% speedup on ImageNet with 12% relative accuracy improvement.
Status: ICLR 2026 (Under Review)
Links: [Paper]
[3] On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes:
First benchmark studying occlusions in spatio-temporal video action detection. Introduced 5 new datasets and surpassed VideoCapsuleNet by 32.3%.
Venue: NeurIPS 2023
Links: [Paper] [Poster] [Code]
[4] Video Action Detection: Analyzing Limitations And Challenges.
Dataset containing multiple people performing temporally challenging actions (e.g., opening/closing doors). Featured in the official CVPR workshop.
Venue: CVPRW 2022
Links: [Paper] [Dataset] [Challenge]
[5] Sky2Ground: First dataset consisting of real and synthetic ground/satellite/aerial images for 54 outdoor scenes with camera poses. Proposed SkyNet, a 3D point cloud generation architecture using novel curriculum-inspired strategies. Supported by a $500k grant from IARPA.
Status: CVPR 2026 (Under Review)
Links: [Paper]
[6] Foundational Models for Video Understanding: A Survey: First survey of over 200 video foundational models and evaluation metrics across 14 video tasks.
Status: ACM Computing Surveys (Under Review)
Links: [Paper] [GitHub]
[7] Steganography Using Wavelets with Statistical Performance Analysis: An algorithm that forces a stego image to lie within the “higher frequencies” range of a given input image.
Venue: IEEE IEMENTech 2017
Links: [Paper]
Patents
[1] Asynchronous Perception Machine
APM is the first working implementation of GLOM. Recently filed as an A1 patent application with the USPTO by UCF.
Links: [Patent Application]
[2] Externally Guided Multi Domain Personalization:
Invented attribute selection and sampling vectors in GANs to help achieve personalized user recommendations.
Links: [US A1 Patent]
[3] Context Resolution in Autonomous Systems
Proposed a system using cross-stitch units to merge user inputs into a unified representation.
Links: [US A2 Patent]
A lot more still needs to be done.