Inside Human Archive: The Y Combinator-Backed Startup Recording Indian Workers To Train The World’s Robots

Human Archive is making waves in India’s startup ecosystem this week amid reports of its work with home services startups like Snabbit. Under this spotlight, the San Francisco- and Bengaluru-based startup has raised $8.2 Mn to build what it describes as the largest human sensorimotor dataset of its kind.
Essentially, the Y Combinator-backed startup is building the training material for the robots and physical AI systems that frontier labs are racing to build and ship. The round was led by Wing Venture Capital, an early Snowflake backer, and NVP Capital. The cap table also includes angels from OpenAI, NVIDIA, BAIR, SAIL, Anduril, Google, Mercor, AfterQuery (Founder & CEO), Meta, DoorDash AI Research, and Astranis, among others.
Founded by UC Berkeley and Stanford University dropouts, 20-year-olds Rushil Agarwal, Raj Patel, Samay Maini, and Shloke Patel, the startup was thrust into the limelight with the recent debate around gig workers being used to train physical AI systems and robots.
Earlier this week, home services startup Pronto was found to have run a pilot using worker tracking data to train AI models. Soon after, rival Snabbit was also found to have conducted a similar test in “controlled environments” in partnership with Human Archive, although the home services startup has denied deploying it in real-time scenarios.
The Business Of Collecting Data
Human Archive’s business is data. The company has so far collected tens of thousands of hours of data. It wants millions.
“We capture human embodied intelligence, that’s what we do,” cofounder Rushil Agarwal told Inc42.
The company taps worker networks and businesses to collect data. The collected data undergoes anonymisation, annotation, and processing by the team at Human Archive before it is finally sold to frontier labs and robotics companies.
Human Archive supplies workers with hardware rigs fitted with downward-facing cameras that record 4K video at 30 frames per second, paired with depth-sensing cameras and a wide-angle lens. The system is meant to capture how human hands perform specific tasks. The patterns and data from these movements are then used to train robots and physical AI systems.
Addressing the worker privacy issue first, Agarwal said that faces are rarely visible due to the camera angle. The company also claims to strip out, during post-processing, any identities that slip into frame. Other hardware components such as tactile gloves, wrist-mounted cameras, and arm- and chest-mounted inertial measurement units (IMUs) make up the rest of the rig.
Once workers record the footage, it is processed with motion capture technology and tactile force-feedback streams to produce a dataset that is then further processed in-house by Human Archive. The video data is run through proprietary QA, hand-tracking, and reconstruction models.
Currently, Asia, and within that India, is the largest hub for data collection for Human Archive. The startup’s ambition to scale up rapidly is why India sits at the centre of the operation. Agarwal cited two reasons: the diversity of industries accessible inside a single country, “from jewellery to textile to coal to steel to everything” and a labour economy that the US, by their reading, does not have.
Of the 125+ companies it has partnered with, a major chunk is based in India. Agarwal said that Human Archive has signed over 120 partnerships across hotels, restaurants, quick commerce platforms, construction sites and factories, though he concedes that many have not yet been activated.
The company did not disclose the companies it is engaged with on either the demand or the supply side, although its company deck mentions that Gurugram-based home cleaning service QuickyGo conducted a pilot.
The Privacy And Ethical Dilemmas
The debate about workers training machines that could one day replace them is heating up. The crux of the issue is that workers often don’t know what exactly they are contributing, but eventually such data is used to move robots or technology into human roles.
The oft-cited example is that of self-driving vehicles in major US cities, which were trained for city navigation on videos shot with cameras mounted on cabs driven by gig workers and cab drivers.
Human Archive’s founders maintain that anyone engaged directly by the company is fully informed about what is being captured and why. But when collection runs through a partner business, the responsibility for informing workers shifts.
When asked if workers themselves know that they could potentially be training themselves out of a job, Agarwal said that the company is transparent with partners on the intent and purposes of data usage. However, it is the partner companies’ responsibility to educate their employees.
Contracts, he says, require partners to obtain informed consent, and the company only works with those that comply. Additionally, Agarwal said that on average, each worker gets $10-100 per hour of work. Inc42 was not able to independently verify the compensation component.
To be sure, the blue-collar worker economy is a highly unorganised sector that typically involves manual labour, skilled trades, and hands-on physical work. These workers form the backbone of industries that build, manufacture, transport, and maintain the infrastructure of the economy.
Blue-collar workers in India generally earn average salaries ranging from ₹15,000 to ₹35,000 per month. That said, Deloitte’s Blue-Collar Workforce Trends 2025 found that blue-collar wages are now growing at an annual rate of 5–6%.
The other side of the story is the privacy challenge, especially when it comes to recording inside people’s homes, as seen in the case of housekeeping services apps. Such recording directly challenges privacy rules such as the Digital Personal Data Protection Act (DPDPA), 2023.
Pronto has, since then, issued a statement. “Unless you have opted in and paid for the program personally, the Pro doesn’t come to the house with a camera. Opt in is not one-time, it has to be affirmed before each booking. By default there is no camera involved, and when there is, it’s impossible to miss.”
The home services startup also said that they are not the only company in the space doing this.
Urban Company, a rival, has distanced itself from such practices.
Snabbit, which conducted a simulation test within its own environment with Human Archive, said that ‘understanding something and deploying it in our customers’ homes are two very different things’.
Snabbit and Human Archive did sign an NDA. However, the current status of that agreement could not be ascertained. Aayush Agarwal, founder of Snabbit, said in a statement that the company has no partnership for deploying it in customer homes and has ‘no intention of changing that’.
The Ministry of Electronics and Information Technology (MeitY) is reportedly taking cognisance of the Pronto controversy. According to Moneycontrol’s report based on government sources, this could prompt greater scrutiny around startups utilising customer home data.
Physical AI data collection has begun to attract criticism similar to that levelled at the content-moderation sweatshops of the late 2010s and early 2020s: the underlying geometry of Global South labour feeding decidedly Western model development.
The Companies Racing To Train Physical AI
Data collection has become its own category inside the physical-AI race. Unlike language models that train on trillions of tokens scraped from the internet, or vision models that learn from billions of public images, robotic manipulation data cannot be scraped from the web; it has to be captured in the real world.
Scale AI, in which Meta owns a 49% stake, now runs a dedicated Data Engine for Physical AI and says it has completed more than 100,000 production hours at its San Francisco prototyping facility. Build AI, a newer entrant, sells an egocentric dataset of first-person video aimed specifically at industrial robots and embodied AI.
The physical AI market is expected to grow at a CAGR of over 47% between 2026 and 2032 to reach $15.24 Bn, according to a MarketsandMarkets report. This growth will stem from edge AI computing and real-time decision-making capabilities in robots.
Earlier this week, entrepreneur Abhinav Kukreja launched Neocambrian AI to build what he described as the data foundation of physical AI. This involves creating a high-fidelity, pre-training-scale database of “human action from India”.
The pitch is similar to Human Archive’s, and one suspects that, given the depth of the Indian market, we will be hearing a lot more about how startups are looking to capture ‘human action’ in the next few months.
[Edited by Nikhil Subramaniam]
The post Inside Human Archive: The Y Combinator-Backed Startup Recording Indian Workers To Train The World’s Robots appeared first on Inc42 Media.












