Why Avataar Is Bullish About Cracking AI Video And Outdoing Global Giants

The race in AI video generation is currently being dominated by global giants like OpenAI, Google and Chinese startups flush with compute budgets running into the billions of dollars. While all of them are competing with bigger models and more GPUs, a Bengaluru-based startup has emerged to buck this trend.
Much like how Chinese AI lab DeepSeek turned the tide in the LLM market, the twelve-year-old Peak XV-backed Avataar has attempted a similar disruption in the AI video generation market.
With the launch of Varya, an AI video generation model, Avataar claims that videos can now be generated for as low as ₹0.50 per second, which is at least 10X cheaper than the cheapest AI video generation model available right now.
According to the startup, this is India’s first distilled AI video generation model developed under the government’s IndiaAI Mission. Generally, AI video generation is a token-heavy task and requires large computational power.
With its distilled AI video generation model, the startup has been able to significantly reduce its compute cost, suggesting that the winners in this race won’t be the ones that use the most compute but the ones that need the least of it.
However, the cost advantage for Avataar does not come from building a smaller model. “Most distillation projects work by shrinking the parameter count. A 70-Bn-parameter model gets compressed to 7 Bn,” said Sravanth Aluru, Avataar’s cofounder and CEO.
This method helps users save costs and time, but quality often goes for a toss.
Avataar has already taken cognisance of this. Varya is built on Alibaba’s open-source Wan 2.2 architecture and retains a 14-Bn-parameter footprint, the same size as its teacher model.
What Avataar changed is how it reasons about video generation. Standard diffusion-based video models generate output through a long iterative denoising process, typically around 50 sequential steps, where the model progressively refines a noisy signal into a coherent video.
Varya collapses this to four steps, but does so through a redesigned inference framework where each step carries a distinct function rather than repeating the same operation at a finer resolution.
Aluru told Inc42 that the first two steps focus on trajectory shaping, establishing the broad structure, motion path and compositional logic of the video, while the final two steps generate the actual output frames.
The system integrates several techniques under the hood, including role-aware supervision, distribution matching and classifier-free guidance augmentation.
Notably, on an NVIDIA H200 GPU, Varya generates a five-second 720p video in approximately 45 seconds.
The same task on Wan 2.2, its underlying base model, takes roughly 1,230 seconds. The company claims a 27X improvement in speed and cost versus its teacher model.
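The reported figures can be checked with a little arithmetic. The sketch below uses only the numbers quoted above (45 seconds, 1,230 seconds, around 50 steps, four steps); the variable and function names are illustrative, and setting the step ratio against the wall-clock ratio is an observation, not something Avataar has explained.

```python
# Figures as reported in the article; names are illustrative only.
TEACHER_SECONDS = 1230  # Wan 2.2, five-second 720p clip on an NVIDIA H200
VARYA_SECONDS = 45      # Varya, same task (approximate)
TEACHER_STEPS = 50      # typical sequential denoising steps
VARYA_STEPS = 4         # Varya's reported step count


def ratio(baseline, improved):
    """Return how many times larger the baseline is than the improved value."""
    return baseline / improved


wall_clock_speedup = ratio(TEACHER_SECONDS, VARYA_SECONDS)
step_reduction = ratio(TEACHER_STEPS, VARYA_STEPS)

print(f"Wall-clock speedup: {wall_clock_speedup:.1f}X")
print(f"Step reduction: {step_reduction:.1f}X")
```

The wall-clock ratio works out to about 27.3X, in line with the company's 27X claim, while the drop from 50 steps to four accounts for 12.5X on its own. The rest of the gain would have to come from elsewhere in the inference pipeline, which the company has not yet detailed.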
Can Avataar Succeed Where Sora Failed?
To understand why Varya matters, here is a look at the global AI video market, which managed to spend staggering sums of money while failing to find a mass audience.
The cautionary tale here is OpenAI’s Sora. Launched in late 2024 to extraordinary hype, the model was producing photorealistic video clips that seemed to herald a new era in content creation.
The economics behind the scenes were far less glamorous. By March 2026, each 10-second clip cost OpenAI approximately $1.30 to produce, translating into roughly $15 Mn per day in inference costs, against lifetime revenue of just $2.1 Mn.
Downloads peaked at 3.33 Mn in November 2025 and cratered to 1.13 Mn by February 2026, and active users dipped below 500,000.
Further, a reported $1 Bn partnership with Disney, which would have unlocked 200-plus characters from Marvel, Pixar, and Star Wars, also collapsed. Eventually, OpenAI shut down the Sora standalone app in March 2026.
Sora’s collapse did not kill AI video. But it did lay bare a structural problem that the industry has been unable to solve: the cost of generating high-quality video at inference time remains prohibitively high for most users and most use cases.
For instance, Google’s Veo 3.1 Standard costs about $0.75 per second of video, while OpenAI’s Sora 2 ranges between $0.30 and $0.50 per second.
Runway Gen-4.5 charges around $0.15 per second, and Kuaishou’s Kling 3.0 costs about $0.10 per second.
This means a 30-second video can cost anywhere from a few dollars to over $20, depending on the model. While prices have fallen sharply from 2024 levels, they are still too high for many small businesses, teachers and independent creators, particularly in markets such as India.
Avataar’s Varya prices video generation at ₹0.48 per second — approximately $0.0057 at current exchange rates. That makes it roughly 17X cheaper than Kling, the global market’s cost champion, and more than 130X cheaper than Veo 3.1 Standard.
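The price gaps quoted above can be reproduced from the per-second rates. The sketch below is a rough check using only the article's figures; Sora 2 is quoted as a range, so its low end is used here as an assumption, and the function names are illustrative.

```python
# Per-second prices in USD as quoted in the article.
# Sora 2 is quoted as a range ($0.30 to $0.50); the low end is assumed here.
PRICES_USD_PER_SECOND = {
    "Veo 3.1 Standard": 0.75,
    "Sora 2 (low end)": 0.30,
    "Runway Gen-4.5": 0.15,
    "Kling 3.0": 0.10,
}
VARYA_USD_PER_SECOND = 0.0057  # the article's conversion of Rs 0.48 per second


def clip_cost(price_per_second, seconds=30):
    """Cost in USD of generating a clip of the given length."""
    return price_per_second * seconds


def multiple_of_varya(price_per_second):
    """How many times more expensive a model is than Varya, per second."""
    return price_per_second / VARYA_USD_PER_SECOND


for name, price in PRICES_USD_PER_SECOND.items():
    print(
        f"{name}: ${clip_cost(price):.2f} per 30-second clip, "
        f"{multiple_of_varya(price):.1f}X Varya's price"
    )
print(f"Varya: ${clip_cost(VARYA_USD_PER_SECOND):.2f} per 30-second clip")
```

On these inputs, Kling 3.0 comes out at about 17.5 times Varya's per-second price and Veo 3.1 Standard at about 131.6 times, matching the "roughly 17X" and "more than 130X" figures. A 30-second clip ranges from about $3 on Kling to $22.50 on Veo 3.1 Standard, against roughly $0.17 on Varya.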
Notably, the startup is yet to publish its technical paper detailing Varya’s architecture and methodology, meaning these claims remain self-reported for now. An assessment of whether Varya can truly match global models on quality, not just on cost, will only be possible once that paper is out and third-party benchmarks follow.
For now, Varya wins on affordability.
“There is a huge audience sitting with ideas but without affordable tools to express them through video,” said Aluru.
Twelve Years In The Making
Avataar’s path to building an AI video model is not the straight line the launch might suggest.
The startup was founded in 2014 by Sravanth Aluru, Prashanth Aluru, Gaurav Baid and Mayank Tiwari with a focus on spatial visual computing, specifically, enabling ecommerce brands to replace flat 2D product images with life-sized 3D augmented reality experiences.
In January 2022, the startup raised $45 Mn in a Series B round led by Tiger Global, with participation from Sequoia Capital India (now Peak XV Partners), at the time one of the largest fundraising rounds in the applied 3D/AR space.
By 2023, the startup had around 180 specialists split between Silicon Valley and Bengaluru, and counted brands like Sleep Number, Pepperfry, and Bajaj Auto among its enterprise customers. The core competency it had built, understanding how objects, spaces, and visual context interact, and doing so computationally at scale, turns out to be meaningful groundwork for the generative video problem it is now attempting to solve.
With the launch, the startup sees three key user groups for Varya.
Enterprises can fine-tune the model on proprietary data to automate video creation across marketing and product catalogues, with integrations planned for tools like Adobe Firefly.
For creators and small businesses, video generation is priced at about ₹0.5 per second, making AI video production affordable for platforms such as Instagram Reels, YouTube Shorts, and WhatsApp Status.
Aluru is particularly bullish on education, arguing that Varya could help India’s 1.5 Mn-plus schools create engaging visual learning content, enabling teachers to generate educational videos without expensive production resources.
Rajan Anandan, MD at Peak XV, believes the significance of Varya extends beyond a single product launch. He sees it as part of a broader pattern that has defined India’s technology success stories, where local innovation succeeds not by matching Western incumbents on scale, but by radically lowering costs and adapting products for Indian users.
Anandan argued that AI adoption in India will ultimately be driven by affordability, cultural relevance, and population-scale accessibility rather than cutting-edge benchmarks alone.
“India has never built leadership in any technology area by following what the West has done, simply because we can’t afford it. When it comes to AI, that playbook of doing it differently, aiming for population scale, dramatically lower cost, making it culturally relevant and Indian contextual, is what’s going to be needed,” Anandan said.
Edited by Nikhil Subramaniam
The post Why Avataar Is Bullish About Cracking AI Video And Outdoing Global Giants appeared first on Inc42 Media.

