Fractile, a UK company building a groundbreaking new AI chip to deliver exponential performance improvements for AI models, has today exited stealth and announced $15m (£12m) in seed funding. The round was co-led by Kindred Capital, NATO Innovation Fund, and Oxford Science Enterprises, with participation from Cocoa and Inovia Capital, together with angel investors including Hermann Hauser (co-founder, Acorn, Amadeus Capital), Stan Boland (ex-Icera, NVIDIA, Element 14 and Five AI), and Amar Shah (co-founder, Wayve). To date, Fractile has raised $17.5m (£14m) in total funding.
Founded in 2022 by Walter Goodwin, a 28-year-old artificial intelligence PhD, Fractile has developed a radically different approach to the design of chips for AI inference that can deliver transformational improvements in performance for frontier AI models in deployment.
Today’s chips are the biggest constraint on better AI performance
The world’s biggest AI companies are today engaged in a hypercompetitive race to build, train and deploy the best foundation models, requiring vast investments in computational resources. However, all of these companies are reliant on very similar hardware. These chips, together with the highly developed tools and libraries built around them, are well optimised for training large language models (LLMs), but they are unsuited to inference, which is the process of running live data (input tokens) through a specific model with learnt parameters to produce results (in LLMs, a series of output tokens). This means:
- AI models are very expensive to provision and run at scale. Issues like the time taken on conventional hardware to move model parameters from memory to processors mean that very expensive hardware is often used at a small fraction of its theoretical capability, driving up costs. This is certain to throttle adoption and deployment once AI model providers are forced to generate a return on the huge investments required by passing these costs on to users.
- AI performance is inhibited. Ever-faster compute cannot make up for the lag caused in inference by moving model weights from memory to the processing units, which limits real-time performance and user experience.
- Potential AI performance in the future is restricted. Continual advancement of conventional computing is limited by the heat generated by these chips. There is a limit to how fast we can cool silicon chips, and this has become the new constraint on continuing to scale conventional digital processors (the end of Dennard Scaling). With enough data, bigger AI models are predictably better, but without breakthroughs in compute systems, we will not be able to continue to scale AI models to be orders of magnitude larger with sufficiently low latency (time per output token, for instance) to be useable.
- The opportunity for AI model providers to drive differentiation is restricted. Every AI model provider is building on similar infrastructure, and the balance of its use is tilting heavily towards inference. Without novel hardware, the opportunity to create long-term differentiation and competitive advantage from faster, cheaper and higher-quality token generation in inference will be severely limited.
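The memory-movement bottleneck described in the points above can be made concrete with a rough back-of-the-envelope calculation. The Python sketch below is illustrative only: the model size, memory bandwidth and peak compute figures are hypothetical round numbers, not measurements of any real chip or of Fractile's hardware, and the function names are invented for this example. It assumes a single request is served at a time, every model weight is read from memory once per output token, and each weight costs roughly two operations (a multiply and an add).

```python
def memory_seconds_per_token(n_params, bytes_per_param, mem_bandwidth_bytes_per_s):
    """Time to stream every model weight from memory once (one output token)."""
    return n_params * bytes_per_param / mem_bandwidth_bytes_per_s


def compute_seconds_per_token(n_params, peak_ops_per_s):
    """Time for the arithmetic alone, assuming ~2 operations per weight per token."""
    return 2 * n_params / peak_ops_per_s


# Hypothetical, round-number assumptions (not taken from the article):
N_PARAMS = 70e9          # 70 billion parameters
BYTES_PER_PARAM = 2      # 16-bit weights
MEM_BANDWIDTH = 2e12     # 2 TB/s from memory to the processor
PEAK_OPS = 1e15          # 1,000 tera operations per second of peak compute

t_mem = memory_seconds_per_token(N_PARAMS, BYTES_PER_PARAM, MEM_BANDWIDTH)
t_compute = compute_seconds_per_token(N_PARAMS, PEAK_OPS)

# The slower of the two sets the pace; utilisation is the share of that
# time in which the arithmetic units actually have work to do.
t_token = max(t_mem, t_compute)
utilisation = t_compute / t_token

print(f"memory-bound time per token:  {t_mem * 1e3:.2f} ms")
print(f"compute-bound time per token: {t_compute * 1e3:.2f} ms")
print(f"compute utilisation:          {utilisation:.2%}")
```

Under these assumed figures, moving the weights takes roughly 70 ms per token while the arithmetic would need well under 1 ms, so the processor sits idle almost all of the time. Serving many requests in a batch amortises the weight traffic and raises utilisation, but it does not reduce the time each individual user waits per token.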
Fractile’s fresh approach to chip design
There are two paths available to a company attempting to build better hardware for AI inference. The first is specialisation: homing in on very specific workloads and building chips that are uniquely suited to those specific requirements. Because model architectures evolve rapidly in the world of AI, whilst designing, verifying, fabricating and testing chips takes considerable time, companies pursuing this approach face the problem of shooting for a moving target whose exact direction is uncertain.
The second path is to fundamentally change the way that computational operations themselves are performed, create entirely different chips from these new building blocks, and build massively scalable systems on top of these. This is Fractile’s approach, which will unlock breakthrough performance across a range of AI models both present and future.
Radically improved performance and power savings
A Fractile system will achieve astonishing performance on AI model inference – initial targets are 100x faster and 10x cheaper – by using novel circuits to execute 99.99% of the operations needed to run model inference. A key aspect is a shift to in-memory compute, which removes the need to shuttle model parameters to and from processor chips, instead baking computational operations into memory directly. Fractile is also unique in its approach to building these computational units, while still ensuring its technology is fully compatible with the leading-edge unmodified silicon foundry processes that all leading AI chips are built on.
Not only will Fractile provide vast speed and cost advantages, but it will do so at a substantial power reduction. Power efficiency – commonly measured in tera operations per second per watt (TOPS/W) – is the biggest fundamental limitation when it comes to scaling up AI compute performance (see notes below for more detail). Fractile’s system is targeting 20x the TOPS/W of any other system visible to the company today. This allows more users to be served in parallel per inference system, with – in the case of LLMs, for example – more words per second returned to those users, thereby making it possible to serve many more users for the same cost.
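To see why TOPS/W matters when power is the limiting factor, the short Python sketch below works through the arithmetic for a fixed power budget. The baseline efficiency of 1 TOPS/W and the 10 kW budget are hypothetical values chosen only to keep the numbers simple; only the 20x multiplier comes from the target stated above. The function names are invented for this example.

```python
def sustained_tops(power_budget_watts, tops_per_watt):
    """Throughput (tera operations per second) achievable within a power budget."""
    return power_budget_watts * tops_per_watt


def picojoules_per_op(tops_per_watt):
    """Energy per operation. 1 TOPS/W is 1e12 operations per joule, i.e. 1 pJ per op."""
    return 1.0 / tops_per_watt


# Hypothetical assumptions for illustration (not figures from the article):
POWER_BUDGET_WATTS = 10_000      # a 10 kW system
BASELINE_TOPS_PER_WATT = 1.0     # assumed efficiency of a conventional system
TARGET_MULTIPLIER = 20           # the 20x TOPS/W target stated in the text

baseline = sustained_tops(POWER_BUDGET_WATTS, BASELINE_TOPS_PER_WATT)
improved = sustained_tops(POWER_BUDGET_WATTS,
                          BASELINE_TOPS_PER_WATT * TARGET_MULTIPLIER)

print(f"baseline throughput at 10 kW: {baseline:,.0f} TOPS")
print(f"improved throughput at 10 kW: {improved:,.0f} TOPS")
print(f"energy per operation: {picojoules_per_op(BASELINE_TOPS_PER_WATT):.2f} pJ "
      f"vs {picojoules_per_op(BASELINE_TOPS_PER_WATT * TARGET_MULTIPLIER):.2f} pJ")
```

The point of the sketch is simply that, for a system whose power draw and cooling are capped, throughput scales in direct proportion to TOPS/W: a 20x efficiency gain means 20x the operations per second from the same power envelope, or the same work for one-twentieth of the energy.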
Such powerful inference hardware can be leveraged by AI model providers for huge performance advantage from existing models, such as by introducing reasoning to language models. Currently, to get output from the largest models that matches human reading speed, AI companies tend to deploy systems which rely purely on ‘next token prediction’. With faster speeds, AI model providers can cost-effectively introduce recursive queries, chain-of-thought prompting and tree search, and users can get much better answers from the same models: the equivalent of transporting a foundation AI model from two years in the future into the present day. It’s not just language models that see this sort of qualitative shift when they can be run faster and at lower cost – Fractile’s performance leap on inference will accelerate AI’s ability to solve the biggest scientific and computationally heavy problems, from drug discovery to climate modelling to video generation.
Fractile has already built a world-class team with senior hires from NVIDIA, ARM and Imagination, and has filed patents protecting key circuits and its unique approach to in-memory compute. The company is already in discussions with potential partners and expects to sign partnerships ahead of production of the company’s first commercial AI accelerator hardware. Fractile will use the funding to continue to grow its team and accelerate progress towards the company’s first product.