Naveen Rao (left), co-founder and CEO of MosaicML, and Hanlin Tang, co-founder and CTO. The company’s training techniques are being applied to “building experts” using large language models to more efficiently handle corporate data. MosaicML
On Monday, Databricks, a ten-year-old software maker based in San Francisco, announced it will acquire MosaicML, a three-year-old San Francisco-based startup focused on taking AI beyond the lab, for $1.3 billion.
The deal is a sign not only of enthusiasm for assets in the white-hot generative artificial intelligence market, but also of the changing nature of the modern cloud database market.
Also: What is ChatGPT and why does it matter? Here’s what you need to know
MosaicML created a program called Composer that makes it easy and affordable to take any standard AI program, such as OpenAI’s GPT, and dramatically speed up the early stage of a neural network’s development known as training.
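For a sense of what that looks like in practice, here is a minimal sketch of a training run with Composer, based on the library’s documented quickstart pattern; the tiny random dataset is a stand-in for real data, and exact class names and defaults may differ across Composer versions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18
from composer import Trainer
from composer.algorithms import BlurPool, ChannelsLast, LabelSmoothing
from composer.models import ComposerClassifier

# Wrap a standard PyTorch model so Composer's Trainer can drive it.
model = ComposerClassifier(module=resnet18(num_classes=10), num_classes=10)

# A tiny random dataset, standing in for real training data.
data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16)

# Composer's speed-up methods are passed in as composable "algorithms"
# that modify the model and the training loop on the fly.
trainer = Trainer(
    model=model,
    train_dataloader=loader,
    max_duration="2ep",  # train for two epochs
    algorithms=[ChannelsLast(), BlurPool(), LabelSmoothing(smoothing=0.1)],
)
trainer.fit()
```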
The company launched commercial cloud services this year through which businesses can pay to train a neural network and serve predictions in response to user queries.
However, a more profound element of MosaicML’s approach is to show that entire areas of working with data – such as traditional relational databases – can be completely redesigned.
“Neural network models can really be thought of almost as a type of database, especially when we’re talking about generative models,” MosaicML co-founder and CEO Naveen Rao told ZDNET in an interview ahead of the deal.
“At a very high level, what a database is, is a set of endpoints that are typically very structured, so, typically, rows and columns of some sort of data, and then there is a schema, based on that data, on which you organize it,” Rao explained.
Unlike traditional relational databases such as Oracle, or document databases such as MongoDB, where the schema is predefined, with a large language model, Rao said, “the schema is learned from the data; it generates a latent representation based on the data, so it’s flexible.” The query is also flexible, unlike the fixed lookups of the SQL-style systems that dominate traditional databases.
Also: Serving generative AI just got a whole lot easier with OctoML’s OctoAI
“So, basically,” Rao said, “you took the database, loosened the constraints on its inputs, its schema, and its outputs, but it’s a database.” Moreover, as a large language model, such a database can handle the large blobs of data that have escaped traditional structured data stores.
“I can ingest a ton of books from an author, and I can query the ideas and relationships within those books, which is something you can’t do with just the text,” Rao said.
With intelligent prompting, an LLM offers flexible ways to query that implicit database. “When you prompt it in the right way, you enable it to generate something because of the context created by the prompt,” Rao explained. “And, so, you can query aspects of the native data, which is a huge concept that can apply to a lot of things, and I think that’s really why these technologies are so important.”
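To make Rao’s contrast concrete, here is a minimal sketch. The sqlite3 half uses the real standard library, while `complete()` is a hypothetical stand-in for whatever text-completion API one might use, not a MosaicML or OpenAI function.

```python
import sqlite3

# A traditional database: the schema is fixed up front, and every
# query must conform to that schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE books (author TEXT, title TEXT, year INT)")
con.execute("INSERT INTO books VALUES ('Melville', 'Moby-Dick', 1851)")
rows = con.execute(
    "SELECT title FROM books WHERE author = 'Melville'"
).fetchall()  # a fixed lookup: structured rows in, structured rows out

# The LLM-as-database view: the "schema" is a latent representation
# learned from raw text, and the query is free-form natural language.
def complete(prompt: str) -> str:
    """Hypothetical stand-in for any text-completion API."""
    return "[model-generated answer would appear here]"

context = "Full text of an author's novels..."  # unstructured blob data
answer = complete(
    context + "\n\nQuestion: How do the themes of obsession and fate "
    "relate across these books?\nAnswer:"
)  # a flexible query that no predefined schema could anticipate
```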
The MosaicML work is part of a broader movement to make so-called Generative AI programs like ChatGPT more relevant to practical business purposes.
Also: Why open source is essential to alleviating the fears of AI, according to the founder of Stability.ai
For example, three-year-old AI startup Snorkel, based in San Francisco, provides tools that let companies write functions to automatically label training data for so-called foundation models, the largest neural nets in existence, such as OpenAI’s GPT-4.
And another startup, OctoML, unveiled a service last week to smooth the work of serving predictions.
The acquisition by Databricks brings MosaicML into the vibrant non-relational database market that has been pushing the data-store paradigm beyond rows and columns for many years.
This includes the data lake of Hadoop and the technologies for working on it, such as the map-and-reduce paradigm of Apache Spark, of which Databricks is a leading proponent. The market also includes streaming data technologies, where the data store may in some sense be the flow of data itself, known as “data in motion,” such as the Apache Kafka software promoted by Confluent.
Also: The best AI chatbots: ChatGPT and other notable alternatives
MosaicML, which had raised $64 million prior to the deal, appealed to businesses with language models that are not generalists in the mold of ChatGPT, but instead focus on domain-specific business use cases, which Rao described as “building experts.”
The prevailing trend in artificial intelligence, including generative AI, has been to create programs that are more and more general, able to handle tasks in all kinds of domains, from playing video games, to carrying on chats, to writing poems, to captioning pictures, to writing code, and even to controlling a robotic arm stacking blocks.
The enthusiasm over ChatGPT shows how attractive such a comprehensive program can be when it can be used to handle any number of requests.
Also: AI startup Snorkel grooms a new kind of expert for enterprise AI
And yet, far more focused uses of AI in the wild, by individuals and institutions, are likely to dominate, because they can be far more efficient.
“I can build a smaller model for a particular domain that performs better than a larger model,” Rao told ZDNET.
MosaicML made its name with performance achievements, demonstrating its prowess in the MLPerf benchmark tests, which measure how fast a neural network can be trained. Among the secrets to speeding up AI is the observation that smaller, more focused neural networks can be more efficient.
That was the idea explored extensively in a 2019 paper by MIT scientists Jonathan Frankle and Michael Carbin, which won the award for best paper at the International Conference on Learning Representations that year. The paper introduced the “lottery ticket hypothesis,” the notion that every large neural network contains “sub-networks” that can be as accurate as the full network, but with less computational effort.
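The paper’s procedure can be sketched in a few lines of PyTorch. This is a minimal illustration only: the training loop is stubbed out, and a full implementation would reapply the pruning masks after every optimizer step, as Frankle and Carbin’s method requires.

```python
import torch
import torch.nn as nn

# Train, prune the smallest-magnitude weights, rewind the survivors to
# their initial values, and retrain the sparse "winning ticket."
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
initial_state = {k: v.clone() for k, v in model.state_dict().items()}

def train(model):
    pass  # ordinary SGD training loop, omitted for brevity

train(model)

# Keep only the largest 20% of weights in each layer, by magnitude.
masks = {}
for name, param in model.named_parameters():
    if "weight" in name:
        k = int(0.8 * param.numel())
        threshold = param.abs().flatten().kthvalue(k).values
        masks[name] = (param.abs() > threshold).float()

# Rewind surviving weights to their original initialization...
model.load_state_dict(initial_state)
with torch.no_grad():
    for name, param in model.named_parameters():
        if name in masks:
            param.mul_(masks[name])

train(model)  # ...and retrain only the sparse sub-network
```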
Also: Six skills you need to become an AI prompt engineer
Frankle and Carbin have been advisors to MosaicML.
MosaicML also builds explicitly on techniques discovered by Google’s DeepMind unit showing there is an optimal balance between the amount of training data and the size of a neural network. By increasing the amount of training data, it is possible to make a smaller network more accurate than a larger network of the same type.
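For a sense of the arithmetic, DeepMind’s “Chinchilla” work put the compute-optimal ratio at roughly 20 training tokens per model parameter; the sketch below applies that rule of thumb, with the exact ratio varying by compute budget.

```python
# Chinchilla-style scaling arithmetic: training compute is roughly
# C = 6 * N * D FLOPs, for N parameters and D training tokens, and the
# compute-optimal point sits near D = 20 * N tokens.
def compute_optimal_tokens(n_params: float) -> float:
    return 20 * n_params  # rule-of-thumb ratio from the Chinchilla paper

def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens  # standard training-FLOPs estimate

# Chinchilla itself: 70 billion parameters trained on 1.4 trillion tokens
# outperformed the 280-billion-parameter Gopher at comparable compute.
n = 70e9
d = compute_optimal_tokens(n)  # 1.4e12 tokens
print(f"{training_flops(n, d):.2e} training FLOPs")  # ~5.88e23
```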
Rao encapsulates all of those capabilities in a kind of Moore’s Law for speeding up neural networks. Moore’s Law is the semiconductor rule of thumb which states, roughly, that the number of transistors on a chip will double every 18 months at the same cost. It is the economic miracle that made possible the PC revolution and then the smartphone revolution.
Also: Google, Nvidia get top marks in MLPerf AI training benchmark
In Rao’s version, by applying smart computing tricks with the MosaicML Composer tool, neural nets can get up to four times faster with each generation.
Many surprising insights come out of this approach. One, contrary to the oft-repeated claim that machine learning forms of AI require massive amounts of data, is that smaller data sets can work well when applied in an optimal balance of data and model size, à la DeepMind. In other words, big data may not actually be better data.
In contrast to huge, general-purpose neural nets like GPT-3, which are trained on everything on the internet, smaller networks can be a storehouse of unique knowledge about a company’s own domain.
“Our infrastructure almost becomes the back-end for building these kinds of networks on people’s data,” Rao explained. “And there’s a whole other reason why people need to build their own models.”
Also: Who owns the code? If ChatGPT’s AI helps you write your app, is it still yours?
“If you’re Bank of America, or if you’re the intelligence community, you can’t use GPT-3 because it’s trained on Reddit, it’s trained on a lot of stuff that might even contain personally identifiable information, and it could be stuff that isn’t explicitly allowed to be used,” Rao said.
For this reason, MosaicML has been part of an effort to provide open-source versions of large language models, so that customers can know what kind of program is operating on their data. It’s a view shared by other leaders in generative AI, such as Emad Mostaque, founder and CEO of Stability.ai, who told ZDNET in May, “There is no way you can use black-box models” for the world’s most valuable data, including corporate data.
Last Thursday, MosaicML released as open source the latest version of its language model, called MPT-30B, comprising 30 billion parameters, or neural weights. The company claims MPT-30B is better in quality than OpenAI’s GPT-3. The model family has had more than two million downloads since the company introduced its first open-source language model in early May.
Although automatically discovered schemas may prove useful for database innovation, it is important to note that large language models still have issues such as hallucinations, where the program gives wrong answers while presenting them confidently as fact.
Also: ChatGPT vs. Bing Chat: Which AI chatbot is better for you?
“People don’t really understand that when you ask ChatGPT something, it’s not quite right at times, and sometimes it sounds so right, like a very good bullshit artist,” Rao said.
“Databases require absolute correctness, predictability,” Rao said, based on a lot of things engineered in the database space over the last 30, 40 years that would need to hold true, or at least mostly true, of any new way of doing it.
“People look at it [the large language model] like it can solve all their problems,” Rao said of enterprise interest. “Let’s really get down to the basics of getting there.”