
[Johnson] How LLMs made their way into the modern data stack in 2023


When ChatGPT debuted a year ago, internet users got an always-available AI assistant to chat and work with.

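It handled their day-to-day tasks, from producing natural language content (like essays) to reviewing and analyzing complex information.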

The chatbot's rapid rise quickly drew worldwide attention to the technology at its core: the GPT series of large language models (LLMs).

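Fast forward to today, and LLMs (the GPT series and others) are the driving force not only behind individual, task-specific work but also behind large-scale business operations.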

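Enterprises are using commercial model APIs and open-source offerings to automate repetitive tasks and drive efficiency across key functions.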

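Imagine conversing with AI to generate ad campaigns for marketing teams, or being able to accelerate customer support operations by surfacing the right database at the right time.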

The impact has been profound.

However, one area where the role of LLMs has not been discussed much is the modern data stack.

LLMs transform the data stack

Data is the key to high-performing large language models.

When these models are trained correctly, they can help teams work with their data, whether that means running experiments or complex analytics.


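In fact, over the past year, as ChatGPT and competing tools matured, companies that supply data tooling to enterprises looped generative AI into their workflows to make their customers' work easier.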

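The idea was simple: harness the power of language models so that end customers not only get a better experience when handling data but also save time and resources, which ultimately helps them focus on other, more pressing tasks.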

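The first (and probably the most important) shift with LLMs came when vendors started debuting conversational querying capabilities: getting answers from structured data (data that fits into rows and columns) simply by talking to it.

This eliminated the hassle of writing complex SQL (Structured Query Language) queries and gave teams, including non-technical users, an easy-to-use text-to-SQL experience in which they type a natural language prompt and get insights from their data.

The underlying LLM converts the text into SQL, and that query is then run against the target dataset to produce the answer.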
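A minimal sketch of that text-to-SQL loop, using Python's built-in `sqlite3` as the target dataset. The model call is replaced by a hypothetical `generate_sql` function that returns a canned query, so the example runs offline; in a real product that step would send the question and the table schema to an LLM. The `sales` table and its columns are likewise made up for illustration.

```python
import sqlite3


def generate_sql(question: str, schema: str) -> str:
    """Hypothetical stand-in for an LLM call that turns a question into SQL.

    A real system would send `question` and `schema` to a model; here we
    return a canned query for the single example question used below.
    """
    canned = {
        "What was total revenue by region?":
            "SELECT region, SUM(amount) AS revenue FROM sales "
            "GROUP BY region ORDER BY region",
    }
    return canned[question]


def answer(conn: sqlite3.Connection, question: str) -> list:
    # Collect the CREATE TABLE statements so the "model" knows the schema.
    schema = "\n".join(
        row[0]
        for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    )
    sql = generate_sql(question, schema)
    # Run the generated query against the target dataset.
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120.0), ("APAC", 80.0), ("EMEA", 30.0)],
)
print(answer(conn, "What was total revenue by region?"))
# [('APAC', 80.0), ('EMEA', 150.0)]
```

Passing the schema alongside the question reflects the general idea that the model needs to know which tables and columns exist in order to write a valid query.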

While many vendors have launched this capability, some of the notable ones to make a move in the space were Databricks, Snowflake, Dremio, Kinetica and ThoughtSpot. Kinetica initially tapped ChatGPT for the task but now uses its own native LLM. Snowflake, meanwhile, offers two tools. The first is a copilot that works as a conversational assistant for things like asking questions about data in plain text, writing SQL queries, refining queries and filtering down insights. The second is a Document AI tool that extracts relevant information from unstructured datasets such as images and PDFs. Databricks also operates in this space with what it calls 'LakehouseIQ'.

Notably, several startups have also come up in the same area, targeting the AI-based analytics domain. California-based DataGPT, for instance, sells a dedicated AI analyst for companies, one that runs thousands of queries in the lightning cache of its data store and gets results back in a conversational tone.

Helping with data management and AI efforts

Beyond helping teams generate insights and answers from their data through text inputs, LLMs are also handling traditionally manual data management and the data efforts crucial to building a robust AI product.

In May, Intelligent Data Management Cloud (IDMC) provider Informatica debuted Claire GPT, a multi-LLM-based conversational AI tool that allows users to discover, interact with and manage their IDMC data assets with natural language inputs. It handles multiple jobs within the IDMC platform, including data discovery, data pipeline creation and editing, metadata exploration, data quality and relationships exploration, and data quality rule generation.

Then, to help teams build AI offerings, California-based Refuel AI provides a purpose-built large language model that helps with data labeling and enrichment tasks. A paper published in October 2023 also shows that LLMs can do a good job at removing noise from datasets, which is also a crucial step in building robust AI.
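To make the noise-removal idea concrete, here is a minimal sketch of one common pattern, which is not necessarily the method used in the paper: have a model relabel each example and flag rows where its answer disagrees with the stored label. `llm_label` is a hypothetical stand-in for a real model call; it is a simple keyword rule here so the example runs offline.

```python
def llm_label(text: str) -> str:
    """Hypothetical stand-in for an LLM that returns a sentiment label."""
    negative_cues = ("refund", "broken", "late")
    lowered = text.lower()
    return "negative" if any(cue in lowered for cue in negative_cues) else "positive"


def flag_noisy(rows):
    """Return (text, label) pairs whose stored label disagrees with the model."""
    return [(text, label) for text, label in rows if llm_label(text) != label]


rows = [
    ("Arrived late and broken", "negative"),
    ("Great product, works fine", "positive"),
    ("I want a refund", "positive"),  # likely mislabeled
]
print(flag_noisy(rows))  # [('I want a refund', 'positive')]
```

Flagged rows would typically go to a human reviewer rather than being dropped automatically, since the model can be the one that is wrong.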

Other areas in data engineering where LLMs can come into play are data integration and orchestration. The models can essentially generate the code needed for both aspects, whether one has to convert diverse data types into a common format, connect to different data sources or query for YAML or Python code templates to construct Airflow DAGs.
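Generated orchestration code still needs checking before it reaches a scheduler. The sketch below assumes a model has returned an Airflow DAG as Python source (the `GENERATED` string is a hypothetical example, not real model output) and runs a cheap static check with the standard library's `ast` module: the code must parse and must contain a call to `DAG(...)`. It only parses the code, so Airflow does not need to be installed; it is a first filter, not a substitute for actually importing the DAG in Airflow.

```python
import ast

# Hypothetical model output for "write a daily Airflow DAG that extracts then loads".
# It is parsed below, never executed.
GENERATED = '''
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("daily_sales_load", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")
    extract >> load
'''


def looks_like_dag(source: str) -> bool:
    """Return True if `source` is valid Python that calls DAG(...) somewhere."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Handle both `DAG(...)` and `airflow.DAG(...)` style calls.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name == "DAG":
                return True
    return False


print(looks_like_dag(GENERATED))         # True
print(looks_like_dag("def broken(:\n"))  # False
```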

Much more to come

It’s only been a year since LLMs started making waves and we are already seeing so many changes in the enterprise domain. As these models improve in 2024 and teams continue to innovate, we’ll see more applications of language models in different areas of the enterprise data stack, including the gradually developing space of data observability.

Monte Carlo, a known vendor in the category, has already launched Fix with AI, a tool that detects problems in the data pipeline and suggests the code to fix them. Acceldata, another player in the space, also recently acquired Bewgle to focus on LLM integration for data observability.

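However, as these applications emerge, it will become more important than ever for teams to ensure that these language models, whether built from scratch or fine-tuned, perform correctly.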

A slight error here or there could affect downstream results and lead to a broken customer experience.
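One small, concrete guard against such errors in a text-to-SQL setting is to validate model-written SQL before running it. This is a minimal sketch under simple assumptions (SQLite as the database, a hypothetical `sales` table): it accepts only a single `SELECT` statement and asks SQLite to compile it with `EXPLAIN`, which surfaces unknown tables or columns without executing the query. The semicolon check is deliberately crude and would also reject semicolons inside string literals; CTEs starting with `WITH` are rejected for simplicity.

```python
import sqlite3


def validate_generated_sql(conn: sqlite3.Connection, sql: str) -> tuple:
    """Cheap sanity checks on model-written SQL; returns (ok, reason)."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False, "multiple statements"
    first = statement.split(None, 1)[0].upper() if statement else ""
    if first != "SELECT":
        return False, "not a read-only query"
    try:
        # EXPLAIN compiles the statement without running it, so references
        # to missing tables or columns raise an error here.
        conn.execute("EXPLAIN " + statement)
    except sqlite3.Error as exc:
        return False, f"does not compile: {exc}"
    return True, "ok"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
print(validate_generated_sql(conn, "SELECT region FROM sales;"))  # (True, 'ok')
print(validate_generated_sql(conn, "DROP TABLE sales"))           # (False, 'not a read-only query')
print(validate_generated_sql(conn, "SELECT revenue FROM sales")[0])  # False
```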

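VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.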

