
Dolly

Dolly is an open Large Language Model (LLM) family (machine learning / Generative AI (GenAI)) released by Databricks and trained on instruction-following data to support enterprise-grade text generation and conversational use cases.

  • Open LLM family trained for instruction following (machine learning / GenAI).
  • Fine-tuned (in Dolly 2.0) on databricks-dolly-15k, a human-generated instruction dataset of roughly 15,000 records written by Databricks employees and released by Databricks (training data / datasets).
  • Designed to run on the Databricks Lakehouse Platform as well as external infrastructure (ML platform / deployment).
  • Supports enterprise-focused use cases such as chat, Q&A, and text generation, with organizations retaining full control of the model weights (enterprise Artificial Intelligence (AI) / governance).
  • Distributed under open licenses that permit commercial use of the Dolly 2.0 model weights and of the instruction dataset, which is licensed under CC BY-SA 3.0 (open-source licensing / AI governance).

More About Dolly

Dolly is an open LLM family (machine learning / GenAI) created by Databricks to provide enterprises with an instruction-tuned model they can fully own, operate, and customize. Databricks introduced Dolly to demonstrate that capable instruction-following behavior can be achieved by fine-tuning existing open base models (GPT-J for the original Dolly, EleutherAI's Pythia family for Dolly 2.0) on a relatively compact, high-quality dataset rather than relying on proprietary black-box APIs.

Databricks trained Dolly 2.0 using databricks-dolly-15k, a human-authored instruction dataset that it publicly released under a license permitting commercial use (training data / datasets). Each record pairs an instruction with a response, optionally accompanied by a reference context, and is tagged with a task category such as open Q&A, closed Q&A, classification, summarization, information extraction, brainstorming, or creative writing. By making both the model weights and the underlying instruction data available, Dolly enables organizations to reproduce, audit, and extend the training process on their own infrastructure, including the Databricks Lakehouse Platform (ML platform) or other environments that support standard deep learning frameworks.
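To make the record structure concrete, the sketch below parses JSON Lines records using the field names of the published databricks-dolly-15k schema (`instruction`, `context`, `response`, `category`) and tallies them by category, the kind of quick audit a team might run before reusing the data. The sample records are hypothetical and written for illustration; they are not taken from the dataset, and the `InstructionRecord` and `parse_jsonl` names are likewise illustrative.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass
class InstructionRecord:
    instruction: str
    context: str   # optional reference text; empty string when absent
    response: str
    category: str  # task label, e.g. "open_qa", "classification", "summarization"


def parse_jsonl(lines):
    """Parse JSON Lines text into InstructionRecord objects, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        records.append(InstructionRecord(
            instruction=obj["instruction"],
            context=obj.get("context", ""),
            response=obj["response"],
            category=obj["category"],
        ))
    return records


# Hypothetical sample records written for illustration; not taken from the dataset.
SAMPLE = """
{"instruction": "Name three primary colors.", "context": "", "response": "Red, yellow, and blue.", "category": "open_qa"}
{"instruction": "Is a tomato a fruit or a vegetable?", "context": "", "response": "Botanically, a fruit.", "category": "classification"}
{"instruction": "Summarize the passage.", "context": "Delta Lake is an open storage layer that brings ACID transactions to data lakes.", "response": "Delta Lake adds transactional reliability to data lakes.", "category": "summarization"}
{"instruction": "Suggest names for a team offsite.", "context": "", "response": "Summit, Basecamp, Launchpad.", "category": "brainstorming"}
"""

records = parse_jsonl(SAMPLE.splitlines())
counts = Counter(r.category for r in records)
print(counts.most_common())
```

Against the full dataset, the same loop would run over the lines of the downloaded JSONL file instead of the inline sample.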

From a capability perspective, Dolly supports general-purpose Natural Language Generation (NLG), including conversational agents, assistants, and text-based workflows (conversational AI / Natural Language Processing (NLP)). Core behaviors include following natural-language instructions, engaging in multi-turn dialogue, and generating or transforming text documents. Because Dolly is distributed as an open model, enterprises can further fine-tune it on domain-specific corpora, apply reinforcement learning from human feedback where desired, and integrate it into existing Machine Learning Operations (MLOps) pipelines for versioning, monitoring, and governance (MLOps / model governance).
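Further fine-tuning on domain-specific data usually starts by rendering each record into a single training string. The sketch below assumes an Alpaca-style template (the `### Instruction:` / `### Input:` / `### Response:` markers and the `### End` terminator are assumptions; the exact strings used by Dolly's own training code may differ), and the IT-support record is hypothetical.

```python
from typing import Optional

# Assumed Alpaca-style template; the exact strings Dolly's training code uses may differ.
INTRO = ("Below is an instruction that describes a task. "
         "Write a response that appropriately completes the request.")
INSTRUCTION_KEY = "### Instruction:"
INPUT_KEY = "### Input:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"


def format_prompt(instruction: str, context: Optional[str] = None) -> str:
    """Build the prompt portion the model sees at inference time."""
    parts = [INTRO, f"{INSTRUCTION_KEY}\n{instruction}"]
    if context:  # include reference text only when the record has it
        parts.append(f"{INPUT_KEY}\n{context}")
    parts.append(RESPONSE_KEY)
    return "\n\n".join(parts) + "\n"


def format_training_example(instruction: str, response: str,
                            context: Optional[str] = None) -> str:
    """Prompt plus target response and an end marker, for supervised fine-tuning."""
    return f"{format_prompt(instruction, context)}{response}\n\n{END_KEY}"


# Hypothetical domain-specific record (e.g. internal IT support data).
example = format_training_example(
    instruction="How do I request access to the finance schema?",
    context="Access requests are filed through the internal service portal.",
    response="File a request in the service portal and name the finance schema.",
)
print(example)
```

Keeping `format_prompt` separate from `format_training_example` means inference prompts match the training prefix exactly, which matters because an instruction-tuned model learns to continue after the response marker.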

Dolly is closely associated with the Databricks Lakehouse architecture (data lakehouse / analytics platform), where models are managed alongside data and features. Organizations can use Dolly in conjunction with Delta Lake tables, Unity Catalog for access control (data governance), and Databricks Machine Learning (ML) tooling for experiment tracking, deployment, and batch or real-time inference. The model can be served via Representational State Transfer (REST) endpoints, used from notebooks, or integrated into downstream applications such as analytics dashboards, search interfaces, and internal productivity tools (application integration).
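As an illustration of REST-based serving, the sketch below builds (without sending) a JSON POST request using only the Python standard library. The `/serving-endpoints/<name>/invocations` path follows the Databricks Model Serving URL pattern, but the workspace host, endpoint name, token, the `inputs` payload shape, and the `max_new_tokens` parameter are all assumptions; the correct payload depends on the signature of the deployed model.

```python
import json
import urllib.request


def build_invocation_request(host: str, endpoint: str, token: str,
                             prompt: str, max_new_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a model-serving endpoint.

    The URL path and payload shape are assumptions; check them against the
    serving platform's documentation for the deployed model's signature.
    """
    url = f"https://{host}/serving-endpoints/{endpoint}/invocations"
    payload = {
        "inputs": [prompt],  # assumed input schema: a list of prompt strings
        "params": {"max_new_tokens": max_new_tokens},  # hypothetical generation parameter
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_invocation_request(
    host="example.cloud.databricks.com",  # hypothetical workspace host
    endpoint="dolly-endpoint",            # hypothetical endpoint name
    token="<token>",                      # placeholder; load real credentials securely
    prompt="Summarize last quarter's support tickets in three bullet points.",
)
# Sending would be: urllib.request.urlopen(req), then json.load() on the response body.
```

Separating request construction from sending keeps the payload easy to inspect and test before any credentials or network access are involved.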

In enterprise environments, Dolly is used where control over data, training procedure, and model outputs is a primary requirement. Because the weights and dataset are available, compliance teams can inspect training sources, legal teams can assess licensing terms, and engineering teams can adapt the model to private data without sending content to external providers (security / compliance). This positions Dolly within directories and taxonomies as an open, instruction-tuned LLM asset for organizations building on-premises (on-prem) or cloud-based GenAI capabilities, especially where alignment with data governance, privacy, and sovereignty requirements is a core design constraint.