Effortless Dataset Metadata
A step-by-step guide to making Croissant Metadata with MCP and Jetty.
Machine learning thrives on high-quality data. A crucial part of this is accurate and standardized metadata. The MLCommons Croissant format provides a common standard for describing ML datasets, making them easier to find, understand, and use across different tools and platforms.
However, creating valid Croissant metadata files can be tricky. Errors or inconsistencies can cause problems downstream, hindering collaboration and reproducibility. That's why validating your Croissant file is essential: it ensures your metadata adheres to the standard, is correctly structured, and accurately represents your dataset.
Jetty provides a public Model Context Protocol (MCP) server specifically designed to validate MLCommons Croissant files. By connecting your MCP-enabled client (like the Croissant editor or validation libraries) to our server, you can check your metadata for compliance with the Croissant specification. The implementation is open source and available here.
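Before sending a file to a validator, a quick local check can catch the most basic structural problems, such as malformed JSON or a missing top-level key. The following is a minimal sketch using only the Python standard library. The function name `precheck_croissant` and the list of expected keys are illustrative assumptions, not part of Jetty's server or the Croissant tooling, and the key list is deliberately a small subset rather than the specification's full set of required properties. It does not replace real validation.

```python
import json

# Illustrative subset of top-level keys (an assumption for this sketch).
# The Croissant specification is the authoritative source for what is required.
EXPECTED_KEYS = ("@context", "@type", "name", "conformsTo")


def precheck_croissant(text: str) -> list:
    """Return a list of problems found by a shallow structural check."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(doc, dict):
        return ["top-level value must be a JSON object"]
    # Report every expected key that is absent from the top-level object.
    return [f"missing key: {key}" for key in EXPECTED_KEYS if key not in doc]


# Hypothetical metadata, used only to exercise the check.
example = json.dumps({
    "@context": {"sc": "https://schema.org/"},
    "@type": "sc:Dataset",
    "name": "toy-dataset",
})
print(precheck_croissant(example))  # ['missing key: conformsTo']
```

An empty list only means the file passed this shallow check; compliance with the specification still has to be confirmed by a real validator.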
1. Setting Up MCP Croissant Validation
To configure your MCP-enabled client to use Jetty's validation server, add the following endpoint configuration.
VSCode Insiders
Settings > MCP > Edit:
"jetty": {
    "type": "sse",
    "url": "https://mcp.jetty.io/sse",
    "headers": { "VERSION": "1.0" }
}
Cursor Settings
Settings > Cursor Settings > MCP:
{
    "mcp.jetty.io": {
        "url": "https://mcp.jetty.io/sse"
    }
}
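If the editor does not list the server after you add the configuration, it can help to confirm that the endpoint is reachable from your machine. The sketch below opens the URL as a Server-Sent Events stream and returns the first event it receives. It assumes the server follows the MCP HTTP+SSE transport, in which the stream typically opens with an `endpoint` event; that behavior is an assumption about this server, not something confirmed here. The helper names `parse_sse_events` and `first_event` are hypothetical.

```python
import urllib.request
from typing import Iterable, Iterator, Optional

JETTY_SSE_URL = "https://mcp.jetty.io/sse"  # URL taken from the configuration above


def parse_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {'event': ..., 'data': ...} dicts from Server-Sent Events lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            # Comment line, often used as a keep-alive; ignore it.
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def first_event(url: str = JETTY_SSE_URL, timeout: float = 10.0) -> Optional[dict]:
    """Open the SSE stream and return its first event, or None if it ends early."""
    request = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        text_lines = (chunk.decode("utf-8") for chunk in response)
        return next(parse_sse_events(text_lines), None)
```

Calling `print(first_event())` from a terminal is enough for a quick check: a returned event suggests the server is reachable, while a timeout or HTTP error points to a network or URL problem rather than an editor misconfiguration.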
2. Creating Croissant Metadata from a Dataset
With the MCP endpoint installed, you can instruct your development environment on how to build an MLCommons Croissant metadata file.
3. Validation and Fixing Croissant Metadata
Use the MCP endpoint to check your metadata for issues, then have your LLM agent make the corrections.
Contact Us
If you have any suggestions, comments, or support questions, don't hesitate to email us or create a GitHub issue.