Datasets

What are datasets, and how do you use them?

A dataset is a combination of the 'role' of the LLM, the prompt, and an optional reference (expected output).

When configuring the role for the language model, you have three choices: system, assistant, and user. Each role serves a unique function, aiding in the production of responses that are more relevant, precise, and suited to the context.

The 'system' role sets guidelines or context for the language model, directing how it should interpret and respond to requests. The 'user' role represents the actual query posed by the user. The 'assistant' role holds the language model's response to that query.

Examples:

System role: "You are a helpful teaching assistant who explains difficult concepts in a way that is easy to understand. Avoid difficult terminology, be concise, and use a maximum of 50 words."

User role: "What is photosynthesis?"

Assistant role: "Photosynthesis is how plants make their food using sunlight. They take in sunlight, carbon dioxide from the air, and water from the soil to create sugar for energy and oxygen, which they release back into the air for us to breathe."
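
The three example messages above can be written down as an ordered list of role/content pairs. The following is a minimal Python sketch, not the product's actual storage format: the dictionary keys "role" and "content" and the helper make_message are assumptions introduced for illustration.

```python
# Hypothetical representation: the key names "role" and "content"
# and the helper below are assumptions, not a documented format.
VALID_ROLES = {"system", "user", "assistant"}


def make_message(role: str, content: str) -> dict:
    """Build one message, rejecting any role outside the three described above."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return {"role": role, "content": content}


messages = [
    make_message(
        "system",
        "You are a helpful teaching assistant who explains difficult "
        "concepts in a way that is easy to understand. Avoid difficult "
        "terminology, be concise, and use a maximum of 50 words.",
    ),
    make_message("user", "What is photosynthesis?"),
    make_message(
        "assistant",
        "Photosynthesis is how plants make their food using sunlight. "
        "They take in sunlight, carbon dioxide from the air, and water "
        "from the soil to create sugar for energy and oxygen, which they "
        "release back into the air for us to breathe.",
    ),
]

# Print each role with the start of its content to show the ordering.
for message in messages:
    print(f"{message['role']}: {message['content'][:40]}...")
```

Validating the role up front means a misspelled role fails immediately instead of producing an entry the model would interpret unpredictably.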

Reference

In addition to your prompt, you can add a reference: the expected output for that prompt. A reference guides the model toward the desired result, which improves relevance and precision and reduces the number of iterations needed to reach the target response.
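
A dataset entry with its optional reference could be modeled as follows. This is a minimal Python sketch under stated assumptions: the class name DatasetEntry, its field names, and the method has_reference are hypothetical and only mirror the three parts named in this section (role, prompt, optional reference).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetEntry:
    """One dataset row. All names here are illustrative assumptions."""

    role: str                        # "system", "user", or "assistant"
    prompt: str                      # the text sent under that role
    reference: Optional[str] = None  # expected output; may be omitted

    def has_reference(self) -> bool:
        """Return True when an expected output was supplied."""
        return self.reference is not None


# An entry with a reference (expected output) attached.
with_reference = DatasetEntry(
    role="user",
    prompt="What is photosynthesis?",
    reference="Photosynthesis is how plants make their food using sunlight.",
)

# An entry without one; the reference defaults to None.
without_reference = DatasetEntry(role="user", prompt="What is photosynthesis?")

print(with_reference.has_reference(), without_reference.has_reference())
```

Defaulting the reference to None keeps it optional, so entries can be created first and given an expected output later.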