Implementing Mistral-7B as a feedback generator given an accuracy input?


I am using Mistral-7B to generate feedback given an accuracy input. After tokenization, the entries of my custom dataset look like this:

{'text': "<s>[INST] Generate a feedback on the performance which takes into account the input accuracy. here are the inputs 0 [/INST] \\n You're starting to get familiar. Let's keep practicing! </s>",
 'instruction': 'Generate a feedback on the performance which takes into account the input accuracy.',
 'input': '0',
 'output': "You're starting to get familiar, which is the first step. Let's keep practicing!"}

Accuracy ranges from 0 to 100 and is grouped into intervals of 10; each interval of 10 accuracy values corresponds to three different output sentences (e.g. "You're starting to get familiar. Let's keep practicing!"), for a total of 300 entries. I want to fine-tune the model, and I am aware that 300 entries is small. Below is the training loss: [training loss plot]
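For context, here is roughly how the dataset is built (a minimal sketch; names such as FEEDBACK_BY_DECILE and build_entry are illustrative placeholders, and only the first decile's sentences are spelled out):

INSTRUCTION = "Generate a feedback on the performance which takes into account the input accuracy."

# One list of three feedback sentences per 10-point accuracy interval
# (only the 0-9 decile is shown here; the real mapping covers all ten deciles).
FEEDBACK_BY_DECILE = {
    0: [
        "You're starting to get familiar, which is the first step. Let's keep practicing!",
        "<second feedback sentence for the 0-9 interval>",
        "<third feedback sentence for the 0-9 interval>",
    ],
    # ..., 90: [...]
}

def build_entry(accuracy: int, output: str) -> dict:
    # Format one example with the Mistral [INST] instruction template.
    text = f"<s>[INST] {INSTRUCTION} here are the inputs {accuracy} [/INST] \\n {output} </s>"
    return {"text": text, "instruction": INSTRUCTION, "input": str(accuracy), "output": output}

entries = [
    build_entry(accuracy, sentence)
    for decile, sentences in FEEDBACK_BY_DECILE.items()
    for accuracy in range(decile, decile + 10)   # 10 accuracy values per decile
    for sentence in sentences                    # 3 sentences each -> 300 entries in total
]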

As for the training code, I simply followed the standard Mistral-7B fine-tuning tutorials:

model_name = "mistralai/Mistral-7B-Instruct-v0.2"

# Load the base model with QLoRA configuration
compute_dtype = getattr(torch, "float16")


bnb_config = BitsAndBytesConfig(
     load_in_4bit=True,
     bnb_4bit_use_double_quant=True, # nested quantisation (additional quantisation)
     bnb_4bit_quant_type="nf4",
     bnb_4bit_compute_dtype=compute_dtype
)

base_model = AutoModelForCausalLM.from_pretrained(model_name,
    quantization_config=bnb_config)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1

# Load the Mistral tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

peft_config = LoraConfig(
     lora_alpha=16,
     lora_dropout=0.1,
     r=64,
     target_modules=[
         "q_proj",
         "k_proj",
         "v_proj",
         "o_proj",
         "gate_proj",
         "up_proj",
         "down_proj",
         "lm_head",
     ],
     bias="none",
     task_type="CAUSAL_LM",
     modules_to_save=["score"]
)
# Prepare the quantized model for k-bit training before attaching the LoRA adapters
base_model.gradient_checkpointing_enable()
base_model = prepare_model_for_kbit_training(base_model)
base_model = get_peft_model(base_model, peft_config)

# Set training parameters
training_arguments = TrainingArguments(
     output_dir="../gpublob/results",
     report_to = "wandb",
     per_device_train_batch_size=1,
     per_device_eval_batch_size=1,
     num_train_epochs=1,
     optim="paged_adamw_32bit",
     save_steps=1000,
     logging_steps=25,
     #eval_steps = STEPS_PER_EPOCH,
     learning_rate=2e-4,
     weight_decay=0.001,
     fp16=False,
     bf16=False,
     max_grad_norm=0.3,
     max_steps=100000, # the total number of training steps to perform
     warmup_ratio=0.03,
     group_by_length=True,
     lr_scheduler_type="constant"
)

trainer = SFTTrainer(
     model=base_model,
     train_dataset=train_dataset,
     peft_config=peft_config,
     dataset_text_field="text",
     max_seq_length=None,  # You can specify the maximum sequence length here
     tokenizer=tokenizer,
     args=training_arguments,
     packing=False,
)

# train
trainer.train()
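After training, this is roughly how I sanity-check the generations (a minimal sketch that reuses the tokenizer and the LoRA-wrapped base_model from above; the accuracy value 73 is just an example):

base_model.config.use_cache = True  # re-enable the KV cache for generation

prompt = "<s>[INST] Generate a feedback on the performance which takes into account the input accuracy. here are the inputs 73 [/INST]"
# The BOS token is already in the prompt string, so skip adding special tokens again
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(base_model.device)
with torch.no_grad():
    generated = base_model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))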

Is my data structure correct?

Do you think augmenting the fine-tuning dataset would be crucial?
