{ "cells": [ { "cell_type": "markdown", "source": [ "# Finetuning experiments\n", "\n", "based on https://github.com/ml-explore/mlx-examples/tree/main/lora" ], "metadata": { "collapsed": false }, "id": "d6264ff5d5024ba1" }, { "cell_type": "markdown", "source": [ "## Create a 4-Bit quantized model" ], "metadata": { "collapsed": false }, "id": "f3d8cb11f32bf4de" }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python convert.py --hf-path mistralai/Mistral-7B-v0.1 -q" ], "metadata": { "collapsed": false }, "id": "8bb22a5cb2ec1db0" }, { "cell_type": "markdown", "source": [ "## Create training data \n", "\n", "### Download website data\n", "\n", "This only downloads new content if the list of journals has been changed or already downloaded files have been deleted. To overwrite existing files, use `overwrite=True`" ], "metadata": { "collapsed": false }, "id": "1135fbc8a6ced279" }, { "cell_type": "code", "execution_count": 2, "outputs": [ { "data": { "text/plain": "Downloading Content: 0%| | 0/130 [00:00<?, ?it/s]", "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "7470149ad0534f15bc7158770475355c" } }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloaded 0 web pages.\n" ] } ], "source": [ "from lib.prepare_training_data import download_input_data\n", "download_input_data(input_file='data/editors.csv', \n", " output_dir='data/website-data', \n", " overwrite=False)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-02-28T09:49:56.822042Z", "start_time": "2024-02-28T09:49:56.780283Z" } }, "id": "9eb2effc7bfb22f" }, { "cell_type": "code", "execution_count": 2, "outputs": [], "source": [ "from lib.prepare_training_data import create_training_file\n", "\n", "instruction = \"Below is the content of a website of a German law journal. For each member of the editorial board or the advisory board, extract the following information: lastname, firstname, title, position, affiliation, role. Return as a YAML list of dictionaries. Omit keys that you cannot find information for.\"\n", "\n", "create_training_file(instruction=instruction,\n", " input_file='data/editors.csv', \n", " output_dir='data', \n", " website_dir='data/website-data',\n", " cols_to_remove = ['journal_abbr', 'website', 'retrieved_on'],\n", " column_to_filter_by='lastname',\n", " lines_before=2, lines_after=1)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-02-28T12:28:21.729698Z", "start_time": "2024-02-28T12:28:20.329840Z" } }, "id": "31a2389404720256" }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python lora.py --model mlx_model/Mistral-7B-v0.1 --train --iters 600 --batch-size 1 --lora-layers 4" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-02-28T11:02:03.193773Z", "start_time": "2024-02-28T11:02:03.190364Z" } }, "id": "fd1a48e84474aaea" }, { "cell_type": "markdown", "source": [ "``` \n", "$ python lora.py --model mlx_model/Mistral-7B-v0.1 --train --iters 600 --batch-size 1 --lora-layers 4\n", "Loading pretrained model\n", "Total parameters 1242.763M\n", "Trainable parameters 0.426M\n", "Loading datasets\n", "Training\n", "Iter 1: Val loss 1.805, Val took 93.856s\n", "Iter 10: Train loss 1.275, It/sec 0.144, Tokens/sec 115.780\n", "[WARNING] Some sequences are longer than 2048 tokens. Consider pre-splitting your data to save memory.\n", "Iter 20: Train loss 1.052, It/sec 0.087, Tokens/sec 92.686\n", "[WARNING] Some sequences are longer than 2048 tokens. Consider pre-splitting your data to save memory.\n", "Iter 30: Train loss 1.230, It/sec 0.110, Tokens/sec 91.892\n", "[WARNING] Some sequences are longer than 2048 tokens. Consider pre-splitting your data to save memory.\n", "Iter 40: Train loss 1.032, It/sec 0.109, Tokens/sec 91.080\n", "Iter 50: Train loss 0.977, It/sec 0.128, Tokens/sec 95.607\n", "Iter 60: Train loss 1.021, It/sec 0.166, Tokens/sec 94.361\n", "[WARNING] Some sequences are longer than 2048 tokens. Consider pre-splitting your data to save memory.\n", "Iter 70: Train loss 1.077, It/sec 0.097, Tokens/sec 87.647\n", "[WARNING] Some sequences are longer than 2048 tokens. Consider pre-splitting your data to save memory.\n", "libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)\n" ], "metadata": { "collapsed": false }, "id": "7e10d007a2d411f0" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }