Julien Simon
  • Videos: 338
  • Views: 1,352,948
SLM in Action: Arcee Lite, a powerful 1.5B distilled model
In this video, you will learn about Arcee-Lite, a small yet powerful 1.5B model created with DistillKit, an open-source project for model distillation. Arcee-Lite outperforms Qwen2 1.5B and is currently the best 1.5B model.
First, I run an 8-bit version on my M3 MacBook with Ollama and Open WebUI. Then, I deploy the model on AWS with Amazon SageMaker and run both synchronous and streaming inference. I also show you how to use the OpenAI Messages API, which lets you invoke the model with the OpenAI prompting format. A minimal deployment and invocation sketch follows the links below.
* Model page (full precision model): huggingface.co/arcee-ai/arcee-lite
* Model page (quantized models): huggingface.co/arcee-ai/arcee-lite-GGUF
* Notebook: gitlab.com/juliensimon/...
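For readers who want to reproduce the SageMaker part of the demo, here is a minimal, hypothetical sketch (not the notebook linked above): it deploys arcee-ai/arcee-lite with the Hugging Face TGI container and calls it with the OpenAI-style Messages API. The instance type, container version, and the MESSAGES_API_ENABLED switch are assumptions to verify against the current SageMaker and TGI documentation.

```python
# Hypothetical sketch: deploy arcee-ai/arcee-lite on SageMaker with the Hugging Face
# TGI container, then invoke it with the OpenAI-style Messages API.
# Instance type, container version, and role handling are assumptions.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()            # works inside SageMaker; pass an explicit ARN locally
image_uri = get_huggingface_llm_image_uri("huggingface")  # latest Hugging Face LLM (TGI) image

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "arcee-ai/arcee-lite",    # pulled from the Hugging Face hub at startup
        "MESSAGES_API_ENABLED": "true",          # expose the OpenAI-compatible Messages API
    },
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")

# Synchronous inference with the OpenAI prompting format
response = predictor.predict({
    "messages": [{"role": "user", "content": "Explain model distillation in two sentences."}],
    "max_tokens": 256,
})
print(response)

predictor.delete_endpoint()                      # clean up when done
```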
Views: 8,234

Videos

Uncover the Truth Behind AI Model Bias - It's More Serious Than You Think!
174 views · 21 days ago

SLM in Action: Arcee Nova 72B local inference on math and code with Open WebUI and ollama
228 views · 28 days ago

SLM in Action: Arcee Spark, Llama-3.1 8B, improved!
10K views · 1 month ago

SLM in Action: Arcee Agent, A 7B model for function calls and tool usage
12K views · 1 month ago

Deep dive: model merging, part 2
25K views · 1 month ago

Arcee Cloud, part 5: model download
1.2K views · 1 month ago

Arcee Cloud, part 4: model alignment
28K views · 1 month ago

Arcee Cloud, part 3: model continuous pretraining
27K views · 1 month ago

Arcee Cloud, part 2: model merging
30K views · 1 month ago

Arcee Cloud, part 1: model deployment
174 views · 1 month ago

SLM in Action: Local Inference with Arcee Nova 72B and Ollama
9K views · 1 month ago

SLM in Action: Arcee-Scribe, a 7.7B model for creative writing
10K views · 1 month ago

LLMs from the trenches - "LLMs are not intelligent, there is no reasoning"
273 views · 2 months ago

LLMs from the trenches - "Data is how you create a competitive advantage, not models"
123 views · 2 months ago

LLMs from the trenches - "First and foremost, it is a business discussion"
163 views · 2 months ago

LLMs from the trenches - "Closed model builders have decided for you"
145 views · 2 months ago

LLMs from the trenches - Bias, risk management, cultural differences, and all that good stuff
145 views · 2 months ago

Open Source AI with Hugging Face - Dallas AI meetup (05/2024)
7K views · 2 months ago

Deploying Llama3 with Inference Endpoints and AWS Inferentia2
7K views · 3 months ago

Discussion with Mark McQuade, CEO and co-founder, Arcee.ai
3.2K views · 3 months ago

Migrating from OpenAI models to Hugging Face models
6K views · 3 months ago

Deploying Llama3 on Amazon SageMaker
19K views · 4 months ago

Deploy Hugging Face models on Google Cloud: directly from Vertex AI
9K views · 5 months ago

Deploy Hugging Face models on Google Cloud: from the hub to Vertex AI
1.5K views · 5 months ago

Deploy Hugging Face models on Google Cloud: from the hub to Inference Endpoints
770 views · 5 months ago

Enterprise AI with the Hugging Face Enterprise Hub
700 views · 5 months ago

Deploying Hugging Face models with Amazon SageMaker and AWS Inferentia2
9K views · 5 months ago

Phi-2 on Intel Meteor Lake - Physics question
839 views · 5 months ago

Phi-2 on Intel Meteor Lake - Coding question
479 views · 5 months ago

Comments

  • @snagyoung783 · 2 days ago

    Is a Hugging Face model push required?

  • @sanjayclasses5666 · 2 days ago

    Super my respected Sir ❤

  • @philtoa334 · 3 days ago

    Nice.

  • @AngelaHall-h4k · 3 days ago

    045 Kaylin Stream

  • @Nagendrababubattini · 6 days ago

    Is it possible to fine-tune the models available in JumpStart? If yes, please share the insights.

    • @juliensimonfr · 5 days ago

      Some models allow it and some don't; see docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-use-studio-updated.html#jumpstart-foundation-models-use-studio-updated-fine-tune
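For context, here is a hedged sketch of what fine-tuning a JumpStart model can look like with the SageMaker Python SDK, assuming a model that actually exposes fine-tuning. The model_id, EULA environment variable, and S3 path are placeholders; hyperparameters are model-specific and documented in the page linked above.

```python
# Hypothetical sketch: fine-tuning a JumpStart foundation model that supports it.
# model_id, accept_eula, and the S3 location are assumptions/placeholders.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-8b",   # placeholder: a fine-tunable JumpStart model
    environment={"accept_eula": "true"},         # some models require accepting a license
)
# Hyperparameters are model-specific; check the JumpStart documentation for the chosen model.
estimator.fit({"training": "s3://my-bucket/my-training-data/"})  # placeholder S3 path
predictor = estimator.deploy()                   # deploy the fine-tuned model for inference
```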

  • @eduardwinston8681 · 8 days ago

    Thank you so much for this video! A pleasant and clear delivery. Please keep going!

  • @FranciscoDelcaPereira · 8 days ago

    Thank you for the workshop, it was exactly what I needed. I do have one question though: if I want to save money and use my local machine to train the model, is there an easy way to quickly switch from an AWS EC2 instance to my local machine?

    • @juliensimonfr · 8 days ago

      Thank you. Yes, you can run the SageMaker SDK locally on your machine. The main difference is how you set up credentials with IAM, see ruclips.net/video/K3ngZKF31mc/видео.html

  • @wEBMedPL · 8 days ago

    I am trying to incorporate Arcee Scribe within my custom flow in Flowise (definitely worth checking that project out), but having no success - I guess it exceeds my abilities :(

    • @juliensimonfr · 8 days ago

      Hi, I'm not familiar with Flowise at all.

  • @ravindranshanmugam782 · 10 days ago

    Excellent to bring out the IAM issue, I am experiencing it. I am running the program in a Jupyter notebook, not EC2. As an admin user, how can I attach the role (aoss) to achieve this? Thanks.

    • @juliensimonfr · 9 days ago

      If you work on your local machine, you need AWS credentials (~/.aws/credentials) with enough permissions, and you need to set all roles explicitly with the ARN, e.g. you can't use get_execution_role() in the SageMaker SDK. See ruclips.net/video/K3ngZKF31mc/видео.html
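A minimal sketch of that setup, assuming local AWS credentials are already configured; the region and role ARN below are placeholders.

```python
# Minimal sketch of using the SageMaker SDK from a local machine or notebook:
# get_execution_role() only works inside SageMaker, so pass the role ARN explicitly.
import boto3
import sagemaker

session = sagemaker.Session(boto_session=boto3.Session(region_name="us-east-1"))
role_arn = "arn:aws:iam::123456789012:role/MySageMakerExecutionRole"  # placeholder ARN

# Any model or estimator then takes the explicit ARN instead of get_execution_role(), e.g.:
# model = HuggingFaceModel(role=role_arn, sagemaker_session=session, ...)
```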

  • @arnabsinha2339 · 11 days ago

    Thanks for this awesome video. Any guidance on how to use Llama2/3 on Habana for real-time inference using TGI?

    • @juliensimonfr · 11 days ago

      Sure! There's a TGI version for Habana: github.com/huggingface/tgi-gaudi

    • @arnabsinha2339 · 10 days ago

      @@juliensimonfr Thank you!

  • @subirm1511 · 12 days ago

    How can I integrate this with an application?

  • @robertbai2237 · 14 days ago

    Sorry, got a question. In the Optimum docs I saw an example using BetterTransformer and Flash Attention at the same time, but the BetterTransformer docs say that if you use Llama 2 you don't need BetterTransformer, just Flash Attention (it seems). So should I use just Flash Attention? Also, can I combine NVIDIA TensorRT as well?

    • @juliensimonfr · 14 days ago

      Hi, these are complementary and work at different levels. BetterTransformer provides a generic "fast path" for PyTorch attention layers on CPU and GPU. It looks like torch.compile() is the better and more standard way to do this now, so I'd use that instead. Flash Attention is an optimized algorithm that reduces the amount of data moving between the GPU cores and the GPU RAM. As LLM inference is memory-bound, this is a great way to speed it up. Flash Attention is a drop-in replacement for vanilla attention layers, and is implemented in inference servers like TGI or vLLM. TensorRT is a model compiler for NVIDIA GPUs. It includes many software and hardware optimizations, including Flash Attention. In summary, you should use: torch.compile() to apply PyTorch optimizations (layer fusion, etc.) on all platforms; Flash Attention in your inference server on GPU platforms; TensorRT when running inference on NVIDIA GPUs.
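As an illustration of the first two points, here is a hedged sketch; the model name is just an example, and Flash Attention 2 requires the flash-attn package and a supported GPU.

```python
# Sketch (not from the video): torch.compile() for graph-level PyTorch optimizations,
# plus Flash Attention 2 as the attention implementation in transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"   # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",       # drop-in replacement for vanilla attention
    device_map="auto",
)
model = torch.compile(model)                       # layer fusion and other graph optimizations

inputs = tokenizer("Why is LLM inference memory-bound?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```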

  • @pigreatlor · 14 days ago

    Is it possible to fine-tune my own face?

  • @donkeroo1 · 15 days ago

    What is the going rate for an AI scientist who can actually cobble together a functioning solution with a significant return on investment? It has to be at least 500k. I'm not talking about those subpar Chemist/Engineer PhD "Data Scientists", but an actual AI scientist who can build from concept to production. Curious.

  • @sheikhshafayat6984 · 16 days ago

    The explanation was excellent. Thanks a lot!

  • @SeasonBible · 17 days ago

    Hello Julien, great video! I still have a question: if there is no table with a clear content description, such as in a CV, can Amazon Textract extract the PDF's information and arrange it into boxes that could then be filled into a different tool?

    • @juliensimonfr · 15 days ago

      Maybe, maybe not. Textract can pick up "loose" relationships between a label and an item living outside a table, but it's not going to invent labels or column names.
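For reference, a minimal boto3 sketch of asking Textract for forms and tables; the file name is a placeholder, and whether useful key/value blocks come back depends entirely on the document's layout, as noted above.

```python
# Hedged sketch: synchronous Textract call requesting forms and tables.
# "resume.pdf" is a placeholder single-page document.
import boto3

textract = boto3.client("textract")
with open("resume.pdf", "rb") as f:
    result = textract.analyze_document(
        Document={"Bytes": f.read()},
        FeatureTypes=["FORMS", "TABLES"],      # key/value pairs and table extraction
    )

# KEY_VALUE_SET blocks hold the "loose" label/value relationships mentioned above.
kv_blocks = [b for b in result["Blocks"] if b["BlockType"] == "KEY_VALUE_SET"]
print(len(kv_blocks), "key/value blocks detected")
```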

  • @xspydazx · 19 days ago

    Will you do a tutorial on DistilKit, please? And on evolutionary merging? I think pruning is a great way to extract a strong model from a larger model, but how? What are the dos and don'ts, and how do you evaluate the information inside the tensors to select the correct layers to extract? Is there a way to preview the result before applying a prune, i.e. how do you choose a subset of layers to run instead of the whole stack? In some pre-training runs people have expanded layer by layer, while others start with a full layer stack. If a model was expanded layer by layer, pruning is a viable option; but if the layer stack was trained as a whole, pruning is not prudent, as the layers are inversely connected across the ranges, whereas expanded models are heavily connected. Hence identifying the data held in specific layers, and being able to run a subset of layers, matters. I would expect it to be like extracting a PEFT adapter from the selected subset of layers and running that adapter, or merging it back into a model. I'm not great at these tasks or fully versed, but if you are working with merging technology, that implies a good understanding of what happens in a pre-trained model and its layers; in theory it should not be hard to extract a specific subset of layers to make a new one. Only alignment would be required afterwards, since the layers still contain their probabilities; they are just no longer highly connected. Alignment should be done with a dataset the model was already trained on, so existing knowledge is re-aligned across the new layer stack and the tensors are aligned together to recover that hidden data. So before distilling, I would suggest fine-tuning the model on a specific dataset for 1000 steps and over-training it on those examples; then, after distillation, retrain on the same dataset to re-align, so you have a baseline to match. If the previously trained data aligns, you can infer that all the other previous data also aligns to the new layer stack. Papers are always nice, but experiments and actually doing it are better, and it's nice to see the final result of the distillation. What we glean from this is that data does not need to travel a long distance through many layers to produce a prediction. In fact, I personally believe that the longer the context you want to use, the more layers you need to hold the sequence. So smaller models with great pre-training will be great for pre-trained tasks with conversation-sized contexts; they perform the same on the same task set given a small context size, and lose out with larger contexts. For simple tasks they are perfect: a slim model can be an ideal dedicated tool, e.g. an entity detector or a documenter (smaller essays rather than large summaries), etc.

  • @noinktechnique · 19 days ago

    Every few weeks I have to read some threads in r/machinelearning and search out content like this to flush the buzzwords and singularity fanfiction that is so prevalent these days out of my brain. Thanks for posting! I'm looking forward to listening to the full discussion.

    • @juliensimonfr · 19 days ago

      You're welcome. The full discussion is at www.twitch.tv/videos/2170990579

  • @xspydazx · 20 days ago

    Before I watch this (I will comment after as well, I like surprises): how do you train a model for function calling? I see that you had a dataset with the fake calls inside. I have used these datasets before and prompted the models the same way, so yes, my model can call functions. But in training, should we actually have the model set up to make the real function call, or have a set of results faking the call, so the model can access those calls during training? It would seem that function calling can only really be trained on conversational history containing the back and forth, and function-calling datasets do not always provide this; often they provide only some of the calls, and maybe not exactly as they were committed. We do see the verbose outputs these models produce, and I would expect that we should be training on the full verbose data, which is not the case with these datasets. So my question is about actual function calling, versus returning the expected values so the model can format the response correctly, plus the series of calls and responses. In general the datasets also keep the output separate, when the output is really the whole process and not just the answer. The problem I highlight is that the model never paused between function calls or attempted the internal conversation, so synthetic data like this might not reflect a true possible answer from the model; the model is never really trained on function calling. I also found that if you want to do function calling, the model should be correctly aligned: most function calling is done on a hosted model, so the model needs to be trained as a messaging model (ChatML), and if you convert it to GGUF it needs to stay like this (if it will be hosted on Ollama or LM Studio). For general usage in an unhosted GPT4All it needs to be in the ChatQA format or it also will not work. If it is in tensors, you can change the format on the fly to suit your needs. So when creating a model for a specific purpose (e.g. ReAct), these are also the considerations (the whole ReAct process, not just the ins and outs of the data), as well as making sure to train as ChatML. I'm not sure your dataset was a ChatML-styled dataset, so I had to wrangle it with the Q/A prompt method. As always, I expect a great video!

    • @xspydazx · 20 days ago

      Also one more question: what is the actual difference between tools and functions? (I noticed that with OpenAI, "function call" is singular and "tool calls" are plural.) I also recently found that the graph method is very good and helps create the agentic workflow (passing your state down the path), so the model can get by with fewer functions or tools, using the graph entry points as decision trees. Oops... chain of thoughts, lol!

  • @arnabsinha2339 · 20 days ago

    Amazing performance! So should this model be used for RAG, or also fine-tuned? Looking forward to your video on distillation. Thanks for the awesome content.

    • @juliensimonfr · 20 days ago

      Thank you! Not sure how fine-tuning would affect distillation, that's an interesting idea :)

  • @giedrel1s · 20 days ago

    Sadly, anything under 8-12B is garbage for anything other than transcribing or other lightweight menial tasks.

    • @juliensimonfr · 20 days ago

      Bold statement, but what do I know? Thanks for watching :)

  • @oryxchannel · 21 days ago

    19:55 I think one or all of the creators (Pala Tej Deep, Rishabh Bhardwaj, Soujanya Poria) are also fans of J Dilla, the legendary music "sampler".

    • @juliensimonfr · 20 days ago

      I have no idea, but I'll trust you on that :*)

  • @butcher977 · 22 days ago

    So basically the LLM is just redirecting, based on predefined rules, to the corresponding Python functions and printing the results? Is this how agents generally work?

    • @juliensimonfr · 22 days ago

      Yes, then you extract the function call from the answer, run it, and often feed the result back to the LLM for story writing.
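A toy sketch of that loop, with a made-up JSON call format and a hypothetical get_weather function (not the dataset or format from the video):

```python
# Toy sketch: the model emits a function call as JSON, we parse it, run the matching
# Python function, and feed the result back for the final answer.
import json
import re

def get_weather(city: str) -> str:
    return f"Sunny and 24°C in {city}"          # stand-in for a real API call

TOOLS = {"get_weather": get_weather}

model_output = 'I will call a tool. {"name": "get_weather", "arguments": {"city": "Paris"}}'

# Extract the first JSON object from the model's answer and dispatch it
call = json.loads(re.search(r"\{.*\}", model_output, re.DOTALL).group(0))
result = TOOLS[call["name"]](**call["arguments"])

# The result is then appended to the conversation and sent back to the LLM,
# which writes the final user-facing answer from it.
followup_prompt = f"Tool result: {result}\nNow answer the user in plain English."
print(followup_prompt)
```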

    • @xspydazx · 20 days ago

      Yes, you can prompt the model, give it an example of a function-call output, and parse the call from the response, or you can use the function-calling and tool-calling features from these libraries (local models do not have this), so we use Pydantic to try to force the model to produce a structured output. I found that with local models this is very slow, but if you host the same model, function calling is much faster, as the outputs are generally JSON-based. That is the real requirement for structured output: just being able to parse the data from the response. A model can produce calls anywhere in the response and we can still find them easily with a regex, so the response is still parseable. I use this method mostly to learn how the models work (meta-prompting, for example, is very powerful), and it is transferable to other programming languages; I personally use VB.NET. After I figured out how to make a transformer, and after I watched Karpathy load GPT-2 by writing some code, I was able to go back to my original programming language and even load a Mistral model!

    • @xspydazx · 20 days ago

      Also, in this model setup every response is a function call, which makes it easier: your straight response is also a function call!

  • @hemangnehra7389 · 23 days ago

    Finance has always adopted technological breakthroughs first. I work at a tech company, and fortunately or unfortunately, finance will not adopt AI because as the person said, too much risk.

  • @jamescash4065 · 24 days ago

    Very important ❤

    • @juliensimonfr · 24 days ago

      Yes. Different country, different culture, different rules.

  • @darkmatter9583 · 26 days ago

    Hi, I have a dataset with 5 GB of JSON files, how can I use that data? On OpenAI it only accepts very little data.

  • @user-wr4yl7tx3w · 1 month ago

    No audio

    • @juliensimonfr · 1 month ago

      There isn't any. The demo speaks for itself :)

  • @bhanuchirutha · 1 month ago

    Great, I agree. Sometimes you have to spend more time on IAM than on the original problem, what a mess.

    • @juliensimonfr · 1 month ago

      Yes, even if you know what you're doing, it's difficult to be 100% sure 🤣

  • @mourady5588 · 1 month ago

    Thank you very much, Julien, for this high-quality excerpt! Could you please attach the slides in the description, as well as under the other videos?

    • @juliensimonfr · 1 month ago

      Hi, you'll find the slides at fr.slideshare.net/slideshow/julien-simon-deep-dive-optimizing-llm-inference/270920916. I'll share the other ones in the next week or so.

    • @mourady5588 · 1 month ago

      @@juliensimonfr thanks a lot!

  • @pavansaitadepalli6097 · 1 month ago

    Julien, this was a great video.

  • @alexis91459 · 1 month ago

    Super cool! Just one question: in speculative decoding, why is the validation step done by the bigger model faster? I don't understand how validation works.

    • @juliensimonfr · 1 month ago

      Good question. The main reason is that the input verification by the larger model only requires a single forward pass per candidate sequence. This is much faster than the usual text generation process, which requires one forward pass per new token. If the larger model disagrees on a particular token, then it will generate a better one and the next ones. However, all the tokens generated up to that point by the smaller model are used as is. So, in the end, we get large-model generation quality, only quicker :) Makes sense? Here's a detailed example: huggingface.co/blog/whisper-speculative-decoding
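A small sketch of this with the transformers assisted-generation API; the model pair is just an example (draft and target must share a tokenizer), not the models from the video.

```python
# Sketch of speculative (assisted) decoding: the draft model proposes tokens and the
# larger model validates them with one forward pass per candidate sequence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "EleutherAI/pythia-1.4b-deduped"    # "large" target model (example)
draft_id = "EleutherAI/pythia-160m-deduped"     # small draft model (example, same tokenizer)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Speculative decoding works because", return_tensors="pt").to(target.device)
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```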

  • @arnabsinha2339 · 1 month ago

    Awesome video, Julien. When is part 3 coming?

    • @juliensimonfr · 1 month ago

      Thank you. Which algos are you interested in?

  • @francoisdev · 1 month ago

    Merci Julien!

  • @AI-Projects24 · 1 month ago

    Is there any chance to get the slides? It's very well organized and presented. Thank you so much for your work ✨🔥🔥

    • @juliensimonfr · 1 month ago

      Hi, you can find the slides on Slideshare at fr.slideshare.net/slideshow/julien-simon-deep-dive-quantizing-llms/270921785

  • @Lifelessons-sv7pr · 1 month ago

    Will it work with stabilityai/sd-turbo? I am unable to make it work 😢

    • @juliensimonfr · 1 month ago

      I don't know. You should ask for help at discuss.huggingface.co or create an issue in the Optimum Intel repository.

  • @arnabsinha2339 · 1 month ago

    Julien, this was a great video and a walk down memory lane. Thank you for taking the time to do this. One question: if the PyTorch folks natively support CPU/GPU/TPU via torch.compile, then is the integration with OpenXLA only meant to support future AI accelerators?

    • @juliensimonfr · 1 month ago

      Thank you. Yes, I think the purpose of OpenXLA is to provide a unified interface hiding the complexity of custom accelerators (AWS Trainium, etc.).

  • @xspydazx · 1 month ago

    I think the quantization step is very important, as it reduces the size of the model. It would be good to be able to push it to Hugging Face, where it can be quantized to GGUF in my repo; then the model would be 4.5 GB, as the full model is a very large download (especially for us slow consumers; I used to be in a fast-internet country but now I'm in a slow one). Hence cloud development is the only way to have the same functionality without requiring up-to-date hardware, and I think a pay-as-you-go service is very important. Don't forget the little guys! How can I teach my local learners who do not even have laptops? Computing can even be done from a good phone or tablet, so I feel the market could be the EMA countries, as well as India, a big consumer of cloud services, and of course students.

    • @xspydazx · 1 month ago

      Good series! Quite calm and followable, and (nearly) repeatable.

  • @xspydazx · 1 month ago

    OK, I think the code platform to operate on your model is very good, as this will have to be the way to get custom models and datasets up. Good stuff.

  • @xspydazx · 1 month ago

    I really hope we can get Hugging Face datasets to use with this, as I have already placed my great datasets there (so convenient).

    • @juliensimonfr · 1 month ago

      You can use HF datasets for alignment, see upload_hugging_face_dataset_qa_pairs() in the Arcee SDK. For pretraining, we don't have that option yet. I'll share your feedback, thank you.

    • @xspydazx · 1 month ago

      @@juliensimonfr I think it's a lovely, simple site, exactly what is needed: just do your simple tasks and leave. In truth you don't need to store everything on the site, but it does help to have a space; I also see space as an expense, so I would expect the pay-as-you-go plan to have some limitations on this.

  • @xspydazx · 1 month ago

    That merge, sir! It's a good merge; I have also done it, in many ways, all experiments to learn how the merges worked (before your great video on merging techniques). I would have put Mistral 2 as the base model, as you could justify that you are upgrading BioMistral to the Instruct 2 version of Mistral: by keeping the original base model (Mistral 1) in the list and setting the new desired base model (Mistral 2), the merger would use the new tokenizer and config from Mistral 2 and merge the rest of the model list into that model. I can see that you used Mistral 1 as the base model because it was the original base model for BioMistral, but I would not consider the raw base model to be the base model so much as a Q-vector used to align BioMistral to the new chosen base model. I could have chosen BioMistral's own base model, but this would have kept BioMistral essentially the same, as it would have only grabbed the deltas from the two Mistral models; not much improvement, though it would preserve the tokenizer from BioMistral, which may carry conceptual meaning and relations for its unique domain data. If I had mixed a BioMistral and a code Mistral (DolphinCoder), then I would have chosen a neutral base model (i.e. the most common base among the collection of models being merged onto the base model). The base model is the model which is the priority base? Maybe? Nice video as usual.

    • @xspydazx · 14 days ago

      I hope that made sense! It was just about what happens when you pick the base model and why to pick a particular skew. One quick note: when deconstructing a model it's the same, you're extracting the difference between models. So if you have merged many models, you can technically extract back to a specific base model or checkpoint by selecting that checkpoint as the base model for the deconstruction, extracting the LoRA up to that checkpoint. This enables you to use the LoRA on an earlier base model and reach the same point (the LoRA contains the deltas).

  • @xspydazx · 1 month ago

    Yes, but the tiers are very expensive. They need a pay-as-you-go option, so we can load just the credits needed to do a single job; it's very restricted, and I did not even see where to pay, easily. How do I get my current Mistral onto the site for training? I base-trained it with my personal data and changed my settings, so I cannot train on the base and output a PEFT for attachment. Will this connect directly to Hugging Face models and datasets, allowing the import and export of the final models? For me it would be good to have my model stored (not hosted) while the training project is being done (a month or two), test it with the deployment (very good), then upload it to Hugging Face when complete, delete it, and re-import the next model. I primarily use 7B models, but I would like to try to mix modalities, i.e. add an encoder from an audio/image input model, hence adding optional encoder-decoder steps, or just the decoder modality. The test deployment is vital for this, and it is where Hugging Face falls short: we need free endpoints for up to 7/8B models, perhaps only for short visits, but at least to test. The site looks like it will be easy to use without having to sit in a Jupyter notebook to perform the daily tasks, so a pay-as-you-go feature is vital to rapid expansion; a monthly plan at that price is more than an open-source user can afford (by comparison, Colab at 10-20 euros lasts quite a long time and can do a serious training run if need be; monitoring is its only downfall, plus availability issues for the best instances). I hope you overcome these usability issues, as I think it's key to living up to the phrase "we wish to help the open-source developer". I think the test rig is the service, not the hosting (Hugging Face Spaces is really excellent for this, and over time I expect it will raise performance, allowing a single GPU to manage 7-8B models); for training and development HF is not good, so each has its purpose. I also like the fact that the user does not need to understand the machine configuration requirements, as this has also blocked me from spending on cloud services (even HF). The Colab pricing tiers are actually quite fair, and their Google Cloud, if you can get it to connect, looks well crafted, despite being censored and restricted in various countries due to their political affinities (hence your data is at risk with Google); independent, open-source providers are safer, as you choose what to share, and anything can be deleted and uploaded at will: freedoms in the hands of the user. So there is a lot to consider with my post, sorry, but I wish you guys luck with this and I'm excited to watch the whole series before deciding (but 300... phew, no way, I wish). My current goal with my models is recall and personality, as well as implementing methodologies such as chain of thought and visual/spatial reasoning; this is what I train my Mistral for. I have noticed you do not need a lot of data to train for such functionality; in fact 1000-2000 samples are enough to teach a model a new trick. I.e. over-fit the task (loss 0.2/0.1/0.0064) on 100 samples first, then train it for the task on 1000-2000 samples to a fit of around 0.5 loss (training the LM head with most layers activated, i.e. pushing ~345,000,000 parameters), then train normally with a rank 8-16 LoRA
    on a large dataset of 10,000, fitting anywhere under a loss of 1, ranging from 1.2-1.5 down to 0.5 (this allows the model to become a predictor). Once the task is trained, the prompt can be removed and the model re-aligned to Alpaca, allowing the new methodology to be absorbed into the model; remembering the prompt, you can activate the task using the same prompt. So: lots of small training sessions.

  • @RemekKinas · 1 month ago

    Great series!

  • @divyagarh · 1 month ago

    Hi Julien, I followed the exact steps for Meta's newer model, Llama 3.1 8B Instruct, and get this error on SageMaker: "The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'. The class this function is called from is 'LlamaTokenizer'." Any thoughts? Please help.

    • @juliensimonfr · 1 month ago

      Is the deployment failing? Which cell gives you this error?

    • @divyagarh · 1 month ago

      @@juliensimonfr The deploy cell. I found these errors in the log. A lot of people are experiencing the same issue.

  • @Renozilla · 1 month ago

    Thanks for sharing, amazing content

  • @melikanobakhtian6018 · 1 month ago

    That was great and it helped me so much! Is there any possibility to get the presentation slides?

    • @juliensimonfr · 1 month ago

      Hi, you can find the slides on Slideshare at fr.slideshare.net/slideshow/julien-simon-deep-dive-model-merging/270921708

  • @FushigiMigi · 1 month ago

    I need to know how to communicate with running chat models using Python code. I'm struggling to find this information.

    • @juliensimonfr · 1 month ago

      Check out the Inference Endpoints documentation. The format is simple JSON.
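A minimal sketch of such a call, assuming a text-generation Inference Endpoint; the URL and token are placeholders, and the exact payload fields depend on the task the endpoint serves.

```python
# Sketch: calling a deployed chat/text-generation endpoint from Python with plain JSON.
import requests

ENDPOINT_URL = "https://my-endpoint.endpoints.huggingface.cloud"   # placeholder URL
HF_TOKEN = "hf_xxx"                                                # placeholder token

payload = {
    "inputs": "Write a haiku about small language models.",
    "parameters": {"max_new_tokens": 64},
}
r = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}", "Content-Type": "application/json"},
    json=payload,
)
print(r.json())
```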

  • @itayatelis2898 · 1 month ago

    Naw Julien, you left HF??

  • @CMAZZONI · 1 month ago

    Hello, thank you so much for doing this video. The only question I have is that some of the models do not have a deploy option in the model card (for example the GLiNER models). Is there a way to use these? Many thanks!

    • @juliensimonfr · 1 month ago

      You're welcome! You mean these, right: huggingface.co/urchade? Not 100% sure, but they don't seem to be supported by the transformers library (see huggingface.co/docs/transformers/main/en/model_doc/bert), so this would explain why they can't be deployed in the standard way. The alternative would be to deploy them in a PyTorch environment with the appropriate dependencies, see cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai.
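One possible way to run these models in a custom environment is the standalone gliner Python package (pip install gliner) rather than transformers; treat the API below as an assumption to check against the project's README.

```python
# Hedged sketch: running a GLiNER checkpoint with the standalone `gliner` package.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")
text = "Julien Simon published a video about Arcee-Lite on August 12."
labels = ["person", "product", "date"]          # zero-shot entity types to extract

for ent in model.predict_entities(text, labels):
    print(ent["text"], "->", ent["label"])
```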

  • @Hotboy-q7n · 1 month ago

    Is there a course to help understand this?

    • @juliensimonfr · 1 month ago

      What would you like to understand? Building models? Running them?

    • @Hotboy-q7n · 1 month ago

      @@juliensimonfr Thanks for answering, Julien!!! I want to know how to use Arcee-Scribe to develop a bot like Character AI or Replika, in a chatting style. I am willing to do whatever it takes. I want to know how to create and train it.

    • @xspydazx · 1 month ago

      @@Hotboy-q7n I agree, I think character building and trying to imprint a personality, or a specific mode of speech or reaction to situations, is interesting; not necessarily role-play models. I have been experimenting with movie scripts and subtitles to give the model conversations that enable it to have some form of character. I removed personal references like names; I now also realize you can replace them with a tag like [name], and if you chat with the model and tell it your name, it will fill the slot with your actual name on the fly, so I will probably use this technique more and more. I think the main issue is handling the conversations, or even converting existing step-by-step data into conversations instead, to give the model dialogs and personalize it. I used the Samantha dataset (it does reduce brain performance, but I replaced it with my own name), and I also used my chat histories from other apps, as they contain my style; by giving the model some personal data it is basically my own character I'm building. When I fine-tune, I change the settings for the PEFT and run many epochs until I get the desired loss rate for the information (the lower the loss, the higher the priority). It does make a big difference.

  • @ChouaibNemri · 1 month ago

    Congrats on joining Arcee! Great demo! Keep inspiring and empowering us! <3