These past few years AI has been trending as if mankind had just discovered hot water, so I decided to take a closer look at it.
I did my research, learned a bit more about it, and got some practice by writing a simple classifier neural network from scratch with NumPy for sentiment analysis.
Then I explored popular frameworks like TensorFlow and PyTorch to develop a seq2seq model with attention, and later a transformer for natural language processing; I certainly learned a lot.
But what if I just want to fine-tune an existing model, or integrate one into my software?
Hugging Face is an open-source community platform specializing in artificial intelligence, best known for its transformers Python library, which provides pre-trained models for NLP, computer vision, and other AI applications.
It supports frameworks like PyTorch and TensorFlow, simplifying the implementation and training of advanced models.
It also facilitates AI application deployment: access to optimized models and intuitive APIs accelerates development, reducing computational costs and experimentation time.
This makes it ideal for researchers, developers, and businesses looking to build AI solutions quickly and efficiently.
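To give an idea of how little code that means in practice, here's a minimal sketch with the transformers pipeline API (the model downloads on first run; I let pipeline pick its default sentiment model, so the exact output is illustrative):

from transformers import pipeline

# Download a pre-trained sentiment model and run it locally in three lines
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes this almost too easy!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.999...}]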
This community also made Gradio, a low-code framework to build and deploy AI interfaces quickly.
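To show what low-code means here, this is Gradio's canonical hello world, which serves a working web UI on localhost:

import gradio as gr

# One function, one input box, one output box: a complete web app
def greet(name):
    return f"Hello {name}!"

gr.Interface(fn=greet, inputs="text", outputs="text").launch()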
Oh, by the way, Hunter returns thousands of exposed Gradio dashboards: product.name="Gradio"
If you only want to run LLMs, just use Ollama.
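For example, once the Ollama daemon is running and you've pulled a model, its Python client boils down to this (a sketch; the model name is just an example):

import ollama

# Send one chat turn to the local Ollama server
response = ollama.chat(
    model="llama3.2",  # example model, pull it first: ollama pull llama3.2
    messages=[{"role": "user", "content": "Explain attention in one sentence."}],
)
print(response["message"]["content"])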
Here are the little things I've done with Hugging Face and Gradio:
[1] Generate musical instrumentals using Gradio and Facebook's MusicGen AI models
import gradio as gr
from os import mkdir
from random import random
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Create the output directory for the generated tracks, if it doesn't exist yet
try:
    mkdir("logs")
except FileExistsError:
    pass

models = ["small", "medium", "large", "melody"]
strategies = ["loudness", "peak"]

def gen_music(size, description, duration, strategy):
    # Load the selected MusicGen checkpoint and set how many seconds to generate
    model = MusicGen.get_pretrained(models[size])
    model.set_generation_params(duration=duration)
    # Generate one waveform per description (here just one)
    wav = model.generate([description])
    for idx, one_wav in enumerate(wav):
        # Save under a random name; audio_write appends the ".wav" extension
        name = f"logs/{idx}{random()}"
        audio_write(name, one_wav.cpu(), model.sample_rate, strategy=strategies[strategy])
    return name + ".wav"

with gr.Blocks() as interface:
    gr.Markdown("# Music generator\nSelect model size, strategy, duration and write a description!")
    with gr.Row():
        with gr.Column():
            msize = gr.Dropdown(models, label="Model size", type="index")
            strategy = gr.Dropdown(strategies, label="Generation strategy", type="index")
            duration = gr.Slider(10, 180, label="Duration in seconds")
            description = gr.Textbox(label="Music description")
        output = gr.Audio(label="Generated audio")
    button = gr.Button("Generate music!")
    button.click(gen_music, inputs=[msize, description, duration, strategy], outputs=output)

interface.launch()
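[2] Chat locally with Qwen models using Gradio and the transformers library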
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"  # Qwen/Qwen2.5-7B-Instruct, Qwen/Qwen2.5-32B-Instruct

class QwenInterface:
    def __init__(self):
        # Keep the whole conversation so the model has context across turns
        self.messages = [{"role": "system", "content": "You're a useful assistant who answers everything."}]
        self.model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
        self.tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    def get_response(self, prompt):
        self.messages.append({"role": "user", "content": prompt})
        # Render the conversation with the model's chat template, then tokenize it
        text = self.tokenizer.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        model_inputs = self.tokenizer([text], return_tensors="pt").to(self.model.device)
        generated_ids = self.model.generate(**model_inputs, max_new_tokens=512)
        # Strip the prompt tokens, keeping only the newly generated ones
        generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
        response = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
        self.messages.append({"role": "assistant", "content": response})
        return response

chat = QwenInterface()

with gr.Blocks(title="Chat with Qwen", fill_height=True) as interface:
    gr.Markdown(f"# {MODEL_ID.split('/')[1].replace('-', ' ')}")
    # ChatInterface passes (message, history); history is ignored since the class tracks it
    gr.ChatInterface(lambda message, _: chat.get_response(message), autofocus=True, type="messages")

interface.launch()
Don't have enough resources?
If you can't run LLMs locally, there are plenty of services that give you an HTTP RESTful API to use their latest AI models in the cloud.
Almost all of them are designed for enterprise use and often come with a paid subscription.
Hugging Face also gives you an Inference API that lets you run hundreds of models for wider purposes such as audio and image processing, but the constraint here is that the free tier only gives you monthly credits.
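As a sketch of what that route looks like with the huggingface_hub client (the model ID is just an example, serverless availability varies, and the token placeholder is yours to fill):

from huggingface_hub import InferenceClient

# Call the serverless Inference API instead of loading the model locally
client = InferenceClient(token="hf_...")  # your Hugging Face access token
completion = client.chat_completion(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # example model, availability may vary
    messages=[{"role": "user", "content": "Summarize what Gradio does."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)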
I've discovered Groq, a fast AI inference service that also offers a free API key with a rate limit of 30 requests per minute.
Here's the documentation for its official Python API wrapper library.
It's designed to be fast and currently supports a wide choice of models that are easy to integrate into your code.
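Here's a minimal sketch with their Python library (the model ID is just an example, check their docs for the current list):

from groq import Groq

# Groq's client follows the familiar OpenAI-style chat completions API
client = Groq(api_key="gsk_...")  # free API key from the Groq console
completion = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # example model, the available list changes over time
    messages=[{"role": "user", "content": "Why is inference on dedicated hardware so fast?"}],
)
print(completion.choices[0].message.content)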