Build the next wave of AI on Windows with DirectML support for PyTorch 2.2

Today, Windows developers can leverage PyTorch to run inference on the latest models across the breadth of GPUs in the Windows ecosystem, thanks to DirectML. We’ve updated Torch-DirectML to use DirectML 1.13 for acceleration and to support PyTorch 2.2. PyTorch with DirectML simplifies the setup process through a one-package install, making it easy to try out AI-powered experiences and to scale AI to your customers across Windows.

To see these updates in action, check out our Build session Bring AI experiences to all your Windows Devices.

See here to learn how our hardware vendor partners are making this experience great:

  • AMD: AMD is glad PyTorch with DirectML is enabling even more developers to run LLMs locally. Learn more about where else AMD is investing with DirectML.
  • Intel: Intel is excited to support Microsoft’s PyTorch with DirectML goals – see our blog to learn more about the full support that’s available today.
  • NVIDIA: NVIDIA looks forward to developers using the torch-directml package accelerated by RTX GPUs. Check out all the NVIDIA related Microsoft Build announcements around RTX AI PCs and their expanded collaboration with Microsoft.

PyTorch with DirectML is easy to use with the latest Generative AI models

PyTorch with DirectML provides an easy-to-use way for developers to try out the latest and greatest AI models on their Windows machine. This update builds on DirectML’s world class inferencing platform ensuring these optimizations provide a scalable and performant experience across the latest Generative AI models. Our aim in this update is to ensure a seamless experience with relevant Gen AI models, such as Llama 2, Llama 3, Mistral, Phi 2, and Phi 3 Mini, and we’ll expand our coverage even more in the coming months!

The best part is using the latest Torch-DirectML package with your Windows GPU is as simple as running:

pip install torch-directml
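
To sanity-check the install, the short sketch below moves a couple of tensors onto the DirectML device and multiplies them there. It assumes a DirectX 12 capable GPU and uses the torch_directml.device() selector exposed by the package:

import torch
import torch_directml

# Select the default DirectML device (your DirectX 12 capable GPU)
dml = torch_directml.device()

# Create tensors on the DirectML device and run a matmul on the GPU
x = torch.randn(4, 4).to(dml)
y = torch.randn(4, 4).to(dml)
print(torch.matmul(x, y).cpu())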

Once installed, check out our language model sample that will get you running a language model locally in no time! Start by installing a few requirements and logging into the Hugging Face CLI:

pip install -r requirements.txt
huggingface-cli login

Next, run the following command, which downloads the specified Hugging Face model, optimizes it for DirectML, and runs the model in an interactive chat-based Gradio session!

python app.py --model_repo "microsoft/Phi-3-mini-4k-instruct"

Phi 3 Mini 4K running locally using DirectML through the Gradio Chatbot interface.

These latest PyTorch with DirectML samples work across a range of machines and perform best on recent GPUs equipped with the newest drivers. Check out the Supported Models section of the sample for more info on the GPU memory requirements for each model.

This seamless inferencing experience is powered by our close co-engineering relationships with our hardware partners to make sure you get the most out of your Windows GPU when leveraging DirectML.

Try out PyTorch with DirectML today

Trying out this update is truly as simple as running "pip install torch-directml" in your existing Python environment and following the instructions in one of our samples. For more guidance on getting set up, visit the Enable PyTorch with DirectML on Windows page on Microsoft Learn.

This is only the beginning of the next chapter with DirectML and PyTorch! Stay tuned for broader use case coverage, expansion to other local accelerators, like NPUs, and more. Our goal is to meet developers where they’re at, so they can use the right tools to build the next wave of AI innovation.

We’re excited for developers to continue innovating with cutting edge Generative AI on Windows and build the AI apps of the future!

Source: Windows Blog






Quantization with DirectML helps you scale further on Windows

DirectML support for Phi 3 mini launched last month and we’ve since made several improvements, unlocking more models and even better performance!

Developers can grab already-quantized versions of Phi-3 mini (with variants for the 4k and 128k versions). They can now also get Phi-3 medium (4k and 128k) and Mistral v0.2. Stay tuned for additional pre-quantized models! We’ve also shipped a Gradio interface to make it easier to test these models with the new ONNX Runtime Generate() API. Learn more.

Be sure to check out our Build sessions to learn more. See below for details.

See here to learn what our hardware vendor partners have to say:

What is quantization?

Memory bandwidth is often a bottleneck for getting models to run on entry-level and older hardware, especially when it comes to language models. This means that making language models smaller directly translates to increasing the breadth of devices developers can target.

There’s been a lot of research into reducing model size through quantization, a process that reduces the precision and therefore size of model weights.

Our goal is to ensure scalability while also maintaining model accuracy, so we integrated support for models that have had Activation-aware Weight Quantization (AWQ) applied to them. AWQ is a technique that lets us reap the memory savings from quantization with only a minimal impact on accuracy. It achieves this by identifying the top 1% of salient weights that are needed to maintain model accuracy and then quantizing the remaining 99% of weights. This leads to much less accuracy loss with AWQ compared to other techniques.
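
To make the memory savings concrete, here is a simplified sketch of plain group-wise 4-bit weight quantization in PyTorch. It is not AWQ itself (AWQ additionally uses activation statistics to pick scales that protect the salient weights), but it shows where the roughly 4x reduction in weight storage comes from:

import torch

def quantize_int4(weights, group_size=32):
    # Illustrative group-wise 4-bit quantization (not AWQ itself):
    # each group of weights shares one scale, and values are stored as
    # 4-bit integers, cutting weight storage roughly 4x versus fp16.
    w = weights.reshape(-1, group_size)
    scale = w.abs().max(dim=1, keepdim=True).values / 7.0  # int4 range is -8..7
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_int4(q, scale, shape):
    # Reconstruct approximate full-precision weights from the 4-bit values and scales
    return (q.float() * scale).reshape(shape)

w = torch.randn(64, 64)
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale, w.shape)
print((w - w_hat).abs().max())  # small round-trip error per weight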

The average person reads up to 5 words/second. Thanks to the significant memory wins from AWQ, Phi-3-mini runs at this speed or faster on older discrete GPUs and even laptop integrated GPUs. This translates into being able to run Phi-3-mini on hundreds of millions of devices!

Check out our Build talk below to see this in action!

Perplexity measurements

Perplexity is a measure used to quantify how well a model predicts a sample. Without getting into the math of it all, a lower perplexity score means the model is more certain about its predictions and suggests that the model’s probability distribution is closer to the true distribution of the data.

Perplexity can be thought of as a way to quantify the average number of branches in front of a model at each decision point. At each step, a lower perplexity means the model has fewer, more confident choices to make, which reflects a more refined understanding of the topic. A higher perplexity means more, less confident choices, and therefore output that is less predictable and more uneven in relevance and quality.
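
Concretely, perplexity is the exponential of the average negative log-likelihood a model assigns to the tokens it is asked to predict. Here is a minimal sketch of that calculation, assuming you already have the model's natural-log probability for each observed token:

import math

def perplexity(token_log_probs):
    # token_log_probs: natural-log probability the model assigned to each
    # token that actually appeared in the evaluation text
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that is always 50% confident has a perplexity of exactly 2
print(perplexity([math.log(0.5)] * 10))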

As you can see below, our data shows that AWQ causes only a small increase in perplexity, meaning only a small loss in model accuracy. In return, using AWQ means 4x smaller model weights, leading to a dramatic increase in the number of devices that can run Phi-3-mini!

Model variant      Dataset     Base model perplexity    AWQ perplexity    Difference
Phi-3 mini 128k    wikitext2   14.42                    14.81             0.39
Phi-3 mini 128k    ptb         31.39                    33.63             2.24
Phi-3 mini 4k      wikitext2   15.83                    16.52             0.69
Phi-3 mini 4k      ptb         31.98                    34.30             2.32

Learn more

Be sure to check out these sessions at Build to learn more:

Get Started

Check out the ONNX Runtime Generate() API repo to get started today: https://github.com/microsoft/onnxruntime-genai

See here for our chat app with a handy Gradio interface: https://github.com/microsoft/onnxruntime-genai/tree/main/examples/chat_app

This lets developers choose from different types of language models that work best for their specific use case. Stay tuned for more!
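
If you would rather drive one of the pre-quantized models from your own script than through the chat app, the Generate() API also ships Python bindings. The sketch below follows the pattern used in the repo's Python examples; the model path is a placeholder and the exact method names may differ between releases, so treat it as a starting point rather than a reference:

import onnxruntime_genai as og

# Point at a folder containing a DirectML-optimized, quantized ONNX model
# (placeholder path for illustration)
model = og.Model("path/to/phi-3-mini-directml")
tokenizer = og.Tokenizer(model)

prompt = "<|user|>\nWhat does DirectML do?<|end|>\n<|assistant|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))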

Drivers

We recommend upgrading to the latest drivers for the best performance.

Source: Windows Blog






Introducing the WebNN Developer Preview with DirectML

We are excited to announce the availability of the developer preview for WebNN, a web standard for cross-platform and hardware-accelerated neural network inference in the browser, using DirectML and ONNX Runtime Web. This preview enables web developers to leverage the power and performance of DirectML across GPUs, with support coming soon for Intel® Core™ Ultra processors with Intel® AI Boost and the Copilot+ PC, powered by Qualcomm® Hexagon™ NPUs.

Diagram showing how WebNN fits in the architecture

WebNN is a game-changer for web development. It’s an emerging web standard that defines how to run machine learning models in the browser, using the hardware acceleration of your local device’s GPU or NPU. This way, you can enjoy web applications that use machine learning without any extra software or plugins, and without compromising your privacy or security. WebNN opens up new possibilities for web applications, such as generative AI, object recognition, natural language processing, and more.

WebNN is a web standard that defines how to interface with different backends for hardware accelerated ML inference. One of the backends that WebNN can use is DirectML, which provides performant, cross-hardware ML acceleration across Windows devices. By leveraging DirectML, WebNN can benefit from the hardware scale, performance, and reliability of DirectML.

With WebNN, you can unleash the power of ML models in your web app. It offers you the core elements of ML, such as tensors, operators, and graphs. You can also combine it with ONNX Runtime Web, a JavaScript library that enables you to run ONNX models in the browser. ONNX Runtime Web includes a WebNN Execution Provider that simplifies your use of WebNN.

To learn more or to see this in action, be sure to check out our various Build sessions. See below for details.

See here to learn what our hardware vendor partners have to say:

  • AMD: AMD is excited about the launch of WebNN with DirectML enabling local execution of generative AI machine learning models on AMD hardware. Learn more about where else AMD is investing with DirectML.
  • Intel: Intel looks forward to the new possibilities WebNN and DirectML bring to web developers – learn more here about our investments in WebNN. Please download the latest driver for best performance.
  • NVIDIA: NVIDIA is excited to see DirectML powering WebNN to bring even more ways for web apps to leverage hardware acceleration on RTX GPUs. Check out all the NVIDIA related Microsoft Build announcements around RTX AI PCs and their expanded collaboration with Microsoft.

Getting Started with the WebNN Developer Preview

With the WebNN Developer Preview, powered by DirectML and ONNX Runtime Web, you can run ONNX models in the browser with hardware acceleration and minimal code changes.

To get started with WebNN on DirectX 12 compatible GPUs you will need:

  • Windows 11, version 21H2 or newer
  • ONNX Runtime Web minimum version 1.18
  • Microsoft Edge Canary channel, with the WebNN flag enabled in about:flags

For more instructions and information about supported models and operators, please visit our documentation. To try out samples, please visit the WebNN Developer Preview page.

Learn more

Be sure to check out these sessions at Microsoft’s Build Conference to learn more about WebNN:

Additional WebNN documentation and samples:

Source: Windows Blog






ASUS VivoWatch 6 – Stay in Touch with Your Wellness

ASUS VivoWatch 6 is a pioneering smartwatch designed for your wellness. It boasts exclusive fingertip blood pressure, ECG, and body composition measurements, along with a suite of other health-monitoring features such as sleep tracking, stress management, SpO₂, heart rate, and temperature measurements, all displayed on a vivid 1.39-inch AMOLED screen.

Learn more about ASUS VivoWatch 6: https://asus.click/Vivowatch6HCD06

#Smartwatch #Bloodpressure #Smarthealthcare


Source: ASUS YouTube






ASUS Handheld Ultrasound LU800 for Veterinary Care – Envisioning the Future of Veterinary Care

Tough, durable, and reliable. Innovative ASUS handheld ultrasound technologies empower veterinarians to provide better care for patients.

Easy to carry and ideal for point-of-care testing in the clinic, as well as on-farm visits, house calls, and veterinary telehealth, it supports comprehensive diagnostic evaluations across different species and clinical scenarios, ultimately improving patient care outcomes.

From small pets to large livestock, the ASUS Handheld Ultrasound LU800 provides real-time, high-definition imagery to allow for quick and confident diagnoses.

#ASUShandheldultrasound #VeterinaryUltrasound

Learn more about ASUS Handheld Ultrasound LU800 for Veterinary Care: https://asus.click/Ultrasound-LU800-VetCare-YT


Source: ASUS YouTube