In this post we take on a fun deep learning task in Microsoft Fabric: telling pangolins apart from armadillos with a fastai vision model. This blog is a step-by-step guide to training the model, tracking it with mlflow, saving it, and loading it again to make predictions. Whether you're a data enthusiast or a seasoned analyst, you should be able to follow along in a Fabric notebook.
Inspiration for this post and many of the implementation details come from the fantastic Practical Deep Learning for Coders course. My contributions here demonstrate how to get this working in Fabric with mlflow. To better understand the code in this blog (e.g., what is a datablock?), check out lesson 1 of the course above.
Step 1 - Create an ML Model
Begin by creating a new ML model in the Data Science section of Microsoft Fabric.
You'll be prompted to name your model. Since we’re classifying images of pangolins and armadillos, I’ve named mine pangolinVsArmadillo.
Next, click on “Start with a new Notebook.”
Step 2 - Train Your Model
The following commands will download images of pangolins and armadillos, create a datablock, and train your ML model using mlflow and fastai. Add these to your new Notebook, each section as its own cell.
Install and Import Requirements:
!pip install fastbook
from fastbook import *
Download images of pangolins and armadillos using DuckDuckGo:
#search and save images using duckduckgo (ddg)
searches = 'pangolin', 'armadillo'
path = Path('pangolin_or_not')
if not path.exists():
    path.mkdir(exist_ok=True)
    for o in searches:
        dest = (path/o)
        dest.mkdir(exist_ok=True)
        results = search_images_ddg(f'{o} photo')
        download_images(dest, urls=results[:200])
        resize_images(dest, max_size=400, dest=dest)
Some warnings/errors will show up in the output of the above cell, but they can be ignored.
Remove any failed downloads or files that aren’t valid images:
#remove any bad images (images that can't be opened)
failed = verify_images(get_image_files(path))
failed.map(Path.unlink);
failed
Some warnings/errors will show up in the output of the above cell, but they can be ignored.
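After the cleanup it can be worth checking how many images are left for each animal, since a very lopsided dataset tends to hurt accuracy. The helper below is only a sketch and not part of fastai: `count_images_per_class` and the list of file extensions are names and assumptions made up for this example.

```python
from pathlib import Path

# Extensions treated as images here; this set is an assumption for the sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}

def count_images_per_class(root):
    """Return {folder_name: image_count} for each subfolder of root."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            # Count only regular files whose extension looks like an image.
            counts[folder.name] = sum(
                1 for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts

# In the notebook, after the cells above have run:
# print(count_images_per_class('pangolin_or_not'))
```

If one folder ends up with far fewer images than the other, re-running the download with a different search phrase is one option.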
Create the fastai datablock, load in the downloaded pictures, and display 9 of them in the cell output.
#create your fastai datablock
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path)
dls.show_batch(max_n=9)
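Two of the DataBlock arguments do most of the work here: `get_y=parent_label` takes each image's label from the name of the folder it sits in, and `RandomSplitter(valid_pct=0.2, seed=42)` holds back 20% of the images for validation. The plain-Python sketch below illustrates both ideas. It is not fastai's implementation (fastai shuffles with its own random generator, so the actual split will differ), and `label_from_parent` and `random_split` are hypothetical names used only for this example.

```python
import random
from pathlib import Path

def label_from_parent(file_path):
    """Use the name of the containing folder as the label."""
    return Path(file_path).parent.name

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out valid_pct of them."""
    rng = random.Random(seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in indices[:cut]]
    train = [items[i] for i in indices[cut:]]
    return train, valid

# Made-up file names that mirror the folder layout created earlier.
files = (
    [f"pangolin_or_not/pangolin/{i}.jpg" for i in range(5)]
    + [f"pangolin_or_not/armadillo/{i}.jpg" for i in range(5)]
)
train, valid = random_split(files)
print(len(train), len(valid))       # 8 2
print(label_from_parent(files[0]))  # pangolin
```

Fixing the seed means the same images land in the validation set every time the cell is re-run, which keeps the error rate comparable between runs.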
Train Your Vision Model
In this case, we’re using resnet18 as our base model and training for only one epoch. Increasing the number of epochs and/or changing the base model can improve model accuracy.
#train and track your model using mlflow and fastai. For test/demo purposes, we'll only do 1 epoch of training
import mlflow.fastai
from mlflow import MlflowClient

def print_auto_logged_info(r):
    tags = {k: v for k, v in r.data.tags.items() if not k.startswith("mlflow.")}
    artifacts = [f.path for f in MlflowClient().list_artifacts(r.info.run_id, "model")]
    print(f"run_id: {r.info.run_id}")
    print(f"artifacts: {artifacts}")
    print(f"params: {r.data.params}")
    print(f"metrics: {r.data.metrics}")
    print(f"tags: {tags}")

def main(epochs=1):
    model = vision_learner(dls, resnet18, metrics=error_rate)
    # Enable auto logging
    mlflow.fastai.autolog()
    # Start MLflow session
    with mlflow.start_run() as run:
        #model.fit(epochs, learning_rate)
        model.fine_tune(epochs)
    # fetch the auto logged parameters, metrics, and artifacts
    print_auto_logged_info(mlflow.get_run(run_id=run.info.run_id))

main()
After running all the above cells, you should see something like this as the output of the final cell:
⚠️ The accuracy of this model is only about 86% ((1 - 0.138888) × 100 ≈ 86.1%). Adding more epochs or changing the base model will help improve this.
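The logged metric is an error rate, so accuracy is one minus that value, expressed as a percentage. The parentheses matter: subtract first, then multiply by 100. A tiny sketch of the arithmetic, where `accuracy_percent` is just a helper name made up for this example:

```python
def accuracy_percent(error_rate):
    """Convert a validation error rate (0..1) into accuracy as a percentage."""
    return (1 - error_rate) * 100

print(f"{accuracy_percent(0.138888):.1f}%")  # 86.1%
```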
You should also see a new Experiment in your workspace (you may need to refresh your browser window):
Step 3 - Save Your ML Model
Open the new experiment that’s been created in your workspace and click on “Save run as ML model”.
Click on “Select an existing ML model”, select the model you created and click Save.
Step 4 - Get the Model RunID
Open up the ML Model you created, expand Version 1, expand model, click on MLmodel and then copy the run id:
Step 5 - Load and Predict
Create a new notebook in your workspace. Add the following code to your notebook.
Install and import required modules and download a single image of a creature - could be a pangolin or an armadillo - it’s up to you.
!pip install fastbook
import mlflow
import mlflow.fastai
from fastbook import *
#search and save an image of a pangolin - can be changed to armadillo in the line below if you want
urls = search_images_ddg('pangolin photos', max_images=1)
len(urls),urls[0]
#save image as creature.jpg
dest = Path('creature.jpg')
if not dest.exists(): download_url(urls[0], dest, show_progress=False)
im = Image.open(dest)
im.to_thumb(256, 256)
Load the ML model we trained and saved.
#load the model via the runID
run_id = "[[enter your runID here]]"
model = mlflow.fastai.load_model(f"runs:/{run_id}/model")
❗To be honest, I’m not sure if the above is the correct way to load a saved model in Fabric. The “Apply this version” code that Fabric can auto-create didn’t work for me, and the above does allow me to make predictions from a separate Fabric notebook, so I’m going with this for now. If you know the correct way to load a saved model, please do let me know.
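Whichever loading approach you end up using, the string passed to `load_model` here follows MLflow's `runs:/<run_id>/<artifact_path>` URI format. The sketch below builds that string and fails early if the placeholder was never replaced. `model_uri` is a hypothetical helper, not an MLflow function, and the run ID shown is made up.

```python
def model_uri(run_id, artifact_path="model"):
    """Build an MLflow 'runs:/<run_id>/<artifact_path>' URI."""
    run_id = run_id.strip()
    # Reject an empty value or the unedited [[...]] placeholder.
    if not run_id or "[" in run_id or "]" in run_id:
        raise ValueError("Replace the placeholder with the run ID copied in Step 4")
    return f"runs:/{run_id}/{artifact_path}"

# Hypothetical run ID, for illustration only.
print(model_uri("0123456789abcdef"))  # runs:/0123456789abcdef/model
```

In the notebook, the result could be passed straight to `mlflow.fastai.load_model(...)`.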
Run the predict function with the image we just downloaded to see if it’s a pangolin or an armadillo. Include the result variable to see how confident the model is with its prediction.
#use the loaded model to see if the image was a pangolin or armadillo
result = model.predict(PILImage.create('creature.jpg'))
print(f"It's a {result[0]}.")
print(result)
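For a classifier like this one, fastai's `predict` returns three items: the predicted label, the index of that label, and the probability for each class. The sketch below shows one way to turn that into a confidence figure. The values are made up so the example runs without a model, and `describe_prediction` is a hypothetical helper.

```python
def describe_prediction(result):
    """Turn a (label, class_index, probabilities) tuple into a sentence."""
    label, class_index, probabilities = result
    # The probability at the predicted index is the model's confidence.
    confidence = float(probabilities[int(class_index)])
    return f"It's a {label} ({confidence:.1%} confident)."

# Made-up values standing in for a real prediction; with fastai the last two
# items are tensors, which also support int(), float() and indexing.
example = ("pangolin", 1, [0.03, 0.97])
print(describe_prediction(example))  # It's a pangolin (97.0% confident).
```

With the real model, `describe_prediction(result)` should work on the `result` variable from the cell above, under the assumption that the classes are ordered as in the training folders.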
The result from the notebook should look like this:
Next Steps: Explore 'Fabric and Copilot' Recordings
Dive deeper into Microsoft Fabric and uncover the full spectrum of its capabilities. Explore how this powerful platform enables organizations to efficiently manage, analyze, and harness valuable data and insights.
By registering for the Copilot Virtual Briefing Sessions, you'll gain exclusive access to a wealth of information, including the recording of "Fabric and Copilot." Join Scott Sugar as he delves into the intricacies of Fabric, showcasing how Copilot can revolutionize your data analysis workflows and enhance your decision-making processes. Register now and unlock the true potential of Microsoft Fabric and elevate your data transformation journey to new heights.
Moreover, don't miss the exclusive Copilot Virtual Briefing Session every Thursday in July and August. These sessions are truly special, allowing you to delve into Copilot and Copilot for Microsoft 365 and experience persona-based demonstrations. Be sure to check your schedule, register for the sessions that align with your availability and interests, and gain access to the insightful recordings on Copilot for Security, AI Strategy for Businesses, and Copilot for Sales.
Conclusion
As we conclude this technical walkthrough, you’re now equipped with the knowledge to train and deploy a fastai vision model in Microsoft Fabric that can distinguish between pangolins and armadillos, or other creatures if you change the search terms.
This exploration of deep learning within Microsoft Fabric demonstrates that with the right tools and guidance, creating precise and efficient models is attainable. We hope this blog has illuminated the path for your own projects and inspired you to leverage fastai and Microsoft Fabric for your deep learning endeavors.
Tags: Data & Analytics
July 26, 2024