MarkTechPost@AI · March 28, 06:25
A Code Implementation of Monocular Depth Estimation Using Intel MiDaS Open Source Model on Google Colab with PyTorch and OpenCV

This article is a tutorial on monocular depth estimation in computer vision: using Intel's MiDaS model on Google Colab with PyTorch, OpenCV, and related libraries to predict scene depth from a single RGB image. It walks through installing the libraries, cloning the model repository, loading the model and setting it to evaluation mode, uploading and preprocessing an image, predicting depth, and visualizing the results.

🎉 Monocular depth estimation with Intel's MiDaS model has a wide range of applications.

💻 The workflow runs on Google Colab, processing images with PyTorch and related libraries.

📁 Clone the model repository and set the compute device, then load the model and switch it to evaluation mode.

📷 Upload an image; after preprocessing, the model predicts its depth and the result is visualized.

Monocular depth estimation involves predicting scene depth from a single RGB image, a fundamental task in computer vision with wide-ranging applications, including augmented reality, robotics, and 3D scene understanding. In this tutorial, we implement Intel's MiDaS, a state-of-the-art monocular depth estimation model; we use its DPT_Large variant, which is built on a vision-transformer (Dense Prediction Transformer) backbone. Leveraging Google Colab as the compute platform, along with PyTorch, OpenCV, and Matplotlib, this tutorial enables you to upload your own image and easily visualize the corresponding depth map.

!pip install -q timm opencv-python matplotlib

First, we install the necessary Python libraries—timm for model support, opencv-python for image processing, and matplotlib for visualizing the depth maps.

!git clone https://github.com/isl-org/MiDaS.git
%cd MiDaS

Then, we clone the official Intel MiDaS repository from GitHub and navigate into its directory to access the model code and transformation utilities.
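
In Colab, the %cd MiDaS step makes the repository's midas package importable because the notebook's working directory sits on Python's module search path. If the "from midas ..." imports below still fail, a minimal fix, assuming the default /content clone location, is to append the repo to sys.path:

import sys
sys.path.append("/content/MiDaS")  # default Colab clone path; adjust if you cloned elsewhere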

import torch
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from torchvision.transforms import Compose
from google.colab import files
from midas.dpt_depth import DPTDepthModel
from midas.transforms import Resize, NormalizeImage, PrepareForNet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

We import all the necessary libraries and MiDaS components required for loading the model, preprocessing images, handling uploads, and visualizing depth predictions. Then we set the computation device to GPU (CUDA) if available; otherwise, it defaults to CPU, ensuring system compatibility.
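
As a quick optional sanity check, you can print which device was selected; on a GPU runtime this should report a CUDA device:

print(f"Using device: {device}")
if device.type == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))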

model = torch.hub.load("intel-isl/MiDaS", "DPT_Large", pretrained=True, force_reload=True)
model = model.to(device)
model.eval()

Here, we download the pretrained MiDaS DPT_Large model from Intel’s torch.hub, move it to the selected device (CPU or GPU), and set it to evaluation mode for inference.
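
DPT_Large is the most accurate but also the heaviest MiDaS variant. The same torch.hub entry point publishes lighter models as well; a sketch of the swap, using the variant names from the MiDaS hubconf:

# Lighter alternatives from the same hub repository:
# model = torch.hub.load("intel-isl/MiDaS", "DPT_Hybrid").to(device).eval()   # accuracy/speed trade-off
# model = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").to(device).eval()  # fastest, lowest memory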

transform = Compose([
    Resize(384, 384, resize_target=None, keep_aspect_ratio=True, ensure_multiple_of=32, resize_method="upper_bound"),
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    PrepareForNet(),
])

We define MiDaS’s image preprocessing pipeline, which resizes the input image, normalizes its pixel values, and formats it appropriately for model inference.
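
As an aside, the MiDaS hub repository also ships ready-made transforms, so you could skip importing them from the cloned source. Note the different calling convention: unlike the Compose pipeline above, these take the raw 0-255 RGB array directly and return an already-batched tensor:

midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform_hub = midas_transforms.dpt_transform      # matches DPT_Large / DPT_Hybrid
# input_batch = transform_hub(img)                  # yields a (1, 3, H, W) tensor ready for model()
# transform_hub = midas_transforms.small_transform  # use this one with MiDaS_small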

uploaded = files.upload()
for filename in uploaded:
    img = cv2.imread(filename)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0  # NormalizeImage expects floats in [0, 1]
    break

We allow the user to upload an image in Colab, read it with OpenCV, convert it from BGR to RGB for accurate color representation, and scale its pixel values to the [0, 1] range that MiDaS’s NormalizeImage transform expects.
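
If you run this outside Colab, where google.colab.files is unavailable, the same preprocessing works on a local file. A minimal sketch with a hypothetical path:

img = cv2.imread("your_image.jpg")  # hypothetical local path
assert img is not None, "Image not found; check the path"
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0  # same [0, 1] scaling as above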

img_input = transform({"image": img})["image"]
input_tensor = torch.from_numpy(img_input).unsqueeze(0).to(device)

with torch.no_grad():
    prediction = model(input_tensor)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

depth_map = prediction.cpu().numpy()

Now, we apply the preprocessing transform to the uploaded image, convert it to a tensor, perform depth prediction using the MiDaS model, resize the output to match the original image dimensions, and extract the final depth map as a NumPy array.
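
One caveat worth knowing: MiDaS predicts relative inverse depth, so the raw values are only meaningful up to an unknown scale and shift. To persist the result, a common step, sketched below, is to min-max normalize the map to 8-bit before writing it to disk:

depth_min, depth_max = depth_map.min(), depth_map.max()
depth_vis = (255 * (depth_map - depth_min) / (depth_max - depth_min + 1e-8)).astype(np.uint8)
cv2.imwrite("depth_map.png", depth_vis)  # grayscale PNG: brighter = closer (higher inverse depth)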

plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(depth_map, cmap='inferno')
plt.title("Depth Map")
plt.axis("off")
plt.tight_layout()
plt.show()

Finally, we create a side-by-side visualization of the original image and its corresponding depth map using Matplotlib. The depth map is displayed using the ‘inferno’ colormap for better contrast.
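
If you want the colorized map as a standalone image file rather than a Matplotlib figure, OpenCV's applyColorMap produces a similar effect; this sketch reuses the 8-bit depth_vis array from the normalization step above:

depth_color = cv2.applyColorMap(depth_vis, cv2.COLORMAP_INFERNO)  # returns a BGR image, ready for imwrite
cv2.imwrite("depth_map_color.png", depth_color)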

In conclusion, by completing this tutorial, we’ve successfully deployed Intel’s MiDaS model on Google Colab to perform monocular depth estimation from a single RGB image. Using PyTorch for model inference, OpenCV for image processing, and Matplotlib for visualization, we’ve built a robust pipeline that generates high-quality depth maps with minimal setup. This implementation is a strong foundation for further exploration, including video depth estimation, real-time applications, and integration into AR/VR systems.


Here is the Colab Notebook.
