MarkTechPost@AI · January 22
Create Portrait Mode Effect with Segment Anything Model 2 (SAM2)

This article shows how to use open-source computer vision models, such as Meta's SAM2 and Intel ISL's MiDaS, to programmatically recreate the "portrait mode" effect found in smartphone cameras. The effect simulates the shallow depth of field of a DSLR camera by separating the main subject from the background and blurring the background. The article walks through each step: segmenting foreground and background with SAM2, computing a depth map with MiDaS, and applying depth-based blur. Code examples and tool dependencies are provided so readers can reproduce and extend the result.

🧰 Open-source tools: Meta's SAM2 model segments the subject and separates the foreground from the background; Intel ISL's MiDaS model computes a depth map, enabling depth-based blurring.

🖼️ Image-processing pipeline: first load the target image; then use SAM2 to select the subject and generate a binary mask; next generate a depth map with MiDaS and invert it; finally, apply an iterative Gaussian blur whose strength scales with depth.

✨ Compositing and refinement: the mask produced by SAM2 extracts a sharp foreground, which is combined with the blurred background to produce the portrait-mode effect; the article also suggests future improvements, such as refining subject edges with edge-detection algorithms and tuning the blur kernel size.

Have you ever admired how smartphone cameras isolate the main subject from the background, adding a subtle, depth-dependent blur behind it? This "portrait mode" effect gives photographs a professional look by simulating the shallow depth of field of DSLR cameras. In this tutorial, we'll recreate the effect programmatically using open-source computer vision models such as SAM2 from Meta and MiDaS from Intel ISL.

Tools and Frameworks

To build our pipeline, we’ll use:

    Segment Anything Model 2 (SAM2): to segment the object of interest and separate the foreground from the background.
    Depth estimation model (MiDaS): to compute a depth map, enabling depth-based blurring.
    Gaussian blur: to blur the background with an intensity that varies with depth.

Step 1: Setting Up the Environment

To get started, install the following dependencies:

pip install matplotlib samv2 pytest opencv-python timm pillow

Step 2: Loading a Target Image

Choose a picture to apply this effect and load it into Python using the Pillow library.

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

image_path = "<path to your image>.jpg"
img = Image.open(image_path)
img_array = np.array(img)

# Display the image
plt.imshow(img)
plt.axis("off")
plt.show()

Step 3: Initialize the SAM2

To initialize the model, download the pretrained checkpoint. SAM2 offers four variants based on performance and inference speed: tiny, small, base_plus, and large. In this tutorial, we’ll use tiny for faster inference.

Download the model checkpoint from: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_<model_type>.pt

Replace <model_type> with your desired model type.
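If you prefer to stay in Python, the tiny checkpoint used in the next step can be fetched with the standard library. This is only a minimal sketch; wget or curl work just as well, and the local filename simply matches what the next code block expects.

import urllib.request

# Download the "tiny" SAM2 checkpoint next to the script
# (the URL above with <model_type> replaced by "tiny").
checkpoint_url = "https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt"
urllib.request.urlretrieve(checkpoint_url, "sam2_hiera_tiny.pt")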

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks

model = build_sam2(
    variant_to_config_mapping["tiny"],
    "sam2_hiera_tiny.pt",
)
image_predictor = SAM2ImagePredictor(model)

Step 4: Feed Image into SAM and Select the Subject

Set the image in SAM and provide points that lie on the subject you want to isolate. SAM predicts a binary mask of the subject and background.

image_predictor.set_image(img_array)

input_point = np.array([[2500, 1200], [2500, 1500], [2500, 2000]])
input_label = np.array([1, 1, 1])

masks, scores, logits = image_predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=None,
    multimask_output=True,
)
output_mask = show_masks(img_array, masks, scores)
sorted_ind = np.argsort(scores)[::-1]
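The point coordinates above are in (x, y) pixel units and were chosen for this article's example photo; for your own image, pick points that actually fall on the subject before calling predict. As a rough, purely hypothetical starting guess for a centred subject:

# A single positive click at the image centre (a starting guess only;
# move it to wherever your subject actually is).
h, w = img_array.shape[:2]
input_point = np.array([[w // 2, h // 2]])
input_label = np.array([1])  # 1 = foreground point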

Step 5: Initialize the Depth Estimation Model

For depth estimation, we use MiDaS by Intel ISL. As with SAM2, you can choose among variants that trade accuracy for speed. Note: MiDaS predicts relative inverse depth, meaning larger values correspond to closer objects. We'll invert it in the next step so that larger values mean farther away, which is more intuitive for blurring.

import torch
import torchvision.transforms as transforms

model_type = "DPT_Large"  # MiDaS v3 - Large (highest accuracy)

# Load MiDaS model
model = torch.hub.load("intel-isl/MiDaS", model_type)
model.eval()

# Preprocess the image
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform
input_batch = transform(img_array)

# Perform depth estimation
with torch.no_grad():
    prediction = model(input_batch)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img_array.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

prediction = prediction.cpu().numpy()

# Visualize the depth map
plt.imshow(prediction, cmap="plasma")
plt.colorbar(label="Relative Depth")
plt.title("Depth Map Visualization")
plt.show()
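Optionally, if a CUDA GPU is available, inference is noticeably faster when the model and the input batch are moved to it before running the prediction above. This is a small optional tweak; everything else in the tutorial works unchanged on CPU.

# Optional: run MiDaS on the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
input_batch = input_batch.to(device)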

Step 6: Apply Depth-Based Gaussian Blur

Here we apply the depth-based blurring with an iterative Gaussian blur. Instead of applying one very large kernel in a single pass, we apply a smaller kernel repeatedly, so pixels that are farther away (after inverting the depth map) accumulate progressively more blur.

import cv2

def apply_depth_based_blur_iterative(image, depth_map, base_kernel_size=7, max_repeats=10):
    if base_kernel_size % 2 == 0:
        base_kernel_size += 1

    # Invert depth map
    depth_map = np.max(depth_map) - depth_map

    # Normalize depth to range [0, max_repeats]
    depth_normalized = cv2.normalize(depth_map, None, 0, max_repeats, cv2.NORM_MINMAX).astype(np.uint8)

    blurred_image = image.copy()
    for repeat in range(1, max_repeats + 1):
        mask = (depth_normalized == repeat)
        if np.any(mask):
            blurred_temp = cv2.GaussianBlur(blurred_image, (base_kernel_size, base_kernel_size), 0)
            for c in range(image.shape[2]):
                blurred_image[..., c][mask] = blurred_temp[..., c][mask]

    return blurred_image

blurred_image = apply_depth_based_blur_iterative(img_array, prediction, base_kernel_size=35, max_repeats=20)

# Visualize the result
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(blurred_image)
plt.title("Depth-based Blurred Image")
plt.axis("off")
plt.show()

Step 7: Combine Foreground and Background

Finally, use the SAM mask to extract the sharp foreground and combine it with the blurred background.

def combine_foreground_background(foreground, background, mask):
    if mask.ndim == 2:
        mask = np.expand_dims(mask, axis=-1)
    return np.where(mask, foreground, background)

mask = masks[sorted_ind[0]].astype(np.uint8)
mask = cv2.resize(mask, (img_array.shape[1], img_array.shape[0]))

foreground = img_array
background = blurred_image
combined_image = combine_foreground_background(foreground, background, mask)

plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(combined_image)
plt.title("Final Portrait Mode Effect")
plt.axis("off")
plt.show()
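To keep the result rather than just display it, Pillow (already imported in Step 2) can write the combined array back to disk; the output filename below is only an example.

# Save the composited portrait-mode image.
Image.fromarray(combined_image.astype(np.uint8)).save("portrait_mode_result.jpg")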

Conclusion

With just a few tools, we’ve recreated the portrait mode effect programmatically. This technique can be extended for photo editing applications, simulating camera effects, or creative projects.

Future Enhancements:

    Use edge detection algorithms for better refinement of subject edges (see the mask-feathering sketch below for a related, simpler idea).
    Experiment with kernel sizes to enhance the blur effect.
    Create a user interface to upload images and select subjects dynamically.
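As a lighter-weight alternative to full edge detection, one way to soften the hard mask boundary is to feather the SAM2 mask and alpha-blend the sharp foreground over the blurred background. The sketch below only illustrates that idea: it reuses mask, img_array, and blurred_image from the steps above, and feather_px is an arbitrary choice.

import cv2
import numpy as np

def feathered_composite(foreground, background, binary_mask, feather_px=15):
    # Blur the binary mask so the subject boundary fades smoothly into the background.
    if feather_px % 2 == 0:
        feather_px += 1
    alpha = cv2.GaussianBlur(binary_mask.astype(np.float32), (feather_px, feather_px), 0)
    alpha = np.clip(alpha, 0.0, 1.0)[..., None]  # add a channel axis for broadcasting
    blended = alpha * foreground.astype(np.float32) + (1.0 - alpha) * background.astype(np.float32)
    return blended.astype(np.uint8)

smooth_result = feathered_composite(img_array, blurred_image, mask)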

Resources:

    Segment Anything Model 2 by Meta (https://github.com/facebookresearch/sam2)
    CPU-compatible implementation of SAM 2 (https://github.com/SauravMaheshkar/samv2/tree/main)
    MiDaS depth estimation model (https://pytorch.org/hub/intelisl_midas_v2/)

The post Create Portrait Mode Effect with Segment Anything Model 2 (SAM2) appeared first on MarkTechPost.
