
Code repository for paper: "G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models"



Applied-Machine-Learning-Lab/G3



MP16-Pro

You can download the images and metadata of MP16-Pro from Hugging Face: Jia-py/MP16-Pro

Data

IM2GPS3K: images | metadata

YFCC4K: images | metadata

Checkpoint

You can download the checkpoints and retrieval index from Jia-py/G3-checkpoint

Environment Setting

# tested on CUDA 12.0
conda create -n g3 python=3.9
conda activate g3
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate huggingface_hub pandas

If you run into issues with transformers, try pinning transformers==4.42.0.

Quick Use

Similarity between GPS and Images

import torch
import numpy as np
from PIL import Image
from utils.G3 import G3

image_path = 'xxx'  # path to your query image
gps_data = [[10, 20], [0, 0]]  # candidate locations: [[latitude1, longitude1], [latitude2, longitude2]]
device = 'cuda'

model = G3(device).to(device)
model.load_state_dict(torch.load('/checkpoints/g3.pth', map_location=device))
model.eval()  # inference mode (disables dropout etc.)

with torch.no_grad():
    # Preprocess the image and treat it as a batch of size 1
    image = Image.open(image_path).convert('RGB')
    images = model.vision_processor(images=image, return_tensors='pt')['pixel_values'].reshape(1, 3, 224, 224)
    images = images.to(device)  # b, 3, 224, 224

    # L2-normalized image embeddings
    image_embeds = model.vision_projection_else_2(model.vision_projection(model.vision_model(images)[1]))
    image_embeds = image_embeds / image_embeds.norm(p=2, dim=-1, keepdim=True)  # b, 768

    # L2-normalized GPS embeddings; n is the number of candidates per image
    gps_batch = torch.tensor(gps_data, dtype=torch.float32).reshape(1, -1, 2)
    gps_batch = gps_batch.to(device)  # b, n, 2
    b, n, _ = gps_batch.shape
    location_embeds = model.location_encoder(gps_batch.reshape(b * n, 2))
    location_embeds = model.location_projection_else(location_embeds.reshape(b * n, -1))
    location_embeds = location_embeds / location_embeds.norm(p=2, dim=-1, keepdim=True)
    location_embeds = location_embeds.reshape(b, n, -1)  # b, n, 768

    # Cosine similarity between each image and each of its candidate locations
    similarity = torch.matmul(image_embeds.unsqueeze(1), location_embeds.permute(0, 2, 1))  # b, 1, n
    similarity = similarity.squeeze(1).cpu().numpy()  # b, n

max_idxs = np.argmax(similarity, axis=1)  # index of the best-matching candidate for each image
print('similarity:', similarity)
# similarity: [[0.05875633 0.10544068]]
print('best candidate:', max_idxs)

This code can easily be adapted to compute the similarity between text and images; see the source code of the G3 class in utils/G3.py. In that case, the vision_projection_else_2 layer should be modified accordingly.

Running samples

  1. Geo-alignment

You can run python run_G3.py to train the model.

  2. Geo-diversification

First, you need to build the index file using python IndexSearch.py.

Parameters in IndexSearch.py

  • index: the model used to compute the embeddings (e.g. g3)
  • dataset: the query dataset, im2gps3k or yfcc4k
  • database: the retrieval database (default: mp16)
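Conceptually, building the index means embedding the database images and, for each query image, looking up the database entries whose embeddings are most similar. The sketch below shows that lookup as a brute-force inner-product search in plain Python, assuming L2-normalized embeddings so the dot product equals cosine similarity. The names `l2_normalize` and `top_k_neighbours` and the toy vectors are hypothetical; IndexSearch.py may use a dedicated vector-search library instead.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_neighbours(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Brute-force inner-product search: (index, score) pairs, best match first."""
    q = l2_normalize(query)
    scores = (
        (i, sum(a * b for a, b in zip(q, l2_normalize(row))))
        for i, row in enumerate(database)
    )
    # heapq.nlargest keeps only the k highest-scoring database entries.
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


# Toy database of three embeddings (hypothetical stand-ins for MP16 image embeddings).
database = [[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]]
query = [0.9, 0.1]
print([i for i, _ in top_k_neighbours(query, database, k=2)])  # [0, 1]
```

A real index over millions of MP16 images would replace the linear scan with an approximate or GPU-backed search, but the ranking it computes is the same.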

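Then, you also need to construct an index for negative samples by changing images_embeds to -1 * images_embeds. Negating the embeddings reverses the similarity ranking, so the search retrieves the least similar entries instead of the most similar ones.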
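Why negation yields negative samples can be checked on toy data: multiplying the query by -1 flips the sign of every inner-product score, so the top-k search returns the bottom-k entries. This is a minimal plain-Python sketch; `dot`, `top_k_indices`, and the vectors are hypothetical and not taken from IndexSearch.py.

```python
import heapq
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_indices(query: Sequence[float], database: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k database rows with the highest inner product with `query`."""
    return heapq.nlargest(k, range(len(database)), key=lambda i: dot(query, database[i]))


database = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [-1.0, 0.0]]
images_embeds = [1.0, 0.0]

positives = top_k_indices(images_embeds, database, k=2)
# Negating the query negates every score, so the "nearest" entries are now
# the ones least similar to the image.
negatives = top_k_indices([-x for x in images_embeds], database, k=2)
print(positives, negatives)  # [0, 1] [3, 2]
```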

Then, run llm_predict_hf.py or llm_predict.py to generate LLM predictions.

After that, run aggregate_llm_predictions.py to aggregate the predictions.

  3. Geo-verification

Run python IndexSearch.py --index=g3 --dataset=im2gps3k (or --dataset=yfcc4k) to verify the predictions and evaluate the results.
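Results on IM2GPS3K and YFCC4K are commonly reported as the fraction of images whose predicted coordinates lie within 1 km (street), 25 km (city), 200 km (region), 750 km (country), and 2500 km (continent) of the ground truth, measured by great-circle distance. The sketch below computes that metric with the haversine formula. It assumes a spherical Earth of radius 6371 km; the function names are hypothetical and not necessarily how IndexSearch.py implements its evaluation.

```python
import math
from typing import Dict, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)
THRESHOLDS_KM = (1, 25, 200, 750, 2500)  # street, city, region, country, continent


def haversine_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # min() guards against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at_thresholds(
    preds: Sequence[Tuple[float, float]],
    truths: Sequence[Tuple[float, float]],
) -> Dict[int, float]:
    """Fraction of predictions within each distance threshold of the ground truth."""
    dists = [haversine_km(p, t) for p, t in zip(preds, truths)]
    return {t: sum(d <= t for d in dists) / len(dists) for t in THRESHOLDS_KM}


preds = [(48.8566, 2.3522), (40.7128, -74.0060)]  # Paris, New York
truths = [(48.8566, 2.3522), (51.5074, -0.1278)]  # Paris, London
print(accuracy_at_thresholds(preds, truths))
# {1: 0.5, 25: 0.5, 200: 0.5, 750: 0.5, 2500: 0.5}
```

The first prediction is exact and the second is off by roughly 5,500 km, so half of the predictions fall within every threshold.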

Citation

@article{jia2024g3,
  title={G3: an effective and adaptive framework for worldwide geolocalization using large multi-modality models},
  author={Jia, Pengyue and Liu, Yiding and Li, Xiaopeng and Zhao, Xiangyu and Wang, Yuhao and Du, Yantong and Han, Xiao and Wei, Xuetao and Wang, Shuaiqiang and Yin, Dawei},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={53198--53221},
  year={2024}
}
