This is the code repository for the paper "G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models".

MP16-Pro

You can download the images and metadata of MP16-Pro from Hugging Face: Jia-py/MP16-Pro

Data

IM2GPS3K: images | metadata

YFCC4K: images | metadata

Checkpoint

You can download the checkpoints and retrieval index from Jia-py/G3-checkpoint
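A minimal sketch of fetching both repositories programmatically with huggingface_hub (already in the install list below). It assumes that MP16-Pro is hosted as a dataset repository and the checkpoint as a model repository on the Hugging Face Hub; change repo_type if that turns out to be wrong. The function name and the target directory layout are hypothetical.

```python
from pathlib import Path

# Repository IDs as given in this README.
DATASET_REPO = "Jia-py/MP16-Pro"
CHECKPOINT_REPO = "Jia-py/G3-checkpoint"


def download_g3_assets(target_dir="./hf_assets"):
    """Download MP16-Pro and the G3 checkpoints into target_dir (hypothetical layout)."""
    # Imported lazily so that defining this helper does not require huggingface_hub.
    from huggingface_hub import snapshot_download

    target = Path(target_dir)
    # Assumption: MP16-Pro is a *dataset* repo, G3-checkpoint is a *model* repo.
    data_path = snapshot_download(
        repo_id=DATASET_REPO, repo_type="dataset", local_dir=target / "MP16-Pro"
    )
    ckpt_path = snapshot_download(
        repo_id=CHECKPOINT_REPO, repo_type="model", local_dir=target / "G3-checkpoint"
    )
    return data_path, ckpt_path
```

Calling download_g3_assets() returns the two local paths. The dataset is large, so check the available disk space first.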

Environment Setting

# tested on CUDA 12.0
conda create -n g3 python=3.9
conda activate g3
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate huggingface_hub pandas

If you run into issues with transformers, try pinning transformers==4.42.0.

Quick Use

Similarity between GPS and Images

import torch
import numpy as np
from PIL import Image
from utils.G3 import G3

image_path = 'xxx'
gps_data = [[10,20],[0,0]] # [[latitude1, longitude1],[latitude2, longitude2]]
device = 'cuda'

model = G3(device).to(device)
model.load_state_dict(torch.load('/checkpoints/g3.pth', map_location=device))
model.eval() # inference only
image = Image.open(image_path).convert('RGB')
image = model.vision_processor(images=image, return_tensors='pt')['pixel_values'].reshape(3,224,224)

images = image.reshape(1,3,224,224) # add a batch dimension (batch size 1)

images = images.to(device) # b,3,224,224
image_embeds = model.vision_projection_else_2(model.vision_projection(model.vision_model(images)[1]))
image_embeds = image_embeds / image_embeds.norm(p=2, dim=-1, keepdim=True) # b, 768

gps_batch = torch.tensor(gps_data).reshape(1,2,2)
gps_batch = gps_batch.to(device) # b,n,2; n is the number of candidates
gps_input = gps_batch.clone().detach()
b, c, _ = gps_input.shape
gps_input = gps_input.reshape(b*c, 2)
location_embeds = model.location_encoder(gps_input)
location_embeds = model.location_projection_else(location_embeds.reshape(b*c, -1))
location_embeds = location_embeds / location_embeds.norm(p=2, dim=-1, keepdim=True)
location_embeds = location_embeds.reshape(b, c, -1) #  b, c, 768

similarity = torch.matmul(image_embeds.unsqueeze(1), location_embeds.permute(0, 2, 1)) # b, 1, c
similarity = similarity.squeeze(1).cpu().detach().numpy()
max_idxs = np.argmax(similarity, axis=1) # index of the best-matching candidate for each image
print('similarity:', similarity)
# similarity: [[0.05875633 0.10544068]]
print('best candidate index:', max_idxs)

This code can be easily adapted to compute the similarity between text and images; see the G3 source code (imported above from utils.G3) for the corresponding text modules. The vision_projection_else_2 layer should be changed accordingly.
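The last steps of the quick-use snippet (L2 normalization, matrix product, argmax) amount to a cosine-similarity ranking of the candidates. The sketch below reproduces just that arithmetic in plain Python so it can be checked without a GPU or checkpoint. The two-dimensional vectors are made-up toy values, not real G3 embeddings, and the helper names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length, as done with .norm(p=2) in the snippet."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_scores(image_embed, candidate_embeds):
    """Dot product of unit vectors, i.e. cosine similarity per candidate."""
    q = l2_normalize(image_embed)
    return [
        sum(a * b for a, b in zip(q, l2_normalize(c)))
        for c in candidate_embeds
    ]


# Toy embeddings (illustrative only).
image_embed = [1.0, 0.0]
candidate_embeds = [[0.0, 2.0], [3.0, 0.0]]

scores = cosine_scores(image_embed, candidate_embeds)
best = max(range(len(scores)), key=lambda i: scores[i])  # plays the role of np.argmax
print(scores, best)
```

Because both sides are normalized, the candidate's original vector length does not matter; only its direction relative to the image embedding does.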

Running samples

Geo-alignment

You can run python run_G3.py to train the model.

Geo-diversification

First, you need to build the index file using python IndexSearch.py.

Parameters in IndexSearch.py

index name --> which model you want to use for embedding
dataset --> im2gps3k or yfcc4k
database --> default mp16

Next, construct the index for negative samples as well, by changing images_embeds to -1 * images_embeds.
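Why negating the embeddings yields negative samples can be seen in a small sketch. It assumes the retrieval ranks database entries by inner product with the query embedding; flipping the sign of the query then flips every score, so the "top" results become the least similar entries. The top_k helper and the toy vectors are illustrative and are not taken from IndexSearch.py.

```python
def top_k(query, database, k):
    """Return indices of the k database rows with the largest inner product."""
    scores = [sum(q * d for q, d in zip(query, row)) for row in database]
    order = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy database of three unit vectors and one query (illustrative values).
database = [[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]]
query = [1.0, 0.0]

positives = top_k(query, database, 2)                # most similar entries
negatives = top_k([-x for x in query], database, 2)  # least similar entries
print(positives, negatives)
```

Here the positive search returns rows 0 and 1, while the negated query returns rows 2 and 1, i.e. the entries furthest from the original query.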

Then, run llm_predict_hf.py or llm_predict.py to generate the LLM predictions.

After that, run aggregate_llm_predictions.py to aggregate the predictions.

Geo-verification

Run python IndexSearch.py --index=g3 --dataset=im2gps3k (or --dataset=yfcc4k) to verify the predictions and evaluate them.
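This README does not spell out how verification and evaluation work, so the following is only a sketch under stated assumptions: verification keeps the candidate coordinate with the highest image-GPS similarity, and evaluation measures the great-circle (haversine) error against the ground truth and reports the fraction of predictions within a distance threshold. The function names, the ground-truth point, and the thresholds are hypothetical; the candidates and scores reuse the toy values from the quick-use snippet.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def verify(candidates, scores):
    """Assumed verification rule: keep the candidate with the highest similarity."""
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


def accuracy_at(errors_km, threshold_km):
    """Fraction of predictions whose error is within threshold_km."""
    return sum(e <= threshold_km for e in errors_km) / len(errors_km)


candidates = [(10.0, 20.0), (0.0, 0.0)]   # (lat, lon), as in gps_data above
scores = [0.05875633, 0.10544068]         # similarity values printed above
ground_truth = (0.0, 1.0)                 # hypothetical true location

pred = verify(candidates, scores)
error_km = haversine_km(*pred, *ground_truth)
print(pred, round(error_km, 1))
```

With these toy inputs the second candidate is selected, and the error is roughly one degree of longitude at the equator, so it falls within a 200 km threshold but outside a 25 km one.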

Citation

@article{jia2024g3,
  title={G3: an effective and adaptive framework for worldwide geolocalization using large multi-modality models},
  author={Jia, Pengyue and Liu, Yiding and Li, Xiaopeng and Zhao, Xiangyu and Wang, Yuhao and Du, Yantong and Han, Xiao and Wei, Xuetao and Wang, Shuaiqiang and Yin, Dawei},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={53198--53221},
  year={2024}
}
