docs: Wiki structure for tutorials and AI API configuration #36
base: main
… of new apps and an increased field limit
…s, language-tagged texts, and flexible date handling
…ón of references
…s and expansion of supported types (confproc, full_text, etc.)
…istas for search, utilities, and Wagtail hooks
…o, utilities, and Wagtail hooks
…ones OMML to MathML
…es of inference, tasks, and Wagtail hooks
…s for data processing
…ial.py and removal of intermediate migrations
…n of Django and translation of verbose_name to English
Fixes the exception type so that a 404 is returned when the record does not exist.
…nlaces Reduces log noise and keeps the function focused on its return value.
Improves readability and follows good error-handling practices.
…a reference prompt. Quotes are added to text fields and commas/keys are fixed to avoid prompt parsing errors.
Allows translation of 'Mixed Citation' and 'Rating from 1 to 10'.
…eference status' (includes migrations)
- function_llama became LlamaInputSettings in llama.py
- generic_llama became llama.py with LlamaService
Pull Request Overview
This pull request introduces a document markup and XML generation system for processing DOCX files and managing references. The PR adds two new applications (markup_doc and model_ai) with AI-powered metadata extraction, reference parsing, and XML/HTML generation capabilities. Key changes include renaming the menu identifier from xml_manager to xml_files, consolidating the xml_manager admin group, adding new dependencies for AI processing (Google Generative AI, python-docx, langid), and implementing a complete workflow for converting DOCX documents to SciELO-compliant XML.
Key Changes
- Added markup_doc app with DOCX processing, AI-based labeling, XML generation, and SciELO package creation
- Added model_ai app for managing LLM models (Llama/Gemini) with download capabilities
- Renamed XML manager menu from xml_manager to xml_files and consolidated menu structure
- Added new package dependencies: google-generativeai, python-docx, and langid
Reviewed Changes
Copilot reviewed 59 out of 70 changed files in this pull request and generated 91 comments.
| File | Description |
|---|---|
| requirements/base.txt | Added AI processing dependencies (google-generativeai, langid, python-docx) |
| xml_manager/wagtail_hooks.py | Renamed menu identifiers and consolidated menu structure for XML management |
| reference/wagtail_hooks.py | Refactored import statements and renamed admin class with menu order adjustment |
| reference/models.py | Added ReferenceStatus enum and replaced estatus with status field |
| reference/marker.py | Updated imports to use new model_ai.llama module |
| reference/data_utils.py | Enhanced error handling and updated to use ReferenceStatus enum |
| model_ai/* | New app for managing AI models with Llama/Gemini integration |
| markup_doc/* | New app for DOCX processing, metadata extraction, and XML generation |
| markuplib/* | New library for DOCX processing and OMML to MathML conversion |
Comments suppressed due to low confidence (1)
markup_doc/sync_api.py:108
- Except block directly handles BaseException.
    'uri': {'type': 'string'},
    'access_date': {'type': 'string'},
    'version': {'type': 'string'},
    "full_text": {"type": "integer"},
Copilot AI, Oct 30, 2025
The type for 'full_text' should be 'string', not 'integer'. This field contains textual reference content, not numeric data.
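A minimal sketch of the corrected fragment, assuming the four keys in the excerpt are entries of a JSON-Schema-style `properties` mapping. The names `reference_properties`, `JSON_TYPES`, and `matches_schema` are hypothetical and exist only to show that a textual `full_text` value passes once the type is `string`.

```python
# Hypothetical fragment: only these four keys appear in the reviewed diff.
reference_properties = {
    "uri": {"type": "string"},
    "access_date": {"type": "string"},
    "version": {"type": "string"},
    "full_text": {"type": "string"},  # was "integer" in the reviewed diff
}

# Mapping from schema type names to Python types (illustrative subset).
JSON_TYPES = {"string": str, "integer": int}


def matches_schema(record, properties):
    """Return True when every present field has the type its schema entry names."""
    return all(
        isinstance(record[name], JSON_TYPES[spec["type"]])
        for name, spec in properties.items()
        if name in record
    )
```

With the original `integer` type, any record carrying the reference text in `full_text` would be rejected by a check of this kind.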
    # FIXME: Hardcoded model name
    model = genai.GenerativeModel('models/gemini-2.0-flash')
Copilot AI, Oct 30, 2025
The Gemini model name is hardcoded. Consider making this configurable through the LlamaModel database entry or environment variable to support different model versions and avoid requiring code changes for model updates.
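One possible shape for the environment-variable option, sketched under assumptions: the variable name `GEMINI_MODEL_NAME` and the helper `resolve_gemini_model_name` are hypothetical, and the default is the value hardcoded in the excerpt.

```python
import os

# Default taken from the reviewed diff; used when nothing is configured.
DEFAULT_GEMINI_MODEL = "models/gemini-2.0-flash"


def resolve_gemini_model_name(env=None):
    """Return the configured model name, falling back to the default.

    GEMINI_MODEL_NAME is an assumed variable name, not one defined by the project.
    """
    env = os.environ if env is None else env
    name = env.get("GEMINI_MODEL_NAME", "").strip()
    return name or DEFAULT_GEMINI_MODEL
```

The result would then be passed to `genai.GenerativeModel(...)` in place of the literal, so a model upgrade becomes a configuration change rather than a code change.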
    except:
        print('**ERROR url')
        print(url)
        url = None
Copilot AI, Oct 30, 2025
Bare except clause catches all exceptions including SystemExit and KeyboardInterrupt. Use except Exception: instead and consider logging the actual exception for debugging.
Replace print with logging and add a more descriptive error message.
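Both suggestions can be combined as in the sketch below. The body of the original `try` block is not visible in the excerpt, so the `urlparse` check and the function name `clean_url` are assumptions made purely for illustration.

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger(__name__)


def clean_url(url):
    """Return url when it looks like an http(s) URL, otherwise None."""
    try:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"unsupported or incomplete URL: {url!r}")
        return url
    except Exception:
        # Unlike a bare except, this lets SystemExit and KeyboardInterrupt
        # propagate; logger.exception records the message and the traceback.
        logger.exception("Could not validate reference URL %r", url)
        return None
```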
    except Exception:
        # if there is no match, leave it as is
        pass
Copilot AI, Oct 30, 2025
Silent exception handling without logging makes debugging difficult. Consider logging the exception to help diagnose image lookup failures.
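A sketch of the same fallback with the failure made visible, assuming the lookup is a dictionary access that raises `KeyError` when no image matches. `resolve_image` and its arguments are hypothetical names, and the real lookup may fail in other ways.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_image(name, images):
    """Return the stored path for name, or name unchanged when nothing matches."""
    try:
        return images[name]
    except KeyError:
        # Same "leave it as is" fallback, but the miss is now recorded.
        logger.warning("No image matched %r; leaving the reference unchanged", name)
        return name
```

Catching only the expected exception keeps the fallback while letting unrelated errors surface.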
    });

    document.addEventListener("DOMContentLoaded", function () {
        const journalInput = document.querySelector("#id_journal");
Copilot AI, Oct 30, 2025
Unused variable journalInput.
    }
    stream_data.append(obj.copy())

    for i, auth in enumerate(output['authors']):
Copilot AI, Oct 30, 2025
Nested for statement uses loop variable 'i' of enclosing for statement.
    }
    stream_data.append(obj.copy())

    for i, aff in enumerate(output['affiliations']):
Copilot AI, Oct 30, 2025
Nested for statement uses loop variable 'i' of enclosing for statement.
    else:
        break

    for i, val in enumerate(vals[1:], start=1):
Copilot AI, Oct 30, 2025
Nested for statement uses loop variable 'i' of enclosing for statement.
    and b.value.get('label') == '<kwd-group>'
    ]

    for i, val in enumerate(vals):
Copilot AI, Oct 30, 2025
Nested for statement uses loop variable 'i' of enclosing for statement.
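The loop-variable warnings above share one pattern. The sketch below uses hypothetical functions (not the project's code) to show why reusing `i` in an inner loop is risky: after the inner loop finishes, `i` no longer holds the outer position.

```python
def row_totals_shadowed(rows):
    """Buggy variant: the inner loop rebinds the outer loop variable."""
    totals = {}
    for i, row in enumerate(rows):
        for i, val in enumerate(row):  # rebinds the outer i
            pass
        totals[i] = sum(row)  # keyed by the last inner index, not the row index
    return totals


def row_totals(rows):
    """Fixed variant: each loop has its own index name."""
    totals = {}
    for row_idx, row in enumerate(rows):
        for col_idx, val in enumerate(row):
            pass
        totals[row_idx] = sum(row)
    return totals
```

For `[[1, 2, 3], [4, 5, 6]]` the shadowed variant returns `{2: 15}` because both rows are stored under the last inner index, while the fixed variant returns `{0: 6, 1: 15}`.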
    )

    # HTTP response
    with open(zip_path, "rb") as fp:
Copilot AI, Oct 30, 2025
File may not be closed if an exception is raised.
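One way to sidestep the concern, assuming the package is small enough to hold in memory: read the bytes inside the `with` block and build the response from them afterwards, so the handle is closed before any response code runs. `read_package_bytes` is a hypothetical helper and the excerpt does not show how the project builds its response.

```python
from pathlib import Path


def read_package_bytes(zip_path):
    """Return the file contents; the handle is closed even if read() raises."""
    with Path(zip_path).open("rb") as fp:
        return fp.read()
```

For large packages, a streaming response that takes ownership of the open file would be the alternative, at the cost of relying on the framework to close it.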
- Adds scielo_xml_tools.yml with new volume paths
- Moves volumes to the ../markup_data/ structure
- Fixes container names in the Makefile (markapi_local_*)
- Adds .ipython/ to .dockerignore
- Adds huggingface-hub to requirements/local.txt
- Updates .gitignore to ignore backups and temporary files
What does this PR do?
This PR improves the system's documentation and configuration:
Documentation structured via the Wiki:
AI API configuration:
Code standardization:
xml_manager/models.py
Where could the review start?
Wiki documentation:
Code files:
/.envs/.local/.django
LLAMA_ENABLED=True
/xml_manager/models.py
XMLDocument, XMLDocumentPDF, XMLDocumentHTML
create() reformatted
How could this be tested manually?
Documentation:
Code:
python manage.py test xml_manager
Any context you would like to provide?
Documentation:
What are the relevant tickets?
NA
References