Commit 043c952

feat!: add File entity support and RO-Crate metadata endpoint
BREAKING CHANGE: Remove /entity/{id}/file/{fileId} endpoint in favour of /entity/{id}/file

- Add http://pcdm.org/models#File entity type to support individual files
- Remove /entity/{id}/file/{fileId} endpoint (breaking change)
- Add /entity/{id}/file endpoint for accessing File entity content
- Add /entity/{id}/crate endpoint to retrieve RO-Crate JSON-LD metadata
- Add InvalidEntityTypeError schema for entity type validation
- Update documentation with File entity examples and migration guide
- Remove http://schema.org/MediaObject from EntityType enum

The new /entity/{id}/file endpoint only works with File entities and returns a 400 error with INVALID_ENTITY_TYPE code for Collection or Object entities. The new /entity/{id}/crate endpoint works with all entity types and returns the complete RO-Crate metadata conforming to the RO-Crate specification.
1 parent f0b8a47 commit 043c952
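
For clients migrating off the removed endpoint, the URL change might look like the Python sketch below. It assumes that a File entity's ID is the parent entity ID joined to the old `fileId` with a `/`; the documentation examples in this commit suggest that pattern, but the commit does not state it as a rule. The helper names `old_file_url` and `new_file_url` are hypothetical, and the base URL is the one used in the docs examples.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://data.ldaca.edu.au/api"  # base URL used in the docs examples


def old_file_url(entity_id: str, file_id: str) -> str:
    """URL shape of the removed /entity/{id}/file/{fileId} endpoint."""
    return f"{BASE_URL}/entity/{quote(entity_id, safe='')}/file/{file_id}"


def new_file_url(file_entity_id: str, **query: str) -> str:
    """URL shape of the new /entity/{id}/file endpoint for a File entity."""
    url = f"{BASE_URL}/entity/{quote(file_entity_id, safe='')}/file"
    return f"{url}?{urlencode(query)}" if query else url


# Assumption: the File entity ID is "<parent entity ID>/<old fileId>".
parent_id = "https://catalog.paradisec.org.au/repository/LRB/001"
file_entity_id = f"{parent_id}/recording.wav"

print(old_file_url(parent_id, "recording.wav"))
print(new_file_url(file_entity_id))
print(new_file_url(file_entity_id, disposition="attachment", filename="my-recording.wav"))
```

The whole File entity ID is percent-encoded as a single path segment (`safe=''` also encodes `/`), which matches the encoded IDs in the curl examples.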

4 files changed: 188 additions, 39 deletions

docs/getting-started/overview.md

Lines changed: 24 additions & 0 deletions
@@ -99,6 +99,30 @@ curl -X POST https://data.ldaca.edu.au/api/search \
 }'
 ```

+### Access File Content
+
+For File entities, you can directly access the file content:
+
+```bash
+curl https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Ffile.wav/file
+```
+
+This endpoint supports:
+- Content disposition (inline or attachment)
+- Custom filenames
+- HTTP range requests for partial content
+- Redirects to file storage locations
+
+### Get RO-Crate Metadata
+
+Retrieve the raw RO-Crate JSON-LD metadata for any entity:
+
+```bash
+curl https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/crate
+```
+
+This returns the complete RO-Crate metadata conforming to the RO-Crate specification.
+
 ## Understanding Responses

 ### Success Responses
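
The range-request support added to overview.md can be used to fetch a file in chunks. The sketch below only computes the `Range` header for each chunk and makes no network calls. It assumes the caller already knows the total file size, since this commit does not describe how the API reports it; `range_headers` is a hypothetical helper name.

```python
from typing import Iterator


def range_headers(total_size: int, chunk_size: int) -> Iterator[dict[str, str]]:
    """Yield one Range header per chunk; HTTP byte ranges are inclusive on both ends."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1
        yield {"Range": f"bytes={start}-{end}"}


# A 2500-byte file in 1 KB chunks: two full chunks and a shorter final one.
for headers in range_headers(total_size=2500, chunk_size=1024):
    print(headers["Range"])
```

Each yielded dictionary can be passed as request headers to the `/entity/{id}/file` endpoint; the first chunk corresponds to the `bytes=0-1023` example in the use-cases guide.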

docs/getting-started/use-cases.md

Lines changed: 66 additions & 14 deletions
@@ -96,31 +96,31 @@ curl -X POST https://data.ldaca.edu.au/api/search \
 }'
 ```

-## 3. Downloading Files from Entities
+## 3. Working with File Entities

-### Getting File Information
+### Accessing File Content

-First, get entity details to see available files:
+File entities (entityType: `http://pcdm.org/models#File`) can be accessed directly:

 ```bash
-curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001"
+# Direct file download
+curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav/file" -o recording.wav
 ```

-### Downloading Files
+### Download as Attachment

-Download a specific file:
+Force download with a custom filename:

 ```bash
-# Direct download
-wget "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/file/recording.wav"
+curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav/file?disposition=attachment&filename=my-recording.wav"
 ```

-### Getting Download URLs
+### Getting File Location

-Instead of direct download, get the file location:
+Get the file location without downloading:

 ```bash
-curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/file/recording.wav?noRedirect=true"
+curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav/file?noRedirect=true"
 ```

 **Response:**
@@ -131,22 +131,74 @@ curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.a
 }
 ```

-## 4. Paginating Through Large Result Sets
+### Partial Content Download
+
+Use HTTP range requests for streaming or resuming downloads:
+
+```bash
+# Download first 1KB
+curl -H "Range: bytes=0-1023" \
+  "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav/file"
+```
+
+## 4. Retrieving RO-Crate Metadata
+
+### Get Complete RO-Crate JSON-LD
+
+Access the raw RO-Crate metadata for any entity:
+
+```bash
+curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/crate"
+```
+
+**Response:**
+
+```json
+{
+  "@context": "https://w3id.org/ro/crate/1.1/context",
+  "@graph": [
+    {
+      "@id": "ro-crate-metadata.json",
+      "@type": "CreativeWork",
+      "conformsTo": {
+        "@id": "https://w3id.org/ro/crate/1.1"
+      },
+      "about": {
+        "@id": "./"
+      }
+    },
+    {
+      "@id": "./",
+      "@type": "Dataset",
+      "name": "Recordings of West Alor languages",
+      "description": "A compilation of recordings featuring various West Alor languages"
+    }
+  ]
+}
+```
+
+This is useful for:
+- Validating RO-Crate compliance
+- Accessing extended metadata not exposed in the entity API
+- Archival and preservation workflows
+- Integration with RO-Crate tools
+
+## 5. Paginating Through Large Result Sets

 ### Basic Pagination

 ```bash
 # First page
 curl "https://data.ldaca.edu.au/api/entities?limit=100&offset=0"

-# Second page
+# Second page
 curl "https://data.ldaca.edu.au/api/entities?limit=100&offset=100"

 # Third page
 curl "https://data.ldaca.edu.au/api/entities?limit=100&offset=200"
 ```

-## 5. Working with Communication Modes
+## 6. Working with Communication Modes

 Language archives often categorise data by communication mode:
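
A client consuming the `/entity/{id}/crate` response would usually start by locating the root dataset. The Python sketch below does this by following the metadata descriptor's `about` reference, as laid out in RO-Crate 1.1. It runs against the sample response from the use-cases guide rather than a live request, and `find_root_dataset` is a hypothetical helper name.

```python
import json


def find_root_dataset(crate: dict) -> dict:
    """Return the root data entity that the metadata descriptor is `about`."""
    graph = {entity["@id"]: entity for entity in crate.get("@graph", [])}
    descriptor = graph.get("ro-crate-metadata.json")
    if descriptor is None:
        raise ValueError("no ro-crate-metadata.json descriptor in @graph")
    return graph[descriptor["about"]["@id"]]


# Sample body copied from the documented /crate response.
sample = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"}
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Recordings of West Alor languages",
      "description": "A compilation of recordings featuring various West Alor languages"
    }
  ]
}
""")

root = find_root_dataset(sample)
print(root["name"])  # prints: Recordings of West Alor languages
```

Indexing `@graph` by `@id` first keeps the lookup independent of the order in which entities appear in the graph.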

docs/intro.md

Lines changed: 6 additions & 4 deletions
@@ -54,8 +54,9 @@ engines.
 ### 📊 **Rich Metadata**

 - Complete RO-Crate entity information
-- Hierarchical collection browsing
-- File access and download
+- Hierarchical collection browsing (Collections, Objects, and Files)
+- File entity access and download
+- Access to raw RO-Crate metadata
 - Conformance to LDAC profiles

 ### 🚀 **Developer Friendly**
@@ -78,9 +79,10 @@ The RO-Crate API is already in use by several major research data repositories:
 ### Entity Management

 - List entities with filtering and pagination
-- Retrieve detailed entity information
+- Retrieve detailed entity information for Collections, Objects, and Files
 - Navigate collection hierarchies
-- Access files and media
+- Access file content directly through File entities
+- Retrieve raw RO-Crate JSON-LD metadata

 ### Advanced Search

openapi.yaml

Lines changed: 92 additions & 21 deletions
@@ -179,33 +179,26 @@ paths:
         "500":
           $ref: "#/components/responses/InternalServerErrorResponse"

-  /entity/{id}/file/{fileId}:
+  /entity/{id}/file:
     get:
       tags:
         - entities
-      summary: Get a file
+      summary: Get file content
       description: |
         ### File Access
-        Retrieve an individual file that is part of a given entity's RO-Crate. The file can be returned inline (e.g., displayed in the browser) or as an attachment for download (e.g., prompting a save dialog), based on the disposition.
-        The API can serve the file or redirect to the location of the file
-      operationId: getEntityOpen
+        Retrieve the file content for an entity of type File. The file can be returned inline (e.g., displayed in the browser) or as an attachment for download (e.g., prompting a save dialog), based on the disposition parameter.
+        The API can serve the file directly or redirect to the location of the file.
+
+        This endpoint only works for entities with entityType `http://pcdm.org/models#File`. For Collection or Object entities, a 400 error will be returned.
+      operationId: getEntityFile
       parameters:
         - name: id
           in: path
           required: true
-          description: The RO-Crate entity ID to which the file belongs.
-          example: https://catalog.paradisec.org.au/repository/LRB/001
+          description: The RO-Crate entity ID of the File entity.
+          example: https://catalog.paradisec.org.au/repository/LRB/001/file.wav
           schema:
             $ref: "#/components/schemas/Id"
-        - name: fileId
-          in: path
-          required: true
-          description: The path or identifier of the file within the entity’s crate.
-          example: filename.wav
-          schema:
-            type: string
-            maxLength: 255
-            pattern: '^[a-zA-Z0-9._\-/\s]+$'
         - name: disposition
           in: query
           description: The HTTP Content-Disposition for how the file should be handled by the client.
@@ -288,17 +281,17 @@ paths:
         "302":
           description: Redirects to file location
         "400":
-          description: Bad Request - Invalid parameters
+          description: Bad Request - Invalid parameters or entity is not a File type
           content:
             application/json:
               schema:
-                $ref: "#/components/schemas/ValidationError"
+                $ref: "#/components/schemas/InvalidEntityTypeError"
         "401":
           $ref: "#/components/responses/UnauthorizedResponse"
         "403":
           $ref: "#/components/responses/ForbiddenResponse"
         "404":
-          description: Entity or file not found
+          description: Entity not found
           content:
             application/json:
               schema:
@@ -321,6 +314,60 @@ paths:
         "500":
           $ref: "#/components/responses/InternalServerErrorResponse"

+  /entity/{id}/crate:
+    get:
+      tags:
+        - entities
+      summary: Get RO-Crate metadata
+      description: |
+        ### RO-Crate Metadata
+        Retrieve the complete RO-Crate JSON-LD metadata for an entity. This returns the raw RO-Crate representation, which includes all metadata conforming to the RO-Crate specification.
+
+        This endpoint works for any entity type (Collection, Object, or File) and returns the associated ro-crate-metadata.json content.
+      operationId: getEntityCrate
+      parameters:
+        - name: id
+          in: path
+          required: true
+          description: The RO-Crate entity ID.
+          example: https://catalog.paradisec.org.au/repository/LRB/001
+          schema:
+            $ref: "#/components/schemas/Id"
+      responses:
+        "200":
+          description: Returns the RO-Crate JSON-LD metadata
+          content:
+            application/ld+json:
+              schema:
+                type: object
+                description: RO-Crate metadata conforming to the RO-Crate specification
+                example:
+                  "@context": "https://w3id.org/ro/crate/1.1/context"
+                  "@graph":
+                    - "@id": "ro-crate-metadata.json"
+                      "@type": "CreativeWork"
+                      conformsTo:
+                        "@id": "https://w3id.org/ro/crate/1.1"
+                      about:
+                        "@id": "./"
+                    - "@id": "./"
+                      "@type": "Dataset"
+                      name: "Recordings of West Alor languages"
+        "401":
+          $ref: "#/components/responses/UnauthorizedResponse"
+        "403":
+          $ref: "#/components/responses/ForbiddenResponse"
+        "404":
+          description: Entity not found
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/NotFoundError"
+        "429":
+          $ref: "#/components/responses/RateLimitResponse"
+        "500":
+          $ref: "#/components/responses/InternalServerErrorResponse"
+
   /search:
     post:
       tags:
@@ -665,9 +712,9 @@ components:
       enum:
         - http://pcdm.org/models#Collection
         - http://pcdm.org/models#Object
-        - http://schema.org/MediaObject
+        - http://pcdm.org/models#File
         - http://schema.org/Person
-      description: An enumerated type describing the nature of the entity, such as a collection or object
+      description: An enumerated type describing the nature of the entity, such as a collection, object, or file
     Order:
       type: string
       enum:
@@ -841,6 +888,30 @@ components:
               message:
                 example: The requested entity was not found

+    InvalidEntityTypeError:
+      allOf:
+        - $ref: "#/components/schemas/ErrorResponse"
+        - type: object
+          properties:
+            error:
+              type: object
+              properties:
+                code:
+                  enum: [INVALID_ENTITY_TYPE]
+                message:
+                  example: This operation is only valid for File entities
+                details:
+                  type: object
+                  properties:
+                    entityType:
+                      type: string
+                      description: The actual entity type of the requested entity
+                      example: http://pcdm.org/models#Collection
+                    expectedType:
+                      type: string
+                      description: The expected entity type for this operation
+                      example: http://pcdm.org/models#File
+
     RateLimitError:
       allOf:
         - $ref: "#/components/schemas/ErrorResponse"
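
Client code calling `/entity/{id}/file` may want to tell the new `INVALID_ENTITY_TYPE` error apart from other failures. The Python sketch below assumes the error body wraps `code`, `message`, and `details` in an `error` object, as the `InvalidEntityTypeError` schema indicates. The base `ErrorResponse` schema is not shown in this diff, so any other fields are unknown. `describe_file_error` is a hypothetical helper, and the sample payload is assembled from the schema's example values.

```python
def describe_file_error(status: int, body: dict) -> str:
    """Turn an error response from /entity/{id}/file into a readable message."""
    error = body.get("error", {})
    if status == 400 and error.get("code") == "INVALID_ENTITY_TYPE":
        details = error.get("details", {})
        return (
            f"Not a File entity: got {details.get('entityType', 'unknown')}, "
            f"expected {details.get('expectedType', 'unknown')}"
        )
    # Any other error: fall back to the server-provided message, if present.
    return f"HTTP {status}: {error.get('message', 'no message')}"


# Sample payload built from the example values in the schema above.
sample_body = {
    "error": {
        "code": "INVALID_ENTITY_TYPE",
        "message": "This operation is only valid for File entities",
        "details": {
            "entityType": "http://pcdm.org/models#Collection",
            "expectedType": "http://pcdm.org/models#File",
        },
    }
}

print(describe_file_error(400, sample_body))
```

Because the 400 response also covers invalid parameters, the sketch checks `code` rather than relying on the status alone.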
