
Commit e8645cd

Merge pull request #6 from Language-Research-Technology/refactor-rocrate-file
feat!: separate file access from entity endpoints and add RO-Crate metadata endpoint
2 parents 630dada + 2ce371d commit e8645cd


5 files changed, +564 -148 lines changed


docs/getting-started/authorization.md

Lines changed: 29 additions & 12 deletions
````diff
@@ -12,8 +12,11 @@ is accessible and redirect to enrollment URLs when access is restricted.
 
 ## Access Property Structure
 
-Each entity in the API includes an `access` object with the following
-structure:
+The API uses an `access` property to communicate access permissions. The structure differs between **Entities** and **Files**:
+
+### Entity Access Structure
+
+Entities (Collections, Objects, and MediaObjects) include both metadata and content access controls:
 
 ```json
 {
@@ -26,19 +29,33 @@ structure:
 }
 ```
 
-### Required Fields
+**Required Fields:**
+- **`metadata`** (boolean): Whether the current user has access to view the entity's metadata
+- **`content`** (boolean): Whether the current user has access to download or view the entity's content files
+
+**Optional Fields:**
+- **`metadataAuthorizationUrl`** (string): URL where users can request access to metadata when `metadata` is `false`
+- **`contentAuthorizationUrl`** (string): URL where users can request access to content when `content` is `false`
+
+### File Access Structure
+
+Files (accessed via `/files` endpoints) only include content access controls, as file metadata is always accessible:
+
+```json
+{
+  "access": {
+    "content": true
+  }
+}
+```
 
-- **`metadata`** (boolean): Whether the current user has access to view the
-entity's metadata
-- **`content`** (boolean): Whether the current user has access to download or
-view the entity's content files
+**Required Fields:**
+- **`content`** (boolean): Whether the current user has access to download the file
 
-### Optional Fields
+**Optional Fields:**
+- **`contentAuthorizationUrl`** (string): URL where users can request access to content when `content` is `false`
 
-- **`metadataAuthorizationUrl`** (string): URL where users can request access
-to metadata when `metadata` is `false`
-- **`contentAuthorizationUrl`** (string): URL where users can request access to
-content when `content` is `false`
+**Note:** All examples in the sections below demonstrate Entity access patterns. For Files, only the content-related fields apply
 
 ## Authorization Rules
 
````
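The access rules in this diff reduce to one decision per flag. A value of `true` means proceed. A value of `false` means pointing the user at the matching `metadataAuthorizationUrl` or `contentAuthorizationUrl`, if the response includes one.

The Bash sketch below encodes that decision under two assumptions. First, the flag and the optional URL have already been extracted from the JSON response (the extraction is not shown). Second, the function name `access_decision`, its output strings, and the example enrollment URL are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide what to do with one flag from an `access` object
# (an entity's `metadata`/`content`, or a file's `content`).
# Usage: access_decision <true|false> [authorizationUrl]
access_decision() {
  local flag="$1" auth_url="${2:-}"
  case "$flag" in
    true)
      echo "allowed"
      ;;
    false)
      if [[ -n "$auth_url" ]]; then
        # The optional *AuthorizationUrl field says where to request access.
        echo "denied: request access at $auth_url"
      else
        echo "denied: no authorization URL provided"
      fi
      ;;
    *)
      echo "error: expected true or false, got '$flag'" >&2
      return 2
      ;;
  esac
}

# Example: metadata visible, content restricted.
# https://example.org/enrol is a placeholder, not a real LDaCA address.
access_decision true
access_decision false "https://example.org/enrol"
```

For a file, only the `content` flag and `contentAuthorizationUrl` would be passed, since file metadata is always accessible.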

docs/getting-started/overview.md

Lines changed: 41 additions & 0 deletions
````diff
@@ -99,6 +99,47 @@ curl -X POST https://data.ldaca.edu.au/api/search \
 }'
 ```
 
+### Get RO-Crate Metadata
+
+Retrieve the raw RO-Crate JSON-LD metadata for any entity:
+
+```bash
+curl https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/crate
+```
+
+This returns the complete RO-Crate metadata conforming to the RO-Crate specification.
+
+### List Files
+
+List all files in the repository:
+
+```bash
+curl https://data.ldaca.edu.au/api/files
+```
+
+You can filter files by memberOf to show files attached to a specific entity:
+
+```bash
+curl https://data.ldaca.edu.au/api/files?memberOf=https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001
+```
+
+**Note**: The `/files` endpoint returns files from the repository's file system. Not all files are represented as RO-Crate entities. To list MediaObject entities (files that are part of the RO-Crate), use `/entities?entityType=http://schema.org/MediaObject`.
+
+### Access File Content
+
+For MediaObject entities, you can directly access the file content:
+
+```bash
+curl https://data.ldaca.edu.au/api/file/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Ffile.wav
+```
+
+This endpoint supports:
+
+- Content disposition (inline or attachment)
+- Custom filenames
+- HTTP range requests for partial content
+- Redirects to file storage locations
+
 ## Understanding Responses
 
 ### Success Responses
````
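Each example above passes a full identifier as a single path segment or query value, which is why `https://` appears as `https%3A%2F%2F`. Below is a minimal Bash sketch of building those URLs. It assumes ASCII identifiers and the `https://data.ldaca.edu.au/api` base used throughout.

The helper names (`urlencode`, `crate_url`, `files_by_member_url`, `file_url`) are hypothetical. The `/crate`, `/files?memberOf=` and `/file/` paths, and the `disposition`, `filename` and `noRedirect` parameters, are the ones shown in this commit.

```bash
#!/usr/bin/env bash
# Base URL used throughout these docs.
API_BASE="https://data.ldaca.edu.au/api"

# Percent-encode a string for use as one URL path segment or query value.
# Assumes ASCII input: RFC 3986 unreserved characters pass through,
# everything else becomes %XX with uppercase hex digits.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helpers that assemble the endpoint URLs shown above.
crate_url() { printf '%s/entity/%s/crate\n' "$API_BASE" "$(urlencode "$1")"; }
files_by_member_url() { printf '%s/files?memberOf=%s\n' "$API_BASE" "$(urlencode "$1")"; }

# Usage: file_url <fileId> [disposition] [filename] [noRedirect: true]
file_url() {
  local url="$API_BASE/file/$(urlencode "$1")" sep="?"
  if [[ -n "${2:-}" ]]; then url+="${sep}disposition=$(urlencode "$2")"; sep="&"; fi
  if [[ -n "${3:-}" ]]; then url+="${sep}filename=$(urlencode "$3")"; sep="&"; fi
  if [[ "${4:-}" == "true" ]]; then url+="${sep}noRedirect=true"; fi
  printf '%s\n' "$url"
}

id="https://catalog.paradisec.org.au/repository/LRB/001"
crate_url "$id"
files_by_member_url "$id"
file_url "$id/file.wav" attachment my-recording.wav
```

With the sample identifier, the first two lines printed match the `/crate` and `memberOf` URLs in the overview diff. Encoding every reserved character, rather than only `/`, keeps the identifier from being split into extra path segments or query parameters.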

docs/getting-started/use-cases.md

Lines changed: 109 additions & 14 deletions
````diff
@@ -96,31 +96,116 @@ curl -X POST https://data.ldaca.edu.au/api/search \
 }'
 ```
 
-## 3. Downloading Files from Entities
+## 3. Retrieving RO-Crate Metadata
 
-### Getting File Information
+### Get Complete RO-Crate JSON-LD
 
-First, get entity details to see available files:
+Access the raw RO-Crate metadata for any entity:
 
 ```bash
-curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001"
+curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/crate"
 ```
 
-### Downloading Files
+**Response:**
+
+```json
+{
+  "@context": "https://w3id.org/ro/crate/1.1/context",
+  "@graph": [
+    {
+      "@id": "ro-crate-metadata.json",
+      "@type": "CreativeWork",
+      "conformsTo": {
+        "@id": "https://w3id.org/ro/crate/1.1"
+      },
+      "about": {
+        "@id": "./"
+      }
+    },
+    {
+      "@id": "./",
+      "@type": "Dataset",
+      "name": "Recordings of West Alor languages",
+      "description": "A compilation of recordings featuring various West Alor languages"
+    }
+  ]
+}
+```
+
+This is useful for:
+
+- Validating RO-Crate compliance
+- Accessing extended metadata not exposed in the entity API
+- Archival and preservation workflows
+- Integration with RO-Crate tools
 
-Download a specific file:
+## 4. Working with Files
+
+### Understanding Entities vs Files
+
+The API provides two ways to work with files:
+
+- **`/entities`** - Returns RO-Crate entities including MediaObjects (files that are part of the RO-Crate metadata)
+- **`/files`** - Returns files from the repository's file system
+
+**Important**: Not all files are represented as RO-Crate entities. MediaObject entities are typically a subset of all files in the repository.
+
+### Listing Files from the File System
+
+List all files in the repository:
 
 ```bash
-# Direct download
-wget "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/file/recording.wav"
+# List all files
+curl "https://data.ldaca.edu.au/api/files"
+
+# List files attached to a specific entity
+curl "https://data.ldaca.edu.au/api/files?memberOf=https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001"
+
+# Paginate through files
+curl "https://data.ldaca.edu.au/api/files?limit=50&offset=0"
 ```
 
-### Getting Download URLs
+### Listing MediaObject Entities
 
-Instead of direct download, get the file location:
+List files that are part of the RO-Crate:
 
 ```bash
-curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/file/recording.wav?noRedirect=true"
+curl "https://data.ldaca.edu.au/api/entities?entityType=http://schema.org/MediaObject"
+```
+
+MediaObject entities include a `fileId` field that references the file in the `/files` endpoint:
+
+```json
+{
+  "id": "https://catalog.paradisec.org.au/repository/LRB/001/recording.wav",
+  "name": "recording.wav",
+  "entityType": "http://schema.org/MediaObject",
+  "fileId": "https://catalog.paradisec.org.au/repository/LRB/001/recording.wav",
+  ...
+}
+```
+
+### Accessing File Content
+
+```bash
+# Direct file download
+curl "https://data.ldaca.edu.au/api/file/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav" -o recording.wav
+```
+
+### Download as Attachment
+
+Force download with a custom filename:
+
+```bash
+curl "https://data.ldaca.edu.au/api/file/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav?disposition=attachment&filename=my-recording.wav"
+```
+
+### Getting File Location
+
+Get the file location without downloading:
+
+```bash
+curl "https://data.ldaca.edu.au/api/file/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav?noRedirect=true"
 ```
 
 **Response:**
@@ -131,22 +216,32 @@ curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.a
 }
 ```
 
-## 4. Paginating Through Large Result Sets
+### Partial Content Download
+
+Use HTTP range requests for streaming or resuming downloads:
+
+```bash
+# Download first 1KB
+curl -H "Range: bytes=0-1023" \
+  "https://data.ldaca.edu.au/api/file/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav"
+```
+
+## 5. Paginating Through Large Result Sets
 
 ### Basic Pagination
 
 ```bash
 # First page
 curl "https://data.ldaca.edu.au/api/entities?limit=100&offset=0"
 
-# Second page
+# Second page
 curl "https://data.ldaca.edu.au/api/entities?limit=100&offset=100"
 
 # Third page
 curl "https://data.ldaca.edu.au/api/entities?limit=100&offset=200"
 ```
 
-## 5. Working with Communication Modes
+## 6. Working with Communication Modes
 
 Language archives often categorise data by communication mode:
 
````
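The `Range: bytes=0-1023` example asks for the first 1,024 bytes, because HTTP byte ranges include both end positions. Streaming or resuming a whole file means issuing a series of such ranges.

Below is a minimal Bash sketch that generates those ranges. It assumes the total file size is already known; this commit does not say how to obtain it. The name `range_headers` is hypothetical, and the commented `curl` line uses a placeholder path.

```bash
#!/usr/bin/env bash
# Print one HTTP Range header value per chunk covering a file of <total> bytes.
# Ranges are inclusive, so a 1024-byte chunk starting at 0 is "bytes=0-1023".
# Usage: range_headers <total_bytes> <chunk_bytes>
range_headers() {
  local total="$1" chunk="$2" start end
  if (( total <= 0 || chunk <= 0 )); then
    echo "total and chunk must be positive" >&2
    return 1
  fi
  for (( start = 0; start < total; start += chunk )); do
    end=$(( start + chunk - 1 ))
    # The last chunk stops at the final byte of the file.
    if (( end >= total )); then
      end=$(( total - 1 ))
    fi
    printf 'bytes=%d-%d\n' "$start" "$end"
  done
}

range_headers 2500 1024
# Each value could then be sent with something like:
#   curl -H "Range: <value>" "https://data.ldaca.edu.au/api/file/<encoded fileId>" >> recording.wav
```

For a 2,500-byte file and 1,024-byte chunks this prints `bytes=0-1023`, `bytes=1024-2047` and `bytes=2048-2499`.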
docs/intro.md

Lines changed: 30 additions & 3 deletions
````diff
@@ -54,7 +54,8 @@ engines.
 ### 📊 **Rich Metadata**
 
 - Complete RO-Crate entity information
-- Hierarchical collection browsing
+- Hierarchical collection browsing (Collections, Objects, and MediaObjects)
+- Access to raw RO-Crate metadata
 - File access and download
 - Conformance to LDAC profiles
 
@@ -78,9 +79,11 @@ The RO-Crate API is already in use by several major research data repositories:
 ### Entity Management
 
 - List entities with filtering and pagination
-- Retrieve detailed entity information
+- List files with filtering and pagination
+- Retrieve detailed entity information for Collections, Objects, and MediaObjects
 - Navigate collection hierarchies
-- Access files and media
+- Access file content directly
+- Retrieve raw RO-Crate JSON-LD metadata
 
 ### Advanced Search
 
@@ -97,6 +100,30 @@ The RO-Crate API is already in use by several major research data repositories:
 - Configurable content disposition
 - Location-based redirects for distributed storage
 
+## Understanding Entities vs Files
+
+The API provides two complementary ways to access content:
+
+### `/entities` Endpoints
+
+The `/entities` endpoints return RO-Crate entities, which can be:
+- **Collections** - Groups of related items
+- **Objects** - Individual items that may contain files
+- **MediaObjects** - Individual files that are part of the RO-Crate
+
+MediaObject entities include a `fileId` field that can be used with the `/files` endpoints.
+
+### `/files` Endpoints
+
+The `/files` endpoints return files from the repository's file system.
+
+**Important**: In a repository, not all files are necessarily represented as file entities in an RO-Crate. Therefore:
+- MediaObject entities will typically be a **subset** of all files
+- Some files may be accessible via `/files` but not appear in `/entities`
+- Files that are part of the RO-Crate metadata will have corresponding MediaObject entities
+
+Use `/entities` when you need RO-Crate metadata about files, and `/files` when you need to browse or access the repository's file system directly.
+
 ## Getting Started
 
 Choose your path based on your needs:
````
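Both `/entities` and `/files` are paged with `limit` and `offset`, and each page's offset is the previous offset plus `limit`. Below is a minimal Bash sketch that prints the page URLs for a known item count. It assumes the total is available from an earlier response; this commit does not say where the API reports it. The helper name `page_urls` is hypothetical.

```bash
#!/usr/bin/env bash
API_BASE="https://data.ldaca.edu.au/api"

# Print one URL per page for a paged listing endpoint.
# Usage: page_urls <path> <total_items> <limit>   (e.g. page_urls entities 250 100)
page_urls() {
  local path="$1" total="$2" limit="$3" offset
  if (( limit <= 0 )); then
    echo "limit must be positive" >&2
    return 1
  fi
  # Offsets advance by exactly one page: 0, limit, 2*limit, ...
  for (( offset = 0; offset < total; offset += limit )); do
    printf '%s/%s?limit=%d&offset=%d\n' "$API_BASE" "$path" "$limit" "$offset"
  done
}

page_urls entities 250 100
page_urls files 120 50
```

For 250 entities and `limit=100`, the first call prints the same three URLs as the pagination example in the use-cases diff. A client that does not know the total could instead keep requesting pages until one returns fewer than `limit` items.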
