41 changes: 29 additions & 12 deletions docs/getting-started/authorization.md
@@ -12,8 +12,11 @@ is accessible and redirect to enrollment URLs when access is restricted.

## Access Property Structure

Each entity in the API includes an `access` object with the following
structure:
The API uses an `access` property to communicate access permissions. The structure differs between **Entities** and **Files**:

### Entity Access Structure

Entities (Collections, Objects, and MediaObjects) include both metadata and content access controls:

```json
{
@@ -26,19 +29,33 @@ structure:
}
```

### Required Fields
**Required Fields:**
- **`metadata`** (boolean): Whether the current user has access to view the entity's metadata
- **`content`** (boolean): Whether the current user has access to download or view the entity's content files

**Optional Fields:**
- **`metadataAuthorizationUrl`** (string): URL where users can request access to metadata when `metadata` is `false`
- **`contentAuthorizationUrl`** (string): URL where users can request access to content when `content` is `false`

### File Access Structure

Files (accessed via `/files` endpoints) only include content access controls, as file metadata is always accessible:

```json
{
"access": {
"content": true
}
}
```

- **`metadata`** (boolean): Whether the current user has access to view the
entity's metadata
- **`content`** (boolean): Whether the current user has access to download or
view the entity's content files
**Required Fields:**
- **`content`** (boolean): Whether the current user has access to download the file

### Optional Fields
**Optional Fields:**
- **`contentAuthorizationUrl`** (string): URL where users can request access to content when `content` is `false`

- **`metadataAuthorizationUrl`** (string): URL where users can request access
to metadata when `metadata` is `false`
- **`contentAuthorizationUrl`** (string): URL where users can request access to
content when `content` is `false`
**Note:** All examples in the sections below demonstrate Entity access patterns. For Files, only the content-related fields apply.
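
As a rough illustration of how a client might act on the content-related fields (which apply to both Entities and Files), the sketch below maps the two values to an action. The function name `content_action` and the action labels it prints are hypothetical and not part of the API; extracting the values from the JSON response (for example with `jq`) is left to the caller.

```bash
# Hypothetical helper: decide what to do with an item's content.
#   $1  value of access.content ("true" or "false")
#   $2  value of access.contentAuthorizationUrl (may be empty or absent)
content_action() {
  if [ "$1" = "true" ]; then
    echo "download"
  elif [ -n "$2" ]; then
    # Access is restricted, but the response says where to request it.
    echo "request-access $2"
  else
    echo "unavailable"
  fi
}

content_action true ""                             # prints: download
content_action false "https://example.org/enrol"   # prints: request-access https://example.org/enrol
```

The same pattern would apply to `metadata` and `metadataAuthorizationUrl` on Entities.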

## Authorization Rules

41 changes: 41 additions & 0 deletions docs/getting-started/overview.md
@@ -99,6 +99,47 @@ curl -X POST https://data.ldaca.edu.au/api/search \
}'
```

### Get RO-Crate Metadata

Retrieve the raw RO-Crate JSON-LD metadata for any entity:

```bash
curl https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/crate
```

This returns the complete RO-Crate metadata conforming to the RO-Crate specification.

### List Files

List all files in the repository:

```bash
curl https://data.ldaca.edu.au/api/files
```

You can filter files by `memberOf` to show only the files attached to a specific entity:

```bash
curl "https://data.ldaca.edu.au/api/files?memberOf=https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001"
```

**Note**: The `/files` endpoint returns files from the repository's file system. Not all files are represented as RO-Crate entities. To list MediaObject entities (files that are part of the RO-Crate), use `/entities?entityType=http://schema.org/MediaObject`.
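
Entity identifiers are themselves URLs, so they have to be percent-encoded before they are placed in a path or query string, as in the `memberOf` example above. A minimal sketch of one way to do that in a shell script follows. The `urlencode` helper is illustrative, not something the API provides, and it assumes ASCII input.

```bash
# Illustrative helper: percent-encode every character except the RFC 3986
# unreserved set (letters, digits, "-", ".", "_", "~"). ASCII input assumed.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [A-Za-z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

id="https://catalog.paradisec.org.au/repository/LRB/001"
echo "https://data.ldaca.edu.au/api/files?memberOf=$(urlencode "$id")"
```

When the identifier only needs to go in the query string, `curl -G --data-urlencode "memberOf=$id" https://data.ldaca.edu.au/api/files` lets curl do the encoding instead.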

### Access File Content

For MediaObject entities, you can directly access the file content:

```bash
curl https://data.ldaca.edu.au/api/file/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Ffile.wav
```

This endpoint supports:

- Content disposition (inline or attachment)
- Custom filenames
- HTTP range requests for partial content
- Redirects to file storage locations

## Understanding Responses

### Success Responses
123 changes: 109 additions & 14 deletions docs/getting-started/use-cases.md
@@ -96,31 +96,116 @@ curl -X POST https://data.ldaca.edu.au/api/search \
}'
```

## 3. Downloading Files from Entities
## 3. Retrieving RO-Crate Metadata

### Getting File Information
### Get Complete RO-Crate JSON-LD

First, get entity details to see available files:
Access the raw RO-Crate metadata for any entity:

```bash
curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001"
curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/crate"
```

### Downloading Files
**Response:**

```json
{
"@context": "https://w3id.org/ro/crate/1.1/context",
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.1"
},
"about": {
"@id": "./"
}
},
{
"@id": "./",
"@type": "Dataset",
"name": "Recordings of West Alor languages",
"description": "A compilation of recordings featuring various West Alor languages"
}
]
}
```

This is useful for:

- Validating RO-Crate compliance
- Accessing extended metadata not exposed in the entity API
- Archival and preservation workflows
- Integration with RO-Crate tools

Download a specific file:
## 4. Working with Files

### Understanding Entities vs Files

The API provides two ways to work with files:

- **`/entities`** - Returns RO-Crate entities including MediaObjects (files that are part of the RO-Crate metadata)
- **`/files`** - Returns files from the repository's file system

**Important**: Not all files are represented as RO-Crate entities. MediaObject entities are typically a subset of all files in the repository.

### Listing Files from the File System

List all files in the repository:

```bash
# Direct download
wget "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/file/recording.wav"
# List all files
curl "https://data.ldaca.edu.au/api/files"

# List files attached to a specific entity
curl "https://data.ldaca.edu.au/api/files?memberOf=https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001"

# Paginate through files
curl "https://data.ldaca.edu.au/api/files?limit=50&offset=0"
```

### Getting Download URLs
### Listing MediaObject Entities

Instead of direct download, get the file location:
List files that are part of the RO-Crate:

```bash
curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001/file/recording.wav?noRedirect=true"
curl "https://data.ldaca.edu.au/api/entities?entityType=http://schema.org/MediaObject"
```

MediaObject entities include a `fileId` field that references the file in the `/files` endpoint:

```json
{
"id": "https://catalog.paradisec.org.au/repository/LRB/001/recording.wav",
"name": "recording.wav",
"entityType": "http://schema.org/MediaObject",
"fileId": "https://catalog.paradisec.org.au/repository/LRB/001/recording.wav",
...
}
```

### Accessing File Content

```bash
# Direct file download (-L follows a redirect to the storage location, if one is returned)
curl -L "https://data.ldaca.edu.au/api/file/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav" -o recording.wav
```

### Download as Attachment

Force download with a custom filename:

```bash
curl "https://data.ldaca.edu.au/api/file/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav?disposition=attachment&filename=my-recording.wav"
```

### Getting File Location

Get the file location without downloading:

```bash
curl "https://data.ldaca.edu.au/api/file/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav?noRedirect=true"
```

**Response:**
@@ -131,22 +216,32 @@ curl "https://data.ldaca.edu.au/api/entity/https%3A%2F%2Fcatalog.paradisec.org.a
}
```

## 4. Paginating Through Large Result Sets
### Partial Content Download

Use HTTP range requests for streaming or resuming downloads:

```bash
# Download first 1KB
curl -H "Range: bytes=0-1023" \
"https://data.ldaca.edu.au/api/file/https%3A%2F%2Fcatalog.paradisec.org.au%2Frepository%2FLRB%2F001%2Frecording.wav"
```
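
To fetch a large file in pieces, the same `Range` header can be generated for each chunk. The sketch below only computes the byte ranges. `chunk_ranges` is a hypothetical helper, and the total size is assumed to be known already (for instance from a `Content-Length` header).

```bash
# Hypothetical helper: print the inclusive byte ranges that cover a file.
#   $1  total size in bytes
#   $2  chunk size in bytes
chunk_ranges() {
  start=0
  while [ "$start" -lt "$1" ]; do
    end=$(( $start + $2 - 1 ))
    # The last chunk may be shorter than the chunk size.
    if [ "$end" -ge "$1" ]; then
      end=$(( $1 - 1 ))
    fi
    echo "bytes=$start-$end"
    start=$(( $end + 1 ))
  done
}

chunk_ranges 2500 1024
# bytes=0-1023
# bytes=1024-2047
# bytes=2048-2499
```

Each printed value can be passed to curl as `-H "Range: <value>"`. For simply resuming an interrupted download, curl's own `-C -` option works out the offset from the partial output file.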

## 5. Paginating Through Large Result Sets

### Basic Pagination

```bash
# First page
curl "https://data.ldaca.edu.au/api/entities?limit=100&offset=0"

# Second page
curl "https://data.ldaca.edu.au/api/entities?limit=100&offset=100"

# Third page
curl "https://data.ldaca.edu.au/api/entities?limit=100&offset=200"
```
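
Rather than writing each page out by hand, the offsets can be generated in a loop. This sketch assumes the total number of results is already known (how the API reports it is not shown here), and `page_urls` is a hypothetical helper name.

```bash
# Hypothetical helper: print one URL per page.
#   $1  endpoint URL
#   $2  page size (the `limit` parameter)
#   $3  total number of results (assumed to be known)
page_urls() {
  offset=0
  while [ "$offset" -lt "$3" ]; do
    echo "$1?limit=$2&offset=$offset"
    offset=$(( $offset + $2 ))
  done
}

page_urls "https://data.ldaca.edu.au/api/entities" 100 250
# https://data.ldaca.edu.au/api/entities?limit=100&offset=0
# https://data.ldaca.edu.au/api/entities?limit=100&offset=100
# https://data.ldaca.edu.au/api/entities?limit=100&offset=200
```

Piping the output into `while read -r url; do curl "$url"; done` would then fetch each page in turn.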

## 5. Working with Communication Modes
## 6. Working with Communication Modes

Language archives often categorise data by communication mode:

33 changes: 30 additions & 3 deletions docs/intro.md
@@ -54,7 +54,8 @@ engines.
### 📊 **Rich Metadata**

- Complete RO-Crate entity information
- Hierarchical collection browsing
- Hierarchical collection browsing (Collections, Objects, and MediaObjects)
- Access to raw RO-Crate metadata
- File access and download
- Conformance to LDAC profiles

@@ -78,9 +79,11 @@ The RO-Crate API is already in use by several major research data repositories:
### Entity Management

- List entities with filtering and pagination
- Retrieve detailed entity information
- List files with filtering and pagination
- Retrieve detailed entity information for Collections, Objects, and MediaObjects
- Navigate collection hierarchies
- Access files and media
- Access file content directly
- Retrieve raw RO-Crate JSON-LD metadata

### Advanced Search

@@ -97,6 +100,30 @@ The RO-Crate API is already in use by several major research data repositories:
- Configurable content disposition
- Location-based redirects for distributed storage

## Understanding Entities vs Files

The API provides two complementary ways to access content:

### `/entities` Endpoints

The `/entities` endpoints return RO-Crate entities, which can be:
- **Collections** - Groups of related items
- **Objects** - Individual items that may contain files
- **MediaObjects** - Individual files that are part of the RO-Crate

MediaObject entities include a `fileId` field that can be used with the `/files` endpoints.

### `/files` Endpoints

The `/files` endpoints return files from the repository's file system.

**Important**: In a repository, not all files are necessarily represented as file entities in an RO-Crate. Therefore:
- MediaObject entities will typically be a **subset** of all files
- Some files may be accessible via `/files` but not appear in `/entities`
- Files that are part of the RO-Crate metadata will have corresponding MediaObject entities

Use `/entities` when you need RO-Crate metadata about files, and `/files` when you need to browse or access the repository's file system directly.
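
To see which files have no corresponding MediaObject entity, the two listings can be compared by identifier. The sketch assumes you have already saved one identifier per line into two text files: file IDs taken from `/files`, and `fileId` values taken from MediaObject entities. The file names and the `files_without_entity` helper are hypothetical, and the sample IDs are placeholders.

```bash
# Hypothetical helper: print the lines of $1 (file IDs) that do not appear
# in $2 (fileId values taken from MediaObject entities).
files_without_entity() {
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file
  grep -Fxv -f "$2" "$1"
}

# Placeholder data standing in for IDs extracted from real API responses.
tmp=$(mktemp -d)
printf '%s\n' "LRB/001/a.wav" "LRB/001/b.wav" "LRB/001/notes.txt" > "$tmp/files.txt"
printf '%s\n' "LRB/001/a.wav" "LRB/001/b.wav" > "$tmp/media_objects.txt"

files_without_entity "$tmp/files.txt" "$tmp/media_objects.txt"   # prints: LRB/001/notes.txt
rm -r "$tmp"
```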

## Getting Started

Choose your path based on your needs: