Evaluation

grand-challenge.org has a system for automatically evaluating new submissions. Challenge administrators upload their own Docker containers, which are executed by Celery workers when a new submission is uploaded by a participant.

Evaluation Container Requirements

The evaluation container must contain everything that is needed to evaluate a new submission. This includes the reference standard and the code that executes the evaluation. An instance of the evaluation container image is created for each submission.

Input

The participant's submission will be extracted and mounted as a Docker volume on /input/.

Entrypoint

The container will be run with its default arguments, so the entrypoint must, by default, produce an evaluation for the data residing on /input/. The container is responsible for loading all of the data and for handling incorrect filenames, incomplete submissions, duplicate folders, etc.
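
A minimal sketch of such an entrypoint in Python might look as follows (the .mhd extension and the omitted metric computation are placeholders for your own file format and evaluation code); the Dockerfile would then set something like ENTRYPOINT ["python", "evaluate.py"]:

import json
from pathlib import Path

def main():
    # Submissions are extracted to /input/; participants may nest files in
    # subfolders, so search recursively and sort for a stable ordering.
    predictions = sorted(Path("/input").rglob("*.mhd"))
    if not predictions:
        raise FileNotFoundError("No .mhd files found in the submission")

    metrics = {"case": {}, "aggregates": {}}
    # ... compare each prediction against the reference standard baked into
    # the container here, and fill in the metrics dictionary ...

    with open("/output/metrics.json", "w") as f:
        json.dump(metrics, f)

if __name__ == "__main__":
    main()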

Errors

If there is an error in the evaluation process, grand-challenge.org will parse stderr and return the last non-empty line to the user. If your evaluation script is written in Python, the best practice is to raise an exception; its message will then be passed on to the user, e.g.:

raise AttributeError('Expected to find 10 images, you submitted 5')
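
In context, such a check might look like the following sketch (the expected count of 10 and the .mhd extension are assumptions for illustration):

from pathlib import Path

submitted = list(Path("/input").rglob("*.mhd"))
if len(submitted) != 10:
    # Raising writes a traceback to stderr; grand-challenge.org returns
    # its last non-empty line to the participant, e.g.
    # "AttributeError: Expected to find 10 images, you submitted 5".
    raise AttributeError(
        f"Expected to find 10 images, you submitted {len(submitted)}"
    )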

Output

The container must produce the file /output/metrics.json. Its contents must be valid JSON (i.e. loadable with json.loads()) and will be stored as a result in the database. The challenge administrator is free to define which metrics are included. We recommend storing the results in two objects: case for the scores on individual cases (e.g. scans), and aggregates for metrics with one value per evaluation. For example:

{
  "case": {
    "dicecoefficient": {
      "0": 0.6461774875144065,
      "1": 0.7250400040547097,
      "2": 0.6747092236948878,
      "3": 0.6452332692745784,
      "4": 0.6839602948067993,
      "5": 0.6817807628480707,
      "6": 0.4715406247268339,
      "7": 0.5988810496224731,
      "8": 0.5475856316815167,
      "9": 0.32923801642370615
    },
    "jaccardcoefficient": {
      "0": 0.47729852440408627,
      "1": 0.5686766693547471,
      "2": 0.5091027839007266,
      "3": 0.47626890640360103,
      "4": 0.5197109875240358,
      "5": 0.5171983108978807,
      "6": 0.30850713624139353,
      "7": 0.4274305543159676,
      "8": 0.3770174983296798,
      "9": 0.1970585994056237
    },
    "alg_fname": {
      "0": "1.2.840.113704.1.111.2296.1199810886.7.mhd",
      "1": "1.2.276.0.28.3.0.14.4.0.20090213134050413.mhd",
      "2": "1.2.276.0.28.3.0.14.4.0.20090213134114792.mhd",
      "3": "1.2.840.113704.1.111.2004.1131987870.11.mhd",
      "4": "1.2.840.113704.1.111.2296.1199810941.11.mhd",
      "5": "1.2.840.113704.1.111.4400.1131982359.11.mhd",
      "6": "1.3.12.2.1107.5.1.4.50585.4.0.7023259421321855.mhd",
      "7": "1.0.000.000000.0.00.0.0000000000.0000.0000000000.000.mhd",
      "8": "1.2.392.200036.9116.2.2.2.1762676169.1080882991.2256.mhd",
      "9": "2.16.840.1.113669.632.21.3825556854.538251028.390606191418956020.mhd"
    },
    "gt_fname": {
      "0": "1.2.840.113704.1.111.2296.1199810886.7.mhd",
      "1": "1.2.276.0.28.3.0.14.4.0.20090213134050413.mhd",
      "2": "1.2.276.0.28.3.0.14.4.0.20090213134114792.mhd",
      "3": "1.2.840.113704.1.111.2004.1131987870.11.mhd",
      "4": "1.2.840.113704.1.111.2296.1199810941.11.mhd",
      "5": "1.2.840.113704.1.111.4400.1131982359.11.mhd",
      "6": "1.3.12.2.1107.5.1.4.50585.4.0.7023259421321855.mhd",
      "7": "1.0.000.000000.0.00.0.0000000000.0000.0000000000.000.mhd",
      "8": "1.2.392.200036.9116.2.2.2.1762676169.1080882991.2256.mhd",
      "9": "2.16.840.1.113669.632.21.3825556854.538251028.390606191418956020.mhd"
    }
  },
  "aggregates": {
    "dicecoefficient_mean": 0.6004146364647982,
    "dicecoefficient_std": 0.12096508479974993,
    "dicecoefficient_min": 0.32923801642370615,
    "dicecoefficient_max": 0.7250400040547097,
    "jaccardcoefficient_mean": 0.4378269970777743,
    "jaccardcoefficient_std": 0.11389145837530869,
    "jaccardcoefficient_min": 0.1970585994056237,
    "jaccardcoefficient_max": 0.5686766693547471,
  }
}
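
The "0", "1", … keys above are what a pandas DataFrame with a default integer index produces when serialised to JSON. As a sketch of writing this structure (pandas is an assumption here, not a requirement; the column names and values are placeholders):

import json

import pandas as pd

# Hypothetical per-case results; in practice one row per scan in /input/.
cases = pd.DataFrame(
    {
        "dicecoefficient": [0.65, 0.73],
        "jaccardcoefficient": [0.48, 0.57],
        "alg_fname": ["scan_0.mhd", "scan_1.mhd"],
        "gt_fname": ["scan_0.mhd", "scan_1.mhd"],
    }
)

# One number per evaluation for each metric.
aggregates = {
    f"{metric}_{stat}": float(getattr(cases[metric], stat)())
    for metric in ("dicecoefficient", "jaccardcoefficient")
    for stat in ("mean", "std", "min", "max")
}

with open("/output/metrics.json", "w") as f:
    # DataFrame.to_dict() nests values as {column: {index: value}}; the
    # integer index becomes the string keys seen above once serialised.
    json.dump({"case": cases.to_dict(), "aggregates": aggregates}, f)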

Evaluation Options

class grandchallenge.evaluation.models.Phase(id, created, modified, challenge, archive, title, slug, score_title, score_jsonpath, score_error_jsonpath, score_default_sort, score_decimal_places, extra_results_columns, scoring_method_choice, result_display_choice, creator_must_be_verified, submission_kind, allow_submission_comments, display_submission_comments, supplementary_file_choice, supplementary_file_label, supplementary_file_help_text, show_supplementary_file_link, publication_url_choice, show_publication_url, daily_submission_limit, submissions_open, submissions_close, submission_page_html, auto_publish_new_results, display_all_metrics, evaluation_detail_observable_url, evaluation_comparison_observable_url)[source]
Parameters
  • id (UUIDField) – Id

  • created (DateTimeField) – Created

  • modified (DateTimeField) – Modified

  • challenge (ForeignKey to Challenge) – Challenge

  • archive (ForeignKey to Archive) – Archive. Which archive should be used as the source dataset for this phase?

  • title (CharField) – Title. The title of this phase.

  • slug (AutoSlugField) – Slug

  • score_title (CharField) – Score title. The name that will be displayed for the scores column, for instance: Score (log-loss)

  • score_jsonpath (CharField) – Score jsonpath. The jsonpath of the field in metrics.json that will be used for the overall scores on the results page. See http://goessner.net/articles/JsonPath/ for syntax. For example: dice.mean (see the configuration sketch after this parameter list).

  • score_error_jsonpath (CharField) – Score error jsonpath. The jsonpath for the field in metrics.json that contains the error of the score, eg: dice.std

  • score_default_sort (CharField) – Score default sort. The default sorting to use for the scores on the results page.

  • score_decimal_places (PositiveSmallIntegerField) – Score decimal places. The number of decimal places to display for the score

  • extra_results_columns (JSONField) – Extra results columns. A JSON object that contains the extra columns from metrics.json that will be displayed on the results page.

  • scoring_method_choice (CharField) – Scoring method choice. How should the rank of each result be calculated?

  • result_display_choice (CharField) – Result display choice. Which results should be displayed on the leaderboard?

  • creator_must_be_verified (BooleanField) – Creator must be verified. If True, only participants with verified accounts can make submissions to this phase

  • submission_kind (PositiveSmallIntegerField) – Submission kind. Should participants submit a .csv/.zip file of predictions, or an algorithm?

  • allow_submission_comments (BooleanField) – Allow submission comments. Allow users to submit comments as part of their submission.

  • display_submission_comments (BooleanField) – Display submission comments. If true, submission comments are shown on the results page.

  • supplementary_file_choice (CharField) – Supplementary file choice. Show a supplementary file field on the submissions page so that users can upload an additional file along with their predictions file as part of their submission (eg, include a pdf description of their method). Off turns this feature off, Optional means that including the file is optional for the user, Required means that the user must upload a supplementary file.

  • supplementary_file_label (CharField) – Supplementary file label. The label that will be used on the submission and results page for the supplementary file. For example: Algorithm Description.

  • supplementary_file_help_text (CharField) – Supplementary file help text. The help text to include on the submissions page to describe the submissions file. Eg: “A PDF description of the method.”

  • show_supplementary_file_link (BooleanField) – Show supplementary file link. Show a link to download the supplementary file on the results page.

  • publication_url_choice (CharField) – Publication url choice. Show a publication URL field on the submission page so that users can submit a link to a publication that corresponds to their submission. Off turns this feature off, Optional means that including the URL is optional for the user, Required means that the user must provide a URL.

  • show_publication_url (BooleanField) – Show publication url. Show a link to the publication on the results page

  • daily_submission_limit (PositiveIntegerField) – Daily submission limit. The limit on the number of times that a user can make a submission in a 24 hour period.

  • submissions_open (DateTimeField) – Submissions open. If set, participants will not be able to make submissions to this phase before this time.

  • submissions_close (DateTimeField) – Submissions close. If set, participants will not be able to make submissions to this phase after this time.

  • submission_page_html (TextField) – Submission page html. HTML to include on the submission page for this challenge.

  • auto_publish_new_results (BooleanField) – Auto publish new results. If true, new results are automatically made public. If false, the challenge administrator must manually publish each new result.

  • display_all_metrics (BooleanField) – Display all metrics. Should all of the metrics be displayed on the Result detail page?

  • evaluation_detail_observable_url (URLField) – Evaluation detail observable url. The URL of the embeddable observable notebook for viewing individual results. Must be of the form https://observablehq.com/embed/@user/notebook?cell=

  • evaluation_comparison_observable_url (URLField) – Evaluation comparison observable url. The URL of the embeddable observable notebook for comparing results. Must be of the form https://observablehq.com/embed/@user/notebook?cell=

  • inputs (ManyToManyField) – Inputs

  • outputs (ManyToManyField) – Outputs
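
As a hypothetical illustration (not an excerpt from the codebase) of how these options tie back to the metrics.json example in the Output section, a challenge administrator could configure the leaderboard score from the Django shell; the lookup values below are placeholders:

from grandchallenge.evaluation.models import Phase

# The lookup fields here are illustrative; fetch your phase however suits.
phase = Phase.objects.get(
    challenge__short_name="mychallenge", title="Final Test Phase"
)
phase.score_title = "Dice"
phase.score_jsonpath = "aggregates.dicecoefficient_mean"
phase.score_error_jsonpath = "aggregates.dicecoefficient_std"
phase.score_decimal_places = 3
phase.save()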

exception DoesNotExist
exception MultipleObjectsReturned
class SubmissionKind(value)[source]

An enumeration.

save(*args, **kwargs)[source]

Save the current instance. Override this in a subclass if you want to control the saving process.

The ‘force_insert’ and ‘force_update’ parameters can be used to insist that the “save” must be an SQL insert or update (or equivalent for non-SQL backends), respectively. Normally, they should not be set.

Template Tags

grandchallenge.evaluation.templatetags.evaluation_extras.get_jsonpath(obj, jsonpath)[source]

Gets a value from a dictionary based on a jsonpath. It will only return one result, and if a key does not exist it will return an empty string, as template tags should not raise errors.

Parameters
  • obj (dict) – The dictionary to query

  • jsonpath – The path to the object (singular)

Returns

The most relevant object in the dictionary
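
For simple dot-separated paths, the behaviour is equivalent to the following sketch (the real implementation uses a full JSONPath library; this simplified version is for illustration only):

def get_jsonpath_sketch(obj, jsonpath):
    current = obj
    for key in jsonpath.split("."):
        try:
            current = current[key]
        except (KeyError, TypeError):
            # Template tags should not raise, so fall back to "".
            return ""
    return current

metrics = {"aggregates": {"dicecoefficient_mean": 0.6004}}
assert get_jsonpath_sketch(metrics, "aggregates.dicecoefficient_mean") == 0.6004
assert get_jsonpath_sketch(metrics, "aggregates.missing") == ""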