Evaluation

grand-challenge.org has a system for automatically evaluating new submissions. Challenge administrators upload their own Docker containers, which are executed by Celery workers whenever a participant uploads a new submission.

Evaluation Container Requirements

The evaluation container must contain everything that is needed to evaluate a new submission: the reference standard and the code that performs the evaluation. A new instance of the evaluation container image is created for each submission.

Input

The participant's submission is extracted and mounted as a Docker volume at /input/.

Entrypoint

The container is run with its default arguments, so its entrypoint must, by default, produce an evaluation of the data in /input/. The container is responsible for loading all of the data and for handling problems such as incorrect filenames, incomplete submissions and duplicate folders.
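A minimal sketch of such an entrypoint script follows. It assumes, purely for illustration, that predictions are .mhd files and that the container already knows the expected case names; EXPECTED_CASES, collect_predictions and main are hypothetical names, not part of grand-challenge.org. Duplicate and missing cases raise an exception, so the problem is reported to the participant as described in the Errors section.

```python
import json
from pathlib import Path

# Hypothetical values for illustration: a real container would derive the
# case names from the reference standard it ships with.
INPUT_DIR = Path("/input")
OUTPUT_DIR = Path("/output")
EXPECTED_CASES = {"case_000", "case_001"}


def collect_predictions(input_dir: Path, expected: set) -> dict:
    """Map each expected case name to exactly one prediction file."""
    found = {}
    # rglob("*") also descends into nested or duplicated folders.
    for path in sorted(input_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() != ".mhd":
            continue
        # Case-insensitive match, so "Case_000.MHD" counts as "case_000".
        found.setdefault(path.stem.lower(), []).append(path)

    duplicates = sorted(name for name in expected if len(found.get(name, [])) > 1)
    if duplicates:
        raise ValueError(f"Found more than one file for: {', '.join(duplicates)}")

    missing = sorted(expected - set(found))
    if missing:
        raise ValueError(
            f"Expected {len(expected)} cases, but these are missing: {', '.join(missing)}"
        )

    # Files that match no expected case are ignored rather than rejected.
    return {name: found[name][0] for name in sorted(expected)}


def main(input_dir: Path = INPUT_DIR, output_dir: Path = OUTPUT_DIR) -> None:
    predictions = collect_predictions(input_dir, EXPECTED_CASES)
    # ... score each prediction against the bundled reference standard here ...
    metrics = {"case": {}, "aggregates": {"num_cases": len(predictions)}}
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "metrics.json").write_text(json.dumps(metrics))
```

In the container, the script would end with `if __name__ == "__main__": main()` so that running the image with its default arguments performs the evaluation. Unmatched extra files are ignored here; a stricter evaluation could reject them instead.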

Errors

If there is an error in the evaluation process, grand-challenge.org parses stderr and returns the last non-empty line to the user. If your evaluation script is written in Python, the best practice is to raise an exception; its message will then be passed to the user, e.g.:

raise AttributeError('Expected to find 10 images, you submitted 5')
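To see which message a participant would receive, the sketch below runs a failing Python snippet and keeps the last non-empty line of its stderr. The helper last_non_empty_line is a hypothetical stand-in for the platform's parsing, not grand-challenge.org code. Because a Python traceback ends with the exception line, the participant sees the exception type followed by the message.

```python
import subprocess
import sys


def last_non_empty_line(stderr: str) -> str:
    """Hypothetical stand-in for the platform's stderr parsing."""
    for line in reversed(stderr.splitlines()):
        if line.strip():
            return line.strip()
    return ""


# Run a tiny "evaluation" that fails in the recommended way.
script = "raise AttributeError('Expected to find 10 images, you submitted 5')"
result = subprocess.run([sys.executable, "-c", script], capture_output=True, text=True)

# Prints: AttributeError: Expected to find 10 images, you submitted 5
print(last_non_empty_line(result.stderr))
```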

Output

The container must produce the file /output/metrics.json. Its contents must be valid JSON (i.e. loadable with json.loads()) and will be stored as a result in the database. The challenge administrator is free to define which metrics are included. We recommend storing results in two objects: case, for the scores on individual cases (e.g. scans), and aggregates, for values where there is one number per evaluation. For example:

{
  "case": {
    "dicecoefficient": {
      "0": 0.6461774875144065,
      "1": 0.7250400040547097,
      "2": 0.6747092236948878,
      "3": 0.6452332692745784,
      "4": 0.6839602948067993,
      "5": 0.6817807628480707,
      "6": 0.4715406247268339,
      "7": 0.5988810496224731,
      "8": 0.5475856316815167,
      "9": 0.32923801642370615
    },
    "jaccardcoefficient": {
      "0": 0.47729852440408627,
      "1": 0.5686766693547471,
      "2": 0.5091027839007266,
      "3": 0.47626890640360103,
      "4": 0.5197109875240358,
      "5": 0.5171983108978807,
      "6": 0.30850713624139353,
      "7": 0.4274305543159676,
      "8": 0.3770174983296798,
      "9": 0.1970585994056237
    },
    "alg_fname": {
      "0": "1.2.840.113704.1.111.2296.1199810886.7.mhd",
      "1": "1.2.276.0.28.3.0.14.4.0.20090213134050413.mhd",
      "2": "1.2.276.0.28.3.0.14.4.0.20090213134114792.mhd",
      "3": "1.2.840.113704.1.111.2004.1131987870.11.mhd",
      "4": "1.2.840.113704.1.111.2296.1199810941.11.mhd",
      "5": "1.2.840.113704.1.111.4400.1131982359.11.mhd",
      "6": "1.3.12.2.1107.5.1.4.50585.4.0.7023259421321855.mhd",
      "7": "1.0.000.000000.0.00.0.0000000000.0000.0000000000.000.mhd",
      "8": "1.2.392.200036.9116.2.2.2.1762676169.1080882991.2256.mhd",
      "9": "2.16.840.1.113669.632.21.3825556854.538251028.390606191418956020.mhd"
    },
    "gt_fname": {
      "0": "1.2.840.113704.1.111.2296.1199810886.7.mhd",
      "1": "1.2.276.0.28.3.0.14.4.0.20090213134050413.mhd",
      "2": "1.2.276.0.28.3.0.14.4.0.20090213134114792.mhd",
      "3": "1.2.840.113704.1.111.2004.1131987870.11.mhd",
      "4": "1.2.840.113704.1.111.2296.1199810941.11.mhd",
      "5": "1.2.840.113704.1.111.4400.1131982359.11.mhd",
      "6": "1.3.12.2.1107.5.1.4.50585.4.0.7023259421321855.mhd",
      "7": "1.0.000.000000.0.00.0.0000000000.0000.0000000000.000.mhd",
      "8": "1.2.392.200036.9116.2.2.2.1762676169.1080882991.2256.mhd",
      "9": "2.16.840.1.113669.632.21.3825556854.538251028.390606191418956020.mhd"
    }
  },
  "aggregates": {
    "dicecoefficient_mean": 0.6004146364647982,
    "dicecoefficient_std": 0.12096508479974993,
    "dicecoefficient_min": 0.32923801642370615,
    "dicecoefficient_max": 0.7250400040547097,
    "jaccardcoefficient_mean": 0.4378269970777743,
    "jaccardcoefficient_std": 0.11389145837530869,
    "jaccardcoefficient_min": 0.1970585994056237,
    "jaccardcoefficient_max": 0.5686766693547471
  }
}
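The sketch below builds a dictionary in the recommended case/aggregates layout from lists of per-case scores and writes it with the standard json module. build_metrics and write_metrics are hypothetical helper names; the _mean, _std, _min and _max suffixes simply follow the example above. Serialising with json.dumps rules out hand-written mistakes such as trailing commas, and allow_nan=False makes a NaN score fail loudly, since NaN is not part of the JSON standard even though Python's json.loads() accepts it.

```python
import json
import statistics
from pathlib import Path


def build_metrics(case_scores: dict) -> dict:
    """Arrange per-case scores into the recommended case/aggregates layout.

    case_scores maps a metric name to a list holding one score per case.
    """
    metrics = {"case": {}, "aggregates": {}}
    for name, scores in case_scores.items():
        # String keys ("0", "1", ...) match the example and survive a JSON round trip.
        metrics["case"][name] = {str(i): score for i, score in enumerate(scores)}
        aggregates = metrics["aggregates"]
        aggregates[f"{name}_mean"] = statistics.mean(scores)
        # Sample standard deviation; requires at least two cases.
        aggregates[f"{name}_std"] = statistics.stdev(scores)
        aggregates[f"{name}_min"] = min(scores)
        aggregates[f"{name}_max"] = max(scores)
    return metrics


def write_metrics(metrics: dict, path: Path = Path("/output/metrics.json")) -> None:
    """Serialise the metrics; allow_nan=False rejects NaN/Infinity scores."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, allow_nan=False))


metrics = build_metrics({"dicecoefficient": [0.65, 0.73, 0.67]})
```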

Evaluation Options

class grandchallenge.evaluation.models.Phase(id, created, modified, view_content, hanging_protocol, challenge, archive, title, slug, score_title, score_jsonpath, score_error_jsonpath, score_default_sort, score_decimal_places, extra_results_columns, scoring_method_choice, result_display_choice, creator_must_be_verified, submission_kind, allow_submission_comments, display_submission_comments, supplementary_file_choice, supplementary_file_label, supplementary_file_help_text, show_supplementary_file_link, supplementary_url_choice, supplementary_url_label, supplementary_url_help_text, show_supplementary_url, submissions_limit_per_user_per_period, submission_limit_period, submissions_open_at, submissions_close_at, submission_page_markdown, auto_publish_new_results, display_all_metrics, algorithm_selectable_gpu_type_choices, algorithm_maximum_settable_memory_gb, algorithm_time_limit, give_algorithm_editors_job_view_permissions, evaluation_time_limit, evaluation_selectable_gpu_type_choices, evaluation_requires_gpu_type, evaluation_maximum_settable_memory_gb, evaluation_requires_memory_gb, public, workstation, workstation_config, average_algorithm_job_duration, compute_cost_euro_millicents, parent, external_evaluation)

Parameters:

id (UUIDField) – Primary key: Id
created (DateTimeField) – Created
modified (DateTimeField) – Modified
view_content (JSONField) – View content
title (CharField) – Title. The title of this phase.
slug (AutoSlugField) – Slug
score_title (CharField) – Score title. The name that will be displayed for the scores column, for instance: Score (log-loss)
score_jsonpath (CharField) – Score jsonpath. The jsonpath of the field in metrics.json that will be used for the overall scores on the results page. See http://goessner.net/articles/JsonPath/ for syntax. For example: dice.mean
score_error_jsonpath (CharField) – Score error jsonpath. The jsonpath of the field in metrics.json that contains the error of the score, e.g. dice.std
score_default_sort (CharField) – Score default sort. The default sorting to use for the scores on the results page.
score_decimal_places (PositiveSmallIntegerField) – Score decimal places. The number of decimal places to display for the score
extra_results_columns (JSONField) – Extra results columns. A JSON array of objects describing the extra columns from metrics.json that will be displayed on the results page. An example that will display the accuracy score with its error would look like this: [{"path": "accuracy.mean", "order": "asc", "title": "Accuracy +/- std", "error_path": "accuracy.std", "exclude_from_ranking": true}]
scoring_method_choice (CharField) – Scoring method choice. How should the rank of each result be calculated?
result_display_choice (CharField) – Result display choice. Which results should be displayed on the leaderboard?
creator_must_be_verified (BooleanField) – Creator must be verified. If True, only participants with verified accounts can make submissions to this phase
submission_kind (PositiveSmallIntegerField) – Submission kind. Should participants submit a .csv/.zip file of predictions, or an algorithm?
allow_submission_comments (BooleanField) – Allow submission comments. Allow users to submit comments as part of their submission.
display_submission_comments (BooleanField) – Display submission comments. If true, submission comments are shown on the results page.
supplementary_file_choice (CharField) – Supplementary file choice. Show a supplementary file field on the submissions page so that users can upload an additional file along with their predictions file as part of their submission (e.g. a PDF description of their method). Off turns this feature off, Optional means that including the file is optional for the user, and Required means that the user must upload a supplementary file.
supplementary_file_label (CharField) – Supplementary file label. The label that will be used on the submission and results page for the supplementary file. For example: Algorithm Description.
supplementary_file_help_text (CharField) – Supplementary file help text. The help text to include on the submissions page to describe the supplementary file, e.g. "A PDF description of the method."
show_supplementary_file_link (BooleanField) – Show supplementary file link. Show a link to download the supplementary file on the results page.
supplementary_url_choice (CharField) – Supplementary url choice. Show a supplementary URL field on the submission page so that users can submit a link to a publication that corresponds to their submission. Off turns this feature off, Optional means that including the URL is optional for the user, and Required means that the user must provide a URL.
supplementary_url_label (CharField) – Supplementary url label. The label that will be used on the submission and results page for the supplementary url. For example: Publication.
supplementary_url_help_text (CharField) – Supplementary url help text. The help text to include on the submissions page to describe the supplementary URL, e.g. "A link to your publication."
show_supplementary_url (BooleanField) – Show supplementary url. Show a link to the supplementary URL on the results page.
submissions_limit_per_user_per_period (PositiveIntegerField) – Submissions limit per user per period. The limit on the number of times that a user can make a submission over the submission limit period. Set this to 0 to close submissions for this phase.
submission_limit_period (PositiveSmallIntegerField) – Submission limit period. The number of days to consider for the submission limit period. If this is set to 1, then the submission limit is applied over the previous day. If it is set to 365, then the submission limit is applied over the previous year. If the value is not set, then the limit is applied over all time.
submissions_open_at (DateTimeField) – Submissions open at. If set, participants will not be able to make submissions to this phase before this time. Enter the date and time in your local timezone.
submissions_close_at (DateTimeField) – Submissions close at. If set, participants will not be able to make submissions to this phase after this time. Enter the date and time in your local timezone.
submission_page_markdown (TextField) – Submission page markdown. Markdown to include on the submission page to provide more context to users making a submission to the phase.
auto_publish_new_results (BooleanField) – Auto publish new results. If true, new results are automatically made public. If false, the challenge administrator must manually publish each new result.
display_all_metrics (BooleanField) – Display all metrics. If True, the entire contents of metrics.json is available on the results detail page and over the API. If False, only the metrics used for ranking are available on the results detail page and over the API. Challenge administrators can always access the full metrics.json over the API.
algorithm_selectable_gpu_type_choices (JSONField) – Algorithm selectable gpu type choices. The GPU type choices that participants will be able to select for their algorithm inference jobs. The setting on the algorithm will be validated against this on submission. Options are ["", "A100", "A10G", "V100", "K80", "T4"].
algorithm_maximum_settable_memory_gb (PositiveSmallIntegerField) – Algorithm maximum settable memory gb. Maximum amount of main memory (DRAM) that participants will be allowed to assign to algorithm inference jobs for submission. The setting on the algorithm will be validated against this on submission.
algorithm_time_limit (PositiveIntegerField) – Algorithm time limit. Time limit for inference jobs in seconds.
give_algorithm_editors_job_view_permissions (BooleanField) – Give algorithm editors job view permissions. If set to True, algorithm editors (i.e. challenge participants) will automatically be given view permissions to the algorithm jobs and their logs associated with this phase. This saves challenge administrators from having to manually share the logs for each failed submission. Setting this to True will essentially make the data in the linked archive accessible to the participants. Only set this to True for debugging phases, where participants can check that their algorithms are working. Algorithm editors will only be able to access their own logs and predictions, not the logs and predictions of other users.
evaluation_time_limit (PositiveIntegerField) – Evaluation time limit. Time limit for evaluation jobs in seconds.
evaluation_selectable_gpu_type_choices (JSONField) – Evaluation selectable gpu type choices. The GPU type choices that challenge admins will be able to set for the evaluation method. Options are ["", "A100", "A10G", "V100", "K80", "T4"].
evaluation_requires_gpu_type (CharField) – Evaluation requires gpu type. The GPU to attach to this phase's evaluations. Note that the GPU attached to any algorithm inference jobs is determined by the submitted algorithm.
evaluation_maximum_settable_memory_gb (PositiveSmallIntegerField) – Evaluation maximum settable memory gb. Maximum amount of main memory (DRAM) that challenge admins will be able to assign for the evaluation method.
evaluation_requires_memory_gb (PositiveSmallIntegerField) – Evaluation requires memory gb. How much main memory (DRAM) to assign to this phase's evaluations. Note that the memory assigned to any algorithm inference jobs is determined by the submitted algorithm.
public (BooleanField) – Public. Uncheck this box to hide this phase’s submission page and leaderboard from participants. Participants will then no longer have access to their previous submissions and evaluations from this phase if they exist, and they will no longer see the respective submit and leaderboard tabs for this phase. For you as admin these tabs remain visible. Note that hiding a phase is only possible if submissions for this phase are closed for participants.
average_algorithm_job_duration (DurationField) – Average algorithm job duration. The average duration of successful algorithm jobs for this phase.
compute_cost_euro_millicents (PositiveBigIntegerField) – Compute cost euro millicents. The total compute cost for this phase in euro millicents (thousandths of a euro cent), including tax.
external_evaluation (BooleanField) – External evaluation. Are submissions to this phase evaluated externally? If so, it is the responsibility of the external service to claim and evaluate new submissions, download the submitted algorithm models and images and return the results.
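The GPU type and memory settings chosen on an algorithm are validated against the phase's algorithm_selectable_gpu_type_choices and algorithm_maximum_settable_memory_gb on submission. The following is a minimal sketch of such a check, assuming hypothetical PhaseLimits and AlgorithmSettings containers and assuming that "" stands for "no GPU"; it is not grand-challenge.org's validation code.

```python
from dataclasses import dataclass


@dataclass
class PhaseLimits:
    # Hypothetical mirror of algorithm_selectable_gpu_type_choices and
    # algorithm_maximum_settable_memory_gb; "" is assumed to mean "no GPU".
    selectable_gpu_types: list
    maximum_memory_gb: int


@dataclass
class AlgorithmSettings:
    # Hypothetical names for the settings chosen on the submitted algorithm.
    gpu_type: str
    memory_gb: int


def validate_submission(algorithm: AlgorithmSettings, phase: PhaseLimits) -> list:
    """Return human-readable problems; an empty list means the settings are allowed."""
    errors = []
    if algorithm.gpu_type not in phase.selectable_gpu_types:
        allowed = ", ".join(repr(choice) for choice in phase.selectable_gpu_types)
        errors.append(
            f"GPU type {algorithm.gpu_type!r} is not allowed; choose one of {allowed}."
        )
    if algorithm.memory_gb > phase.maximum_memory_gb:
        errors.append(
            f"{algorithm.memory_gb} GB of memory requested; "
            f"the maximum is {phase.maximum_memory_gb} GB."
        )
    return errors


phase = PhaseLimits(selectable_gpu_types=["", "T4"], maximum_memory_gb=16)
print(validate_submission(AlgorithmSettings(gpu_type="A100", memory_gb=32), phase))
```

Collecting every problem instead of stopping at the first lets a participant fix all of their settings in one go.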

Relationship fields:

Parameters:

hanging_protocol (ForeignKey to HangingProtocol) – Hanging protocol. Indicate which sockets need to be displayed in which image port, e.g. {"main": ["socket1"]}. The first item in the list of sockets will be the main image in the image port. The first overlay type socket thereafter will be rendered as an overlay. For now, any other items will be ignored by the viewer. (related name: phase)
challenge (ForeignKey to Challenge) – Challenge (related name: phase)
archive (ForeignKey to Archive) – Archive. Which archive should be used as the source dataset for this phase? (related name: phase)
workstation (ForeignKey to Workstation) – Workstation (related name: phase)
workstation_config (ForeignKey to WorkstationConfig) – Workstation config (related name: phase)
parent (ForeignKey to Phase) – Parent. Is this phase dependent on another phase? If selected, submissions to the current phase will only be possible after a successful submission has been made to the parent phase. Bear in mind that if you require a successful submission to a sanity check phase in order to submit to a final test phase, it could prevent people from submitting to the test phase on deadline day if the sanity check submission takes a long time to execute. (related name: children)
algorithm_interfaces (ManyToManyField to AlgorithmInterface) – Algorithm interfaces. The interfaces that an algorithm for this phase must implement. (related name: phase)
additional_evaluation_inputs (ManyToManyField to ComponentInterface) – Additional evaluation inputs (related name: additional_eval_inputs)
evaluation_outputs (ManyToManyField to ComponentInterface) – Evaluation outputs (related name: eval_outputs)
optional_hanging_protocols (ManyToManyField to HangingProtocol) – Optional hanging protocols. Optional alternative hanging protocols for this phase (related name: optional_for_phase)
actor_actions (GenericRelation to Action) – Actor actions (related name: actions_with_evaluation_phase_as_actor)
target_actions (GenericRelation to Action) – Target actions (related name: actions_with_evaluation_phase_as_target)
action_object_actions (GenericRelation to Action) – Action object actions (related name: actions_with_evaluation_phase_as_action_object)

Reverse relationships:

Parameters:

children (Reverse ForeignKey from Phase) – All children of this phase (related name of parent)
phaseadditionalevaluationinput (Reverse ForeignKey from PhaseAdditionalEvaluationInput) – All phase additional evaluation inputs of this phase (related name of phase)
phaseevaluationoutput (Reverse ForeignKey from PhaseEvaluationOutput) – All phase evaluation outputs of this phase (related name of phase)
phaseuserobjectpermission (Reverse ForeignKey from PhaseUserObjectPermission) – All phase user object permissions of this phase (related name of content_object)
phasegroupobjectpermission (Reverse ForeignKey from PhaseGroupObjectPermission) – All phase group object permissions of this phase (related name of content_object)
phasealgorithminterface (Reverse ForeignKey from PhaseAlgorithmInterface) – All phase algorithm interfaces of this phase (related name of phase)
method (Reverse ForeignKey from Method) – All methods of this phase (related name of phase)
submission (Reverse ForeignKey from Submission) – All submissions of this phase (related name of phase)
ground_truths (Reverse ForeignKey from EvaluationGroundTruth) – All ground truths of this phase (related name of phase)
combinedleaderboard (Reverse ManyToManyField from CombinedLeaderboard) – All combined leaderboards of this phase (related name of phases)
combinedleaderboardphase (Reverse ForeignKey from CombinedLeaderboardPhase) – All combined leaderboard phases of this phase (related name of phase)
optionalhangingprotocolphase (Reverse ForeignKey from OptionalHangingProtocolPhase) – All optional hanging protocol phases of this phase (related name of phase)
job_utilizations (Reverse ForeignKey from JobUtilization) – All job utilizations of this phase (related name of phase)
job_warm_pool_utilizations (Reverse ForeignKey from JobWarmPoolUtilization) – All job warm pool utilizations of this phase (related name of phase)
evaluation_utilizations (Reverse ForeignKey from EvaluationUtilization) – All evaluation utilizations of this phase (related name of phase)

exception DoesNotExist

exception MultipleObjectsReturned

class StatusChoices(value)

class SubmissionKindChoices(value)

active_ground_truth

active_image

clean(): Hook for doing any extra model-wide validation after clean() has been called on every field by self.clean_fields. Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.

get_next_submission(*, user): Determines the number of submissions left for the user, and when they can next submit.
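One way to compute what get_next_submission describes, using the semantics of submissions_limit_per_user_per_period and submission_limit_period given in the parameter list, is sketched below. next_submission is a hypothetical function, not the method's actual implementation: it counts the user's submissions inside the rolling window and, once the limit is reached, works out when enough old submissions will have left the window to free a slot.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def next_submission(
    previous: List[datetime],
    now: datetime,
    limit: int,
    period_days: Optional[int],
) -> Tuple[int, Optional[datetime]]:
    """Return (submissions left, earliest time the user can submit next).

    The time is `now` if a submission is possible immediately and None if the
    user can never submit again (limit of 0, or limit reached over all time).
    """
    if period_days is None:
        window = sorted(previous)  # no period set: the limit applies over all time
    else:
        start = now - timedelta(days=period_days)
        window = sorted(t for t in previous if t > start)

    remaining = max(limit - len(window), 0)
    if remaining > 0:
        return remaining, now
    if limit == 0 or period_days is None:
        return 0, None

    # len(window) - limit + 1 of the oldest submissions must leave the window
    # before a slot frees up; the last of those is window[len(window) - limit].
    return 0, window[len(window) - limit] + timedelta(days=period_days)
```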

save(*args, skip_calculate_ranks=False, **kwargs)

Save the current instance. Override this in a subclass if you want to control the saving process.

The ‘force_insert’ and ‘force_update’ parameters can be used to insist that the “save” must be an SQL insert or update (or equivalent for non-SQL backends), respectively. Normally, they should not be set.

valid_archive_items_per_interface: Returns the archive items that are valid for

Template Tags

grandchallenge.evaluation.templatetags.evaluation_extras.get_jsonpath(obj, jsonpath)

Gets a value from a dictionary based on a jsonpath. It will only return one result, and if a key does not exist it will return an empty string, as template tags should not raise errors.

Parameters:

obj (dict) – The dictionary to query
jsonpath – The path to the object (singular)

Returns:

The most relevant object in the dictionary
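The behaviour of get_jsonpath (a single result, and an empty string instead of an exception for a missing key) can be sketched for simple dotted paths such as aggregates.dicecoefficient_mean. get_dotted_path is a hypothetical simplification; it does not implement the full JSONPath syntax linked from the score_jsonpath description.

```python
def get_dotted_path(obj: dict, path: str):
    """Follow a dot-separated path such as "aggregates.dicecoefficient_mean".

    Returns "" if any key along the way is missing, so a template using it
    never raises. Only plain dotted keys are supported, not full JSONPath.
    """
    current = obj
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return ""
    return current


metrics = {
    "case": {"dicecoefficient": {"0": 0.6461774875144065}},
    "aggregates": {"dicecoefficient_mean": 0.6004146364647982},
}
score = get_dotted_path(metrics, "aggregates.dicecoefficient_mean")
first_case = get_dotted_path(metrics, "case.dicecoefficient.0")
missing = get_dotted_path(metrics, "aggregates.hausdorff_mean")  # -> ""
```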
