Secure Code Review Automated Platform

A prototype for the automated evaluation of computer science students' PHP code submissions with regard to vulnerabilities, and for providing feedback on how to fix them.



Note regarding the demo server and client:
Due to maintenance efforts and low interest I have currently turned off the demo server. If you want to test it, without setting up your own demo server, send me an e-mail and I'll spin up the demo server again.

Background

In my master's thesis in IT Security at the FH Technikum Wien I am investigating "the feasibility of using Free/Libre and Open Source Software (F/LOSS) tools, to build a toolchain and feedback generation platform, that could be used in introductory programming courses to add opportunities for secure code awareness, without overloading the existing coursework" (Klaura 2020).

A broad literature review reveals that there is still too little going on in the field of software security education; a few notable exceptions are highlighted in the thesis. Yet one of the major angles for improving software security might lie in better programming education, which includes security from the start and builds awareness of vulnerabilities in code early on in the education of many future software developers and engineers.

The practical part of the thesis focuses on the question of whether and how free/libre and open source static code analysis tools (or SAST tools, short for Static Application Security Testing) can be used to analyse code and generate feedback for (introductory) programming learners. This includes a review of currently available (free/libre) tools for static analysis of PHP code, which is detailed in the thesis. The primary survey of the most viable options can also be found in the scrap-scanner-eval repository.

The following sections and this site in general are dedicated to describing the prototype and its usage. For detailed background information on the feasibility of this approach and what would be really needed, take a look at the published thesis.


The prototype

The SCRAP prototype consists of three components:

Screenshot of the SCRAP API definition on SwaggerHub

SCRAP API

The prototype was designed with an API-first approach, which puts the interface and the transmitted data types at its core. This is due to SCRAP's requirement to be easy to integrate into existing code submission systems. Find out more about the OpenAPI 3.0 conforming interface:

  • in the api-def on SwaggerHub, or
  • in the rendered api-doc, containing code templates.

Screenshot of a running dev server and a curl request

API server

The SCRAP web service is a RESTful API server written in Python, utilising the Flask framework and the Flask-RESTful extension. You can check out:

Screenshot of a scan view in the SCRAP web UI

Web UI

The SCRAP UI is a web interface based on the JavaScript framework Vue.js. You can check out:

  • client, a running version of the web UI here on the SCRAP site.
  • client-repo, the web UI's source at GitLab.


In addition to the three components of the prototype, the scanner-eval repository might be interesting. It documents the evaluation of several F/LOSS static analysis tools for PHP code regarding their applicability in SCRAP. All of them could be integrated into the SCRAP server without a lot of effort. For the prototype I decided to use PHP CodeSniffer with the ruleset provided by phpcs-security-audit, and YARA with the ruleset provided by PHP Malware Finder.

Setup and running the server and the web UI

If you want to set up your own local development server or a remote production server, there are detailed setup instructions in the README.md of the scrap-api-server repository.

Also the scrap-client repository contains a README.md file with instructions on how to run a development server and how to deploy the UI to a remote server.


Usage

You can use the public demo server and the web UI to test SCRAP right away. If you want to test it in your own development environment, take a look at the setup instructions in the repos.

If you have curl and jq already installed, try out the following (you can leave out the pipe into jq, but then the output is not formatted at all):


# Retrieve meta information of the API server:
curl -s https://scrap.tantemalkah.at/api/v1 | jq .
# Check out, which scanners are available:
curl -s https://scrap.tantemalkah.at/api/v1/scanners | jq .
# Retrieve the list of explanations and a specific explanation
curl -s https://scrap.tantemalkah.at/api/v1/explanations | jq .
curl -s https://scrap.tantemalkah.at/api/v1/explanations/yara.SQLi | jq .
# For listing scans you need an API key and a username:
curl -H "X-API-User: public" -H "X-API-Key: public" \
  https://scrap.tantemalkah.at/api/v1/scans | jq .
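
The same requests can also be issued from a script. The following is a minimal Python sketch using only the standard library. The helper names `build_request` and `fetch_json` are made up for this illustration and are not part of SCRAP; the base URL and the public credentials are simply the ones from the curl calls above. For simplicity the headers are sent with every call, although the curl examples only use them for the scan endpoints.

```python
import json
import urllib.request

# Base URL as used in the curl examples above.
BASE_URL = "https://scrap.tantemalkah.at/api/v1"


def build_request(path="", user="public", key="public"):
    """Build a GET request for an API path such as "/scans" (hypothetical helper)."""
    headers = {"X-API-User": user, "X-API-Key": key}
    return urllib.request.Request(BASE_URL + path, headers=headers)


def fetch_json(path="", **credentials):
    """Send the request and decode the JSON body (needs a reachable server)."""
    with urllib.request.urlopen(build_request(path, **credentials)) as response:
        return json.load(response)


# Building a request does not touch the network yet:
request = build_request("/scans")
print(request.get_method(), request.full_url)
```

With a reachable server, `fetch_json("/scanners")` would then correspond to the second curl call above.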
        

A scan consists of files that have been scanned and issues that have been found. We can retrieve a single scan by its UUID, which was listed in the last call above:


curl -H "X-API-User: public" -H "X-API-Key: public" \
  https://scrap.tantemalkah.at/api/v1/scans/65fbf6d6-7eb8-4aad-83d0-7f3d0fd4a76c \
  | jq .
        

This provides a JSON object like the following:


{
  "id": "65fbf6d6-7eb8-4aad-83d0-7f3d0fd4a76c",
  "status": {
    "stage": "done",
    "percentage": 100
  },
  "issuesFound": 1,
  "files": 1,
  "created": "2020-04-15T14:36:36",
  "analysed": "2020-04-15T14:36:36"
}
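
A client will typically look at the status field before asking for files and issues. Here is a small Python sketch of that check, working on the scan object shown above. The helper `summarise_scan` is hypothetical, and the only stage value I rely on is "done", as seen in the example; other stage values are not covered here.

```python
import json
from datetime import datetime

# The scan object from the example above.
SCAN_JSON = """
{
  "id": "65fbf6d6-7eb8-4aad-83d0-7f3d0fd4a76c",
  "status": {"stage": "done", "percentage": 100},
  "issuesFound": 1,
  "files": 1,
  "created": "2020-04-15T14:36:36",
  "analysed": "2020-04-15T14:36:36"
}
"""


def summarise_scan(scan):
    """Return a one-line summary of a scan object (hypothetical helper)."""
    status = scan["status"]
    # Assumption: "done" marks a finished scan, as in the example output.
    if status["stage"] != "done":
        return f"scan {scan['id']}: {status['stage']} ({status['percentage']}%)"
    created = datetime.fromisoformat(scan["created"])
    analysed = datetime.fromisoformat(scan["analysed"])
    seconds = (analysed - created).total_seconds()
    return (f"scan {scan['id']}: {scan['issuesFound']} issue(s) in "
            f"{scan['files']} file(s), analysed after {seconds:.0f}s")


summary = summarise_scan(json.loads(SCAN_JSON))
print(summary)
```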
        

Files and issues can be listed and retrieved with the following:


# List all files from a single scan:
curl -H "X-API-User: public" -H "X-API-Key: public" \
  https://scrap.tantemalkah.at/api/v1/scans/65fbf6d6-7eb8-4aad-83d0-7f3d0fd4a76c/files \
  | jq .
# List all issues found in a scan:
curl -H "X-API-User: public" -H "X-API-Key: public" \
  https://scrap.tantemalkah.at/api/v1/scans/65fbf6d6-7eb8-4aad-83d0-7f3d0fd4a76c/issues \
  | jq .
# List meta info for a file:
curl -H "X-API-User: public" -H "X-API-Key: public" \
  https://scrap.tantemalkah.at/api/v1/scans/65fbf6d6-7eb8-4aad-83d0-7f3d0fd4a76c/files/sqli_low.php \
  | jq .
# Get the raw file (don't filter through jq):
curl -H "X-API-User: public" -H "X-API-Key: public" \
  https://scrap.tantemalkah.at/api/v1/scans/65fbf6d6-7eb8-4aad-83d0-7f3d0fd4a76c/blob/sqli_low.php
# List an issue:
curl -H "X-API-User: public" -H "X-API-Key: public" \
  https://scrap.tantemalkah.at/api/v1/scans/65fbf6d6-7eb8-4aad-83d0-7f3d0fd4a76c/issues/0 \
  | jq .
        

The most relevant parts for the SCRAP end user are the single issues and corresponding explanations, if there are any. All of this will be much more usable when displayed through a graphical interface, like the web UI, or even some custom text UI, if you want to implement one. The JSON output of a single issue looks like this:


{
  "source": {
    "scanner": "yara",
    "rule": "SQLi",
    "info": "https://virustotal.github.io/yara/",
    "cli": "yara -r -w -s -m /opt/scrap/scanners/yara/scrap.yar uploads/65fbf6d6-7eb8-4aad-83d0-7f3d0fd4a76c/sqli_low.php"
  },
  "type": "SQLi",
  "explanation": "yara.SQLi",
  "affectedFiles": [
    {
      "path": "sqli_low.php",
      "lines": [
        {
          "characters": {
            "from": 65,
            "to": 89
          },
          "text": "b\"$id = $_REQUEST[ 'id' ];\"",
          "description": "Your code might be vulnerable to an SQL injection | The $id parameter seems to not be sanitized | More info at: https://scrap/description/sqli"
        },
        {
          "characters": {
            "from": 125,
            "to": 186
          },
          "text": "b\"SELECT first_name, last_name FROM users WHERE user_id = '$id'\"",
          "description": "Your code might be vulnerable to an SQL injection | The $id parameter seems to not be sanitized | More info at: https://scrap/description/sqli"
        }
      ]
    }
  ]
}
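
To give an idea of what a custom text UI could do with such an object, here is a rough Python sketch. The function `render_issue` is hypothetical, the output layout is my own choice, and the dictionary is a shortened copy of the issue above. The one thing taken from the data itself is that the description packs several hints into one string, separated by " | ".

```python
def render_issue(issue):
    """Format one issue object as plain text lines (hypothetical helper)."""
    source = issue["source"]
    lines = [f"{issue['type']} (found by {source['scanner']}, rule {source['rule']})"]
    for affected in issue["affectedFiles"]:
        for hit in affected["lines"]:
            span = hit["characters"]
            lines.append(f"  {affected['path']} [{span['from']}-{span['to']}]: {hit['text']}")
            # Split the combined description into its individual hints.
            for hint in hit["description"].split(" | "):
                lines.append(f"    - {hint}")
    return lines


# A shortened version of the issue object shown above.
issue = {
    "source": {"scanner": "yara", "rule": "SQLi"},
    "type": "SQLi",
    "explanation": "yara.SQLi",
    "affectedFiles": [{
        "path": "sqli_low.php",
        "lines": [{
            "characters": {"from": 65, "to": 89},
            "text": "b\"$id = $_REQUEST[ 'id' ];\"",
            "description": "Your code might be vulnerable to an SQL injection | "
                           "The $id parameter seems to not be sanitized | "
                           "More info at: https://scrap/description/sqli",
        }],
    }],
}

print("\n".join(render_issue(issue)))
```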
        

The corresponding explanation yara.SQLi, which was already retrieved in the first batch of curl examples above, looks like this:


{
  "name": "SQL Injection through unsanitized `id` parameter",
  "type": "sqli",
  "isStub": false,
  "shortDescription": "If you use an `id` parameter without validation in an unparameterised SQL query, an attacker can easily inject malicious code.\n",
  "longDescription": "If you use an `id` parameter without validation in an unparameterised SQL\nquery, an attacker can easily inject malicious code.\n\n__What does this mean?__\n\nIf you take for example the following PHP code:\n```php\n$id = $_GET[\"id\"];\n# do some other stuff\n# and then query for, e.g. a user with this id in the database:\n$query  = \"SELECT first_name, last_name FROM users WHERE user_name = '$id';\";\n$result = mysqli_query($connection, $query);\n```\nWhat would happen, if someone submits `1' OR 1=1; -- -` as a value?\nThis would lead to the following effective query:\n```sql\nSELECT first_name, last_name FROM users\n  WHERE user_name = '1' OR 1=1; -- -'\n```\nAs the `-- -` turns the remainder of the original query (in this case only\nthe `'`) into a comment, we have a new query, with a `WHERE` clause that is always true.\nTherefore not only one row for a specific user will be returned, but all\nusers.\nBut worse could be done, e.g. by using the `UNION` construct to find out\nabout other tables' data or even the whole database schema.\n",
  "howToFix": "One of the best ways in PHP to safeguard against SQL injections is to\nuse [prepared statements](https://www.php.net/manual/en/mysqli.quickstart.prepared-statements.php). Instead of putting the parameters into the\nquery yourself, you can let the database library do that for you with\nthe `prepare` method of a mysqli database object:\n```php\n$db = new mysqli(\"example.com\", \"user\", \"password\", \"database\");\n\n# do some other stuff\n\n# STEP 1: prepare the query statement\n$stmt = $db->prepare('SELECT first_name, last_name ' .\n                     'FROM users WHERE user_name = ?');\nif (!$stmt) {\n  echo \"Prepare failed: (\" . $db->errno . \") \" . $db->error;\n}\n\n# STEP 2: bind the parameter to the query statement\nif (!$stmt->bind_param(\"i\", $id)) {\n  echo \"Binding parameters failed: (\" . $stmt->errno . \") \" . $stmt->error;\n}\n\n# STEP 3: execute the query\nif (!$stmt->execute()) {\n  echo \"Execute failed: (\" . $stmt->errno . \") \" . $stmt->error;\n}\n```\nApart from using such prepared statements it is also always advisable\nto [validate your user inputs](https://www.w3schools.com/php/php_form_validation.asp). You can also use the [PHP filter functions](https://www.w3schools.com/php/php_ref_filter.asp)\nto check if the input conforms to what you expect.",
  "references": [
    "https://www.w3schools.com/sql/sql_injection.asp",
    "https://www.php.net/manual/en/security.database.sql-injection.php",
    "https://en.wikipedia.org/wiki/SQL_injection",
    "http://cis1.towson.edu/~cssecinj/modules/other-modules/database/sql-injection-introduction/",
    "https://xkcd.com/327/",
    "https://bobby-tables.com/",
    "https://owasp.org/www-community/attacks/SQL_Injection"
  ]
}
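
The attack and the fix described in this explanation can be tried out without PHP or MySQL. The sketch below swaps in Python's built-in sqlite3 module with an in-memory database, so it is an analogy to the mysqli example and not the same API. The table rows are made up for the demonstration; the payload is the one from the longDescription.

```python
import sqlite3

# In-memory stand-in for the users table; the rows are made up.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (user_name TEXT, first_name TEXT, last_name TEXT)")
db.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("1", "Ada", "Lovelace"), ("2", "Grace", "Hopper"), ("3", "Alan", "Turing")],
)

payload = "1' OR 1=1; -- -"  # the value from the explanation above

# Vulnerable: the input is pasted into the query text.
unsafe_query = ("SELECT first_name, last_name FROM users "
                f"WHERE user_name = '{payload}'")
unsafe_rows = db.execute(unsafe_query).fetchall()

# Safer: the input is bound to a placeholder and never parsed as SQL.
safe_query = "SELECT first_name, last_name FROM users WHERE user_name = ?"
safe_rows = db.execute(safe_query, (payload,)).fetchall()

print(len(unsafe_rows), "row(s) returned by the concatenated query")
print(len(safe_rows), "row(s) returned by the parameterised query")
```

The concatenated query matches every row because of the injected `OR 1=1`, while the parameterised query only looks for a user whose name is literally the payload string.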
        

The descriptions are formatted as Markdown and contain code sections that can be syntax-highlighted, such as the following PHP snippet from the longDescription above, which is prone to SQL injection:


$id = $_GET["id"];
# do some other stuff
# and then query for, e.g. a user with this id in the database:
$query  = "SELECT first_name, last_name FROM users WHERE user_name = '$id';";
$result = mysqli_query($connection, $query);
        

Or the following SQL command, illustrating a simple injection test:


SELECT first_name, last_name FROM users
  WHERE user_name = '1' OR 1=1; -- -'
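
A client that wants to highlight these sections first has to find them in the Markdown text. The following Python sketch shows one simple way to pull out the fenced code sections together with their language tags. It is only an illustration: `extract_code_blocks` is a hypothetical helper, the sample text is a shortened excerpt of the longDescription, and a real client would more likely hand the whole text to a Markdown renderer.

```python
import re

TICKS = "`" * 3  # the fence marker used in the Markdown descriptions

# Matches an opening fence with optional language, the code, and the closing fence.
FENCE = re.compile(re.escape(TICKS) + r"(\w*)\n(.*?)" + re.escape(TICKS), re.DOTALL)


def extract_code_blocks(markdown):
    """Return (language, code) pairs for every fenced section (hypothetical helper)."""
    return [(lang or "text", code.rstrip("\n"))
            for lang, code in FENCE.findall(markdown)]


# Shortened excerpt of the longDescription field shown above.
long_description = (
    "If you take for example the following PHP code:\n"
    f"{TICKS}php\n$id = $_GET[\"id\"];\n{TICKS}\n"
    "This would lead to the following effective query:\n"
    f"{TICKS}sql\nSELECT first_name, last_name FROM users\n{TICKS}\n"
)

for lang, code in extract_code_blocks(long_description):
    print(f"[{lang}] {code}")
```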
        

If you want to use a scripting or programming language, the api-doc provides code examples for several languages, for example the following Python code, which can be used to retrieve a specific explanation:


from __future__ import print_function
# swagger_client is assumed to be the Python client generated from the api-def
import swagger_client
from swagger_client.rest import ApiException
from pprint import pprint

# create an instance of the API class
api_instance = swagger_client.PublicApi()
slug = "yara.SQLi"  # str | slug of the explanation to retrieve

try:
    # Retrieve an explanation to a vulnerability
    api_response = api_instance.explanations_slug_get(slug)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling PublicApi->explanationsSlugGet: %s\n" % e)
        

An example of how to display all of this to the user is provided by the web UI prototype. Go to https://scrap.tantemalkah.at/webui/scans, where you should always find at least one scan with the sqli_low.php file from the DVWA that was used in the previous examples. Click on the scan and, in the scan overview, click on the one issue that was found. A screenshot of this is also available above in the prototype section.

If you have more questions on how to use it, or if you would like to get a private API key, contact me.


Acknowledgements

The following people, systems and tools helped me to make all of this possible:

People

  • Maria Klaura, for the awesome SCRAP logo.
  • Katharina Simma, for proofreading the whole thesis and encouraging me to go on when I most needed it.
  • Christian Kaufmann, my thesis supervisor, who came up with the initial idea of evaluating a F/LOSS tool chain for the analysis of student code submissions.
  • My study colleagues, who provided feedback on the initial project design and kept me motivated by showing how awesome life can be once you have submitted that darn thing.
  • The members of my IT collective diebin.at, who cut me some slack in the months before the final thesis deadline.


Tech stack



License

All licenses for code can be found in the LICENSE files of the repos (usually AGPLv3).

Code pertaining to the dark-mode-switch is licensed under an MIT license and was originally provided by Christian Oliff, with minor modifications by myself. This pertains to the files dark-mode-switch.js, dark-mode-switch.min.js and dark-mode.css.

The SCRAP logo is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license by Maria Klaura.

All other stuff, if not otherwise noted, is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license by me (see contact below).


Contact

If you want to know more about the whole project or any of its single parts, just say hello, or contribute, please contact me through one of the channels below:


Presentation

Here are the backup documents for my master's thesis presentation: