Large-scale Java GitHub search of 'test' in content, filename and file path

Bibliographic Details
Title: Large-scale Java GitHub search of 'test' in content, filename and file path
Authors: Matej Madeja, orcid:0000-0002-8197-
Contributors: Jaroslav Porubän
Publisher Information: Zenodo
Publication Year: 2021
Collection: Zenodo
Subject Terms: GitHub analysis, "test" occurrence, improving program comprehension
Description: Dataset from a large-scale GitHub analysis based on the GHTorrent list of repositories from May 2019. The dataset includes only non-fork repositories whose majority language is Java. Each of the 4.3M repositories was searched for the word "test" via the GitHub Search API in:
- all files' content
- Java files' content
- all filenames
- Java filenames
- all file paths
- Java file paths
At the same time, the current number of commits and watchers of each repository was obtained. The data were collected between 2019-08-20 and 2019-10-01. The dataset is a MySQL dump of one table containing the following columns:
- id - internal table ID
- project_id - ID in the `projects` table of GHTorrent's mirror mysql-2019-05-01
- full_name - full name of the project
- found_test_in_path_java - number of occurrences of "test" in Java file paths
- found_test_in_path - number of occurrences of "test" in all file paths
- found_test_in_body_java - number of occurrences of "test" in Java files' content
- found_test_in_body - number of occurrences of "test" in all files' content
- found_test_in_filename_java - number of occurrences of "test" in Java filenames
- found_test_in_filename - number of occurrences of "test" in all filenames
- watchers - number of the project's watchers
- created_at - datetime when the data were fetched
- last_commit - datetime of the last commit
- all_commits - all commits, including those inherited from other projects
- project_commits - only the project's own commits, excluding inherited ones
This work was supported by project VEGA No. 1/0762/19: Interactive pattern-driven language development.
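To show how the columns relate to each other, here is a minimal Python sketch. It assumes the dump has already been loaded, for example into a local MySQL server, and that each row has been converted into the hypothetical `RepoTestStats` class below. The class name, the Python types, and the helper methods are illustrative assumptions and are not part of the dataset. The record does not state the MySQL column types. The one relationship taken directly from the description is that `all_commits` includes inherited commits while `project_commits` excludes them, so their difference counts the inherited commits. The Java share helper additionally assumes that Java path hits are a subset of all path hits.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


# Hypothetical in-memory form of one row of the dumped table.
# Field names follow the column list; the Python types are assumptions.
@dataclass
class RepoTestStats:
    id: int
    project_id: int
    full_name: str
    found_test_in_path_java: int
    found_test_in_path: int
    found_test_in_body_java: int
    found_test_in_body: int
    found_test_in_filename_java: int
    found_test_in_filename: int
    watchers: int
    created_at: datetime            # when the data were fetched
    last_commit: Optional[datetime]  # assumed nullable
    all_commits: int                 # includes inherited commits
    project_commits: int             # excludes inherited commits

    def inherited_commits(self) -> int:
        # all_commits counts inherited commits, project_commits does not,
        # so the difference is the number of inherited commits.
        return self.all_commits - self.project_commits

    def java_share_of_path_hits(self) -> Optional[float]:
        # Fraction of "test" path hits that are in Java paths
        # (assumes Java hits are a subset of all hits); None if there are none.
        if self.found_test_in_path == 0:
            return None
        return self.found_test_in_path_java / self.found_test_in_path

    def mentions_test_anywhere(self) -> bool:
        # True if "test" was found in any path, file content, or filename.
        return any(
            n > 0
            for n in (
                self.found_test_in_path,
                self.found_test_in_body,
                self.found_test_in_filename,
            )
        )


if __name__ == "__main__":
    # Made-up values for illustration only; not taken from the dataset.
    row = RepoTestStats(
        1, 42, "owner/example-repo", 3, 4, 10, 12, 2, 2, 5,
        datetime(2019, 9, 1), datetime(2019, 8, 1), 120, 100,
    )
    print(row.inherited_commits(), row.java_share_of_path_hits())
```

Keeping derived values as methods rather than stored fields leaves the row identical to the dumped schema, so the same class can be filled directly from a database cursor.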
Document Type: text
Language: English
Relation: https://zenodo.org/records/4566198; oai:zenodo.org:4566198; https://doi.org/10.5281/zenodo.4566198
DOI: 10.5281/zenodo.4566198
Availability: https://doi.org/10.5281/zenodo.4566198
https://zenodo.org/records/4566198
Rights: Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode
Accession Number: edsbas.A0C64685
Database: BASE