Creation of datasets from open sources

Bibliographic Details
Published in: 2018 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), pp. 295-297
Main Authors: Chugunkov, Ilya V.; Kabak, Dmitry V.; Vyunnikov, Viktor N.; Aslanov, Roman E.
Format: Conference Proceeding
Language: English
Published: IEEE 01.01.2018
ISBN: 9781538643396, 1538643391
Description
Summary: Machine learning is one of the fastest-growing fields in IT, but it still has some fundamental problems. Before a neural network can be trained, a large dataset of labeled entries must be collected. Manual collection, however, takes a lot of time and resources. That is why one of the hardest problems in deep learning is obtaining the right data with the proper labels. This paper presents methods for automatically creating or updating a labeled dataset for building a car model classifier: a parser collects data from known Internet sources and uses a simple classifier to delete incorrect data. The main goal of the article is to prove that public sources can be used to collect correctly selected and labeled data.
DOI: 10.1109/EIConRus.2018.8317091