Mining tweets of Moroccan users using the framework Hadoop, NLP, K-means and basemap
The information revolution, and in particular the explosion of Web 2.0 platforms such as discussion forums, blogs, and social networks, allows users to share ideas and opinions, express their feelings, and much more. This revolution has led to the accumulation of an enormous amount of data that may contain a lot...
| Published in: | 2017 Intelligent Systems and Computer Vision (ISCV) pp. 1 - 7 |
|---|---|
| Main Authors: | El Abdouli, Abdeljalil; Hassouni, Larbi; Anoun, Houda |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.04.2017 |
| Subjects: | |
| Abstract | The information revolution, and in particular the explosion of Web 2.0 platforms such as discussion forums, blogs, and social networks, allows users to share ideas and opinions, express their feelings, and much more. This revolution has led to the accumulation of an enormous amount of data that may contain a lot of valuable information. Much work has focused on analyzing these data, in particular data from social network platforms such as Twitter. In this paper, our objective is to propose an approach for analyzing the data generated by Moroccan users on Twitter, in order to discover the subjects that interest Moroccan society and then locate on a map of Morocco the areas where the tweets related to these topics originate. Analyzing the tweets of Moroccan users is a real challenge for two main reasons. First, Moroccan users communicate on Twitter in a variety of languages and dialects, such as Standard Arabic, Moroccan Arabic ("Darija"), the Moroccan Amazigh dialect ("Tamazight"), French, Spanish, and English. Second, Moroccan tweets contain many URLs, #hashtags, spelling mistakes, reduced syntactic structures, and abbreviations. We detect the relevant subjects by extracting the data automatically and storing it in HDFS (Hadoop Distributed File System), the distributed file system of the Apache Hadoop framework. We then preprocess and analyze this raw data with a distributed program built on three tools: MapReduce from the Apache Hadoop framework, the Python language, and Natural Language Processing (NLP) techniques. Afterward, we convert the corpus generated by the previous step into numeric features and apply the k-means algorithm to cluster all words into general topics. Finally, we plot the tweets on a map of Morocco using the coordinates extracted from them, to get an idea of the geolocation of these subjects. |
|---|---|
| Author | Hassouni, Larbi; El Abdouli, Abdeljalil; Anoun, Houda |
| Author_xml | – sequence: 1 givenname: Abdeljalil surname: El Abdouli fullname: El Abdouli, Abdeljalil email: elabdouli.abdeljalil@gmail.com organization: RITM Lab., Hassan II Univ. of Casablanca, Casablanca, Morocco – sequence: 2 givenname: Larbi surname: Hassouni fullname: Hassouni, Larbi email: lhassouni@hotmail.com organization: RITM Lab., Hassan II Univ. of Casablanca, Casablanca, Morocco – sequence: 3 givenname: Houda surname: Anoun fullname: Anoun, Houda email: houda.anoun@gmail.com organization: RITM Lab., Hassan II Univ. of Casablanca, Casablanca, Morocco |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ISACV.2017.8054907 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library Online; IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9781509040629 1509040625 |
| EndPage | 7 |
| ExternalDocumentID | 8054907 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL CBEJK RIE RIL |
| ID | FETCH-LOGICAL-i90t-b4582a005aa48b7efbb2d2efc73211539a0250fa841be994ada4e48441f05b613 |
| IEDL.DBID | RIE |
| IngestDate | Thu Jun 29 18:37:09 EDT 2023 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i90t-b4582a005aa48b7efbb2d2efc73211539a0250fa841be994ada4e48441f05b613 |
| PageCount | 7 |
| ParticipantIDs | ieee_primary_8054907 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-April |
| PublicationDateYYYYMMDD | 2017-04-01 |
| PublicationDate_xml | – month: 04 year: 2017 text: 2017-April |
| PublicationDecade | 2010 |
| PublicationTitle | 2017 Intelligent Systems and Computer Vision (ISCV) |
| PublicationTitleAbbrev | ISACV |
| PublicationYear | 2017 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Score | 1.671958 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Clustering algorithms; Distributed databases; Distributed program; File systems; Framework Hadoop; HDFS; K-means; MapReduce; Natural language processing; Python Language |
| Title | Mining tweets of Moroccan users using the framework Hadoop, NLP, K-means and basemap |
| URI | https://ieeexplore.ieee.org/document/8054907 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
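
The abstract describes a pipeline of tweet preprocessing, a MapReduce pass, conversion to numeric features, and k-means clustering of words. The record gives no implementation details, so the following is only a minimal local sketch of those stages in plain Python, not the authors' program. All function names (`clean_tweet`, `mapper`, `reducer`, `word_vectors`, `kmeans`) are hypothetical, the cleaning rules are assumed, the 0/1 word-by-tweet vectors are one assumed choice of "numeric features", and the map and reduce steps run in-process rather than on Hadoop.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"\w+")  # Unicode-aware in Python 3, so Arabic and French tokens match


def clean_tweet(text):
    """Drop URLs and @mentions, strip '#' from hashtags, lowercase, and tokenize."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return TOKEN_RE.findall(text.lower())


def mapper(tweets):
    """Map step: emit a (word, 1) pair for every token of every tweet."""
    for tweet in tweets:
        for word in clean_tweet(tweet):
            yield word, 1


def reducer(pairs):
    """Reduce step: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_vectors(tweets):
    """Assumed feature choice: each word becomes a 0/1 vector over the tweets it occurs in."""
    tokenized = [set(clean_tweet(t)) for t in tweets]
    vocab = sorted(set().union(*tokenized))
    vectors = [[1.0 if w in toks else 0.0 for toks in tokenized] for w in vocab]
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means; the first k points seed the centroids so runs are deterministic."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    sample = [
        "Raja wins the derby! http://t.co/x #Casablanca",
        "@user derby tonight in Casablanca",
    ]
    print(reducer(mapper(sample)))
    vocab, vectors = word_vectors(sample)
    labels, _ = kmeans(vectors, k=2)
    print(list(zip(vocab, labels)))
```

In a real Hadoop deployment the map and reduce functions would typically read from and write to standard streams or use a MapReduce library, and the final mapping step named in the title would pass the extracted tweet coordinates to a plotting toolkit such as basemap; neither is shown here.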