Transcoding unicode characters with AVX‐512 instructions

Detailed bibliographic record
Published in: Software, practice & experience, Vol. 53, No. 12, pp. 2430–2462
Main authors: Clausecker, Robert; Lemire, Daniel
Format: Journal Article
Language: English
Published: Bognor Regis: Wiley Subscription Services, Inc, 1 December 2023
ISSN: 0038-0644 (print), 1097-024X (online)

Abstract
Intel includes in its recent processors a powerful set of instructions capable of processing 512‐bit registers with a single instruction (AVX‐512). Some of these instructions have no equivalent in earlier instruction sets. We leverage these instructions to efficiently transcode strings between the most common formats: UTF‐8 and UTF‐16. With our novel algorithms, we are often twice as fast as the previous best solutions. For example, we transcode Chinese text from UTF‐8 to UTF‐16 at more than 5 GiB/s using fewer than 2 CPU instructions per character. To ensure reproducibility, we make our software freely available as an open‐source library. Our library is part of the popular Node.js JavaScript runtime.
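
To make concrete what the vectorized kernels must compute, here is a plain scalar reference transcoder in both directions. This C sketch is ours, not the authors' AVX‐512 algorithm: it handles one character per loop iteration, whereas the paper's approach operates on 512‐bit registers. It validates its input (rejecting overlong forms, UTF‐8‐encoded surrogates, code points above U+10FFFF, truncated sequences, and unpaired surrogates) and writes native‐endian UTF‐16 code units. The function names `utf8_to_utf16` and `utf16_to_utf8` and the `(size_t)-1` error convention are hypothetical choices for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: UTF-8 -> native-endian UTF-16.
 * `out` must hold at least `len` code units (UTF-16 never needs more
 * code units than the UTF-8 input has bytes).
 * Returns the number of code units written, or (size_t)-1 on invalid input. */
size_t utf8_to_utf16(const uint8_t *in, size_t len, uint16_t *out) {
    size_t i = 0, o = 0;
    while (i < len) {
        uint32_t c = in[i];
        size_t n;       /* number of continuation bytes expected */
        uint32_t min;   /* smallest code point allowed for this length */
        if (c < 0x80) {                      /* 1-byte (ASCII) sequence */
            out[o++] = (uint16_t)c;
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) { n = 1; c &= 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0)   { n = 2; c &= 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0)   { n = 3; c &= 0x07; min = 0x10000; }
        else return (size_t)-1;              /* stray continuation or bad lead byte */
        if (len - i <= n) return (size_t)-1; /* sequence truncated at end of input */
        for (size_t k = 1; k <= n; k++) {
            uint8_t b = in[i + k];
            if ((b & 0xC0) != 0x80) return (size_t)-1;
            c = (c << 6) | (b & 0x3F);       /* append 6 payload bits */
        }
        if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return (size_t)-1;               /* overlong, out of range, or surrogate */
        if (c >= 0x10000) {                  /* needs a surrogate pair */
            c -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (c >> 10));
            out[o++] = (uint16_t)(0xDC00 | (c & 0x3FF));
        } else {
            out[o++] = (uint16_t)c;
        }
        i += n + 1;
    }
    return o;
}

/* Hypothetical helper: native-endian UTF-16 -> UTF-8.
 * `out` must hold at least 3 * `len` bytes.
 * Returns the number of bytes written, or (size_t)-1 on invalid input. */
size_t utf16_to_utf8(const uint16_t *in, size_t len, uint8_t *out) {
    size_t i = 0, o = 0;
    while (i < len) {
        uint32_t c = in[i++];
        if (c >= 0xD800 && c <= 0xDBFF) {    /* high surrogate: needs a low one */
            if (i == len || in[i] < 0xDC00 || in[i] > 0xDFFF) return (size_t)-1;
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return (size_t)-1;               /* unpaired low surrogate */
        }
        if (c < 0x80) {
            out[o++] = (uint8_t)c;
        } else if (c < 0x800) {
            out[o++] = (uint8_t)(0xC0 | (c >> 6));
            out[o++] = (uint8_t)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out[o++] = (uint8_t)(0xE0 | (c >> 12));
            out[o++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (uint8_t)(0xF0 | (c >> 18));
            out[o++] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
            out[o++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

For example, the Chinese character U+4E2D occupies three UTF‐8 bytes (E4 B8 AD) but a single UTF‐16 code unit, while U+1F600 takes four UTF‐8 bytes and a surrogate pair (D83D DE00). The data‐dependent branching per character in this loop is the kind of work the paper's AVX‐512 approach performs on whole 512‐bit registers at a time.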

Author affiliations:
Clausecker, Robert: Zuse Institute Berlin, Germany
Lemire, Daniel: DOT‐Lab Research Center, Université du Québec (TELUQ), Montréal, Canada
Copyright: 2023. This article is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI: 10.1002/spe.3261
Discipline: Computer Science
Open access: yes
Peer reviewed: yes
Open access full text: https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/spe.3261
Page count: 33
Subjects: Algorithms, Libraries