UCS-2

Last updated 3 years ago

UCS-2 is a character encoding standard in which every character is represented by a fixed length of 16 bits (2 bytes). It is used as a fallback on many GSM networks when a message cannot be encoded using GSM-7, or when a language requires more than 128 characters to be rendered.

The Basics of UCS-2 Encoding and SMS Messages

UCS-2 and the other UCS standards are defined by the International Organization for Standardization (ISO) in ISO 10646. UCS-2 represents a possible maximum of 65,536 characters, or in hexadecimals from 0000h to FFFFh (2 bytes). The characters in UCS-2 are synchronized to the Basic Multilingual Plane in Unicode.

Character is an overloaded term, so it is actually more correct to refer to code points. Code points allow abstraction from the character term, and are the atomic unit of storage of information in an encoding.

UCS-2 is a fixed-width encoding; each encoded code point takes exactly 2 bytes. Because an SMS message is transmitted in 140 octets, a message encoded in UCS-2 has a maximum of 70 characters (really, code points): (140*8) / (2*8) = 70.
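
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the constant names are illustrative, and `utf-16-be` stands in for UCS-2 because Python has no dedicated UCS-2 codec (the two produce the same bytes for text that stays inside the Basic Multilingual Plane).

```python
SMS_PAYLOAD_OCTETS = 140          # size of one SMS payload, as stated above
UCS2_OCTETS_PER_CODE_POINT = 2    # UCS-2 is fixed-width: 2 bytes per code point

# Maximum number of UCS-2 code points in a single, unsplit message.
max_ucs2_chars = SMS_PAYLOAD_OCTETS // UCS2_OCTETS_PER_CODE_POINT

# For BMP-only text, big-endian UTF-16 without a byte order mark
# yields exactly 2 bytes per code point, just like UCS-2.
sample = "Привет"
encoded = sample.encode("utf-16-be")

print(max_ucs2_chars)              # 70
print(len(sample), len(encoded))   # 6 12
```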

How Localmail Encodes Your Messages

When you send SMS messages with Localmail, we automatically use the most compact encoding possible. If you include any non-GSM-7 characters in your message body, we automatically fall back to UCS-2 encoding (which limits message bodies to 70 characters each). Additionally, when a message has to be split into several parts, Localmail prepends a User Data Header of 6 bytes to each part (this instructs the receiving device on how to assemble the messages), leaving 153 GSM-7 characters or 67 UCS-2 characters per part for your message.

Note that this may cause more messages to be sent than you expect: a body with 152 GSM-7-compatible characters and a single non-GSM-7 Unicode character will be split into 3 messages when encoded in UCS-2. This will incur charges for 3 outgoing messages against your account.
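
The figures above (153, 67, and the 3-message example) follow from the 140-octet payload and the 6-byte header. The sketch below reproduces them; the function name `ucs2_segments` and the constants are hypothetical, and the result is an estimate rather than a statement of how Localmail bills any particular message.

```python
import math

SMS_PAYLOAD_OCTETS = 140
UDH_OCTETS = 6  # User Data Header size quoted above

SINGLE_UCS2 = SMS_PAYLOAD_OCTETS // 2                      # 70
MULTI_UCS2 = (SMS_PAYLOAD_OCTETS - UDH_OCTETS) // 2        # 67
MULTI_GSM7 = (SMS_PAYLOAD_OCTETS - UDH_OCTETS) * 8 // 7    # 153


def ucs2_segments(text: str) -> int:
    """Estimate how many SMS segments `text` needs when sent as UCS-2."""
    # Count 16-bit units rather than Python characters: a character
    # outside the Basic Multilingual Plane becomes two units in UTF-16.
    units = len(text.encode("utf-16-be")) // 2
    if units <= SINGLE_UCS2:
        return 1
    return math.ceil(units / MULTI_UCS2)


# 152 GSM-7 characters plus one curly apostrophe (U+2019).
body = "a" * 152 + "\u2019"
print(MULTI_GSM7, MULTI_UCS2)   # 153 67
print(ucs2_segments(body))      # 3
```

Without the curly apostrophe, the same 153-character body would fit in a single split-message part's worth of GSM-7 characters, which is why one stray character can triple the cost.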

How Do I Check My Message Can Be Encoded in GSM-7?

This page contains an interactive tool which can check whether your message can be encoded in GSM-7, or whether UCS-2 is needed.
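
If you would rather check in your own code, the same test can be approximated locally. This is a sketch, not Localmail's implementation: the character tables are taken from the GSM 03.38 default alphabet and its extension table, and the names `GSM7_BASIC`, `GSM7_EXTENSION` and `is_gsm7` are made up for illustration.

```python
# GSM 03.38 basic character set (the escape character itself is omitted).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Extension table characters; each is sent as an escape plus one more
# septet, so it occupies two character positions in the message.
GSM7_EXTENSION = set("^{}\\[~]|€\f")


def is_gsm7(text: str) -> bool:
    """Return True if every character of `text` exists in GSM-7."""
    return all(ch in GSM7_BASIC or ch in GSM7_EXTENSION for ch in text)


print(is_gsm7("Hello, world!"))      # True
print(is_gsm7("Price: 5€"))          # True (€ is in the extension table)
print(is_gsm7("It\u2019s ready"))    # False (curly apostrophe)
```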

How Can I Avoid My Messages Being Split When I Expect Them to be in GSM-7?

Unfortunately, GSM-7 is not a supported character encoding in many text editors. Even setting encoding to ASCII (or US_ASCII) will not guarantee that text you write will be limited to GSM-7. You can use the above linked tool to quickly check the number of segments - that is, total messages - some text will be divided into.

If you are writing in an editor with Unicode support, you'll need to be particularly careful. Text editors designed for writing might automatically insert angled smart quotes, non-standard spaces, or punctuation that looks like a GSM-7 character but is actually a different Unicode character.
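
One way to guard against these look-alike characters is to replace the most common ones before sending. The mapping below is a small, illustrative sample rather than a complete list, and `to_gsm7_friendly` is a hypothetical helper name; extend the table to match the text you actually send.

```python
# Common editor substitutions mapped back to GSM-7 characters.
GSM7_FRIENDLY = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / curly apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis (becomes three characters)
    "\u00a0": " ",    # no-break space
})


def to_gsm7_friendly(text: str) -> str:
    """Replace common look-alike characters with GSM-7 equivalents."""
    return text.translate(GSM7_FRIENDLY)


print(to_gsm7_friendly("\u201cIt\u2019s ready\u201d"))   # "It's ready"
```

Note that some replacements change the length of the text (the ellipsis becomes three characters), so count segments after normalizing, not before.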

Why is UCS-2 Used on the GSM Networks when GSM-7 is the Default Alphabet?

In some languages, more than 128 symbols are commonly used, so a larger universe of potential characters is needed. UCS-2 has been implemented in many GSM networks and on many mobile devices, and is considered the de facto standard fallback.

Is UCS-2 Encoding Obsolete?

By the Unicode standard, UCS-2 is an obsolete encoding because it wasn't designed to allow characters in the so-called supplementary or 'astral' planes in Unicode. Plane 0, the Basic Multilingual Plane, contains character encodings for what are believed to be the most commonly used characters in modern languages. UCS-2 is limited to the range 0000h to FFFFh, or 65,536 possible code points.

UTF-16 is the successor to UCS-2. It can address the Basic Multilingual Plane and 16 supplementary planes, covering the range 0000h to 10FFFFh, for a total of 1,114,112 code points.

Need More Help?

If you need further help during your integration, or have questions about technical details, you can reach our support team at support@localmail.io.
