Voice over IP or VoIP

What is VoIP or Voice over IP

Voice over Internet Protocol (VoIP), also called IP telephony, is a method and set of technologies for delivering voice conversations and multimedia sessions over Internet Protocol (IP) networks such as the Internet. The terms Internet telephony, broadband telephony, and broadband phone service refer specifically to the delivery of communications services (voice, fax, SMS, voice messaging) over the Internet rather than over the public switched telephone network (PSTN), also known as plain old telephone service (POTS).


Starting a VoIP call involves steps and principles comparable to those of classical digital telephony: signaling, channel setup, digitization of the analog speech signal, and encoding. Instead of being carried over a circuit-switched network, the digital data is packetized and sent as IP packets over a packet-switched network. VoIP systems transmit media streams using dedicated media delivery protocols that encode audio and video with audio and video codecs. Many codecs exist, each optimizing the media stream for particular application needs and network capacity; some implementations rely on narrowband, compressed speech, while others support high-fidelity stereo codecs.

Linear predictive coding (LPC) and the modified discrete cosine transform (MDCT) are the most widely used speech coding techniques in VoIP. Popular codecs include the MDCT-based AAC-LD (used in FaceTime), the LPC/MDCT-based Opus (used in WhatsApp), the LPC-based SILK (used in Skype), the μ-law and A-law versions of G.711, G.722, the open source voice codec iLBC, and G.729.
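
As a rough illustration of what packetization costs, the sketch below estimates the IP-layer bit rate of a single one-way voice stream. It assumes the standard codec rates of 64 kbit/s for G.711 and 8 kbit/s for G.729, a 20 ms packet interval (a common default, not a value taken from this article), and plain IPv4, UDP, and RTP headers with no options or extensions. The function name is illustrative only.

```python
# Header sizes in bytes: IPv4 without options, UDP, RTP without extensions.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def ip_bandwidth_kbps(codec_kbps: float, packet_ms: float = 20.0) -> float:
    """Return the IP-layer bit rate (kbit/s) of one voice stream in one direction."""
    # kbit/s multiplied by ms gives bits; divide by 8 for bytes per packet.
    payload_bytes = codec_kbps * packet_ms / 8
    packet_bytes = payload_bytes + IPV4_HEADER + UDP_HEADER + RTP_HEADER
    return packet_bytes * 8 / packet_ms


for name, rate in [("G.711", 64.0), ("G.729", 8.0)]:
    print(f"{name}: {ip_bandwidth_kbps(rate):.1f} kbit/s at the IP layer")
```

Under these assumptions the 40 bytes of headers per packet matter far more for a low-rate codec than for G.711, which is one reason codec choice and packet interval are usually considered together.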

Early voice-over-IP service providers used business models and technical solutions that mirrored the architecture of the legacy telephone network. Second-generation providers, such as Skype, built closed networks for private user bases, offering free calls and convenience while potentially charging for access to other communication networks, such as the PSTN. This limited users’ ability to mix and match third-party hardware and software. Third-generation services, such as Google Talk, adopted federated VoIP. These solutions typically allow dynamic interconnection between users in any two domains of the Internet when a user wishes to place a call.

VoIP is also accessible on many personal computers and other Internet access devices, in addition to VoIP phones. Wi-Fi or the carrier’s mobile data network can be used to send calls and SMS text messages. VoIP provides a foundation for integrating all current communication technologies into a single unified communications system.


In applications such as VoIP phones, mobile apps, and web-based communications, voice over IP has been deployed using proprietary protocols as well as protocols based on open standards.

To implement VoIP communication, a number of functions are required. Some protocols serve several purposes, while others serve only a handful and must be utilized in tandem. Among these functions are:

  • Network and transport – Creating reliable transmission over unreliable protocols, which may involve acknowledging receipt of data and retransmitting data that was not received.
  • Session management – Creating and managing a session (sometimes simply called a “call”), which is a connection between two or more peers that provides a context for further communication.
  • Signaling – Registration (advertising one’s presence and contact information) and discovery (locating someone and obtaining their contact information), dialing (including call progress reporting), negotiating capabilities, and call control (such as hold, mute, transfer/forwarding, dialing DTMF keys during a call [e.g. to interact with an automated attendant or IVR], and so on).
  • Media description – Determining what type of media to send (audio, video, etc.), how to encode/decode it, and how to send/receive it (IP addresses, ports, etc.).
  • Media – Transferring the actual media of the call, such as audio, video, text messages, files, and so on.
  • Quality of service – Providing out-of-band content or feedback about the media, such as synchronization, statistics, and so on.
  • Security – Implementing access control, verifying the identity of other participants (computers or people), and encrypting data to protect the privacy and integrity of media contents and/or control messages.

H.323 version 4 supports H.245 over UDP/TCP, Q.931 over UDP/TCP, and RAS over UDP. SIP supports TCP and UDP.
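
The media description function listed above is commonly handled by the Session Description Protocol. The following minimal sketch parses a hand-written SDP body to recover where the audio should be sent and which codecs are offered. The addresses, session identifiers, and function name are made up for illustration; payload types 0 (PCMU, G.711 μ-law) and 8 (PCMA, G.711 A-law) are the standard static RTP assignments. A real parser would handle many more line types and error cases.

```python
SDP_OFFER = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""


def parse_audio_offer(sdp: str) -> dict:
    """Extract the connection address, audio port, and codec map from an SDP body."""
    result = {"address": None, "port": None, "codecs": {}}
    for line in sdp.splitlines():
        key, _, value = line.partition("=")
        if key == "c":
            # c=<nettype> <addrtype> <connection-address>
            result["address"] = value.split()[2]
        elif key == "m" and value.startswith("audio"):
            # m=<media> <port> <proto> <fmt> ...
            result["port"] = int(value.split()[1])
        elif key == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload_type, encoding = value[len("rtpmap:"):].split(maxsplit=1)
            result["codecs"][int(payload_type)] = encoding
    return result


print(parse_audio_offer(SDP_OFFER))
```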

Protocols used in VoIP include:

  • Session Initiation Protocol (SIP), a connection management protocol developed by the Internet Engineering Task Force (IETF)
  • H.323, one of the earliest VoIP call signaling and control protocols to see broad use. With the introduction of newer, less complex protocols such as MGCP and SIP, H.323 deployments are increasingly limited to carrying existing long-haul network traffic.
  • Media Gateway Control Protocol (MGCP), connection management for media gateways
  • H.248, control protocol for media gateways in a converged internetwork comprising the traditional PSTN and modern packet networks
  • Real-time Transport Protocol (RTP), transport protocol for real-time audio and video data
  • Real-time Transport Control Protocol (RTCP), RTP’s sibling protocol that provides stream statistics and status information
  • Secure Real-time Transport Protocol (SRTP), encrypted version of RTP
  • Session Description Protocol (SDP), a syntax for session initiation and announcement for multimedia communications and WebSocket transports
  • Inter-Asterisk eXchange (IAX), a protocol used between Asterisk PBX instances
  • Extensible Messaging and Presence Protocol (XMPP), instant messaging, presence data, and contact list upkeep
  • Jingle, for XMPP peer-to-peer session control
  • Skype protocol, proprietary Internet telephony protocol suite built on peer-to-peer technology
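
To make the role of RTP in the list above more concrete, the sketch below packs and unpacks the 12-byte fixed RTP header (version 2, no padding, no extension, no contributing sources). The helper names and the sample values are illustrative assumptions; only the field layout follows the RTP specification.

```python
import struct


def build_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                     marker: bool = False) -> bytes:
    """Pack the 12-byte fixed RTP header: V=2, P=0, X=0, CC=0."""
    byte0 = 2 << 6  # version 2 in the top two bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data: bytes) -> dict:
    """Decode the fixed RTP header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Payload type 0 is G.711 mu-law; with an 8 kHz clock, a 20 ms packet
# advances the timestamp by 160 samples.
header = build_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234ABCD)
print(parse_rtp_header(header))
```

The sequence number lets a receiver detect loss and reordering, and the timestamp lets it schedule playout, which is what the quality-of-service discussion later in the article relies on.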


Consumer market

Mass-market VoIP services take advantage of existing broadband Internet connections, allowing customers to place and receive phone calls in much the same way as they would on the PSTN. Full-service VoIP phone providers offer inbound and outbound service, as well as direct inbound dialing. Many providers offer unlimited domestic calling, and in some cases international calling, for a flat monthly subscription fee. Where a flat-fee service is not available, phone calls between subscribers of the same provider are generally free.

To connect to a VoIP service provider, you’ll need a VoIP phone. This can be done in a variety of ways:

  • Dedicated VoIP phones connect to the IP network directly using technologies such as wired Ethernet or Wi-Fi. These are generally styled after classic digital business telephones.
  • An analog telephone adapter connects to the network and implements the electronics and firmware required to operate a conventional analog telephone attached through a modular phone jack. This function is built into some residential Internet gateways and cable modems.
  • Softphone application software installed on a networked computer equipped with a microphone and speaker, or a headset. The application typically presents a dial pad and a display field, allowing the user to operate it with mouse clicks or keyboard input.

PSTN and mobile network providers

VoIP telephony over dedicated and public IP networks is increasingly being used as a backhaul by telecommunications carriers to link switching centers and interface with other telephony network providers; this is sometimes referred to as IP backhaul.

SIP clients on smartphones may be integrated into the firmware or available as an app download.

Corporate use

Businesses are migrating from traditional copper-wire telephone systems to VoIP systems to reduce their monthly phone costs, drawn by the bandwidth efficiency and low costs that VoIP technology can provide. In 2008, VoIP accounted for 80% of all new private branch exchange (PBX) lines installed globally. In the United States, for example, the Social Security Administration is converting its field offices, which employ 63,000 workers, from traditional phone installations to a VoIP infrastructure carried over its existing data network.

VoIP allows both voice and data communications to run over a single network, which can substantially reduce infrastructure costs. VoIP extensions are less expensive than PBX and key systems. VoIP switches can run on commodity hardware such as personal computers. Rather than closed architectures, these devices rely on standard interfaces. Because VoIP devices offer simple, intuitive user interfaces, users can often make minor changes to the system configuration themselves. Dual-mode phones let users continue their conversations as they move between an outside cellular service and an internal Wi-Fi network, so that it is no longer necessary to carry both a desktop phone and a mobile phone.

Business VoIP solutions have evolved into unified communications services that treat all communications (phone calls, faxes, voice mail, e-mail, web conferences, and more) as discrete units that can be delivered by any means and to any handset, including cellphones. Two kinds of service providers operate in this space: one focuses on VoIP for medium to large enterprises, while the other targets the small-to-medium business (SMB) market.

Skype, which originally marketed itself as a service among friends, has begun to cater to businesses, providing free-of-charge connections between any users on the Skype network and connecting to and from ordinary PSTN telephones for a charge.

Delivery Mechanisms

In general, the supply of VoIP telephone systems to corporate or individual users may be split into two categories: private or on-premises solutions and externally hosted solutions supplied by third-party providers. On-premises distribution techniques resemble the traditional PBX deployment approach for connecting an office to local PSTN networks.

While many use cases remain for private or on-premises VoIP systems, the wider market has been steadily shifting toward cloud or hosted VoIP solutions. Hosted systems are also generally better suited to smaller or personal VoIP deployments, where a private system may not be viable.

Hosted VoIP Systems

Hosted or Cloud VoIP solutions entail a service provider or telecommunications carrier hosting the phone system as a software solution within their own infrastructure.

Typically, this will be one or more datacenters with geographic proximity to the system’s end-user(s). The service provider deploys and maintains this infrastructure, which is external to the system’s user.

Endpoints such as VoIP phones or softphone apps (apps that operate on a computer or mobile device) will connect to the VoIP service remotely. These connections are generally made through public internet connections, such as local fixed WAN breakout or mobile carrier service.

Private VoIP Systems

In the case of a private VoIP system, the core telephone system is housed within the end-user organization’s own infrastructure. Typically, the system will be implemented on-premises at a location under the organization’s direct control. This has a number of advantages in terms of QoS management (see below), cost scalability, and protecting the privacy and security of communications traffic. However, the end-user organization is primarily responsible for ensuring that the VoIP system stays performant and robust. With a Hosted VoIP service, this is not the case.

Private VoIP systems can be physical hardware PBX appliances, can be converged with other infrastructure, or can be deployed as software applications. The latter two options generally take the form of a separate virtualized appliance. In some cases, however, these systems are deployed on bare metal infrastructure or IoT devices. With some solutions, such as 3CX, companies can try to combine the benefits of hosted and private on-premises systems by running their own private solution within an external environment. Examples include datacenter colocation services, public cloud, and private cloud locations.

Local endpoints within the same site generally connect directly via the LAN for on-premises systems. The offered connectivity options for distant and external endpoints are similar to those of Hosted or Cloud VoIP systems.

However, VoIP traffic to and from on-premises systems is frequently routed over secure private links. Examples include personal VPNs, site-to-site VPNs, private networks such as MPLS and SD-WAN, and private Session Border Controllers (SBCs). While exceptions and private peering options exist, it is unusual for hosted or cloud VoIP providers to offer such private connectivity methods.

Quality of service

Communication on an IP network is considered less reliable than communication on the circuit-switched public telephone network because the IP network provides no network-based mechanism to ensure that data packets are not lost and are delivered in sequential order. It is a best-effort network with no fundamental quality of service (QoS) guarantees. All data, including voice, travels in packets over IP networks with a fixed maximum capacity. Under congestion, such a system may be more prone to data loss than a traditional circuit-switched system: a circuit-switched system with insufficient capacity refuses new connections while carrying the existing ones without impairment, whereas the quality of real-time traffic such as telephone conversations on a packet-switched network degrades.

Network routers handle traffic on a first-come, first-served basis by default. Fixed delays are uncontrollable since they are caused by the physical distance that packets must traverse. Because of the lengthy distance to a geostationary satellite and return, they are especially problematic when satellite circuits are involved; delays of 400–600 ms are usual. Latency can be reduced by designating voice packets as delay-sensitive using QoS approaches such as DiffServ.
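
As an illustration of DiffServ marking, the sketch below opens a UDP socket whose outgoing IPv4 packets carry a chosen DSCP value. It assumes DSCP 46 (Expedited Forwarding), the class commonly used for voice; the function names are hypothetical. Whether the marking is permitted, and whether any router honors it, depends on the operating system and on network policy, so this is a starting point rather than a guarantee of priority treatment.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, commonly used for voice (assumed here)


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_marked_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket that asks the OS to mark outgoing packets with `dscp`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # IP_TOS is available on Linux and macOS; some platforms ignore or restrict it.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(hex(dscp_to_tos(DSCP_EF)))
```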

Network routers on high-traffic links may introduce delay that exceeds the thresholds VoIP can tolerate. Excessive load on a link can cause congestion, with the associated queueing delays and packet loss. This signals a transport protocol such as TCP to reduce its transmission rate to relieve the congestion. VoIP, however, usually uses UDP rather than TCP, because recovering from congestion through retransmission typically adds too much latency. QoS mechanisms can instead avoid the undesirable loss of VoIP packets by transmitting them immediately, ahead of any queued bulk traffic on the same link, even when the link is congested by bulk traffic.

VoIP endpoints usually have to wait for the transmission of previous packets to complete before new data can be sent. Although it is possible to preempt (abort) a less important packet in mid-transmission, this is rarely done, especially on high-speed links where transmission times are short even for maximum-sized packets. On slower links, such as dial-up and digital subscriber line (DSL), an alternative to preemption is to reduce the maximum transmission time by lowering the maximum transmission unit. However, because every packet must carry protocol headers, this increases the relative header overhead on every link traversed.

The receiver must resequence IP packets that arrive out of order and recover gracefully when packets arrive too late or not at all. Packet delay variation results from changes in queuing delay along a given network path, caused by competition from other users for the same transmission links. VoIP receivers accommodate this variation by storing incoming packets briefly in a playout buffer, deliberately increasing latency to improve the chance that each packet will be on hand when the voice engine is ready to play it. The added delay is thus a compromise between excessive latency and excessive dropout, that is, momentary audio interruptions.

Although jitter is a random variable, it is the sum of several other random variables that are at least partly independent of one another: the individual queuing delays of the routers along the Internet path in question. By the central limit theorem, jitter can therefore be modeled as a Gaussian random variable. This suggests continually estimating the mean delay and its standard deviation and setting the playout delay so that only packets delayed by more than several standard deviations above the mean arrive too late to be useful. In practice, the latency variation of many Internet paths is dominated by a small number (often one) of relatively slow and congested bottleneck links.
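
The estimation step described above can be sketched as follows. The class keeps a running mean and standard deviation of observed packet delays (using Welford's online method) and proposes a playout delay of the mean plus k standard deviations. The class name, the choice k = 3, and the sample delays are assumptions made for illustration; practical jitter buffers typically use a sliding window or exponential weighting so that old measurements fade out.

```python
import math


class PlayoutEstimator:
    """Running mean and standard deviation of packet delay, in milliseconds."""

    def __init__(self, k: float = 3.0):
        self.k = k        # safety margin in standard deviations (assumed value)
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0    # sum of squared deviations from the running mean

    def add(self, delay_ms: float) -> None:
        """Fold one new delay measurement into the running statistics."""
        self.n += 1
        delta = delay_ms - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (delay_ms - self.mean)

    @property
    def stddev(self) -> float:
        """Population standard deviation of the delays seen so far."""
        return math.sqrt(self._m2 / self.n) if self.n else 0.0

    def playout_delay(self) -> float:
        """Delay after which a packet is assumed to have arrived, if it will."""
        return self.mean + self.k * self.stddev


estimator = PlayoutEstimator()
for delay in (42.0, 55.0, 47.0, 61.0, 44.0):  # made-up measurements
    estimator.add(delay)
print(f"playout delay: {estimator.playout_delay():.1f} ms")
```

A larger k discards fewer late packets at the price of more end-to-end latency, which is the compromise the playout buffer has to strike.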

A number of protocols have been developed to facilitate the reporting of VoIP call quality of service (QoS) and quality of experience (QoE). RTP Control Protocol (RTCP) extended reports, SIP RTCP summary reports, H.460.9 Annex B (for H.323), H.248.30, and MGCP extensions are among them.

The RTCP extended report VoIP metrics block is generated by an IP phone or gateway during a live call. It contains information on packet loss rate, packet discard rate (due to jitter), packet loss/discard burst metrics (burst length/density, gap length/density), network delay, end system delay, signal/noise/echo level, mean opinion scores (MOS) and R factors, and configuration information. VoIP metrics reports are exchanged between IP endpoints occasionally during a call, and an end-of-call message is sent via a SIP RTCP summary report or one of the other signaling protocol extensions. The reports are intended to give endpoints and operators the call-quality feedback they need to act on QoS problems.
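
The R factor and MOS values mentioned above are related by a standard conversion defined in the ITU-T G.107 E-model. The sketch below implements that mapping; the function name is illustrative, and the R value 93.2 used in the example is the E-model's default rating for a connection with no added impairments.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-model R factor to an estimated mean opinion score."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


for r in (50.0, 70.0, 80.0, 93.2):
    print(f"R = {r:5.1f} -> MOS = {r_to_mos(r):.2f}")
```

Because the curve flattens near the top, a few points of R lost to delay or packet loss matter much more in the middle of the scale than near the maximum.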


DSL modems typically provide Ethernet connections to local equipment, but internally they may actually be Asynchronous Transfer Mode (ATM) modems. They use ATM Adaptation Layer 5 (AAL5) to segment each Ethernet packet into a series of 53-byte ATM cells for transmission and reassemble them into Ethernet frames at the receiving end.

Using a separate virtual circuit identifier (VCI) for voice over IP can reduce latency on shared connections. ATM’s potential for latency reduction is greatest on slow links, because worst-case latency decreases with increasing link speed. A full-size (1500 byte) Ethernet frame takes 94 ms to transmit at 128 kbit/s but only 8 ms at 1.5 Mbit/s. If the 1.5 Mbit/s link is the bottleneck, this latency is probably small enough to ensure good VoIP performance without MTU reductions or multiple ATM VCs. The latest generations of DSL, VDSL and VDSL2, carry Ethernet without intermediate ATM/AAL5 layers, and they generally support IEEE 802.1p priority tagging so that VoIP can be queued ahead of less time-critical traffic.
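
The transmission times quoted above follow directly from frame size and link rate. A minimal check, with an illustrative function name and counting only the 1500 bytes mentioned in the text (no framing overhead):

```python
def serialization_delay_ms(frame_bytes: int, link_bits_per_s: float) -> float:
    """Time needed to clock one frame onto a link, in milliseconds."""
    return frame_bytes * 8 / link_bits_per_s * 1000.0


for label, rate in [("128 kbit/s", 128_000), ("1.5 Mbit/s", 1_500_000)]:
    delay = serialization_delay_ms(1500, rate)
    print(f"1500-byte frame at {label}: {delay:.2f} ms")
```

This is the longest a voice packet can be held up behind a single bulk frame already in transmission, which is why the problem largely disappears on faster links.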

ATM has substantial header overhead: 5/53 = 9.4%, roughly twice the total header overhead of a 1500 byte Ethernet frame. This “ATM tax” is incurred by every DSL user, whether or not they take advantage of multiple virtual circuits, and few do.
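
The fixed 5-byte header is only part of the cost, because AAL5 also appends an 8-byte trailer and pads each packet to a whole number of 48-byte cell payloads. The sketch below estimates the bytes actually sent for a given packet size. It ignores any further encapsulation (such as LLC/SNAP or PPP headers), and the function name and the 200-byte example size are assumptions for illustration.

```python
ATM_CELL_BYTES = 53      # 5-byte header plus 48-byte payload
ATM_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8


def aal5_wire_bytes(packet_bytes: int) -> int:
    """Bytes sent on the ATM link for one packet carried over AAL5."""
    needed = packet_bytes + AAL5_TRAILER_BYTES
    cells = (needed + ATM_PAYLOAD_BYTES - 1) // ATM_PAYLOAD_BYTES  # round up
    return cells * ATM_CELL_BYTES


for size in (200, 1500):
    wire = aal5_wire_bytes(size)
    extra = 100.0 * (wire - size) / size
    print(f"{size}-byte packet -> {wire} bytes on the wire (+{extra:.1f}%)")
```

Small packets, which is what voice traffic consists of, lose proportionally more to padding than full-size frames do.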

Layer 2

Several protocols at the data link layer and physical layer provide quality-of-service mechanisms that help VoIP applications work well even when the network is congested. Examples include:

  • IEEE 802.11e is an approved amendment to the IEEE 802.11 standard that defines a set of quality-of-service enhancements for wireless LAN applications through modifications to the MAC layer. The standard is considered important for delay-sensitive applications such as voice over wireless IP.
  • IEEE 802.1p defines eight distinct classes of service (including one dedicated to voice) for traffic on layer-2 wired Ethernet.
  • The ITU-T G.hn standard provides a way to create a high-speed (up to one gigabit per second) local area network (LAN) using existing home wiring (power lines, phone lines, and coaxial cables). G.hn provides QoS by means of Contention-Free Transmission Opportunities (CFTXOPs), which are allocated to flows (such as a VoIP call) that require QoS and have negotiated a contract with the network controllers.
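
For wired Ethernet, the 802.1p class of service travels in the 3-bit priority field of the IEEE 802.1Q VLAN tag. The sketch below builds that 4-byte tag. The function name is illustrative, and the priority value 5 for voice and VLAN ID 100 in the example are assumptions; actual values are a matter of local network policy.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def vlan_tag(priority: int, vlan_id: int, drop_eligible: bool = False) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID, then priority, DEI, and VLAN ID."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (priority << 13) | (int(drop_eligible) << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


# Example: voice traffic on a hypothetical VLAN 100 with priority 5.
print(vlan_tag(priority=5, vlan_id=100).hex())
```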

Performance metrics

The quality of voice transmission is characterized by several metrics that may be monitored by network elements and by user agent hardware or software. Such metrics include network packet loss, packet jitter, packet latency (delay), post-dial delay, and echo. They are determined through VoIP performance testing and monitoring.

PSTN integration

A VoIP media gateway controller (also known as a Class 5 softswitch) works together with a media gateway (also known as an IP business gateway) to connect the digital media stream and so complete the path for voice and data. Gateways include interfaces for connecting to standard PSTN networks. Modern systems designed specifically to carry VoIP calls also include Ethernet interfaces.

E.164 is a global numbering standard for both the PSTN and public land mobile networks (PLMN). Most VoIP implementations support E.164 so that calls can be routed to and from VoIP subscribers and the PSTN/PLMN. VoIP implementations can also use other identification schemes. Skype, for example, lets subscribers choose Skype names (usernames), whereas SIP implementations can use Uniform Resource Identifiers (URIs) similar to email addresses. VoIP implementations often provide ways of translating non-E.164 identifiers to E.164 numbers and vice versa, such as the Skype-In service offered by Skype and the E.164 number to URI mapping (ENUM) service used in IMS and SIP.
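
The first step of an ENUM lookup is purely textual: the E.164 number is turned into a DNS name by reversing its digits, separating them with dots, and appending e164.arpa. The sketch below shows that step only; the subsequent DNS query for NAPTR records, which would return a SIP or other URI, is not shown. The function name and the sample number are illustrative.

```python
def e164_to_enum_domain(number: str, suffix: str = "e164.arpa") -> str:
    """Map an E.164 number such as '+44 20 7946 0123' to its ENUM domain name."""
    digits = [ch for ch in number if ch in "0123456789"]
    if not digits:
        raise ValueError("no digits found in number")
    return ".".join(reversed(digits)) + "." + suffix


print(e164_to_enum_domain("+44 20 7946 0123"))
```

Reversing the digits puts the country code nearest the DNS root, so that delegation in the DNS tree can follow the hierarchy of the numbering plan.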

Echo can also be an issue for PSTN integration. Common causes of echo include impedance mismatches in analog circuitry and an acoustic path from the receive signal to the transmit signal at the receiving end.

Number portability

Local number portability (LNP) and mobile number portability (MNP) also affect the VoIP business. Number portability is a service that allows a subscriber to move to a different telephone carrier without having to obtain a new phone number. Typically, it is the responsibility of the former carrier to “map” the old number to the undisclosed number assigned by the new carrier. This is done by maintaining a database of numbers. A dialed number is initially received by the original carrier and quickly rerouted to the new carrier. Multiple porting references must be maintained even if the subscriber returns to the original carrier. The FCC requires carriers to comply with these consumer-protection requirements.

A voice call that originates in a VoIP environment also faces least-cost routing (LCR) challenges in reaching its destination if it is routed to a mobile phone number on a traditional mobile carrier. LCR works by checking the destination of each telephone call as it is made and then sending the call over the network that costs the customer least. Given the complexity of call routing created by number portability, this cost-based approach is open to considerable debate. With MNP in place, LCR providers can no longer rely on the network root prefix to determine how to route a call. Instead, they must determine the actual network of every number before routing the call.

VoIP solutions therefore also need to handle MNP when routing a voice call. In countries without a central database, such as the United Kingdom, it may be necessary to query the mobile network to find out which home network a mobile phone number belongs to. As LCR options drive VoIP adoption in enterprise markets, VoIP needs to provide a certain level of reliability when handling calls.
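
The routing decision described above can be sketched in a few lines: first determine the number's actual home network, consulting a portability table before falling back to the number prefix, and then pick the cheapest carrier toward that network. All prefixes, numbers, network names, carrier names, and rates below are invented for illustration, and the portability table stands in for whatever database or network query a real deployment would use.

```python
# Hypothetical data: prefix ownership, ported numbers, and per-minute rates.
PREFIX_OWNER = {"4477": "NetA", "4478": "NetB"}
PORTED_NUMBERS = {"447700900123": "NetB"}
RATES = {
    "NetA": {"CarrierX": 0.012, "CarrierY": 0.010},
    "NetB": {"CarrierX": 0.008, "CarrierY": 0.011},
}


def home_network(number: str) -> str:
    """Return the network currently serving `number`, honoring portability."""
    if number in PORTED_NUMBERS:
        return PORTED_NUMBERS[number]
    # Fall back to the longest matching prefix.
    for length in range(len(number), 0, -1):
        owner = PREFIX_OWNER.get(number[:length])
        if owner is not None:
            return owner
    raise LookupError(f"no network known for {number}")


def least_cost_route(number: str) -> tuple:
    """Return (home network, cheapest carrier) for a dialed number."""
    network = home_network(number)
    carrier = min(RATES[network], key=RATES[network].get)
    return network, carrier


print(least_cost_route("447700900123"))  # ported number
print(least_cost_route("447700900999"))  # not ported, resolved by prefix
```

Routing on the prefix alone would send the ported number toward the wrong network and select the wrong carrier, which is the failure mode that number portability introduces.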

Emergency calls

A telephone connected to a land line has a direct relationship between a telephone number and a physical location, which is maintained by the telephone company and made available to emergency responders through national emergency response service centers in the form of emergency subscriber lists. When a center receives an emergency call, the caller’s location is determined automatically from its databases and displayed on the operator console.

In IP telephony, no such direct link exists between location and communications end point. Even a provider with wired infrastructure, such as a DSL provider, may know only the approximate location of the device, based on the IP address allocated to the network link.

IP connectivity enables device mobility. For example, a home broadband connection may be used to connect to a business entity’s virtual private network, in which case the IP address used for customer interactions may belong to the corporation rather than the residential ISP. Off-premises extensions may appear as a component of an upstream IP PBX. On mobile devices, such as a 3G handset or a USB wireless broadband adapter, the IP address has no link with any physical location known to the telephony service provider, because a mobile user might be anywhere in a network-covered zone, even when roaming with another cellular carrier.

At the VoIP level, a phone or gateway may identify itself by its account credentials with a Session Initiation Protocol (SIP) registrar. In such cases, the Internet telephony service provider (ITSP) knows only that a particular user’s equipment is active. Service providers often provide emergency response services by agreement with the user, who registers a physical location and agrees that, if an emergency number is called from the IP device, emergency services are dispatched to that address only.

In the United States, VoIP providers deliver such emergency services through a system known as Enhanced 911 (E911), which is based on the Wireless Communications and Public Safety Act of 1999. The VoIP E911 emergency-calling system associates a physical address with the calling party’s telephone number. All VoIP providers that provide access to the public switched telephone network must implement E911,[35] a service for which the subscriber may be charged. According to the FCC, VoIP providers may not allow customers to opt out of 911 service.

The VoIP E911 system is based on a static table lookup. Unlike cellular phones, where the location of an E911 call can be traced using assisted GPS or other methods, VoIP E911 information is accurate only if subscribers, who bear the legal responsibility, keep their emergency address information up to date.

Fax support

Sending faxes over VoIP networks is sometimes referred to as Fax over IP (FoIP). Transmitting fax documents was problematic in early VoIP implementations, because most voice digitization and compression codecs are optimized for the representation of the human voice, and the proper timing of modem signals cannot be guaranteed in a packet-based, connectionless network. The T.38 protocol is a standards-based solution for reliably delivering fax over IP.

The T.38 protocol is designed to compensate for the differences between traditional packet-less communications over analog lines and the packet-based transmissions that underlie IP communications. The fax machine may be a standard device connected to an analog telephone adapter (ATA), or it may be a software application or a dedicated network device operating over an Ethernet interface. T.38 was originally designed to use either UDP or TCP as its transport over an IP network. UDP provides near real-time characteristics because of its “no recovery rule”: when a UDP packet is lost or an error occurs during transmission, nothing is retransmitted.

Some newer high-end fax machines have built-in T.38 capabilities and connect directly to a network switch or router. In T.38, each packet contains a copy of part of the data stream sent in the previous packet. Two successive packets must be lost before data integrity is actually lost.
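
The effect of this redundancy can be shown with a deliberately simplified model in which packet i carries data segment i plus a copy of segment i-1. This is not the T.38 wire format, only an illustration of why a single lost packet is harmless while two consecutive losses are not; the function name and packet counts are made up.

```python
def recovered_segments(num_segments: int, lost_packets: set) -> set:
    """Segments the receiver can rebuild, given the indices of lost packets.

    Packet i carries segment i and a redundant copy of segment i - 1.
    """
    recovered = set()
    for i in range(num_segments):
        if i in lost_packets:
            continue
        recovered.add(i)
        if i > 0:
            recovered.add(i - 1)  # redundant copy of the previous segment
    return recovered


all_segments = set(range(6))
print(all_segments - recovered_segments(6, {2}))     # single loss
print(all_segments - recovered_segments(6, {2, 3}))  # two consecutive losses
```

In this model the final segment has no later packet to repeat it, so losing the very last packet also loses data.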

Power requirements

Telephones for conventional home analog service are often linked directly to phone company phone lines, which supply direct current to operate the majority of basic analog handsets independently of locally available electrical power.

IP phones and VoIP telephone adapters connect to routers or cable modems, which typically depend on mains electricity or locally generated power. Some VoIP service providers use customer premises equipment (e.g., cable modems) with battery-backed power supplies to ensure uninterrupted service for up to several hours in case of a local power failure. Such battery-backed devices are typically designed for use with analog handsets.

Some VoIP service providers offer services that route calls to the subscriber’s other telephone services, such as a cellular phone, when the customer’s network device is unreachable and cannot terminate the call.

The susceptibility of phone service to power failures is a common problem even with traditional analog service when customers use telephone units that depend on local power, such as cordless handsets that work with a base station or phones with other modern features like built-in voicemail or a phone book.


VoIP telephone systems are susceptible to the same attacks as other Internet-connected devices. Hackers with knowledge of VoIP vulnerabilities can therefore launch denial-of-service attacks, harvest customer data, record conversations, and compromise voicemail messages. A compromised VoIP user account or set of session credentials may also allow an attacker to run up substantial charges from third-party services, such as long-distance or international calling.

Many VoIP protocols’ technical intricacies make routing VoIP traffic through firewalls and network address translators, which are used to connect to transit networks or the Internet, difficult. To permit VoIP calls to and from secured networks, private session border controllers are frequently used. Other ways to get around NAT systems include using assistive protocols like STUN and Interactive Connectivity Establishment (ICE).

Though many consumer VoIP solutions do not support encryption of the signaling path or the media, securing a VoIP phone is conceptually easier to implement than on traditional telephone circuits. A result of the lack of encryption is that it is relatively easy to eavesdrop on VoIP calls when access to the data network is possible. Free open-source solutions, such as Wireshark, facilitate capturing VoIP conversations.

Standards for securing VoIP are available in the Secure Real-time Transport Protocol (SRTP) and the ZRTP protocol for analog telephone adapters, as well as for some softphones. IPsec can secure point-to-point VoIP at the transport level by using opportunistic encryption.

Government and military organizations use various security measures to protect VoIP traffic, such as voice over secure IP (VoSIP), secure voice over IP (SVoIP), and secure voice over secure IP (SVoSIP). The distinction lies in whether encryption is applied in the telephone endpoint or in the network. Secure voice over secure IP may be implemented by encrypting the media with protocols such as SRTP and ZRTP. Secure voice over IP uses Type 1 encryption on a classified network, such as SIPRNet. Public secure VoIP is also available through free GNU software and, via libraries such as ZRTP, in many popular commercial VoIP applications.

Caller ID

Voice over IP protocols and equipment provide caller ID support that is compatible with the PSTN. Many VoIP service providers also allow callers to configure custom caller ID information.

Hearing aid compatibility

Wireline telephones that are manufactured in, imported to, or intended to be used in the United States with Voice over IP service on or after February 28, 2020, must meet the Federal Communications Commission’s hearing aid compatibility requirements.

Operational cost

VoIP has significantly lowered the cost of communication by sharing network infrastructure between data and voice. A single broadband connection can carry more than one telephone call. Secure calls using standardized protocols, such as the Secure Real-time Transport Protocol, are possible because most of what is needed to create a secure telephone connection over traditional phone lines, such as digitization and digital transmission, is already in place with VoIP. It is only necessary to encrypt and authenticate the existing data stream. Automated software, such as a virtual PBX, may remove the need for staff to greet and switch incoming calls.


History

The early development of packet network designs by Paul Baran and other researchers was motivated by a desire for a higher degree of circuit redundancy and network availability in the face of infrastructure failures than was possible in the circuit-switched telecommunications networks of the mid-twentieth century. Danny Cohen first demonstrated a form of packet voice in 1973 as part of a flight simulator application that ran across the early ARPANET.

On the early ARPANET, real-time voice communication was not feasible with uncompressed pulse-code modulation (PCM) digital speech packets, whose 64 kbps bit rate far exceeded the 2.4 kbps bandwidth of early modems. The problem was solved by linear predictive coding (LPC), a speech coding data compression algorithm first proposed in 1966 by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone (NTT). LPC could compress speech down to 2.4 kbps, which made possible the first successful real-time conversation over ARPANET in 1974, between Culler-Harrison Incorporated in Goleta, California, and MIT Lincoln Laboratory in Lexington, Massachusetts. Since then, LPC has been the most widely used speech coding method. Manfred R. Schroeder and Bishnu S. Atal developed code-excited linear prediction (CELP), a type of LPC algorithm, in 1985. LPC algorithms are still used for audio coding in current VoIP technologies.
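The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes the usual narrowband telephony parameters of 8,000 samples per second and 8 bits per sample (as in G.711), which the text does not state explicitly; the function names are illustrative only.

```python
# Arithmetic behind the 64 kbps PCM figure and the LPC compression ratio.
# Assumption: narrowband telephony PCM at 8 kHz sampling, 8 bits per sample.

SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 8
LPC_BIT_RATE_BPS = 2_400  # LPC rate quoted in the text


def pcm_bit_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Return the bit rate of uncompressed single-channel PCM."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(uncompressed_bps: int, compressed_bps: int) -> float:
    """Return how many times smaller the compressed stream is."""
    return uncompressed_bps / compressed_bps


if __name__ == "__main__":
    pcm_bps = pcm_bit_rate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
    ratio = compression_ratio(pcm_bps, LPC_BIT_RATE_BPS)
    print(f"PCM: {pcm_bps / 1000:.0f} kbps")
    print(f"LPC at 2.4 kbps is about {ratio:.1f} times smaller")
```

In other words, uncompressed PCM needed roughly 27 times the capacity of a 2.4 kbps channel, which is why a codec in the LPC family was a precondition for real-time speech on such links.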

Various forms of packet telephony were developed over the next two decades, and industry interest groups formed to promote the new technology. Following the termination of the ARPANET project and the expansion of the Internet for commercial traffic, IP telephony was tested and deemed infeasible for commercial use until the introduction of VocalChat in the early 1990s, followed in February 1995 by the official release of the commercial Internet Phone (or iPhone for short) software by VocalTec, based on the Audio Transceiver patent by Lior Haramaty and Alon Cohen. Soon after, it became an established area of interest in the commercial labs of the major IT companies. By the late 1990s, the first softswitches were available, and new protocols including H.323, MGCP, and the Session Initiation Protocol (SIP) were gaining popularity. In the early 2000s, the spread of high-bandwidth, always-on Internet connections to homes and businesses gave rise to an industry of Internet telephony service providers (ITSPs). The emergence of open-source telephony software, such as Asterisk PBX, stimulated significant interest and entrepreneurship in voice-over-IP services, which applied new Internet technology paradigms, such as cloud services, to telephony.

In 1999, a discrete cosine transform (DCT) audio data compression method known as the modified discrete cosine transform (MDCT) was adopted for the Siren codec, which is used in the G.722.1 wideband audio coding standard. The same year, the MDCT was adapted into the LD-MDCT speech coding method, which is used for the AAC-LD format and is designed to greatly improve audio quality in VoIP applications. Since then, MDCT has been widely used in VoIP applications such as the G.729.1 wideband codec introduced in 2006, Apple’s FaceTime (which uses AAC-LD) introduced in 2010, the CELT codec introduced in 2011, the Opus codec introduced in 2012, and WhatsApp’s voice calling feature introduced in 2015.


    • 1966: Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone (NTT) devise linear predictive coding (LPC).
    • 1973: Danny Cohen demonstrates a packet voice application.
    • 1974: The Institute of Electrical and Electronics Engineers (IEEE) publishes the paper “A Protocol for Packet Network Interconnection”.
    • 1974: The Network Voice Protocol (NVP) is tested over ARPANET in August, carrying barely audible 16 kbps CVSD-encoded voice.
    • 1974: The first successful real-time conversation over ARPANET takes place between Culler-Harrison Incorporated in Goleta, California, and MIT Lincoln Laboratory in Lexington, Massachusetts, using 2.4 kbps LPC.
    • 1977: Danny Cohen and Jon Postel of the USC Information Sciences Institute, along with Vint Cerf of the Defense Advanced Research Projects Agency (DARPA), agree to split IP from TCP and to create UDP for carrying real-time traffic.
    • 1981: IPv4 is specified in RFC 791.
    • 1985: The National Science Foundation commissions NSFNET.
    • 1985: Manfred R. Schroeder and Bishnu S. Atal develop the code-excited linear prediction (CELP) algorithm, a type of LPC algorithm.
    • 1986: Various standards bodies put forward proposals for Voice over ATM, alongside commercial packet voice products from firms such as StrataCom.
    • 1991: Speak Freely, a voice-over-IP application, is released to the public.
    • 1992: The Frame Relay Forum develops standards for Voice over Frame Relay.
    • 1992: InSoft Inc. announces and launches Communique, a desktop conferencing product that includes VoIP and video. The firm is credited with developing the first generation of commercial, US-based VoIP, Internet media streaming, and real-time Internet telephony/collaborative software and protocols, which served as the foundation for the Real Time Streaming Protocol (RTSP).
    • 1993: VocalTec releases VocalChat, a commercial packet network PC voice communication program.
    • 1994: MTALK, a freeware LAN VoIP application for Linux, is released.
    • 1995: VocalTec releases its commercial Internet Phone software.
      • Beginning in 1995, Intel, Microsoft, and Radvision initiate standardization activities for VoIP communications systems.
    • 1996:
      • With the H.323 standard, the ITU-T begins developing standards for the transmission and signaling of voice communications over Internet Protocol networks.
      • US telecommunications companies petition the United States Congress to ban Internet phone technology.
      • The G.729 speech codec is introduced, using the CELP (LPC) algorithm.
    • 1997: Level 3 begins developing its first softswitch, a term the company coined in 1998.
    • 1999:
      • RFC 2543, the Session Initiation Protocol (SIP) specification, is published.
      • Mark Spencer of Digium develops Asterisk, the first open source private branch exchange (PBX) software.
      • The Siren codec, which is used in the G.722.1 wideband audio coding standard, adopts a discrete cosine transform (DCT) variant known as the modified discrete cosine transform (MDCT).
      • The MDCT is adapted into the LD-MDCT algorithm, which is used in the AAC-LD standard.
    • 2004: Commercial VoIP service providers proliferate.
    • 2006: The G.729.1 wideband codec is introduced, using the MDCT and CELP (LPC) algorithms.
    • 2007: VoIP device manufacturers and sellers boom in Asia, particularly in the Philippines, which is home to many families of overseas workers.
    • 2009: The SILK codec is introduced; it uses the LPC algorithm and is used for voice calling in Skype.
    • 2010: Apple introduces FaceTime, which uses the LD-MDCT-based AAC-LD codec.
    • 2011:
      • WebRTC technology emerges, enabling VoIP directly in browsers.
      • The CELT codec is introduced, using the MDCT algorithm.
    • 2012: The Opus codec is introduced, using the MDCT and LPC algorithms.

Roosho is a telecommunications engineer with more than 10 years of experience in VoIP and Unified Communications. In that time he has completed more than 100 projects for federal agencies, public universities, and large groups of companies, and he continues to grow with the industry. He enjoys sharing his experience and expertise with the world, which is why VoIP Bible has made him its lead technical content writer.
