TCP Audio Streamer and Player (Voice Chat over IP)

Banjoo


23 Oct 2012

CPOL


Streaming audio data over TCP (Voice Chat over IP).

Introduction

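This is a proprietary VoIP project that sends and receives audio data over TCP. It extends my first article, Play or Capture Audio Sound. Send and Receive as Multicast (RTP). Instead of using multicast, this application streams the audio data over TCP. TCP retransmits lost segments, so no data is lost, and the data can be transferred across subnets and routers. The audio codec is U-Law. The sample rate can be chosen between 5000 and 44100 Hz.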

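The server can run on your local PC. To get your current IPv4 address, run cmd.exe and type "ipconfig". You should use a static IP address so that clients do not have to change their settings when they reconnect days later. Clients must connect to the IPv4 address and port configured on the running server. The server can run in silent mode (no input, no output). In that mode it only relays audio data between all connected clients.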

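Choose a free port that no other application uses (do not use 80 or other reserved ports). You can connect over a LAN or over the Internet. For Internet chat, you can configure port forwarding on your router.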
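Readers new to sockets may find it useful to see the connection setup in isolation. The following minimal loopback sketch shows a client connecting to a listening server using the standard System.Net.Sockets classes. It is not the API of the author's TCPServer.dll or TCPClient.dll, which the article does not show. Binding to port 0 lets the operating system pick any free port. That is convenient for a test, but a real server would use a fixed port so that clients and router port forwarding know where to connect. The code is a C# 9 top-level program.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Server side: port 0 asks the operating system for any free port.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
int port = ((IPEndPoint)listener.LocalEndpoint).Port;

// Client side: connect to the server's address and port.
using var client = new TcpClient();
client.Connect(IPAddress.Loopback, port);
using TcpClient serverSide = listener.AcceptTcpClient();

// Send a few bytes from the client to the server.
byte[] hello = { 1, 2, 3, 4 };
client.GetStream().Write(hello, 0, hello.Length);

// TCP is a byte stream: a single Read call may return fewer bytes than were sent.
var received = new byte[hello.Length];
int total = 0;
NetworkStream serverStream = serverSide.GetStream();
while (total < received.Length)
{
    int read = serverStream.Read(received, total, received.Length - total);
    if (read == 0) break;   // connection closed
    total += read;
}
listener.Stop();
Console.WriteLine($"Server listened on port {port} and received {total} bytes");
```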

Attention! This is a proprietary project. You cannot use my server or client with any other standardized server or client. I do not use standards such as RTCP or SDP.

Background

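Network traffic is irregular and the timing of the machines differs, so you need a jitter buffer to compensate during data transfer. You can set the jitter buffer for each server, so all of its clients use the same value. One jitter buffer unit corresponds to one packet in the TCP stream. The server starts playing once the jitter buffer is half full, which you can see in the progress bar shown for each client. The larger the jitter buffer, the higher the latency. You can run TCPStreamer as a client or as a server, and one server can handle one or more clients.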
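As a rough rule of thumb, the latency a jitter buffer adds is about half its size times 20 ms. This assumes that every packet carries 20 ms of audio, which is the send interval the article uses, and that playback starts once the buffer is half full. The helper name below is hypothetical, and network and sound card delays come on top. The code is a C# 9 top-level program.

```csharp
using System;

// Hypothetical helper: approximate delay added by a jitter buffer that holds
// up to maxPackets packets of packetMs milliseconds each and starts playing
// once it is half full (integer division, as in the article's timer code).
static int JitterBufferDelayMs(int maxPackets, int packetMs) => maxPackets / 2 * packetMs;

Console.WriteLine($"8 packets:  {JitterBufferDelayMs(8, 20)} ms");
Console.WriteLine($"20 packets: {JitterBufferDelayMs(20, 20)} ms");
```

Halving the buffer size halves this part of the delay, but it also leaves less headroom for irregular packet arrival.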

TCPStreamer as Client

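When running as a client, you connect to a server instance. Select your microphone and playback device. Click the microphone or speaker button to mute it. Once the client is connected, the speaker combo box changes into a progress bar that shows the level of the incoming data. The SamplesPerSecond value (quality) depends on the server configuration. The client's jitter buffer only affects the latency of incoming data.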

TCPStreamer as Server

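When running as a server, you wait for one or more clients. Select your microphone and playback device if needed. You can also run the server without listening or speaking, so that only the clients talk to each other. Each client can be muted individually (speaker and microphone). The IP address must be the address of your computer, and no other application may use the port number. The server-side jitter buffer value affects the latency for all connected clients, so use the lowest value that works. The server has to mix the data of all clients, so you should run it on a workstation with good performance. The voice quality depends on the SamplesPerSecond value.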
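The article does not show how the server mixes the clients' audio. A common approach for linear PCM is to add the samples of all sources and clamp the sum to the 16-bit range. Loud passages then clip instead of wrapping around into noise. The sketch below shows this approach, assuming 16-bit little-endian samples and buffers of equal length. The function name is hypothetical. Since the network carries U-Law, a server would have to decode each stream to linear PCM before mixing. The code is a C# 9 top-level program.

```csharp
using System;

// Hypothetical helper: mixes several 16-bit little-endian PCM buffers of equal
// length by summing the samples and clamping the sum to the Int16 range.
static byte[] MixPcm16(params byte[][] sources)
{
    int length = sources[0].Length;
    var mixed = new byte[length];
    for (int i = 0; i + 1 < length; i += 2)
    {
        int sum = 0;
        foreach (byte[] source in sources)
        {
            // Reassemble the little-endian sample as a signed 16-bit value.
            sum += unchecked((short)(source[i] | (source[i + 1] << 8)));
        }
        // Saturate instead of letting the sum wrap around.
        short clamped = (short)Math.Clamp(sum, short.MinValue, short.MaxValue);
        mixed[i] = (byte)(clamped & 0xFF);
        mixed[i + 1] = (byte)((clamped >> 8) & 0xFF);
    }
    return mixed;
}

byte[] loud = { 0x20, 0x4E };   // 20000
byte[] result = MixPcm16(loud, loud);
Console.WriteLine((short)(result[0] | (result[1] << 8)));
```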

Using the Code

The solution consists of the following assemblies:

TCPStreamer.exe (main application)
TCPClient.dll (TCP client wrapper helper)
TCPServer.dll (TCP server wrapper helper)
WinSound.dll (sound recording and playback)

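I could send the data directly from the sound card to the network. Instead, I put it into a jitter buffer first, because some sound cards (especially in laptops) do not deliver sound data at equal intervals. The jitter buffer ensures that data is sent every 20 milliseconds. The drawback is a higher latency. This jitter buffer is not configurable and has a size of 8. To lower the latency, you can reduce this value in the source code (RecordingJitterBufferCount). Afterwards, observe your sound card for a few minutes to check whether it can cope.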

//------------------------------------------------------------------------------------------
//
//Record data from the sound card and put it into the jitter buffer
//
//------------------------------------------------------------------------------------------
private void OnDataReceivedFromSoundcard_Server(Byte[] data)
{
    //Split the data into smaller pieces of equal size (one per send interval)
    int bytesPerInterval = WinSound.Utils.GetBytesPerInterval(
        (uint)m_Config.SamplesPerSecondServer,
        m_Config.BitsPerSampleServer,
        m_Config.ChannelsServer);
    int count = data.Length / bytesPerInterval;
    int currentPos = 0;
    for (int i = 0; i < count; i++)
    {
        Byte[] partBytes = new Byte[bytesPerInterval];
        Array.Copy(data, currentPos, partBytes, 0, bytesPerInterval);
        currentPos += bytesPerInterval;

        //Convert to an RTP packet (linear to U-Law)
        WinSound.RTPPacket rtp = ToRTPPacket(partBytes,
            m_Config.BitsPerSampleServer, m_Config.ChannelsServer);

        //Put the RTP packet into the jitter buffer
        m_JitterBufferServerRecording.AddData(rtp);
    }
    //Note: a remainder smaller than bytesPerInterval is not sent
}
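The article does not list WinSound.Utils.GetBytesPerInterval. If the interval is the 20 ms send period mentioned above, the chunk size follows directly from the audio format: samples per second × bytes per sample × channels × 20 / 1000. For 8000 Hz, 16-bit mono, that gives 320 bytes per packet. The sketch below uses hypothetical names and is written as a C# 9 top-level program. It also makes visible that the loop above drops any trailing bytes that do not fill a whole interval.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical re-derivation of the chunk size; assumes a fixed 20 ms interval.
static int BytesPerInterval(int samplesPerSecond, int bitsPerSample, int channels, int intervalMs = 20)
    => samplesPerSecond * (bitsPerSample / 8) * channels * intervalMs / 1000;

// Hypothetical helper: splits a recorded buffer into equal chunks and
// reports how many bytes were left over.
static List<byte[]> SplitIntoIntervals(byte[] data, int chunkSize, out int leftover)
{
    var chunks = new List<byte[]>();
    int pos = 0;
    while (pos + chunkSize <= data.Length)
    {
        var chunk = new byte[chunkSize];
        Array.Copy(data, pos, chunk, 0, chunkSize);
        chunks.Add(chunk);
        pos += chunkSize;
    }
    leftover = data.Length - pos;
    return chunks;
}

int size = BytesPerInterval(8000, 16, 1);
var parts = SplitIntoIntervals(new byte[1000], size, out int rest);
Console.WriteLine($"{size} bytes per packet, {parts.Count} packets, {rest} bytes left over");
```

Carrying the leftover bytes over into the next sound card callback would avoid losing them. Whether that matters depends on the buffer sizes the sound card delivers.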

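When I create an RTP packet, most of the header information, such as the CSRC count or the version, is always the same. After each RTP packet, I only increment the SequenceNumber and the Timestamp. Before that, I convert the linear data to the compressed U-Law format to reduce network traffic (U-Law uses 8 bits per sample instead of 16).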

//------------------------------------------------------------------------------------------
//
//Create an RTP packet from linear data
//
//------------------------------------------------------------------------------------------
private WinSound.RTPPacket ToRTPPacket(Byte[] linearData, int bitsPerSample, int channels)
{
    //Convert linear to Mulaw
    Byte[] mulaws = WinSound.Utils.LinearToMulaw(linearData, bitsPerSample, channels);

    //Create new RTP Packet
    WinSound.RTPPacket rtp = new WinSound.RTPPacket();

    //Init base values
    rtp.Data = mulaws;
    rtp.CSRCCount = m_CSRCCount;
    rtp.Extension = m_Extension;
    rtp.HeaderLength = WinSound.RTPPacket.MinHeaderLength;
    rtp.Marker = m_Marker;
    rtp.Padding = m_Padding;
    rtp.PayloadType = m_PayloadType;
    rtp.Version = m_Version;
    rtp.SourceId = m_SourceId;

    //Update RTP header with SequenceNumber and Timestamp.
    //RTP sequence numbers (16 bit) and timestamps (32 bit) wrap around,
    //so truncate with an unchecked cast instead of resetting on overflow.
    rtp.SequenceNumber = unchecked((UInt16)m_SequenceNumber);
    m_SequenceNumber++;
    rtp.Timestamp = unchecked((UInt32)m_TimeStamp);
    m_TimeStamp += mulaws.Length;

    //Ready
    return rtp;
}  
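The article does not list WinSound.Utils.LinearToMulaw and MuLawToLinear. The sketch below shows the standard G.711 μ-law (U-Law) companding of a single 16-bit sample: bias 0x84, clip level 32635, inverted output byte. This is the usual way such helpers are written, but it is not necessarily the author's code, and the function names are hypothetical. Each 16-bit sample becomes one byte, which is where the halved network traffic comes from. The code is a C# 9 top-level program.

```csharp
using System;

// Hypothetical helper: G.711 mu-law encoder for one signed 16-bit sample.
static byte LinearToMuLaw(short sample)
{
    const int bias = 0x84;     // 132, the G.711 encoding bias
    const int clip = 32635;    // largest magnitude that still fits 15 bits after adding the bias
    int value = sample;
    int sign = (value >> 8) & 0x80;    // 0x80 for negative samples, 0 otherwise
    if (sign != 0) value = -value;
    if (value > clip) value = clip;
    value += bias;

    // The segment (exponent 0..7) is the position of the highest set bit among bits 7..14, minus 7.
    int exponent = 7;
    for (int mask = 0x4000; (value & mask) == 0 && exponent > 0; mask >>= 1)
    {
        exponent--;
    }
    // Four mantissa bits directly below the highest set bit.
    int mantissa = (value >> (exponent + 3)) & 0x0F;
    // G.711 stores the code word inverted.
    return unchecked((byte)~(sign | (exponent << 4) | mantissa));
}

// Hypothetical helper: G.711 mu-law decoder, reconstructs the middle of the quantization step.
static short MuLawToLinear(byte muLaw)
{
    const int bias = 0x84;
    int u = ~muLaw & 0xFF;             // undo the inversion
    int sign = u & 0x80;
    int exponent = (u >> 4) & 0x07;
    int mantissa = u & 0x0F;
    int magnitude = (((mantissa << 3) + bias) << exponent) - bias;
    return (short)(sign != 0 ? -magnitude : magnitude);
}

foreach (short s in new short[] { 0, 1000, -1000, 32767 })
{
    byte encoded = LinearToMuLaw(s);
    Console.WriteLine($"{s} -> 0x{encoded:X2} -> {MuLawToLinear(encoded)}");
}
```

The quantization error grows with the amplitude (roughly 3% of the sample value), which matches how loudness is perceived and is why 8-bit U-Law sounds acceptable for speech.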
//------------------------------------------------------------------------------------------
//
//Send sound data (U-Law) over the network
//
//------------------------------------------------------------------------------------------
private void OnJitterBufferServerDataAvailable(Object sender, WinSound.RTPPacket rtp)
{
  //Convert RTP packet to bytes
  Byte[] rtpBytes = rtp.ToBytes();

  //Copy the client list so that clients connecting or disconnecting
  //while sending do not break the enumeration
  List<NF.ServerThread> list = new List<NF.ServerThread>(m_Server.Clients);
  foreach (NF.ServerThread client in list)
  {
      //If not mute
      if (client.IsMute == false)
      {
        //Send
        client.Send(m_PrototolClient.ToBytes(rtpBytes));
      }
  }             
}
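The article does not show rtp.ToBytes(). The field names of WinSound.RTPPacket match the RTP fixed header, so the sketch below shows how that 12-byte header is laid out on the wire according to RFC 3550. The fields are version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp and SSRC, all in network (big-endian) byte order. Whether the author's ToBytes() produces exactly these bytes is not shown. The sketch assumes no CSRC list, no header extension and no padding, and the helper name is hypothetical. Payload type 0 is the static RFC 3551 assignment for PCMU (G.711 μ-law), but the article does not say which value m_PayloadType holds. The code is a C# 9 top-level program.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: builds an RTP packet with the 12-byte fixed header
// from RFC 3550 (no CSRC list, no header extension, no padding).
static byte[] BuildRtpPacket(byte payloadType, bool marker, ushort sequenceNumber,
                             uint timestamp, uint ssrc, byte[] payload)
{
    const int headerLength = 12;
    var packet = new byte[headerLength + payload.Length];

    packet[0] = 2 << 6;                                   // V=2, P=0, X=0, CC=0
    packet[1] = (byte)((marker ? 0x80 : 0) | (payloadType & 0x7F));
    BinaryPrimitives.WriteUInt16BigEndian(packet.AsSpan(2), sequenceNumber);
    BinaryPrimitives.WriteUInt32BigEndian(packet.AsSpan(4), timestamp);
    BinaryPrimitives.WriteUInt32BigEndian(packet.AsSpan(8), ssrc);
    payload.CopyTo(packet, headerLength);
    return packet;
}

// 160 U-Law bytes = 20 ms at 8000 Hz mono, so the timestamp advances by 160.
// The sequence number wraps from 65535 to 0; unchecked makes the wrap explicit.
ushort seq = ushort.MaxValue;
byte[] first = BuildRtpPacket(0, false, seq, 160, 0x12345678, new byte[160]);
seq = unchecked((ushort)(seq + 1));
byte[] second = BuildRtpPacket(0, false, seq, 320, 0x12345678, new byte[160]);
Console.WriteLine($"{first.Length} bytes, next sequence number {seq}");
```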

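To send and receive data over TCP, I use a simple proprietary protocol. TCP delivers a continuous byte stream without message boundaries, so before each data block I write its length as a 32-bit value. When I receive the stream later, this tells me how to split it back into packets.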

//------------------------------------------------------------------------------------------
//
//Convert bytes to a proprietary protocol format
//
//------------------------------------------------------------------------------------------
public Byte[] ToBytes(Byte[] data)
{
   //Get the length of the data block as 4 bytes
   //(BitConverter uses the machine byte order, little-endian on x86/x64)
   Byte[] bytesLength = BitConverter.GetBytes(data.Length);

   //Copy all together
   Byte[] allBytes = new Byte[bytesLength.Length + data.Length];
   Array.Copy(bytesLength, allBytes, bytesLength.Length);
   Array.Copy(data, 0, allBytes, bytesLength.Length, data.Length);

   //ready
   return allBytes;
}
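Because BitConverter writes the length in the machine's byte order, both ends must run on machines with the same endianness. The following sketch produces the same framing with the byte order fixed explicitly. It also includes a blocking reader for a Stream, which is an alternative to the buffer-based receiver shown further below. The helper names are hypothetical, and the 1 MiB limit is an assumed sanity check, not a value from the article. The reader loops over Stream.Read because a single call may return fewer bytes than requested. The code is a C# 9 top-level program.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper: prefixes a data block with its 32-bit little-endian length
// (the same layout BitConverter.GetBytes produces on x86/x64).
static byte[] Frame(byte[] data)
{
    var frame = new byte[4 + data.Length];
    BinaryPrimitives.WriteInt32LittleEndian(frame, data.Length);
    data.CopyTo(frame, 4);
    return frame;
}

// Hypothetical helper: reads exactly 'count' bytes, or returns null if the stream ends first.
static byte[] ReadFully(Stream stream, int count)
{
    var buffer = new byte[count];
    int total = 0;
    while (total < count)
    {
        int read = stream.Read(buffer, total, count - total);
        if (read == 0) return null;                // stream closed mid-frame
        total += read;
    }
    return buffer;
}

// Hypothetical helper: reads one length-prefixed frame.
static byte[] ReadFrame(Stream stream, int maxLength = 1 << 20)
{
    byte[] header = ReadFully(stream, 4);
    if (header == null) return null;
    int length = BinaryPrimitives.ReadInt32LittleEndian(header);
    if (length < 0 || length > maxLength)
        throw new InvalidDataException($"Invalid frame length {length}");
    return ReadFully(stream, length);
}

// Two frames written back to back arrive as one continuous byte stream.
var stream = new MemoryStream();
stream.Write(Frame(new byte[] { 1, 2, 3 }));
stream.Write(Frame(new byte[] { 4, 5 }));
stream.Position = 0;
byte[] a = ReadFrame(stream);
byte[] b = ReadFrame(stream);
Console.WriteLine($"{a.Length} bytes, then {b.Length} bytes");
```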

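The reverse path receives data from the network, in this case for each connected client. In the first step, I use my own protocol to extract the packets from the continuous stream.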

//------------------------------------------------------------------------------------------
//
//Receive data from the network
//
//------------------------------------------------------------------------------------------
private void OnServerDataReceived(NF.ServerThread st, Byte[] data)
{
    //If the client exists
    if (m_DictionaryServerDatas.ContainsKey(st))
    {
        //Get protocol 
        ServerThreadData stData = m_DictionaryServerDatas[st];
        if (stData.Protocol != null)
        {
           //Dispatch data over protocol    
           stData.Protocol.Receive_LH(st, data);
        }
    }
}

The length information tells me where each packet begins and ends.

//------------------------------------------------------------------------------------------
//
//Extract RTP data with the help of a proprietary protocol
//
//------------------------------------------------------------------------------------------
public void Receive_LH(Object sender, Byte[] data)
{
   //Add data to the buffer
   m_DataBuffer.AddRange(data);

   //Check buffer overflow
   if (m_DataBuffer.Count > m_MaxBufferLength)
   {
       m_DataBuffer.Clear();
       return;
   }

   //Process every complete packet: 4-byte length prefix followed by the data.
   //Fewer than 4 buffered bytes means the length itself is still incomplete.
   while (m_DataBuffer.Count >= 4)
   {
       //Get the length of the next data block (32-bit value)
       int length = BitConverter.ToInt32(m_DataBuffer.GetRange(0, 4).ToArray(), 0);

       //An invalid length means the stream is corrupt: discard everything
       if (length < 0 || length > m_MaxBufferLength)
       {
           m_DataBuffer.Clear();
           return;
       }

       //Packet not complete yet: wait for more data
       if (m_DataBuffer.Count < length + 4)
       {
           return;
       }

       //Get data and remove it from the buffer
       Byte[] message = m_DataBuffer.GetRange(4, length).ToArray();
       m_DataBuffer.RemoveRange(0, length + 4);

       //Raise event
       if (DataComplete != null)
       {
           DataComplete(sender, message);
       }
   }
}

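Before playing the data on the sound card, I put it into another jitter buffer. This is necessary because network traffic is irregular, especially over the Internet. The larger the jitter buffer, the higher the latency.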

//------------------------------------------------------------------------------------------
//
//Put network data into the jitter buffer
//
//------------------------------------------------------------------------------------------
private void OnProtocolDataComplete(Object sender, Byte[] bytes)
{
   //Convert bytes to RTP packet
   WinSound.RTPPacket rtp = new WinSound.RTPPacket(bytes);
  
   //If the RTP packet is valid
   if (rtp.Data != null)
   {
     //Add RTP packet to Jitter Buffer    
     JitterBuffer.AddData(rtp); 
   }          
}
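The WinSound.RTPPacket(bytes) constructor is not shown either. Parsing mirrors building the header. The sketch below reads the fixed RFC 3550 fields in network byte order, skips the CSRC list whose length is given by the CC field, and strips padding if the P bit is set. It returns null for packets that are too short or not version 2, similar in spirit to the rtp.Data != null check above. Header extensions are not handled, and all names are hypothetical. The code is a C# 9 top-level program.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical parser for an RTP packet (RFC 3550 fixed header).
// Returns null if the packet is too short, not version 2, or uses a header extension.
static (ushort Sequence, uint Timestamp, uint Ssrc, byte PayloadType, byte[] Payload)? ParseRtp(byte[] packet)
{
    if (packet.Length < 12) return null;
    int version = packet[0] >> 6;
    bool hasPadding = (packet[0] & 0x20) != 0;
    bool hasExtension = (packet[0] & 0x10) != 0;
    int csrcCount = packet[0] & 0x0F;
    int headerLength = 12 + 4 * csrcCount;      // each CSRC identifier is 4 bytes
    if (version != 2 || hasExtension || packet.Length < headerLength) return null;

    // With the P bit set, the last byte holds the number of padding bytes.
    int end = hasPadding ? packet.Length - packet[packet.Length - 1] : packet.Length;
    if (end < headerLength) return null;

    return (
        BinaryPrimitives.ReadUInt16BigEndian(packet.AsSpan(2)),
        BinaryPrimitives.ReadUInt32BigEndian(packet.AsSpan(4)),
        BinaryPrimitives.ReadUInt32BigEndian(packet.AsSpan(8)),
        (byte)(packet[1] & 0x7F),
        packet.AsSpan(headerLength, end - headerLength).ToArray());
}

byte[] sample =
{
    0x80, 0x00, 0x00, 0x2A,     // V=2, PT=0, sequence number 42
    0x00, 0x00, 0x00, 0xA0,     // timestamp 160
    0x12, 0x34, 0x56, 0x78,     // SSRC
    0xFF, 0xFF                  // two bytes of payload
};
var parsed = ParseRtp(sample);
if (parsed.HasValue)
{
    Console.WriteLine($"seq {parsed.Value.Sequence}, ts {parsed.Value.Timestamp}, {parsed.Value.Payload.Length} payload bytes");
}
```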

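Finally, the data is ready to be played on the sound card. Before that, I convert the U-Law data back to linear data, because the sound card can only play linear data.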

//------------------------------------------------------------------------------------------
//
//Play data on the sound card
//
//------------------------------------------------------------------------------------------
private void OnJitterBufferDataAvailable(Object sender, WinSound.RTPPacket rtp)
{
   //If not muted
   if (IsMuteAll == false && IsMute == false)
   {
       //Convert U-Law to linear
       Byte[] linearBytes = WinSound.Utils.MuLawToLinear(rtp.Data, BitsPerSample, Channels);
       //Play to soundcard
       Player.PlayData(linearBytes, false);
   }
}

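I implemented my own jitter buffer as a queue of RTP packets. Data is added to the queue and then processed by a high-resolution timer function every 20 milliseconds.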

//------------------------------------------------------------------------------------------
//
// Add data to the jitter buffer
//
//------------------------------------------------------------------------------------------
public void AddData(RTPPacket packet)
{
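    //Drop new packets while the buffer is in overflow state
    if (m_Overflow == false)
    {
        //Accept the packet only while the buffer is below its maximum size
        if (m_Buffer.Count < m_MaxRTPPackets)
        {
            //Add data
            m_Buffer.Enqueue(packet);
        }
        else
        {
            //Overflow: drop packets until the timer has drained the buffer to half
            m_Overflow = true;
        }
    }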
}

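The jitter buffer processes data every 20 milliseconds. The regular .NET timers are not precise enough for this, so I use the timer functions from the Win32 kernel32 and winmm libraries. Before starting the timer, I set the timer resolution to the best value the system supports, which is usually between 1 and a few milliseconds. These Windows timer functions cannot provide a resolution finer than 1 millisecond.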

[DllImport("Kernel32.dll", EntryPoint = "QueryPerformanceCounter")]
public static extern bool QueryPerformanceCounter(out long lpPerformanceCount);

[DllImport("Kernel32.dll", EntryPoint = "QueryPerformanceFrequency")]
public static extern bool QueryPerformanceFrequency(out long lpFrequency);

[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeSetEvent")]
public static extern UInt32 TimeSetEvent(UInt32 msDelay, UInt32 msResolution, 
       TimerEventHandler handler, ref UInt32 userCtx, UInt32 eventType);

[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeKillEvent")]
public static extern UInt32 TimeKillEvent(UInt32 timerId);

[DllImport("kernel32.dll", EntryPoint = "CreateTimerQueue")]
public static extern IntPtr CreateTimerQueue();

[DllImport("kernel32.dll", EntryPoint = "DeleteTimerQueue")]
public static extern bool DeleteTimerQueue(IntPtr TimerQueue);

[DllImport("kernel32.dll", EntryPoint = "CreateTimerQueueTimer")]
public static extern bool CreateTimerQueueTimer(out IntPtr phNewTimer, IntPtr TimerQueue, 
  DelegateTimerProc Callback, IntPtr Parameter, uint DueTime, uint Period, uint Flags);

[DllImport("kernel32.dll")]
public static extern bool DeleteTimerQueueTimer(IntPtr TimerQueue, 
              IntPtr Timer, IntPtr CompletionEvent);

[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeGetDevCaps")]
public static extern MMRESULT TimeGetDevCaps(ref TimeCaps timeCaps, UInt32 sizeTimeCaps);

[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeBeginPeriod")]
public static extern MMRESULT TimeBeginPeriod(UInt32 uPeriod);

[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeEndPeriod")]
public static extern MMRESULT TimeEndPeriod(UInt32 uPeriod);
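For comparison, the following managed sketch shows the idea behind a drift-free 20 ms tick. Each tick is scheduled against an absolute deadline measured with System.Diagnostics.Stopwatch, which uses the high-resolution performance counter when one is available. A late wake-up therefore does not push all following ticks back. This is not the author's approach, which relies on the multimedia and timer-queue APIs declared above. Thread.Sleep is only as precise as the system timer resolution, so the sketch sleeps until shortly before the deadline and spins for the remainder, at the cost of some CPU time. The function name is hypothetical, and the code is a C# 9 top-level program.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: calls onTick every periodMs milliseconds, tickCount times.
// Deadlines are absolute (start + i * period), so timing errors do not accumulate.
static void RunPeriodic(int periodMs, int tickCount, Action<int> onTick)
{
    var clock = Stopwatch.StartNew();
    for (int i = 0; i < tickCount; i++)
    {
        long dueMs = (long)i * periodMs;
        long remainingMs = dueMs - clock.ElapsedMilliseconds;

        // Sleep coarsely until shortly before the deadline; Sleep may return late.
        if (remainingMs > 2)
        {
            Thread.Sleep((int)(remainingMs - 2));
        }

        // Spin for the last few milliseconds.
        while (clock.ElapsedMilliseconds < dueMs)
        {
            Thread.SpinWait(100);
        }

        // A late tick fires immediately; the next deadline is unaffected.
        onTick(i);
    }
}

var total = Stopwatch.StartNew();
int ticks = 0;
RunPeriodic(20, 5, tick => ticks++);
total.Stop();
Console.WriteLine($"{ticks} ticks in {total.ElapsedMilliseconds} ms");
```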

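The jitter buffer is designed to deliver data once it is half full. After an overflow or an underflow, the buffer tries to return to this level.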

//------------------------------------------------------------------------------------------
//
// Jitter Buffer Timer main function
//
//------------------------------------------------------------------------------------------
private void OnTimerTick()
{
  if (DataAvailable != null)
  {
      //If data is available
      if (m_Buffer.Count > 0)
      {
          //After an overflow
          if (m_Overflow)
          {
              //Accept new data again once the buffer has drained to half of the maximum
              if (m_Buffer.Count <= m_MaxRTPPackets / 2)
              {
                  m_Overflow = false;
              }
          }

          //After an underflow
          if (m_Underflow)
          {
              //Wait until the buffer has refilled to half of the maximum
              if (m_Buffer.Count < m_MaxRTPPackets / 2)
              {
                  return;
              }
              else
              {
                  m_Underflow = false;
              }
          }

          //Get data and raise event
          m_LastRTPPacket = m_Buffer.Dequeue();
          DataAvailable(m_Sender, m_LastRTPPacket);
      }
      else
      {
          //An empty buffer cannot be in overflow
          m_Overflow = false;

          //Buffer is empty: if packets were played before, this is an underflow
          if (m_LastRTPPacket != null && m_Underflow == false)
          {
              if (m_LastRTPPacket.Data != null)
              {
                  //Underflow 
                  m_Underflow = true;
              }
          }
      }
  } 
}
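The overflow and underflow handling above can be condensed into a small, self-contained sketch. It uses byte arrays instead of WinSound.RTPPacket and local functions instead of a class, and all names are hypothetical. It also takes a lock around the queue, because Queue<T> is not thread-safe. AddData is presumably called from the network or sound card thread while the timer callback runs on its own thread. As in the article, playback (re)starts only once the buffer is half full. After an overflow, new packets are dropped until the buffer has drained back to half. Unlike the article's code, the sketch starts in the underflow state, so the very first playback also waits for a half-full buffer. The code is a C# 9 top-level program.

```csharp
using System;
using System.Collections.Generic;

const int maxPackets = 8;
var queue = new Queue<byte[]>();
var sync = new object();
bool overflow = false;
bool underflow = true;     // start by waiting until the buffer is half full

// Hypothetical producer side (network or sound card thread).
void AddPacket(byte[] packet)
{
    lock (sync)
    {
        if (overflow) return;                       // drop packets until drained
        if (queue.Count < maxPackets) queue.Enqueue(packet);
        else overflow = true;
    }
}

// Hypothetical consumer side, called every 20 ms; returns the packet to play or null.
byte[] Tick()
{
    lock (sync)
    {
        if (queue.Count == 0)
        {
            overflow = false;
            underflow = true;                       // starved: refill to half first
            return null;
        }
        if (overflow && queue.Count <= maxPackets / 2) overflow = false;
        if (underflow)
        {
            if (queue.Count < maxPackets / 2) return null;
            underflow = false;
        }
        return queue.Dequeue();
    }
}

// Three packets are not enough to start playback (half of 8 is 4).
for (int i = 0; i < 3; i++) AddPacket(new byte[] { (byte)i });
Console.WriteLine(Tick() == null ? "waiting for half-full buffer" : "playing");
AddPacket(new byte[] { 3 });
Console.WriteLine(Tick() == null ? "waiting for half-full buffer" : "playing");
```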

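This project does not use bloated libraries or extensions, so you can use it to learn the basics of handling sound data and network operations. Feel free to extend and improve it for your needs.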

History

31 May 2012 - First version
3 May 2013 - Added duplex connection. Removed the file player
9 May 2013 - Changed from tip to article
12 Dec 2013 - Added communication between all clients
18 Dec 2013 - Fixed some bugs
22 Apr 2014 - Fixed potential stability issues