下面将展示如何使用Java解析音频文件的ID3v2标签信息
1. 概述
由于ID3v1
只能存储固定128字节的歌曲信息,所以存储的信息有限,如果想要插入其他的信息,比如封面等,这时候就可以使用ID3v2
了,ID3v2
目前最常用的是ID3v2.3
版本,这里也是针对这个版本进行介绍。
2. 位置
ID3v2.3
标签位于音频文件的开头,相比于ID3v1
,存储的内容更多,可以存储图片等,同时支持用户自定义帧。
3. 结构
ID3v2的结构如下所示
其中,帧(frame)可以分为帧头(frame header)和帧内容(frame content)。
Header
Header的结构如下
ID3v2/file identifier 3 "ID3"
ID3v2 version 2 $03 00
ID3v2 flags 1 %abc00000
ID3v2 size 4 4 * %0xxxxxxx
Header一共有10bytes
,其中,
-
前3
bytes
始终为ID3
,表明这是一个ID3v2
标签 -
第4和第5
byte
标识了ID3的版本(主版本和修订版本),在这里版本就是3.0(03表示3,00表示0),由于这是ID3v2
才有的结构,所以版本全称就是ID3v2.3.0
-
第6
byte
为标志位,只使用高三位,其他位始终为0,这里需要注意的是b,也就是位6- 当b为0时,表明不存在扩展头,当b为1时,存在扩展头
-
第7到10
byte
为大小,规定了这个ID3v2
标签除了Header外的总长度(包括扩展头),每一个byte第7位始终为0,因此size的计算方式如下-
0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx => xxxxxxx xxxxxxx xxxxxxx xxxxxxx 共28位 4x7 用buffer数组表示Header,注意数组索引从0开始 size = (buffer[6] & 0x7f) << 21 | (buffer[7] & 0x7f) << 14 | (buffer[8] & 0x7f) << 7 | (buffer[9] & 0x7f); 或 size = ((buffer[6] & 0x7f) << 21) + ((buffer[7] & 0x7f) << 14) + ((buffer[8] & 0x7f) << 7) + (buffer[9] & 0x7f); 注意0x7f = 111 1111 这里用&主要是为了解决负数的问题,因为在Java中,byte可能为负,这时候为了将其表示为0~255范围内的数值,就需要&0x7f
- 这里需要注意的一点是,size只是大致范围,Header以及Extended Header、Frame的size总和
<=
size,多余的部分用0补齐
-
一个ID3v2
标签符合下面的格式
49 44 33 yy yy xx zz zz zz zz
由于Extended Header大多数情况下都没有,所以这里就不在赘述。
Frame
Frame包括Frame Header和Frame Content,其中Frame Content的大小至少为1byte,否则就不是有效的Frame
Frame Header
大小为10bytes
结构
Frame ID $xx xx xx xx (four characters)
Size $xx xx xx xx
Flags $xx xx
关于Frame ID,在id3v2.3.0 - ID3.org可找到详细的定义。
Size计算方式
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
=>
xxxxxxx xxxxxxx xxxxxxx xxxxxxx 共32位 4x8
用buffer数组表示Header,注意数组索引从0开始
size = (buffer[4] & 0x7f) << 24 |
(buffer[5] & 0xff) << 16 |
(buffer[6] & 0xff) << 8 |
(buffer[7] & 0xff);
或
size = ((buffer[4] & 0x7f) << 24) +
((buffer[5] & 0xff) << 16) +
((buffer[6] & 0xff) << 8) +
(buffer[7] & 0xff);
0xff = 0000 0000
这里&0xff的理由同上,主要是针对java的byte
关于Flags,这里也是只取每一个字节的高3位,其他位为0,这样共有6个标志位,由于Flags用处不大,所以这里不做太多的介绍。
Frame Content
T
由于不同Frame ID对应的Frame Content结构不同,这里只介绍几个比较常用的Frame ID里面的内容
以T开头的帧都是文本标识,如TRCK、TALB、TIT2等,它们的Frame Content结构类似,通常为两部分或三部分
结构如下
<Header for 'Text information frame', ID: "T000" - "TZZZ", excluding "TXXX" described in 4.2.2.>
Text encoding $xx
Information <text string according to encoding>
第一部分1byte,其值为0-3,不同的值代表不同的编码
0-> ISO-8859-1
00 ...
1-> UTF-16 如果是这个 后面还需要两个字节标识编码 FF FE -> UTF-16LE
01 FF FE ...
2-> UTF-16BE 如果是这个 后面还需要两个字节标识编码 FE FF -> UTF-16BE
02 FE FF ...
3-> UTF-8 如果是这个,后面还需要跟3个字节标识编码 EF BB BF
03 EF BB BF ...
之后的内容,就根据前两(一)部分解析的字符编码进行解析,长度就是frame内容的长度
TXXX
TXXX为用户自定义的文本标识,结构如下
<Header for 'User defined text information frame', ID: "TXXX">
Text encoding $xx
Description <text string according to encoding> $00 (00)
Value <text string according to encoding>
解析方式和USLT类似,唯一的区别就是没有language字段,但是实际上,它的解析与以T开头的标签类似(很多ID3v2标签都把Description给去掉了,所以也就没有$00结束符),所以可以直接按照以T为开头的标签解析方式解析,示例程序中用的就是这种方法
USLT
USLT为非同步歌词,结构如下
<Header for 'Unsynchronised lyrics/text transcription', ID: "USLT">
Text encoding $xx
Language $xx xx xx
Content descriptor <text string according to encoding> $00 (00)
Lyrics/text <full text string according to encoding>
Text encoding和T开头的标签含义一致,Language为语言,通常为eng,Content descriptor编码根据Text encoding的值来决定,也是字节标识编码+内容+0
,这里要注意的是,如果Content descriptor没有任何内容,只有结束符0,那么对于不同的编码,结构不一样,详情参考以下示例
USLT所有内容,descriptor为空和不为空不同情况下示例
encoding
0 ->
0 eng 0 text || 0 eng descriptor 0 text
1byte + 3bytes + 1bytes + text || 1byte + 3bytes + descriptor + 1bytes + text
1 ->
1 eng 0xFF 0xFE 00 0xFF 0xFE text || 1 eng 0xFF 0xFE descriptor 00 0xFF 0xFE text
1byte + 3bytes + 2bytes + 2bytes + text || 1byte + 3bytes + 2bytes + descriptor + 2bytes + text
2 ->
2 eng 0xFE 0xFF 00 0xFE 0xFF text || 1 eng 0xFE 0xFF descriptor 00 0xFE 0xFF text
1byte + 3bytes + 2bytes + 2bytes + text || 1byte + 3bytes + 2bytes + descriptor + 2bytes + text
3 ->
3 eng 0xEF 0xBB 0xBF 000 0xEF 0xBB 0xBF text || 1 eng 0xEF 0xBB 0xBF descriptor 000 0xEF 0xBB 0xBF text
1byte + 3bytes + 3bytes + 3bytes + text || 1byte + 3bytes + 3bytes + descriptor + 3bytes + text
ISO-8859-1,用一个0表示结束
UTF16,用两个0表示结束
UTF8,用三个0表示结束
APIC
APIC为图片,内容结构如下
Text encoding $xx
MIME type <text string> $00
Picture type $xx
Description <text string according to encoding> $00 (00)
Picture data <binary data>
-
Text encoding结合Description使用,使用方式与T开头标签相同
-
MIME type值通常为
image/jpeg
,image/png
等,以0x00结尾 -
Picture type为图片所代表的内容,主要有以下类型
-
$00 Other $01 32x32 pixels 'file icon' (PNG only) $02 Other file icon $03 Cover (front) $04 Cover (back) $05 Leaflet page $06 Media (e.g. lable side of CD) $07 Lead artist/lead performer/soloist $08 Artist/performer $09 Conductor $0A Band/Orchestra $0B Composer $0C Lyricist/text writer $0D Recording Location $0E During recording $0F During performance $10 Movie/video screen capture $11 A bright coloured fish $12 Illustration $13 Band/artist logotype $14 Publisher/Studio logotype
-
-
Description为图片描述,以0x00结尾
-
Picture data为二进制的图片内容,可以直接截取这一段内容,将其写入文件,即可得到图片
4. 实现代码
package com.flywinter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* Created by IntelliJ IDEA
* User:Zhang Xingkun
* Date:2022/4/29 23:13
* Description:
* 10 bytes
* ID3v2/file identifier 3 "ID3"
* ID3v2 version 2 $03 00
* ID3v2 flags 1 %abc00000
* ID3v2 size 4 4 * %0xxxxxxx 不包含标签头
* 注意0x7f,0xff等,主要是为了把负数byte转化为正数,因为0x80-0xff在java的byte里面为负数
*/
public class ID3v2Parser {
private static long tmpPos = 0;
private static long totalSize;
public static void main(String[] args) throws IOException {
// final var filePath = "./Evalia - 壊れたピエロ (崩坏小丑).mp3";
final var filePath = "AuRa - Ghost (Acoustic)(1).mp3";
// final var filePath = "筷子兄弟 - 老男孩 (Live).mp3";
try (var randomAccessFile = new RandomAccessFile(filePath, "r")) {
final var buffer = new byte[10];
randomAccessFile.read(buffer);
final var tag = new String(buffer, 0, 3);
if (tag.equals("ID3")) {
tmpPos += 10;
System.out.println(getHeaderMap(buffer));
System.out.println(getFrame(randomAccessFile));
}
}
}
private static Map<String, String> getHeaderMap(byte[] buffer) {
Map<String, String> headerMap = new HashMap<>();
headerMap.put("Version", "2.%s.%s".formatted(String.valueOf(buffer[3]), String.valueOf(buffer[4])));
final String formatted = getTitleFlags(buffer);
headerMap.put("flag", formatted);
var size = getTotalSize(buffer);
totalSize = size;
headerMap.put("Size", String.valueOf(size));
return headerMap;
}
/**
* Frame ID $xx xx xx xx (four characters)
* Size $xx xx xx xx 不包括帧头的帧大小
* Flags $xx xx %abc00000 %ijk00000
* 关于flags a 标记更改保存 b文件更改保存 c只读 i 压缩 j 加密 k分组标识
* 帧内容最少为1byte
* APIC Content
* <Header for 'Attached picture', ID: "APIC">
* Text encoding $xx
* MIME type <text string> $00
* Picture type $xx
* Description <text string according to encoding> $00 (00)
* Picture data <binary data>
*
* @param randomAccessFile
* @throws IOException
*/
private static Map<String, String> getFrame(RandomAccessFile randomAccessFile) throws IOException {
Map<String, String> map = new HashMap<>();
var frameBuffer = new byte[10];
do {
randomAccessFile.seek(tmpPos);
frameBuffer = new byte[10];
randomAccessFile.read(frameBuffer);
var frameID = new String(frameBuffer, 0, 4);
var frameSize = getFrameSize(frameBuffer);
//实际上,所有frame的size和最终结果和标签头设置的size并不一定相等,这就导致最后会填充部分的零
//所有这时候如果检测到size为0,代表已经没有frame了,退出,因为按照标准,size最小为1
if (frameSize == 0) {
break;
}
tmpPos += 10;
randomAccessFile.seek(tmpPos);
frameBuffer = new byte[frameSize];
randomAccessFile.read(frameBuffer);
var content = "";
if (frameID.startsWith("T")) {
content = getTStartContent(frameBuffer, frameSize);
if (map.containsKey(frameID)) {
content = map.get(frameID) + ";" + content;
}
map.put(frameID, content);
} else if (frameID.equals("APIC")) {
map.put(frameID, getApicImg(frameBuffer, frameSize));
} else if (frameID.equals("USLT")) {
content = getLyrics(frameBuffer, frameSize);
if (map.containsKey(frameID)) {
content = map.get(frameID) + ";" + content;
}
map.put(frameID, content);
} else {
map.put(frameID, String.valueOf(frameSize));
}
tmpPos += frameSize;
} while (tmpPos < totalSize + 10);
return map;
}
private static String getApicImg(byte[] frameBuffer, int frameSize) throws IOException {
int end = 0;
end = getEnd(1, frameBuffer, end);
int desEnd = 0;
desEnd = getEnd(end + 2, frameBuffer, desEnd);
final var textEncoding = String.valueOf(frameBuffer[0]);
final var mimeType = new String(frameBuffer, 1, end - 1);
final var pictureType = getPictureType(frameBuffer[end + 1]);
final var description = new String(frameBuffer, end + 2, desEnd - end - 2);
var result = "Text encoding:%s;MIME type:%s;Picture type:%s;Description:%s;"
.formatted(
textEncoding,
mimeType,
pictureType,
description);
var pictureData = new byte[frameSize - desEnd];
copyPictureData(frameBuffer, desEnd, pictureData);
final var apicPath = Path.of("./tmp.%s".formatted(new String(frameBuffer, 1, end - 1).split("/", 2)[1]));
Files.write(apicPath, pictureData);
return result;
}
private static void copyPictureData(byte[] frameBuffer, int desEnd, byte[] pictureData) {
for (int i = desEnd + 1; i < frameBuffer.length; i++) {
pictureData[i - desEnd - 1] = frameBuffer[i];
}
}
private static int getEnd(int start, byte[] frameBuffer, int end) {
for (int i = start; i < frameBuffer.length; i++) {
if (frameBuffer[i] == 0) {
end = i;
break;
}
}
return end;
}
/**
* %abc00000
* a Unsynchronisation
* b Extended header
* c Experimental indicator
* if b=1,那么存在扩展表头
* Extended header size $xx xx xx xx 6或10个字节,不包含本身
* Extended Flags $xx xx
* Size of padding $xx xx xx xx
*
* @param buffer
* @return abc
*/
private static String getTitleFlags(byte[] buffer) {
//标志字段 a b c
final var flag = buffer[5] & 0xff;
var a = (flag & 0x80) == 0 ? 0 : 1;
var b = (flag & 0x40) == 0 ? 0 : 1;
var c = (flag & 0x20) == 0 ? 0 : 1;
return "%d%d%d".formatted(a, b, c);
}
/**
* @param buffer
* @return Flags $xx xx %abc00000 %ijk00000
* 关于flags a 标记更改保存 b文件更改保存 c只读 i 压缩 j 加密 k分组标识
*/
private static String getFrameFlags(byte[] buffer) {
//标志字段 a b c
final var flagF = buffer[8] & 0xff;
var a = (flagF & 0x80) == 0 ? 0 : 1;
var b = (flagF & 0x40) == 0 ? 0 : 1;
var c = (flagF & 0x20) == 0 ? 0 : 1;
final var flagL = buffer[9] & 0xff;
var i = (flagL & 0x80) == 0 ? 0 : 1;
var j = (flagL & 0x40) == 0 ? 0 : 1;
var k = (flagL & 0x20) == 0 ? 0 : 1;
return "%d%d%d %d%d%d".formatted(a, b, c, i, j, k);
}
/**
* 2^21=(2^4)^5*2
* 2^14=(2^4)^3*4
* 2^7=(2^4)^1*8
* 大小 bytes 包含前10 bytes
* 0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx
* =>
* xxxxxxx xxxxxxx xxxxxxx xxxxxxx
* total 27
*
* @param buffer
* @return (frameBuffer[x] & 0x7f) * 0x200000 +
* (frameBuffer[x1] & 0x7f) * 0x4000 +
* (frameBuffer[x2] & 0x7f) * 0x80 +
* (frameBuffer[x3] & 0x7f);
*/
private static int getTotalSize(byte[] buffer) {
return (buffer[6] & 0x7f) << 21 |
(buffer[7] & 0x7f) << 14 |
(buffer[8] & 0x7f) << 7 |
(buffer[9] & 0x7f);
}
/**
* xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
* total 32
*
* @param frameBuffer
* @return (frameBuffer[x] & 0xff) * 0x1000000 +
* (frameBuffer[x1] & 0xff) * 0x10000 +
* (frameBuffer[x2] & 0xff) * 0x100 +
* (frameBuffer[x3] & 0xff);
*/
private static int getFrameSize(byte[] frameBuffer) {
return ((frameBuffer[4] & 0xff) << 24) +
((frameBuffer[5] & 0xff) << 16) +
((frameBuffer[6] & 0xff) << 8) +
(frameBuffer[7] & 0xff);
}
/**
* @param frameBuffer
* @param frameSize
* @return <Header for 'Unsynchronised lyrics/text transcription', ID: "USLT">
* Text encoding $xx
* Language $xx xx xx
* Content descriptor <text string according to encoding> $00 (00)
* Lyrics/text <full text string according to encoding>
*/
private static String getLyrics(byte[] frameBuffer, int frameSize) {
final var start = 4;
var frameContent = "";
var descriptor = "";
switch (frameBuffer[0]) {
case 0 -> {
int tmp = getISODescriptorEndIndex(frameBuffer, frameSize, start);
descriptor = (tmp - start) == 0 ? "" : new String(frameBuffer, start, tmp - start, StandardCharsets.ISO_8859_1);
frameContent = "descriptor:" + descriptor + "\nlyrics:\n" + new String(frameBuffer, tmp, frameSize - tmp, StandardCharsets.ISO_8859_1);
}
case 1 -> {
frameContent = getContentByEncoder(frameBuffer, start, frameSize, 0xff, 0xfe, descriptor, StandardCharsets.UTF_16LE, frameContent);
}
case 2 -> {
frameContent = getContentByEncoder(frameBuffer, start, frameSize, 0xfe, 0xff, descriptor, StandardCharsets.UTF_16BE, frameContent);
}
default -> {
// int typeH = frameBuffer[start] & 0xff;
// int typeM = frameBuffer[start + 1] & 0xff;
// int typeL = frameBuffer[start + 2] & 0xff;
int tmp = getUTF8DescriptorEndIndex(frameBuffer, frameSize, start);
// if (typeH == 0xef && typeM == 0xbb && typeL == 0xbf) {
descriptor = (tmp - start - 5) == 0 ? "" : new String(frameBuffer, start + 3, tmp - start - 2, StandardCharsets.UTF_8);
// }
final var contentStart = tmp + 1;
// typeH = frameBuffer[contentStart] & 0xff;
// typeM = frameBuffer[contentStart + 1] & 0xff;
// typeL = frameBuffer[contentStart + 2] & 0xff;
// if (typeH == 0xef && typeM == 0xbb && typeL == 0xbf) {
frameContent = "descriptor:" + descriptor + "\nlyrics:\n" + new String(frameBuffer, contentStart + 3, frameSize - contentStart - 3, StandardCharsets.UTF_8);
// }
}
}
return frameContent;
}
private static int getUTF8DescriptorEndIndex(byte[] frameBuffer, int frameSize, int start) {
int tmp = 0;
for (int i = start + 5; i < frameSize; i = i + 3) {
if (frameBuffer[i] == 0 && frameBuffer[i - 1] == 0 && frameBuffer[i - 2] == 0) {
tmp = i;
break;
}
}
return tmp;
}
private static int getISODescriptorEndIndex(byte[] frameBuffer, int frameSize, int start) {
int tmp = 0;
for (int i = start; i < frameSize; i++) {
if (frameBuffer[i] == 0) {
tmp = i;
}
}
return tmp;
}
private static String getContentByEncoder(byte[] frameBuffer, int start, int frameSize, int x, int x1, String descriptor, Charset utf16le, String frameContent) {
// int typeH = frameBuffer[start] & 0xff;
// int typeL = frameBuffer[start + 1] & 0xff;
int tmp = 0;
tmp = getUTF16DescriptorEndIndex(start, frameSize, frameBuffer, tmp);
// if (typeH == x && typeL == x1) {
descriptor = (tmp - start - 3) == 0 ? "" : new String(frameBuffer, start + 2, tmp - start - 1, utf16le);
// }
final var contentStart = tmp + 1;
// typeH = frameBuffer[contentStart] & 0xff;
// typeL = frameBuffer[contentStart + 1] & 0xff;
// if (typeH == x && typeL == x1) {
frameContent = "descriptor:" + descriptor + "\nlyrics:\n" + new String(frameBuffer, contentStart + 2, frameSize - contentStart - 2, utf16le);
// }
return frameContent;
}
private static int getUTF16DescriptorEndIndex(int start, int frameSize, byte[] frameBuffer, int tmp) {
for (int i = start + 3; i < frameSize; i = i + 2) {
if (frameBuffer[i] == 0 && frameBuffer[i - 1] == 0) {
tmp = i;
break;
}
}
return tmp;
}
private static String getTStartContent(byte[] frameBuffer, int frameSize) {
return getStringContent(frameBuffer, frameSize, 1);
}
/**
* 紧跟flag后面的第1byte决定字符编码
* 0-> ISO-8859-1
* 1-> UTF-16 如果是这个 后面还需要两个字节标识编码 FF FE -> UTF-16LE FE FF -> UTF-16BE
* 2-> UTF-16BE 如果是这个 后面还需要两个字节标识编码 FE FF -> UTF-16BE
* 3-> UTF-8 如果是这个,后面还需要跟3个字节标识编码 EF BB BF
*
* @param frameBuffer frameBuffer
* @param frameSize frameSize
* @return frame Content
*/
private static String getStringContent(byte[] frameBuffer, int frameSize, int start) {
var frameContent = "";
switch (frameBuffer[0]) {
case 0 -> frameContent = new String(frameBuffer, 0, frameSize, StandardCharsets.ISO_8859_1);
case 1 -> {
int typeH = frameBuffer[start] & 0xff;
int typeL = frameBuffer[start + 1] & 0xff;
if (typeH == 0xff && typeL == 0xfe) {
frameContent = new String(frameBuffer, 3, frameSize - 3, StandardCharsets.UTF_16LE);
}
}
case 2 -> {
int typeH = frameBuffer[start] & 0xff;
int typeL = frameBuffer[start + 1] & 0xff;
if (typeH == 0xfe && typeL == 0xff) {
frameContent = new String(frameBuffer, 3, frameSize - 3, StandardCharsets.UTF_16BE);
}
}
default -> {
int typeH = frameBuffer[start] & 0xff;
int typeM = frameBuffer[start + 1] & 0xff;
int typeL = frameBuffer[start + 2] & 0xff;
if (typeH == 0xef && typeM == 0xbb && typeL == 0xbf) {
frameContent = new String(frameBuffer, 4, frameSize - 4, StandardCharsets.UTF_8);
}
}
}
return frameContent;
}
/**
* $00 Other
* $01 32x32 pixels 'file icon' (PNG only)
* $02 Other file icon
* $03 Cover (front)
* $04 Cover (back)
* $05 Leaflet page
* $06 Media (e.g. lable side of CD)
* $07 Lead artist/lead performer/soloist
* $08 Artist/performer
* $09 Conductor
* $0A Band/Orchestra
* $0B Composer
* $0C Lyricist/text writer
* $0D Recording Location
* $0E During recording
* $0F During performance
* $10 Movie/video screen capture
* $11 A bright coloured fish
* $12 Illustration
* $13 Band/artist logotype
* $14 Publisher/Studio logotype
*
* @param index index
* @return Picture type
*/
private static String getPictureType(int index) {
final var list = List.of(
"Other",
"32x32 pixels 'file icon' (PNG only)",
"Other file icon",
"Cover(front)",
"Cover(back)",
"Leaflet page",
"Media(e.g.lable side of CD)",
"Lead artist / lead performer / soloist",
"Artist / performer",
"Conductor",
"Band / Orchestra",
"Composer",
"Lyricist / text writer",
"Recording Location",
"During recording",
"During performance",
"Movie / video screen capture",
"A bright coloured fish",
"Illustration",
"Band / artist logotype",
"Publisher / Studio logotype"
);
return list.get(index);
}
}
参考资料
