Difference between revisions of "STRG (Metroid Prime 3)"

From Retro Modding Wiki
(Created page with "''See STRG (File Format) for the other revisions of this format.'' The '''STRG format''' in Metroid Prime 3 is another update to the STRG format, used both in Prime 3 as...")
 
Line 1: Line 1:
 
''See [[STRG (File Format)]] for the other revisions of this format.''
 
''See [[STRG (File Format)]] for the other revisions of this format.''
  
The '''STRG format''' in Metroid Prime 3 is another update to the STRG format, used both in Prime 3 as well as Donkey Kong Country Returns. The most significant change from the Echoes STRG format is that the string encoding was changed from UTF-16 to UTF-8. There was also a more robust system for string offsets implemented, allowing for the same string of text to be reused for multiple languages if the string is identical in both.
+
The '''STRG format''' in Metroid Prime 3 is another update to the STRG format, used both in Prime 3 as well as Donkey Kong Country Returns; the only difference between the two is that DKCR supports more languages. The most significant change from the Echoes STRG format is that the string encoding was changed from UTF-16 to UTF-8. There was also a more robust system for string offsets implemented, allowing for the same string of text to be reused for multiple languages if the string is identical in both.
  
 
__TOC__
 
__TOC__
Line 7: Line 7:
 
== Format ==
 
== Format ==
  
The header should be familiar to you if you've worked with the Prime 1/2 STRG format in the past. It remains unchanged.
+
The initial header is identical to Prime 1/2. The differences start after that; the name table now precedes the language table, and the language tables and string tables are structured differently.
  
 
{| class="wikitable"
 
{| class="wikitable"
 
! Offset
 
! Offset
 
! Type
 
! Type
! Size
+
! Count
! Description
+
! Name
 +
! Notes
 
|-
 
|-
 
| 0x0
 
| 0x0
 
| u32
 
| u32
| 4
+
| 1
| '''Magic'''; always 0x87654321
+
| '''Magic'''
 +
| Always <code>0x87654321</code>.
 
|-
 
|-
 
| 0x4
 
| 0x4
 
| u32
 
| u32
| 4
+
| 1
| '''Version'''; see [[STRG (File Format)|hub article]]
+
| '''Version'''
 +
| Always 3. See [[STRG (File Format)|hub article]] for a list of possible version numbers.
 
|-
 
|-
 
| 0x8
 
| 0x8
 
| u32
 
| u32
| 4
+
| 1
| '''Language count'''
+
| '''Language Count'''
 +
| Number of languages that this table has strings for.
 
|-
 
|-
 
| 0xC
 
| 0xC
 
| u32
 
| u32
| 4
+
| 1
| '''String count'''
+
| '''String Count'''
 +
| Number of strings contained in the file per language.
 
|-
 
|-
 
| 0x10
 
| 0x10
| colspan=3 {{unknown|End of header}}
+
| [[#Name Table|Name Table]]
 +
| 1
 +
| '''Name Table'''
 +
| Associates each string in the file with a name.
 +
|-
 +
| {{none}}
 +
| char
 +
| 4 &times; ''Language Count''
 +
| '''Language ID Array'''
 +
| Array of fourCCs that defines which languages have strings included in the file. See below for a list of possible language codes.
 +
|-
 +
| {{none}}
 +
| [[#Language|Language]]
 +
| ''Language Count''
 +
| '''Language Table'''
 +
| Table that defines the languages that are present in the file. Each element in the array corresponds to the language in the ''Language ID Array'' at the same index.
 +
|-
 +
| {{none}}
 +
| [[#String|String]]
 +
| Varies
 +
| '''String Array'''
 +
| Contains the actual string data. The reason the count varies is because if a string is identical between multiple languages then the same string data will be used for all of them, so there's no value that can tell you the real string count directly.
 +
|-
 +
| colspan=5 {{unknown|End of file}}
 
|}
 
|}
  
=== String Names ===
+
Possible language codes:
 
+
The next part of the file is a table that allows names to be assigned to strings. It's identical to the name table structure from Echoes.
+
  
 
{| class="wikitable"
 
{| class="wikitable"
! Offset
+
! ID
! Size
+
! Language
! Description
+
! MP3
 +
! DKCR
 
|-
 
|-
| 0x0
+
| <code>ENGL</code>
| 4
+
| English
| '''Name count'''
+
| {{check}}
 +
| {{check}}
 
|-
 
|-
| 0x4
+
| <code>GERM</code>
| 4
+
| German
| '''Name table size'''
+
| {{check}}
 +
| {{check}}
 
|-
 
|-
| 0x8
+
| <code>FREN</code>
| colspan=2 {{unknown|Name entries begin}}
+
| French
 +
| {{check}}
 +
| {{check}}
 +
|-
 +
| <code>SPAN</code>
 +
| Spanish
 +
| {{check}}
 +
| {{check}}
 +
|-
 +
| <code>ITAL</code>
 +
| Italian
 +
| {{check}}
 +
| {{check}}
 +
|-
 +
| <code>DUTC</code>
 +
| Dutch
 +
| {{check}}
 +
| {{check}}
 +
|-
 +
| <code>JAPN</code>
 +
| Japanese
 +
| {{check}}
 +
| {{check}}
 +
|-
 +
| <code>SCHN</code>
 +
| {{unknown}}
 +
| {{nocheck}}
 +
| {{check}}
 +
|-
 +
| <code>TCHN</code>
 +
| {{unknown}}
 +
| {{nocheck}}
 +
| {{check}}
 +
|-
 +
| <code>UKEN</code>
 +
| U.K. English
 +
| {{nocheck}}
 +
| {{check}}
 +
|-
 +
| <code>KORE</code>
 +
| Korean
 +
| {{nocheck}}
 +
| {{check}}
 +
|-
 +
| <code>NAFR</code>
 +
| North American French
 +
| {{nocheck}}
 +
| {{check}}
 +
|-
 +
| <code>NASP</code>
 +
| North American Spanish
 +
| {{nocheck}}
 +
| {{check}}
 
|}
 
|}
  
Each entry is structured as follows:
+
Notes:
 +
* In DKCR, Japanese appears after English instead of after Italian.
 +
* The languages <code>DUTC</code>, <code>SCHN</code>, and <code>TCHN</code> are unused and don't actually appear in any STRG file. However, their fourCCs can be found in the dol alongside the other language codes, so presumably the game supports them.
 +
 
 +
=== Name Table ===
 +
 
 +
This part of the file is a table that allows names to be assigned to strings. It's identical to the structure from Echoes, so check [[STRG (Metroid Prime)#Name Table|the Echoes documentation]] for details.
 +
 
 +
=== Language ===
 +
 
 +
This is a small structure that defines where the strings for a particular language are located. It appears once per language.
  
 
{| class="wikitable"
 
{| class="wikitable"
 
! Offset
 
! Offset
! Size
+
! Type
! Description
+
! Count
 +
! Name
 +
! Notes
 
|-
 
|-
 
| 0x0
 
| 0x0
| 4
+
| u32
| '''Name offset''' (relative to after the name table size value)
+
| 1
 +
| '''Strings Size'''
 +
| This is the combined size of each string this language uses.
 
|-
 
|-
 
| 0x4
 
| 0x4
| 4
+
| u32
| '''String index''' - this is the string number that the name is associated with
+
| ''String Count''
 +
| '''String offsets'''
 +
| Relative to the start of the first string.
 
|-
 
|-
| 0x8
+
| colspan=5 {{unknown|End of language definition}}
| colspan=2 {{unknown|End of entry}}
+
 
|}
 
|}
  
After every name entry comes all the names in the form of a large UTF-8 string array. The names are zero-terminated, and they're sorted in alphabetical order; the sorting is case-sensitive, so 'Z' will appear before 'a'.
+
=== String ===
 
+
=== Languages ===
+
 
+
Next is a segment that defines which languages are included by the file and where their strings are. This section of the file starts with one fourCC for each supported language. Then this small structure repeats per language:
+
  
 
{| class="wikitable"
 
{| class="wikitable"
 
! Offset
 
! Offset
 
! Type
 
! Type
! Size
+
! Name
! Description
+
! Notes
 
|-
 
|-
 
| 0x0
 
| 0x0
 
| u32
 
| u32
| 4
+
| '''String Size'''
| '''Language size'''; this is the combined size of each string this language uses
+
| Size of the string data in bytes.
 
|-
 
|-
 
| 0x4
 
| 0x4
| u32[]
+
| string
| 4 &times; string count
+
| '''String'''
| '''String offsets'''; relative to the start of the first string
+
| Zero-terminated string encoded with UTF-8 Unicode.
 +
|-
 +
| colspan=4 {{unknown|End of string}}
 
|}
 
|}
 
=== Strings ===
 
 
Finally, the actual string data. Each string is composed of a 32-bit size followed by UTF-8 Unicode string data.
 
  
 
[[Category:File Formats]]
 
[[Category:File Formats]]
 
[[Category:Metroid Prime 3: Corruption]]
 
[[Category:Metroid Prime 3: Corruption]]
 
[[Category:Donkey Kong Country Returns]]
 
[[Category:Donkey Kong Country Returns]]

Revision as of 01:41, 27 May 2016

See STRG (File Format) for the other revisions of this format.

The STRG format in Metroid Prime 3 is another update to the STRG format, used in both Prime 3 and Donkey Kong Country Returns; the only difference between the two is that DKCR supports more languages. The most significant change from the Echoes STRG format is that the string encoding was changed from UTF-16 to UTF-8. A more robust string offset system was also implemented, which allows the same string data to be reused across multiple languages when the text is identical in each of them.

Format

The initial header is identical to Prime 1/2. The differences start after that; the name table now precedes the language table, and the language tables and string tables are structured differently.

{| class="wikitable"
! Offset !! Type !! Count !! Name !! Notes
|-
| 0x0 || u32 || 1 || '''Magic''' || Always <code>0x87654321</code>.
|-
| 0x4 || u32 || 1 || '''Version''' || Always 3. See [[STRG (File Format)|hub article]] for a list of possible version numbers.
|-
| 0x8 || u32 || 1 || '''Language Count''' || Number of languages that this table has strings for.
|-
| 0xC || u32 || 1 || '''String Count''' || Number of strings contained in the file per language.
|-
| 0x10 || [[#Name Table|Name Table]] || 1 || '''Name Table''' || Associates each string in the file with a name.
|-
| {{none}} || char || 4 &times; ''Language Count'' || '''Language ID Array''' || Array of fourCCs that defines which languages have strings included in the file. See below for a list of possible language codes.
|-
| {{none}} || [[#Language|Language]] || ''Language Count'' || '''Language Table''' || Table that defines the languages that are present in the file. Each element in the array corresponds to the language in the ''Language ID Array'' at the same index.
|-
| {{none}} || [[#String|String]] || Varies || '''String Array''' || Contains the actual string data. The count varies because a string that is identical in multiple languages is stored only once and shared between them, so no value in the file gives the real string count directly.
|-
| colspan=5 {{unknown|End of file}}
|}
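
As a rough illustration of the fixed 16-byte header described above, here is a minimal Python sketch. The names <code>StrgHeader</code> and <code>read_strg_header</code> are made up for this example, and big-endian byte order is an assumption based on the Wii's PowerPC CPU; the page itself does not state the byte order.

```python
import struct
from typing import NamedTuple

STRG_MAGIC = 0x87654321  # magic value from the header table


class StrgHeader(NamedTuple):
    magic: int
    version: int
    language_count: int
    string_count: int


def read_strg_header(data: bytes) -> StrgHeader:
    """Parse the four u32 header fields at offsets 0x0 to 0xC.

    Assumption: values are big-endian (Wii titles run on PowerPC).
    """
    if len(data) < 16:
        raise ValueError("data is too short to hold a STRG header")
    header = StrgHeader(*struct.unpack_from(">4I", data, 0))
    if header.magic != STRG_MAGIC:
        raise ValueError(f"unexpected magic {header.magic:#010x}")
    if header.version != 3:
        raise ValueError(f"unexpected version {header.version}")
    return header
```

The name table starts immediately after these fields, at offset 0x10.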

Possible language codes:

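{| class="wikitable"
! ID !! Language !! MP3 !! DKCR
|-
| <code>ENGL</code> || English || {{check}} || {{check}}
|-
| <code>GERM</code> || German || {{check}} || {{check}}
|-
| <code>FREN</code> || French || {{check}} || {{check}}
|-
| <code>SPAN</code> || Spanish || {{check}} || {{check}}
|-
| <code>ITAL</code> || Italian || {{check}} || {{check}}
|-
| <code>DUTC</code> || Dutch || {{check}} || {{check}}
|-
| <code>JAPN</code> || Japanese || {{check}} || {{check}}
|-
| <code>SCHN</code> || {{unknown}} || {{nocheck}} || {{check}}
|-
| <code>TCHN</code> || {{unknown}} || {{nocheck}} || {{check}}
|-
| <code>UKEN</code> || U.K. English || {{nocheck}} || {{check}}
|-
| <code>KORE</code> || Korean || {{nocheck}} || {{check}}
|-
| <code>NAFR</code> || North American French || {{nocheck}} || {{check}}
|-
| <code>NASP</code> || North American Spanish || {{nocheck}} || {{check}}
|}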

Notes:

  • In DKCR, Japanese appears after English instead of after Italian.
  • The languages DUTC, SCHN, and TCHN are unused and don't actually appear in any STRG file. However, their fourCCs can be found in the dol alongside the other language codes, so presumably the game supports them.
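
Reading the Language ID Array amounts to slicing the data into 4-byte ASCII codes. The sketch below is only an illustration: <code>read_language_ids</code> and <code>LANGUAGE_NAMES</code> are hypothetical names, and the caller is assumed to already know where the array starts (it follows the name table).

```python
from typing import Dict, List, Optional

# Codes from the language table; None marks codes whose language
# this page lists as unknown.
LANGUAGE_NAMES: Dict[str, Optional[str]] = {
    "ENGL": "English",
    "GERM": "German",
    "FREN": "French",
    "SPAN": "Spanish",
    "ITAL": "Italian",
    "DUTC": "Dutch",
    "JAPN": "Japanese",
    "SCHN": None,
    "TCHN": None,
    "UKEN": "U.K. English",
    "KORE": "Korean",
    "NAFR": "North American French",
    "NASP": "North American Spanish",
}


def read_language_ids(data: bytes, offset: int, language_count: int) -> List[str]:
    """Return the fourCCs in file order, one per language."""
    end = offset + 4 * language_count
    if end > len(data):
        raise ValueError("language ID array runs past the end of the data")
    return [data[pos:pos + 4].decode("ascii") for pos in range(offset, end, 4)]
```

The order of the returned list matters, because each entry of the Language Table corresponds to the fourCC at the same index.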

Name Table

This part of the file is a table that allows names to be assigned to strings. It's identical to the structure from Echoes, so check the Echoes documentation for details.
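
For convenience, here is a hedged Python sketch of a name table reader. It follows the field layout given in the earlier revision of this page (a name count, a name table size, then entries made of a name offset and a string index, followed by zero-terminated UTF-8 names). Several points are assumptions rather than documented facts: big-endian byte order, that the name table size counts the bytes following the size field, and the function name <code>read_name_table</code> itself.

```python
import struct
from typing import Dict, Tuple


def read_name_table(data: bytes, offset: int = 0x10) -> Tuple[Dict[str, int], int]:
    """Return ({name: string index}, offset just past the name table).

    Assumptions: big-endian values, and the table size field counts
    the bytes that follow it.
    """
    name_count, table_size = struct.unpack_from(">2I", data, offset)
    base = offset + 8  # name offsets are relative to the byte after the size field
    names: Dict[str, int] = {}
    for i in range(name_count):
        name_offset, string_index = struct.unpack_from(">2I", data, base + 8 * i)
        start = base + name_offset
        end = data.index(b"\x00", start)  # names are zero-terminated
        names[data[start:end].decode("utf-8")] = string_index
    return names, base + table_size
```

Under these assumptions, the second return value is where the Language ID Array begins.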

Language

This is a small structure that defines where the strings for a particular language are located. It appears once per language.

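{| class="wikitable"
! Offset !! Type !! Count !! Name !! Notes
|-
| 0x0 || u32 || 1 || '''Strings Size''' || This is the combined size of each string this language uses.
|-
| 0x4 || u32 || ''String Count'' || '''String Offsets''' || Relative to the start of the first string.
|-
| colspan=5 {{unknown|End of language definition}}
|}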
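
Since every language definition holds one size value plus ''String Count'' offsets, each one occupies 4 + 4 &times; ''String Count'' bytes. A minimal sketch of reading the whole Language Table follows; <code>read_language_table</code> is a hypothetical name and big-endian byte order is an assumption.

```python
import struct
from typing import List, Tuple


def read_language_table(
    data: bytes, offset: int, language_count: int, string_count: int
) -> Tuple[List[Tuple[int, List[int]]], int]:
    """Return ([(strings_size, string_offsets), ...], offset after the table).

    Assumption: all values are big-endian u32s.
    """
    fmt = f">{1 + string_count}I"  # one size value, then one offset per string
    entry_size = struct.calcsize(fmt)
    languages = []
    for _ in range(language_count):
        if offset + entry_size > len(data):
            raise ValueError("language table runs past the end of the data")
        strings_size, *offsets = struct.unpack_from(fmt, data, offset)
        languages.append((strings_size, offsets))
        offset += entry_size
    return languages, offset
```

The returned offset is where the String Array begins, which is also the base that the string offsets are relative to.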

String

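{| class="wikitable"
! Offset !! Type !! Name !! Notes
|-
| 0x0 || u32 || '''String Size''' || Size of the string data in bytes.
|-
| 0x4 || string || '''String''' || Zero-terminated string encoded with UTF-8 Unicode.
|-
| colspan=4 {{unknown|End of string}}
|}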
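
The following Python sketch shows one way a string could be resolved from a language's offset, and how shared strings affect the real string count. It rests on assumptions this page does not confirm: that values are big-endian, and that each string offset points at the entry's size field rather than at the text itself. The names <code>read_string</code> and <code>unique_string_count</code> are hypothetical.

```python
import struct
from typing import Iterable, List


def read_string(data: bytes, strings_start: int, rel_offset: int) -> str:
    """Decode one string entry (u32 size followed by UTF-8 data).

    Assumptions: big-endian size, and rel_offset points at the size field.
    """
    pos = strings_start + rel_offset
    (size,) = struct.unpack_from(">I", data, pos)
    raw = data[pos + 4 : pos + 4 + size]
    # Cut at the terminator so the result is the same whether or not
    # the size value includes the trailing zero byte.
    return raw.split(b"\x00", 1)[0].decode("utf-8")


def unique_string_count(offset_lists: Iterable[List[int]]) -> int:
    """Count distinct string entries across every language's offset list."""
    unique = set()
    for offsets in offset_lists:
        unique.update(offsets)
    return len(unique)
```

When two languages use identical text, their offset lists contain the same value for that string, which is why the number of distinct offsets can be lower than ''Language Count'' &times; ''String Count''.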