(Outdated wiki content from 2007.)

Personas Described (Frank, Phil, Ratna, Gina), and why they need Solid

So who is 'the user' for Solid? Several personas might use Solid, but we have identified three primary ones: Frank, Phil, and Ratna. Frank is a field linguist who has been creating SFM files for many years, for the Banau language* in Sulawesi, Indonesia. He is working closely with a team of native speakers to produce a vernacular dictionary, and Ratna is the primary typist for that team. Once Frank has set Solid up for her, Ratna needs to run it regularly (every few days) to validate the edits and new data that the team has been putting into the SFM file. Phil works in mainland Southeast Asia, supporting other people who are similar to Frank. In addition to the problems Frank faces (below), all of Phil's clients use complex non-Roman scripts (such as Thai), as well as Roman (English and IPA), so Phil needs full support for multiple writing systems per language. Fortunately, Phil has other tools for converting data into Unicode, so Solid does not need to help him do this.

*Fictitious

In addition to Frank, Ratna and Phil, there are secondary personas who may also need to use Solid, but Solid is not primarily designed for them. For example, Gina is an IT consultant who will be preparing schemas and transformations specific to the SFM file format used in mainland Southeast Asia. Gina has counterparts in Africa, the Philippines, etc.

More about Frank

Frank has collected and interlinearized (in Toolbox) about 40 natural texts, and has collected an MDF dictionary of about 2,000 records. He intends to eventually publish a dictionary of Banau, organized alphabetically with definitions in Indonesian and English, and with alphabetized reversal indexes in the back for Indonesian and English. His files use only ASCII characters (which are encoded identically in UTF-8). His dictionary records contain a wide sampling of MDF tags, but most of them are fairly simple; over 80% consist of the following tags (in nearly any order, except that \lx comes first): lx, ps, ge, gn, de, dn, rf, xv, xn, dt. The following examples are representative:

\lx naketi
\gn kecil
\ge small
\de Small in size.
\ps adj
\dt 27/Feb/2004

\lx ngingkandi
\ps
\gn makan
\ge eat
\de To eat, or to eat something.
\dn Makan, atau memakan sesuatu.
\dt 21/Mar/2004
\xv Sina ngingkandi mandut.
\xn Dia memakan ayam.
\rf notebook01p23
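
The record layout above can be read mechanically. As a rough sketch only (this is not Solid's own parser, and the names parse_records and empty_fields are invented for illustration), the following Python splits SFM text into records at each \lx line and lists any fields that carry no value, such as the bare \ps in the second example. It assumes one field per line and ignores continuation lines.

```python
import re

# An SFM field line: backslash, marker name, then an optional value.
SFM_LINE = re.compile(r"^\\(\S+)[ \t]*(.*)$")


def parse_records(text, record_marker="lx"):
    """Split SFM text into records; each record is a list of (marker, value)."""
    records = []
    for line in text.splitlines():
        match = SFM_LINE.match(line.strip())
        if not match:
            continue  # assumption: blank and continuation lines are skipped
        marker, value = match.group(1), match.group(2).strip()
        if marker == record_marker or not records:
            records.append([])  # a new record starts at every \lx
        records[-1].append((marker, value))
    return records


def empty_fields(record):
    """Return the markers in a record that have no value."""
    return [marker for marker, value in record if not value]


SAMPLE = r"""
\lx naketi
\gn kecil
\ge small
\de Small in size.
\ps adj
\dt 27/Feb/2004

\lx ngingkandi
\ps
\gn makan
\ge eat
\de To eat, or to eat something.
\dn Makan, atau memakan sesuatu.
\dt 21/Mar/2004
\xv Sina ngingkandi mandut.
\xn Dia memakan ayam.
\rf notebook01p23
"""

for record in parse_records(SAMPLE):
    print(record[0][1], empty_fields(record))
# prints:
# naketi []
# ngingkandi ['ps']
```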

Very recently, Frank organized a DDP (Dictionary Development Process) workshop and collected 14,000 new 'lexemes' consisting of the following fields: lx, gn, nt, is, sde, sdn, sdv. Some of these fields (gn, nt, sdv) may be empty at times, but they all exist and are always in the same order, since they were entered into a columnar grid (using an OpenOffice.org DDP 'template', which Frank manually converted into SFM). The only inconsistency is the occasional typist mistake (entering gn into lx and lx into gn), which Frank's dictionary team can manually correct over time. With the help of a consultant with CC tables, Frank has merged those 14,000 records with his original 2,000, then merged homographs into single records with multiple senses, multiple semantic domains per sense, or both. Instead of 16,000 records, he now has about 9,000.
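
Because the unmerged DDP records always carry the same seven fields in the same order, that property is easy to check. The sketch below is illustrative only: order_problems is a made-up helper, not part of Solid, and it takes the list of markers found in one record. It cannot detect the lx/gn typist swap, since that mistake leaves the markers themselves in the right order.

```python
# Field order of the DDP records as described in the text.
DDP_ORDER = ["lx", "gn", "nt", "is", "sde", "sdn", "sdv"]


def order_problems(markers, expected=DDP_ORDER):
    """Describe how a record's marker sequence differs from the expected one."""
    problems = []
    for marker in expected:
        if marker not in markers:
            problems.append(f"missing \\{marker}")
    for marker in markers:
        if marker not in expected:
            problems.append(f"unexpected \\{marker}")
    # Same set of markers, but not the same sequence.
    if not problems and list(markers) != list(expected):
        problems.append("fields out of order or repeated")
    return problems


print(order_problems(["lx", "gn", "nt", "is", "sde", "sdn", "sdv"]))
# prints: []
print(order_problems(["lx", "nt", "gn", "is", "sde", "sdn", "sdv"]))
# prints: ['fields out of order or repeated']
```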

Frank has gradually come to understand that MDF has a tree-like hierarchy, and that he has been violating it often by entering his data in semi-random, semi-flat order. He has also learned that in order to successfully publish his dictionary (whether to the web or on paper), he will need to make his data more consistent. Frank would also like to test out FLEX, but he is not ready to commit to switching over. He is putting off importing into FLEX because he's heard that it is difficult to do if one's data is inconsistent. And even if it were easy, the fact that round-tripping SFM data between Toolbox and FLEX is not really supported causes him to hesitate.
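
To make the idea of a tree-like hierarchy concrete, here is a minimal sketch of a parent-rule check. The PARENTS table is a simplified assumption invented for this illustration; it is not the real MDF definition, and hierarchy_violations is a hypothetical helper, not Solid's algorithm. Each marker names the markers it may sit under, and a field is flagged when no open ancestor in the record qualifies.

```python
# Hypothetical, simplified parent rules (NOT the actual MDF hierarchy).
PARENTS = {
    "ps": {"lx"},
    "ge": {"ps"}, "gn": {"ps"}, "de": {"ps"}, "dn": {"ps"},
    "rf": {"ps"},
    "xv": {"rf"},
    "xn": {"xv"},
    "dt": {"lx"},
}


def hierarchy_violations(markers, parents=PARENTS, root="lx"):
    """Return the markers that appear where none of their allowed parents is open."""
    violations = []
    path = []  # currently open ancestors, root first
    for marker in markers:
        if marker == root:
            path = [marker]
            continue
        if marker not in parents:
            continue  # markers without a rule are ignored in this sketch
        # Close open fields until an allowed parent is on top.
        candidate = list(path)
        while candidate and candidate[-1] not in parents[marker]:
            candidate.pop()
        if candidate:
            path = candidate + [marker]
        else:
            violations.append(marker)  # no suitable ancestor; path unchanged
    return violations


# Marker order of the first sample record: glosses come before \ps.
print(hierarchy_violations(["lx", "gn", "ge", "de", "ps", "dt"]))
# prints: ['gn', 'ge', 'de']
```

Under these assumed rules, the first sample record is flagged because its glosses and definition precede \ps, which is the kind of semi-flat ordering described above.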

Frank would like to periodically submit his dictionary-in-process to an archive, so that if something were to happen to him, others could later come along and make use of his work. So far, his archiving process has been to maintain a simple readme.txt file describing his data in prose, to put it in a zip file with all of his Toolbox folders, and to send it to the archive's data manager. The data manager has told Frank that he should use a more repurposable format such as XML, so Frank has started to export from Toolbox to XML and to place the XML file in the zip file too. But he doesn't know how to create a DTD or XML Schema, and even if he did, it could only be used to validate the XML file, not his 'real' data file. So, even this approach to archival is incomplete.

Why use Solid?

Given this state of affairs, we can make a clear case for creating a tool like Solid:
  • Import to Flex: Frank wants to import his Toolbox data into Flex, without a lot of hassle and uncertainty over what Flex will do to his data.
  • Archival: Frank needs to archive his data in a fully described, repurposable file format. Ideally, this format should also be the native format of his dictionary-editing tool (currently Toolbox).
  • Reduce confusion: Ratna doesn't fully understand the hierarchical nature of the MDF tags in the files she is editing, but given consistent SFM data, she can simply use existing data as her model for inputting new data.
  • Reduce risk: Phil (and occasionally Frank) uses other tools such as CC to apply bulk edits to SFM files, but if the data is inconsistent, he can accidentally damage it.
  • Import/Export Round-tripping: Frank would like to round-trip his data between Toolbox and Flex, Toolbox and WeSay, etc. Solid could enable him to do this by generating unique IDs (GUIDs) for his Toolbox records, and then mapping his SFM structures to the LIFT schema. He could use Flex to do his data merging, and then export as MDF to bring the data back into Toolbox. (Ideally, he would actually export as LIFT, then use a Solid plug-in to convert from LIFT into Frank's own custom SFM schema.)
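
The GUID step in the last point could look something like the following sketch. It is an assumption-laden illustration rather than Solid's implementation: records are represented as lists of (marker, value) pairs, the \guid marker name is hypothetical, and assign_ids is an invented helper. Records that already carry an ID are left alone, so the function can be rerun safely after each round trip.

```python
import uuid


def assign_ids(records, id_marker="guid"):
    """Append a random GUID field to every record that does not have one yet."""
    for record in records:
        has_id = any(marker == id_marker for marker, _ in record)
        if not has_id:
            # uuid4() generates a random 128-bit identifier.
            record.append((id_marker, str(uuid.uuid4())))
    return records


records = [
    [("lx", "naketi"), ("ge", "small")],
    [("lx", "ngingkandi"), ("ge", "eat")],
]
assign_ids(records)
for record in records:
    print(record[0][1], record[-1])
```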