Now that I've got a stable (and super fast!) internet connection, I want to get the MTG oracle text updates done, so I've been devising a strategy for both short-term and long-term OCTGN set maintenance.
I mentioned earlier that the biggest obstacle is the GUIDs. Generating new ones is fine if we want to export newly-released sets, but when it comes to updating the rules text on pre-existing sets, we have to make sure the GUIDs stay the same. If we generate new GUIDs for a set's cards every time we release a patch, people will essentially have to recreate their decks every time a patch is released (and it'll mess up a lot of the autoscripts as well).
I have two ideas that would fix this problem; unfortunately, both of them involve changing the OCTGN export settings in the extractor:
1. I publish a "GUID database" of every card from every set released on OCTGN to the internet. This would consist of a simple text webpage for each set, listing the names and corresponding GUIDs of its cards. The extractor would need to download these text files, and instead of generating a GUID for a card, it would look up the card's preexisting GUID in the list.
2. Instead of exporting to the OCTGN XML format, we add an option to export as an OCTGN Excel spreadsheet, arranging the information into cell rows instead of the XML structure (but keeping all the special rules I mentioned, like &#xd;&#xa; for line breaks, &quot; for quotations, etc.). I can then quickly copy-paste the data into a spreadsheet on my computer that already lists card names and GUIDs, then simply write a =CONCATENATE function to compile each card into the XML format. Then I can copy the XML lines and replace the ones in the current set files.
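A rough Python equivalent of the =CONCATENATE step in option 2, assuming each row holds a card name, its existing GUID and its rules text. It applies the special text rules from option 2 (&#xd;&#xa; for line breaks, &quot; for quotation marks) plus the standard XML escapes for &, < and >. Character references are used for line breaks because XML parsers normalize literal newlines inside attribute values to spaces. The card/property element layout and the helper names escape_attr and card_xml are assumptions for illustration, not a confirmed copy of OCTGN's set schema.

```python
def escape_attr(text: str) -> str:
    """Escape text for a double-quoted XML attribute value.

    '&' is replaced first so the entities added afterwards are not escaped twice.
    """
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    # Normalise all line endings to "\n", then encode each break as CR+LF entities.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", "&#xd;&#xa;")


def card_xml(name: str, guid: str, rules: str) -> str:
    """Build one card's XML line from a spreadsheet-style row.

    The element and attribute names are assumptions for illustration only.
    """
    return (
        f'<card name="{escape_attr(name)}" id="{guid}">'
        f'<property name="Rules" value="{escape_attr(rules)}" />'
        "</card>"
    )


if __name__ == "__main__":
    # Placeholder row: real names/GUIDs would come from the existing spreadsheet.
    print(card_xml("Example Card", "00000000-0000-0000-0000-000000000000",
                   'First line\nSecond "quoted" line'))
```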
I would like to see #1 happen because I've already gotten several requests from other developers to publish an online GUID database (for things like deck converters), so it's a foundation that will already exist. However, exporting as an Excel spreadsheet would also work, and I'm sure several non-OCTGN users would like this feature. In fact, if we had an Excel export option, then all the OCTGN-related stuff wouldn't be necessary anymore (except for the special text rules mentioned above, but those could be amalgamated as "export options").
Also, for the OCTGN export stuff, it'd be really useful if we had the ability to export multiple sets at once; it gets a little tedious having to do them one at a time.
I'm OK with recoding the OCTGN export subroutine.
Could you please link me to the GUID database? (I prefer solution #1, which seems to be the cleaner one.) I'll recode it as soon as possible, and I'll add multiple-set exportation too, as requested.
Wait, I'm a bit confused by the last two posts. I had assumed the assigned GUIDs were the same across all users, i.e. the card "Lu Bu, Master-at-Arms" from the Prerelease Events set, printing #6, would have a GUID of XXXX, and anybody else using GathererExtractor would also see the same XXXX number. Is my understanding incorrect?! Or is the discussion above just about OCTGN export, which I don't use? I just use the XML export directly.
The reason I ask is that I'm in the process of upgrading my app for sorting collections, and it is very important that each unique version of a card has a unique 'id'. Right now I think that can be accomplished using the GathererExtractor GUID + Language, i.e. these two fields can form a primary key with no possible duplicates.
@Dresden:
I was talking about GUIDs specifically for OCTGN's use. I'm not aware of any other currently-available MTG applications that use GUIDs in a similar fashion, nor do I know of any GUID database of MTG cards available. You're welcome to use ours once I get the database published.
The issue is that the extractor currently has no way of knowing what the correct pre-determined GUIDs are for sets that already exist in OCTGN. So instead, it simply generates new ones. This works fine when we want to use the extractor to generate brand-new sets, but for updating existing sets we don't want to create new GUIDs.
@Chaudakh:
I don't have the GUID list ready yet, I wanted to check with you first to see if it was a good idea. In order for it to work, I have to make sure my list has the correct spellings for all the cards, and the easiest way is to extract gatherer data, so we're in a bit of a paradox here. I think we can bypass this if you add some sort of verification tool to the extraction process: when it tries to obtain a GUID, if the card doesn't exist in the GUID list (i.e. spelling or punctuation is wrong), it gives an alert message. This way I can update the GUID list as I go along, and eventually I'll be able to find all the mistakes.
While we're on the topic of the GUID list, do you have any preferences on the formatting of the files? I want to make sure they are formatted as efficiently as possible, taking into account things like server strain and bandwidth as well. Would it be easier to have one massive file, or break them up into individual sets? My current plan is to have individual sets, and format them as so:
NPH.txt
New Phyrexia|2566f33a-8472-4ee7-b37b-abbd24a1a5a5
Karn Liberated|001d9638-76f4-4827-85d1-adfb1e4ef95e
Apostle's Blessing|aee11b03-23c3-4512-be24-e55c36436ba9
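For reference, reading a list in this format takes only a few lines. A minimal Python sketch, assuming the first line is the set's own name and GUID (as with New Phyrexia above) and every later line is one card; parse_guid_list is a hypothetical helper name, and it rejects malformed GUIDs via Python's uuid module.

```python
import uuid


def parse_guid_list(text: str):
    """Parse a per-set GUID list in the Name|GUID format.

    Assumes the first non-empty line is the set itself and every later line
    is one card. Returns (set_name, set_guid, {card_name: card_guid}).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty GUID list")
    set_name, set_guid = lines[0].rsplit("|", 1)
    uuid.UUID(set_guid)  # raises ValueError if the GUID is malformed
    cards = {}
    for ln in lines[1:]:
        name, guid = ln.rsplit("|", 1)
        uuid.UUID(guid)
        cards[name] = guid
    return set_name, set_guid, cards


if __name__ == "__main__":
    sample = """New Phyrexia|2566f33a-8472-4ee7-b37b-abbd24a1a5a5
Karn Liberated|001d9638-76f4-4827-85d1-adfb1e4ef95e
Apostle's Blessing|aee11b03-23c3-4512-be24-e55c36436ba9
"""
    print(parse_guid_list(sample))
```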
I prefer individual sets as well; it suits the extractor much better because it gets data by set. I'm OK with coding alerts for misspelt card names, but I'd suggest something else, described below. Downloading a txt file like the one in your example should be pretty fast, as each list should be about the size of the text spoiler, i.e. about 100 KB.
I suggest adding a GUID table download right after the spoiler list is downloaded from Wizards' Gatherer; it should take about 2 seconds at most. I will add a GUID column (hidden or not) to store the corresponding GUID for each entry in the dataset (displayed in the datagridview). Instead of an alert, I can add a filter in the filter combobox to show only cards without GUIDs. This would be an easy way for you to detect misspellings in your database.
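A minimal sketch of the GUID column and the "cards without GUIDs" filter, assuming the downloaded GUID table has already been parsed into a name-to-GUID dictionary. CardRow, attach_guids and missing_guids are hypothetical names, not the extractor's actual code. A misspelt name simply ends up with no GUID, which is what makes the filter useful for catching typos.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CardRow:
    name: str
    guid: Optional[str] = None  # the proposed GUID column; None until matched


def attach_guids(rows: List[CardRow], guid_table: Dict[str, str]) -> List[CardRow]:
    """Fill the GUID column by exact name match against the downloaded table."""
    for row in rows:
        row.guid = guid_table.get(row.name)
    return rows


def missing_guids(rows: List[CardRow]) -> List[str]:
    """The 'cards without GUIDs' filter: names with no match in the table."""
    return [row.name for row in rows if row.guid is None]


if __name__ == "__main__":
    table = {"Karn Liberated": "guid-1", "Apostle's Blessing": "guid-2"}
    rows = attach_guids([CardRow("Karn Liberated"), CardRow("Apostles Blessing")], table)
    print(missing_guids(rows))  # the misspelt entry shows up here
```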
What about new sets? Should I keep the GUID generator, or will users have to wait for you to update your online GUID database, so as not to mess everything up with your own GUID database?
Actually, I was thinking of creating a "sets.txt" file that stores information about the sets themselves (full name, 3-letter code name, set GUID, packaging data), so that the extractor can figure out whether the set has land packs, whether there are mythics, etc.
So maybe, for each set, it could look in this file to see if the set already exists. If it does, it'll load the correct set's GUID database and start matching GUIDs. If it doesn't, it'll assume a brand-new set and generate new GUIDs. (Maybe it can generate the GUID database text file for that set as well, so I can quickly upload it afterwards.)
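The lookup-or-generate decision could look roughly like this. It assumes the 3-letter codes from sets.txt are already loaded into a Python set and each known set's GUID list into a name-to-GUID dictionary; uuid4 is a stand-in for whatever generator the extractor actually uses, and resolve_set_guids is a hypothetical name. For a new set it also returns a GUID list in the same Name|GUID format, ready to upload.

```python
import uuid
from typing import Dict, Iterable, Optional, Set, Tuple


def resolve_set_guids(
    set_code: str,
    card_names: Iterable[str],
    known_sets: Set[str],
    guid_lists: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, Optional[str]], Optional[str]]:
    """Return ({card_name: guid}, new_list_text).

    Known set: reuse its published GUIDs (None marks a card missing from the list).
    New set: generate fresh GUIDs and also return a Name|GUID list to upload.
    """
    if set_code in known_sets:
        table = guid_lists[set_code]
        return {name: table.get(name) for name in card_names}, None
    generated = {name: str(uuid.uuid4()) for name in card_names}
    list_text = "\n".join(f"{name}|{guid}" for name, guid in generated.items())
    return generated, list_text
```

Keying by name alone merges cards that appear more than once in a set (such as basic lands), so a real version would need a per-printing key.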
I've run into an issue with compiling the text files, though. Some cards exist in multiple versions within a set (like the basic lands, and things like the Urza lands), so we'll need some way to differentiate those versions. For that reason, I'm going to add the multiverseID to the text file as well, so it'll be "1|Karn Liberated|XXXXXXX". The issue is that our old Gatherer extractor had problems with MultiverseIDs for a lot of the old sets, so we don't have them for anything in the old card frame. I'm not sure yet what strategy I'll use to obtain all of them...
EDIT: I'm using the collector number instead of the multiverseID, since many promo cards don't have IDs.
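Keying each entry by collector number keeps multiple versions of the same card apart. A sketch assuming the number|name|GUID line format; collector numbers are kept as strings in case some carry letter suffixes, parse_numbered_list is a hypothetical helper, and the numbers and GUIDs in the usage example are placeholders.

```python
from typing import Dict, Tuple


def parse_numbered_list(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse number|name|GUID lines into {collector_number: (name, guid)}.

    Collector numbers stay strings so values like "51b" are kept intact.
    """
    entries: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        number, name, guid = line.split("|")
        if number in entries:
            raise ValueError(f"duplicate collector number {number!r}")
        entries[number] = (name, guid)
    return entries


if __name__ == "__main__":
    # Placeholder data: two versions of the same basic land stay separate.
    sample = "1|Karn Liberated|guid-a\n100|Forest|guid-b\n101|Forest|guid-c\n"
    for number, (name, guid) in parse_numbered_list(sample).items():
        print(number, name, guid)
```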
Chaudakh,
I've been following the last few posts between you and brine, but I think most of it is OCTGN-specific, so I'll just ask my question directly: will your GUIDs ever change, or are all the GUIDs you use right now stable across the foreseeable future releases? Or are we going to use brine's GUIDs in the future? It doesn't matter to me who provides the GUIDs as long as they are consistent and don't change.
Right now GathererExtractor provides a GUID based on card name, edition and printing (for example, Lu Bu, Master-at-Arms has two printings in P3K, with a different GUID for each). Too bad 'language' isn't an additional input into the GUID: Force of Will in English and Forza di Volonta (which doesn't exist in the database) share the same GUID. But this is something I can work with, no problem.
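Two ideas from this post, sketched in Python: a GUID derived deterministically from name, edition and printing (so every user gets the same value), and a (GUID, language) pair as the collection key. GathererExtractor's real derivation isn't described here; uuid5 with a made-up namespace is only a stand-in to show how adding language as an input would give each language its own GUID. printing_guid and collection_key are hypothetical names.

```python
import uuid
from typing import Optional, Tuple

# Made-up namespace for illustration; not GathererExtractor's actual scheme.
CARD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/mtg-cards")


def printing_guid(name: str, edition: str, printing: int,
                  language: Optional[str] = None) -> uuid.UUID:
    """Deterministic GUID: the same inputs always give the same value."""
    parts = [name, edition, str(printing)]
    if language is not None:
        parts.append(language)  # optional extra input, as wished for in the post
    return uuid.uuid5(CARD_NAMESPACE, "|".join(parts))


def collection_key(guid: uuid.UUID, language: str) -> Tuple[str, str]:
    """Composite primary key: (GUID, language)."""
    return (str(guid), language)
```

Without the language input, the composite key is what keeps two language versions of one printing apart.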
I just noticed that the line breaks are being extracted as +#xd;+#xa; when they should be &'s instead of +'s
scratch that, I realized this was my fault and got it to work properly.
Some additional things I thought of though:
1) Is there a way to extract the tokens from a set along with the cards? In our sets, we include the tokens in the XML file, and in the RELS the location looks like "/tokens/T1.jpg" (basically we number the tokens as T1, T2, T3, etc)
2) When you choose to save the XML/RELS sets, can you make the default filename the set's 3-letter code instead of the full name? Sets with spaces and special characters in their names don't work well with OCTGN.
3) The rules text on Level Up creatures is extracted a little weirdly... I was hoping it would look similar to Planeswalker text, formatted like so:
Level Up {3} (__reminder text__)
[Level 2-4]: First Strike (4 / 4)
[Level 5+]: Double Strike (5 / 5)
4) If possible, I'd like the text for the double-faced cards in Innistrad (and the flip cards from Kamigawa) to extract exactly like Split cards do.
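The Level Up layout requested in item 3, plus the T1, T2, T3 token naming from item 1, sketched in Python. The input tuples (lowest level, highest level or None for "and up", abilities, power, toughness) are an assumed intermediate form, not something Gatherer returns directly; format_level_up and token_paths are hypothetical helpers.

```python
from typing import List, Optional, Sequence, Tuple

# (lowest level, highest level or None for "and up", abilities, power, toughness)
LevelBand = Tuple[int, Optional[int], str, int, int]


def format_level_up(cost: str, reminder: str, bands: Sequence[LevelBand]) -> str:
    """Render Level Up text in the Planeswalker-like layout from item 3."""
    lines = [f"Level Up {cost} ({reminder})"]
    for low, high, abilities, power, toughness in bands:
        span = f"{low}-{high}" if high is not None else f"{low}+"
        lines.append(f"[Level {span}]: {abilities} ({power} / {toughness})")
    return "\n".join(lines)


def token_paths(count: int) -> List[str]:
    """RELS locations for a set's tokens, numbered T1, T2, T3, ..."""
    return [f"/tokens/T{i}.jpg" for i in range(1, count + 1)]


if __name__ == "__main__":
    print(format_level_up("{3}", "__reminder text__",
                          [(2, 4, "First Strike", 4, 4), (5, None, "Double Strike", 5, 5)]))
    print(token_paths(3))
```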
I was expecting problems with double-faced cards, as I didn't know how they would be implemented in Gatherer. Now that the implementation has been revealed, extracting the data will be much easier. I won't add an entry for the transformed card. Instead, I will add a column that collects the multiverseID of the transformed card, and separate all data in capa with a splitter mark. I was using // for flipped cards, so I'll find something else for transformed cards, so they won't be confused with flipped cards.
Now I need to fix the extractor. The new release will probably be on Monday, as I won't be available for coding this weekend.
You can keep the // for the OCTGN stuff. With our conventions, the Split cards (Invasion), Flip cards (Kamigawa), and double-faced cards (Innistrad) will all have the same text structure: the first half's text, then a line containing only //, then the second half's text.
I noticed both ISD lists from the Gatherer and magiccards.info contain the 2 faces of the double-faced cards in 2 different entries, with 2 different multiverseID.
I've made downloading the data and card scans easy, but I don't know yet whether you guys want a single entry for both sides of double-faced cards or not.
I find it much easier to add a column that stores the other face's multiverseID and keep the 2 entries for the 2 faces; it's much more general that way. I would prefer adding an option asking whether double-faced cards should be gathered or not.
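A sketch of the two-entry approach with an id_back column, assuming the two faces can be paired by collector numbers ending in a and b (magiccards.info uses numbers like 51b for back faces). Face and link_faces are hypothetical names, and the multiverseIDs in the test data are placeholders.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Face:
    number: str                    # collector number, assumed to look like "51a" / "51b"
    multiverse_id: int
    id_back: Optional[int] = None  # multiverseID of the other face, if any


def link_faces(faces: List[Face]) -> List[Face]:
    """Keep one entry per face and cross-link each a/b pair through id_back."""
    pairs: Dict[str, Dict[str, Face]] = {}
    for face in faces:
        match = re.fullmatch(r"(\d+)([ab])", face.number)
        if match:
            pairs.setdefault(match.group(1), {})[match.group(2)] = face
    for pair in pairs.values():
        if "a" in pair and "b" in pair:
            pair["a"].id_back = pair["b"].multiverse_id
            pair["b"].id_back = pair["a"].multiverse_id
    return faces
```

Single-faced cards keep id_back empty, so the same table serves every set.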
The Extractor update is almost ready. Does anyone have an idea about extracting the back-face colors? They seem to be missing on both Gatherer and magiccards.info.
New release of version 3.3. Double-faced cards are now supported: www.mediafire.com/?3akc6mtapnar2
If you find any bugs, feel free to report them here ^^
I uploaded the M12 GUID list and the master sets list if you wanted to test the GUID extraction stuff, they're going to be located here: http://octgn.gamersjudgement.com/OCTGN/
The individual set lists are going to be arranged in 3 columns: number | name | GUID. The filename will always be the 3-letter code for the set.
Now the sets.txt file is a little more complex. This is going to be a master list of all sets in OCTGN, and contains the data necessary to construct the top part of a set file (set GUIDs, names, packaging info). The 3-letter code in this file will match the name of the individual set lists.
Each set entry's got many properties, I'm going to list them in order:
1) 3-letter code for the set
2) The set's full name
3) The set's GUID
4) The set's recommended version number
5) GUID of the main booster pack
6) Frequency of opening a mythic rare
7) Frequency of opening a rare
8) Number of uncommons in the booster
9) Number of commons in the booster
10) Number of basic lands in the booster
11) GUID of the unlimited basic land pack (if the set has one)
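A sketch of reading sets.txt into records with the 11 properties in the order listed. The field separator isn't specified yet, so | is assumed (matching the individual set lists), as are float frequencies, integer card counts and an empty last field when there is no land pack. parse_sets_txt and SetEntry are hypothetical names, and the sample lines in the test are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SetEntry:
    code: str                      # 1) 3-letter code, also the set list's filename
    name: str                      # 2) full name
    set_guid: str                  # 3)
    version: str                   # 4) recommended version number
    booster_guid: str              # 5)
    mythic_freq: float             # 6)
    rare_freq: float               # 7)
    uncommons: int                 # 8)
    commons: int                   # 9)
    basic_lands: int               # 10)
    land_pack_guid: Optional[str]  # 11) None when the set has no land pack


def parse_sets_txt(text: str) -> Dict[str, SetEntry]:
    sets: Dict[str, SetEntry] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        f = [x.strip() for x in line.split("|")]
        if len(f) == 10:
            f.append("")  # trailing land-pack field omitted entirely
        if len(f) != 11:
            raise ValueError(f"expected 11 fields, got {len(f)}: {line!r}")
        entry = SetEntry(f[0], f[1], f[2], f[3], f[4], float(f[5]), float(f[6]),
                         int(f[7]), int(f[8]), int(f[9]), f[10] or None)
        sets[entry.code] = entry
    return sets
```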
Not sure if it's a fluke, but I've tried to 'download all data' with the latest version, and it hasn't completed yet. Did this work for you? I restarted once yesterday because I think it was stuck; if it doesn't finish by tomorrow I'll check back in with you.
Edit: Yup, I've waited at least 12 hours on two tries for the GathererExtractor software to finish with 'download all data', and so far it's still stuck around 60%. Is it just me?
Not sure about this just yet, but in the meantime I found something else, Chaudakh: the exported XML file doesn't have the id_back column (although it exists in the .csv file the database is saved as). Can you check whether the XML export function needs to be updated to export this extra column?
The problem comes from your xml_export_cards.cfg in the data/settings folder: old versions of this file don't mention the id_back column. You can delete it and a clean one will be regenerated, or you can reinstall the program. I've just added a clean xml_export_cards.cfg to the settings folder in the current release.
Thanks Chaudakh, I got it working. Btw, echoing what somebody else said on the thread, would it be possible to have an option to download just prices and nothing else? It would make updating more convenient.
it will be done soon.
I will add something to compute card legality in every format (including unofficial formats such as Peasant Magic, Pauper Magic, etc.)
Thanks
Also, Gatherer takes feedback here: http://gatherer.wizards.com/Pages/Feedback.aspx If enough of us mention that Cheap Ass is missing the 1/2 mana glyph, maybe they'll fix that.
TEXT A
//
TEXT B
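The shared structure above (the first half's text, a line with only //, then the second half's text) is easy to produce and to reverse. A minimal sketch assuming plain newline characters before any XML escaping; join_faces and split_faces are hypothetical helpers.

```python
from typing import Optional, Tuple

SEPARATOR = "\n//\n"  # a line containing only //


def join_faces(text_a: str, text_b: str) -> str:
    """Combine two halves (split, flip or double-faced) into one text field."""
    return f"{text_a}{SEPARATOR}{text_b}"


def split_faces(text: str) -> Tuple[str, Optional[str]]:
    """Undo join_faces; single-faced text comes back with None as the second half."""
    a, sep, b = text.partition(SEPARATOR)
    return (a, b) if sep else (text, None)
```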
Back-face color examples:
http://gatherer.wizards.com/Pages/Card/Details.aspx?multiverseid=226749
http://magiccards.info/isd/en/51b.html
After the feedback I left, WotC added a "color indicator" for back-face cards. I'll add the color extraction to the new release ^^
EDIT: it should be fixed now: http://www.mediafire.com/?3akc6mtapnar2