encoding question

encoding question

Allin Cottrell
Sorry, this is quite ticklish but I'll try to explain it as best I
can.

I'm not sure, from reading the gnuplot help on "encoding", of the
exact scope and effect of giving a "set encoding XXX" command in a
plot file.

Here's the context: my program writes a gnuplot command file,
designed to produce PNG output via the pngcairo "terminal", and
among the users of the program are people working on Windows in
Russian. There are two possible non-ASCII elements in the plot file:

1) the name of the output file (as in "set output 'OOO'"), which for
MS Windows in Russian will be encoded in CP1251; and

2) strings occurring in titles, labels or whatever in the body of
the plot: by default these will be in UTF-8, which is what pngcairo
expects.

At present I'm sticking a line into the plot file:

set encoding utf8

which I hope is going to tell gnuplot, "Whatever you might think
based on the fact that you're working on Windows in Russian, please
interpret titles/labels as being in UTF-8."

So here's the question: given that the output filename is in CP1251,
is my "set encoding" line liable to interfere with gnuplot's output
routine (for example, such that output cannot be written because
some non-ASCII component of the path is non-existent, if the bytes
are interpreted as UTF-8), or is gnuplot's I/O mechanism separate
and insulated from "set encoding"?

As you might expect, this is not merely hypothetical: I'm getting an
error report from a Russian Windows user, and I wonder if the fact
that wgnuplot.exe is exiting with a non-zero code when trying to
process a command file written by my program might have something to
do with a text encoding issue.
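A script that mixes the two encodings is awkward because CP1251 bytes for Cyrillic letters are generally not well-formed UTF-8. Any reader that treats the whole script as UTF-8 will therefore stumble on the file name. The C sketch below checks UTF-8 well-formedness following RFC 3629. The function `is_valid_utf8` is a hypothetical helper, not part of gnuplot. For example, the word "Да" is valid as its UTF-8 bytes `D0 94 D0 B0` but invalid as its CP1251 bytes `C4 E0`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the len bytes at s are well-formed UTF-8 (RFC 3629):
 * no stray continuation bytes, no overlong forms, no UTF-16 surrogates,
 * nothing above U+10FFFF. */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;           /* continuation bytes expected after c */
        unsigned long cp;   /* code point being assembled */

        if (c < 0x80) { i++; continue; }            /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return false;  /* 0x80-0xC1 or 0xF5-0xFF cannot start a sequence */

        if (len - i <= n) return false;             /* sequence truncated */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;   /* overlong form, surrogate, or out of range */
        i += n + 1;
    }
    return true;
}
```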

--
Allin Cottrell
Department of Economics
Wake Forest University

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
gnuplot-beta mailing list
[hidden email]
Membership management via: https://lists.sourceforge.net/lists/listinfo/gnuplot-beta

Re: encoding question

sfeam
On Thursday, 23 March 2017 08:09:14 PM Allin Cottrell wrote:

> Sorry, this is quite ticklish but I'll try to explain it as best I
> can.
>
> I'm not sure, from reading the gnuplot help on "encoding", of the
> exact scope and effect of giving a "set encoding XXX" command in a
> plot file.
>
> Here's the context: my program writes a gnuplot command file,
> designed to produce PNG output via the pngcairo "terminal", and
> among the users of the program are people working on Windows in
> Russian. There are two possible non-ASCII elements in the plot file:
>
> 1) the name of the output file (as in "set output 'OOO'"), which for
> MS Windows in Russian will be encoded in CP1251; and
>
> 2) strings occurring in titles, labels or whatever in the body of
> the plot: by default these will be in UTF-8, which is what pngcairo
> expects.
>
> At present I'm sticking a line into the plot file:
>
> set encoding utf8
>
> which I hope is going to tell gnuplot, "Whatever you might think
> based on the fact that you're working on Windows in Russian, please
> interpret titles/labels as being in UTF-8."

That much is fine.  It also has the effect, for the png terminal and
some others, that when you specify a font by name it will try to find
a version of it that uses your specified encoding.

> So here's the question: given that the output filename is in CP1251,
> is my "set encoding" line liable to interfere with gnuplot's output
> routine (for example, such that output cannot be written because
> some non-ASCII component of the path is non-existent, if the bytes
> are interpreted as UTF-8), or is gnuplot's I/O mechanism separate
> and insulated from "set encoding"?

Gnuplot does not care what is in the string used as a file name.
Linux/unix also does not care what is in the string used as a file name.
Any sequence of bytes is a legal filename even if it is not printable.
Windows - I'm not so sure.  There are two ways that it might go wrong
on windows that I have heard of, and I suppose they might interact
badly.
Caveat: I don't use Windows myself, so I'm only repeating what I have
seen mentioned elsewhere.
   
(1) Windows filesystems only allow certain encodings for file
names, and UTF-8 is not one of the allowed encodings.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx

(2) At least some incarnations of Windows used a magic byte sequence
known as BOM to indicate the encoding used by a text file.  If your gnuplot
script file contains UTF-8 anything, some Windows machines are unhappy
if it does not start with BOM.  On the other hand if it _does_ start with BOM
then strings in the script file that are really CP1251 rather than UTF-8
might (I am guessing) be converted inappropriately.

So I think your question is actually a Windows + script file format question
rather than anything specific to gnuplot.  I doubt that "set encoding"
matters, but mixing UTF-8 and CP1251 in the same script file may
be intrinsically problematic on Windows.
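In the UTF-8 case, the BOM Ethan mentions is simply the three bytes `EF BB BF` (the UTF-8 encoding of U+FEFF) at the start of the file. The sketch below shows how a program that generates gnuplot scripts might write or detect one. The helper names are hypothetical. The thread does not establish whether a particular gnuplot build on Windows needs or tolerates the BOM. Ethan's caveat also still applies: a BOM asserts that the whole file is UTF-8, which would be false for a script that still contains CP1251 file names.

```c
#include <stddef.h>
#include <stdio.h>

/* UTF-8 encoding of U+FEFF, used on Windows as a signature ("byte order
 * mark") at the very start of a UTF-8 text file. */
static const unsigned char UTF8_BOM[3] = { 0xEF, 0xBB, 0xBF };

/* Return how many leading bytes to skip: 3 if buf begins with a UTF-8
 * BOM, otherwise 0. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == UTF8_BOM[0] && buf[1] == UTF8_BOM[1] &&
        buf[2] == UTF8_BOM[2])
        return 3;
    return 0;
}

/* Write a UTF-8 BOM followed by the script text to fp.
 * Returns 0 on success, -1 on a write error. */
int write_script_with_bom(FILE *fp, const char *script)
{
    if (fwrite(UTF8_BOM, 1, sizeof UTF8_BOM, fp) != sizeof UTF8_BOM)
        return -1;
    if (fputs(script, fp) == EOF)
        return -1;
    return 0;
}
```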

> As you might expect, this is not merely hypothetical: I'm getting an
> error report from a Russian Windows user, and I wonder if the fact
> that wgnuplot.exe is exiting with a non-zero code when trying to
> process a command file written by my program might have something to
> do with a text encoding issue.

Does the same script work if the file names it refers to are strictly ascii?

        Ethan


Re: encoding question

Allin Cottrell
On Thu, 23 Mar 2017, sfeam wrote:

> On Thursday, 23 March 2017 08:09:14 PM Allin Cottrell wrote:
>> Sorry, this is quite ticklish but I'll try to explain it as best I
>> can.
>>
>> I'm not sure, from reading the gnuplot help on "encoding", of the
>> exact scope and effect of giving a "set encoding XXX" command in a
>> plot file.
>>
>> Here's the context: my program writes a gnuplot command file,
>> designed to produce PNG output via the pngcairo "terminal", and
>> among the users of the program are people working on Windows in
>> Russian. There are two possible non-ASCII elements in the plot file:
>>
>> 1) the name of the output file (as in "set output 'OOO'"), which for
>> MS Windows in Russian will be encoded in CP1251; and
>>
>> 2) strings occurring in titles, labels or whatever in the body of
>> the plot: by default these will be in UTF-8, which is what pngcairo
>> expects.
>>
>> At present I'm sticking a line into the plot file:
>>
>> set encoding utf8
>>
>> which I hope is going to tell gnuplot, "Whatever you might think
>> based on the fact that you're working on Windows in Russian, please
>> interpret titles/labels as being in UTF-8."
>
> That much is fine.  It also has the effect, for the png terminal and
> some others, that when you specify a font by name it will try to find
> a version of it that uses your specified encoding.
OK so far!

>> So here's the question: given that the output filename is in CP1251,
>> is my "set encoding" line liable to interfere with gnuplot's output
>> routine (for example, such that output cannot be written because
>> some non-ASCII component of the path is non-existent, if the bytes
>> are interpreted as UTF-8), or is gnuplot's I/O mechanism separate
>> and insulated from "set encoding"?
>
> Gnuplot does not care what is in the string used as a file name.
> Linux/unix also does not care what is in the string used as a file name.
> Any sequence of bytes is a legal filename even if it is not printable.
> Windows - I'm not so sure.  There are two ways that it might go wrong
> on windows that I have heard of, and I suppose they might interact
> badly.
> Caveat: I don't use Windows myself, so I'm only repeating what I have
> seen mentioned elsewhere.
>
> (1) Windows filesystems only allow certain encodings for file
> names, and UTF-8 is not one of the allowed encodings.
> https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx
>
> (2) At least some incarnations of Windows used a magic byte sequence
> known as BOM to indicate the encoding used by a text file.  If your gnuplot
> script file contains UTF-8 anything, some Windows machines are unhappy
> if it does not start with BOM.  On the other hand if it _does_ start with BOM
> then strings in the script file that are really CP1251 rather than UTF-8
> might (I am guessing) be converted inappropriately.
>
> So I think your question is actually a Windows + script file format question
> rather than anything specific to gnuplot.  I doubt that "set encoding"
> matters, but mixing UTF-8 and CP1251 in the same script file may
> be intrinsically problematic on Windows.
I ran an experiment to try to assess this. Booted Windows 8 (ugh) and
created a directory named Beauté (that's with an e-acute) on my
Desktop. I then created two copies of a simple gnuplot script to
produce a PNG file. Each included the line

set output 'c:/users/cottrell/desktop/Beauté/test.png'

(encoded in cp1251). The two files were identical except that one of
them included the line

set encoding utf8

before the "set output" line. (And the accented character in the
output filename was the only non-ASCII character in the files.)

I then called wgnuplot.exe on the two scripts from the command line in
a cmd.exe window. The one without "set encoding utf8" worked to
produce the PNG, the other didn't. To see what was happening I then
tried opening wgnuplot interactively and using the "load" command to
run the scripts. The variant without "set encoding" again worked fine;
the other one gave:

set output 'c:/users/cottrell/desktop/Beaut?/test.png'
  cannot open file; output not changed

(note that in gnuplot's error message echoing the "set output" line
the e-acute has been changed to a question mark, actually not an
ASCII question mark but an "unrecognized glyph" symbol).

It therefore seems that "set encoding" has somehow altered gnuplot's
reading of the bytes in the output filename. (Once again, those bytes
are identical in the two files.) If gnuplot had simply passed the
incoming cp1251 bytes to the OS, surely the output file would have
been opened OK in both cases.

Allin Cottrell

Re: encoding question

sfeam
On Friday, 24 March, 2017 14:57:11 Allin Cottrell wrote:

> On Thu, 23 Mar 2017, sfeam wrote:
>  
> >> So here's the question: given that the output filename is in CP1251,
> >> is my "set encoding" line liable to interfere with gnuplot's output
> >> routine (for example, such that output cannot be written because
> >> some non-ASCII component of the path is non-existent, if the bytes
> >> are interpreted as UTF-8), or is gnuplot's I/O mechanism separate
> >> and insulated from "set encoding"?
> >
> > Gnuplot does not care what is in the string used as a file name.
> > Linux/unix also does not care what is in the string used as a file name.
> > Any sequence of bytes is a legal filename even if it is not printable.
> > Windows - I'm not so sure.  There are two ways that it might go wrong
> > on windows that I have heard of, and I suppose they might interact
> > badly.
> > Caveat: I don't use Windows myself, so I'm only repeating what I have
> > seen mentioned elsewhere.
> >
> > (1) Windows filesystems only allow certain encodings for file
> > names, and UTF-8 is not one of the allowed encodings.
> > https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx
> >
> > (2) At least some incarnations of Windows used a magic byte sequence
> > known as BOM to indicate the encoding used by a text file.  If your gnuplot
> > script file contains UTF-8 anything, some Windows machines are unhappy
> > if it does not start with BOM.  On the other hand if it _does_ start with BOM
> > then strings in the script file that are really CP1251 rather than UTF-8
> > might (I am guessing) be converted inappropriately.
> >
> > So I think your question is actually a Windows + script file format question
> > rather than anything specific to gnuplot.  I doubt that "set encoding"
> > matters, but mixing UTF-8 and CP1251 in the same script file may
> > be intrinsically problematic on Windows.
>
> I ran an experiment to try to assess this. Booted Windows 8 (ugh) and
> created a directory named Beauté (that's with an e-acute) on my
> Desktop. I then created two copies of a simple gnuplot script to
> produce a PNG file. Each included the line
>
> set output 'c:/users/cottrell/desktop/Beauté/test.png'
>
> (encoded in cp1251). The two files were identical except that one of
> them included the line
>
> set encoding utf8
>
> before the "set output" line. (And the accented character in the
> output filename was the only non-ASCII character in the files.)
>
> I then called wgnuplot.exe on the two scripts from the command line in
> a cmd.exe window. The one without "set encoding utf8" worked to
> produce the PNG, the other didn't. To see what was happening I then
> tried opening wgnuplot interactively and using the "load" command to
> run the scripts. The variant without "set encoding" again worked fine;
> the other one gave:
>
> set output 'c:/users/cottrell/desktop/Beaut?/test.png'
>   cannot open file; output not changed
>
> (note that in gnuplot's error message echoing the "set output" line
> the e-acute has been changed to a question mark, actually not an
> ASCII question mark but an "unrecognized glyph" symbol).
>
> It therefore seems that "set encoding" has somehow altered gnuplot's
> reading of the bytes in the output filename.

No, I don't think that is what is happening.

> (Once again, those bytes
> are identical in the two files.) If gnuplot had simply passed the
> incoming cp1251 bytes to the OS, surely the output file would have
> been opened OK in both cases.

What seems to be happening is that in syscfg.h on Windows it says
/* The unicode/encoding support requires translation of file names */
    #define fopen win_fopen

and wmain.c:win_fopen() indeed tries to translate the name from the
current gnuplot encoding into Windows Unicode text.
I think the comment is wrong. File names should *not* be translated,
as you are finding out.  The current gnuplot encoding is a separate
thing from the encoding used in the source code of the script.
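The "unrecognized glyph" in Allin's error message is consistent with this diagnosis. Take a single-byte code page value such as 0xE9 (é in Windows-1252). Decoded as UTF-8, it starts a three-byte sequence that the following `/` does not continue. A lenient decoder then substitutes U+FFFD REPLACEMENT CHARACTER, and the resulting path names a directory that does not exist. The sketch below is a hypothetical, portable stand-in for that conversion. It assumes, without checking gnuplot's code, that the Windows conversion behind win_fopen() behaves similarly, replacing invalid input rather than failing. For simplicity it replaces one bad byte at a time.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Decode len bytes of s as UTF-8 into out (room for cap code points),
 * substituting U+FFFD for each byte that does not begin a well-formed
 * sequence. Returns the number of code points written. */
size_t decode_utf8_lenient(const unsigned char *s, size_t len,
                           uint32_t *out, size_t cap)
{
    size_t i = 0, n_out = 0;
    while (i < len && n_out < cap) {
        unsigned char c = s[i];
        size_t n;
        uint32_t cp;

        if (c < 0x80) { out[n_out++] = c; i++; continue; }  /* ASCII */
        if (c >= 0xC2 && c <= 0xDF)      { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else { out[n_out++] = REPLACEMENT_CHAR; i++; continue; }

        int ok = (len - i > n);               /* enough bytes left? */
        for (size_t k = 1; ok && k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) ok = 0;  /* not a continuation */
            else cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (ok && ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
                   (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF))
            ok = 0;                           /* overlong, surrogate, too large */

        if (ok) { out[n_out++] = cp; i += n + 1; }
        else    { out[n_out++] = REPLACEMENT_CHAR; i++; }  /* skip one byte */
    }
    return n_out;
}
```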

I only see this code in the development version, not in the source
for 5.0.5 or 5.0.6.  So I guess your bug report is specifically for
the development version?

I'll defer to the Windows crowd here, but my tentative diagnosis
is that addition of a win_fopen() wrapper for fopen() in 5.1 should
be reverted.

Of course if you are seeing this same problem with 5.0 then my
diagnosis is wrong :-/  

        Ethan


Re: encoding question

Allin Cottrell
On Fri, 24 Mar 2017, Ethan A Merritt wrote:

> On Friday, 24 March, 2017 14:57:11 Allin Cottrell wrote:
>>
>> I ran an experiment to try to assess this. Booted Windows 8 (ugh) and
>> created a directory named Beauté (that's with an e-acute) on my
>> Desktop. I then created two copies of a simple gnuplot script to
>> produce a PNG file. Each included the line
>>
>> set output 'c:/users/cottrell/desktop/Beauté/test.png'
>>
>> (encoded in cp1251). The two files were identical except that one of
>> them included the line
>>
>> set encoding utf8
>>
>> before the "set output" line. (And the accented character in the
>> output filename was the only non-ASCII character in the files.)
>>
>> I then called wgnuplot.exe on the two scripts from the command line in
>> a cmd.exe window. The one without "set encoding utf8" worked to
>> produce the PNG, the other didn't. To see what was happening I then
>> tried opening wgnuplot interactively and using the "load" command to
>> run the scripts. The variant without "set encoding" again worked fine;
>> the other one gave:
>>
>> set output 'c:/users/cottrell/desktop/Beaut?/test.png'
>>   cannot open file; output not changed
>>
>> (note that in gnuplot's error message echoing the "set output" line
>> the e-acute has been changed to a question mark, actually not an
>> ASCII question mark but an "unrecognized glyph" symbol).
>>
>> It therefore seems that "set encoding" has somehow altered gnuplot's
>> reading of the bytes in the output filename.
>
> No, I don't think that is what is happening.
>
>> (Once again, those bytes
>> are identical in the two files.) If gnuplot had simply passed the
>> incoming cp1251 bytes to the OS, surely the output file would have
>> been opened OK in both cases.
>
> What seems to be happening is that in syscfg.h on Windows it says
> /* The unicode/encoding support requires translation of file names */
>    #define fopen win_fopen
>
> and wmain.c:win_fopen() indeed tries to translate the name from the
> current gnuplot encoding into Windows Unicode text.
> I think the comment is wrong. File names should *not* be translated,
> as you are finding out.  The current gnuplot encoding is a separate
> thing from the encoding used in the source code of the script.
>
> I only see this code in the development version, not in the source
> for 5.0.5 or 5.0.6.  So I guess your bug report is specifically for
> the development version?
>
> I'll defer to the Windows crowd here, but my tentative diagnosis
> is that addition of a win_fopen() wrapper for fopen() in 5.1 should
> be reverted.
Aha, this is very interesting! Yes, I'm using the development
version on Windows so your diagnosis seems very plausible. But
actually, now that I (think) I understand what's going on, I _like_ the
idea behind win_fopen.

If I've got this right, it would let me standardize on consistently
UTF-8 gnuplot script files (including representing Windows paths in
UTF-8), and let gnuplot take care of recoding paths on the fly as
needed for interaction with the OS.

It's ugly and error-prone to mix text encodings in a single file,
but I guess that's what you have to do with gnuplot 5.0 if you want
(a) to represent titles, labels and so on in UTF-8, but (b) to
include Windows filenames that contain non-ASCII characters. It
sounds like gnuplot 5.1 could improve on that. I can now try the
experiment of keeping "set encoding utf8" but recoding Windows paths
to UTF-8 when writing them into a gnuplot script. If that works, I'm
happy!
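The partial sketch below shows one way to recode a CP1251 path to UTF-8 before writing it into the script. The function `cp1251_to_utf8` is hypothetical. It handles only ASCII, the Cyrillic letters А..я (bytes 0xC0-0xFF, U+0410-U+044F), and Ё/ё (0xA8/0xB8). A real Windows program would more likely call `MultiByteToWideChar` with code page 1251 followed by `WideCharToMultiByte` with `CP_UTF8`, which together cover the whole code page.

```c
#include <stddef.h>

/* Convert a NUL-terminated CP1251 string to UTF-8 (partial sketch).
 * Handles ASCII, 0xC0-0xFF (U+0410..U+044F) and 0xA8/0xB8 (U+0401/U+0451);
 * any other byte in 0x80-0xBF is rejected.
 * out must have room for 2 * strlen(in) + 1 bytes.
 * Returns 0 on success, -1 on a byte this sketch does not cover. */
int cp1251_to_utf8(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    unsigned char *q = (unsigned char *)out;

    for (; *p; p++) {
        unsigned int cp;
        if (*p < 0x80) { *q++ = *p; continue; }   /* ASCII passes through */
        if (*p >= 0xC0)      cp = 0x0410 + (*p - 0xC0);  /* А..я */
        else if (*p == 0xA8) cp = 0x0401;                /* Ё */
        else if (*p == 0xB8) cp = 0x0451;                /* ё */
        else return -1;                                  /* not covered */
        /* every code point above is below U+0800: two UTF-8 bytes */
        *q++ = (unsigned char)(0xC0 | (cp >> 6));
        *q++ = (unsigned char)(0x80 | (cp & 0x3F));
    }
    *q = '\0';
    return 0;
}
```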

(But of course if the win_fopen wrapper is preserved the backward
incompatibility needs to be made clear -- though it probably affects
rather few people.)

Allin Cottrell




Re: encoding question

Allin Cottrell
On Fri, 24 Mar 2017, Allin Cottrell wrote:

> On Fri, 24 Mar 2017, Ethan A Merritt wrote:
>
>> On Friday, 24 March, 2017 14:57:11 Allin Cottrell wrote:
>>>
>>> I ran an experiment to try to assess this. Booted Windows 8 (ugh) and
>>> created a directory named Beauté (that's with an e-acute) on my
>>> Desktop. I then created two copies of a simple gnuplot script to
>>> produce a PNG file. Each included the line
>>>
>>> set output 'c:/users/cottrell/desktop/Beauté/test.png'
>>>
>>> (encoded in cp1251). The two files were identical except that one of
>>> them included the line
>>>
>>> set encoding utf8
>>>
>>> before the "set output" line. (And the accented character in the
>>> output filename was the only non-ASCII character in the files.)
>>>
>>> I then called wgnuplot.exe on the two scripts from the command line in
>>> a cmd.exe window. The one without "set encoding utf8" worked to
>>> produce the PNG, the other didn't. To see what was happening I then
>>> tried opening wgnuplot interactively and using the "load" command to
>>> run the scripts. The variant without "set encoding" again worked fine;
>>> the other one gave:
>>>
>>> set output 'c:/users/cottrell/desktop/Beaut?/test.png'
>>>   cannot open file; output not changed
>>>
>>> (note that in gnuplot's error message echoing the "set output" line
>>> the e-acute has been changed to a question mark, actually not an
>>> ASCII question mark but an "unrecognized glyph" symbol).
>>>
>>> It therefore seems that "set encoding" has somehow altered gnuplot's
>>> reading of the bytes in the output filename.
>>
>> No, I don't think that is what is happening.
>>
>>> (Once again, those bytes
>>> are identical in the two files.) If gnuplot had simply passed the
>>> incoming cp1251 bytes to the OS, surely the output file would have
>>> been opened OK in both cases.
>>
>> What seems to be happening is that in syscfg.h on Windows it says
>> /* The unicode/encoding support requires translation of file names */
>>    #define fopen win_fopen
>>
>> and wmain.c:win_fopen() indeed tries to translate the name from the
>> current gnuplot encoding into Windows Unicode text.
>> I think the comment is wrong. File names should *not* be translated,
>> as you are finding out.  The current gnuplot encoding is a separate
>> thing from the encoding used in the source code of the script.
>>
>> I only see this code in the development version, not in the source
>> for 5.0.5 or 5.0.6.  So I guess your bug report is specifically for
>> the development version?
>>
>> I'll defer to the Windows crowd here, but my tentative diagnosis
>> is that addition of a win_fopen() wrapper for fopen() in 5.1 should
>> be reverted.
>
> Aha, this is very interesting! Yes, I'm using the development
> version on Windows so your diagnosis seems very plausible. But
> actually, now that I (think) I understand what's going on, I _like_ the
> idea behind win_fopen.
>
> If I've got this right, it would let me standardize on
> consistently UTF-8 gnuplot script files (including representing
> Windows paths in UTF-8), and let gnuplot take care of recoding
> paths on the fly as needed for interaction with the OS.
>
> It's ugly and error-prone to mix text encodings in a single file,
> but I guess that's what you have to do with gnuplot 5.0 if you
> want (a) to represent titles, labels and so on in UTF-8, but (b)
> to include Windows filenames that contain non-ASCII characters. It
> sounds like gnuplot 5.1 could improve on that. I can now try the
> experiment of keeping "set encoding utf8" but recoding Windows
> paths to UTF-8 when writing them into a gnuplot script. If that
> works, I'm happy!
The experiment was successful. I could create a clean UTF-8 encoded
gnuplot script (including a non-ASCII Windows path for "set
output"), and gnuplot's win_fopen handled interaction with the OS
correctly in the background. So I would definitely be in favor of
keeping win_fopen.

(Reminder for anyone trying to follow this: win_fopen is a special
facility in the development version of gnuplot. It has the effect of
recoding filenames in a gnuplot script from whatever is set via "set
encoding" to Windows-compatible 16-bit Unicode (UTF-16) before they are
passed to _wfopen(), the wide-character counterpart of the C-library
function fopen().)
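The sketch below is a wrapper in the spirit of win_fopen(), but it is not gnuplot's code. The name `utf8_fopen` is hypothetical, and the sketch hardcodes UTF-8 as the source encoding, whereas the real wrapper converts from whatever "set encoding" specifies. On Windows it widens the name with `MultiByteToWideChar(CP_UTF8, ...)` and opens it with `_wfopen`. Elsewhere it passes the bytes straight to `fopen`, since POSIX file names are plain byte strings.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Open a file whose name (and mode) are given in UTF-8. */
FILE *utf8_fopen(const char *name, const char *mode)
{
#ifdef _WIN32
    /* First calls only query the UTF-16 buffer sizes, including the NUL. */
    int nlen = MultiByteToWideChar(CP_UTF8, 0, name, -1, NULL, 0);
    int mlen = MultiByteToWideChar(CP_UTF8, 0, mode, -1, NULL, 0);
    if (nlen == 0 || mlen == 0)
        return NULL;
    wchar_t *wname = malloc(nlen * sizeof *wname);
    wchar_t *wmode = malloc(mlen * sizeof *wmode);
    FILE *fp = NULL;
    if (wname && wmode &&
        MultiByteToWideChar(CP_UTF8, 0, name, -1, wname, nlen) &&
        MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, mlen))
        fp = _wfopen(wname, wmode);   /* wide-character open */
    free(wname);
    free(wmode);
    return fp;
#else
    /* POSIX: the byte string is the file name; no translation needed. */
    return fopen(name, mode);
#endif
}
```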

Allin Cottrell

Re: encoding question

sfeam
On Saturday, 25 March 2017 07:42:32 PM Allin Cottrell wrote:

> On Fri, 24 Mar 2017, Allin Cottrell wrote:
>
> > On Fri, 24 Mar 2017, Ethan A Merritt wrote:
> >
> >> On Friday, 24 March, 2017 14:57:11 Allin Cottrell wrote:
> >>>
> >>> I ran an experiment to try to assess this. Booted Windows 8 (ugh) and
> >>> created a directory named Beauté (that's with an e-acute) on my
> >>> Desktop. I then created two copies of a simple gnuplot script to
> >>> produce a PNG file. Each included the line
> >>>
> >>> set output 'c:/users/cottrell/desktop/Beauté/test.png'
> >>>
> >>> (encoded in cp1251). The two files were identical except that one of
> >>> them included the line
> >>>
> >>> set encoding utf8
> >>>
> >>> before the "set output" line. (And the accented character in the
> >>> output filename was the only non-ASCII character in the files.)
> >>>
> >>> I then called wgnuplot.exe on the two scripts from the command line in
> >>> a cmd.exe window. The one without "set encoding utf8" worked to
> >>> produce the PNG, the other didn't. To see what was happening I then
> >>> tried opening wgnuplot interactively and using the "load" command to
> >>> run the scripts. The variant without "set encoding" again worked fine;
> >>> the other one gave:
> >>>
> >>> set output 'c:/users/cottrell/desktop/Beaut?/test.png'
> >>>   cannot open file; output not changed
> >>>
> >>> (note that in gnuplot's error message echoing the "set output" line
> >>> the e-acute has been changed to a question mark, actually not an
> >>> ASCII question mark but an "unrecognized glyph" symbol).
> >>>
> >>> It therefore seems that "set encoding" has somehow altered gnuplot's
> >>> reading of the bytes in the output filename.
> >>
> >> No, I don't think that is what is happening.
> >>
> >>> (Once again, those bytes
> >>> are identical in the two files.) If gnuplot had simply passed the
> >>> incoming cp1251 bytes to the OS, surely the output file would have
> >>> been opened OK in both cases.
> >>
> >> What seems to be happening is that in syscfg.h on Windows it says
> >> /* The unicode/encoding support requires translation of file names */
> >>    #define fopen win_fopen
> >>
> >> and wmain.c:win_fopen() indeed tries to translate the name from the
> >> current gnuplot encoding into Windows Unicode text.
> >> I think the comment is wrong. File names should *not* be translated,
> >> as you are finding out.  The current gnuplot encoding is a separate
> >> thing from the encoding used in the source code of the script.
> >>
> >> I only see this code in the development version, not in the source
> >> for 5.0.5 or 5.0.6.  So I guess your bug report is specifically for
> >> the development version?
> >>
> >> I'll defer to the Windows crowd here, but my tentative diagnosis
> >> is that addition of a win_fopen() wrapper for fopen() in 5.1 should
> >> be reverted.
> >
> > Aha, this is very interesting! Yes, I'm using the development
> > version on Windows so your diagnosis seems very plausible. But
> > actually, now that I (think) I understand what's going on, I _like_ the
> > idea behind win_fopen.
> >
> > If I've got this right, it would let me standardize on
> > consistently UTF-8 gnuplot script files (including representing
> > Windows paths in UTF-8), and let gnuplot take care of recoding
> > paths on the fly as needed for interaction with the OS.
> >
> > It's ugly and error-prone to mix text encodings in a single file,
> > but I guess that's what you have to do with gnuplot 5.0 if you
> > want (a) to represent titles, labels and so on in UTF-8, but (b)
> > to include Windows filenames that contain non-ASCII characters. It
> > sounds like gnuplot 5.1 could improve on that. I can now try the
> > experiment of keeping "set encoding utf8" but recoding Windows
> > paths to UTF-8 when writing them into a gnuplot script. If that
> > works, I'm happy!
>
> The experiment was successful. I could create a clean UTF-8 encoded
> gnuplot script (including a non-ASCII Windows path for "set
> output"), and gnuplot's win_fopen handled interaction with the OS
> correctly in the background. So I would definitely be in favor of
> keeping win_fopen.
>
> (Reminder for anyone trying to follow this: win_fopen is a special
> facility in the development version of gnuplot. It has the effect of
> recoding filenames in a gnuplot script from whatever is set via "set
> encoding" to Windows-compatible 16-bit Unicode before they are
> passed to the C-library function fopen().)
>
> Allin Cottrell

Can you suggest where in the documentation we could add this information?
Putting it under "encoding" will not help unless the poor user who hits it
has already diagnosed it as an encoding problem.
Where did you look when you first hit the original problem?

   Ethan


Re: encoding question

Plotter-2
On 02/04/17 04:41, sfeam wrote:

> On Saturday, 25 March 2017 07:42:32 PM Allin Cottrell wrote:
>> On Fri, 24 Mar 2017, Allin Cottrell wrote:
>>
>>> On Fri, 24 Mar 2017, Ethan A Merritt wrote:
>>>
>>>> On Friday, 24 March, 2017 14:57:11 Allin Cottrell wrote:
>>>>>
>>>>> I ran an experiment to try to assess this. Booted Windows 8 (ugh) and
>>>>> created a directory named Beauté (that's with an e-acute) on my
>>>>> Desktop. I then created two copies of a simple gnuplot script to
>>>>> produce a PNG file. Each included the line
>>>>>
>>>>> set output 'c:/users/cottrell/desktop/Beauté/test.png'
>>>>>
>>>>> (encoded in cp1251). The two files were identical except that one of
>>>>> them included the line
>>>>>
>>>>> set encoding utf8
>>>>>
>>>>> before the "set output" line. (And the accented character in the
>>>>> output filename was the only non-ASCII character in the files.)
>>>>>
>>>>> I then called wgnuplot.exe on the two scripts from the command line in
>>>>> a cmd.exe window. The one without "set encoding utf8" worked to
>>>>> produce the PNG, the other didn't. To see what was happening I then
>>>>> tried opening wgnuplot interactively and using the "load" command to
>>>>> run the scripts. The variant without "set encoding" again worked fine;
>>>>> the other one gave:
>>>>>
>>>>> set output 'c:/users/cottrell/desktop/Beaut?/test.png'
>>>>>   cannot open file; output not changed
>>>>>
>>>>> (note that in gnuplot's error message echoing the "set output" line
>>>>> the e-acute has been changed to a question mark, actually not an
>>>>> ASCII question mark but an "unrecognized glyph" symbol).
>>>>>
>>>>> It therefore seems that "set encoding" has somehow altered gnuplot's
>>>>> reading of the bytes in the output filename.
>>>>
>>>> No, I don't think that is what is happening.
>>>>
>>>>> (Once again, those bytes
>>>>> are identical in the two files.) If gnuplot had simply passed the
>>>>> incoming cp1251 bytes to the OS, surely the output file would have
>>>>> been opened OK in both cases.
>>>>
>>>> What seems to be happening is that in syscfg.h on Windows it says
>>>> /* The unicode/encoding support requires translation of file names */
>>>>    #define fopen win_fopen
>>>>
>>>> and wmain.c:win_fopen() indeed tries to translate the name from the
>>>> current gnuplot encoding into Windows Unicode text.
>>>> I think the comment is wrong. File names should *not* be translated,
>>>> as you are finding out.  The current gnuplot encoding is a separate
>>>> thing from the encoding used in the source code of the script.
>>>>
>>>> I only see this code in the development version, not in the source
>>>> for 5.0.5 or 5.0.6.  So I guess your bug report is specifically for
>>>> the development version?
>>>>
>>>> I'll defer to the Windows crowd here, but my tentative diagnosis
>>>> is that addition of a win_fopen() wrapper for fopen() in 5.1 should
>>>> be reverted.
>>>
>>> Aha, this is very interesting! Yes, I'm using the development
>>> version on Windows so your diagnosis seems very plausible. But
>>> actually, now that I (think) I understand what's going on, I _like_ the
>>> idea behind win_fopen.
>>>
>>> If I've got this right, it would let me standardize on
>>> consistently UTF-8 gnuplot script files (including representing
>>> Windows paths in UTF-8), and let gnuplot take care of recoding
>>> paths on the fly as needed for interaction with the OS.
>>>
>>> It's ugly and error-prone to mix text encodings in a single file,
>>> but I guess that's what you have to do with gnuplot 5.0 if you
>>> want (a) to represent titles, labels and so on in UTF-8, but (b)
>>> to include Windows filenames that contain non-ASCII characters. It
>>> sounds like gnuplot 5.1 could improve on that. I can now try the
>>> experiment of keeping "set encoding utf8" but recoding Windows
>>> paths to UTF-8 when writing them into a gnuplot script. If that
>>> works, I'm happy!
>>
>> The experiment was successful. I could create a clean UTF-8 encoded
>> gnuplot script (including a non-ASCII Windows path for "set
>> output"), and gnuplot's win_fopen handled interaction with the OS
>> correctly in the background. So I would definitely be in favor of
>> keeping win_fopen.
>>
>> (Reminder for anyone trying to follow this: win_fopen is a special
>> facility in the development version of gnuplot. It has the effect of
>> recoding filenames in a gnuplot script from whatever is set via "set
>> encoding" to Windows-compatible 16-bit Unicode before they are
>> passed to the C-library function fopen().)
>>
>> Allin Cottrell
>
> Can you suggest where in the documentation we could add this information?
> Putting it under "encoding" will not help unless the poor user who hits it
> has already diagnosed it as an encoding problem.
> Where did you look when you first hit the original problem?
>
>    Ethan
>
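
For readers trying to follow the win_fopen discussion above, here is a minimal, portable sketch of the step Allin describes: decoding a filename from UTF-8 into the 16-bit Unicode (UTF-16) code units that Windows wide-character file functions take. It is not gnuplot's code, and the name utf8_to_utf16 is hypothetical. It also suggests why the first experiment failed. In a single-byte Windows code page, an accented or Cyrillic letter is a lone byte of 0x80 or above (é is 0xE9 in CP1252). Such a byte followed by '/' is not a valid UTF-8 sequence, so a strict decoder has to reject it. How gnuplot's own code treats malformed input is an assumption here. The thread only shows that the character was echoed as an unrecognized glyph and that the file could not be opened.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not gnuplot's win_fopen): decode a NUL-terminated
 * UTF-8 string into UTF-16 code units. Returns the number of code units
 * written (excluding the 0 terminator), or -1 if the input is not valid
 * UTF-8 or out_len is too small.
 */
long utf8_to_utf16(const char *in, uint16_t *out, size_t out_len)
{
    const unsigned char *s = (const unsigned char *) in;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra;

        if (s[0] < 0x80) {                   /* plain ASCII */
            cp = s[0]; extra = 0;
        } else if ((s[0] & 0xE0) == 0xC0) {  /* lead byte of a 2-byte sequence */
            cp = s[0] & 0x1F; extra = 1;
        } else if ((s[0] & 0xF0) == 0xE0) {  /* 3-byte lead (0xE9 lands here) */
            cp = s[0] & 0x0F; extra = 2;
        } else if ((s[0] & 0xF8) == 0xF0) {  /* 4-byte lead */
            cp = s[0] & 0x07; extra = 3;
        } else {
            return -1;                       /* stray continuation or bad lead */
        }

        /* Each following byte must be a 10xxxxxx continuation byte; the
           terminating NUL fails this test, so we never read past it. */
        for (int i = 1; i <= extra; i++) {
            if ((s[i] & 0xC0) != 0x80)
                return -1;                   /* e.g. 0xE9 followed by '/' */
            cp = (cp << 6) | (s[i] & 0x3F);
        }

        /* Reject overlong forms, UTF-16 surrogates and values > U+10FFFF. */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        s += extra + 1;

        if (cp < 0x10000) {                  /* one code unit */
            if (n + 1 >= out_len) return -1;
            out[n++] = (uint16_t) cp;
        } else {                             /* surrogate pair */
            if (n + 2 >= out_len) return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t) (0xD800 | (cp >> 10));
            out[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= out_len) return -1;             /* no room for the terminator */
    out[n] = 0;
    return (long) n;
}
```

On Windows, a buffer like this (as wchar_t) would presumably be handed to _wfopen(); wmain.c is the place to check what win_fopen() actually does.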

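Allin's workaround, keeping "set encoding utf8" and writing the Windows path into the script as UTF-8, can be sketched as a re-encoding step in the program that generates the script. The function cp1251_to_utf8 below is hypothetical and deliberately partial. It assumes the path arrives in CP1251 (the Russian-Windows case from the original report), maps only ASCII, А to я, Ё and ё, and rejects any other byte.

```c
#include <stddef.h>

/*
 * Hypothetical, partial CP1251 -> UTF-8 re-encoder for paths that will be
 * written into a gnuplot script beginning with "set encoding utf8".
 * Handles ASCII, 0xC0-0xFF (А..я), 0xA8 (Ё) and 0xB8 (ё); any other byte
 * >= 0x80 makes it return -1. Returns the UTF-8 length (excluding NUL),
 * or -1 on an unmapped byte or a too-small buffer.
 */
long cp1251_to_utf8(const char *in, char *out, size_t out_len)
{
    const unsigned char *s = (const unsigned char *) in;
    size_t n = 0;

    for (; *s; s++) {
        unsigned cp;

        if (*s < 0x80)
            cp = *s;                       /* ASCII is unchanged */
        else if (*s >= 0xC0)
            cp = 0x0410 + (*s - 0xC0);     /* А..я map to U+0410..U+044F */
        else if (*s == 0xA8)
            cp = 0x0401;                   /* Ё */
        else if (*s == 0xB8)
            cp = 0x0451;                   /* ё */
        else
            return -1;                     /* not covered by this sketch */

        if (cp < 0x80) {
            if (n + 1 >= out_len) return -1;
            out[n++] = (char) cp;
        } else {
            /* Every mapped code point is below U+0800: two UTF-8 bytes. */
            if (n + 2 >= out_len) return -1;
            out[n++] = (char) (0xC0 | (cp >> 6));
            out[n++] = (char) (0x80 | (cp & 0x3F));
        }
    }
    if (n >= out_len) return -1;           /* no room for the terminator */
    out[n] = '\0';
    return (long) n;
}
```

The result could then be written into the script as set output '<path>'. On Windows itself, the Win32 functions MultiByteToWideChar and WideCharToMultiByte (with CP_ACP and CP_UTF8) would normally do this conversion for any code page; the hand-written table only keeps the example portable.
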
If there are cross-platform issues, then maybe the doc should contain a
section on that, at least as a central point linking to more specific
information on individual issues.

Also, "encoding" reflects a programmer's, solution-derived perspective
rather than a user's. The user probably does not think of himself as
"encoding" anything.

From the user's point of view, it probably has to do with accented vowels,
natural language, or non-English filenames, labels or whatever.

I have commented on a few such cases in the past where the info is there
but you need to know the answer in order to find it because it is not
linked to anything related to the user's problem.


Gnuplot makes platform abstraction fairly transparent, but there always
seem to be a few things that are not completely OS-agnostic.

Peter.