Friday, May 22, 2020

What could cause the file command in Linux to report a text file as data?



I have a couple of C++ source files (one .cpp and one .h) that are being reported as type data by the file command in Linux. When I run the file -bi command against these files, I'm given this output (same output for each file):

application/octet-stream; charset=binary

Each file is clearly plain-text (I can view them in vi). What's causing file to misreport the type of these files? Could it be some sort of Unicode thing? Both of these files were created in Windows-land (using Visual Studio 2005), but they're being compiled in Linux (it's a cross-platform application).

Any ideas would be appreciated.

Update: I don't see any null characters in either file. I found some extended characters in the .cpp file (in a comment block), removed them, but file still reports the same encoding. I've tried forcing the encoding in SlickEdit, but that didn't seem to have an effect. When I open the file in vim, I see a [converted] line as soon as I open the file. Perhaps I can get vim to force the encoding?

Vim tries very hard to make sense of whatever you throw at it without complaining. This makes it a relatively poor tool to use to diagnose file's output.

Vim's "[converted]" notice indicates there was something in the file that vim wouldn't expect to see in the text encoding suggested by your locale settings (LANG etc).

Others have already suggested

  • cat -v
  • xxd

You could try grepping for non-ASCII characters.

  • grep -P '[\x7f-\xff]' filename

The other possibility is non-standard line-endings for the platform (i.e. CRLF or CR) but I'd expect file to cope with that and report "DOS text file" or similar.
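As a rough sketch of the grep approach, the pattern can be widened so that it also catches control characters, which the \x7f-\xff range above does not cover. The function name find_suspect_bytes is made up for illustration, and the sketch assumes a grep that supports -a (GNU and BSD grep do).

```shell
# Hypothetical helper: print every line that contains a byte which is neither
# printable ASCII nor ordinary whitespace (tab, newline, CR and friends).
# LC_ALL=C makes grep compare byte by byte, -a keeps it from stopping at
# "Binary file matches", and cat -v renders the odd bytes visibly (^P, M-i).
find_suspect_bytes() {
    LC_ALL=C grep -an '[^[:print:][:space:]]' "$1" | cat -v
}

# Usage (filename is a placeholder):
#   find_suspect_bytes file.cpp
```

Each hit is printed with its line number, so a stray Control+P appears as ^P on the line where it sits.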

If you run file -D filename (on most Linux builds of file the debug option is spelled -d or --debug), file displays debugging information, including the tests it performs. Near the end, it shows which test succeeded in determining the file type.

For a regular text file, it looks like this:

[31> 0 regex,=^package[ \t]+[0-9A-Za-z_:]+ *;,""]
1 == 0 = 0
ascmagic 1
filename.txt: ISO-8859 text, with CRLF line terminators

This shows which test matched, and therefore why file settled on that MIME type.

I found the issue using binary search to locate the problematic lines.

n=$(( $(wc -l < file.cpp) / 2 ))         # half the line count
head -n "$n" file.cpp > a.txt            # first half
tail -n +"$((n + 1))" file.cpp > b.txt   # everything after line n

Running file against each half, and repeating the process, helped me locate the offending line. I found a Control+P (^P) character embedded in it. Removing it solved the problem. I'll write myself a Perl script to search for these characters (and other extended) in the future.
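The halving described above can be automated. What follows is only a sketch under stated assumptions: the names looks_binary and bisect_bad_line are invented here, the default check relies on file's --mime-encoding option printing "binary", and it assumes a chunk is flagged whenever it contains the offending line, which file's heuristics do not strictly guarantee.

```shell
# Hypothetical predicate: true when file(1) classifies the given file as binary.
looks_binary() {
    [ "$(file -b --mime-encoding "$1")" = "binary" ]
}

# Hypothetical helper: bisect_bad_line FILE [PREDICATE]
# Prints the number of the first line for which PREDICATE holds, assuming
# PREDICATE holds for any chunk of FILE that contains such a line.
bisect_bad_line() {
    local src check lo hi mid tmp
    src=$1
    check=${2:-looks_binary}
    lo=1
    hi=$(( $(wc -l < "$src") ))
    tmp=$(mktemp)
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        # Test only lines lo..mid; if they are clean, the culprit is later.
        sed -n "${lo},${mid}p" "$src" > "$tmp"
        if "$check" "$tmp"; then
            hi=$mid
        else
            lo=$((mid + 1))
        fi
    done
    rm -f "$tmp"
    echo "$lo"
}

# Usage (filename is a placeholder):
#   sed -n "$(bisect_bad_line file.cpp)p" file.cpp | cat -v
```

Passing the predicate as an argument keeps the search loop independent of file, so a stricter check such as a grep for control bytes can be swapped in.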

A big thanks to everyone who provided an answer for all the tips!

It could be that the files have been saved with a BOM at the beginning of them, although I would have thought a recent-ish version of the file binary should recognise that too.

Have you tried dumping them through something like "head -2 | xxd" and seeing if there's a BOM present?

*BOM = Byte Order Mark - sometimes present in unicode text files. http://en.wikipedia.org/wiki/Byte_order_mark
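To check for a BOM without reading hex by eye, something like the following could work. It is a minimal sketch: bom_type is a made-up name, and it relies on head -c and od, which are present on typical Linux systems.

```shell
# Hypothetical helper: report which byte order mark, if any, starts FILE.
bom_type() {
    local sig
    # First four bytes as lowercase hex with no separators, e.g. "efbbbf69".
    sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case $sig in
        efbbbf*)  echo "UTF-8 BOM" ;;
        fffe0000) echo "UTF-32 LE BOM" ;;
        0000feff) echo "UTF-32 BE BOM" ;;
        fffe*)    echo "UTF-16 LE BOM" ;;
        feff*)    echo "UTF-16 BE BOM" ;;
        *)        echo "no BOM" ;;
    esac
}

# Usage (filename is a placeholder):
#   bom_type file.cpp
```

The UTF-32 patterns are tested before the UTF-16 ones because a UTF-32 LE mark begins with the same two bytes as a UTF-16 LE mark.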

It probably is a non-ASCII character from Unicode or some other character set. Since you're using vi, which in most Linux distributions is some version of vim, you can search for that character by typing

/[<Ctrl-V>x80-<Ctrl-V>xff]

and hitting Enter, where <Ctrl-V> means typing v while pressing the Ctrl key. Similarly, you can search for nulls (as Mehrdad suggested) with this:

/<Ctrl-V>x00

Which charset/encoding/(codepage) are the files in?
Perhaps the files contain stray characters, typically from a bad conversion between encodings on different platforms. Invalid data in your files may be causing file to report them as you have described. You can check whether a file is valid in a particular charset encoding by running it through recode (or iconv).

Follow the link for a list of Common character encodings

This script lists charset encodings (from $my_csets) which aren't valid for your file(s). You can list all charsets via: recode -l

file="$1"

my_csets="UTF-16 UTF-8 windows-1250 ASCII"

# Use the next lines to test all charsets
# =======================================
# all_csets=$(recode -l |sed -ne "/^[^:/]/p" | awk '{print $1}')
# my_csets=$all_csets

for cset in $my_csets ;do
    <"$1" recode $cset.. &>/dev/null || echo  "$cset  ERROR: $?"
done
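Where recode is not installed, the same idea can be sketched with iconv, which the answer mentions as an alternative. The function name invalid_charsets is hypothetical, and the charset names accepted depend on the local iconv build (iconv -l lists them).

```shell
# Hypothetical helper: invalid_charsets FILE CHARSET...
# Prints each listed charset in which FILE is NOT valid. iconv exits non-zero
# when the input holds a byte sequence that is illegal in the -f charset.
# A charset name that this iconv build does not know is printed as well.
invalid_charsets() {
    local file cset
    file=$1
    shift
    for cset in "$@"; do
        iconv -f "$cset" -t UTF-8 "$file" >/dev/null 2>&1 || echo "$cset"
    done
}

# Usage (filename is a placeholder):
#   invalid_charsets file.cpp UTF-16 UTF-8 windows-1250 ASCII
```

Note that single-byte encodings such as windows-1250 accept almost any byte, so passing this check shows only that the file could be in that encoding, not that it is.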


Source: https://superuser.com/questions/411214/what-could-cause-the-file-command-in-linux-to-report-a-text-file-as-data


Wednesday, May 06, 2020

Installing PHP 7 and Composer on Windows 10, Using Ubuntu in WSL


Note: If you want to install and use PHP 7 and Composer within Windows 10 natively, I wrote a guide for that, too!

Since Windows 10 introduced the Windows Subsystem for Linux (WSL), it has become far easier to work on Linux-centric software, like most PHP projects, within Windows.

To get the WSL, and in our case, Ubuntu, running in Windows 10, follow the directions in Microsoft's documentation: Install the Windows Subsystem for Linux on Windows 10, and download and launch the Ubuntu installer from the Windows Store.

Once it's installed, open an Ubuntu command line, and let's get started:

Install PHP 7 inside Ubuntu in WSL

Ubuntu has packages for PHP 7 already available, so it's just a matter of installing them with apt:

  1. Update the apt cache with sudo apt-get update
  2. Install PHP and commonly-required extensions: sudo apt-get install -y git php7.0 php7.0-curl php7.0-xml php7.0-mbstring php7.0-gd php7.0-sqlite3 php7.0-mysql.
  3. Verify PHP 7 is working: php -v.

If it's working, you should get output like:

[Screenshot: PHP 7 running under Ubuntu under WSL on Windows 10]
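Beyond php -v, it can be worth confirming that the extensions were actually loaded. The sketch below compares a wanted list against the output of php -m; missing_extensions is a made-up name, and the module names in the usage line are an assumption about how the packages from step 2 appear in php -m (php7.0-mysql, for instance, is expected to show up as mysqli).

```shell
# Hypothetical helper: missing_extensions MODULE_LIST EXT...
# MODULE_LIST is the text printed by `php -m`; every EXT that does not appear
# in it as a whole line (ignoring case) is printed.
missing_extensions() {
    local loaded ext
    loaded=$1
    shift
    for ext in "$@"; do
        printf '%s\n' "$loaded" | grep -qix "$ext" || echo "$ext"
    done
}

# Usage:
#   missing_extensions "$(php -m)" curl xml mbstring gd sqlite3 mysqli
```

No output means every listed extension was found.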

Install Composer inside Ubuntu in WSL

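Following the official instructions for downloading and installing Composer, copy and paste this command into the CLI. The SHA-384 hash is tied to one specific installer release, so if verification fails, take the current hash from the Composer download page: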

php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');" && \
php -r "if (hash_file('SHA384', 'composer-setup.php') === '544e09ee996cdf60ece3804abc52599c22b1f40f4323403c44d44fdfdd586475ca9813a858088ffbc1f233e9b180f061') { echo 'Installer verified'; } else { echo 'Installer corrupt'; unlink('composer-setup.php'); } echo PHP_EOL;" && \
php composer-setup.php && \
php -r "unlink('composer-setup.php');"
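The verification step in that command compares the installer's SHA-384 digest with a known value. The same check can be sketched in shell, assuming the coreutils sha384sum tool is available (it is on a stock Ubuntu install); verify_sha384 and expected_hash are names invented for this example.

```shell
# Hypothetical helper: verify_sha384 FILE EXPECTED
# Succeeds only if FILE's SHA-384 digest equals the hex string EXPECTED.
verify_sha384() {
    local actual
    actual=$(sha384sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Usage (expected_hash holds the value published for the installer):
#   if verify_sha384 composer-setup.php "$expected_hash"; then
#       echo 'Installer verified'
#   else
#       echo 'Installer corrupt'; rm -f composer-setup.php
#   fi
```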

To make Composer easier to use, run the following command to move Composer into your global path:

sudo mv composer.phar /usr/local/bin/composer 

Now you can run composer, and you should get the output:

[Screenshot: Composer running under Ubuntu under WSL on Windows 10]

That's it! Now you have PHP 7 and Composer running inside Ubuntu in WSL on your Windows 10 PC. Next up, dominate the world with some new PHP projects!

Source: https://www.jeffgeerling.com/blog/2018/installing-php-7-and-composer-on-windows-10-using-ubuntu-wsl
