aboutsummaryrefslogtreecommitdiff
path: root/utf8.c
Commit message (Collapse)AuthorAge
* strbuf: add fixed-length version of add_wrapped_textJeff King2011-02-23
| | | | | | | | | | | | | | | | | The function strbuf_add_wrapped_text takes a NUL-terminated string. This makes it annoying to wrap strings we have as a pointer and a length. Refactoring strbuf_add_wrapped_text and all of its sub-functions to handle fixed-length strings turned out to be really ugly. So this implementation is lame; it just strdups the text and operates on the NUL-terminated version. This should be fine as the strings we are wrapping are generally pretty short. If it becomes a problem, we can optimize later. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'rs/optim-text-wrap'Junio C Hamano2010-03-02
|\ | | | | | | | | | | | | | | * rs/optim-text-wrap: utf8.c: speculatively assume utf-8 in strbuf_add_wrapped_text() utf8.c: remove strbuf_write() utf8.c: remove print_spaces() utf8.c: remove print_wrapped_text()
| * utf8.c: speculatively assume utf-8 in strbuf_add_wrapped_text()René Scharfe2010-02-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | is_utf8() works by calling utf8_width() for each character at the supplied location. In strbuf_add_wrapped_text(), we do that anyway while wrapping the lines. So instead of checking the encoding beforehand, optimistically assume that it's utf-8 and wrap along until an invalid character is hit, and when that happens start over. This pays off if the text consists only of valid utf-8 characters. The following command was run against the Linux kernel repo with git 1.7.0: $ time git log --format='%b' v2.6.32 >/dev/null real 0m2.679s user 0m2.580s sys 0m0.100s $ time git log --format='%w(60,4,8)%b' >/dev/null real 0m4.342s user 0m4.230s sys 0m0.110s And with this patch series: $ time git log --format='%w(60,4,8)%b' >/dev/null real 0m3.741s user 0m3.630s sys 0m0.110s So the cost of wrapping is reduced to 70% in this case. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * utf8.c: remove strbuf_write()René Scharfe2010-02-20
| | | | | | | | | | | | | | | | | | The patch before the previous one made sure that all callers of strbuf_add_wrapped_text() supply a strbuf. Replace all calls of strbuf_write() with regular strbuf functions and remove it. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * utf8.c: remove print_spaces()René Scharfe2010-02-20
| | | | | | | | | | | | | | | | | | | | | | The previous patch made sure that strbuf_add_wrapped_text() (and thus strbuf_add_indented_text(), too) always get a strbuf. Make use of this fact by adding strbuf_addchars(), a small helper that adds a char the specified number of times to a strbuf, and use it to replace print_spaces(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * utf8.c: remove print_wrapped_text()René Scharfe2010-02-20
| | | | | | | | | | | | | | | | | | | | | | strbuf_add_wrapped_text() is called only from print_wrapped_text() without a strbuf (in which case it writes its results to stdout). At its only callsite, supply a strbuf, call strbuf_add_wrapped_text() directly and remove the wrapper function. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | utf8.c: mark file-local function staticJunio C Hamano2010-01-12
|/ | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* strbuf_add_wrapped_text(): skip over colour codesRené Scharfe2009-11-23
| | | | | | | | Ignore display mode escape sequences (colour codes) for the purpose of text wrapping because they don't have a visible width. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* strbuf_add_wrapped_text(): factor out strbuf_add_indented_text()René Scharfe2009-11-22
| | | | | | | | | | | | | | Add a new helper function, strbuf_add_indented_text(), to indent text without a width limit, and call it from strbuf_add_wrapped_text(). It respects both indent (applied to the first line) and indent2 (applied to the rest of the lines); indent2 was ignored by the indent-only path of strbuf_add_wrapped_text() before the patch. Two simple test cases are added, one exercising strbuf_add_wrapped_text() and the other strbuf_add_indented_text(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Teach --wrap to only indent without wrappingJunio C Hamano2009-10-22
| | | | | | | | When a zero or negative width is given to "shortlog -w<width>,<in1>,<in2>" and --format=%[wrap(w,in1,in2)...%], just indent the text by in1 without wrapping. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Add strbuf_add_wrapped_text() to utf8.[ch]Johannes Schindelin2009-10-19
| | | | | | | The newly added function can rewrap text according to a given first-line indent, other-indent and text width. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* print_wrapped_text(): allow hard newlinesJohannes Schindelin2009-10-19
| | | | | | | | | | | | | | | print_wrapped_text() will insert its own newlines. Up until now, if the text passed to it contained newlines, they would not be handled properly (the wrapping got confused after that). The strategy is to replace a single new-line with a space, but keep double new-lines so that already-wrapped text with empty lines between paragraphs will be handled properly. However, single new-line characters are only handled this way if the character after it is an alphanumeric character, as per Linus' suggestion. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* On Solaris choose the OLD_ICONV iconv() declaration based on the UNIX specBrandon Casey2009-06-06
| | | | | | | | OLD_ICONV is only necessary on Solaris until UNIX03. This is indicated by the private macro _XPG6 which is set in /usr/include/sys/feature_tests.h. Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* utf8: add utf8_strwidth()Geoffrey Thomas2009-02-04
| | | | | | | I'm about to use this pattern more than once, so make it a common function. Signed-off-by: Geoffrey Thomas <geofft@mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* utf8_width(): allow non NUL-terminated inputJunio C Hamano2008-01-06
| | | | | | | The original interface assumed that the input string is always terminated with a NUL, but that wasn't too useful. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* utf8: pick_one_utf8_char()Junio C Hamano2008-01-06
| | | | | | | | | utf8_width() function was doing two different things. To pick a valid character from UTF-8 stream, and compute the display width of that character. This splits the former to a separate function pick_one_utf8_char(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Remove unreachable statementsGuido Ostkamp2007-11-15
| | | | | | | Solaris Workshop Compiler found a few unreachable statements. Signed-off-by: Guido Ostkamp <git@ostkamp.fastmail.fm> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Style: place opening brace of a function definition at column 1Junio C Hamano2007-11-08
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* wcwidth redeclarationAmos Waterland2007-05-07
| | | | | | | | | | | | Build fails for git 1.5.1.3 on AIX, with the message: utf8.c:66: error: conflicting types for 'wcwidth' /.../lib/gcc/powerpc-ibm-aix5.3.0.0/4.0.3/include/string.h:266: error: previous declaration of 'wcwidth' was here Fix this by renaming our static variant to our own name. Signed-off-by: Amos Waterland <apw@us.ibm.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Merge branch 'maint'Junio C Hamano2007-03-03
|\ | | | | | | | | | | | | | | | | | | | | | | | | * maint: Unset NO_C99_FORMAT on Cygwin. Fix a "pointer type missmatch" warning. Fix some "comparison is always true/false" warnings. Fix an "implicit function definition" warning. Fix a "label defined but unreferenced" warning. Document the config variable format.suffix git-merge: fail correctly when we cannot fast forward. builtin-archive: use RUN_SETUP Fix git-gc usage note
| * Fix a "pointer type missmatch" warning.Ramsay Jones2007-03-03
| | | | | | | | | | | | | | | | | | | | | | | | In particular, the second parameter in the call to iconv() will cause this warning if your library declares iconv() with the second (input buffer pointer) parameter of type const char **. This is the old prototype, which is none-the-less used by the current version of newlib on Cygwin. (It appears in old versions of glibc too). Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <junkio@cox.net>
| * Fix some "comparison is always true/false" warnings.Ramsay Jones2007-03-03
| | | | | | | | | | | | | | | | | | | | | | On Cygwin the wchar_t type is an unsigned short (16-bit) int. This results in the above warnings from the return statement in the wcwidth() function (in particular, the expressions involving constants with values larger than 0xffff). Simply replace the use of wchar_t with an unsigned int, typedef-ed as ucs_char_t. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <junkio@cox.net>
* | print_wrapped_text: fix output for negative indentJohannes Schindelin2007-03-02
| | | | | | | | | | | | | | | | | | When providing a negative indent, it means that -indent columns were already printed. Fix a bug where the function ate the first character if already the first word did not fit into the first line. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* | Actually make print_wrapped_text() usefulJohannes Schindelin2007-02-27
|/ | | | | | | | | | | Now, it returns the current column, does not add a newline, and you can pass a negative indent, to indicate that the indent was already printed. With this, you can actually continue in the middle of a paragraph, not having to print everything into a buffer first. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* commit-tree: cope with different ways "utf-8" can be spelled.Junio C Hamano2006-12-30
| | | | | | | | People can spell config.commitencoding differently from what we internally have ("utf-8") to mean UTF-8. Try to accept them and treat them equally. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Move encoding conversion routine out of mailinfo to utf8.cJunio C Hamano2006-12-26
| | | | | | | This moves the body of convert_to_utf8() routine used in mailinfo to the utf8.c i18n library. Signed-off-by: Junio C Hamano <junkio@cox.net>
* commit-tree: encourage UTF-8 commit messages.Johannes Schindelin2006-12-24
Introduce is_utf() to check if a text looks like it is encoded in UTF-8, utf8_width() to count display width, and implements print_wrapped_text() using them. git-commit-tree warns if the commit message does not minimally conform to the UTF-8 encoding when i18n.commitencoding is either unset, or set to "utf-8". Signed-off-by: Junio C Hamano <junkio@cox.net>